How to read a text/html file in java regardless of its encoding?

Hi All,
How to read a text/html file in java regardless of its encoding?
1. Is there any way to identify which encoding a file (read using FileInputStream or any other means in the java.io package) was saved with, i.e. whether the file uses ANSI encoding, Unicode encoding, or something else?
2. Is there any standard way to correctly read both an encoded file (e.g. a file in UTF-16 format for Asian-locale character support) and an un-encoded file (i.e. a file in ordinary ANSI format) without knowing in advance which one the user will supply?
The problem is that when creating an instance of 'InputStreamReader' (ISR) we can pass the encoding to use (otherwise it takes the system's default encoding), and the ISR expects the file to be in that encoding; otherwise it reads junk. But we don't know which kind of file the user is going to pass: Unicode (for Asian locales the file should be in Unicode) or ANSI (for non-Asian / English locales users generally use ANSI encoding).
Regards,
Sam

1. There is no reliable way of guessing the encoding of a file without that information. That's why XML, for example, has very strict rules about its encoding (short form: use UTF-8 or UTF-16; if you use anything else, you have to specify it).
2. You might be able to make an educated guess if the possible range of encodings is limited, but it will probably never be 100% certain.
3. The HTML file might have a header entry "<meta http-equiv..." that tells you about its encoding. You could read the start of the file and look for it, then, if you find it, re-read the entire file with that encoding.
regards
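
The "educated guess" from point 2 can be sketched roughly as follows. This is only a heuristic, not a reliable detector: it looks for a UTF-16 or UTF-8 byte order mark, then checks whether the bytes are valid UTF-8, and otherwise returns a fallback charset that the caller has to choose. The class name EncodingGuess and the fallback parameter are made up for this illustration, and UTF-32 and the HTML meta tag from point 3 are not handled.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EncodingGuess {
    /**
     * Best-effort guess of the charset of raw file bytes.
     * 'fallback' is the caller's assumption for files that carry no BOM
     * and are not valid UTF-8 (for example an "ANSI" code page).
     */
    static Charset guess(byte[] b, Charset fallback) {
        // UTF-16 byte order mark (FE FF or FF FE). Java's "UTF-16" decoder
        // reads the BOM itself to pick the byte order and strips it.
        if (b.length >= 2) {
            int b0 = b[0] & 0xFF;
            int b1 = b[1] & 0xFF;
            if ((b0 == 0xFE && b1 == 0xFF) || (b0 == 0xFF && b1 == 0xFE)) {
                return StandardCharsets.UTF_16;
            }
        }
        // UTF-8 byte order mark (EF BB BF).
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        // No BOM: accept UTF-8 only if every byte sequence decodes cleanly.
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(b));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            return fallback;
        }
    }
}
```

With the whole file read into a byte array, the text could then be decoded with new String(bytes, EncodingGuess.guess(bytes, fallback)). Pure ASCII files come back as UTF-8, which is harmless because ASCII is a subset of UTF-8. A file with a UTF-8 BOM will still contain the BOM as its first character after decoding.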

Similar Messages

  • How to read a delimited text file into a 2-dimensional array in Java?

    hi,
    I am new to Java programming. I have to do a task in which I have to read a delimited text file into an array. For example, if the file is as follows:
    Name place Value
    adi goa 20
    shri mumbai 30
    riya bangalr 45
    I want to read it in Java so as to get an array[row][column].
    This is what I have so far, but I can't get any further.
        import java.io.BufferedReader;
        import java.io.FileReader;

        public class generateGML {
            public static void main(String[] argv) throws Exception {
                BufferedReader fh = new BufferedReader(new FileReader("filename.txt"));
                String s;
                while ((s = fh.readLine()) != null) {
                    // split each line on tabs into its three columns
                    String[] columns = s.split("\t");
                    String name = columns[0];
                    String place = columns[1];
                    String value = columns[2];
                }
                fh.close();
            }
        }
    It reads the columns, but I want it two-dimensionally, as in something like matrix[row_num][column_num].
    Can anyone please suggest how?

    You could do the following:
        String[][] array = new String[rows][];
        int row_num = 0;
        while ((s = fh.readLine()) != null) {
            array[row_num++] = s.split("\t");
        }
    However, you need to know ahead of time how many rows to allocate. If you allocate more than needed, you'll need to copy to a new array or keep track of how much is actually populated. If you allocate fewer than needed, you'll get an ArrayIndexOutOfBoundsException.
    Another (likely better) approach:
    Do you really need it as a 2-dimensional array? Can you make a List of objects that have a name, place, and value? Then you don't need to know how big a list to allocate ahead of time, assuming you use a list that grows itself (like ArrayList or LinkedList). Your code would be much easier to read if you could say:
        String name = list.get(10).getName();
    instead of
        String name = array[10][0];
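
A minimal sketch of the List-of-objects approach suggested above, assuming tab-separated lines with a header row as in the sample file. The names RowReader, Row, readRows, and toArray are made up for this illustration; toArray shows one way to get a String[row][column] array back once the number of rows is known.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class RowReader {
    /** One data line of the file; the fields mirror the sample header. */
    static final class Row {
        private final String name;
        private final String place;
        private final String value;

        Row(String name, String place, String value) {
            this.name = name;
            this.place = place;
            this.value = value;
        }

        String getName() { return name; }
        String getPlace() { return place; }
        String getValue() { return value; }
    }

    /** Reads tab-separated lines into a growable list, skipping the header line. */
    static List<Row> readRows(Reader in) {
        List<Row> rows = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(in)) {
            br.readLine(); // skip the "Name place Value" header (assumed to be present)
            String s;
            while ((s = br.readLine()) != null) {
                String[] c = s.split("\t");
                if (c.length >= 3) { // ignore blank or short lines
                    rows.add(new Row(c[0], c[1], c[2]));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    /** Converts the list to String[row][column] if a real 2-D array is still needed. */
    static String[][] toArray(List<Row> rows) {
        String[][] array = new String[rows.size()][];
        for (int i = 0; i < rows.size(); i++) {
            Row r = rows.get(i);
            array[i] = new String[] { r.getName(), r.getPlace(), r.getValue() };
        }
        return array;
    }
}
```

For a file on disk, the call would be RowReader.readRows(new FileReader("filename.txt")).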

  • How to reduce size of html files with JAVA?

    We have HTML files full of tab characters, carriage returns, blank space between tags, etc. We need to reduce the size of these files.
    The HTML files are automatically generated by an engine and we cannot change it.
    The files are in a Solaris environment and we need to launch or schedule something that can clean the files in this environment. The only tools we found are for Windows, so we thought of writing some Java classes that parse the HTML and clean the files.
    Does anyone know of a tool, or a way to clean a file in Java?
    Thank You

    Something like this can collapse runs of spaces in the body of the file. Note that, as written, it copies only the text outside tags; the tags themselves are dropped:
        // needs: import java.io.*;
        public static final String readTextFromFile(File f) {
            StringBuilder fileText = new StringBuilder();
            if (f != null && f.exists() && f.isFile()) {
                // try-with-resources closes the reader even if reading fails
                try (BufferedReader br = new BufferedReader(new FileReader(f))) {
                    String s;
                    char c;
                    boolean inTag = false;
                    boolean lastWasSpace = false; // so we don't have a million spaces in a row
                    boolean inBody = false;
                    while ((s = br.readLine()) != null) {
                        s += " ";
                        // The original post called an undefined searchReplace(s, " ", " ") helper here;
                        // turning tabs into spaces is an assumption about what it was meant to do.
                        s = s.replace('\t', ' ');
                        if (!inBody) {
                            int bodyStartPos = s.indexOf("<body");
                            // if not in body yet, reloop
                            if (bodyStartPos == -1) {
                                continue;
                            }
                            // start it off
                            inBody = true;
                            s = s.substring(bodyStartPos);
                        }
                        for (int i = 0; i < s.length(); i++) {
                            c = s.charAt(i);
                            if (c == '>') {
                                inTag = false;
                            } else if (c == '<') {
                                inTag = true;
                            } else if (!inTag) {
                                // copy text outside tags, skipping repeated spaces
                                if (!(c == ' ' && lastWasSpace)) {
                                    fileText.append(c);
                                }
                                lastWasSpace = (c == ' ');
                            }
                        }
                    }
                } catch (IOException e) {
                    System.err.println(f + ": Error reading file");
                }
            }
            return fileText.toString();
        }
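
If the tags have to stay in the output, which is what the question asks for, a rougher but shorter alternative is to collapse whitespace with regular expressions. This is only a sketch under stated assumptions: the class and method names (HtmlSqueeze, squeeze) are made up, it treats the HTML as plain text rather than parsing it, and it will also alter whitespace inside <pre>, <textarea>, and <script> content, so it is unsuitable for pages that rely on those.

```java
class HtmlSqueeze {
    /**
     * Collapses every run of whitespace (tabs, carriage returns, newlines,
     * spaces) to one space, then removes the single space left between two
     * adjacent tags.
     */
    static String squeeze(String html) {
        return html.replaceAll("\\s+", " ")
                   .replaceAll("> <", "><")
                   .trim();
    }
}
```

Removing the space between two inline tags can change how the page renders, so the second replaceAll may need to be dropped for such markup.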

  • How to Read and Generate XML file from java code.

    hi guys,
    How do I read an XML file (condition: we know only the DTD or schema)?
    How do I generate a new XML file (using the schema)?
    And one more: how do I generate XML directly from a DB?
    Please reply with code or a URL.

    Using XMLBeans you can generate Java objects from an XSD schema (perhaps from DTDs as well).
    Then you can create an instance of the Document object and ask it to write itself.
    This will create an XML document compliant with the schema.
    XMLBeans generates a type-safe DOM where you can only ever have a structure compliant with your schema.
    matfud

  • How to Read excel or .csv files in java

    I am writing a program which takes an Excel or .csv file as input.
    How do I read these files?
    Is there an existing API, or do I need to use a third-party jar?
    Please advise.
    Thanks & Regards

    Did you search in google? Did you search here? There are so many excel related questions here, including answers about third party libraries.
    I have the impression that you didn't research at all.
    How to ask questions: http://faq.javaranch.com/view?HowToAskQuestionsOnJavaRanch (it's the same here).

  • How to read several text files at a time

    Dear all
    Reading and writing one text file is not a problem, but what confuses me is how to read several text files at one time. At the same time, is it possible to display the name of each text file?
    For example, assuming I want to load the files "cha 1, cha 2, cha 3" at one time and show their names, how do I handle it?
    I have reviewed some files and they were not helpful.

    Either with a 'for' loop like in the lib you have attached, or like this attached VI
    that's it
    Message Edited by devchander on 05-30-2006 05:11 AM
    Attachments:
    MULTIPLE READ.vi ‏44 KB

  • How to Read the "text file and csv file" through powershell Scripts

    Hi All
    I need to add multiple users to a particular group through a PowerShell script. How do I read text and CSV files in PowerShell?
    I am completely new to PowerShell scripts; could anyone please respond ASAP with a step-by-step process?
    Regards:
    Rajeshreddy.k

    Hi Rajeshreddy.k,
    To add multiple users to one group, I wouldn't use a .csv file, since the only value you need from the list is the users to be added.
    To start, create a list of the users that should be added to the group, import this list into a variable called $Users, put the group's distinguishedName in a variable called $Group, and simply call the ActiveDirectory cmdlet Add-ADGroupMember.
    $Users = Get-Content -Path 'C:\ListOfUsernames.txt'
    $Group = 'CN=MyGroup,OU=MyOrg,DC=domain,DC=lcl'
    Add-ADGroupMember -Identity $Group -Members $Users

  • How to read a text file using Java

    Guys,
    Good day!
    Please help me how to read a text file using Java and create/convert that text file into XML.
    Thanks and God Bless.
    Regards,
    I-Talk

        // needs: import java.io.*;
        public void fileRead() {
            File aFile = new File("myFile.txt");
            // try-with-resources closes the reader when done
            try (BufferedReader input = new BufferedReader(new FileReader(aFile))) {
                String line;
                while ((line = input.readLine()) != null) {
                    // use each line here
                }
            } catch (FileNotFoundException ex) {
                ex.printStackTrace();
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }
    This code reads a text file. But nothing will convert your text file into an XML file automatically. You have to have a defined XML format. Then you can read your data from the text file and insert it into your XML. Or you may want to read XML tags from text files and insert your own data. The .txt and .xml file formats are too different for a generic conversion.
    cheers
    Mohammed Jubaer Arif.
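
As a rough illustration of "define an XML format, then insert your data into it", the sketch below wraps each line of text in an element using the JDK's built-in DOM and Transformer APIs. The format itself (a lines root element with one line child per text line) and the names LinesToXml and toXml are assumptions made up for this example; a real format would depend on what the text file contains.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class LinesToXml {
    /** Builds <lines><line>...</line>...</lines> from the given text lines. */
    static String toXml(List<String> lines) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("lines"); // hypothetical root element
            doc.appendChild(root);
            for (String line : lines) {
                Element e = doc.createElement("line"); // hypothetical child element
                e.setTextContent(line); // special characters such as & are escaped on output
                root.appendChild(e);
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XML output failed", e);
        }
    }
}
```

The lines collected in the read loop above could be added to a List<String> and passed to toXml. Building the document through the DOM rather than by string concatenation means characters like & and < in the text are escaped automatically.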

  • How to read a text file through pl/sql

    How to read a text file through pl/sql

    PL/SQL runs inside the database, so your file must also be on the database server's file system for you to be able to read it.
    Check out the UTL_FILE package. This is the database package for reading and writing files on the database server.

  • How to read in text string pairs from an external txt file?

    I have an external txt file containing 26000 pairs of strings in the format
    string1a   string1b
    string2a   string2b
    string26000a string26000b
    all strings are always 4 characters in length.  For example,
    a123   jkdh
    b456   uusp
    How can I use TestStand 2013 to read this data into local string array variables?  I also have legacy testers running TestStand 3.5, so I need a TestStand 3.5 solution as well.  Thanks in advance for any ideas.  Hopefully this can be done in TestStand without the use of LabVIEW or LabWindows/CVI.

    Daniel E., thanks for the reply. 
         It is very frustrating to have to implement workarounds in order to access text in an external file when using TestStand.  TestStand already reads in text ini files, so it would probably take a TestStand software developer all of 1/2 day to write the code and 1/2 day to debug it.  Here I will restate the obvious.  Engineers have to deal with data, sometimes data generated from other applications.  Sharing data in a text file is one of the most basic functions (and one of the easiest to implement).  When I think of all the effort needed to support ActiveX, but NI did not see it fit to give TestStand users simple text file read/write functionality, it does not make sense to me.  Maybe it was too simple or mundane a task so it did not get developed?  Whatever the reason, I think users of TestStand deserve this basic functionality.
         Even if TestStand cannot read the data directly into array variables, it should provide some mechanism to read the text from the file, either line by line or all file text into a string, so the data could then be parsed into the array variables I need.
         I will pursue a CVI or LabView solution.  Please also know that I am not directing these comments at you, I do appreciate your reply.
    Regards,
    Ron

  • How to read HyperLinks from pdf file??

    Hi developers,
    I am working on PDF processing and have a question about it.
    How do I read hyperlinks from a PDF file?
    I am able to set a hyperlink, but I am not able to read the hyperlinks back.
    The following example program sets hyperlinks in a PDF file using the lowagie API:
        import java.io.FileOutputStream;
        import java.io.IOException;

        import com.lowagie.text.Anchor;
        import com.lowagie.text.Chunk;
        import com.lowagie.text.Document;
        import com.lowagie.text.DocumentException;
        import com.lowagie.text.Paragraph;
        import com.lowagie.text.html.HtmlWriter;
        import com.lowagie.text.pdf.PdfReader;
        import com.lowagie.text.pdf.PdfWriter;

        public class Argu1 {
            public static void main(String[] args) {
                Document document = new Document();
                try {
                    PdfWriter pdf = PdfWriter.getInstance(document,
                            new FileOutputStream("PageLink.pdf"));
                    // PdfReader pdf_read=new   (this line is truncated in the original post)
                    document.open();
                    document.add(new Paragraph("Hi Everbody....!"));
                    Anchor pdfRef = new Anchor("Click Me");
                    pdfRef.setReference("www.java2s.com");
                    Anchor rtfRef = new Anchor("Touch Me");
                    rtfRef.setReference("www.sun.com");
                    System.out.println(rtfRef.reference());
                    document.add(pdfRef);
                    document.add(Chunk.NEWLINE);
                    document.add(rtfRef);
                } catch (DocumentException de) {
                    System.err.println(de.getMessage());
                } catch (IOException ioe) {
                    System.err.println(ioe.getMessage());
                }
                document.close();
            }
        }
    Please help me read the hyperlinks from the PDF file using Java.
    Thanks in advance,
    With Regards,
    J.Imran

    Instead of cross-posting unformatted code, you could have taken a look at the API, because there you might have come across a method named getLinks. Even though it's not documented, I really suspect that it will return the hyperlinks on a given page.

  • How to read data from a file that was formatted by excel?

    Hi everyone, I'm familiar with java.io and the ability to read from files, can anyone tell me how to read data from a file that was formatted by excel? Or at least give me some web references so that I can learn about it?

    http://jakarta.apache.org/poi/hssf/index.html
    HSSF stands for Horrible Spreadsheet Format, but it still works!

  • How do I Open an HTML file for iOS?

    How do I open an HTML file for iOS using acrobat?

    This is a forum for Adobe Reader for iOS. Acrobat doesn't run on iOS. You need to run it on a Mac or Windows computer. In Acrobat, you can choose File > Create > PDF from Web Page to do what you want. You cannot do that in iOS.

  • How to read list of all files in folder on application server?

    How to read list of all files in folder on application server?

    Hi,
    First get the list of files on the application server using the following function module.
        CALL FUNCTION 'RZL_READ_DIR_LOCAL'
          EXPORTING
            name     = loc_fdir
          TABLES
            file_tbl = int_filedir.
    Here loc_fdir contains the application server path.
    int_filedir contains all the file names in that particular path.
    Now loop at int_filedir and read each file:
        LOOP AT int_filedir.
          OPEN DATASET int_filedir-name FOR INPUT IN TEXT MODE ENCODING DEFAULT MESSAGE wf_mess.
          IF sy-subrc = 0.
            DO.
              READ DATASET int_filedir-name INTO wf_string.
              IF sy-subrc <> 0.
                EXIT.
              ENDIF.
            ENDDO.
            CLOSE DATASET int_filedir-name.
          ELSE.
            MESSAGE wf_mess TYPE 'I'.
          ENDIF.
        ENDLOOP.

  • How to load external storage html file in web view

    hi all,
        how to load external storage html file in web view, please help me
       " ms-appdata://local/index.html" not working
    veerasuthan veerakesan

    The file needs to be read as a string. Then load the string with WebView.NavigateToString.
    Sample below:
        string htmlstring = string.Empty;
        try
        {
            var htmlfile = await Windows.Storage.ApplicationData.Current.LocalFolder.OpenStreamForReadAsync("a.html");
            using (System.IO.StreamReader streamReader = new System.IO.StreamReader(htmlfile))
            {
                htmlstring = streamReader.ReadToEnd();
            }
            webview.NavigateToString(htmlstring);
        }
        catch (Exception ex)
        {
            Debug.WriteLine(ex.ToString());
        }
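    In real life, who you spend time with really matters; it can even change the course of your growth and decide whether you succeed or fail. The kind of people you are with shapes the kind of life you have. With diligent people, you will not be lazy; with positive people, you will not be downhearted; walk with the wise and you will be extraordinary; keep company with the excellent and you can reach the summit.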
