How to read UTF-8 encoded text file randomly?

I am trying to read a text file that has been encoded in UTF-8. The problem is that I need to access the file randomly. RandomAccessFile is a low-level class, and there seems to be no way to wrap it in an InputStreamReader so that the UTF-8 decoding is done on the fly. Is there an easy way to do that? Below is a simplified version of my program.
import java.io.*;

public class Test {
        public Test(String filename) {
                // try-with-resources closes the file even if reading fails
                try (RandomAccessFile rafTemIn = new RandomAccessFile(new File(filename), "r")) {
                        while (true) {
                                // Note: readChar() reads two bytes as one UTF-16 code unit; it does not decode UTF-8
                                char chr = rafTemIn.readChar();
                                System.err.println(chr);
                        }
                } catch (EOFException e) {
                        System.err.println("File read.");
                } catch (IOException e) {
                        System.err.println("File input error");
                }
        }

        public static void main(String[] args) {
                Test t = new Test("template.idx");
        }
}

The file that I am going to read could be a few hundred MBs or several GBs, so I will index the interesting items in it. The index file contains each keyword and its byte offset in the file, so I need to be able to seek to any byte and read from there. The file could be UTF-8 encoded XML or UTF-8 encoded plain text.
I should add that the sample program above reads the file sequentially. The class in question has another method that actually does the random reads. In case it helps, here is the simplified code again, this time including that method.
import java.io.*;

public class Test {
        long bloc;
        long eloc;
        RandomAccessFile rafTemIn;

        public Test(String filename) {
                bloc = 0L;
                eloc = 0L;
                try {
                        rafTemIn = new RandomAccessFile(new File(filename), "r");
                        while (true) {
                                // readChar() reads two bytes as a UTF-16 code unit, not a UTF-8 character
                                char chr = rafTemIn.readChar();
                                System.err.println(chr);
                        }
                } catch (EOFException e) {
                        System.err.println("File read.");
                } catch (IOException e) {
                        System.err.println("File input error");
                }
        }

        public String getVal(String templateName) {
                String stemval = null;
                try {
                        rafTemIn.seek(bloc); // bloc is the byte offset to start reading from; it changes per lookup
                        byte[] b = new byte[(int) (eloc - bloc + 1L)];
                        rafTemIn.readFully(b); // readFully fills the whole buffer; read() may return fewer bytes
                        stemval = new String(b, "UTF-8");
                } catch (IOException eio) {
                        System.err.println("Template Dump file IO error.");
                }
                return stemval;
        }

        public static void main(String[] args) {
                Test t = new Test("template.idx");
                System.out.println(t.getVal("wikipedia"));
        }
}
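
One possible approach, sketched here under the assumption that the index stores byte offsets that fall on UTF-8 character boundaries (begin and end both inclusive, as in getVal): seek to the offset, read exactly that byte range, and decode only that slice. The class name Utf8Slice and the helper alignToCharStart are illustrative and not part of the original program.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Illustrative helper (hypothetical name): decodes a byte range of a UTF-8 file.
final class Utf8Slice {

    /** Reads bytes [begin, end] (both inclusive) and decodes them as UTF-8. */
    static String read(String filename, long begin, long end) {
        try (RandomAccessFile raf = new RandomAccessFile(filename, "r")) {
            byte[] buf = new byte[(int) (end - begin + 1)];
            raf.seek(begin);
            raf.readFully(buf); // fills the whole buffer or throws EOFException
            return new String(buf, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * If an offset might land inside a multi-byte character, step back to its
     * first byte. UTF-8 continuation bytes always have the bit pattern 10xxxxxx.
     */
    static long alignToCharStart(String filename, long offset) {
        try (RandomAccessFile raf = new RandomAccessFile(filename, "r")) {
            long pos = offset;
            while (pos > 0) {
                raf.seek(pos);
                int b = raf.read();
                if (b == -1 || (b & 0xC0) != 0x80) {
                    break; // end of file, or not a continuation byte
                }
                pos--;
            }
            return pos;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because only the requested range is read, the size of the file does not matter; the cost is one seek and one read per lookup. If the index offsets cannot be trusted to sit on character boundaries, alignToCharStart can be applied to the start offset first.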

Similar Messages

  • How to save a UTF-8 encoded text file ?

    hi People
    I have a little script which reads the source text from a layer and saves it to a .txt file. This is on a Mac and all was good until recently, when I tried opening the .txt file on a PC in Notepad and found my ˚ degree symbols all out of whack.
    Resaving the .txt file in TextEdit with Unicode (UTF-8) encoding solved the problem; it now opens fine in Notepad.
    But ideally I'd like the script to output the .txt as UTF-8 in the first place. It's currently Western (Mac OS Roman). I've tried adding myfile.encoding = "UTF8", but the resulting file is still Western (and the special characters have wigged out again).
    Any help greatly appreciated. /daniel
        var theComp = app.project.activeItem;
        var dataRO = theComp.layer("dataRO").sourceText;
        // prompt user to save file
        var theFile = new File ("~/Desktop/"+ theComp.name + "_output.txt");
        theFile = theFile.saveDlg("Save an ASCII export file.");
        if (theFile != null) {          // check user didn't cancel dialog
            theFile.lineFeed = "windows";
            //theFile.encoding = "UTF8";
            theFile.open("w","TEXT","????");
            theFile.writeln("move details:");
            theFile.writeln(dataRO.value.toString());
            theFile.close();
        }

    Hi,
    Got it. It seems the UTF-8 standard uses two or more bytes to encode accented and special characters.
    I found some info and code here: http://ivoronline.com/Coding/Theory/Tutorials/Encoding%20-%20Text%20-%20UTF%208.php
    However, there were some errors in it, so I fixed them. (I didn't test the 3- and 4-byte characters. The continuation-byte mask used here is 0x3F; 0xBF produces the same bytes once the value is ORed with 0x80.)
    So here is the code.
    function convertCharToUTF(character){
        var utfBytes = "";
        var c = character.charCodeAt(0);
        if (c < 0x80) {
            // 1 byte: 0xxxxxxx
            utfBytes = String.fromCharCode(c);
        } else if (c < 0x800) {
            // 2 bytes: 110xxxxx 10xxxxxx
            utfBytes = String.fromCharCode(0xC0 | c>>6);
            utfBytes += String.fromCharCode(0x80 | c & 0x3F);
        } else if (c < 0x10000) {
            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            utfBytes = String.fromCharCode(0xE0 | c>>12);
            utfBytes += String.fromCharCode(0x80 | c>>6 & 0x3F);
            utfBytes += String.fromCharCode(0x80 | c & 0x3F);
        } else if (c < 0x200000) {
            // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            utfBytes = String.fromCharCode(0xF0 | c>>18);
            utfBytes += String.fromCharCode(0x80 | c>>12 & 0x3F);
            utfBytes += String.fromCharCode(0x80 | c>>6 & 0x3F);
            utfBytes += String.fromCharCode(0x80 | c & 0x3F);
        }
        return utfBytes;
    }
    function convertStringToUTF(stringToConvert){
        var utfString = "";
        for (var i = 0; i < stringToConvert.length; i++){
            utfString = utfString + convertCharToUTF(stringToConvert.charAt(i));
        }
        return utfString;
    }
    var theFile = new File("~/Desktop/_output.txt");
    theFile.open("w", "TEXT");
    theFile.encoding = "BINARY";
    theFile.lineFeed = "Unix";
    theFile.write(""); // or, to write a UTF-8 byte order mark: theFile.write(String.fromCharCode(0xEF) + String.fromCharCode(0xBB) + String.fromCharCode(0xBF));
    theFile.write(convertStringToUTF("Your stuff éàçËôù"));
    theFile.close();
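
    For comparison, the same bit layout can be sketched in Java and checked against the platform's own encoder. This is only an illustration of the encoding rule described above; the class name Utf8Encode is made up for the example, and the input is assumed to be a valid, non-surrogate code point.

```java
// Illustrative sketch (hypothetical class name): manual UTF-8 encoding of one code point.
final class Utf8Encode {

    /** Encodes one Unicode code point (assumed valid and non-surrogate) as UTF-8 bytes. */
    static byte[] encode(int cp) {
        if (cp < 0x80) {
            // 1 byte: 0xxxxxxx
            return new byte[] { (byte) cp };
        }
        if (cp < 0x800) {
            // 2 bytes: 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F)) };
        }
        if (cp < 0x10000) {
            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        }
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return new byte[] {
            (byte) (0xF0 | (cp >> 18)),
            (byte) (0x80 | ((cp >> 12) & 0x3F)),
            (byte) (0x80 | ((cp >> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F)) };
    }
}
```

    In everyday Java code, String.getBytes(StandardCharsets.UTF_8) does this work; the manual version is only useful for seeing where each bit ends up.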

  • How to read characters from a text file in java program ?

    Sir,
    I have to read the characters m to z listed in a text file.
    I must compare each character read from the file.
    If any of the characters m to z is matched, I have to replace it with a hexadecimal value.
    Any help or suggestions in this regard would be very useful.
    Thanking you,
    khurram

    Hai,
    The requirement is like this
    There is an input file, the contents of the file are as follows, you can assume any name for the file.
    #Character mappings for Japanese Shift-JIS character set
    #ASCII character Mapped Shift-JIS character
    m 227,128,133 #Half width katakana letter small m
    n 227,128,134 #Half width katakana letter small n
    o 227,129,129
    p 227,129,130
    q 227,129,131
    r 227,129,132
    s 227,129,133
    t 227,129,134
    u 227,129,135
    v 227,129,136
    w 227,129,137
    x 227,129,138
    y 227,129,139
    z 227,129,142
    The contents of the above file are to be read as input.
    On encountering any character from m to z, I have to replace it with the hexadecimal code point value for the multibyte representation given in the second column of the input file.
    I have the code to get the Unicode code point value from the multibyte representation, but not to read it from a file.
    So if you could please tell me how to get the values in the second column, it would be very useful.
    The character # marks the beginning of a comment in the input file,
    and comment lines are to be ignored while reading the file.
    Say I have a string str="message";
    then I should replace the m with the Unicode code point value.
    Thanking you,
    khurram
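
    A minimal sketch of reading the second column, assuming the layout shown above: one ASCII character, whitespace, a comma-separated list of decimal byte values, and an optional # comment. The names MappingParser, parse, toHex and replace are invented for the example, and the replacement here simply writes the mapped bytes as hex; converting them to a code point is left to the existing code mentioned in the post.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch (hypothetical names): parses "m 227,128,133 #comment" lines.
final class MappingParser {

    /** Returns a map from the first-column character to its second-column byte values. */
    static Map<Character, int[]> parse(List<String> lines) {
        Map<Character, int[]> map = new LinkedHashMap<>();
        for (String raw : lines) {
            // Drop everything from '#' onward, so full-line and trailing comments both vanish
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] cols = line.split("\\s+");
            if (cols.length < 2 || cols[0].length() != 1) {
                continue; // not a mapping line
            }
            String[] parts = cols[1].split(",");
            int[] bytes = new int[parts.length];
            for (int i = 0; i < parts.length; i++) {
                bytes[i] = Integer.parseInt(parts[i].trim());
            }
            map.put(cols[0].charAt(0), bytes);
        }
        return map;
    }

    /** Formats byte values as upper-case hex, e.g. {227, 128, 133} becomes "E38085". */
    static String toHex(int[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    /** Replaces every mapped character in the input with the hex form of its bytes. */
    static String replace(String input, Map<Character, int[]> map) {
        StringBuilder sb = new StringBuilder();
        for (char ch : input.toCharArray()) {
            int[] bytes = map.get(ch);
            sb.append(bytes != null ? toHex(bytes) : String.valueOf(ch));
        }
        return sb.toString();
    }
}
```

    The lines themselves could come from Files.readAllLines on the mapping file; passing a List keeps the parsing separate from the file access.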

  • How to read data from a text file

    I have saved a text file on my hard drive and would like to read it. Once it is read, I would like to copy it to another file. I think I am supposed to use the BufferedReader and FileReader classes, but I am not sure what sequence to follow. Could someone give me some feedback on what is needed, and in what order, to perform this task?

    Where do I store my file within my platform (Eclipse) so that I can read or access it? Whenever I use the declaration below, I get a FileNotFoundException. I understand the code, but I am not sure where to store the file so that the code below will find it.
    FileReader in = new FileReader("data1.txt");
              BufferedReader a = new BufferedReader(in);
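
    A rough sketch of the read-then-copy sequence. A relative name such as "data1.txt" is resolved against the JVM's working directory, which in an Eclipse run configuration is typically the project root rather than the src folder; printing the absolute path shows where the program is actually looking. The class name CopyText, the target file name and the UTF-8 charset are assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative sketch (hypothetical names): copies a text file line by line.
final class CopyText {

    /** Copies source to target line by line and returns the number of lines copied. */
    static int copy(Path source, Path target) {
        int lines = 0;
        // try-with-resources closes both files, even if an exception is thrown
        try (BufferedReader in = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.newLine();
                lines++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        Path source = Paths.get("data1.txt");
        // Shows the full path the relative name resolves to
        System.out.println("Looking for: " + source.toAbsolutePath());
        System.out.println("Copied lines: " + copy(source, Paths.get("data1-copy.txt")));
    }
}
```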

  • Hi how to read lines in a text file???

    Hi experts,
    please help me out.
    I want to read a file containing student IDs and names,
    such as
    211 john
    122 david
    111 Chris
    and so on.
    I know how to read it using BufferedReader, but the problem is that
    my teacher is going to change the number of students in the file
    when she marks my code,
    so I don't know how many lines (students) there will be in the file.
    In this case, how do I read the file? I don't know how many lines there will be,
    but I do know the maximum number of students in the file is 20.
    How do I do it?
    Experts, please help me.

    try this
    try {
                FileReader reader = new FileReader(f);
                BufferedReader bufferedReader = new BufferedReader(reader);
                String line = bufferedReader.readLine();
                // readLine() returns null at end of file, so the line count need not be known
                while (line != null) {
                    // process the current line here
                    line = bufferedReader.readLine();
                }
                bufferedReader.close();
                reader.close();
            }
            catch(IOException e) {
                String msg = new String("Error Reading " + f + " data file");
                throw new OpenFileException(msg);
            }
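
    To keep the students once they are read, a list that grows as needed avoids having to know the count in advance. The following is a sketch under the assumption that each line is an integer ID, whitespace, then a name; the names StudentReader and Student are invented for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (hypothetical names): reads "id name" lines into a growable list.
final class StudentReader {

    static final class Student {
        final int id;
        final String name;

        Student(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Reads every non-blank "id name" line until the end of the input. */
    static List<Student> read(Reader source) {
        List<Student> students = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // tolerate blank lines
                }
                // Split into at most two parts so names containing spaces stay whole
                String[] parts = line.split("\\s+", 2);
                if (parts.length < 2) {
                    continue; // skip lines that have an ID but no name
                }
                students.add(new Student(Integer.parseInt(parts[0]), parts[1]));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return students;
    }
}
```

    For a real file, the reader could be a FileReader on the data file; students.size() then gives the number of lines that were actually present.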

  • How to read long line from text file

    Hi,
    I just ran into a problem when reading a big text file.
    BufferedReader br = new BufferedReader(new FileReader("D:\\afile.txt"));
    String str;
    int i = 0;
    while ((str = br.readLine()) != null) {
        i++;
        //do some work here...
    }
    This code throws an exception:
    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    So I thought a line in the file must be very long. When I opened it in FAR, it displays no more than 4096 characters per line.
    How can I read a text file whose lines are very long?

    try,
    $ java -X
        -Xmixed           mixed mode execution (default)
        -Xint             interpreted mode execution only
        -Xbootclasspath:<directories and zip/jar files separated by ;>
                          set search path for bootstrap classes and resources
        -Xbootclasspath/a:<directories and zip/jar files separated by ;>
                          append to end of bootstrap class path
        -Xbootclasspath/p:<directories and zip/jar files separated by ;>
                          prepend in front of bootstrap class path
        -Xnoclassgc       disable class garbage collection
        -Xincgc           enable incremental garbage collection
        -Xloggc:<file>    log GC status to a file with time stamps
        -Xbatch           disable background compilation
        -Xms<size>        set initial Java heap size
        -Xmx<size>        set maximum Java heap size
        -Xss<size>        set java thread stack size
        -Xprof            output cpu profiling data
        -Xrunhprof[:help]|[:<option>=<value>, ...]
                          perform JVMPI heap, cpu, or monitor profiling
        -Xdebug           enable remote debugging
        -Xfuture          enable strictest checks, anticipating future default
        -Xrs              reduce use of OS signals by Java/VM (see documentation)
        -Xcheck:jni       perform additional checks for JNI functions
    The -X options are non-standard and subject to change without notice.
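
    If raising the heap size is not an option, an alternative is to avoid readLine() altogether and read the text in fixed-size chunks, so that memory use no longer depends on line length. The sketch below counts line breaks that way; the class name ChunkReader and the chunk size are arbitrary choices for the example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Illustrative sketch (hypothetical names): processes text in fixed-size chunks.
final class ChunkReader {

    /** Counts '\n' characters while holding at most chunkSize characters in memory. */
    static long countNewlines(Reader source, int chunkSize) {
        char[] buf = new char[chunkSize];
        long newlines = 0;
        try (Reader in = source) {
            int n;
            // read() returns how many chars were placed in buf, or -1 at end of input
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buf[i] == '\n') {
                        newlines++;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return newlines;
    }
}
```

    For a file, the reader could be an InputStreamReader over a FileInputStream with an explicit charset. Any real per-line work would have to cope with a line being split across two chunks, which is the price of the bounded buffer.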

  • Reading UTF-8 Encoding xml file sqlserver

    Hi,
    I am receiving an XML file from a third-party vendor. It is encoded in UTF-8. While reading it, I get the error below.
    Msg 9420, Level 16, State 1, Line 3
    XML parsing: line 30117390, character 33, illegal xml character
    The characters causing the problem are ones like è, Ö, è.
    My database default collation is ‘SQL_Latin1_General_CP1_CI_AS’.
    I am using the query below to read the XML file.
    declare @xml xml
    SELECT
    @xml= CAST(x AS XML)
    FROM
    OPENROWSET(BULK 'D:\sample.xml',SINGLE_BLOB) AS T(x)
    select
    X.product.value('(ID/text())[1]', 'varchar(50)') as ID ,
    X.product.value('(Name/text())[1]', 'varchar(50)') as Name
    from
    @xml.nodes('Students/Student') AS X(product)
    How can I read the file successfully? Any help is appreciated.
    Thanks in advance.

    This issue normally happens when the XML file is not in the correct format. To save in the correct format open the xml file and click save as. Choose the encoding option as "UTF-8".
    Regards, RSingh

  • How to get UTF-8 encoding when create XML using DBMS_XMLGEN and UTL_FILE ?

    Hi,
    I generate XML files using DBMS_XMLGEN and write them out with UTL_FILE,
    but it seems the XML data file I end up with is not really UTF-8 encoded
    (for example, it does not validate correctly in XMLSpy).
    My DBMS is
    NLS_CHARACTERSET          = WE8MSWIN1252
    NLS_NCHAR_CHARACTERSET     = AL16UTF16
    NLS_RDBMS_VERSION     = 10.2.0.1.0
    I generate it this way:
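    declare
    xmldoc CLOB;
    ctx number ;
    bHandle utl_file.file_type;   -- file handle (the variable name was missing in the post)
    vblob_len integer;            -- length of the generated CLOB
    vBuffer varchar2(32767);      -- buffer handed to UTL_FILE
    -- vPATH and vFileName (directory and file name) must be declared as well; their values are not given in the post
    begin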
    -- generate fom xml-view :
    ctx := DBMS_XMLGEN.newContext('select xml from xml_View');
    DBMS_XMLGEN.setRowSetTag(ctx, null);
    DBMS_XMLGEN.setRowTag(ctx, null );
    DBMS_XMLGEN.SETCONVERTSPECIALCHARS(ctx,TRUE);
    -- create xml-file:
    xmldoc := DBMS_XMLGEN.getXML(ctx);
    -- put data to host-file:
    vblob_len := DBMS_LOB.getlength(xmldoc);
    DBMS_LOB.READ (xmldoc, vblob_len, 1, vBuffer);
    bHandle := utl_file.fopen(vPATH,vFileName,'W',32767);
    UTL_FILE.put_line(bHandle, vbuffer, FALSE);
    UTL_FILE.fclose(bHandle);
    end ;
    Maybe UTL_FILE changes the encoding when it writes the file?
    How can this be solved?
    Thank you
    Norbert
    Edited by: astramare on Feb 11, 2009 12:39 PM with database charsets

    Marco,
    I tried working with dbms_xslprocessor.clob2file,
    and that works well,
    but what about the UTF-8 encoding in this case?
    In my understanding, the XMLType created should be UTF8 (16),
    but when I open the XML file in XMLSpy as UTF-8,
    it is not right (German characters like Ä, Ö ..):
    my dbms is
    NLS_CHARACTERSET = WE8MSWIN1252
    NLS_NCHAR_CHARACTERSET = AL16UTF16
    NLS_RDBMS_VERSION = 10.2.0.1.0
    -- test:
    create table nh_test ( s0 number, s1 varchar2(20) ) ;
    insert into nh_test (select 1,'hallo' from dual );
    insert into nh_test (select 2,'straße' from dual );
    insert into nh_test (select 3,'mäckie' from dual );
    insert into nh_test (select 4,'euro_€' from dual );
    commit;
    select * from nh_test ;
    S0     S1
    1     hallo
    1     hallo
    2     straße
    3     mäckie
    4     euro_€
    declare
    rc sys_refcursor;
    begin
    open rc FOR SELECT * FROM ( SELECT s0,s1 from nh_test );
    dbms_xslprocessor.clob2file( xmltype( rc ).getclobval( ) , 'XML_EXPORT_DIR','my_xml_file.xml');
    end;
    (It is the same when writing the output with DBMS_XMLDOM.WRITETOFILE.)
    Opened in XMLSpy, it is:
    <?xml version="1.0"?>
    <ROWSET>
    <ROW>
    <S0>1</S0>
    <S1>hallo</S1>
    </ROW>
    <ROW>
    <S0>2</S0>
    <S1>straޥ</S1>
    </ROW>
    <ROW>
    <S0>3</S0>
    <S1>m㢫ie</S1>
    </ROW>
    <ROW>
    <S0>4</S0>
    <S1>euro_€</S1>
    </ROW>
    </ROWSET>
    regards
    Norbert
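
    The garbled output is consistent with bytes written in a single-byte character set being decoded as UTF-8, although the thread does not confirm that this is the cause. The effect can be reproduced in a small Java sketch. ISO-8859-1 is used here as a stand-in for WE8MSWIN1252, which is an assumption: the two agree on the bytes for ß and ä but not for €. The class name CharsetMismatch is made up for the example.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch (hypothetical names): what happens when single-byte text is read as UTF-8.
final class CharsetMismatch {

    /** Encodes text with a single-byte charset, then wrongly decodes the bytes as UTF-8. */
    static String misread(String text) {
        byte[] singleByte = text.getBytes(StandardCharsets.ISO_8859_1);
        // Bytes >= 0x80 are not valid stand-alone UTF-8, so the decoder substitutes U+FFFD
        return new String(singleByte, StandardCharsets.UTF_8);
    }

    /** Encodes and decodes with the same charset, which round-trips cleanly. */
    static String readCorrectly(String text) {
        byte[] singleByte = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(singleByte, StandardCharsets.ISO_8859_1);
    }
}
```

    Plain ASCII text such as "hallo" survives either way, which matches the rows above: only the values with non-ASCII characters come out damaged.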

  • Read data from a text file, one line at a time.

    I need to read data from a text file, and display each line in a String Indicator on Front Panel. It displays each line, but I get Error 4, End Of Line, unless I enter an extra line of data in the file that I don't need. I tried Read From Text File.vi, made by Nat Instr, and it gave the same error.

    The Read from Text File.vi reads data from a text file line by line until the user stops the VI manually with the Stop button on the front panel, or until an error (such as "Error 4, End of file") occurs. If an error occurs, the Simple Error Handler.vi pops up a dialog that tells you which error occurred.
    The Read from Text File.vi uses a while loop, but if you knew how many lines you wanted to read, you could replace the while loop with a for loop set to read that many lines from the file.
    If you need something more dynamic because the number of lines in your files varies, then you could change the code of the Read from Text File.vi to expect "Error 4, End of file" and handle it appropriately. This would require unbundling the error cluster that comes from the Read File function with the Unbundle By Name function, so that you can expose the individual error "status" and error "code" values stored in the cluster. If the value of the error "code" is 4, then you can change the error "status" from true to false, and you can rebundle the cluster with the Bundle by Name function. Setting the error "status" to false instructs the Simple Error Handler to ignore the error. Otherwise, pass the original error cluster to the Simple Error Handler.vi, so that you can see what the error is.
    Of course, if you're not interested in what the errors are, you could just remove the Simple Error Handler.vi, but then you wouldn't see any error messages.
    Best of Luck,
    Dieter
    Dieter Schweiss
    Applications Engineer
    National Instruments

  • Need to read data from a text file

    I need to read data from a text file and create my own hash table out of it. I'm not allowed to use the built in Java class, so how would I go implementing my own reading method and hash table class?

    It's not possible to read from a file without using classes from the core API*. You'll have to get clarification from your instructor as to which classes are and are not allowed.
    [http://java.sun.com/docs/books/tutorial/essential/io/]
    *Unless you write a bunch of JNI code to replicate what the java.io classes are doing.

  • How to read the complete path in file upload UI

    Hi,
    I want to know how to read the complete path in the file upload UI in Java Web Dynpro.
    I have created a file upload UI element. When I browse and select a file, say small.jpg, from my local PC, its path shows up in the file upload UI as, for example, E:\small.jpg.
    I want to know how to get this path in Java Web Dynpro code.
    Please let me know.

    Hi Satyam,
    In Web Dynpro Java, the file is first stored on the server and is then read from there.
    Create an upload button and put this code in its OnAction handler.
    Resource is the name of a context attribute of type com.sap.ide.webdynpro.uielementdefinitions.Resource; it is bound to the resource property of the Upload UI element.
    Then, in the button's OnAction:
    InputStream text = null;
           int temp=0;
           try{
                File file = new File(wdContext.currentContextElement().getResource().getResourceName().toString());
               String path = file.getAbsolutePath();
                wdComponentAPI.getMessageManager().reportSuccess(path);
           }catch(Exception e){
                e.printStackTrace();
           }
        //@@end
    Regards,
    Pradeep
    Edited by: pradeep_546 on May 11, 2011 12:22 PM

  • How to get the content of text file to write in JTextArea?

    Hello,
    I have a text area and a file chooser.
    I want the content of the chosen file to be written into the text area.
    I have this code:
    import java.awt.Container;
    import java.awt.FlowLayout;
    import java.awt.event.ActionEvent;
    import java.awt.event.ActionListener;
    import java.io.File;
    import javax.swing.JButton;
    import javax.swing.JFileChooser;
    import javax.swing.JFrame;
    import javax.swing.JLabel;
    import javax.swing.*;
    public class Test_Stemmer extends JFrame {
        public Test_Stemmer() {
            super("Arabic Stemmer..");
            setSize(350, 470);
            setDefaultCloseOperation(EXIT_ON_CLOSE);
            setResizable(false);
            Container c = getContentPane();
            c.setLayout(new FlowLayout());
            JButton openButton = new JButton("Open");
            JButton saveButton = new JButton("Save");
            JButton dirButton = new JButton("Pick Dir");
            JTextArea ta=new JTextArea("File will be written here", 10, 25);
            JTextArea ta2=new JTextArea("Stemmed File will be written here", 10, 25);
            final JLabel statusbar =
                          new JLabel("Output of your selection will go here");
            // Create a file chooser that opens up as an Open dialog
            openButton.addActionListener(new ActionListener() {
                public void actionPerformed(ActionEvent ae) {
                    JFileChooser chooser = new JFileChooser();
                    chooser.setMultiSelectionEnabled(true);
                    int option = chooser.showOpenDialog(Test_Stemmer.this);
                    if (option == JFileChooser.APPROVE_OPTION) {
                        File[] sf = chooser.getSelectedFiles();
                        String filelist = "nothing";
                        if (sf.length > 0) filelist = sf[0].getName();
                        for (int i = 1; i < sf.length; i++) {
                            filelist += ", " + sf[i].getName();
                        }
                        statusbar.setText("You chose " + filelist);
                    }
                    else {
                        statusbar.setText("You canceled.");
                    }
                }
            });
            // Create a file chooser that opens up as a Save dialog
            saveButton.addActionListener(new ActionListener() {
                public void actionPerformed(ActionEvent ae) {
                    JFileChooser chooser = new JFileChooser();
                    int option = chooser.showSaveDialog(Test_Stemmer.this);
                    if (option == JFileChooser.APPROVE_OPTION) {
                        statusbar.setText("You saved " + ((chooser.getSelectedFile()!=null)?
                            chooser.getSelectedFile().getName():"nothing"));
                    }
                    else {
                        statusbar.setText("You canceled.");
                    }
                }
            });
            // Create a file chooser that allows you to pick a directory
            // rather than a file
            dirButton.addActionListener(new ActionListener() {
                public void actionPerformed(ActionEvent ae) {
                    JFileChooser chooser = new JFileChooser();
                    chooser.setFileSelectionMode(JFileChooser.DIRECTORIES_ONLY);
                    int option = chooser.showOpenDialog(Test_Stemmer.this);
                    if (option == JFileChooser.APPROVE_OPTION) {
                        statusbar.setText("You opened " + ((chooser.getSelectedFile()!=null)?
                            chooser.getSelectedFile().getName():"nothing"));
                    }
                    else {
                        statusbar.setText("You canceled.");
                    }
                }
            });
            c.add(openButton);
            c.add(saveButton);
            c.add(dirButton);
            c.add(statusbar);
            c.add(ta);
            c.add(ta2);
        }
        public static void main(String args[]) {
            Test_Stemmer sfc = new Test_Stemmer();
            sfc.setVisible(true);
        }
    }
    Could you please help me and tell me what to add or modify?
    Thank you.

    realahmed8 wrote:
    thanks masijade,
    I have filtered the file chooser for text files only,
    but I still don't know how to use FileReader to put the text file content into the text area (ta).
    Please tell me how and where to use it.
    How? -- See the IO Tutorials on Sun for the FileReader (and I assume you know how to call setText and append on the JTextArea).
    Where? -- In the actionPerformed method, of course (better would be a separate thread that is triggered through the actionPerformed method, but that is probably beyond you at the moment).
    Give it a try.
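
    For readers who want a starting point, here is one way the pieces could fit together: read the whole file into a String and hand it to setText. It is a sketch that assumes the file is UTF-8 and small enough to hold in memory, and the class name TextAreaLoader is invented for the example.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.swing.JTextArea;

// Illustrative sketch (hypothetical names): loads a text file into a JTextArea.
final class TextAreaLoader {

    /** Reads the whole file as UTF-8 text (the charset is an assumption). */
    static String readAll(File file) {
        try {
            return new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Replaces the text area's content with the file's content and scrolls to the top. */
    static void load(JTextArea target, File file) {
        target.setText(readAll(file));
        target.setCaretPosition(0);
    }
}
```

    Inside the Open button's actionPerformed, after the APPROVE_OPTION check, the call would look like TextAreaLoader.load(ta, sf[0]), with ta declared final so the anonymous listener can reach it.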

  • Where are these unix executable files coming from and how do I recover the original text file?


    When you upgraded to Lion, did you have AppleWorks installed on your Mac?
    Most AW documents can be opened by Pages 09 or Numbers 09 with most of the original formatting intact. (I do not know if previous versions will work.) Just open the AW file with both and see which one works best.
    TextEdit will also open most AW files, but it will take a lot of work to restore them to their original format.
    If you have AW database documents, they are not supported.
    These documents show up with "exec" icons, Kind: Unix Executable File.
    They will also show up as .cwk files if they are small. I have a couple under 1 MB that are shown as "Kind: AppleWorks Document" but will not open.
    The only option for opening an AW database is to have AW installed on a Mac with a pre-Lion OS to recover the file.

  • How can u insert and retrieve text files in any format using forms6i.

    How can you insert and retrieve text files in any format using Forms 6i?
    Can you give me an example of an insert statement? Let's assume the file is located on the A: drive.
    As for retrieving the files, I would give the user a list of all the files that are in the database and the user would select one, but what command (or piece of code) would open the file in its appropriate editor?
    E.g. a .pdf file would open in Acrobat.
    Any help would be appreciated.
    Thanks
    Hussein Saiger

    the filereference class is for downloading and uploading files.
    if you want to load xml, use the xml class.
    and, if you want to write to an xml file and don't want to use server-side code, wait.

  • Read a large size text file

    How can I read a large text file in multiple parts without losing any data?
    Ben

    Why are you afraid of losing data? There's no reason that you would lose data if you are reading a large text file.
    You should use the various Reader and Writer classes in package java.io to read and write text files. Here's an example of how you can read a text file line by line:
    BufferedReader r = new BufferedReader(new FileReader("myfile.txt"));
    int lineno = 1;
    String line;
    // readLine() returns null once the end of the file is reached
    while ((line = r.readLine()) != null) {
      System.out.printf("%d: %s%n", lineno, line);
      lineno++;
    }
    r.close();
