How to read SGML files using Java

I've got a text categorisation test collection called Reuters-21578 for my Information Retrieval project. It is distributed in 22 files. Each of the first 21 files (reut2-000.sgm through reut2-020.sgm) contains 1000 documents, while the last (reut2-021.sgm) contains 578 documents. The files are in SGML format. Each of the 22 files begins with a document type declaration line:
<!DOCTYPE lewis SYSTEM "lewis.dtd"> The DTD file lewis.dtd is included in the distribution. Following the document type declaration line are individual Reuters articles marked up with SGML tags.
My questions is how to write a java program to read those 21578 documents or transform them into 21578 seperated text files.

I guess I missed something. What is Renes link?. The
parser stuff isn't really what I'm looking for. I'm
a new at and just learning java and I just want to
know the easiest way to read a SGML file. Should I
use a buffered Reader with a Pushback Input Stream?Hang on.....you want to just read the file without intelligently extracting the SGML data contained within and so have no need of a parser?
Well, in that case, its just text.....so just use BufferedReader or whatever to read the text data. If I understand you correctly, all you really wanted to ask was "how do I read a text file?"

Similar Messages

  • How to read pdf files using java.io package classes

    Dear All,
    I have a certain requirement that i should read and write PDF files at runtime. With normal java file IO reading is not working. Can any one suggest me how to proceed probably with sample code block
    Thanks in advance.

    hi I also have the pbm. to read pdf file using JAVA
    can any body help meWhy is it so difficult to read the thread you posted in? They say: java.io is pointless, use iText. So why don't you?
    or also I want to read a binary encoded data into
    ascii,
    can anybody give me a hint how to do it.Depends on what you mean with "binary encoding". ASCII's binary encoding, too, basically.

  • How to read word files using java

    Reding text files is prity simple. But when i tried to read msword file I could do it.
    Can any one discuss how to do it
    Thanks

    Sorry this is not a reply but in fact i need the solution for that as i am in an urgency of that can you post that to to me if u have got it, I need it for my project

  • How To Read RTF file in JAVA?  Using  iText?

    How To Read RTF file in JAVA?  Using  iText?.....
    import java.io.*;
    import com.lowagie.text.*;
    import com.lowagie.text.rtf.*;
    public class RTF3 {
    public static void main(String[] args) {
    // System.out.println("This example generate a RTF file name Sample.rtf");
    // Create Document object
    Document myDoc = new Document();
    try {
    // Create writer to listen document object
    // and directs RTF Stream to the file Sample.rtf
    RtfWriter2.getInstance(myDoc, new FileOutputStream("Sample.rtf"));
    // open the document object
    myDoc.open();
    // Create a paragraph
         Paragraph p = new Paragraph();
         p.add("Helloworld in Rtf file..amazing isn't");
         // Add the paragraph to document object
    myDoc.add(p);
    catch(Exception e) {
    System.out.println(e);
    //close the document
    myDoc.close();
    Exception in thread "main" java.lang.NoSuchMethodError: com.lowagie.text.Rectangle.width()F
         at com.lowagie.text.rtf.document.RtfPageSetting.rectEquals(RtfPageSetting.java:433)
         at com.lowagie.text.rtf.document.RtfPageSetting.guessFormat(RtfPageSetting.java:362)
         at com.lowagie.text.rtf.document.RtfPageSetting.setPageSize(RtfPageSetting.java:341)
         at com.lowagie.text.rtf.RtfWriter2.setPageSize(RtfWriter2.java:248)
         at com.lowagie.text.Document.open(Unknown Source)
         at view.RTF3.main(RTF3.java:23)
    CAN you HELP me?

    import com.lowagie.text.Document;
    import com.lowagie.text.rtf.parser.RtfParser;
    import java.io.FileInputStream;
    String inputFile = "sample.rtf";
    Document document = new Document();
    document.open();
    RtfParser parser = new RtfParser(null);
    parser.convertRtfDocument(new FileInputStream(inputFile), document);

  • How to read HTML files using UTL_FILE

    Hello Friends,
    How to read HTML files using UTL_FILE package ? According
    to Oracle documentation UTL_FILE can read or write OS Text Files.
    Thanx in advance..
    Adi

    HI Hareesh,
    i have gone through that blog.
    i tried it...but i am getting mapping error  no receiver determination fond because there are so  many excel files.
    my data is available on sharedString.xml but also it is in not same order.
    i have no clue how to handle this part form the blog.
    "This way our mapping will receive all data from the sheet in an XML format. The only thing that's left is to create an XSD file from the XML file we received in order to be able to use it in the mapping and as our Service Interface and we can proceed with mapping. As you can see from the sheet.xml files all the data is placed with column name and row number so it's not that difficult to map it to an table type format using the Message Mapping only (no java, abap mapping required)."

  • How to print PDF files using java print API

    Hi,
    I was goign throw lot of discusion and reading lot of forums related to print pdf files using java api. but nothing seems to be working for me. Can any one tell me how to print pdf files using java api.
    Thanks in advance

    Mike,
    Can't seem to get hold of the example described in your reply below. If you could let us have the URL to get then it would be great.
    My GUI application creates a pdf document which I need to print. I want to achieve this using the standard Java class PrinterJob (no 3rd party APIs I'm afraid, commercial restraints etc ..). I had a stab at it using the following code. When executed I get the pretty printer dialog then when I click ok to print, nothing happens!
    boolean showPrintDialog=true;
    PrinterJob printJob = PrinterJob.getPrinterJob ();
    printJob.setJobName ("Contract.pdf");
    try {
    if (showPrintDialog) {
    if (printJob.printDialog()) {
    printJob.print();
    else
    printJob.print ();
    } catch (Exception PrintException) {
                   PrintException.printStackTrace();
    Thank you and a happy new year.
    Cheers,
    Chris

  • How to uncompress zip files using java program

    hai,
    please give some sample code to decompress the zip file.
    how to uncompress zip files using java program
    thanking you
    arivarasu

    http://developer.java.sun.com/developer/technicalArticles/Programming/PerfTuning/
    Scroll down to 'Compression'

  • How to read pdf file using file adapter

    Hi..
        How to read pdf file using file adapter?
    regards
    Arun

    Hi
    This may help you
    /people/sap.user72/blog/2005/07/27/xi-generate-pdf-file-out-of-file-adapter
    /people/alessandro.guarneri/blog/2007/02/21/sap-xi-acting-as-a-huge-file-mover
    ---Ram

  • How to read system eventlog using java program in windows?

    How to read system eventlog using java program in windows?
    is there any java class available to do this ? or any one having sample code for this?
    Your friend Zoe

    Hi,
    There is no java class for reading event log in windows, so we can do one thing we can use windows system 32 VBS script to read the system log .
    The output of this command can be read using java program....
    we can use java exec for executing this system32 vbs script.
    use the below program and pass the command "eventquery"
    plz refer cscript,wscript
    import java.io.*;
    public class CmdExec {
    public static void main(String argv[]) {
    try {
    String line;
    Process p = Runtime.getRuntime().exec("Command");
    BufferedReader input =
    new BufferedReader
    (new InputStreamReader(p.getInputStream()));
    while ((line = input.readLine()) != null) {
    System.out.println(line);
    input.close();
    catch (Exception err) {
    err.printStackTrace();
    This sample program will list all the system log information....
    Zoe

  • How to read system evenlog using java program in windows

    How to read system evenlog using java program in windows???
    is there any java class available to do this ? or any one having sample code for this?
    Your friend Zoe

    Welcome to the Sun forums.
    >
    How to read system evenlog using java program in windows???>
    JNI. (No.)
    >
    is there any java class available to do this ? or any one having sample code for this?>You will generally get better help around here if you read the documentation, try some sample code and come back with a specific question (hopefully with an SSCCE included).
    >
    Your friend Zoe>(raised eyebrow) Thank you for sharing that with us.
    Note also that one '?' denotes a question, while 2 or more generally denotes a dweeb.

  • How to run .jar on linux & how to create .jar file using java?

    hi, may i know how to run .jar on linux & how to create .jar file using java? Can u provide the steps on doing it.
    thanks in advance.

    Look at the manual page for jar:
    # man jar
    Also you can run them by doing:
    # java -jar Prog.jar

  • Reading PDF file Using java.

    I tried to read the pdf file using FileInputStream. but it gives the Juncked charectars.
    How can i read(means content) the pdf file using Java.

    I just found the "Multivalent" library, it is free and will do exactly what you want: http://www.cs.berkeley.edu/~phelps/Multivalent/
    Check out the source of the tools/ExtractText.java file
    Ed

  • How to read XML files from java

    i need a sugession that how to read a xml file using java code
    and i need to parse using some parsers and display attributes and entity seperately
    as a string.......

    import org.dom4j.Document;
    import org.dom4j.DocumentException;
    import org.dom4j.io.SAXReader;
    import java.io.File;
    import java.text.AttributedCharacterIterator.Attribute;
    import java.util.Iterator;
    import java.util.StringTokenizer;
    public class XmlParser
    private String Result="";
    private String Final="";
    private String Delim="";
    public void bar1(Document document) throws DocumentException
    org.dom4j.Element root = document.getRootElement();
    // System.out.println(root.getName());
    bar2(root);
    System.out.println(this.Result);
    process();
    public void bar2(org.dom4j.Element e)
    for(Iterator i = e.elementIterator();i.hasNext();)
    org.dom4j.Element Element = (org.dom4j.Element) i.next();
    Result += Element.getName()+"\t"+Element.getText()+"\n";
    bar2(Element);
    public void process()
    StringTokenizer Tokenizer = new StringTokenizer(this.Result,"\n");
    String element;
    while(Tokenizer.hasMoreTokens())
    element = Tokenizer.nextToken();
    StringTokenizer Tokenizer2 = new StringTokenizer(element,"\t");
    // Do what ever String Process here Example
    this.Final += element.getName();
    this.Final += this.Delim;
    System.out.println(this.Final);
    public static void main(String s[])throws Exception
    Document document = null;
    SAXReader reader = new SAXReader();
    File f1= new File("D:/Rajesh/EDI to XML/EDI.xml");
    document = reader.read(f1);
    Demo obj = new Demo();
    obj.bar1(document);
    i think this will hep full.......

  • Read Text file using Java Script

    Hi,
    I am trying to read a text file using Java Script within the webroot of MII as .HTML file. I have provided the path as below but where I am not able to open the file. Any clue to provide the relative path or any changes required on the below path ?
    var FileOpener = new ActiveXObject("Scripting.FileSystemObject");
    var FilePointer = FileOpener.OpenTextFile("E:\\usr\\sap\\MID\\J00\\j2ee\\cluster\\apps\\sap.com\\xapps~xmii~ear\\servlet_jsp\\XMII\\root\\CM\\OCTAL\\TestTV\\Test.txt", 1, true);
    FileContents = FilePointer.ReadAll(); // we can use FilePointer.ReadAll() to read all the lines
    The Error Log shows as :
    Path not found
    Regards,
    Mohamed

    Hi Mohamed,
    I tried above code after importing JQuery Library through script Tag. It worked for me . Pls check.
    Note : You can place Jquery1.xx.xx.js file in the same folder where you saved this IRPT/HTML file.
    <HTML>
    <HEAD>
        <TITLE>Your Title Here</TITLE>
        <SCRIPT type="text/javascript" src="jquery-1.9.1.js"></SCRIPT>
        <script language="javascript">
    function Read()
    $.get( "http://ldcimfb.wdf.sap.corp:50100/XMII/CM/Regression_15.0/CrossTab.txt", function( data ) {
      $(".result").html(data);
      alert(data);
    // The file content is available in this variable "data"
    </script>
    </HEAD>
    <BODY onLoad="Read()">
    </BODY>
    </HTML>

  • How to read a file using servlet

    hi ,
    i've to read a file using servlet ,
    should read the file using servlet and display it in JSP,Could anybody get me how can i do it .
    Shiva

    To do that you need to get the response output stream and write yur file contents to that.
    response.setContentType(mimeType); //Set the mime type for the response
    ServletOutputStream sos = resp.getOutputStream();
    sos.write(bytes from your file input stream);
    sos.close();

Maybe you are looking for

  • Configuring Apache httpd.conf for Multiple Web Servers

    After successfully serving a single web site from my MacMini, I'm having trouble configuring httpd.conf to add an additional site. My httpd.conf file looks something like this: NameVirtualHost *:80 <VirtualHost *:80> ServerName www.domain1.net Docume

  • OBIEE/ADF Integration using the Action Framework

    I would like to integrate OBIEE and ADF to achieve the following. 1. Embed BI objects into an ADF application 2. Pass parameter from the ADF application to the BI objects 3. Pass context (parameters) from the BI object to the ADF components 4. Have t

  • OpenGL/Elite3D Performance in Solaris 10?

    Hi Folks, I've searched but can't seem to find anything on this. I have an Ultra-2 with 2x300MHz, 640MB of RAM, and an Elite3d-M6 framebuffer. Life is good, and Solaris 10 is great. But - I use a brain modeling application that converts MRI images of

  • CS5.5 will not export MPEG2 Transport stream with mpg extension?

    I am creating video for the local cable access channel. I have been using the MPEG2 NTSC DV High Quality preset. That preset outputs MPEG layer 2 audio. The cable access coordinator says that those files will not play audio from the Soloist media ser

  • L2 or l3 switch with NAC appliance

    Hi, I am planning for deploying NAC appliance in OOBVG mode. For the access layer, L2 switches are selected (2960). If I change the L2 access switches with L3 (3560 or 3750) would this add more manageability to the access layer by NAC? Regards, Mlade