HTML Parsers and reading all files in a directory

Hello all,
I was wondering whether there is an HTML parser included in Java. I am writing a program that searches all the files in a directory tree for a particular string. I was able to read one file in a single directory but not in any other directory. How would I go about writing code that can access all the files in a directory that has multiple subfolders?
Also, when I started to write my program I tried to use the DOM XML parser to parse my HTML page. My reasoning was that HTML code looks like an XML document. But when I tried to run my program I found that I first had to convert my HTML document into XML format. I really don't want to have to build my own HTML parser. Is there an HTML parser included in Java? Oh, my program is just a basic text program. No interfaces.
Thanks for any help that you can provide
Hockeyfan

a particular string. I was able to read one file in
a single directory but not in any other directory.
How would I go about writing code to be able to
access all files in directory that has multiple sub
folders.
You can do that several ways.
Most likely you'll end up with some sort of recursive iteration over the directory tree.
Not hard to write, somewhat harder to prevent memory problems if you end up with a lot of data.
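As a rough sketch of the recursive approach described above, here is one way to walk a directory tree and report the files that contain a given string. The class and method names (TreeSearch, findFilesContaining, containsLine) are made up for illustration, and it assumes Java 8 or later. Each file is read one line at a time, so memory use stays small even when the tree holds a lot of data.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TreeSearch {

    /** Returns every regular file under root (at any depth) with a line containing needle. */
    static List<Path> findFilesContaining(Path root, String needle) throws IOException {
        // Files.walk visits root and everything below it lazily; the stream must be closed.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> containsLine(p, needle))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    /** Reads one line at a time so a large file is never held in memory whole. */
    static boolean containsLine(Path file, String needle) {
        // ISO-8859-1 maps every byte to a character, so odd encodings cannot abort the read.
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.contains(needle)) {
                    return true;
                }
            }
        } catch (IOException e) {
            // Assumption for this sketch: unreadable files are simply skipped.
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        String needle = args.length > 1 ? args[1] : "<title>";
        for (Path hit : findFilesContaining(root, needle)) {
            System.out.println(hit);
        }
    }
}
```

Note that Files.walk can fail partway through if it hits a directory it is not allowed to read; a hand-written recursion over File.listFiles() gives finer control over that case.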
Also, when I started to write my program I tried to
use the DOM XML Parser to parse my html page. My
logic behind this decision was that if you look at
html code it is an xml document. But as I was
trying to run my program I noticed that I had to
convert my html document into xml format. I really
don't want to have to build my own html parser. Is
there a html parser that is included in Java. Oh my
program is just your basic text program. No
interfaces.
A VALID xhtml document is a valid XML document.
Problem is that most HTML isn't xhtml.
Another problem is that most HTML is fundamentally broken even to its own standards (which have always been based on SGML on which XML is also based).
Browsers take that into account by being extremely lax on standards compliance and effectively making up tags as they go along to fill in the missing ones in the trees they parse.
That's however not a standardised process and each browser handles it differently (and as a result most html will render differently in different browsers).
Java contains a simple HTML parser in Swing (javax.swing.text.html), but it's primitive and only handles an old dialect of HTML (the HTMLEditorKit documentation cites HTML 3.2).
There are almost certainly third-party libraries out there that can do better, both free and commercial.
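For what it's worth, here is a minimal sketch of using the Swing parser from a plain text program, with no GUI involved. It relies on ParserDelegator and HTMLEditorKit.ParserCallback from the JDK; the class name HtmlText and the method extractText are invented for this example. It only collects the text between tags, which is usually enough for a string search.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class HtmlText {

    /** Collects the text content of an HTML document, one chunk per line, ignoring tags. */
    static String extractText(Reader html) throws IOException {
        StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called once for each run of text found between tags.
                text.append(data).append('\n');
            }
        };
        // The third argument tells the parser to ignore any charset declared in the
        // document, because the Reader has already decoded the bytes.
        new ParserDelegator().parse(html, callback, true);
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><body><p>Hello <b>world</b></p></body></html>";
        System.out.print(extractText(new StringReader(page)));
    }
}
```

Overriding handleStartTag or handleSimpleTag in the same callback gives access to the tags and their attributes. How the parser recovers from badly broken markup is its own business, so treat the results on messy pages with some suspicion.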

Similar Messages

  • How to open and read many files from a directory and store contents in 2D array?

    I want to make a VI that opens and reads the data from various files contained in a directory (200 files each with 2 columns) and store these in a single 2D array. For file number 1 I want to store the data from both columns in the 2D array, but for files 2 to 200 I only want to store the second column of each file. Can someone please help?

    Hi Nadav,
    Thanks for your help. I have followed your instructions but I cannot get it to work. I used LIST DIRECTORY to list the files in the directory, and that works. However, how do I read each of the 200 files using READ FROM SPREADSHEET FILE without having to manually select each of them? In other words, once LIST DIRECTORY has given me all 200 files in an array, how do I open each one and store its data in a 2D array? Here is what I have done (file called read_files.VI). Could you please help me? Thank you very much in advance.
    Attachments:
    read_files.vi 18 KB

  • Read all files in a directory

    I need to compare the creation date of multiple files that are created while the application runs, so that if the date and time of a file equals the current date and time, an LED lights up, and if not, the file is deleted.
    I've made a VI, but I'm getting an error and I don't know how to fix it.
    What I need is for the VI to read all the files in the directory and keep reading new files as they are created there. The weird thing about this error is that it only shows up when I run it without highlight execution.
    Attachments:
    Date & Time comparison.vi 10 KB

    Several things come to mind.
    0. What error do you get? The error code or description can be very useful in identifying a problem.
    1. When a VI runs OK with execution highlighting and fails without, that usually indicates a timing problem.
    2. The inner loop runs as fast as possible because there is nothing in there to slow it down. Put a Wait (ms) in there to slow things down a bit.
    3. The loops will never stop.  Do not create infinite loops by wiring False constants to the Stop if True terminal. The inner loop should probably be a for loop which will automatically stop after checking all files. The outer loop should have something to stop it - a front panel button, error, something.
    4. The timestamp has approximately 19 digits of precision in the fraction of a second portion. The resolution is limited by the clock in the computer. The implication is that you will almost never get exact equality.  You need to define "how close" is close enough. Perhaps within 1 second? Then modify the comparison function to determine whether the timestamps are within that limit.
    Lynn
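Point 4 above is not specific to LabVIEW. As an illustration only, here is the same "close enough" comparison sketched in Java; the class name TimestampCompare, the method withinTolerance, and the one-second tolerance are assumptions made for the example, not anything taken from the attached VI.

```java
import java.time.Duration;
import java.time.Instant;

class TimestampCompare {

    /** True when a and b differ by no more than tolerance, in either order. */
    static boolean withinTolerance(Instant a, Instant b, Duration tolerance) {
        return Duration.between(a, b).abs().compareTo(tolerance) <= 0;
    }

    public static void main(String[] args) {
        Instant fileTime = Instant.parse("2024-01-01T12:00:00.250Z");
        Instant now = Instant.parse("2024-01-01T12:00:00.900Z");

        // Exact equality fails even though the two stamps are 650 ms apart.
        System.out.println(fileTime.equals(now)); // false

        // Comparing against a tolerance gives the intended answer.
        System.out.println(withinTolerance(fileTime, now, Duration.ofSeconds(1))); // true
    }
}
```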

  • Trying to modify SaveDocsAsSVG to open and save all files within a directory automatically

    I am trying to do as the title suggests, but I can't find a way to specify the folder I want the script to look at, and I can't get it to open, save, and close the files without user interaction. Put another way, I need this to be iterative, as the PC I'm using cannot handle too many open files at once.
    On a side note, I can't even find the documentation. For example, what arguments does Folder.selectDialog() accept? Where is the documentation for that?
    Thanks, peeps!
    Using CS6 trying to convert EPS to SVG

    Here is what I have so far. I think I'm close; I just need to figure out how to open the documents in the app and close them after converting them so that only one document is open at a time. My goal is to run this program overnight, since there are thousands of files to convert. Thank you all for your help!
    app.userInteractionLevel = UserInteractionLevel.DONTDISPLAYALERTS;
    try {
        if (app.documents.length > 0) {
            // Get the folder to save the files into
            var inFolder = Folder(Folder.desktop + '/temp');
            var files = inFolder.getFiles(/\.eps$/i);
            var destFolder = null;
            destFolder = '~/Desktop/temp';
            if (destFolder != null) {
                var options, i, sourceDoc, targetFile;
                // Get the SVG options to be used.
                options = this.getOptions();
                // You can tune these by changing the code in the getOptions() function.
                for (i = 0; i < files.length; i++) {
                    // THIS IS WHAT I NEED HELP WITH
                    var docRef = open(files[i]);
                    sourceDoc = app.documents[i]; // returns the document object
                    // Get the file to save the document as svg into
                    targetFile = this.getTargetFile(sourceDoc.name, '.svg', destFolder);
                    // Save as SVG
                    sourceDoc.exportFile(targetFile, ExportType.SVG, options);
                    // Note: the doc.exportFile function for SVG is actually a Save As
                    // operation rather than an Export, that is, the document's name
                    // in Illustrator will change to the result of this call.
                }
                alert('Documents saved as SVG');
            }
        } else {
            throw new Error('There are no document open!');
        }
    } catch (e) {
        alert(e.message, "Script Alert", true);
    }

    /** Returns the options to be used for the generated files.
        @return ExportOptionsSVG object
    */
    function getOptions() {
        // Create the required options object
        var options = new ExportOptionsSVG();
        // See ExportOptionsSVG in the JavaScript Reference for available options
        // Set the options you want below:
        // For example, uncomment to set the compatibility of the generated svg to SVG Tiny 1.1
        // options.DTD = SVGDTDVersion.SVGTINY1_1;
        // For example, uncomment to embed raster images
        // options.embedRasterImages = true;
        return options;
    }

    /** Returns the file to save or export the document into.
        @param docName the name of the document
        @param ext the file extension to be applied
        @param destFolder the output folder
        @return File object
    */
    function getTargetFile(docName, ext, destFolder) {
        var newName = "";
        // if name has no dot (and hence no extension),
        // just append the extension
        if (docName.indexOf('.') < 0) {
            newName = docName + ext;
        } else {
            var dot = docName.lastIndexOf('.');
            newName += docName.substring(0, dot);
            newName += ext;
        }
        // Create the file object to save to
        var myFile = new File(destFolder + '/' + newName);
        // Preflight access rights
        if (myFile.open("w")) {
            myFile.close();
        } else {
            throw new Error('Access is denied');
        }
        return myFile;
    }

  • Read all files in child directories and their children

    I am trying to write a program that reads all the files in a directory and its subdirectories, like the ScanDisk program in the Windows operating system. I guess I can do this via recursion using methods such as isDirectory() and File.list(). Please help me implement this in code.
    Regards
    yogesh

    you guys should collaborate,
    http://forum.java.sun.com/thread.jsp?forum=4&thread=153110&start=0&range=30#443719

  • How to read all files under a folder directory in FTP site

    Hi Experts,
    I use this call to read data from a file on an FTP site: utl_file.fopen('ORALOAD', file_name, 'r');
    But this needs a fixed file name in the directory, whereas the client generates output files with automatically generated file names.
    So is there any way to read all the files with utl_file.fopen('ORALOAD', file_name, 'r')?
    We need to read all the files because the client, citing security concerns, does not want to overwrite the output file,
    so we must find a way to read every file in the output directory.
    Thanks for help!!!
    Jim

    If you use Chris Poole's XUTL_FTP package, I believe it contains functions that allow you to query the directory contents.
    http://www.chrispoole.co.uk/apps/xutlftp.htm
    Edited by: BluShadow on Jan 13, 2009 1:54 PM
    misread the original post

  • Reading all files on directory using "utl_file" package...

    I need to read all files in a directory via PL/SQL. I don't know
    the file names (they are created dynamically by an automation system);
    I only know their extensions.
    Can I do this using the "utl_file" package, or do I need to write
    a program in another language (C or C++, for example)?
    Any ideas...
    Thanks.

    Hi,
    you can't do that with the UTL_FILE package (it can't retrieve
    file names).
    A very simple solution would be to create, at OS level, a
    file that contains the names of the files in the directory and
    then read this file using UTL_FILE. With all the file names in
    hand, you can loop over them and open and read each file, again
    using UTL_FILE.
    A more mundane solution could be to use the features on the iFS.
    Cheers
    Gerald

  • FTP Adapter to read multiple files from a directory. Not through polling.

    Dear Friends,
    I would like to know whether it is possible to configure the FTP adapter in Oracle BPEL 10.1.3.4 to read multiple files (different names, same structure) from a given directory. I do not want BPEL to poll. Instead, when I submit the BPEL process it should read all files from the directory.
    I was looking at the option of Synchronous read but I am not able to specify wild card in the file name field. I do not know the file names at the time of reading.
    Thanks for your help!

    Hi,
    While you read the file, you can configure an adapter property in 'Receive'. This will store the filename, this filename can be used for sync read as the input parameter.
    1. Create a message type variable called 'fileheader'. This should be of type Inboundheader_msg (whatever relevant Receive activity).
    2. This variable will contain three parts - filename, FTPhost, FTPPort
    3. Copy this fileheader to 'Syncheader'.
    4. syncheader can be passed as an adapter property during the sync read of the file.
    During Receive and Invoke, you need to navigate to 'Adapter' tab to choose the created message type variable.
    Let me know if you have further questions.
    regards,
    Rev

  • Read All files in folder and sub folder

    Hi,
    I can read files in a folder, but not in its subfolders. Can anyone give me an idea of how to read all files in the folder and its subfolders?
    Deepan

    Hi
    This code may help you
    Bliz
    import java.io.File;

    /**
     * Counts the files and directories in <path> and all its subdirectories.
     */
    public class CountFiles {
        public static String path = "c:";
        public static int numF;
        public static int numD;

        public static void main(String[] args) {
            countFiles(path);
            System.out.println("Number of files :\t" + numF);
            System.out.println("Number of dirs :\t" + numD);
        }

        public static void countFiles(String strPath) {
            File src = new File(strPath);
            if (src.isDirectory()) {
                numD++;
                // list() returns null when the directory cannot be read
                String[] list = src.list();
                if (list == null) {
                    return;
                }
                for (String name : list) {
                    // Recurse into each entry; new File(parent, child) adds the separator
                    countFiles(new File(src, name).getPath());
                }
            } else {
                numF++;
            }
        }
    }

  • How can I open the directory/folder and read all the files inside it in order and then close it?

    How can I open the directory/folder and read all the files inside it in order and then close it? Any example would be nice.
    thanks

    In the File I/O>>Advanced File Functions palette there is a function named "List Directory". This function gives you two arrays: one contains the names of all subdirectories, the other the names of all files. If you want to sort them by name, use the array sort function. If you want to sort them by another attribute, use the File/Directory Info function to get more data. Build an array of clusters, each containing the attribute to sort by and the original index of the name, then sort that array.
    Waldemar
    Using 7.1.1, 8.5.1, 8.6.1, 2009 on XP and RT
    Don't forget to give Kudos to good answers and/or questions
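For readers coming from the Java thread at the top, the same "list, then sort by name or by another attribute" idea can be sketched as below. This is only an analogue of the LabVIEW approach, not a translation of any VI; the names SortedListing, listFilesSorted, BY_NAME, and BY_MODIFIED are invented for the example.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;

class SortedListing {

    /** Orders files alphabetically by file name. */
    static final Comparator<File> BY_NAME = Comparator.comparing(File::getName);

    /** Orders files from oldest to newest by last-modified time. */
    static final Comparator<File> BY_MODIFIED = Comparator.comparingLong(File::lastModified);

    /** Returns the regular files directly inside dir, ordered by the given comparator. */
    static File[] listFilesSorted(File dir, Comparator<File> order) {
        File[] files = dir.listFiles(File::isFile);
        if (files == null) {
            // Not a directory, or it could not be read.
            return new File[0];
        }
        Arrays.sort(files, order);
        return files;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (File f : listFilesSorted(dir, BY_NAME)) {
            System.out.println(f.getName());
        }
    }
}
```

Each file can then be opened and read in that order; nothing needs to be "closed" for the directory itself, only for the individual files.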

  • File Adapter and reading all XML files from directory

    Problem occurs on PI 7.1
    I defined sender file adapter. File name mask is: "*.xml" to read all XML messages from directory.
    Quality of service is: Exactly Once.
    Poll Interval: 30
    Retry interval: 30
    Processing mode: Archive with option "Add Timestamp".
    Processing sequence: by name.
    I thought that with the above configuration my File Adapter would keep reading the folder for all incoming XML files. But somehow it reads the XML files only when I activate it in Integration Builder.
    Any idea what can cause such strange problem?

    Hi Tomasz,
    As per my understanding, you need to activate the file adapter for reading the XML files on your directory. Right?
    If that is the case, then the issue might be with the Cache.
    1. Clear the cache from the Integration Builder.
    2. Check in SXI_CACHE whether there are any issues. Click on Delta Cache refresh to find out if there are any cache related issues.
    Thanks,

  • I want to read a PDF file from an SAP directory and create a spool request or print

    Hi all,
    I want to read a PDF file from an SAP directory and create a spool request or print the PDF through SAP. Can anybody help me with this?
    Also, please let me know whether it is possible to open a PDF from an SAP directory in Adobe PDF Reader.
    Thanks in advance,
    Sunny

    Hi Sunny,
    Check these links.
    http://www.sapdevelopment.co.uk/reporting/rep_spooltopdf.htm
    http://www.erpgenie.com/sap/abap/pdf_creation.htm
    http://www.geocities.com/mpioud/Z_EMAIL_ABAP_REPORT.html
    http://www.thespot4sap.com/Articles/SAP_Mail_SO_Object_Send.asp
    http://www.sapdevelopment.co.uk/reporting/email/attach_xls.htm
    Hope this resolves your query.
    Reward all the helpful answers.
    Regards

  • I am getting messages that I can't download and read .pdf files since I have the wrong Adobe reader. I know about their security disasters of course, but I downloaded the latest version of Adobe Reader from the Adobe web site and I have other ,pdf file re

    I am getting messages that I can't download and read .pdf files since I have the wrong Adobe reader. I know about their security disasters of course, but I downloaded the latest version of Adobe Reader from the Adobe web site and I have other .pdf file readers as well, and for some reason they won't work either.

    I have 5 computers running top end processors and RAM. By this I mean I have one, this one which I am using, that has an AMD Phenom Black 3.2 Quad-core with 8 GBs of Corsair top DDR2 RAM; my other two AMD have either an Athlon II triple core with 4 GBs of DDR2 Corsair RAM, one with the Phenom X4 965 3.4 GHz Quad-core with 8 GBs of their best DDR2 RAM; and two Intels with the i7 920 Processors using the triple channel 1366 socket processors, one with 8 GBs of low latency DDR3 RAM and the other with 4 GBs of the same RAM.

    I am getting the message on this one, which has a fresh install of the XP Pro X64 operating system, as do the other 4 as well. I have run Avast Business Pro Anti-virus on this one, with a single result which I deleted, and also both Spybot Search and Destroy, which came back clean, as well as Malwarebytes Antimalware, which found a lot of tracking cookies, now removed, and SuperAntiSpyware, which also found a few cookies, also now deleted.

    Can you tell me what I need to do to get these files to show as .pdf files rather than as a clean blank page? One other issue is that I wish to know how to change my downloads so they are saved and Mozilla gives me the option of returning to them instead of me losing them altogether as it does now. If there is another Adobe reader I should download and install, could you provide me with the link to it? Thanks, I appreciate your assistance here.
    == When I download and try to read a .pdf file and when I am asked to turn off all Firefox files and if I do, I lose them since I need to know how to save them without rebooting my computer.

    Brilliant! Problem solved! Thanks so much.

  • How to open and read binary files?

    How do I open and read Binary files?

    Did you  look on The Unarchiver's web site where it has a link to older versions? http://theunarchiver.googlecode.com/files/TheUnarchiver3.2_legacy.zip
    The best thing to do is ask your friends what programs they used to produce these files, or at least what format files they are producing.  Otherwise it's like being shown a car and given a bundle of 200 keys with no idea to which one to use, or even if any of them work with that car.
    Using The Unarchiver will likely not do anything, because it too will not know what file format is involved, and the files may not even be in an archive format.  If they sent you a Word file without telling you (a favorite of Windows users to do; it drives me crazy when they could have just sent plain text), The Unarchiver won't open it.  If it's a picture file then using Hexedit will just show you a bunch of unintelligible stuff as shown in an earlier post, though you may see a line of text providing a hint.
    As I said earlier, often .bin may be an executable program which needs another program to actually interpret it.  That's what Java is trying to do.  Still, it may think it can execute the file, but it is highly unlikely somebody would send you an executable program (and if they did I would not trust it).  For all you know it may be a Windows virus.

  • I upgraded to Firefox 8.0.1 on my MacBook OSX 10.6.8 and now all files I download from my Outlook Web app are ashx files, not what they were originally sent in. How do I change this?

    I have just upgraded to Firefox 8.0.1 on my MacBook (10.6.8) and now all files I download from my Outlook Web app are labeled attachment.ashx, not as they were originally sent (.docx, .pdf, etc). How do I change this back?

    I'm going to back up to Firefox 7 next. I'm getting the drift that 8.0.1 and/or flash on 8.0.1 are unstable.
    I'll let you know if that solves this problem.
    NOPE. This had no effect.
