Extracting contents of HTML - pattern matching

Hi all,
I've been reading through past posts but can't quite find the solution to my problem. I'm trying to adapt the code below to deal with poorly formed html (i.e. < title>, < TitLe >, etc). The html is written to a text file and I want to extract the content of the <title> tags.
Any help would be appreciated.
David.
private void showTitleTag(String file) { // extract <title> tag content from HtmlFile.txt
        try {
          BufferedReader fileInput = new BufferedReader(new FileReader(file));
          String fileLine;
          String keepLine="";
          String title;
          int titlePos;
          int endTitlePos;
          int endOfTitlePos;
          boolean foundTag=false;
          while((fileLine = fileInput.readLine()) != null) {  // While not at end of text file
            titlePos = fileLine.indexOf("<title>");         // check for <title> tag (Change for tokenizer)
            endTitlePos = fileLine.indexOf("</title>");     // check for </title> tag
            if (titlePos>=0) {                              // if found
               foundTag=true;                               // set foundTag flag to true
            }
            if (foundTag) {
              keepLine+=fileLine;                             // append fileLine to keepLine
            }
            if (endTitlePos>=0) {                             // stop reading once </title> is seen
               break;
            }
          }
          fileInput.close();
          titlePos = keepLine.indexOf("<title>");
          endOfTitlePos = keepLine.indexOf("</title>");         // get location of </title>
          title = keepLine.substring(titlePos+7,endOfTitlePos); // dispose of </title> tag
          System.out.println(title);                            // print title content only
        }
        catch(Exception e) {
          e.printStackTrace();
        }
} // showTitleTag

Hi all,
I've just found out that the code below can corrupt a text file. I'm not sure why, but it seems to have something to do with the FileChannel, ByteBuffer, CharBuffer section.
I thought I'd let you know, in case others experienced the same problem.
Here's the code I was using:
If anyone knows how to fix the problem, I'd be very interested.
Cheers,
David.
   //Shows title tag content of a given url
   private void showTitleTag(String file) {
    FileInputStream fis = null;
    try {
      fis = new FileInputStream(file);
      // The following three lines seem to cause the text file to corrupt intermittently
      FileChannel fc = fis.getChannel();
      ByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
      CharBuffer cb = Charset.forName("8859_1").newDecoder().decode(bb);
      Matcher m = Pattern.compile("<title\\s*>(.*?)</title\\s*>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL).matcher(cb);
      if (m.find()) {
         String title1 = (m.group(1)).trim().replace('\n', ' ');
         String title = title1.replace('\r', ' ');
         // use the version with both '\n' and '\r' replaced
         System.out.println(title);
         titlesList.addItem(title);
         titlesList.setSelectedIndex(0);
      }
      else {
         System.out.println("No title found.");
      }
    }
    catch (Exception e) {
      e.printStackTrace();
    }
    finally {
      try { if (fis != null) fis.close(); }
      catch (Exception e) {}
    }
   } // end showTitleTag
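
For the poorly formed tags mentioned in the first post (< title>, < TitLe >), one possible approach is a more tolerant regex, sketched below. This is only a sketch under assumptions: the names TitleExtractor, extractTitle and readFile are made up for illustration, and the file is read fully into memory instead of being memory-mapped. A mapped buffer stays valid until it is garbage-collected, which on some platforms may keep the file locked; whether that is related to the problem described above is a guess, not a diagnosis.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch; class and method names are hypothetical.
final class TitleExtractor {

    // Tolerates whitespace after '<', after the tag name and around '/',
    // plus optional attributes, e.g. "< title>", "< TitLe >", "</ title >".
    private static final Pattern TITLE = Pattern.compile(
            "<\\s*title\\b[^>]*>(.*?)<\\s*/\\s*title\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed title with runs of whitespace (including line
    // breaks) collapsed to single spaces, or null if no title is found.
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        if (!m.find()) {
            return null;
        }
        return m.group(1).trim().replaceAll("\\s+", " ");
    }

    // Reads the whole file as ISO-8859-1 without using a mapped buffer.
    static String readFile(String file) throws IOException {
        return new String(Files.readAllBytes(Paths.get(file)), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        // Uses the file given on the command line, or a small inline sample.
        String html = args.length > 0
                ? readFile(args[0])
                : "<html>< TitLe >Hello\n world</title ></html>";
        String title = extractTitle(html);
        System.out.println(title != null ? title : "No title found.");
    }
}
```

A regex like this is still a heuristic; for seriously broken markup a real HTML parser is usually the safer choice.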

Similar Messages

  • Programmatically extract x,y position found by Vision Builder pattern match

    I would like to write the coordinates found by doing a pattern match to a file.  Can this be done with the Vision Builder?

    In the following, we pass the pattern extracted in the previous step to the search function block (parameters such as rotation angle and minimum score are set in the SETTINGS control), then extract the output of the search function to get the position values, which are displayed as indicators on the front panel.
    Atom
    Certified LabVIEW Associate Developer

  • Extract URL from HTML text

    Suppose you have the following String that is body text with HTML.
    String bodyText = " My name is Blake. I live in New York City. See my image here: <img href=\"http://www.blake.com/blake.jpg\"/> isn't my picture awesome? Tata for now!";
    I want to extract the URL that contains the location of the image in this bodyText. Ideally I would create a function, public String extractor(String bodyText), to be used like this:
    String imageURL = extractor(bodyText);
    //imageURL should be "http://www.blake.com/blake.jpg"
    My first thought was to use a regular expression, but the only place I could find to use one was the replace methods of the String class. I am by no means an expert on regular expressions, so I haven't spent much time trying to figure it out that way. I could obviously do a linear search of bodyText with a ton of if statements, but that's just poor coding. I'd like to know if anyone has come across this or has insight into the problem.
    Thanks for all the help,
    Blake

    > How would the regexp change if there were multiple img tags within the String.
    I don't rightly know, I'm just a raw beginner in regexes.
    > Would this regexp return all the img URLs found in the String.
    No, as it stands it would return only the last URL. But this will:
    String bodyText = " My name is Blake. " +
          "I live in New York City. See my image here: " +
          "<img href=\"http://www.blake.com/blake.jpg\"/>" +
          " isn't my picture awesome? Here's another: " +
          "<img href='http://www.blake.com/Vandelay.jpg'/>" +
          " Tata for now!";
    String regex = "(?<=<img\\shref=[\"'])http://.*?(?=[\"']/?>)";
    Pattern pattern = Pattern.compile (regex);
    Matcher matcher = pattern.matcher (bodyText);
    while (matcher.find ()) {
       System.out.println (matcher.group ());
    }
    Note the enhancement that takes into account that both single and double quotes are legal in HTML. But unlike the earlier example, this does not tolerate more than one space between <img and href=, I couldn't find a way to achieve that.
    Visit this thread later, there are some real regex experts around who may post a better solution.
    db
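
One way to tolerate more than one space between <img and href= is to drop the lookbehind and use a capturing group instead, since lookbehind in java.util.regex generally needs a pattern of bounded length while \s+ is unbounded. The following is a minimal sketch, not a definitive solution; the names ImgUrlExtractor and extractAll are hypothetical, and the href attribute is kept only because the thread uses it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch; class and method names are hypothetical.
final class ImgUrlExtractor {

    // \s+ accepts any amount of whitespace between "<img" and "href".
    // Group 1 captures the opening quote; the back-reference \1 requires
    // the same quote character to close the value. Group 2 is the URL.
    private static final Pattern IMG_HREF = Pattern.compile(
            "<img\\s+href\\s*=\\s*([\"'])(http://.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    // Returns every URL found, in order of appearance.
    static List<String> extractAll(String bodyText) {
        List<String> urls = new ArrayList<>();
        Matcher m = IMG_HREF.matcher(bodyText);
        while (m.find()) {
            urls.add(m.group(2));
        }
        return urls;
    }

    public static void main(String[] args) {
        String bodyText = "See my image here: "
                + "<img   href=\"http://www.blake.com/blake.jpg\"/>"
                + " Here's another: "
                + "<img href='http://www.blake.com/Vandelay.jpg'/>";
        for (String url : extractAll(bodyText)) {
            System.out.println(url);
        }
    }
}
```

The URL is read from the capturing group with matcher.group(2) rather than from the whole match, which is what makes the lookaround unnecessary.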

  • Problem while using color pattern matching

    We are currently working on a real-time object tracking project and are unsure whether color pattern matching works irrespective of the object size. Our questions are as follows:
    1. Is it applicable to objects that move far away? As the object moves away, its size in the image decreases and color pattern matching stops working. What would be the solution, given that we must use a color image?
    2. What is the difference between using scale invariant and rotation invariant matching in color pattern matching?
    3. How can we effectively shrink the ROI depending on the object position, as shown in the attached screenshot?
    We have removed the bounding box X and Y coordinate values at the four corners, but we can't track the object as it moves far away, and we can't shrink the ROI as it moves away.
    4. Is it possible to see the value of a particular pixel in the LabVIEW Vision Development Module? We only see the coordinate position. Please guide us.
    Please see the attached screenshot and suggest how to effectively decrease or increase the ROI depending on the object's position using color pattern matching.
    Attachments:
    problem in matching while object moves far.png ‏515 KB

    Hello,
    I have not been using the color pattern matching a lot (especially not in real-time). But since the pattern matching considers only small scale changes, you could try updating the color template every n-th iteration (depending on your setup and requirements). The major problem is the template size, since the color pattern matching tends to take quite a lot of time in learning the template. You would of course need to come up with some idea on how to change the subimage size, where the new template will be learned.
    This is the part of coarse (rough) object detection as was suggested by MoviJOHN. For example, if your object is distinctly red, you can extract the green channel from your rgb image and use threshold to roughly find the object and apply the new ROI - template.
    So:
    1. learn the template,
    2. use pattern matching with bounding rectangle (ROI) for the next couple of frames (you would need to experiment here where the detection fails -> how fast can you move the object away so that the detection fails),
    3. Before the detection fails -> rough object detection with some padded bounding rectangle (new ROI),
    4. Re-learn the template in the new ROI and go back to 2.
    Again, the biggest issue is the template learning time - if you have a high resolution camera and the template is large, this won't satisfy your real-time application.
    You should set up the appropriate illumination first. The resolution is also important, since your object is moved back and forth (but the resolution will have a direct impact on the template learning time).
    Best regards,
    K
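
The rough detection step with a padded bounding rectangle (step 3 above) could look roughly like the following. This is a plain Java illustration of the idea only, not LabVIEW or NI Vision code; the name RoughRoi, the thresholds and the padding value are all assumptions, and it thresholds on "high red, low green and blue" rather than on a single extracted channel.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

// Illustrative sketch; names and threshold values are hypothetical.
final class RoughRoi {

    // Finds the bounding box of pixels with red >= minRed and both green
    // and blue <= maxOther, grows it by pad pixels on every side, and
    // clips it to the image. Returns null if no pixel passes the threshold.
    static Rectangle paddedRedRoi(BufferedImage img, int minRed, int maxOther, int pad) {
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE;
        int maxX = -1, maxY = -1;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                if (r >= minRed && g <= maxOther && b <= maxOther) {
                    minX = Math.min(minX, x);
                    minY = Math.min(minY, y);
                    maxX = Math.max(maxX, x);
                    maxY = Math.max(maxY, y);
                }
            }
        }
        if (maxX < 0) {
            return null; // nothing red enough was found
        }
        int x0 = Math.max(0, minX - pad);
        int y0 = Math.max(0, minY - pad);
        int x1 = Math.min(img.getWidth() - 1, maxX + pad);
        int y1 = Math.min(img.getHeight() - 1, maxY + pad);
        return new Rectangle(x0, y0, x1 - x0 + 1, y1 - y0 + 1);
    }
}
```

The returned rectangle would then serve as the new, smaller region in which the template is re-learned, so its size follows the object as it moves away.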
    https://decibel.ni.com/content/blogs/kl3m3n
    "Kudos: Users may give one another Kudos on the forums for posts that they found particularly helpful or insightful."

  • Vision assistant steps to be followed for pattern matching

    I am acquiring color images of hands movement using web camera of laptop.
    I want to process the acquired images to use for pattern matching.
    What are the steps to be followed to achieve the above mentioned task.

    In the following, we pass the pattern extracted in the previous step to the search function block (parameters such as rotation angle and minimum score are set in the SETTINGS control), then extract the output of the search function to get the position values, which are displayed as indicators on the front panel.
    Atom
    Certified LabVIEW Associate Developer

  • Pattern matching in String

    Hi,
    I want to do pattern matching using String. Here is my requirement.
    String file_name = (String)hash.get("DOCNAME");
    file_name = file_name.replace("'","285745@");
    So, wherever I have ' (apostrophe) I will replace it with the pattern "285745@", and then in another JSP where I get this request parameter I do the reverse as follows:
    String docname = (String)request.getParameter("doc_name").replace("285745@","'");
    Now I know the replace function is not going to do this. It is just indicative, so that you know what I want to achieve. Which other Java function / method can I use to get the desired result?
    thanks,
    pp

    String file_name = (String)hash.get("DOCNAME");
    file_name = file_name.replace("'","285745@");
    The problem here is that, in Java 1.4, String.replace() operates only on char arguments; you cannot replace entire substrings with it. (From Java 5 onward there is also a replace(CharSequence, CharSequence) overload that replaces literal substrings.)
    The String.replaceAll() method, on the other hand, operates on regular expressions. In many common cases (those in which the substring you want to find contains no characters with special meaning to the regular expression processor) you can use it exactly as you would String.replace() except that it operates on substrings.
    But regular expressions are much more powerful than that. The javadoc for the "Pattern" class has some information on how to use them. There is also a tutorial at http://java.sun.com/docs/books/tutorial/extra/regex/intro.html which you might find helpful.
    In the 1.4 edition of Java there is no longer any need to screw around with while loops and StringBuffers. Nearly any text processing operation can be done with regular expressions.
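
The round trip described in the question can be sketched as follows, assuming Java 5 or later, where String.replace also accepts CharSequence arguments. The class and method names (ApostropheCodec, encode, decode, encodeWithRegex) are hypothetical; the marker string is the one from the question. The regex variant is included only to show the replaceAll route with the quoting helpers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch; class and method names are hypothetical.
final class ApostropheCodec {

    private static final String MARKER = "285745@"; // marker taken from the question

    // Literal substring replacement (no regex involved).
    static String encode(String s) {
        return s.replace("'", MARKER);
    }

    static String decode(String s) {
        return s.replace(MARKER, "'");
    }

    // Same result through replaceAll: Pattern.quote and
    // Matcher.quoteReplacement keep both arguments literal.
    static String encodeWithRegex(String s) {
        return s.replaceAll(Pattern.quote("'"), Matcher.quoteReplacement(MARKER));
    }

    public static void main(String[] args) {
        String encoded = encode("O'Brien's report.doc");
        System.out.println(encoded);
        System.out.println(decode(encoded));
    }
}
```

Note that a marker like this is only safe if it can never occur in a real document name.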

  • How do I extract content

    I need to find a way in which to extract the content from a
    web site that is located in an edible region which I've named
    "Body." It doesn't matter if the content is extracted as an HTML
    file, a text file, or another form. I just need to be able to
    easily get at that specific content in order to create a Word
    document using the same content. Is there a simple way to do so
    without having to manually extract that information from each html
    page? FYI, I'm using Dreamweaver 7.
    Thanks.

    Originally posted by Newsgroup User:
    > > I need to find a way in which to extract the content from a web site that is
    > > located in an edible region which I've named "Body."
    > Yummy! Edible HTML!
    Too funny!
    > Are you trying to get the content from all your pages and put it all into
    > one word document? If so, that's not something DW does. You'd likely need
    > some sort of web spider software to handle that.
    > FYI, Word can open HTML files whole. So that may be an option too.
    Yes, that's exactly what I'm trying to do - any suggestions
    for a freeware software application?
    Thanks.

  • Color pattern matching is very slow

    Hi
    I tried this code by creating a VI application.
    After testing with a USB webcam, I realized that the color pattern matching is very slow. How can I increase the speed so that it works smoothly in real time?
    Thank you

    Hello tiho,
    the color pattern matching is not as fast as 8-bit matching, but should still be fast.
    For example, I am attaching a VI for color pattern matching where you load the image, create the template and do the matching.
    In my example I tried color pattern matching on color image of size 4288x2848 pixels and the matching is performed in ~140 ms (~7Hz). So, for a smaller image, I think the real-time processing is quite achievable (I consider real-time 20 Hz or more). The only problem is the template learning, which in my case takes around 10 seconds. But you should learn the template only once in the initialization stage.
    Best regards,
    K
    Attachments:
    color matching.zip ‏49 KB

  • Saving Pattern matching information to an array

    I am using the imaqLearnPattern function to create a template image. According to the IMAQ documentation, the pattern matching information is "appended" to the image. But when I call the imaqImageToArray function on this template image, it does not give me any pattern matching information. Is there any way that I can save the template data into an array? I do not want to use the imaqWriteVisionFile function because I want the data in an array format and not saved in a file.
    Thanks.

    Normally you cannot save this information without that function. The information is saved in the *.png file: the PNG format allows "user data" to be stored, and IMAQ Vision makes use of this. Theoretically you could extract this information from the *.png into an array (PNG is a well-known format), but what could you do with it afterwards? You cannot load it separately without IMAQ Read Image and Vision Info.vi, because (I'm pretty sure) that function allocates memory for the vision info before loading, which you are not able to do yourself. You could do it only if you knew the internal representation of an IMAQ image in memory. How the common parameters (width, height, pixel pointer, resolution, line width) are organized is not very complicated (an IMAQ image is a cluster of a string and a pointer to the appropriate structure), but where the vision info is placed is not easy to find out.
    The better, faster and easier way is to use IMAQ Write Image and Vision Info.vi.
    with best regards

  • Pattern matching for LCD function test

    Hello,
    I am writing a VI that takes a single picture from a camera and then compares it with a template extracted beforehand.
    The camera is looking at an LCD display, and the purpose of the matching is to find out whether there is a dead pixel or something else wrong with the display. The problem is that the comparison doesn't work as expected.
    I will illustrate this with just one example, with pictures attached:
    - At step 1, the template is created (you can see RUN 2 at the top left corner of the display).
    - At step 2, pattern matching is performed with the same LCD mode but with the display tilted a little (a score of 972 is achieved).
    - At step 3, the mode is changed (you can see RUN 3 at the top left corner of the display; everything else is the same), yet a higher score of 974 is achieved.
    Is there any explanation for this, and are there any suggestions that could improve my program?
    Thanks a lot
    Iliya G. 
    Attachments:
    1step.JPG ‏43 KB
    2step.JPG ‏42 KB
    3step.JPG ‏38 KB

    Hi David,
    I went through the OCR examples that you mentioned, but I find them very annoying because in many cases the output string depends on how you draw your ROI. Please find an example below; could you tell me if I am doing something wrong? I don't think I am.
    When I try to read the characters separately, everything goes OK: it finds r, u, n and 1.
    But when I make the ROI bigger in order to contain all the characters, the output is the one in the picture. Any ideas?
    Thanks,
    Iliya 
    Attachments:
    1.JPG ‏21 KB

  • Pattern matching help

    Hi!
    I'm new to pattern matching and need a little code to get me started. I currently have it where the user enters a name and it compares that name to a string pulled from a file using String.indexOf(). If it does not match, it pulls the next name string from the file and tries again until it finds a match or reaches the end of the file.
    For example, if I enter "hn smith" it will match to "john smith" in the string pulled from the file.
    This works ok, but it would be a lot nicer if the user could enter in
    jo.n smith or jo* smith
    where it can match parts of the phrase instead of the exact phrase. I have done a slight bit of this in perl, but I am having a harder time in java.
    I would appreciate any help. Thank you!

    > that's good when you know it, but when you are
    > starting out it is nice to have an example to work
    > with, one that is explained well, unlike many of the
    > sites I have visited.
    http://java.sun.com/docs/books/tutorial/extra/regex/index.html
    The very first result from Google is a tutorial that explains regular expressions.
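
As a starting point for the "jo.n smith" / "jo* smith" idea in the question, the user input could be translated into a regex and searched with find(), which behaves like indexOf() in that it matches anywhere in the string. This is only a sketch under an assumed convention: '*' stands for any run of characters, '.' for exactly one character, and everything else is literal. The names NameMatcher, toPattern and matches are hypothetical.

```java
import java.util.regex.Pattern;

// Illustrative sketch; class and method names are hypothetical.
final class NameMatcher {

    // Translates the user's input into a regex:
    //   '*' -> ".*"  (any run of characters)
    //   '.' -> "."   (exactly one character)
    // Every other character is quoted so it is matched literally.
    static Pattern toPattern(String userInput) {
        StringBuilder regex = new StringBuilder();
        for (char c : userInput.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '.') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    // True if the pattern occurs anywhere in the candidate string.
    static boolean matches(String userInput, String candidate) {
        return toPattern(userInput).matcher(candidate).find();
    }

    public static void main(String[] args) {
        System.out.println(matches("jo.n smith", "john smith"));
        System.out.println(matches("jo* smith", "john smith"));
        System.out.println(matches("hn smith", "john smith"));
    }
}
```

Quoting the literal characters keeps input such as "(" or "+" from being interpreted as regex syntax.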

  • How to use Colour pattern matching with a webcam

    Hi,
    I have a web cam which I am able to use successfully in labview (i.e.. Get images)
    I have looked at the colour pattern matching examples and tried to modify them, so that I can detect a red spot, that can be seen through the webcam, but have been unsuccessful. 
    In essence I'm trying to do real time colour pattern matching
    Can anyone steer me in the right direction? Or help me out?
    Thanks 

    Hi kr121,
    I'm trying to work on color myself right now.
    What have you tried so far?  What type of web camera are you using?  I'm using a Microsoft Life Camera with LV 2011 on Windows 7.
    I started here:  http://zone.ni.com/devzone/cda/epd/p/id/5030
    If you are not using an NI camera I was able to get this to work using the cmd prompt and extracting the files manually to at least run the NI-IMAQ for USB: Snap and Save Image with USB Camera and NI-IMAQ for USB: Grab and Save Image with USB Camera examples.
    The command prompt command is:
     ni_imaq_usb_installer_86.exe /x
    Don't know if this is 100% correct but it at least allowed me to capture images and avi's.
    Regards,
    -SS

  • Pattern matching in Strings

    Hi,
    I need some help using pattern matching in strings .. this is what i need to do ..
    if given a string of this format
    String tempNotes="07/07/05 3:42 PM - 65. Java forum test 07/01/05 5:11 PM - 62. Trying regualt Expressions";
    I need to extract the number(s) after the time. In the above case that would be 65 and 62.
    The string might have more than one line of the above format. Can someone help me with this?
    I tried using regular expressions. I am pretty new to regexes and tried this:
    String regex="\\d(2)/\\d(2)/\\d(2)\\s\\d+:\\d(2)\\sP|AM\\s-";
    Pattern p= Pattern.compile(regex);
    Matcher m1 = p.matcher(tempNotes);
    if(m1.find()){
    System.out.println("Num = "+tempNotes.substring(m1.end()+1,m1.end()+3));
    }
    I am totally lost. Can someone help me with this please? I need to extract all the numbers after the time.
    Thanks in advance.

    I see two major problems with that regex. First, you're using parentheses where you should be using braces - "\\d{2}", not "\\d(2)". Second, you need to limit the scope of the alternation: "(?:P|A)M", or better, use a character class instead: "[PA]M". As it is, the vbar is splitting the whole regex into two alternatives. Also, you can use a capturing group to extract the number.
      String regex="\\d{2}/\\d{2}/\\d{2}\\s\\d+:\\d{2}\\s[AP]M\\s-\\s+(\\d+)";
      Pattern p= Pattern.compile(regex);
      Matcher m1 = p.matcher(tempNotes);
      while (m1.find()) {
        System.out.println("Num = " + m1.group(1));
      }

  • RegEx, need help with pattern matching

    im going thru a list of Strings...and id like to match some input to it..but the tutorial for regex wont let me find a smaller string within a bigger one if it exists
    for example i have a String "java.sun.com" and i want to find "sun" or "java" or "com" or "jav" or ".co"
    i think the only way regex will work is if i group the entire thing into
    any ideas on how i can manipulate the string into a proper regex pattern so that itll find any of those "searches"
    thanks

    No, that is not correct. A regex can be constructed to return a match on anything you want. A single character, a newline character, a numeric character and any combination of them. There are limitless possibilities for pattern matching.
    See here:
    http://java.sun.com/j2se/1.4.1/docs/api/java/util/regex/Pattern.html
    Any of the patterns may be compiled into a regex for searching using the matcher.
    An alternative is to use the indexOf method of the string class to find what you are looking for.
    Example:
    String myString = "java.sun.com";
    String matchThis = "n.co";
    int patternFoundAtThisIndexPosition = myString.indexOf(matchThis);
    patternFoundAtThisIndexPosition will be 7;
    or simply:
    int index = myString.indexOf("sun");
    index will be 5;
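
If the problem in the question comes from using matches(), which only succeeds when the whole string fits the pattern, then find() is probably what is wanted: it looks for the pattern anywhere in the input. A minimal sketch follows; the names SubstringSearch and indexOfLiteral are hypothetical, and Pattern.quote is used so that characters like '.' in ".co" are treated literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch; class and method names are hypothetical.
final class SubstringSearch {

    // Regex counterpart of String.indexOf: returns the start index of the
    // first literal occurrence of needle in text, or -1 if there is none.
    static int indexOfLiteral(String text, String needle) {
        Matcher m = Pattern.compile(Pattern.quote(needle)).matcher(text);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        String text = "java.sun.com";
        // matches() needs the entire input to match, so this is false.
        System.out.println(Pattern.matches("sun", text));
        // find() searches inside the input.
        System.out.println(indexOfLiteral(text, "sun"));
        System.out.println(indexOfLiteral(text, ".co"));
    }
}
```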

  • How to accelerate the processing time of pattern match

    How to accelerate the processing time  of pattern match
      My camera acquires an image in 50 ms, but analyzing this image with pattern matching takes about 300 ms. Apart from using a queue to separate acquisition
    and analysis into different loops, is there another way to directly accelerate the processing time of the pattern match? Maybe CompactRIO with FPGA?
    But according to a paper I saw on the NI website, this function is not available on cRIO?
    Any suggestion will be much appreciated.
    Ricky
    Attachments:
    aa.png ‏65 KB

    Hello,
    what matching algorithm are you using? You can reduce time using the Gaussian pyramids algorithm. You can fine tune the parameters using the advanced options.
    What is your template size/image resolution?
    I suppose using a computer with better performance would also work...
    Best regards,
    K
