Help with regex pattern matching.

Hi everyone
I am trying to write a regex that will extract each of the links from a piece of HTML code. The sample piece of HTML is as follows:
<td class="content" valign="top">
     <!-- BODY CONTENT -->
<script language="JavaScript"
src="http://chat.livechatinc.net/licence/1023687/script.cgi?lang=en&groups=0"></script>
<a href="makeReservation.html">Making a reservation</a><br/>
<a href="changeAccount.html">Changing my account</a><br/>
<a href="viewBooking.html">Viewing my bookings</a><br/>
I am interested in extracting each link and the corresponding text for that link into groups.
So far I have the following regex:
<td class="content" valign="top">.*?<a href="(.*?)">(.*?)</a><br>
However, this regex only matches the first line in the block of links, but I need to match each line in the block of links.
Any ideas? Any suggestions are appreciated as always.
Thanks.

Hi sabre,
Thanks for the reply.
I am already using a while loop with matcher.find(), but it still only returns the first link based on my regex.
The code is as follows:
private static final Pattern MENU_ITEM_PATTERN = compilePattern("<td class=\"content\" valign=\"top\">.*?<a href=\"(.*?)\">(.*?)</a><br>");

private LinkedHashMap<String, String> findHelpLinks(String body) {
    LinkedHashMap<String, String> helpLinks = new LinkedHashMap<String, String>();
    String link;
    String linkText;
    Matcher matcher = MENU_ITEM_PATTERN.matcher(body);
    // Collect every (href, text) pair the pattern finds.
    while (matcher.find()) {
        link = matcher.group(1);
        linkText = matcher.group(2);
        if (link != null && linkText != null) {
            helpLinks.put(link, linkText);
        }
    }
    return helpLinks;
}

private static Pattern compilePattern(String pattern) {
    return Pattern.compile(pattern, Pattern.DOTALL | Pattern.MULTILINE
        | Pattern.CASE_INSENSITIVE);
}
Any ideas?
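
A possible explanation, offered as a sketch rather than a definitive answer: the pattern begins with the literal <td class="content" valign="top"> prefix, and that prefix occurs only once in the sample, so after the first match find() has no second <td ...> to start from. The sample also ends each link line with <br/> while the pattern expects <br>. The sketch below assumes a pattern that matches one anchor at a time; the class name HelpLinks and the constant LINK are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HelpLinks {
    // Hypothetical pattern: one <a href="...">text</a> per match, with no <td> prefix,
    // so each call to find() can move on to the next link.
    private static final Pattern LINK = Pattern.compile(
            "<a\\s+href=\"(.*?)\">(.*?)</a>",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    static Map<String, String> findHelpLinks(String body) {
        Map<String, String> links = new LinkedHashMap<>();
        Matcher m = LINK.matcher(body);
        while (m.find()) {
            links.put(m.group(1), m.group(2)); // href -> link text
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<td class=\"content\" valign=\"top\">\n"
                + "<a href=\"makeReservation.html\">Making a reservation</a><br/>\n"
                + "<a href=\"changeAccount.html\">Changing my account</a><br/>\n"
                + "<a href=\"viewBooking.html\">Viewing my bookings</a><br/>";
        findHelpLinks(html).forEach((href, text) -> System.out.println(href + " -> " + text));
    }
}
```

If the links must come only from the content cell, one option is to cut that cell out of the page first and then run the per-link pattern on just that substring.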

Similar Messages

  • Problem with regex patter/matcher

    Hello, I used some code I found in the tutorial and the forums to accomplish some HTML pattern matching. I'm just now learning regex and I can't figure out how to find each occurrence of my pattern. Here's the code.
    import java.util.regex.*;

    public class Parser {
        public Parser() {
        }

        public static void main(String[] args) {
            String INPUT = "<tr><td>1st cell</td><td>2nd cell</td></tr>";
            String REGEX = "<td>.*</td>";
            Pattern p = Pattern.compile(REGEX);
            Matcher m = p.matcher(INPUT);
            int count = 0;
            // Report the position of every match.
            while (m.find()) {
                count++;
                System.out.println("Match number " + count);
                System.out.println("start(): " + m.start());
                System.out.println("end(): " + m.end());
            }
        }
    }
    Output is:
    Match number 1
    start(): 4
    end(): 38
    I'm looking for a match for each cell, not the outermost match... I hope that made sense.
    Thanks for the help!

    sabre, I guess your hint is nicer, huh? ;-)
    I prefer your regex because I like a closed form of termination condition rather than the open .*. I suspect that both will work OK for the OP.
    Funny, I was about to suggest the same thing you suggested when I hit your post. I prefer the one with the lazy search.
    Also, just for fun, I would suggest a more generic one:
    "<(.*?)>(.*?)</\\1>"
    and just because it is fun:
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    /**
     * HtmlParser.java
     * version 1.0
     * 14/07/2005
     * @author notivago
     */
    public class HtmlParser {
        public static void main(String[] args) {
            new HtmlParser().parse( "<tr><td>1st cell</td><td>2nd cell</td></tr>" );
        }

        // Group 1 is the tag name; \\1 requires the closing tag to repeat it.
        private Pattern pattern = Pattern.compile("<(.*?)>(.*?)</\\1>");

        // Prints each tag/content pair, then recurses into the content.
        public void parse( String input) {
            Matcher matcher = pattern.matcher(input);
            while( matcher.find() ) {
                show(matcher);
                parse( matcher.group(2) );
            }
        }

        /**
         * @param matcher
         */
        private void show(Matcher matcher) {
            System.out.println( "Tag: " + matcher.group(1) );
            System.out.println( "Content: " + matcher.group(2) );
            System.out.println( "----");
        }
    }
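
    To make the greedy versus lazy point concrete, here is a small sketch that runs three variants against the input from the question. The third pattern, with the negated class [^<]*, is only my reading of what "a closed form of termination condition" refers to, since that post is not quoted in full. The class name CellFinder and the helper cells are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CellFinder {
    /** Collects group(1) of every match of regex in input. */
    static List<String> cells(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "<tr><td>1st cell</td><td>2nd cell</td></tr>";
        // Greedy: .* runs to the last </td>, so there is a single match spanning both cells.
        System.out.println(cells("<td>(.*)</td>", input));
        // Lazy: .*? stops at the first </td> it can, giving one match per cell.
        System.out.println(cells("<td>(.*?)</td>", input));
        // Negated class (assumed "closed form"): cannot run past the next '<'.
        System.out.println(cells("<td>([^<]*)</td>", input));
    }
}
```

    The lazy and negated-class variants both return the two cells here; they differ once a cell contains nested markup, where [^<]* stops matching and .*? keeps going to the first closing tag.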

  • Help with regex. How to escape "|"

    Hi, this is my code:
    String line = "00001740 00 a 01 able 0 003 = 04866033 n 0000 = 05262099 n 0000 ! 00002062 a 0101 | (usually followed by `to') having the necessary means or skill or know-how or authority to do something; \"able to swim\"; \"she was able to program her computer\"; \"we were at last able to buy a car\"; \"able to get a grant for the project\"  ";
    // I want to capture with group(0) only the first half of the line, the part preceding "|".
    // Could anyone help me figure out why this piece of code does not work?
    String patternStr = "(.*?)|(.*?)";
    Pattern pattern = Pattern.compile(patternStr);
    Matcher matcher = pattern.matcher(line);
    boolean matchFound = matcher.find();
    if (matchFound) {
        String relLine = matcher.group(0);
        System.out.println("LINE " + relLine);
    } else {
        System.out.println("ATTENTION MATCH WAS NOT FOUND FOR LINE " + line);
    }

    "\\|"
    You need to use the backslash.
    However, since backslash is also a special character in a Java String literal, you need two of them. The first one gets eaten by the compiler, escaping the second one, which is used by regex to escape the pipe.
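
    For what it is worth, the unescaped "|" is the regex alternation operator, so "(.*?)|(.*?)" reads as "lazy anything OR lazy anything", and the first alternative matches the empty string at position 0. A minimal sketch of the escaped version follows; the class name PipeSplit and the method beforePipe are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PipeSplit {
    // "\\|" in Java source becomes the regex \| which matches a literal pipe.
    private static final Pattern BEFORE_PIPE = Pattern.compile("^(.*?)\\|");

    /** Returns the text before the first '|', or null when the line has no pipe. */
    static String beforePipe(String line) {
        Matcher m = BEFORE_PIPE.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "00002062 a 0101 | (usually followed by `to') having the necessary means";
        System.out.println("LINE " + beforePipe(line));
    }
}
```

    Note that the wanted text is in group(1) here; group(0) would also include the pipe itself.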

  • I need help with the shape matching of this image

    I am currently using NI Vision Assistant to help me with the logic for the VI code. Currently I have 3 diagonal rectangles and I cannot figure out how to find the shape of each of them. Attached is a screenshot of the image I have. I run it through a Luminosity filter and then a frequency filter, and then I convert it to a binary image... now I am at this point: If anyone could help me find the rectangles... that would be greatly appreciated. I am sorry for such a basic question.
    Thank you,
    Yousuf M. Soliman
    Attachments:
    Test.png 19 KB

    Hello Yousuf,
    There are quite a few functions that you might find useful for locating rectangles in a binarized image. Possibilities include Shape Matching, Pattern Matching, Geometric Matching, Shape Detection, and more. The difficulty you will run into is that the edges of your rectangles are neither straight nor cleanly curved. Many of the shape matching tools look for straight edges and right angles, which are not present in your image. As such, it may be easiest to first use edge detection tools to locate your edges, then use measurement tools to calculate area or distance.
    I played around with your image briefly and was able to localize the edges to within an approximation. I used the Clamp (Rake) tool to achieve the edge localization seen in the attached screenshots. As you can see in ClampRect.png, the edge lines drawn aren't terribly accurate, but the other images show the points used, and with some playing around you should be able to find a better approximation of the rectangle edge. Hope this helps!
    Patrick
    CLA
    Attachments:
    ClampRect.PNG 47 KB
    ClampLongEdge.PNG 55 KB
    ClampShortEdge.PNG 47 KB

  • Help for image pattern matching

    Hello Everyone
    I am working on my final-year project, which involves image processing to find a moving object. I am using JMF. I have finished grabbing a frame from the webcam video clips. Now I need an algorithm to find an image pattern in the grabbed image, but I do not know which algorithm is suitable for image pattern matching or how to implement it in Java. If anyone knows, please help me; it is very urgent.
    Thank you
    Md. Mainul Hasan

    If you would like to take a look at http://www.exactfutures.com/index01.htm and http://www.exactfutures.com/index02.htm and http://www.codeproject.com/useritems/activity.asp then these pages and links may well be useful to you. It may not be exactly what you are looking for, but it does point to some examples with source for video analytics, and at the very least they illustrate how to capture & handle the data including a fast movement detection algorithm. If you want to find a specific shape then search the internet for information on chamfer distance transforms - one can use JMF or extend these simple examples to apply those techniques.

  • Large size variations with IMAQ Pattern matching?

    Do the pattern matching functions work only for +-5% size variations? Does that mean pattern matching is made for static situations only? (By static I mean a static camera watching, e.g., a moving assembly line.)
    I have a scenario where the camera is moving in 6DOF, giving my fiducials a lot of slant and very large size variations.
    Is it then not possible to use the pattern matching of IMAQ?
    The "IMAQ Vision Concepts Manual" says:
    "Because pattern matching is the first step in many machine vision applications, it should work reliably under various conditions. In automated machine vision applications, the visual appearance of materials or components under inspection can change due to factors such as orientation of the part, scale changes, and lighting changes. The pattern matching tool maintains its ability to locate the reference patterns despite these changes."
    In my experience, however, this does not hold in my scenario. The pattern matching tool was not able to find a match in any of my tested images. My size variations were large in these images (probably 50-200%).
    Thanks!

    Unfortunately, the pattern matching algorithm NI currently uses is not a geometric (scalable) pattern matching algorithm. The current algorithm tolerates orientation changes and some lighting changes, but not scale changes.

  • HT204406 Hello, I need help with my iTunes match

    Hello, I need help with iTunes Match.
    I signed up for iTunes Match. It did its thing and now it is supposed to have every one of my songs in the cloud.
    I checked for Ring my bell (which I purchased from iTunes) and it is not in iTunes on my computer anymore.
    I got Match so that my 4S, which has limited memory and over 400 songs, could keep all its songs in the cloud. The idea was that by connecting to my computer I could somehow tell my phone not to sync with Match, then re-sync so that all the songs would be deleted from my phone, and then download only the songs I wanted, freeing up a lot of memory. I could not figure out how to do this.
    Can you please help me? Thank you.

    You probably won't like this suggestion, but I suggest you reinstall Lion.
    First, backup your system. Next, reboot your system, press/hold the COMMAND-R keys to boot into the Recovery HD. Select 'Reinstall Mac OS X'. If you purchased Lion as an upgrade to Snow Leopard, the reinstall process will install Lion 10.7.3. If your system came preinstalled with Lion, you might still get Lion 10.7.2. Both installs are a total install of the OS. None of your apps or data will be impacted. Just the OS.

  • Help with regex

    Hi,
    I have an expression like this
    <select name="contact_list">
    <option value="SS109445168429566">Mark
    <option value="SS109445173826096">Keith
    </select>
    This is not the complete markup; I have trimmed it to make it a bit more readable.
    What I want is for both option values to be extracted and stored. I am using the following pattern to do that:
    pattern="(?s)(?i)<option value="(.*?)""
    But I always get only the first value, whereas I want both values.
    Can anyone help me with this?
    Thank you in advance.

    How are you using the regex?
    Pattern p = Pattern.compile("(?is)<option value=\"(.*?)\"");
    Matcher m = p.matcher(input);
    // Each call to find() advances to the next <option>.
    while (m.find()) {
        System.out.println("value is " + m.group(1));
    }
    That should work.

  • I need help with Regex, please !

    Hello,
    I would like to build the following expression: I want to match a succession of words (e.g. separated by 1 space) that does not contain a specific word.
    Let me give you an example to be more explicit:
    String regex = "([\\S]+ )+";
    That expresses a succession of words, but I want to extend my regular expression (regex) to indicate that I don't want the word "MANU", for example.
    String input = "TUTU TATA TITI MANU TOTO TITI TATA MANU";
    I would like the following results:
    "TUTU TATA TITI MANU"
    "TOTO TITI TATA MANU"
    but not this one:
    "TUTU TATA TITI MANU TOTO TITI TATA MANU"
    Thank you for your answers

    Right, I forgot that the matcher would bump along and try "(?!MANU\\b)\\S+" at the second letter of each group. The solution is to use \w instead of \S, and anchor the match at the beginning of the word with \b:
    import java.util.regex.*;

    public class Test {
        public static void main(String[] args) {
            String target = "TUTU TATA TITI MANU TOTO TITI TATA MANU";
            // Each word must start at a word boundary and must not be MANU.
            Pattern p = Pattern.compile("(?!MANU\\b)\\b\\w+(?: (?!MANU\\b)\\b\\w+)*");
            Matcher m = p.matcher(target);
            while (m.find()) {
                System.out.println(m.group());
            }
        }
    }

  • Help with RegEx and Textinput

    I have the following method:
    public function handleKeyPress(event:KeyboardEvent):void {
        //trace("Key pressed: "+event.keyCode+","+event.charCode);
        var testTest:String = cComponent.cInput.text +
            String.fromCharCode(event.keyCode);
        trace("Text tested: \""+testTest+"\"");
        // See if the user typed '/' followed by a number and a space
        if(testTest.match(new RegExp("\/[0-9]*\s"))) {
            // User typed '/### '
            // TODO: Get bpname and do something here...
            cComponent.cInput.text="";
            event.stopImmediatePropagation();
        }
    }
    Basically, I need to match on the pattern "/### ", where ###
    is any series of numbers, do something, then clear the TextInput,
    but I can't seem to get the RegEx right. I have tried several
    different ones, including '\/\d*\s' and '\/[0-9]* ', but nothing
    seems to work.
    On top of this, the call to cComponent.cInput.text="" doesn't
    wipe the value of the TextInput field either.
    Can someone point me in the right direction?

    This got me mostly working. It's still matching on all
    strings that start with / then some number, but I still cannot get
    that TextInput to clear. From the code I call:
    cComponent.cInput.text="";
    It doesn't throw an error, but it doesn't clear the text
    either.

  • Help with REGEX to block invalid characters

    I have a regex that is used to block unusual characters from being entered into a user name, so that users can't put pipes etc. in there. I just want 0-9 and a-z (upper or lowercase), but I just noticed that it's not working. I am not up to speed on regex; I took this from somewhere else.
    Here is the expression:
    <cfif len(trim(ReReplaceNocase(form_username, '^[A-Za-z][A-Za-z0-9_]*', '', 'ALL'))) gt 0>
    It is failing when I enter 2kljlkll3456 as the username.
    Anybody have any idea why it's not working?
    After posting this I found out that the issue is that it does not allow me to have a username that starts with a number, only a letter. Anybody have any idea how to fix that?
    Thanks
    Mark

    Hey Dan,
    I found a link that explained how the regex is actually formed, which helped!
    http://stackoverflow.com/questions/336210/regular-expression-for-alphanumeric-and-underscores
    Now that I have a basic understanding of how they are formed, the fix was easy.
    I had:
    <cfif len(trim(ReReplaceNocase(form_username, '^[A-Za-z][A-Za-z0-9_]*', '', 'ALL'))) gt 0>
    But should have had:
    <cfif len(trim(ReReplaceNocase(form_username, '^[A-Za-z0-9_]*', '', 'ALL'))) gt 0>
    Simply removing [A-Za-z] from the start fixed it. I get it now: the first section defined the first character, which was restricted to A-Za-z only.
    I'll mark this as answered
    Thanks
    Mark
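
    For comparison, the same check can be sketched in Java, the language used in most of the other threads here. Instead of stripping the valid prefix and testing what is left, as the ColdFusion line does, this version asks whether the whole string consists of allowed characters. The class name UsernameCheck and the method isValid are made up for illustration, and the underscore is kept because the fixed expression above allows it.

```java
import java.util.regex.Pattern;

public class UsernameCheck {
    // One or more letters, digits or underscores, and nothing else.
    private static final Pattern VALID = Pattern.compile("[A-Za-z0-9_]+");

    /** True when every character of username is a letter, digit or underscore. */
    static boolean isValid(String username) {
        // matches() must consume the whole input, so no ^ or $ is needed.
        return VALID.matcher(username).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("2kljlkll3456")); // starts with a digit, still allowed
        System.out.println(isValid("bad|name"));     // contains a pipe
    }
}
```

    Unlike the replace-based test, this sketch also rejects an empty username, which may or may not be what the form needs.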

  • Further help with regex

    Hi,
    I want to detect the presence of "fromCharCode" in a String.
    But only when it is not preceded by "String." and not followed by "(34)".
    I have already managed to do the "String." with this regex: (?<!string\.)fromCharCode --> uses the negative look behind
    This will detect the "fromCharCode", but not when it is preceded by "String.".
    But I don't know how to do the "(34)" ?
    Can somebody help me with this?

    thanks, but the problem with this regex now is that "fromCharCode" is not detected in the String fromCharCode(34) and the String String.fromCharCode.
    The text "fromCharCode" must only be detected when not preceded by "String." AND not followed by "(34)".
    Can you do this?
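
    One possible reading of the follow-up, and it is only my reading, is that the single case to skip is "String.fromCharCode(34)", while "fromCharCode(34)" and "String.fromCharCode" should both still be detected. Under that assumption the two conditions are joined with OR rather than AND: match when the name is not preceded by "String." or when it is not followed by "(34)". The class name FromCharCodeCheck and the method detected are made up for illustration, and the match is case sensitive here.

```java
import java.util.regex.Pattern;

public class FromCharCodeCheck {
    // First alternative: negative lookbehind, not preceded by "String."
    // Second alternative: negative lookahead, not followed by "(34)"
    private static final Pattern P = Pattern.compile(
            "(?<!String\\.)fromCharCode|fromCharCode(?!\\(34\\))");

    /** True unless every fromCharCode in s is exactly String.fromCharCode(34). */
    static boolean detected(String s) {
        return P.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(detected("fromCharCode(34)"));        // no "String." in front
        System.out.println(detected("String.fromCharCode"));     // no "(34)" behind
        System.out.println(detected("String.fromCharCode(34)")); // both present
    }
}
```

    If the intended rule is instead that either condition alone should suppress the match, the single pattern "(?<!String\\.)fromCharCode(?!\\(34\\))" applies both lookarounds together.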

  • Please help with regex

    Hi
    I want a regex for the following conditions (case insensitive):
    1) The first character must always be a letter.
    2) The maximum length is 64 characters.
    3) All remaining characters can be digits, letters, or underscores.
    4) It defines an inverse domain name like "com.sun.yyyy.xxxx", restricted to at most 20 dot-separated identifiers of at most 30 characters each.
    5) A digit must not follow a dot; e.g. com.sun.1awt is not valid.

    Not properly tested, but here goes:
    // some tests
    String[] tests = {
      "com.sun.yyyy.xxxx",
      "com.sunsd.yy435yy.xxxx",
      "com.sun.yyyy.5xxxx",
      "com.sun.yyyeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeey.xxxx",
      "com.sun.yyyy..xxxx"
    };
    // An atom must start with a letter followed by 0 to 29 numbers, letters or underscores
    String atom = "[a-z][a-z0-9_]{0,29}";
    String regex =
      "(?i)"+                  // ignore case
      "(?=^.{1,64}$)"+         // must be between 1 and 64 characters
      atom+                    // start atom
      "(?:\\."+atom+"){0,19}"; // followed by 0 to 19 atoms with a '.' in front of it
    for(String t: tests) {
      System.out.println(t.matches(regex)+" -> "+t);
    }

  • Regex Pattern Match in an extremely long string

    I need to search a file containing one extremely long line (approximately 1 million characters). The pattern I want to search for is "ABC", provided it appears at least n times in a row, where n is supplied by the user. I need the position where this pattern is found. How is this best done? I tried reading the input in blocks of 100000 characters at a time, because reading too many characters causes a Java out-of-memory error.
    Then I converted each block to a String in order to use a regex to search. My problem is how to ensure that the last few characters of the current block are also searched. How should I write the regex to do this? Will breaking the input file into multiple lines help?
    eg:
    Searching for ABC as long as it appears at least 3 times continuously ie (ABCABCABC)
    Original Line = XXXXXXXABCABCABCXXXXXXABCX
    The first block of 10 characters read is XXXXXXXABCABC
    The second block of 10 characters read is ABCXXXXXXABCX
    The search result should be position 7 and position 22

    If the sequence of characters is longer than a few hundred KB, then turning it into a String requires you to have enough heap space available in the JVM to store the entire String.
    If that is a problem, an alternative solution is to have a while loop over an InputStream that reads from the source of characters (a file, a network connection, stdin etc.) and looks for the string. Keep a ring buffer the size of the query string, and read the data from the InputStream into it. Then for each character read, compare the content of the ring buffer to the query string.
    This way you will not use more heap space than the size of the query-string, and the size of whatever buffer you use in your InputStream (8KB for the empty constructor of BufferedInputStream at the moment) plus the odds'n ends from the implementation.
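
    The ring-buffer idea described above could be sketched roughly as follows. This is an illustration under a few assumptions, not a tuned implementation: it reads from a Reader rather than a raw InputStream because the comparison is on characters, it searches for a fixed query string (for "ABC at least 3 times" that is "ABCABCABC"), and the names StreamSearch, find and windowEquals are made up.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class StreamSearch {
    /** Returns the 0-based offsets at which query starts in the character stream. */
    static List<Long> find(Reader in, String query) {
        List<Long> hits = new ArrayList<>();
        int len = query.length();
        if (len == 0) {
            return hits;
        }
        char[] ring = new char[len]; // holds only the last len characters read
        long read = 0;               // total characters consumed so far
        try {
            int c;
            while ((c = in.read()) != -1) {
                ring[(int) (read % len)] = (char) c;
                read++;
                if (read >= len && windowEquals(ring, read, query)) {
                    hits.add(read - len);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }

    /** Compares the ring contents, oldest character first, with query. */
    private static boolean windowEquals(char[] ring, long read, String query) {
        int len = query.length();
        int oldest = (int) (read % len); // slot that will be overwritten next
        for (int i = 0; i < len; i++) {
            if (ring[(oldest + i) % len] != query.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Reader in = new StringReader("XXXXXXXABCABCABCXXXXXXABCX");
        System.out.println(find(in, "ABCABCABC")); // prints [7]
    }
}
```

    Because the window slides one character at a time, a match can never be lost at a block boundary, which is the problem the block-wise regex approach runs into. For a real file, wrapping the source in a BufferedReader keeps the per-character reads cheap. Overlapping occurrences are all reported, so a longer run of ABC yields several offsets.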

  • Regex pattern matching

    I am trying to match the following pattern:
    TIMEPAST=151:19:15.00
    I have tried the following:
    Pattern pattern5 = Pattern.compile("(TIMEPAST)=(\\d*:\\d*:\\d*.\\d*)");
    This has not worked, and neither have the other patterns I have tried.
    Any suggestions greatly appreciated.

    Kaj,
    My apologies. When I was trying to put together another posting, I discovered that I was using TIMEPAST when I should have been using AVGTIME. When I corrected this, your pattern worked like a charm:
    Pattern pattern5 = Pattern.compile("(AVGTIME)=(\\d+:\\d+:\\d+.\\d+)");
    Thanks for the help, my hair is gray enough ;-)
    Kind Regards,
    Bruce
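
    One small detail in the patterns above: the "." before the last group of digits is not escaped, so it matches any character, not just a literal dot. That is harmless for well-formed input but slightly looser than intended. A minimal sketch with the dot escaped follows; the class name TimeField and the method parse are made up, and accepting both key names is my assumption for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TimeField {
    // "\\." in Java source becomes the regex \. which matches only a literal dot.
    private static final Pattern FIELD =
            Pattern.compile("(AVGTIME|TIMEPAST)=(\\d+:\\d+:\\d+\\.\\d+)");

    /** Returns {key, value} for the first field found, or null when there is none. */
    static String[] parse(String s) {
        Matcher m = FIELD.matcher(s);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] kv = parse("TIMEPAST=151:19:15.00");
        System.out.println(kv[0] + " -> " + kv[1]);
    }
}
```

    With the unescaped dot, an input such as "AVGTIME=151:19:15x00" would also match; the escaped version rejects it.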
