REGEX: question about finding Overlapping matches using regular expressions

I have the following problem.
Say for my pattern I use:
Pattern pattern = Pattern.compile("AAA");
Matcher matcher = pattern.matcher("AAAAAA");when I run a loop
while (matcher.find())
System.out.println("Match Found: "+matcher.start()+" "+matcher.end());I get 2 Hits shown in the following output:
Match Found: 0 3
Match Found: 3 6
therefore the regex is seeing the first AAA then the second AAA.
I want it to find the other AAA's in there that are overlapping the other two finds i.e. I want the output to find
AAA from 0 to 3
AAA from 1 to 4
AAA from 2 to 5 and finally
AAA from 3 to 6
thereby including the overlapping finds.
How can I do this using regex? what am I missing that prevents the overlapping matches to be found? Do I need a quantifier?
Thanks for the help!

While the solutions above work fine with the given input, they don't really find all overlapping matches. They just find the longest possible match at each start position. Here's a more thorough approach:import java.util.*;
import java.util.regex.*;
public class Test
  public static List<String> matchAllWays(String rgx, String str)
    Pattern p = Pattern.compile(rgx);
    Matcher m = p.matcher(str);
    List<String> result = new ArrayList<String>();
    int len = str.length();
    int start = 0;
    int end = len;
    while (start < len && m.region(start, len).find())
      start = m.start();
      do
        result.add(m.group());
        end = m.end() - 1;
      } while (end > start && m.region(start, end).find());
      start++;
    return result;
  public static void main(String[] args)
    List<String> matches = matchAllWays("a.*a", "abracadabra");
    System.out.println(matches);
}This approach requires JDK 1.5 or later; that's when the regions API was added to Matcher.

Similar Messages

  • Pattern matching using Regular expression

    Hi,
    I am working on pattern matching using regular expression. I the table, I have 2 columns A and B
    A has value 'A499BPAU4A32A386KBCZ4C13C41D20E'
    B has value like '*CZ4*M11*7NQ+RDR+RSM-R9A-R9B'
    the requirement is that I have to match the columns of B in A. If there is a value with * sign, this must be present in A like 'CZ4' should exit in string A.
    The issue I am facing is that there are 2 values with * sign. The code works fine for first match (CZ4) but it does not look further as M11 does not exist in A.
    I used the condition
    AND instr(A,substr(REGEXP_SUBSTR(B, '*[^*]{3}'),2) ,1)=0
    First of all, is this possible to match multiple patterns in one condition?
    If yes, please suggest.
    Thanks

    user2544469 wrote:
    Thanks a lot Frank. This query worked wonderful for the test data I have provided however I have some concerns:
    - query doesnot include the column BOOK which is a mandatory check.Sorry, that was my mistake. It was a very easy mistake to make, since you posted sample data where it didn't matter. Instead of doing a cross-join between vn and got_must_have_cnt, do an inner join, using book. That means book will have to be in got_must_have_cnt, and all the sub-queries from which it descends. Look for comments that say "March 22".
    If you want to treat '+' in test_cat.codes as '*', then the simplest thing is probably just to use REPLACE, so that when the table has '+', you use '*' instead.
    WITH     got_token_cnt     AS
         SELECT     cat
         ,     book                                        -- Added March 22
         ,     REPLACE (codes, '+', '*') AS codes                    -- If desired.  Changed March 22
         ,     LENGTH (codes) - LENGTH ( TRANSLATE ( codes
                                                       , 'x*+-'
                                      , 'x'
                             ) AS token_cnt
         FROM    test_cat
    ,     cntr     AS
         SELECT     LEVEL     AS n
         FROM     (  SELECT  MAX (token_cnt)     AS max_token_cnt
                 FROM        got_token_cnt
         CONNECT BY     LEVEL     <= max_token_cnt
    ,     got_tokens     AS
         SELECT     t.cat
         ,     t.book                                        -- Added March 22
         ,     REGEXP_SUBSTR ( t.codes
                         , '[*+-]'
                         , 1
                         , c.n
                         )          AS token_type
         ,     SUBSTR ( REGEXP_SUBSTR ( t.codes
                                       , '[*+-][^*+-]*'
                               , 1
                               , c.n
                   , 2
                   )          AS token
         FROM     got_token_cnt     t
         JOIN     cntr          c  ON     c.n     <= t.token_cnt
    ,     got_must_have_cnt     AS
         SELECT       cat, book                                   -- Changed March 22
         ,       COUNT (CASE WHEN token_type = '*' THEN 1 END) AS must_have_cnt
         FROM       got_tokens
         GROUP BY  cat, book                                   -- Changed March 22
    SELECT       mh.cat
    ,       vn.vn_no
    FROM       got_must_have_cnt     mh
    JOIN                    vn  ON  mh.book     = vn.book               -- Changed March 22
    LEFT OUTER JOIN      got_tokens     gt  ON     mh.cat                  = gt.cat
                                     AND INSTR (vn.codes, gt.token) > 1
    GROUP BY  mh.cat
    ,            mh.must_have_cnt
    ,            vn.vn_no
    HAVING       COUNT (CASE WHEN gt.token_type = '*' THEN 1 END)     = mh.must_have_cnt
    AND       COUNT (CASE WHEN gt.token_type = '-' THEN 1 END)     = 0
    ORDER BY  mh.cat
    - query is very slow with 60000 records in vn table. Cost is somewhere around 36000.See these threads:
    When your query takes too long ...
    HOW TO: Post a SQL statement tuning request - template posting
    Relational databases were designed to have (at most) one piece of information in each column. If you decide to have multiple items in the same column (as you have a variable number of tokens in the codes column), don't be surprised if that makes things slower and more complicated. Most of the query I posted, and perhaps most of the time needed, is jsut to normalize the data. If you stored the data in a narmalized form, perhaps something like got_tokens, then you wouldn't need the first 3 sub-queries that I posted.
    Edited by: Frank Kulash on Mar 22, 2011 12:04 PM

  • A question about using regular expression

    Hi,
    This is a part of HTML file.
    <SPAN>how</span>
    <SPAN>are</span>
    <SPAN>you</span>
    I want to search string between each pair of <SPAN> , </SPAN> tags by using Regular Expression.
    For example:
    how
    are
    you
    If I use following method
    String regx="<SPAN>(.+)<SPAN>";
    Matcher m=Pattern.compile(regx).matcher(str);
    int currentLoc=0;
    while(currentLoc<str.length()){
        if(m.find(currentLoc))
        System.out.println(m.group(1));
        currentLoc=m.end();
    }The content between first <SPAN> and last </SPAN> will be searched.
    How to solve this problem?

    Use a non-greedy match:
    (?s)<SPAN>(.+?)<SPAN>(?s) makes the dot match the line terminator (same as setting the dot all option)

  • Matching substrings between square brackets using regular expressions

    Hello,
    I am new at Java and have a problem with regular expressions. Let me describe the issue in 3 steps:
    1.- I have an english sentence. Some words of the sentence stand between square brackets, for example "I [eat] and [sleep]"
    2- I would like to match strings that are in square brackets using regular expressions (java.util.regex.*;) and here is the code I have written for the task
    +Pattern findStringinSquareBrackets = Pattern.compile("\\[.*\\]");+
    +     Matcher matcherOfWordInSquareBrackets = findStringinSquareBrackets.matcher("I [eat] and [sleep]");+
    +//Iteration in the string+
    +          while ( matcherOfWordInSquareBrackets.find() )+
    +{+
    +          System.out.println("Patter found! :"+ outputField.getText().substring(matcherOfWordInSquareBrackets.start(), matcherOfWordInSquareBrackets.end())+"");     +
    +          }+
    3- the result I have after running the code described in 2 is the following: *Patter found!: [eat] and [sleep]*
    That is to say that not only words between square brackets are found but also the substring "and". And this is not what I want.
    What I would like to have as a result is:
    *Patter found!: [eat]*
    *Patter found!: [sleep]*
    That is to say I want to match only the words between the square brackets and nothing else.
    Does somebody know how to do this? Any help would be great.
    Best regards,
    Abou

    You can find the words by looping through the sentence and then return the substring within the indexes.
    int start=0;
    int end=0;
    for(int i=0; i<string.length(); i++)
       if(string.substring(i,i+1).equals("[");
      start=i;
    if(start!=0)
    if(string.substring(i,i+1).equlas("]");
    end=i;
    return string.substring(start,end+1);
    }something like that. This code will only find the firt word however. I do not know much about regex so I cannot help anymore.
    Edited by: elasolova on Jun 16, 2009 6:45 AM
    Edited by: elasolova on Jun 16, 2009 6:46 AM

  • Finding URLs using regular expression.

    I have an requirement where user will type some text containing URLs like "Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747. Thank you". This text has to be modified as below before saving it to the database.
    "Please visit this site <a href='http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747'>http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747</a>. Thank you"
    I am using regular expression (http|https)://.+?\\s which marks the end of the url with a white space character.This pattern doesn't work if the URL is located at the end of the string since there will be no space at the end.
    For example if the string is "Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747" the regex will fail.
    My acutal problem is to find the URL irrespective its position within the string.
    Pattern urlPattern = Pattern.compile("(http|https)://.+?\\s", Pattern.CASE_INSENSITIVE);
    Matcher matcher = urlPattern.matcher(plainText);
    Map stringIndexMap = new HashMap();
    //Searching the input string for urlPattern...
    while(matcher.find()) {
    String urlString = matcher.group();
    //Storing the urls in a hashmap with their indices as keys....
    stringIndexMap.put(new Integer(matcher.start()), urlString.trim());
    Set keySet = stringIndexMap.keySet();
    Iterator it = keySet.iterator();
    //Iterating over the hashmap containing urls...
    while(it.hasNext()) {
    String urlString = (String) stringIndexMap.get(it.next());
    * Replacing the url string in the input text with <a href="#" onclick="window.open('<urlString>')"
    * using String index
    clickableURLString.replace(clickableURLString.indexOf(urlString),
    clickableURLString.indexOf(urlString) + urlString.length(),
    "<a href=\"#\" onclick=\"window.open('" + urlString
    + "')\">" + urlString + "</a>");
    return clickableURLString.toString();

    The end of the input is '$' as a regex.
    import java.util.regex.*;
    public class Prasanna{
      public static void main(String[] args){
        String text
    = "Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747";
    //    String regex = "(http|https)://.+?(?:\\s|$)"; // this works
        String regex = "(http|https)://[^ ]+";          // this also works
        Pattern pat = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        Matcher mat = pat.matcher(text);
        while (mat.find()){
          System.out.println(mat.group());
    }

  • How to use regular expression to find string

    hi,
    who know how to get all digits from the string "Alerts 4520 ( 227550 )  (  98 Available  )" by regular expression, thanks
    br, Andrew

    Liu,
    You can use RegEx as   
    d+
    Whether you are using CL_ABAP_REGEX class then
    report  zars.
    data: regex   type ref to cl_abap_regex,
          matcher type ref to cl_abap_matcher,
          match   type c length 1.
    create object regex exporting pattern = 'd+'
                                  ignore_case = ''.
    matcher = regex->create_matcher( text = 'Test123tes456' ).
    match = matcher->match( ).
    write match
    You can find more details regarding REGEX and POSIX examples here
    http://www.regular-expressions.info/tutorial.html

  • Question about Finder-Load-Beans flag

    Hi all,
    I've read that the Finder-Load-Beans flag could yield some valuable gains in performance
    but:
    1) why is it suggested to do individual gets of methods within the same Transaction
    ? (tx-Required).
    2) this strategy is useful only for small sets of data, isn't it? I imagine I
    would choose Finder-Load-Beans to false (or JDBC) for larger sets of data.
    3) A last question: its default value is true or false ?
    Thanks
    Francesco

    Because if there are different transactions where the get method is called
    then the state/data of the bean would most be reloaded from the database. A
    new transactions causes the ejbLoad method to be invoked in the beginning
    and the ejbStore at the end. That is the usual case but there are other ways
    to modify this behavior.
    Thanks
    Gaurav
    "Francesco" <[email protected]> wrote in message
    news:[email protected]...
    >
    Hi thorick,
    I have found this in the newsgroup. It's from R.Woolen answering
    a question about Finder-Load-Beans flag.
    "Consider this case:
    tx.begin();
    Collection c = findAllEmployeesNamed("Rob");
    Iterator it = c.iterator();
    while (it.hasNext()) {
    Employee e = (Employee) it.next(); System.out.println("Favorite color is:"+ e.getFavColor());
    tx.commit();
    With CMP (and finders-load-beans set to its default true value), thefindAllEmployeesNamed
    finder will load all the employees with the name of rob. The getFavColormethods
    do not hit the db because they are in the same tx, and the beans arealready loaded
    in the cache.
    It's the big CMP performance advantage."
    So I wonder why this performance gain can be achieved when the iterationis inside
    a transaction.
    Thanks
    regards
    Francesco
    thorick <[email protected]> wrote:
    1) why is it suggested to do individual gets of methods within thesame Transaction
    ? (tx-Required).I'm not sure about the context of this question (in what document,
    paragraph
    is this
    mentioned).
    2) this strategy is useful only for small sets of data, isn't it? Iimagine I
    would choose Finder-Load-Beans to false (or JDBC) for larger sets ofdata.
    >
    If you know that you will be accessing the fields of all the Beans that
    you get back from a
    finder,
    then you will realize a significant performance gain. If one selects
    100s or more beans
    using
    a finder, but only accesses the fields for a few, then there may be some
    performance cost.
    It could
    depend on how large some of the fields are. I'd guess that the cost
    of 1 hit to the DB per
    bean vs.
    the cost of 1 + maybe 1 more hit to the DB per bean, would usually be
    less. A performance
    test using
    your actual apps beans would be the only way to know for sure.
    3) A last question: its default value is true or false ?The default is 'True'
    -thorick

  • Find text using regular expression and add highlight annotation

    Hi Friends
                       Is it possible to find text using regular expression and add highlight annotation using plugin

    A plugin can use the PDWordFinder to get a list of the words on a page, and their location. That's all that the API offers for searching. Of course, you can use a regular expression library to work with that word list.

  • Find/Replace Using Regular Expressions

    Can someone help me with this...I am using Regular expressions to
    FIND:
    http.*lid=([^&"]*)[^"]*
    REPLACE:
    $set(\1,ID_id,code)$
    So that in the following it will change this:
    a href="http://www.test.com/shc/s/home_10153_12605?lid=Search" rilt="Search"
    To this:
    a href="$set(Search,ID_id,code)$" rilt="Search
    Those expressions  work in Notepad++ but when i use dreamweaver it just replaces the http... with "$set(\1,ID_id,code)$" and doesnt reference the "search"
    Any help?
    Thanks

    Let me begin by saying I'm a complete idiot with DW's Reg Ex.   I use Search Specific Tag whenever possible.  See screenshot below.
    Try this on your Current Document to see if it works. Then make a back-up copy of site before attempting it on Entire Local Site as you cannot "Undo" this process.
    Good luck,
    Nancy O.

  • How to fetch substring using regular expression

    Hi,
    I am new to using regular expression and would like to know some basic details of how to use them in Java.
    I have a String example= "http://www.google.com/foobar.html#*q*=database&aq=f&aqi=g10&fp=c9fe100d9e542c1e" and would like to get the value of "q" parameter (in bold) using regular expression in java.
    For the same example, when we tried using javascript:
    match = example.match("/^http:\/\/(?:(?!mail\.)[^\.]+?\.)?google\.[^\?#]+(?:.*[\?#&](?:as_q|q)=([^&]+))?/i}");
    document.write('
    ' + match);
    We are getting the output as: http://www.google.com/foobar.html#q=database,*database* where the bold text is the value of "q" parameter.
    In Java we are trying to get the value of the q parameter separately or atleast resembles the output given by JavaScript. Please help me resolving this issue.
    Regards
    Praveen

    BalusC wrote:
    Regex is a cumbersome solution for fixed patterns like URL's. String#substring() in combination with String#indexOf would most likely already suffice.I usually agree, although, in this case, finding the exact parameter might be difficult without a small regex, perhaps:
    "\\wq=\\s*"in conjunction with Pattern/Matcher, used similarly to an indexOf() to find the start of the parameter value.
    Winston

  • Checking valid e-mail's using Regular expressions

    Hey buddies ,
    I desperately need some help here. I need to develop a generic method that wiull use regular expressions and patterns to validate an e-mail.
    Does anyonw have any idea on how to do this. And if possible please share some code with me.
    Thanks a lot

    You can do regular expresions in java using java.util.regex.*:
    import java.util.regex.*;
       Pattern p = Pattern.compile("\\S++\\s++");
       Matcher m = p.matcher(sInputLine);
       if(m.find()){
          sUser = m.group().trim();
       }For more info on regular expressions in java, check out the javadocs:
    http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html

  • Searching for a substring using Regular Expression

    I have a lengthy String similar to repetetion of the one below
    String str="<option value='116813070'>Something1</option><option value='ABCDEF' selected>Something 2</option>"I need to search for the Sub string "<option value='ABCDEF' selected>" (need to get the starting index of sub string) and but the value ABCDEF can be anything numberic with varying length.
    Is there any way i can do it using regular expressions(I have no other options than regular expression)?
    thanks in advance.

    If you go through the tutorial then you will find this on the second page:
    import java.io.Console;
    import java.util.regex.Pattern;
    import java.util.regex.Matcher;
    public class RegexTestHarness {
        public static void main(String[] args){
            Console console = System.console();
            if (console == null) {
                System.err.println("No console.");
                System.exit(1);
            while (true) {
                Pattern pattern =
                Pattern.compile(console.readLine("%nEnter your regex: "));
                Matcher matcher =
                pattern.matcher(console.readLine("Enter input string to search: "));
                boolean found = false;
                while (matcher.find()) {
                    console.format("I found the text \"%s\" starting at " +
                       "index %d and ending at index %d.%n",
                        matcher.group(), matcher.start(), matcher.end());
                    found = true;
                if(!found){
                    console.format("No match found.%n");
    }It's does everything you need and a bit more. Adapt it to your needs then write a regular expression. Then if you have problems by all means come back and post them up here, but first at least attempt to solve it yourself.

  • Matches from regular expression into collection

    Hello,
    I need to do the following:
    I have a long string with some similar repeated data. I would like, using a regular expression, to extracts all matches in a collection. Is there a way of performing this task?
    I have look through the owa_pattern package, but as far as I found out, I can extract only a simple match. Here is an exact quote:
    "If multiple overlapping strings can match the regular expression, this function takes the longest match. " - http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28419/w_patt.htm
    So what can I do if I want to get all the matches?
    Thank you in anticipation. Any help would be appreciated.
    Best regards,
    beroetz

    I think your need a tokenizer-function.
    If the string +:in_str+ is delimited by +:in_delimiter+ you could try this:
    SELECT REGEXP_REPLACE(REGEXP_SUBSTR( :in_str || :in_delimiter, '(.*?)' || :in_delimiter, 1, LEVEL ), :in_delimiter, '') TOKEN
    BULK COLLECT INTO :my_nested_table
    FROM DUAL
    CONNECT BY REGEXP_INSTR( :in_str || :in_delimiter, '(.*?)' || :in_delimiter, 1, LEVEL ) > 0
    ORDER BY LEVEL ASC;
    I wrote a string-to-textarray-tokenizer (and it's pendant) some times ago, being able to cut from certain positions within the string using regular expressions and return the elements into an nested table of varchar2. It looks like:
    TYPE pos_arraytype IS TABLE OF POSITIVE ;
    TYPE text_arraytype IS TABLE OF VARCHAR2(2000);
    FUNCTION stringToTextarray(in_str IN VARCHAR2, in_pos_arr IN pos_arraytype, in_regexp_arr IN text_arraytype DEFAULT NULL, in_trim_strings IN BOOLEAN DEFAULT TRUE)
    RETURN text_arraytype ;
    in_str is the string to be tokenized
    in_pos_arr is a table of positive values of positions in the string to be cut
    in_regexp_arr is a table of regular expressions to use at each position declared by in_pos_arr
    in_trim_strings is a flag, if the cutted element should be trimmed
    using above for example:
    in_str = 'Markus van Muster 347651234XY Musterdaam ABCDE'
    in_pos_arr = (1, 13, 35, 35, 42)
    in_regexp_arr = ('(.?){12}', '([^[:digit:]]?){22}', '[[:digit:]]{4}', '[[:alpha:]]{2}', '(.?){14}')
    in_trim_strings = TRUE
    RETURN collection ('Markus','van Muster','1234','XY','Musterdaam')
    If you need the code, then tell me! I'm looking for....
    Cheers,
    Martin
    Edited by: Nuerni on 17.10.2008 08:49

  • Date Validation (yyyy/MM/dd) Using Regular Expression

    Hi Friends,
    I want to validate date entered by user in yyyy/MM/dd format and for this I want to use Regular Expressions only. Also is there any tool that can be used to generate Regular Expression (for Win2000, Win NT)?
    Regards,
    Himanshu Rathore

    try this
    public class Test
         public static void main(String [] args)
              String regex = "\\d{4}/[01]\\d/[0-3]\\d";
              System.out.println("2003/12/11".matches(regex));
              System.out.println("2djd/kj3".matches(regex));
              System.out.println("22/12/12".matches(regex));
              System.out.println("2003/23/05".matches(regex));
              System.out.println("1999/12/51".matches(regex));
              System.out.println("2007/05/07".matches(regex));
    }i'm not able to try on it because i only have jdk1.3.1 installed on my computer and these codes
    required j2sdk1.4

  • Validate cfinput using regular expression

    Hi,
    can somebody help me valditing an cfinput field using regular expressions?
    First digit must be a number or "R".
    Digit 2-15 can be everything without special characters. Digit 2-15 can also be empty.
    I try this, but it doesn't work (Sorry I'm a beginner using regex).
    <cfinput type="text" name="field1" required="yes" validate="regular_expression"  pattern="[0-9Rr][0-9a-zA-Z]*"  maxlength="15">
    Thank you in advice!
    Claudia

    You haven't told your regex to match the entire string, so it will "pass"
    any string that has your match anywhere within it, so like as long as it's
    got an R or a digit in it: it's OK.
    To tell it to match the entire string, anchor it to the start and end of the
    string with ^ and $ respectively.
    Regular expression syntax: Using special characters
    http://help.adobe.com/en_US/ColdFusion/9.0/Developing/WSc3ff6d0ea77859461172e0811cbec0a38f -7ffb.html#WSc3ff6d0ea77859461172e0811cbec0a38f-7fef
    Adam

Maybe you are looking for