Problems with java regular expressions

Hi everybody,
Could someone please help me sort out an issue with Java regular expressions? I have been using regular expressions in Python for years and I cannot figure out how to do what I am trying to do in Java.
For example, I have this code in java:
import java.util.regex.*;
String text = "abc";
          Pattern p = Pattern.compile("(a)b(c)");
          Matcher m = p.matcher(text);
if (m.matches())
               int count = m.groupCount();
               System.out.println("Groups found " + String.valueOf(count) );
               for (int i = 0; i < count; i++)
                    System.out.println("group " + String.valueOf(i) + " " + m.group(i));
My expectation is that group 0 would capture "abc", group 1 - "a" and group 2 - "c". Yet, I I get this:
Groups found 2
group 0 abc
group 1 a
I have tried other patterns and input text but the issue remains the same: no matter what, I cannot capture any paranthesized expression found in the pattern except for the first one. I tried the same example with Jakarta Regexp 1.5 and that works without any problems, I get what I expect.
I am using Java 1.5.0 on Mac OS X 10.4.
Thank to all who can help.

paulcw wrote:
If the group count is X, then there are X plus one groups to go through: 0 for the whole match, then 1 through X for the individual groups.It does seem confusing that the designers chose to exclude the zero-group from group count, but the documentation is clear.
Matcher.groupCount():
Group zero denotes the entire pattern by convention. It is not included in this count.

Similar Messages

  • Problem with java regular expression

    Hi,
    I try to use the regular expression as follows
    Pattern pattern = Pattern.compile("\wpub\w");
    Matcher matcher = null;
    matcher = pattern.matcher("38712pubeeqpwoiu");
    if (matcher.matches())
    System.out.println("Matched");
    else
    System.out.println("Not Matched");
    and I always get the answer as "Not Matched". I am not sure what is wrong with the code.
    thanks

    Use find() rather than matches().

  • Problem with email regular expression

    I found the following regular expression from this website : http://www.regular-expressions.info/email.html
    \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\bor
    ^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$I am using the following website to check its success, as well as my java app. : http://www.regexplanet.com/simple/index.html
    of course java has a problem with the illegal escape character so i add another "/".
    In both of my testing scenarios the regex value seems to fail. Am I missing something here?
    If anyone could also test this regex and verify or correct my problem I would be very grateful.
    Edited by: rico16135 on Jun 5, 2010 4:01 PM
    Edited by: rico16135 on Jun 5, 2010 4:02 PM

    rico16135 wrote:
    of course java has a problem with the illegal escape character so i add another "/".I seem to be missing the distinction between a forward slash / which is not an escape character, and a backslash \ which is.
    Also, if you show your actual Java code (in code tags, of course), and describe clearly and precisely what's going wrong--pasting in the exact, complete error message and indicating clearly which line caused it, if there is one--people will be in a better position to help you.

  • Help with java regular expressions

    Hi all ,
    i am going to match a patternstring against an input string and print the result here is my code:
         import java.util.regex.*;
         import java.util.*;
         public class Main {
              private static final String CASE_INSENSITIVE = null;
              public static void main(String[] args)
              CharSequence inputStr = "i have 5 years FMCG saLEs exp on java/j2ee and i worked on java and j2ee and 2 projects on telecom java j2ee domain with your  with saLEs maNAger experience of java j2ee and c# having very good  on c++ exposure in JAVA"
             String patternStr = "\"java j2ee\" and \"c#\"";
              StringTokenizer st = new StringTokenizer(patternStr,"\",OR");
             Matcher matcher=null;
              while(st.hasMoreTokens()){
                   String s=st.nextToken();
                   Pattern pattern = Pattern.compile(s,Pattern.CASE_INSENSITIVE);
               matcher = pattern.matcher(inputStr);
               while (matcher.find()) {
                  String result = matcher.group();
                 if(!result.equalsIgnoreCase(" "))
                             System.out.println("result:"+result);
         when i compile this code i am getting the expected result...ie
    result:java j2ee
    result:java j2ee
    result: and
    result: and
    result: and
    result: and
    result: and
    result: and
    result:c#
    but when i replace String patternStr = "\"java j2ee\" and \"c#\""; with
    String patternStr = "\"java j2ee\" and \"c++\""; i am just getting c in the result instead of c++ ie i am getting result :
    result:java j2ee
    result:java j2ee
    result: and
    result: and
    result: and
    result: and
    result: and
    result: and
    result:C
    result:c
    result:c
    result:c
    result:c
    result:c
    result:c
    In the last lines i should get result:c++ instead of result: c
    Any ideas please
    Thanks

    In the last lines i should get result:c++ instead of result: cThe regular expression parser considers the plus sign '+' a special
    character; it means: one or more times the previous regular expression.
    So 'c++' means one or more 'c's on or more times. Obviously you don't
    want that, you want a literal '+' plus sign. You can do that by prepending
    the '+' with a backslash '\'. Unfortunately, the javac compiler considers
    a backslash a special character and therefore you have to 'escape'
    the backslash also, by adding another backslash. The result looks
    like this:"c\\+\\+"kind regards,
    Jos

  • Problem with my regular expression

    Hi
    I am trying to find function declarations within source code. I've been trying to use regular expressions, but if anyone else has an easier / better way to do this, as ross perot said, "I'm all ears"
    Here's my regular expression printed out:
    sb = [.]*void foo[.]*
    sb is a stringbuffer variable, so the regex is [.]*void foo[.]*
    Can anyone tell me why when I run this pattern on my loaded source file it tells me that the pattern does not match? LOL can you help me fix this?

    I doubt that regex is a very good way to find method
    declarations, although you can probably make it work
    reasonably well for the most common cases.You've got a point. This is just to get the ball rolling. I don't want to have to do full syntax parsing yet.
    >
    In your particular situation, I'd guess it has to do
    with [.]*. I think that inside [], the
    dot is a literal dot, not "any character." If you
    want "zero or more of anything followed by void" then
    it's .*void.
    Just tried
    sb = .*void foo.*
    it still didn't work.
    In any case, however, this only covers methods
    returning void. For the more general case, I don'tI didn't give you all my code. I actually use a classloader and reflection to determine what the methods are, return values etc. So yes, void in this case is not overly flexible, but I can plug in whatever it is I'm looking for.
    think a simple regex will do it. You need a kind of
    tool that I can't recall the name of--something to
    build and examine an abstract syntax tree perhaps.
    Try starting here:
    http://www.google.com/search?q=java+abstract+syn
    tax+treeI'm not even sure it abstract syntax tree is what you
    want, but I think it's at least closer than regex.
    There might be something that's a step or two earlier
    in the compilation process that will give you what
    you want.

  • Problem with a regular expression

    I want to check the date.
    I wrote a sample pattern
    pattern = Pattern.compile("([0-9]{2}\\.[0-9]{2}\\.[0-9]{2,4}){0,10}");DD.MM.YYYY
    DD.MM.YY
    "" - empty date
    It works fine, but how could I check, whether the date is not greater than 31 and so on with the other things.
    It should pass:
    "11.12.2006"
    "15.15.2005" - wrong date
    "01.00.2005" - worng date
    "22.22.1000" - also could be wrong.
    Thanks in advance.

    CalendarI check befor the data will be sent to the database.Good. I just meant that regex can't check whether Febuary has 28 or 30 days in a giant leap year.
    There is not only dates, special fields like names,
    ids and so on. I check all with regular expressions,
    wanted the date also. And wanted to know whether it
    is possible in this expression to make so, that the
    date can be "". I can check it explicitly, I wanted
    to do all the job with a regex.Why not simply check for a String length of 0? :)
    The first sample I wrote allows
    "DD.MM.YYYY" and "" and "DD.MM.YY", but in last
    sample "" and YY impossibleHm? You mean two-digit year and no year (but the rest of date still present)?

  • SQL Injection and Java Regular Expression: How to match words?

    Dear friends,
    I am handling sql injection attack to our application with java regular expression. I used it to match that if there are malicious characters or key words injected into the parameter value.
    The denied characters and key words can be " ' ", " ; ", "insert", "delete" and so on. The expression I write is String pattern_str="('|;|insert|delete)+".
    I know it is not correct. It could not be used to only match the whole word insert or delete. Each character in the two words can be matched and it is not what I want. Do you have any idea to only match the whole word?
    Thanks,
    Ricky
    Edited by: Ricky Ru on 28/04/2011 02:29

    Avoid dynamic sql, avoid string concatenation and use bind variables and the risk is negligible.

  • Little help with this regular expression ?

    Greetings all,
    My string is like "P: A104, P105, K106" and I tried to split it using following regular expression ,but seems even though it returns 'true' as matched, it does not split the string.
    Finally what I want is String is array like {"P:","A104","P105","K106"}
    What seems to be the problem with my regular expression?
           String PAT1="[a-zA-Z]:\\s([a-zA-Z]\\d{1,}([,]\\s)?)*";
         String inp="P: A104, P105, K106";
         Pattern p=Pattern.compile(PAT1);
         System.out.println(p.matcher(inp).matches());
         String strs[]=inp.split(PAT1);
         for(String s:strs){
              System.out.println("part  "+s);
         }

    pankajjaiswal10 wrote:
    You can also use split(":|,")No, that will remove the colon and leave the whitespace intact. My understanding is that the OP wants to keep the colon and remove the commas and whitespace. I would use this: String[] strs = inp.split(",?\\s+");

  • Logical AND in Java Regular Expressions

    I'm trying to implement logical AND using Java Regular Expressions.
    I couldn't figure out how to do it after reading Java docs and textbooks. I can do something like "abc.*def", which means that I'm looking for strings which have "abc", then anything, then "def", but it is not "pure" logical AND - I will not find "def.*abc" this way.
    Any ideas, how to do it ?
    Baken

    First off, looks like you're really talking about an "OR", not an "AND" - you want it to match abc.*def OR def.*abc right? If you tried to match abc.*def AND def.*abc nothing would ever match that, as no string can begin with both "abc" and "def", just like no numeric value can be both 2 and 5.
    Anyway, maybe regex isn't the right tool for this job. Can you not simply programmatically match it yourself using String methods? You want it to match if the string "starts with" abc and "ends with" def, or vice-versa. Just write some simple code.

  • What's wrong with the regular expression?

    Hi all,
    For the life of me I can not figure out what is wrong with this regular expression
    .*\bA specific phrase\.\b.*
    This is just an example the actual phrase can be an specific phrase. My problem comes when the specific phrase ends in a period. I've escaped the period but it still gives me an error. The only time I don't get an error is when I take off the end boundry character which will not suffice as a solution. I need to be able to capture all the text before and after said phrase. If the phrase doesn't have a period it would look like this...
    .*\bA specific phrase\b.*
    which works fine. So what is it about the \.\b combination that is not matching?
    I've been banging my head on this for a while and I'm getting nowhere.
    The application highlights text that comes from a server. The user builds custom highlights that have some options. Highlight entire line, match partial word, and ignore case. The code that builds my pattern is here
    String strHighlight = _strHighlight;
            strHighlight = strHighlight.replaceAll("\\*", "\\\\*");
            strHighlight = strHighlight.replaceAll("\\.", "\\\\.");
            String strPattern = strHighlight;
            if(_bEntireParagraph)
                if(_bPartialWord)
                    strPattern = ".*" + strHighlight + ".*";
                else               
                    strPattern = ".*\\b" + strHighlight + "\\b.*";           
            else
                if(_bPartialWord)
                    strPattern = strHighlight;
                else               
                    strPattern = "\\b" + strHighlight + "\\b";  
            if(_bIgnoreCase)
                _patHighlight = Pattern.compile(strPattern, Pattern.CASE_INSENSITIVE);
            else
                _patHighlight = Pattern.compile(strPattern);So for example I matching the phrase: The dog ate the cat. And that phrase came over in the following text: Look there's a dog. The dog ate the cat. "Oh my!"
    And my user has the entire line and ignore case options selected then my regex woud look like this: .*\bThe dog ate the cat\b.*
    That should get highlighted, but for some reason it doesn't. Correct me if I'm wrong but doesn't the regex read as follows:
    any characters
    word boundry
    The dog ate the cat[period]
    word boundry
    any characters until newline.
    Any help will be much appreciated

    A word boundary (in the context of regexes) is a position that is either followed by a word character and not preceded by one (start of word) or preceded by a word character and not followed by one (end of word). A word character is defined as a letter, a digit, or an underscore. Since a period is not a word character, the only way the position following it could be a word boundary is if the next character is a letter, digit or underscore. But a sentence-ending period is always followed by whitespace, if anything, so it makes no sense to look for a word boundary there. I think, instead of \b, you should use negative lookarounds, like so:   strPattern = ".*(?<!\\w)" + strHighlight + "(?!\\w).*";

  • Java – Regular Expressions – Finding any non digit byte in a multiple byte

    Hello,
    I’m new to JAVA and Regular Expressions; I’m trying to write a regular expression that will find any records that contain a non digit byte in a multiple byte field.
    I thought the following was the correct expression but it is only finding records that contain “all” non digit bytes.
    \D{1,}
    \D = Non Digit
    {1,} = at least 1 or more
    Below is my sample data. I would like the regular expression to find all of the records that are not all numeric. However when I use the regular expression \D{1,} it is only finding the 2 records that all bytes are non digits. (i.e. “ “ and “A “)
    “ 111229”
    “2 111229”
    “20091229”
    “200912c9”
    “201#1229”
    “20101229”
    “20110229”
    “20111*29”
    “20111029”
    “20111229”
    “20B11229”
    “A “
    “A0111229”
    Please note I have also tried \D{1,}+ and \D{1,}? And they also do not return my desired results
    Any assistance someone can provide would be greatly appreciated.

    You don't show the code you are using but I surmise you are using String.matches() which requires that the whole target must match the regular expression not just part of it. Instead you should create a Pattern and then a Matcher and use the Matcher.find() method. Check the Javadoc for Pattern and Matcher and look at the Java regex tutorial - http://docs.oracle.com/javase/tutorial/essential/regex/ .
    P.S. You can re-use the Pattern object - you don't have to create it every time you need one.
    P.P.S. Java regular expressions work with characters not bytes and characters are not not not bytes.

  • Java regular expression for Arabic

    i want to use java regular expression to evaluate some string in Arabic
    can some body tell me how to do a match for arabic characters

    i have this code :
    String poem="���";
          //String m1="\\p?";   
           String m1= "\\p{�}";
            Matcher m =
          Pattern.compile(m1)
            .matcher(poem);
        while(m.find()) {
          for(int j = 0; j <= m.groupCount(); j++)
            System.out.print("[" + m.group(j) + "]");
          System.out.println();
        }i get the error:
    Exception java.util.regex.PatternSyntaxException: Unknown character property name {?} near index 2
    \p?
    if you find that is hard to help with Arabic regex, can someone post a code on how to match Arabic regex or chineese or any thing not latin regex match
    because a need to match a Strings in Arabic if some one can tell me how?

  • Improving Java Regular Expression Compile Time

    Hi,
    Just wondering if anyone here knows how can i improve the compile time of Java Regular Expression?
    The following is fragment of my code which I tired to see the running time.
    Calendar rightNow = Calendar.getInstance();
    System.out.println("Compile Pattern");
    startCompileTime = rightNow.getTimeInMillis();
    Pattern p = Pattern.compile(reg, Pattern.CASE_INSENSITIVE);
    rightNow = Calendar.getInstance();
    endCompileTime = rightNow.getTimeInMillis();
    Below is fragment of my regular expression:
    (?:tell|state|say|narrate|recount|spin|recite|order|enjoin|assure|ascertain|demonstrate|evidence|distinguish|separate|differentiate|secern|secernate|severalize|tell apart) me (?:about|abou|asti|approximately|close to|just about|some|roughly|more or less|around|or so|almost|most|all but|nearly|near|nigh|virtually|well-nigh) java
    My regular expression is a very long one and the Pattern.compile just take too long. The worst case that I experience is 2949342 milliseconds.
    Any idea how can I optimise my regular expression so that the compilation time is acceptable.
    Thanks in advance

    My regular expression is a very long one and the
    Pattern.compile just take too long. The worst case
    that I experience is 2949342 milliseconds.Wow, that's pretty pathological. I was going to tell you that you were measuring something wrong, because I had written a test program that could compile a 1 Mb "or" pattern (10,000 words, 100 bytes per) in under 200 ms ... but then I noticed that your patterns have two "or" components, so reran my test, and got over 14 seconds to run with a smaller pattern.
    My guess is that the RE compiler, rather than decomposing the RE into a tree, is taking the naive approach of translating it into a state machine, and replicating the second component for each path through the first component.
    If you can create a simple hand-rolled parser, that may be your best option. However, it appears that your substrings aren't easily tokenized (some include spaces), so your best bet is to break the regexes into pieces at the "or" constructs, and use Pattern.split() to apply each piece sequentially.
    import java.util.Random;
    import java.util.regex.Pattern;
    public class RegexTest
        public static void main(String[] argv) throws Exception
            long initial = System.currentTimeMillis();
            String[] words = generateWords(10000);
    //        String patStr = buildRePortion(words);
    //        String patStr = buildRePortion(words) + " xxx ";
            String patStr = buildRePortion(words) + " xxx " + buildRePortion(words);
            long startCompile = System.currentTimeMillis();
            Pattern pattern = Pattern.compile(patStr, Pattern.CASE_INSENSITIVE);
            long finishCompile = System.currentTimeMillis();
            System.out.println("Number of components = " + words.length);
            System.out.println("ms to create pattern = " + (startCompile - initial));
            System.out.println("ms to compile        = " + (finishCompile - startCompile));
        private final static String[] generateWords(int numWords)
            String[] results = new String[numWords];
            Random rnd = new Random();
            for (int ii = 0 ; ii < numWords ; ii++)
                char[] word = new char[20];
                for (int zz = 0 ; zz < word.length ; zz++)
                    word[zz] = (char)(65 + rnd.nextInt(26));
                results[ii] = new String(word);
            return results;
        private static String buildRePortion(String[] words)
            StringBuffer sb = new StringBuffer("(?:");
            for (int ii = 0 ; ii < words.length ; ii++)
                sb.append(ii > 0 ? "|" : "")
                  .append(words[ii]);
            sb.append(")");
            return sb.toString();
    }

  • Java regular expression for CSV?

    I found several regular expressions in the internet to parse/split csv data lines. Howeverm, they all don't work with the Java regular expression API. Is there a regular expression to tokenize CSV fields for the Java regexp API?

    If the licensing of the above solution is too restrictive for you...I'm sure there are other types of parsers out there that do that type of thing.
    In the meantime, here is some code I cooked up (no GPL...use it freely) that might get you started.
    Don't know that it handles everything, but I never said it would...
    Please READ and let me know what changes could be made. I'm always looking for improvements in my understanding of regular expressions...
    import java.util.regex.*;
    import java.util.*;
    import java.util.List;
    public class Example
       final static Pattern CSV_PATTERN;
       final static Pattern DOUBLE_QUOTE;
       static
          String regex = "(?:  ([^q;]+) | (?: q ((?: (?:[^q]+) | (?:qq) )+ ) q)  );?";
          //                       1          2          a           b       3    4
          // So, pretend your quote character is q
          // (you can change it to \" later when you understand what's going on.)
          // This regex (when applied iteravely) matches a token that:
          // 1) contains NO QUOTE MARKS whatsoever (;'s) (in group 1)
          //                       or
          // 2) starts with a QUOTE, then contains either
          //    a) no quotes at all inside or
          //    b) double quotes (to escape a quote)
          // 3) and ends with a QUOTE.
          // 4) and is followed by a separator (optional for the last value)
          // Note that (a) and (b) are captured in group 2 of the regex.
          CSV_PATTERN = Pattern.compile(regex, Pattern.COMMENTS);
          DOUBLE_QUOTE = Pattern.compile("qq");
        * Attempts to parse Excel CSV stuff...
        * @param text the CSV text.
        * @return a list of tokens.
       public static List parseCsv(String text)
          Matcher csvMatcher = CSV_PATTERN.matcher(text);
          Matcher doubleQuotes = DOUBLE_QUOTE.matcher("");
          List list = new ArrayList();
          while (csvMatcher.find())
             if (csvMatcher.group(1) != null)
                // The first one matched.
                list.add(csvMatcher.group(1));
             else
                doubleQuotes.reset(csvMatcher.group(2));
                list.add(doubleQuotes.replaceAll("q"));
          return list;
    }

  • Problem with Java and Windows (Mainly Vista and UAC)

    Hi all,
    I am having a problem with a program that I've devoloped. The program itself is packaged as a jar and I plan to deploy it across multiple platforms eventually however right now i am only concerned about windows based systems. I have made an installer for a windows baised systems using NSIS to install the software files. I made the installer as I need several java packages to be installed so the program would work (JAI, J3D, JAI ImageIO) and I also require the program to have fileassociations on windows.
    I know that this is not what java is about, however the majority of the users will be on windows baised systems so I've decided that OS specific installers is the best option.
    During the process I have noticed that there are several key problem with java for this type of application!
    The first issue that I have come across is getting file associations to work on java. As a .jar is not an excutable it is not possible to directly associate a filetype with it in java so to overcome this I currently run the program from a .bat files and also the program requires large memory so this also allows me to run the program with -xmx. The batch file that I use is :
    <code>
    cd PATH TO PROGRAM
    start javaw -Dsun.java2d.noddraw=true -Xmn100M -Xms500M -Xmx1000M -jar "PATH TO PROGRAM\program.jar" %1 -vram 134217728
    pause;
    </code>
    Ok so all this appears to work fine and allows windows to have file associations and start the program and thats all works perfectly but this is a non-ideal solution. Has anyone got any advice on improving this?
    The next problem that I have appears to be a problem with Vista and UAC (user access control). When a user installs the program and installs the program into the program files directory I found that the program did not work and kept saying that I did not have access to the files in the current directory. This is a problem as I read and write settings files during program execution.
    On a Vista system UAC prevents file write operations into the Program Files directory unless the program has requested elevated status even if the user is a full administrator. The probem is that there appears to be no real way to achieve this under java that I'm aware of...
    Has anyone else had this probem and has a suitable solution?
    Any advice on these issues would realy be appricated.
    Regards
    Joey

    Ok so i've kinda found a solution, its not ideal but its very good. I found this program called Elevate
    A link to the site I got it was
    http://www.wintellect.com/cs/blogs/jrobbins/archive/2007/03/27/elevate-a-process-at-the-command-line-in-vista.aspx
    This program allows you start java with a UAC dialog for high access using
    Elevate java -jar myjar.jar
    This then allows you to have full access using java... I guess it could be dangerous but it does the job.

Maybe you are looking for

  • Apple TV monitor question

    Okay here is a tough one. I have an Apple TV, a MacBook, a wireless keyboard, and a Samsung HD flat screen TV, 42". Can i use the Samsung flatscreen as a monitor [42"] for the MacBook, using the wirelesss Apple TV? For example if I wanted to use the 

  • Blueberry Video output

    Is there anyway to connect an external monitor to the original blueberry ibooks (300Mhz, 1999). I know that on some of the later models (466Mhz for example) the A/V port had video out, but that isn't the case for mine. Any options on the usb side of

  • Autosave functionality in peoplesoft

    hi, i've placed an autosave code (which i've got in the internet) in an HTML area (property is constant) then when I access the page where the code has been included, I've encountered an IE error which states: Char: 41 Error: Expected '{' here's the

  • UI5 load view from another view

    Hi all, I'm trying to load a view from an external project. The idea being that a view can be used by multiple projects to cater for reuse e.g a view that has different view modes, user role privileges etc. however I can't seem to find a way to find

  • Itunes won't play any media since upgrading to 10.6.3

    I have had my laptop since christmas last year and recently had virus issues so I restored my computer from a back-up and it cleared that problem, but about a week after the restore I noticed iTunes stopped playing media, it would say it is playing b