Regex matching bug?

It seems like j2sdk1.4.2b has a serious regex matching bug with strings that contain Unicode characters. In my case, the string contained some Turkish characters.
The regex is simple: <[^>]*>, which matches runs of text enclosed in <>
(e.g. <field>).
Although the match succeeds with j2sdk1.4.1_02, it just doesn't match text containing Unicode characters with 1.4.2b.
What do you think? Is this a bug or could I be missing something?

ahmeti, did you submit a bug report on this? It definitely is a bug in the Pattern class, I finally figured out. They added a new node type to make matching ASCII characters in character classes more efficient, but they screwed up the match condition: it always returns false if the character it's looking at is not ASCII, even if the class has been negated. I'll go ahead and submit a report myself unless I hear from you.
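For anyone who wants to check their own JVM, here is a minimal sketch of the case from the original post. The class name NegatedClassCheck and the sample tag text are made up for illustration, and the Turkish letters are written as Unicode escapes. On a JVM without this bug, the negated class [^>] accepts any character other than '>', ASCII or not, so both tags should be found.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegatedClassCheck {
    // The pattern from the original post: '<', any run of non-'>' chars, then '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Returns every <...> run found in the input, in order.
    static List<String> findTags(String input) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        // Second tag contains s-cedilla (U+015F) and dotless i (U+0131), both non-ASCII.
        String text = "<field> <ba\u015Fl\u0131k>";
        System.out.println(findTags(text).size()); // 2 on a JVM without the bug
    }
}
```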

Similar Messages

  • [bug] JDev 11g: NullPointerException at java.util.regex.Matcher.getTextLength

    Hi,
    Jdev 11.1.1.0.31.51.56
    If you get the following stack trace when running a JSPX page that uses ViewCriteriaRow.setOperator, note that there is bug 7534359 and Metalink note 747353.1 available:
    java.lang.NullPointerException
    at java.util.regex.Matcher.getTextLength(Matcher.java:1140)
    at java.util.regex.Matcher.reset(Matcher.java:291)
    at java.util.regex.Matcher.<init>(Matcher.java:211)
    at java.util.regex.Pattern.matcher(Pattern.java:888)
    at oracle.adfinternal.view.faces.model.binding.FacesCtrlSearchBinding._loadFilterCriteriaValues(FacesCtrlSearchBinding.java:3695)
    Truncated. see log file for complete stacktrace
    Workaround:
    If you use 
            vcr.setAttribute("Job",job);
    or
            vcr.setAttribute("Job","="+job);
    then add the following line of code:
            vcr.setOperator("Job","=");
    Regards,
    Peter

    Hi,
    It's useful to mention that this happens when setting the equals or LIKE operator, either with
    vcr.setAttribute("Job","= '"+job+"'");
    or
    vcr.setOperator("Job","=");
    Frank
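    The trace shows Pattern.matcher() receiving a null input from inside ADF. The ADF internals aren't shown here, but the JDK side is easy to reproduce: the Matcher constructor calls reset(), which asks the input for its length, hence the NPE in getTextLength. A minimal sketch of a null guard for your own code; the class name and the "=.*" pattern are hypothetical, and this does not replace the workaround above for the ADF binding itself.

```java
import java.util.regex.Pattern;

public class NullSafeMatch {
    // Hypothetical pattern: "=" followed by anything.
    private static final Pattern EQUALS_OP = Pattern.compile("=.*");

    // Pattern.matcher(null) throws NullPointerException (the Matcher constructor
    // calls reset(), which needs the input's length), so check for null first.
    static boolean safeMatches(Pattern p, CharSequence input) {
        return input != null && p.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(safeMatches(EQUALS_OP, "= 'CLERK'")); // true
        System.out.println(safeMatches(EQUALS_OP, null));        // false, no exception
    }
}
```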

  • Question on regex Matcher (group number)

    Hi everybody,
    I am writing a replacement program like the one below.
    String regex = "(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)";
    String original = "ABCDEFGHIJKL";
    String replacement = "$12";
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(original);
    String result = m.replaceFirst(replacement);
    What I actually want is to take the first group, in this case "A", and append the character "2" after it.
    The result I expect is "A2", but the result I get is "L", because the regex engine takes $12 as the 12th group.
    What should I do to remove the ambiguity?
    Thanks.

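    In such a case, use $1\\2 (as written in a Java string literal). The backslash makes the 2 a literal character, so the group reference stops at $1.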
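    To see both readings side by side, here is a small sketch (class and method names are illustrative) that runs the pattern from the question with each replacement string:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumberDemo {
    // Twelve single-character groups, as in the question.
    private static final Pattern TWELVE =
        Pattern.compile("(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)");

    static String replaceWith(String replacement) {
        Matcher m = TWELVE.matcher("ABCDEFGHIJKL");
        return m.replaceFirst(replacement);
    }

    public static void main(String[] args) {
        System.out.println(replaceWith("$12"));   // L  : read as group 12
        System.out.println(replaceWith("$1\\2")); // A2 : group 1, then a literal '2'
    }
}
```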

  • Capture regex match as a variable?

    Hello!
    I have this program and I basically want to match a part of a string and grab the match as a variable. In this case, the string I need to parse is 'foo'.
    Here is what I have:
    public class Test {
         public static void main(String[] args) {
              // link <link> format
              String foo = "http://www.foo.com <http://www.foo.com>";
              String the_regex = "\\<(http://[^\\>]*)\\>";
              String the_replacement = "<a href=\"$1\">$1</a>";
              System.out.println(foo.replaceAll(the_regex, the_replacement));
         }
    }
    $1 (sort of like Perl) should be the captured text from the_regex.
    Any ideas?
    Thanks in advance.

    Dubwai - I think I got it, thanks for the guidance. Here is what I used, and it seems to work. Thanks!
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    public class Test {
         public static void main(String[] args) {
              String foo = "http://www.foo.com <http://www.boo.com>";
              String regex = "\\<(http://[^\\>]*)\\>";
              Pattern p = Pattern.compile(regex);
              Matcher m1 = p.matcher(foo);
              while (m1.find()) {
                   System.out.println("The site = " + m1.group(1));
              }
         }
    }
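    To actually keep the match in a variable rather than just printing it, one option is to collect group(1) from each find() into a list. A minimal sketch; LinkExtractor and links() are hypothetical names, and the regex is the one used above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractor {
    // Group 1 is the URL between '<' and '>'.
    private static final Pattern LINK = Pattern.compile("\\<(http://[^\\>]*)\\>");

    // Returns group 1 of every match, in order.
    static List<String> links(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = LINK.matcher(text);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        List<String> sites = links("http://www.foo.com <http://www.boo.com>");
        System.out.println(sites); // [http://www.boo.com]
    }
}
```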

  • Regex Matching Involving Unicode

    Hi,
    I'm trying to do a regex match using boost::regex and followed the instructions on http://niemannross.com/developer/wiki/index.php?title=Using_boost_regular_expressions_(regexp)_in_InDesign_CS/CS2/CS3_plug-in_code
    My regex needs to match a line that ends with punctuation characters and return the string excluding the trailing punctuation characters.
    ex. home -> home
    Example regex: (.*?)[ \\x{201C}\\x{201D}]+$
    However, it does not match the line.
    I tried using boost::u32regex, but I get a "symbols not found" error for boost::icu_regex_traits::translate_nocase when linking.
    How can I get around this problem?
    Thanks in advance!
    -- Jeff

    Escaping the backslash in '\x' is necessary for your programming language; otherwise it is interpreted as a 'real' hex character escape. So as written, this feeds '\x{201C}' into your program, rather than the literal 0x201C code. (It'd be a syntax error in C, but you get the point.)
    However, because this is an expression evaluated by GREP inside your running program, I think you have to escape it again, so it might need doubled double backslashes. Scripts suffer from the same problem.
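    For comparison, a Java sketch of the same expression. The thread is about Boost in C++, so this only illustrates the escaping layers, not a Boost fix. In the Java source literal the backslash is doubled, so the regex engine receives \x{201C}; Java's \x{...} syntax needs Java 7 or later. The class name is hypothetical, and the pattern is applied with matches(), so the whole line must fit.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TrailingQuotes {
    // Lazy group 1, then one or more spaces or curly double quotes at the end.
    // The doubled backslash in the literal reaches the regex engine as a single one.
    private static final Pattern TRAILING =
        Pattern.compile("(.*?)[ \\x{201C}\\x{201D}]+$");

    // Returns the line without its trailing quotes/spaces, or the line unchanged.
    static String stripTrailing(String line) {
        Matcher m = TRAILING.matcher(line);
        return m.matches() ? m.group(1) : line;
    }

    public static void main(String[] args) {
        System.out.println(stripTrailing("home\u201D"));       // home
        System.out.println(stripTrailing("home \u201C\u201D")); // home
    }
}
```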

  • How to replace regex match into a char value (in the middle of a string)

    Hi uncle_alice and other great regex gurus
    One of my friends has a peculiar problem and I can't give him a solution.
    Using String#replaceAll(), i.e. NOT a Matcher loop, how could we convert a matched digit string such as "65" into the char with that numeric value? That is, "65" should be converted into the letter 'A'.
    Here's the failing code:
    public class GetChar {
      public static void main(String[] args) {
        String orig = "this is an LF<#10#> and this is an 'A'<#65#>";
        String regx = "(<#)(\\d+)#>";
        // expected result: "this is an LF\n and this is an 'A'A"
        String result = orig.replaceAll(regx, "\\u00$2");
        // String result = orig.replaceAll(regx, "\\\\u00$2"); // this also doesn't work
        System.out.println(result);
      }
    }

    I don't know that we have lost anything substantial.
    I think it's just that the kind of task this is
    especially useful for is kind of a blind spot in the
    range of things Java is a good fit for (?)
    For certain tasks (e.g. process output munging), an
    experienced Perl programmer could knock up a couple of lines
    in Perl using built-in language features,
    which in Java could take pages. If the cost is
    readability/maintainability/expandability etc., then
    this might be a problem, but for a number of
    day-to-day tasks it isn't.
    I'm trying to learn Perl at the moment for this exact
    reason :)
    Yes. And when a Java source-code processor (a.k.a. compiler) sees code like:
    line = line.replaceAll(regexp, new String(new char[] {(char)(Integer.parseInt("$1"))}));
    or,
    line = line.replaceAll(regexp, doMyProcessOn("$1")); // doMyProcessOn returns a String
    common sense should have told him that "$1" isn't a literal string "$1" in this regular expression context.
    By the way, I abhor Perl code because of its incomprehensibility. It can't be read with average common sense. Java code can be, sort of ...
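    Since "$1" is just a two-character String by the time replaceAll() runs, String#replaceAll() cannot compute a different replacement per match. A minimal sketch of the usual alternative, an appendReplacement()/appendTail() loop, using the regex from the question; GetCharLoop and decode() are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GetCharLoop {
    // Regex from the question: group 2 holds the decimal character code.
    private static final Pattern CODE = Pattern.compile("(<#)(\\d+)#>");

    // Replaces each <#n#> with the character whose code is n.
    static String decode(String orig) {
        Matcher m = CODE.matcher(orig);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(2));
            // quoteReplacement stops '$' or a backslash from being read as replacement syntax
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String orig = "this is an LF<#10#> and this is an 'A'<#65#>";
        // prints "this is an LF", a line break, then " and this is an 'A'A"
        System.out.println(decode(orig));
    }
}
```

    Matcher.quoteReplacement() matters for codes like 36 ('$') or 92 (backslash), which would otherwise be read as replacement syntax. On Java 9 and later, Matcher.replaceAll(Function<MatchResult, String>) can do the same in a single call.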

  • E50 number to contact name matching bug?

    Hello,
    I just got an E50 (RM-170, software V 07.36.00). 
    It seems to have a very irritating bug in the way numbers are matched to contacts in the 'recent calls' lists:
    - an entry in the list will show the contact name *only if* the number appears only once in the contact list;
    - otherwise the entry only shows the number. 
    This happens all the time with my well-organized contact list (synced with my PC), where 2 members of the same family typically have entries with different mobile numbers but the same landline number. In that case the landline number is never matched to a contact name.
    To me this is not a 'feature' but definitely a bug. All other (non-Nokia) phones I have owned have the much more usable behaviour of showing, in the calls list, the name of the first (or last, who cares) contact that matches a call number.
    Is there a software update that fixes this bug for the E50?
    -- FL 

    cjlim wrote:
     What you are supposed to do is to save all the different numbers for a contact under the same contact name, e.g. Joe Soap - Home, Joe Soap - Mobile etc.
     Well this is exactly what I do: I have
     - Joe Soap
        - Home (xxxxxxxxx)
        - Mobile (yyyyyyyyy)
    But I have also:
     - Jane Soap
        - Home (xxxxxxxxx)
        - Mobile (zzzzzzzzz)
     And this is where the Nokia bug hits:
    When Joe or Jane Soap call me from their home number xxxxxxxxx (which is the same for both contacts):
    - any other (sensible) phone would match xxxxxxxxx to a contact (Joe or Jane Soap - I don't really care which Soap);
    - but the Nokia insists on showing the bare number xxxxxxxxx, without matching it to a contact.

  • Itunes match bugs and lost playlists

    There are too many glitches, bugs, freezes etc. It's too easy to lose playlists (I lost several years' worth). Keep your library backed up locally; this is no substitute for Time Machine! Overall, it's a nice idea, but it needs a safety mechanism to keep the playlists backed up: it seems a hard drive glitch on one machine can result in deletions being replicated quietly across the cloud in the background.
    Also, there's no point in the iPhone feature, because with a large library the iPhone can't cope and just crashes, especially on the 3G.
    Things were better in the days of the iPod Classic, syncing through USB to an iTunes library stored locally; at least it worked! I think I'll go back to that until things improve.

    iTunes Match does over-promise, and it certainly has a number of shortfalls (all of them now well documented in this forum), but what it enables is very impressive.
    For synchronisation across multiple libraries it is very good. Provided you steer clear of Smart Playlists, the iDevice capability is also very good, though keeping one or more iDevices on manual synchronisation to a library certainly provides a worthwhile route, depending on how you choose to work with your music.
    Overall the service is impressive and certainly worth staying with, though it is not for everyone; your suggested route of waiting for the service to mature is perfectly reasonable.

  • Regex - matching literal characters

    I'm trying to match the following pattern using regex:
    The string begins with a literal '\', is followed by any number of letters and/or numbers, and ends with '&0]'
    e.g. '\07761739009B&0]'
    I'm trying to devise my pattern, but I'm not exactly sure how to match literal characters. I was led to believe a '//' would make the character literal, but this doesn't work:
    Pattern Serial = Pattern.compile("(\\/.*+\\&0])");
    Thanks in advance for any advice

    \ is an escape character both in Java string literals and in regex.
    "\\" produces a String containing a single \ character. But for a literal \, regex needs \\. So "\\\\" produces a String containing \\, which the regex engine reads as a single literal \.
    Also, I don't think you need to escape &. And you might need to escape ], but I'm not sure; it might be okay bare if there was no preceding [.
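    A minimal sketch of a pattern for the serial format described in the question, assuming the middle part is letters and digits only (class and method names are hypothetical). Besides the backslash issue, note that .*+ in the original pattern is possessive: it swallows the rest of the line, including &0], and never gives it back, so that pattern cannot match at all.

```java
import java.util.regex.Pattern;

public class SerialPattern {
    // Java literal "\\\\" -> regex \\ -> one literal backslash.
    // [A-Za-z0-9]* follows the "letters and/or numbers" rule from the question.
    // '&' needs no escape, and a ']' with no open '[' is taken literally.
    static final Pattern SERIAL = Pattern.compile("\\\\[A-Za-z0-9]*&0]");

    static boolean isSerial(String s) {
        return SERIAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSerial("\\07761739009B&0]")); // true
        System.out.println(isSerial("/07761739009B&0]"));  // false: forward slash, not backslash
    }
}
```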

  • Regex matches function

    Hi
    I am trying to come up with a regex that I can use with the matches() method to validate the user IDs I accept. A user ID can contain letters, numbers and 3 special characters: ".", "-" and "#".
    The regex I came up with was: user_id.matches("[a-zA-Z\\d\\.\\-#]"). The string I am trying to match is 'user-1', but it fails to match.
    I am not confident about the regex I am using. Please let me know what I am doing wrong.
    Thanks

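    The class on its own matches exactly one character, and matches() requires the entire string to match, so you need a quantifier. I would use '+' rather than '*'; I doubt that an empty string would be considered a valid user ID. ^_^
    String regex = "[a-zA-Z0-9#.-]+";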
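    A minimal sketch of the check (class and method names are hypothetical), including the single-character form from the question for contrast:

```java
import java.util.regex.Pattern;

public class UserIdCheck {
    // One or more letters, digits, '#', '.' or '-'. Inside a class '.' is literal,
    // and a '-' placed last is literal too, so neither needs escaping.
    private static final Pattern USER_ID = Pattern.compile("[a-zA-Z0-9#.-]+");

    static boolean isValid(String id) {
        return USER_ID.matcher(id).matches();
    }

    public static void main(String[] args) {
        System.out.println("user-1".matches("[a-zA-Z\\d\\.\\-#]")); // false: class matches one char only
        System.out.println(isValid("user-1")); // true
        System.out.println(isValid(""));       // false: '+' rejects the empty string
        System.out.println(isValid("user 1")); // false: space not allowed
    }
}
```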

  • Regex matcher class

    Hi
    I have a simple problem in regex.
    Whenever I run this piece of code, I get an IllegalStateException:
    Matcher m = p.matcher(" absdsdfksj ");
    while (m.find()) {
         System.out.println("At loc : " + m.start());
         System.out.println("Found : " + m.group());
    }
    But if I combine these two print statements into one line, I don't get any exception and it runs fine:
    while (m.find()) {
         System.out.println("At loc : " + m.start() + " " + m.group());
    }
    Please clarify the difference.
    Thanks in advance
    Gaurav

    There must be more to the problem, because I can run this without problems:
            Pattern p = Pattern.compile("s");
            Matcher m = p.matcher(" absdsdfksj ");
            while (m.find()) {
                System.out.println("At loc : " + m.start());
                System.out.println("Found : " + m.group());
            }
            m = p.matcher(" absdsdfksj ");
            while (m.find()) {
                System.out.println("At loc : " + m.start() + " " + m.group());
            }
    What pattern are you using on what data? Please give a sample of both.
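    For what it's worth, Matcher.group() and start() throw IllegalStateException when no match has been attempted yet or when the last match attempt failed. A guess, not a diagnosis: if the braces were lost and the second println ended up outside the loop, it would run after find() returned false. A small sketch demonstrating those states; MatcherStateDemo and groupThrows() are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherStateDemo {
    // True if m.group() throws IllegalStateException in the matcher's current state.
    static boolean groupThrows(Matcher m) {
        try {
            m.group();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("s").matcher(" absdsdfksj ");
        System.out.println(groupThrows(m)); // true: no match attempted yet
        while (m.find()) {
            // Both calls are safe here because find() just returned true.
            System.out.println("At loc : " + m.start());
            System.out.println("Found : " + m.group());
        }
        System.out.println(groupThrows(m)); // true: the last find() returned false
    }
}
```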

  • Pattern regex matching advice needed

    Hi All,
    Many thanks for any/all advice :)
    Here's my problem. I'm trying to scan a text file for...
    \foo(parm1|parm2)
    ...in which I want the sub-string "parm1|parm2"
    So... [\\]foo matches the first section. No problem...
    It's when I try adding the '(' or ')' that I'm getting errors.
    java.util.regex.PatternSyntaxException: Unclosed character class near index
    [\]foo(.*)
    Basically, I'm trying to create a pattern, which can recognize \foo(parms), and extract the parms sections.
    Any ideas?

    Yes, you can do this. It is not allowed in basic Java, but there are always ways around the syntax rules. What you can do is use the AspectJ plugin for Eclipse, define a pointcut, and make the class extend from two classes. AspectJ works on the bytecode and weaves the code directly into it. It's pretty neat.
    A simpler approach would be to have two classes, A and B. Have A extend BASE and then have B extend A; therefore B "is-a" A and a BASE.
    Hope this helps.
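    For extracting the parms from \foo(parm1|parm2) as asked: in the regex shown in the error message, \] is an escaped bracket, so the character class never closes. A minimal sketch that escapes the backslash and the parentheses instead of using a class; FooParams and params() are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FooParams {
    // Regex \\foo\(([^)]*)\) : a literal backslash, "foo", a literal '(',
    // a captured run of non-')' characters, then a literal ')'.
    private static final Pattern FOO = Pattern.compile("\\\\foo\\(([^)]*)\\)");

    // Returns the text between the parentheses, or null if there is no \foo(...).
    static String params(String text) {
        Matcher m = FOO.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(params("x \\foo(parm1|parm2) y")); // parm1|parm2
    }
}
```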

  • Regex Matching on Capture groups

    I have this regular expression:
    (throw|give)(?: ([1-3][A-C]))+
    given this input:
    throw 1A 2C 1B 3C
    How would I capture each of the items 1A 2C 1B and 3C?
    In the above expression I have 3 capture groups
    group0: whole expression
    group1: (throw|give)
    group2: ((?:1|2|3)(?:A|B|C))
    The problem is that when I execute the find() on my matcher it tries matching the whole expression at once! That means for the group 2 I always get the last match only:
    group0: throw 1A 2C 1B 3C
    group1: throw
    group2: 3C
    How do I get the matcher to only match on ONE capture group at a time?! Is it possible? I thought that was the purpose of the find() method. The documentation says find() matches on a "subsequence", yet I can only get it to match on the whole expression. Plus, I don't see where "subsequence" is defined in the documentation. What am I missing here?

    "Attempts to find the next subsequence of the input sequence that matches the pattern." (my emphasis)
    find() matches the whole regex, not components thereof. What you need to do is use one regex to match the whole expression and return a capture group containing the digit-letter pairs, then use another regex on that capture group to extract the pairs one at a time.
    Answer provided by Friends of the Water Cooler. Please inform forum admin via the
    'Discuss the JDC Web Site' forum that off-topic threads should be supported.
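    A minimal sketch of that two-regex approach. It assumes the letters run A-C, as in the sample input (2C, 3C) and the group listing; ThrowParser and pairs() are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThrowParser {
    // Stage 1: the verb, then the whole run of " <digit><letter>" pairs as group 2.
    private static final Pattern COMMAND =
        Pattern.compile("(throw|give)((?: [1-3][A-C])+)");
    // Stage 2: applied to group 2, each find() yields one pair.
    private static final Pattern PAIR = Pattern.compile("[1-3][A-C]");

    static List<String> pairs(String input) {
        List<String> result = new ArrayList<>();
        Matcher cmd = COMMAND.matcher(input);
        if (cmd.matches()) {
            Matcher pair = PAIR.matcher(cmd.group(2));
            while (pair.find()) {
                result.add(pair.group());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pairs("throw 1A 2C 1B 3C")); // [1A, 2C, 1B, 3C]
    }
}
```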

  • Util.regex matcher.groupCount()

    Hello all. I am trying to parse some text using regex. What I am parsing may have one or more matches per line, and I need access to each match independently. The code shown below works well in finding all matches, except that m.groupCount() always returns 0, so I can't do anything with individual matches. How can I get groupCount() to function properly?
    Thanks in advance.
    if (currentLine.startsWith("LOCUSLINK")) {
        line++;
        String pattern = "[0-9]+";
        Pattern p = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(currentLine);
        while (m.find()) {
            int count = m.groupCount();
            for (int x = 0; x <= m.groupCount(); x++) {
                System.out.println(line + "=" + x + "=" + count);
            }
        }
    }

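    There are no capturing groups in your pattern, and you don't really need any in this case: groupCount() returns the number of capturing groups in the pattern, not the number of matches, so it is always 0 for [0-9]+.
    Try this simple way:
    String re = "\\d+";
    Matcher m = Pattern.compile(re).matcher(anyString);
    for (int j = 1; m.find(); j++) {
        System.out.println("matching " + j + ": " + m.group(0));
    }
    ..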
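    A small sketch of the difference (the class name and sample line are hypothetical): groupCount() is a property of the pattern, while the number of matches is however many times find() returns true.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Collects every run of digits in the line; each find() yields one match.
    static List<String> numbers(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = DIGITS.matcher(line);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String line = "LOCUSLINK 123 456 7890"; // hypothetical sample input
        // No parentheses in the pattern, so there are no capturing groups.
        System.out.println(DIGITS.matcher(line).groupCount()); // 0
        System.out.println(numbers(line));                     // [123, 456, 7890]
    }
}
```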

  • Regex - match a word except when it's preceded by another word

    Does anyone know how to write a regular expression that will match an occurrence of a word except when it's preceded by another word? I'm trying to match all occurrences of the word "function" except when it's part of the phrase "end function". Is that possible in a single regular expression?

    Maybe this is just how it works, but I'm not sure why a string
    with one space wouldn't match but a string with two would.
    At the beginning of the spaces, the lookbehind causes the match to fail, but then the Matcher bumps ahead one position and tries again. At that point, the lookbehind expression doesn't apply anymore, so you get a match. (You should be able to confirm this by counting the spaces in your output.) I tried using the "aggressive plus" (possessive quantifier) to force it to treat all the spaces as one atom, but that didn't work:
      Pattern p = Pattern.compile("(?<!end)(\\s++)function");
    I don't see how to do this using "pure" lookaround, but if you don't mind matching the preceding word, this will work:
      Pattern p = Pattern.compile("(^|(?!end\\b)\\b\\w+ +)function\\b",
                                  Pattern.MULTILINE);
    Getting pretty hairy, I know, but it matches the word "function" either as the first thing on the line or when preceded by a word that is not "end" (the first couple of \b's ensure that only the whole word "end" blocks the match). Here's how you would use this pattern to replace "function" with "method", except when it's preceded by "end":
    import java.util.regex.*;
    public class Test {
      public static void main(String[] args) {
        String target = "end function\n"
                      + "function test\n"
                      + "functioning test\n"
                      + "test function\n"
                      + "test function end\n"
                      + "end    function\n"
                      + "ending function\n"
                      + "rend   function\n"
                      + "end   functioning\n";
        Pattern p = Pattern.compile("(^|(?!end\\b)\\b\\w+ +)function\\b",
                                    Pattern.MULTILINE);
        Matcher m = p.matcher(target);
        target = m.replaceAll("$1method");
        System.out.println(target);
      }
    }
    Here's the output I get:
    end function
    method test
    functioning test
    test method
    test method end
    end    function
    ending method
    rend   method
    end   functioning
    Of course, if you do know that there will always be exactly one space between "end" and "function", none of this is necessary; you can just use dcostakos's original lookbehind regex, except that I would add word boundaries:
    Pattern p = Pattern.compile("(?<!end\\s)\\bfunction\\b");
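    If the gap between "end" and "function" varies but has a known upper limit, a lookbehind can still do the job, because Java accepts a lookbehind as long as it has an obvious maximum length. A sketch assuming at most 10 whitespace characters in between, using a bounded {1,10} instead of +; the class name is hypothetical:

```java
import java.util.regex.Pattern;

public class BoundedLookbehind {
    // Fail if the whole word "end" plus 1 to 10 whitespace characters sits right
    // before "function". The bounded {1,10} gives the lookbehind a maximum length.
    private static final Pattern FUNCTION =
        Pattern.compile("(?<!\\bend\\s{1,10})\\bfunction\\b");

    static String rename(String text) {
        return FUNCTION.matcher(text).replaceAll("method");
    }

    public static void main(String[] args) {
        String target = "end function\n"
                      + "function test\n"
                      + "functioning test\n"
                      + "test function\n"
                      + "test function end\n"
                      + "end    function\n"
                      + "ending function\n"
                      + "rend   function\n"
                      + "end   functioning\n";
        System.out.print(rename(target));
    }
}
```

    Note that \s also matches a newline, so "end" at the end of one line followed by "function" at the start of the next would also be blocked; use [ \t] instead of \s if that's not wanted.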
