Strange result about regular expressions

Hello everybody,
I wrote this code to try out regular expressions in Java, but I get some strange results. I have read references such as the Sun Java Tutorials; however, I can't find the problem.
Environment:
WindowsXP Home + NetBeans IDE 5.0 + JDK 1.5
Input String:
"I write these codes to try regular expressions in Java, but it doesn't work. I read some reference like Sun Java Tutorials. Then, always cann't find the problem. Could you help me? Thanks."
My code:
public static void main(String[] args) throws Exception, IOException {
    P.rintln("Let's go!");
    Date start = new Date();
    if (args.length != 1) {
        P.rintln("Input Error! Input format: java javaclass [directory path]");
        System.exit(0);
    }
    StringBuffer sb = new StringBuffer();
    String input = TextFile.read(args[0]);
    sb = addSectionEelement(input, "re");
    P.rintln(sb.toString());
    P.rintln("Ok, it's over");
    Date end = new Date();
    System.out.println("It spends " + (end.getTime() - start.getTime()) + " ms.");
}

public static StringBuffer addSectionEelement(String input, String regex) {
    Matcher m = Pattern.compile(regex).matcher(input);
    StringBuffer sb = new StringBuffer();
    int count = 0;
    // Print every match and count how many were found
    while (m.find()) {
        count++;
        P.rintln(m.group());
    }
    P.rintln("Found " + count + " fois.");
    return sb;
}
Output:
run:
Let's go!
Found 0 fois.
Ok, it's over
It spends 16 ms.
BUILD SUCCESSFUL (total time: 0 seconds)
However, if I change the call in main to
sb = addSectionEelement(input, "r");
the results become:
run:
Let's go!
r
r
r
r
r
r
r
r
r
r
r
Found 11 fois.
Ok, it's over
It spends 15 ms.
BUILD SUCCESSFUL (total time: 0 seconds)
I have no idea about it. And you?
Thanks

Hi guys,
I re-examined the code. In fact, the problem was the encoding of the input file.
See you
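
The poster does not say which encoding the file used, so the following is only a sketch of one way an encoding mismatch can produce exactly this symptom. It assumes, purely for illustration, that the file was saved as UTF-16 and read as a single-byte charset. The class name EncodingDemo and the sample text are hypothetical, and StandardCharsets requires Java 7 or later (newer than the JDK 1.5 mentioned above).

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EncodingDemo {
    // Counts how many times regex is found in input.
    static int countMatches(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical file content, stored on disk as UTF-16LE bytes.
        byte[] fileBytes = "I read some reference".getBytes(StandardCharsets.UTF_16LE);

        // Decoding with the wrong charset leaves a NUL character after every letter,
        // so 'r' is never directly followed by 'e'.
        String wrong = new String(fileBytes, StandardCharsets.ISO_8859_1);
        // Decoding with the charset the bytes were written in restores the text.
        String right = new String(fileBytes, StandardCharsets.UTF_16LE);

        System.out.println("wrong charset, \"r\":  " + countMatches(wrong, "r"));
        System.out.println("wrong charset, \"re\": " + countMatches(wrong, "re"));
        System.out.println("right charset, \"re\": " + countMatches(right, "re"));
    }
}
```

With a wrongly decoded string, a single-character pattern still matches while any two-character pattern fails, which is consistent with the "r" versus "re" behaviour reported above.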

Similar Messages

  • Off Topic: Books about Regular Expression

    Hi
    Can somebody recommend books about regular expressions in Oracle?
    Thanks

    See the regex tag on Volder's blog:
    http://volder-notes.blogspot.com/search/label/Regular%20Expressions
    This entry mentions my regex solution :-)
    http://volder-notes.blogspot.com/2007/10/removing-duplicate-elements-from-string.html
    By the way, my regex homepage covers problems for Perl-like regular expressions (the regex flavor of EmEditor):
    http://www.geocities.jp/oraclesqlpuzzle/regex/
    Example questions (written in Japanese):
    http://www.geocities.jp/oraclesqlpuzzle/regex/regex-2-1.html
    http://www.geocities.jp/oraclesqlpuzzle/regex/regex-3-5.html
    http://www.geocities.jp/oraclesqlpuzzle/regex/regex-4-4.html

  • Help About Regular Expression.

    Hello,
    I am trying to parse string buffer by using Regular Expression.
    Suppose my string buffer is:
    Hi , How are you?
    Hello: abc
    hurrey : [ this is test msg
    Pls reply to this mail
    Hello: xyz
    Test1
    I want to find the string "Hello: <any string up to the end of the line>" only where it is
    not enclosed in [].
    So in the example above, my regular expression should find only
    the first "Hello: abc".
    Is this possible with a regular expression?

    Can we have Regular Expression which will get both "Hello: string"
    suppose my string buffer is:
    Hi , How are you?
    Hello: abc
    hurrey : [ this is test msg
    Pls reply to this mail
    Hello: xyz
    Test1
    happy: [ test2
    my test
    Hello: abc
    then result should be :
    Hello: abc
    [ this is test msg
    Pls reply to this mail
    Hello: xyz
    Test1
    [ test2
    my test
    Hello: abc
    ]

  • Question about Regular Expressions, please help!

    I have created an app that reads files and extracts certain data using regular expressions in JDK 1.4, using the Pattern and Matcher classes.
    However, it needs to run on JDK 1.2.2 (don't ask). The regular expression classes (Pattern and Matcher) are not available in 1.2.2, so I am looking for something similar that I can use.
    I need something that loops through all the matches found in the file, the way Matcher works, i.e.
    while (matcher.find())
    // do this
    Help!

    http://jakarta.apache.org/regexp/

  • About regular expressions

    This question was posted in response to the following article: http://help.adobe.com/en_US/ColdFusion/10.0/Developing/WSc3ff6d0ea77859461172e0811cbec0a38f-7fff.html

    "ColdFusion supplies four functions that work with regular expressions" should be "ColdFusion supplies six functions that work with regular expressions,"

  • Basic question about regular expressions

    Hello,
    I am a beginner with regular expressions. I want to rewrite the following expression:
    public static final String REGULAR_EXP_SOFTWARE_PART_NUMBER = "([0-9]{7}[a-z]{1})(\\-{1})([a-z]{1})";
    I want THIS part
    (\\-{1})
    to match EITHER a hyphen OR a space (instead of just the hyphen).
    How do I rewrite this?
    Thanks in advance,
    Julien.

    Hello and thanks for your feedback,
    I have created a small class as follows:
    package regExpr;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    /**
     * @author Martin
     */
    public class RegExprTest {
         private static String stringToBeParsed = "3800157w-e26";
         public static void main(String[] args) {
              Pattern pattern = Pattern.compile("" +
                        "([0-9]{7})" +
                        "([a-z]{1})" +
                        "(( |-){1})" +
                        "([a-z]{1})" +
                        "([0-9]{2})");
              Matcher matcher = pattern.matcher(stringToBeParsed);
              while(matcher.find()){
                   System.out.println(matcher.group(1));
                   System.out.println(matcher.group(2));
                   System.out.println(matcher.group(3));
                   System.out.println(matcher.group(4));
                   System.out.println(matcher.group(5));
                   System.out.println(matcher.group(6));
              }
         }
    }
    The class is trying to break down the string "3800157w-e26" as follows:
    3800157(seven digits)
    w(one letter)
    -(hyphen)
    e(one letter)
    26(two digits)
    Oddly enough, the output of the class is as follows:
    3800157
    w
    -
    -
    e
    26
    I have to call the group method six times, and I get two hyphens!
    Can anyone help?
    Thanks in advance,
    Julien
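
    No answer is recorded for this thread, so here is a hedged sketch of one likely explanation. In "(( |-){1})" there are two pairs of parentheses, and each pair is its own capturing group, so the separator is captured twice (groups 3 and 4). A character class such as [ -] matches a space or a hyphen without adding a nested group. The class name PartNumberDemo and the split helper are hypothetical, and the sketch uses matches() rather than the find() loop of the original.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PartNumberDemo {
    // [ -] is a character class: one space or one hyphen, captured exactly once.
    static final Pattern PART =
            Pattern.compile("([0-9]{7})([a-z])([ -])([a-z])([0-9]{2})");

    // Returns the five captured parts, or null if the input does not match.
    static String[] split(String partNumber) {
        Matcher m = PART.matcher(partNumber);
        if (!m.matches()) {
            return null;
        }
        String[] parts = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            parts[i - 1] = m.group(i);
        }
        return parts;
    }

    public static void main(String[] args) {
        for (String part : split("3800157w-e26")) {
            System.out.println(part);
        }
    }
}
```

    If the alternation form is preferred, a non-capturing group such as (?: |-) wrapped in a single capturing pair would have the same effect.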

  • Simple question about regular expressions

    Hi,
    Using Java's regular expression syntax, what is the correct pattern string to detect strings like the following :-
    AnnnnnA
    where A = a single (fixed) alphabetic character and
    n = at least one but possibly many digits [0-9].
    Example strings to be searched :-
    A45A (this should match)
    A3A (this should match)
    A3446655577A (this should match)
    A hello world A (this should NOT match as no digits are present between the A's).
    Thanks.

    At least one digit: "A.*\\d.*A"
    Only digits: "A\\d+A"
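
    As a quick check, the two patterns from the reply can be tried against the example strings from the question. This is only a sketch; the class and method names are made up for illustration, and String.matches() is used because the whole string has to fit the pattern.

```java
class DigitsBetweenDemo {
    // True if the string is 'A', then one or more digits, then 'A'.
    static boolean onlyDigits(String s) {
        return s.matches("A\\d+A");
    }

    // True if the string starts and ends with 'A' and has at least one digit in between.
    static boolean atLeastOneDigit(String s) {
        return s.matches("A.*\\d.*A");
    }

    public static void main(String[] args) {
        String[] samples = {"A45A", "A3A", "A3446655577A", "A hello world A"};
        for (String s : samples) {
            System.out.println(s + " -> " + onlyDigits(s));
        }
    }
}
```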

  • One question about Regular Expression!!!

    I need to create a regular expression to match the format "[ ][ ][ ]".
    For example, given this text,
    (1), " The project manager defines [1][0.400][+goals] for iterations."
    Suppose that there are some spaces or "\n" characters in this way,
    (2), " The project manager defines [    1 ] [  0.400   ]
    [   +goals] for iterations."
    If the pattern matches the format successfully, string (2) should be replaced by string (1); in other words, the format of (1) is what I need in the end.
    I tried creating a regular expression like \\[([^\n\s]]+)\\]\\[([^\n\s]]+)\\]\\[([^\n\s]]+)\\] , but it does not work well!
    Do you know how to implement it in Java?
    Thanks for any reply!

    What I really need is that, via the regular
    expression, all the spaces and \n characters in
    square brackets [ and ], ] and [, will be thrown
    away.
    For example,
    Original:
    1) "The project manager defines [   1  ] [
    0.400 ]
    [   +goals] for iterations with the support"
    After matching:
    2) "The project manager defines [1][0.400][+goals]
    for iterations with the support"
    String 2) is what I need finally!
    Thanks for any reply!

    Well, I gave you the answer to that one already :-)
    If you need to preserve the spaces in between words, use this one. I'm sure there's a better way to do it; I'm no regex master.
        public static void main(String[] args)
        {
            String s = "[ 1 ] [ 0.400 ]\n[ +go als]";
            System.out.println( "Before: " + s );
            System.out.println( "\n\n" );
            s = s.replaceAll( "\\[\\s+", "[" );
            s = s.replaceAll( "\\s+\\]", "]" );
            s = s.replaceAll( "\\]\\s+\\[", "][" );
            System.out.println( "After: " + s );
        }

  • Beginner question about Regular expression

    Hi all !
    I'd like to use a regular expression to parse a string like this:
    *<ID>4</ID><GROUP>5</GROUP>....*
    So for example to retrieve the ID I have built the following regular expression:
    Pattern p = Pattern.compile("<ID>(.*?)</ID>");  
    Matcher m = p.matcher(handle);    
    if (m.find()) {
          System.out.println("->"+m.group());     
    } else {
    System.out.println("No match!");   
    }
    The function m.group() returns "<ID>4</ID>", but I want just the value (4) between the tags. Is there
    a way to get it?
    thanks a lot
    mark

    fmarchioniscreen wrote:
    thank you very much, that's exactly what I needed.
    "But it looks like you're parsing some XML-like data: probably better to use a proper parser on it."
    Well, it's a very short string containing XML tags. It's used in a marginal area of the application, so I prefer just using a regular expression to fetch the values.
    thanks again
    Mark

    You could use XPath to get the value.
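
    The reply that solved the question is not preserved in this thread. A plausible reconstruction, offered only as a sketch, is that the pattern already contains a capturing group, so Matcher.group(1) returns just the text between the tags, whereas group() (the same as group(0)) returns the whole match. The class name TagValueDemo and the idOf helper are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagValueDemo {
    static final Pattern ID = Pattern.compile("<ID>(.*?)</ID>");

    // Returns the text between <ID> and </ID>, or null if the tag is absent.
    static String idOf(String handle) {
        Matcher m = ID.matcher(handle);
        // group(1) is the first parenthesised part of the pattern.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println("->" + idOf("<ID>4</ID><GROUP>5</GROUP>"));
    }
}
```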

  • Simple question about regular expression

    Hi
    I have a little problem with
    select regexp_substr('123 Mapla Avenue','[a-z]') my_test from dual;
    answer: M
    I executed this query in SQL*Plus and SQL Developer; the result is the same.
    select regexp_substr('123 Mapla Avenue','[M]') my_test from dual;
    answer: M
    select regexp_substr('123 Mapla Avenue','[a]') my_test from dual;
    answer: a
    I used oracle 10g
    Thanks for your help

    hm wrote:
    In the oracle documentation of regexp_substr you can find:

    Do not confuse pattern and sort. Pattern [a-z] means any lowercase letter. The REGEXP_SUBSTR parameter match_param value i tells REGEXP to treat uppercase letters the same as lowercase letters and vice versa. And setting NLS_SORT can do the same. As you can see, it is not that straightforward. To make it transparent, use the exact pattern you need. In this particular case use:
    select regexp_substr('123 Mapla Avenue','[[:alpha:]]') my_test from dual;
    where class [:alpha:] is the POSIX predefined class of all letters (regardless of case). This way you are not dependent on client-side settings like NLS_SORT, and the above will always return the first letter within a string. If you want the first uppercase letter use:
    select regexp_substr('123 Mapla Avenue','[[:upper:]]') my_test from dual;
    Or, for the first lowercase letter:
    SQL> alter session set nls_sort=binary;
    Session altered.
    SQL> select regexp_substr('123 Mapla Avenue','[a-z]') my_test from dual;
    M
    a
    SQL> select regexp_substr('123 Mapla Avenue','[[:lower:]]') my_test from dual;
    M
    a
    SQL> alter session set nls_sort=binary_ci;
    Session altered.
    SQL> select regexp_substr('123 Mapla Avenue','[a-z]') my_test from dual;
    M
    M
    SQL> select regexp_substr('123 Mapla Avenue','[[:lower:]]') my_test from dual;
    M
    a
    SQL>
    SY.

  • An additional question about regular expressions with String.matches

    Does the String.matches() method match when some substring of the String matches, or does it have to match the entire String? So, if I have the String "123ABC" and I ask to match "1 or more letters", will it fail because there are non-letters in the String, but then pass if I ask for "1 or more letters AND 1 or more digits"? In the latter case every character in the String is accounted for in the search, as opposed to the first. Is that correct, or are there ways to match JUST some substring of the String instead of the whole thing? I WILL make some examples too... but does that make sense?

    It has to match the whole String. Use Matcher.find() to match on just a substring.
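
    The difference can be seen with the "123ABC" string from the question. This is a small sketch; the class and helper names are hypothetical, and [A-Za-z]+ is just one way to write "1 or more letters".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesVsFindDemo {
    // Whole-string match: every character must be consumed by the pattern.
    static boolean wholeMatch(String input, String regex) {
        return input.matches(regex);
    }

    // Substring match: returns the first matching substring, or null if none.
    static String firstFind(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "123ABC";
        System.out.println(wholeMatch(input, "[A-Za-z]+"));      // digits are left over
        System.out.println(wholeMatch(input, "\\d+[A-Za-z]+"));  // digits, then letters
        System.out.println(firstFind(input, "[A-Za-z]+"));       // only the letter run
    }
}
```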

  • Question about Regular Expressions

    Hi everyone!
    Could anyone help me create a regex for the string: <object>
    Thanks!
    Kind Regards, Dmitry.

    "<object>"

  • Regular expressions for URLs

    Hi everyone,
    I have a question about regular expressions.
    Let's say I want my program to extract the last 10 digits from any URL it finds (every URL found will end with a 10-digit number!) and insert that number into the middle of another URL.
    Would anyone please tell me how to do that?
    Thank you

    I am not sure how to do that either...
    Actually, I just figured out that there is no guarantee that the URL will end with a 10-digit number.
    Ok, my program is meant to search for the movie info on Yahoo (user enters keyword to search and chooses either 'title', 'actor', 'trailer', 'review' in drop-down menu). After the 'search' button is clicked the appropriate page is supposed to be found.
    For example, if the user types in 'shrek' and chooses 'trailer', the result is supposed to be this link http://movies.yahoo.com/movie/1808405861/trailer and not the
    following ones:
    http://movies.yahoo.com/mv/search?p=shrek
    or
    http://movies.yahoo.com/shop?d=hv&cf=info&id=1808405861
    So in my program the line for the 'title' search is
    url = "http://movies.yahoo.com/mv/search?type=all&p=";
    and it works for the titles. I thought that if the found link has a 10-digit number at the end, I could somehow 'catch' that number and insert it into another link, so the page with trailers would be pulled up (it's an ID number in the Yahoo database).
    But now, since I am not sure whether a 10-digit number is going to be in the found link at all, I have no idea how to 'catch' that number.
    Does anyone have any ideas for my case?
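
    The thread has no answer, so the following is just one possible approach, sketched under assumptions: look for a run of exactly ten digits anywhere in the link, and treat "no such run" as a normal outcome by returning null. The class name MovieIdDemo and both helpers are hypothetical; the trailer URL layout is copied from the example link in the post and is not verified against the site.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MovieIdDemo {
    // Exactly ten digits, not preceded or followed by another digit.
    static final Pattern TEN_DIGITS = Pattern.compile("(?<!\\d)\\d{10}(?!\\d)");

    // Returns the first 10-digit number in the URL, or null if there is none.
    static String extractId(String url) {
        Matcher m = TEN_DIGITS.matcher(url);
        return m.find() ? m.group() : null;
    }

    // Builds a trailer link in the form shown in the post (assumed layout).
    static String trailerUrl(String id) {
        return "http://movies.yahoo.com/movie/" + id + "/trailer";
    }

    public static void main(String[] args) {
        String found = "http://movies.yahoo.com/shop?d=hv&cf=info&id=1808405861";
        String id = extractId(found);
        System.out.println(id == null ? "no ID in link" : trailerUrl(id));
    }
}
```

    Returning null when nothing matches lets the caller fall back to the plain search page for links that carry no ID.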

  • Regular expressions in Format Definition add-on

    Hello experts,
    I have a question about regular expressions. I am a newbie in regular expressions and I could use some help on this one. I tried for some 6 hours, but I can't solve it myself.
    Summary of my problem:
    In SAP Business One (patch level 42) it is possible to use bank statement processing. A file (full of regular expressions) is to be selected, so it can match certain criteria to the bank statement file. The bank statement file consists of a certain pattern (look at the attached code snippet).
    :61:071222D208,00N026
    :86:P  12345678BELASTINGDIENST       F8R03782497                $GH
    $0000009                         BETALINGSKENM. 123456789123456
    0 1234567891234560                                            
    :61:071225C758,70N078
    :86:0116664495 REGULA B.V. HELPMESTRAAT 243 B 5371 AM HARDCITY HARD
    CITY 48772-54314                                                  
    :61:071225C425,05N078
    :86:0329883585 J. MANSSHOT PATTRIOTISLAND 38 1996 PT HELMEN BIJBETA
    LING VOOR RELOOP RMP1 SET ORDERNR* 69866 / SPOEDIG LEVEREN    
    :61:071225C850,00N078
    :86:0105327212 POSE TELEFOONSTRAAT 43 6448 SL S-ROTTERDAM MIJN OR
    DERNR. 53846 REF. MAIL 21-02
    - I am in search of the right type of regular expression that is used by the Format Definition add-on (javascript, .NET, perl, JAVA, python, etc.)
    Besides that I need the regular expressions below, so the Format Definition will match the right lines from my bankfile.
    - a regular expression that selects lines starting with :61: and line :86: including next lines (if available), so in fact it has to select everything from :86: till :61: again.
    - a regular expression that selects the bank account number (position 5-14) from lines starting with :86:
    - a regular expression that selects all other info from lines starting with :86: (and following if any), so all positions that follow after the bank account number
    I am looking forward to the right solutions, I can give more info if you need any.

    Hello Hendri,
    Q1: I am in search of the right type of regular expression that is used by the Format Definition add-on (javascript, .NET, perl, JAVA, python, etc.)
    Answer: Format Definition uses .NET regular expressions.
    You may refer to the following examples. If necessary, I can send you a guide about how to use regular expressions in Format Definition. Thanks.
    Example 6
    Description:
    To match a field with an optional field in front. For example, “:61:0711211121C216,08N051NONREF” or “:61:071121C216,08N051NONREF”, which consists of a record identification “:61:”, a date in the form of YYMMDD, another optional date MMDD, one or two characters to signify the direction of money flow, a numeric amount value and some other information. The target to be matched is the numeric amount value.
    Regular expression:
    (?<=:61:\d(\d)?[a-zA-Z]{1,2})((\d(,\d*)?)|(,\d))
    Text:
    :61:0711211121C216,08N051NONREF
    Matches:
    1
    Tips:
    1.     All the fields in front of the target field are described in the look behind assertion embraced by (?<= and ). Especially, the optional field is embraced by parentheses and then a “?” (question mark). The sub expression for amount is copied from example 1. You can compose your own regular expression for such cases in the form of (?<=REGEX_FOR_FIELDS_IN_FRONT)(REGEX_FOR_TARGET_FIELD), in which REGEX_FOR_FIELDS_IN_FRONT and REGEX_FOR_TARGET_FIELD are respectively the regular expression for the fields in front and the target field. Keep the parentheses therein.
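
    For readers who want to try the idea of Example 6 outside the add-on, here is a rough Java sketch. It is not the Format Definition expression: it uses a capturing group instead of a look behind assertion, and the digit counts (six for YYMMDD, four for the optional MMDD) are assumptions taken from the description above rather than from the regular expression as printed. The class name AmountDemo and the amountOf helper are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AmountDemo {
    // :61:, date YYMMDD, optional MMDD, one or two letters, then the amount (captured).
    static final Pattern LINE_61 =
            Pattern.compile("^:61:\\d{6}(?:\\d{4})?[a-zA-Z]{1,2}(\\d+(?:,\\d*)?)");

    // Returns the amount text of a :61: line, or null if the line does not fit.
    static String amountOf(String line) {
        Matcher m = LINE_61.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(amountOf(":61:0711211121C216,08N051NONREF"));
        System.out.println(amountOf(":61:071121C216,08N051NONREF"));
    }
}
```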
    Example 7
    Description:
    Find all numbers in the free text description, which are possibly document identifications, e.g. for invoices
    Regular expression:
    (?<=\b)(?<!\.)\d+(?=\b)(?!\.)
    Text:
    :86:GIRO  6890316
    ENERGETICA NATURA BENELU
    AFRIKAWEG 14
    HULST
    3187-A1176
    TRANSACTIEDATUM* 03-07-2007
    Matches:
    6
    Tips:
    1.     The regular expression given finds all digits between word boundaries except those with a prior dot or following dot; “.” (dot) is escaped as \.
    2.     It may find some inaccurate matches, like the date in the text. If you want to exclude “-” (hyphen) as the prior or following character, as with the “.” (dot) case, the regular expression becomes (?<=\b)(?<!\.)(?<!-)\d+(?=\b)(?!\.)(?!-). The matches will be:
    :86:GIRO  6890316
    ENERGETICA NATURA BENELU
    AFRIKAWEG 14
    HULST
    3187-A1176
    TRANSACTIEDATUM* 03-07-2007
    You may lose some real values like “3187” before the “-”.
    Example 8
    Description:
    Find BP account number in 9 digits with a prior “P” or “0” in the first position of free text description
    Regular expression:
    (?<=^(P|0))\d
    Text:
    0000006681 FORTIS ASR BETALINGSCENTRUM BV
    Matches:
    1
    Tips:
    1.     Use positive look behind assertion (?<=PRIOR_KEYWORD) to express the prior keyword.
    2.     “^” means that the match starts at the beginning of the text. If the text includes the record identification, you may include it also in the look behind assertion. For example,
    :86:0000006681 FORTIS ASR BETALINGSCENTRUM BV
    The regular expression becomes
    (?<=:86:(P|0))\d
    Example 9
    Description:
    Following example 8, to find the possible BP name after BP account number, which is composed of letter, dot or space.
    Regular expression:
    (?<=^(P|0)\d)[a-zA-Z. ]*
    Text:
    0000006681 FORTIS ASR BETALINGSCENTRUM BV
    Matches:
    1
    Tips:
    1.     In this case, put BP account number regular expression into the look behind assertion.
    Example 10
    Description:
    Find the possible document identifications in a sub-record of :86: record. Sub-record is like “?00”, “?10” etc.  A possible document identification sub-record is made up of the following parts:
    •     keyword “RE”, “RG”, “R”, “INV”, “NR”, “NO”, “RECHN” or “RECHNUNG”, and
    •     an optional group made up of following:
         a separator of either a dot, hyphen or slash, and
         an optional space, and
         an optional string starting with keyword “NR” or “NO” followed by a separator of either a dot, hyphen or slash, and
         an optional space
    •     and finally document identification in digits
    Regular expression:
    (?<=\?\d(RE|RG|R|INV|NR|NO|RECHN|RECHNUNG)((\.|-|/)\s?((NR|NO)(\.|-|/))?\s?)?)\d+
    Kind Regards
    -Yatsea

  • Help with java regular expressions

    Hi all ,
    I am going to match a pattern string against an input string and print the result. Here is my code:
         import java.util.regex.*;
         import java.util.*;
         public class Main {
              private static final String CASE_INSENSITIVE = null;
              public static void main(String[] args) {
                   CharSequence inputStr = "i have 5 years FMCG saLEs exp on java/j2ee and i worked on java and j2ee and 2 projects on telecom java j2ee domain with your  with saLEs maNAger experience of java j2ee and c# having very good  on c++ exposure in JAVA";
                   String patternStr = "\"java j2ee\" and \"c#\"";
                   // Split the pattern string on the delimiter characters ", comma, O and R
                   StringTokenizer st = new StringTokenizer(patternStr,"\",OR");
                   Matcher matcher=null;
                   while(st.hasMoreTokens()){
                        String s=st.nextToken();
                        // Each token is compiled as a regular expression
                        Pattern pattern = Pattern.compile(s,Pattern.CASE_INSENSITIVE);
                        matcher = pattern.matcher(inputStr);
                        while (matcher.find()) {
                             String result = matcher.group();
                             if(!result.equalsIgnoreCase(" "))
                                  System.out.println("result:"+result);
                        }
                   }
              }
         }
    When I run this code I get the expected result, i.e.
    result:java j2ee
    result:java j2ee
    result: and
    result: and
    result: and
    result: and
    result: and
    result: and
    result:c#
    But when I replace String patternStr = "\"java j2ee\" and \"c#\""; with
    String patternStr = "\"java j2ee\" and \"c++\""; I just get c in the result instead of c++, i.e. I get this result:
    result:java j2ee
    result:java j2ee
    result: and
    result: and
    result: and
    result: and
    result: and
    result: and
    result:C
    result:c
    result:c
    result:c
    result:c
    result:c
    result:c
    In the last lines I should get result:c++ instead of result:c
    Any ideas, please?
    Thanks

    "In the last lines i should get result:c++ instead of result: c"
    The regular expression parser considers the plus sign '+' a special
    character; it means: one or more times the previous regular expression.
    So 'c++' means one or more 'c's one or more times. Obviously you don't
    want that, you want a literal '+' plus sign. You can do that by prepending
    the '+' with a backslash '\'. Unfortunately, the javac compiler considers
    a backslash a special character and therefore you have to 'escape'
    the backslash also, by adding another backslash. The result looks
    like this:
    "c\\+\\+"
    kind regards,
    Jos
    Jos
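
    When the search terms come from user input, as in the question, escaping each special character by hand is easy to get wrong. An alternative not mentioned in the thread is Pattern.quote(), which turns any string into a literal pattern. The sketch below uses hypothetical names (QuoteDemo, findLiteral). As a side note, java.util.regex reads an unescaped "c++" as a possessive "one or more c", which is why only runs of the letter c were reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteDemo {
    // Finds every case-insensitive occurrence of the literal text in the input.
    static List<String> findLiteral(String literal, CharSequence input) {
        // Pattern.quote makes '+', '#', '.' and other metacharacters match themselves.
        Pattern p = Pattern.compile(Pattern.quote(literal), Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        List<String> results = new ArrayList<String>();
        while (m.find()) {
            results.add(m.group());
        }
        return results;
    }

    public static void main(String[] args) {
        String input = "java j2ee and c# having very good on c++ exposure in JAVA";
        for (String result : findLiteral("c++", input)) {
            System.out.println("result:" + result);
        }
    }
}
```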
