Regex: UNGREEDY flag or (?U)

Hi,
I'd like to port a generic text processing tool, Texy!, from PHP to Java.
This tool does ungreedy matching everywhere, using `preg_match_all("/.../U")`. So I am looking for a library that has an `UNGREEDY` flag.
I know I could use the `.*?` syntax, but there are really many regular expressions I would have to overwrite, and check them with every updated version.
I've checked
    * ORO - seems to be abandoned
    * Jakarta Regexp - no support
    * java.util.regex - no support
Is there any such library?
Thanks, Ondra
Edited by: OndraZizka on 12.10.2009 2:48

dcminter wrote:
I know I could use the `.*?` syntax, but there are really many regular expressions I would have to overwrite, and check them with every updated version.
I'm not being funny, but couldn't you write a regex to rewrite your regexes?

I thought about this when I first read the post, but I can't see it being easy, as 'star' and '+' are not always metacharacters and one also has to look for the case where the reluctant quantifier is already applied. 'uncle_alice' might be able to do it, but us mere mortals could find it dangerous to get too close to the sun.
!!! How the ???? does one get a flippin single 'star' char in this silly markup?
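
For what it's worth, here is a rough sketch of how such a rewrite could look if done with a small hand-written scanner rather than a single regex. The class and method names (`UngreedyRewriter`, `invertGreediness`) are made up for illustration. It assumes that PCRE's /U flag simply swaps greedy and reluctant quantifiers, and it only knows about backslash escapes, simple character classes and `(?` group prefixes. It does not handle `\Q...\E` quoting, a literal `]` at the start of a class, or comments mode, so treat it as a starting point, not a finished converter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UngreedyRewriter {
    // A counted quantifier such as {3}, {2,} or {2,5}.
    private static final Pattern BRACES = Pattern.compile("\\{\\d+(,\\d*)?\\}");

    /** Swaps greedy and reluctant quantifiers; possessive ones are left alone. */
    static String invertGreediness(String regex) {
        StringBuilder out = new StringBuilder();
        boolean inClass = false;        // inside [...]
        boolean afterOpenParen = false; // previous char was an unescaped '('
        int n = regex.length();
        int i = 0;
        while (i < n) {
            char c = regex.charAt(i);
            boolean wasAfterOpenParen = afterOpenParen;
            afterOpenParen = false;
            if (c == '\\' && i + 1 < n) {
                // Escaped character: copy both chars, never a quantifier.
                out.append(c).append(regex.charAt(i + 1));
                i += 2;
            } else if (inClass) {
                // Inside a character class everything is literal until ']'.
                inClass = (c != ']');
                out.append(c);
                i++;
            } else if (c == '[') {
                inClass = true;
                out.append(c);
                i++;
            } else {
                int end = quantifierEnd(regex, i, wasAfterOpenParen);
                if (end < 0) {
                    afterOpenParen = (c == '(');
                    out.append(c);
                    i++;
                } else {
                    out.append(regex, i, end);
                    if (end < n && regex.charAt(end) == '?') {
                        i = end + 1;      // reluctant becomes greedy: drop the '?'
                    } else if (end < n && regex.charAt(end) == '+') {
                        out.append('+');  // possessive: copy unchanged
                        i = end + 1;
                    } else {
                        out.append('?');  // greedy becomes reluctant
                        i = end;
                    }
                }
            }
        }
        return out.toString();
    }

    /** Returns the index just past the quantifier starting at i, or -1 if there is none. */
    private static int quantifierEnd(String regex, int i, boolean afterOpenParen) {
        char c = regex.charAt(i);
        if (c == '*' || c == '+') return i + 1;
        if (c == '?') return afterOpenParen ? -1 : i + 1; // "(?" starts a group construct
        if (c == '{') {
            Matcher m = BRACES.matcher(regex).region(i, regex.length());
            return m.lookingAt() ? m.end() : -1;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(invertGreediness("<b>(.*)</b>")); // <b>(.*?)</b>
        System.out.println(invertGreediness("a+?b"));        // a+b
    }
}
```

Running the converted patterns against the PHP originals on real input would still be needed before trusting it.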

Similar Messages

  • Regex not working

    Can anyone tell me why my regex is not working on my cfinput textbox, please? I only want to allow a-z as available characters to enter, and the code below fails.
    <cfinput type="text"
            id="surname"
            name="surname"
            class="txt"
            title="Surname"
            value="#presentsurname#"
            validate="regex"
            pattern="[a-z]"
            message="Please enter a valid Surname."
            maxlength="60" />
    If i type in $$$ then the regex is flagged up and I get the error message displayed.
    Although if I type in a$$, I get no error message.
    Thanks,
    G

    Thanks for the quick replies guys, you're really helpful, and both regex examples you have given work.
    To cut a long story short, I was given the following regex to use against my surname textbox, and I need to follow 'their' standards, so I must use this pattern.
    ([A-Z'\-]*)|([A-Z'\-][A-Z '\-]*[A-Z'\-])
    And this should only allow A-Z (in capitals) and any of the other special characters that are mentioned in the regex....  Although this pattern is allowing me to enter A$$ and I do not understand why it is allowing the $ sign to be an enterable character when the dollar sign is not listed in the regex.
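
    I can't say for certain how cfinput applies the pattern, but the behaviour described is what you get when a regex is searched for anywhere in the input instead of being matched against the whole input. A small Java sketch of the difference (Java is used only for illustration; `AnchorDemo` and `isValidSurname` are made-up names):

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // Hypothetical helper: true only if the whole input consists of lowercase letters.
    static boolean isValidSurname(String input) {
        return Pattern.compile("^[a-z]+$").matcher(input).find();
    }

    public static void main(String[] args) {
        Pattern unanchored = Pattern.compile("[a-z]");
        // find() succeeds as soon as one letter appears anywhere in the input.
        System.out.println(unanchored.matcher("$$$").find()); // false
        System.out.println(unanchored.matcher("a$$").find()); // true

        // The supplied pattern has the same problem: its first alternative can
        // match a single letter (or even nothing), so an unanchored search passes.
        Pattern supplied = Pattern.compile("([A-Z'\\-]*)|([A-Z'\\-][A-Z '\\-]*[A-Z'\\-])");
        System.out.println(supplied.matcher("A$$").find());    // true
        System.out.println(supplied.matcher("A$$").matches()); // false: whole input must match

        System.out.println(isValidSurname("a$$"));   // false
        System.out.println(isValidSurname("smith")); // true
    }
}
```

    If the validator really does search rather than match, wrapping the supplied pattern as ^(...)$ should give the intended result.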

  • Design Question - Command Line Argument Processor

    Folks,
    I'm a java and OO newbie... I've been going through Sun's java tutorials
    I've "enhanced" Sun's RegexTestHarness.java (using Aaron Renn's gnu.getopt package) to expose the various Pattern.FLAGS on the command line.
    Whilst it does work, the argument processing code is awkward, so I want to rewrite it... but I'm pretty new to OO, so before I spend days hacking away at a badly designed ArgsProcessor package I thought I'd run my design ideas past the gurus... and at least see if my ideas are impossible, or just plain bad.
    Any comments would be greatly appreciated.
    The starting point is RegexTestHarness.java
    /**
    *@source  : C:\Java\src\Tutorials\Sun\RegexTestHarness.java
    *@compile : C:\Java\src\Tutorials\Sun>javac -classpath ".;C:\Java\lib\java-getopt-1.0.13.jar" RegexTestHarness.java
    *@run     : C:\Java\src\Tutorials\Sun>java -classpath ".;C:\Java\lib\java-getopt-1.0.13.jar" RegexTestHarness -i
    *@usage   : RegexTestHarness [-vcixmslud]
    */
    //http://java.sun.com/j2se/1.5.0/docs/api/java/io/package-summary.html
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.io.BufferedReader;
    //http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/package-summary.html
    import java.util.regex.Pattern;
    import java.util.regex.Matcher;
    import java.util.regex.PatternSyntaxException;
    //http://www.urbanophile.com/arenn/hacking/getopt/gnu.getopt.Getopt.html
    import gnu.getopt.Getopt;
    import gnu.getopt.LongOpt;
    /** private command line options interpreter class */
    class Options {
         public boolean verbose = false;
         public int flags = 0;
         public Options(String progname, String[] argv) throws IllegalArgumentException {
              LongOpt[] longopts = new LongOpt[9];
              longopts[0] = new LongOpt("verbose",               LongOpt.NO_ARGUMENT, null, 'v');
              longopts[1] = new LongOpt("CANON_EQ",               LongOpt.NO_ARGUMENT, null, 'c');
              longopts[2] = new LongOpt("CASE_INSENSITIVE",     LongOpt.NO_ARGUMENT, null, 'i');
              longopts[3] = new LongOpt("COMMENTS",               LongOpt.NO_ARGUMENT, null, 'x');
              longopts[4] = new LongOpt("MULTILINE",               LongOpt.NO_ARGUMENT, null, 'm');
              longopts[5] = new LongOpt("DOTALL",                    LongOpt.NO_ARGUMENT, null, 's');
              longopts[6] = new LongOpt("LITERAL",               LongOpt.NO_ARGUMENT, null, 'l');
              longopts[7] = new LongOpt("UNICODE_CASE",          LongOpt.NO_ARGUMENT, null, 'u');
              longopts[8] = new LongOpt("UNIX_LINES",               LongOpt.NO_ARGUMENT, null, 'd');
              Getopt opts = new Getopt(progname, argv, "vcixmslud", longopts);
              opts.setOpterr(false);
              int c;
              //String arg;
              while ( (c=opts.getopt()) != -1 ) {
                   //arg = opts.getOptarg();
                   //(char)(new Integer(sb.toString())).intValue()
                   switch(c) {
                        case 'v': verbose = true; break;
                        //http://java.sun.com/docs/books/tutorial/essential/regex/pattern.html
                        case 'c': this.flags |= Pattern.CANON_EQ; break;
                        case 'i': this.flags |= Pattern.CASE_INSENSITIVE; break;
                        case 'x': this.flags |= Pattern.COMMENTS; break;
                        case 'm': this.flags |= Pattern.MULTILINE; break;
                        case 's': this.flags |= Pattern.DOTALL; break;
                        case 'l': this.flags |= Pattern.LITERAL; break;
                        case 'u': this.flags |= Pattern.UNICODE_CASE; break;
                        case 'd': this.flags |= Pattern.UNIX_LINES; break;
                        case '?': throw new IllegalArgumentException("bad switch '"+(char)opts.getOptopt()+"'"); //nb: getopt() spits
                   }
              }
         }
         public String toString() {
              StringBuffer s = new StringBuffer(128);
              if (verbose) s.append("verbose, ");
              if ((this.flags & Pattern.CANON_EQ) != 0)               s.append("CANON_EQ, ");
              if ((this.flags & Pattern.CASE_INSENSITIVE) != 0)     s.append("CASE_INSENSITIVE, ");
              if ((this.flags & Pattern.COMMENTS) != 0)               s.append("COMMENTS, ");
              if ((this.flags & Pattern.MULTILINE) != 0)               s.append("MULTILINE, ");
              if ((this.flags & Pattern.DOTALL)  != 0)               s.append("DOTALL, ");
              if ((this.flags & Pattern.LITERAL) != 0)               s.append("LITERAL, ");
              if ((this.flags & Pattern.UNICODE_CASE) != 0)          s.append("UNICODE_CASE, ");
              if ((this.flags & Pattern.UNIX_LINES) != 0)               s.append("UNIX_LINES, ");
              if (s.length() > 0) { //a StringBuffer never equals(""), so test the length instead
                   s.insert(0,"{");
                   s.replace(s.length()-2,s.length(),"");
                   s.append("}");
              }
              return(s.toString());
         }
    }
    /** public regular expression test harness */
    public class RegexTestHarness {
         public static void main(String[] argv){
              BufferedReader in = null;
              try {
                   Options options = new Options("RegexTestHarness", argv);
                   //System.out.println(options);
                   in = new BufferedReader(new InputStreamReader(System.in));
                   System.out.println("RegexTestHarness");
                   System.out.println("----------------");
                   System.out.println();
                   System.out.println("usage: Enter your regex (none to exit), then the string to search.");
                   System.out.println("from:  http://java.sun.com/docs/books/tutorial/essential/regex/index.html");
                   String regex = null;
                   while(true) {
                        try {
                             System.out.println();
                             System.out.print("regex: ");
                             regex = in.readLine();
                             if (regex.equals("")) break;
                             Pattern pattern = Pattern.compile(regex, options.flags);
                             System.out.print("string: ");
                             Matcher matcher = pattern.matcher(in.readLine());
                             if (options.verbose) System.out.printf("groupCount=%d%n", matcher.groupCount());
                             while (matcher.find()) {
                                  System.out.printf("%d-%d:'%s'%n", matcher.start()+1, matcher.end(), matcher.group());
                                  //start is a zero based offset, but one based is more meaningful to the user, Me.
                             }
                        } catch (PatternSyntaxException e) {
                             System.out.println("Pattern.compile("+regex+") " + e);
                        } catch (IllegalStateException e) {
                             System.out.println("matcher.group() " + e);
                        }
                   } //wend
              } catch (IllegalArgumentException e) {
                   System.out.println(e);
              } catch (Exception e) {
                   e.printStackTrace();
              } finally {
                   try {in.close();} catch(Exception e){}
              }
         }
    }
    ... I haven't got a clue if it's possible, but I want my ArgProcessor.getArgs method to return a hash (keyed on name) of Objects of the requested "mixed" types... for example a boolean, a String, and a String[].
    I want the client code of my new fangled ArgProcessor to look something like this:
    class testArgProcessor {
         public static void main(String[] args) {
              //usage testArgProcessor [-v] [-o outfile] file ...
              try {
                   HashMap<Arguement> args = ArgProcessor.getArgs( args,
                        { //hasArg value, letter, name, type, value, default
                          {hasArg.NONE,     'v', 'verbose', 'boolean', true, false}
                        , {hasArg.REQUIRED, 'o', 'outfile', 'String', null, null}
                        , {hasArg.ARRAY,     '', 'filelist', 'String[]', null, null}
                   if (args.outfile != null) {
                        out = new BufferedWriter(......);
                   } else {
                        out = System.out;
                   for (String file : filelist) {
                        if (args.verbose) System.out.println("processingFile: " + file)
                        ... process the file ...
              } catch (IllegalArgumentException e) { //from ArgProcessor.getArgs()
                   System.out.println(e);
    }

    Paul,
    What are you trying to do, and why?
    Sorry, I should have made myself a lot clearer...
    What I'm really trying to do is learn Java, and good OO design... so I'm going through the Sun tutorials, and I see that the standard Pattern class has a few handy switches, so I wanted to expose them to the command line... which I did using the handy gnu.getopts library...
    Are you trying to write a general purpose command-line processing library?
    Yes, I'm trying to write a general purpose command-line processing library, one that's "cleaner" to use than the gnu.getopts.
    I've been hacking away for a few hours and haven't gotten very far... what I have discovered is that the gnu.getopts class is in fact very clever (surprise surprise)... and my idea to "simplify" its usage leads to loss of flexibility. So, I'm starting to think I'm completely barking up the wrong tree... and that I was somewhat vainglorious thinking that I (a newbie) could improve upon it.
    Are you trying to write a command-line app to do pattern matching?
    Yep, that too... That's where I started... with an example from Sun's tutorials... where it's used to parse a long series of patterns and strings, exploring Java's regex capabilities.
    I think I'll just give up on "improving" on gnu.getopts... my options processing code is ugly, and so be it.
    Thanx for your interest anyway.
    Keith.

  • Reading files in java

    Hi people!
    Help needed once again. Basically I have a buffered reader which I use to read a file. After I read a line I send it to a method that breaks it down into sentences at each full stop, which is done using the regular expression pattern [.]. However, when the text is written as follows:
    Berline wall falls down
    by Peter.
    it sends the first line and then the second line separately. What I need it to do is keep reading until it reaches a full stop. I've posted my code below. Any help would be much appreciated.
    import java.io.*;
    import java.util.*;
    import java.util.regex.*;
    public class BreakSentence
    {
    Vector sentence = new Vector(500);
    public BreakSentence(String fileName)
    {
         try
         {
              BufferedReader input = new BufferedReader(new FileReader(fileName));
              String line = input.readLine();
              while(line!=null)
              {
                   makeSentence(line); //Call method to split text file into sentences based on full stop.
                   line = input.readLine();
              }
              input.close();
         }
         catch(FileNotFoundException e)
         {
              System.out.println(e);
         }
         catch(IOException e2)
         {
              System.out.println(e2);
         }
    }
    private void makeSentence(String a)
    {
         Pattern p = Pattern.compile("[.]");
              // Split input with the pattern
              String[] sentences = p.split(a);
              for( int i=0; i < sentences.length; i++ )
              {
                   if(sentences[i] == null || sentences[i].length() == 1) //was sentences.length(), which does not compile
                   {
                        continue; //skip null or one-character fragments
                   }
                   //String noPunc = removePunctuation(sentences[i]);               //remove punctuation
                   //System.out.println( "With punctuation: " + sentences[i] );
                   //System.out.println( "Without punctuation: " + noPunc );
                   //sentence.add( sentences[i].trim() );
                   //sentence.add(noPunc.trim());
                   sentence.add(sentences[i]);
              }
    }
    public void printInformation()
    {
         for(int i =0; i<sentence.size(); i++)
         {
              System.out.println("Position"+ i+ " " +sentence.get(i)  );
         }
    }
    public static void main(String[]args)
    {
    BreakSentence x = new BreakSentence( "output2.txt" );
    x.printInformation();
    }
    }

    Please use code tags.
    Check out Pattern (http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html), in particular compile(String regex, int flags) (http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html#compile(java.lang.String,%20int)) and MULTILINE (http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html#MULTILINE)
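
    One way to keep a sentence together when it spans several lines is to split the whole text on full stops instead of splitting each line separately. A minimal sketch, assuming the file contents have already been read into a single String (`SentenceSplitter` and `splitSentences` are illustrative names, not part of the original code):

```java
import java.util.ArrayList;
import java.util.List;

class SentenceSplitter {
    /** Splits text on full stops; line breaks inside a sentence become single spaces. */
    static List<String> splitSentences(String text) {
        List<String> result = new ArrayList<>();
        for (String part : text.split("[.]")) {
            String sentence = part.replaceAll("\\s+", " ").trim();
            if (!sentence.isEmpty()) {
                result.add(sentence);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Berline wall falls down\nby Peter.\nNext one.";
        // Prints: [Berline wall falls down by Peter, Next one]
        System.out.println(splitSentences(text));
    }
}
```

    This treats every full stop as a sentence end, just as the original [.] pattern does, so abbreviations and decimal numbers would still be split.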

  • RegEx Problem with flag COMMENTS

    Hello,
    I have the following Exception:
    java.util.regex.PatternSyntaxException: Unclosed group near index 9
    when my program is running with these flags:
    Pattern patt = Pattern.compile("^(@#@.+)$", Pattern.MULTILINE | Pattern.COMMENTS);
    but when I run this:
    Pattern patt = Pattern.compile("^(@#@.+)$", Pattern.MULTILINE);
    it works fine.
    Any COMMENTS ;-) for this problem? The entire RegEx is much bigger. I want to comment it.
    Thanks sacrofano

    Hi,
    thanks for your help, but it did not work.
    I did not suggest anything that would work! I was trying to point out that the Javadoc says that everything from # to the end of the line (here, the end of the pattern) is treated as a comment.
    I run this
    Pattern patt =
    Pattern.compile("^(?:(@#@.+))$",(Pattern.COMMENTS));
    So why, based on reading the Javadoc, would you expect this RE to compile? Everything after the # is treated as a comment, so your effective regular expression is "^(?:(@", which is obviously an invalid RE!
    with same exception as above.
    Is there a problem with the Flag Pattern.COMMENTS
    No! RTFD.
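
    To keep a literal # in a pattern compiled with Pattern.COMMENTS, it can be escaped with a backslash. A small sketch of that, using the pattern from the question (the class and method names are made up for the example):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentsFlagDemo {
    /** Returns the first line that starts with the marker @#@, or null if there is none. */
    static String firstTaggedLine(String text) {
        Pattern patt = Pattern.compile(
            "^ ( @ \\# @ .+ ) $   # a whole line starting with the literal marker @#@",
            Pattern.MULTILINE | Pattern.COMMENTS);
        Matcher m = patt.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Prints: @#@tagged line
        System.out.println(firstTaggedLine("plain line\n@#@tagged line\nanother"));
    }
}
```

    The escaped \\# is matched literally, while the unescaped # further along starts a comment, which is presumably what the flag was wanted for in the first place.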

  • Regex: Multiple pattern flags

    hi,
    in java.util.regex.Pattern, to create a new Pattern, i have to use compile(String pattern, int flags) method and i need to use it with several flags ... but how ?
    is it something like :
    Pattern.compile(pattern, Pattern.MULTILINE + Pattern.CASE_INSENSITIVE);
    or
    Pattern.compile(pattern, Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
    or
    Pattern.compile(pattern, Pattern.MULTILINE & Pattern.CASE_INSENSITIVE);
    or ... something else ?
    thanks in advance

    I got the same problem
    and I used
    Pattern.compile(pattern, Pattern.MULTILINE + Pattern.CASE_INSENSITIVE);
    or
    Pattern.compile(pattern, Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
    or
    Pattern.compile(pattern, Pattern.MULTILINE & Pattern.CASE_INSENSITIVE);
    but I can't get the desirable result.
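
    For reference, the flags are bit masks, so the bitwise OR form is the one that combines them. A short sketch showing what each of the three expressions evaluates to (`FlagCombinationDemo` and `matchesLineStart` are names invented for this example):

```java
import java.util.regex.Pattern;

class FlagCombinationDemo {
    /** True if "^world" is found in a two-line input under the given flags. */
    static boolean matchesLineStart(int flags) {
        return Pattern.compile("^world", flags).matcher("Hello\nWORLD").find();
    }

    public static void main(String[] args) {
        int or   = Pattern.MULTILINE | Pattern.CASE_INSENSITIVE; // both bits set
        int and  = Pattern.MULTILINE & Pattern.CASE_INSENSITIVE; // 0: no flags at all
        int plus = Pattern.MULTILINE + Pattern.CASE_INSENSITIVE; // same as | here, but breaks if a flag is added twice

        System.out.println(matchesLineStart(or));   // true
        System.out.println(matchesLineStart(and));  // false
        System.out.println(matchesLineStart(plus)); // true
    }
}
```

    Adding happens to work only because the two masks share no bits; adding the same flag twice would silently produce a different flag, so | is the safer habit.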

  • Tell me how much my regex sucks, and help me make it better

    uncle alice,
    can you look at this and see if you see anything wrong with it, or better yet, do you know of a better solution using regex?
    following regex is used to extract all links from an html page (href, img src) both absolute and relative:
    (?im)(?:(?:(?:href)|(?:src))[ ]*?=[ ]*?[\"'])(((?:http|https)(?::\\/{2}[\\w]+)(?:[\\/|\\.]?)(?:[^\\s\"]*))|((?:\\/{0,1}[\\w\\.]+)+))[\"']
    String absolute = m.group(2);
    String relative = m.group(3);

    There's a lot of good material in that regex, pedagogically speaking. :D Although you've solved your problem another way, I'd like to comment on some common errors I see.
    `(?im)` : For anyone who doesn't know, these are inline flags, whose effect is the same as the compiler flags CASE_INSENSITIVE & MULTILINE. But you don't need the multiline flag. People often assume they have to use that flag when they're searching for strings that may span line breaks, but all it does is change the meaning of the ^ and $ metacharacters. (By default, they match the beginning and end, respectively, of the target string; with the multiline flag set, they also match the beginning and end of logical lines within the target string.) You aren't using those anchors, so that flag is irrelevant.
    `(?:(?:href)|(?:src))` : The outer set of parentheses is needed to contain the effect of the pipe, but the inner sets are just noise. In fact, most of the parens in your regex are unnecessary. Excessive grouping can significantly affect the performance of the regex if you really get carried away with it, although it takes quite a bit more than you've got here. The real problem is the visual complexity they add; regexes don't need any help in that department! :-/
    `[ ]*?` : You don't need to put the space character in square brackets to match it, although doing so can make the regex a little easier to read. More importantly, an HTML tag can contain any whitespace character at that point, not just spaces, so you should use `\\s` instead. Also, you shouldn't use a reluctant quantifier unless the thing it's quantifying can match something you don't want it to. Since they're inherently slower than normal (greedy) quantifiers, you should take care not to use them where they aren't needed, which is the case here.
    `(?:http|https)` : Whenever you have two alternatives, of which one is a prefix of the other, you should list the longer one first. The alternatives are tried in the order they're listed, so listing them in the wrong order can reduce the efficiency of your regex in much the same way that using reluctant quantifiers inappropriately can. It isn't a problem here, since the next thing the regex has to match is so definite (i.e., "://"), but you should get in the habit of following that rule. In this case, you can just make the final letter optional: `https?`
    `\\/{2}` and `\\/{0,1}` : You don't need to escape the forward-slash in Java regexes; that's only necessary in languages like Perl and JavaScript that have language-level support for regexes. They use the forward-slash by default as the quoting character for regex literals, so they have to escape it for the same reason we have to escape the double-quote (but some languages also let you choose different quoting characters each time). Also, I agree with paulcw that the {2} just adds unnecessary complexity in this case. As for the '{0,1}', its meaning is exactly the same as '?', so why not use that instead?
    `[\\/|\\.]` : First, you don't need any of those backslashes. The forward-slash is never special, and the period loses its special meaning inside the square brackets. The pipe is just a pipe, too, so your character class matches a slash, a pipe, or a period, which is probably not what you meant. You need to understand that character classes are like a language within a language. A regex is effectively a set of linear instructions: match this AND then this AND then this, etc. If you want to create an OR branch, you have to do so explicitly, using a pipe or a quantifier. But the semantics change drastically when you go inside the square brackets. Since a character class only matches one character at a time, AND is irrelevant and OR is implicit: match this character OR this one OR one from this range, etc. The only metacharacters that are needed in character classes are those that are used for set operations: the caret for NOT, hyphen for ranges, etc.; everything else is just a character.
    If you'd like me to keep going, I'll need to know more about your exact intentions. Do you want the protocol (e.g., "http://") to be optional? What about the quotes around the URL? Finally, I don't understand what the final pipe in your regex is supposed to do, but I'm pretty sure it isn't working. :D
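
    Putting several of those points together (no multiline flag, fewer groups, \\s for whitespace, plain greedy quantifiers), a simplified pattern might look like the sketch below. This is an illustration of the advice, not the original poster's final regex: it captures absolute and relative URLs in one group instead of two, and it assumes the attribute value is quoted. `LinkExtractor` and `extractLinks` are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractor {
    // href or src, optional whitespace around '=', then a quoted value without spaces or quotes.
    private static final Pattern LINK = Pattern.compile(
        "(?:href|src)\\s*=\\s*[\"']([^\"'\\s]+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns every href/src value found in the given HTML, in document order. */
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a HREF = \"http://example.com/a\">x</a><img src='/img/b.png'>";
        // Prints: [http://example.com/a, /img/b.png]
        System.out.println(extractLinks(html));
    }
}
```

    Whether a link is absolute can then be decided afterwards, for example by checking whether the captured value starts with a protocol.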

  • Getting more than one result from regex groups

    Thanks to everyone in advance -
    I cannot seem to figure out why I wouldnt receive multiple groups back from this match. I would assume I would receive:
    [hello]
    [john]
    instead i am getting:
    [hello]
    [hello]
    It seems like the regex stops after the first match is found, which leads me to believe that it has to do with some sort of flag -
         String Format = "[hello] my name is [sam]";
              String RegexPattern = "(\\[.*?\\])";
              Pattern MyPattern = Pattern.compile(RegexPattern, Pattern.CASE_INSENSITIVE | Pattern.DOTALL );
              Matcher MyMatcher = MyPattern.matcher(Format);
              if(MyMatcher.find()) {
                   for(int i = 0; i <= MyMatcher.groupCount(); i++) {
                        out.print(MyMatcher.group(i) +"<br>");
                   }
              }
    Thanks,
    Sam

    Groups are a static concept. You only have one group.
    while(MyMatcher.find()) {
        out.print(MyMatcher.group(1) +"<br>");
    }

  • Boost regex not working inside indesign plugin

    Hi, While i was writing the below code in a separate project inside visual studio express, It works fine!
    Now when I am using the same code in a Adobe InDesign plugin then boost::regex_search fails..
    I am not getting the exact reason...
    Any idea for resolving this will be great help.
    void MTSTestFunctions::ParseAllMarker(std::wstring& inText, std::vector<MarkerInfo> &outMarkerInfoVec)
    {
        std::wstring::const_iterator start = inText.begin();
        std::wstring::const_iterator end = inText.end();
        boost::wregex pattern(L"((<.*?>)|(\\[[^[].*?[^]]\\])|(\\[\\[.*?\\]\\]))");
        boost::wsmatch what;
        boost::match_flag_type flags = boost::match_default;
        int32 index = 0;
        try
        {
            while(boost::regex_search(start, end, what, pattern, flags))
            {
                MarkerInfo tmpMarkerInfo;
                tmpMarkerInfo.mMarkerText.assign(what[0]);
                tmpMarkerInfo.mStartIndex = (what.position() + index);
                index += what.position();
                tmpMarkerInfo.mEndIndex = (index += what.position());
                tmpMarkerInfo.mMarkerTextLength = (index + what.length());
                index += what.length();
                // update search position:
                start = what[0].second;
                // update flags:
                flags |= boost::match_prev_avail;
                flags |= boost::match_not_bob;
            }
        }
        catch(const std::runtime_error& ex)
        {
            // handler body was not included in the post
        }
    }
    Thanks

    Hi,
    If I modify my code above as below to use std::tr1::regex, I get the same crash and error.
    void MTSTestFunctions::ParseAllMarker(std::wstring& inTextO, std::vector<MarkerInfo> &outMarkerInfoVec)
    {
        std::wstring inText;
        inText.assign(inTextO);
        std::wstring::const_iterator start = inText.begin();
        std::wstring::const_iterator end = inText.end();
        std::tr1::wregex pattern(L"((<.*?>)|(\\[[^[].*?[^]]\\])|(\\[\\[.*?\\]\\]))");
        std::tr1::wsmatch what;
        std::tr1::regex_constants::match_flag_type flags =  std::tr1::regex_constants::match_default;
        int32 index = 0;
        try
        {
            while(std::tr1::regex_search(start, end, what, pattern, flags))
            {
                MarkerInfo tmpMarkerInfo;
                tmpMarkerInfo.mMarkerText.assign(what[0]);
                tmpMarkerInfo.mStartIndex = (what.position() + index);
                index += what.position();
                tmpMarkerInfo.mEndIndex = (index += what.position());
                tmpMarkerInfo.mMarkerTextLength = (index + what.length());
                index += what.length();
                // update search position:
                start = what[0].second;
                // update flags:
                flags |=  std::tr1::regex_constants::match_prev_avail;
                //flags |=  std::tr1::regex_constants::match_not_bob;
            }
        }
        catch(const std::runtime_error& ex)
        {
            // handler body was not included in the post
        }
    }

  • Regex & java.util.Scanner

    I am trying to make a simple txt parser using regular expressions, but a problem has appeared.
    The program's code is too long, so I have included only the part implementing the method data_types(), which doesn't work properly: it reads only two types, (String) and (Boolean). If someone could help me I would be very grateful. Why doesn't the method read the rest of the data types in my data_xml.xml file?
    here is the code >
    import java.io.File;
    import java.util.Scanner;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    class SimpleScann{
           enum PARSE{
              TABLE_NAME("(\\w*)"),COLUMN_NAME("(\\w*\\Q(\\E)"),DATA_TYPE("(\\Q(\\E\\w*\\Q)\\E)");
              private String $pattern;
              PARSE(String pattern){
                   $pattern=pattern;
              }
              public String PATTERN(){
                   return $pattern;
              }
           }
         static void data_types() throws Exception{
              File parse_file= new File("data_type.txt");
              Scanner     scann_input = new Scanner(parse_file);
              int flag= Pattern.CASE_INSENSITIVE;
              Pattern pattern=Pattern.compile(PARSE.DATA_TYPE.PATTERN(),flag);
              Matcher matcher=null;
              while(scann_input.hasNextLine()){
                   matcher=pattern.matcher(scann_input.nextLine());
                   if(matcher.find()){ //only the first match on each line is reported
                        System.out.printf("%s\n",matcher.group());
                   }
              }
         }
         public static void main(String args[])
         {
              try{
                   data_types();
              }catch(Exception e){
                   e.printStackTrace();
              }
         }
    }
    and here is the data_type.txt
    <table          > Table radi
    ako su zatvoreni tagovi     <>
    <column>
         Ime(String), Prezime(String), JMBG(Integer) ,
         Enabled(Boolean)
    <\column>
    best regards,
    Nikola

    The reason you're only matching two items is because you're reading the file one line at a time and applying the regex once per line. As Tim said, you can fix that by using while instead of if, but the real problem is much deeper: you're trying to write a scanner in the sense of a lexical analyzer, and that isn't what java.util.Scanner is for. I strongly recommend you start over, this time using Pattern and Matcher directly, not Scanner. If you happen to have a copy of MRE 3ed, there's an example of what you're trying to do on page 400. (Unfortunately, Friedl has just moved back to Japan, and hasn't had time to update the book's web site, or I could point you to the code online.) I don't have time to go into this right now, but you should pay particular attention to the find(int) method and the \G anchor.
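
    As a small illustration of the first point (while instead of if), here is a sketch that applies the pattern repeatedly to the whole text, so every "(Type)" is reported no matter how many share a line. It is not the lexer design recommended above, just the minimal fix; `DataTypeFinder` and `findTypes` are names invented for the example, and the pattern is a simplified variant that captures only the type name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DataTypeFinder {
    // A word in parentheses, e.g. "(String)"; group 1 is the word itself.
    private static final Pattern DATA_TYPE = Pattern.compile("\\((\\w+)\\)");

    /** Returns every parenthesised type name in the text, in order of appearance. */
    static List<String> findTypes(String text) {
        List<String> types = new ArrayList<>();
        Matcher matcher = DATA_TYPE.matcher(text);
        while (matcher.find()) {   // keep going until no further match is found
            types.add(matcher.group(1));
        }
        return types;
    }

    public static void main(String[] args) {
        String text = "<column>\n\tIme(String), Prezime(String), JMBG(Integer) ,\n\tEnabled(Boolean)\n<\\column>";
        // Prints: [String, String, Integer, Boolean]
        System.out.println(findTypes(text));
    }
}
```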

  • Java Regex Pattern

    Hello,
    I have parsed a text file and want to use a Java regex pattern to pick out the statuses "warning" and "ok" (an "ok" should only be picked up if it follows a "warning"). Does anyone have an idea how to find the "ok" that follows the warning status? Thanks in advance!
    text example
    121; test test; test0; ok; test test
    121; test test; test0; ok; test test
    123; test test; test1; warning; test test
    124; test test; test1; ok; test test
    125; test test; test2; warning; test test
    126; test test; test3; warning; test test
    127; test test; test4; warning; test test
    128; test test; test2; ok; test test
    129; test test; test3; ok; test test
    java code:
    String flag= "warning";
              while ((line= bs.readLine()) != null) {
                   String[] tokens = line.split(";");
                   for(int i=1; i<tokens.length; i++){
                        Pattern pattern = Pattern.compile(flag);
                        Matcher matcher = pattern.matcher(tokens[i].trim()); // match one token (without padding), not the whole array
                        if(matcher.matches()){
                             // save into a list
                        }
                   }
              }

    Sorry, I'll try to explain it in more detail. I want to parse this text file and save the statuses "warning" and "ok" into a list. The catch is that I only need the "ok" that follows a "warning", which means that after "test1 warning" I am looking for "test1 ok".
    121; content; test0; ok; 12444      <-- that i don't want to have
    123; content; test1; warning; 126767
    124; content; test1; ok; 1265        <-- that i need to have
    121; content; test9; ok; 12444      <-- that i don't want to have
    125; content; test2; warning; 2376
    126; content; test3; warning; 78787
    128; content; test2; ok; 877666    <-- that i need to have
    129; content; test3; ok; 877666    <-- that i need to have
    // here maybe a regex pattern could be deal with my problem
    // if "warning|ok" then list all element with the status "warning and ok"
    String flag= "warning";
              while ((line= bs.readLine()) != null) {
                   String[] tokens = line.split(";");
                   for(int i=1; i<tokens.length; i++){
                        Pattern pattern = Pattern.compile(flag);
                        Matcher matcher = pattern.matcher(tokens[i].trim()); // match one token (without padding), not the whole array
                        if(matcher.matches()){
                             // save into a list
                        }
                   }
              }
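
    A regex on its own can't remember which tests have already raised a warning, so some state has to be kept between lines. One possible sketch, assuming the test name is always the third field and the status the fourth (`StatusFilter` and `okAfterWarning` are hypothetical names):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StatusFilter {
    /** Returns the "ok" lines whose test name previously appeared with status "warning". */
    static List<String> okAfterWarning(List<String> lines) {
        Set<String> warned = new HashSet<>();   // test names with an open warning
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            String[] tokens = line.trim().split("\\s*;\\s*");
            if (tokens.length < 4) {
                continue;                        // not a record line
            }
            String name = tokens[2];
            String status = tokens[3];
            if (status.equals("warning")) {
                warned.add(name);
            } else if (status.equals("ok") && warned.remove(name)) {
                result.add(line);                // this "ok" closes an earlier warning
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "121; content; test0; ok; 12444",
            "123; content; test1; warning; 126767",
            "124; content; test1; ok; 1265",
            "125; content; test2; warning; 2376",
            "128; content; test2; ok; 877666");
        // Prints the lines numbered 124 and 128.
        okAfterWarning(lines).forEach(System.out::println);
    }
}
```

    Removing the name from the set once its "ok" is seen means a later "ok" for the same test is ignored until another warning arrives; whether that is the wanted behaviour is a guess.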

  • Java regex stop after first occurrence

    When using code like the following:
    while (matcher.find()) {
    string1=matcher.group(1).trim();
    System.out.println(charset);
    }
    the program goes on looking all through the input string and prints out the final match.
    What should be done to find the first occurrence and to stop searching through the input string after the first match has been found? i.e. I want to exit the while loop after the first match is found.

    The first .* in your regex matches as much as it can at first, and because you used the DOTALL flag, it's able to gobble up the whole remaining string. Then it starts backtracking, trying to match the rest of the regex, and it has to backtrack almost all the way to the beginning of the string again before it gets back to the META tag where it's supposed to match (unless it finds a false match elsewhere first). That's just an example of greedy quantifiers at work; by calling it a loop you sent us all barking up the wrong scent trail.
    Making that dot-star reluctant is not the solution though; the regex would then match everything from the first occurrence of "<meta" to the first occurrence of "charset", where "charset" could be in a separate META tag or just hanging loose later in the string. Getting rid of the DOTALL flag might restrict the match to just one META tag, but you can't count on that. Try this:
    REGEX = "<meta\\s[^<>]*?charset=([^\\s\"]+)";
    pattern = Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE); // the only flag you need
    Also, if you aren't familiar with this website, you'll probably find it useful:
    http://www.regular-expressions.info/
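    As for stopping after the first occurrence: a rough sketch, assuming the regex suggested above, is to call find() once in an if instead of looping with while. The class name FirstCharset and the sample HTML are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstCharset {
    // The regex from the reply above; CASE_INSENSITIVE is the only flag used.
    private static final Pattern META_CHARSET =
        Pattern.compile("<meta\\s[^<>]*?charset=([^\\s\"]+)", Pattern.CASE_INSENSITIVE);

    // Returns the charset from the first matching META tag, or null if there is none.
    static String firstCharset(String html) {
        Matcher matcher = META_CHARSET.matcher(html);
        // A single find() call stops at the first match; no loop is needed.
        return matcher.find() ? matcher.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String html = "<head>"
            + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">"
            + "<meta name=\"other\" content=\"text/html; charset=iso-8859-1\">"
            + "</head>";
        System.out.println(firstCharset(html));
    }
}
```

    If the loop is kept for some other reason, a break right after the first group(1) call has the same effect.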
                                                                                                                                                                                                                                                                                                                                                                                                                   

  • RegEx to ID file path that uses Wildcard appropriately

    I am looking for a regex to identify paths with wildcards, but with wildcards only allowed in the file name and extension portion of the path. So both of these should return true
    C:\Path\*.*
    \\SRV\Path\*.txt
    However, \\*\Path\*.* should return false since a wildcard anywhere but the end is invalid.
    My Google Fu has failed me, as all my searches are coming up with discussions of wild cards in the RegEx, rather than * occurring in a specific pattern. 
    To give some context, I have a Copy function that is going to take Source and Destination arguments, as well as optional Breadcrumb and Overwrite arguments, and it will handle the actual copy differently depending on the source being a file, folder or wildcard
    and destination being a file or folder. And of course it will return an error in the log if the destination is a wildcard.

    I couldn't figure out a way to do it with regex, although I did learn a lot about regex in the process.  For anyone else that might find this thread, here is the best link I've found explaining regex:
    http://www.freeformatter.com/regex-tester.html
    Unfortunately, it wasn't good enough (or I'm just not smart enough) to puzzle this one out.  The following code works, but is obviously much less elegant than a straight-up regex.  I post it only because I spent the time on it, and if it can ever possibly help anyone, it was worth the effort:
    $a = "\\path\folder\job*.*","\\path\fo*lder\star.tar","c:\path\folder\bo*lder\f*","c:\blah\blah.txt","\\blah\blagaa\blahgahah\a\f\*"
    Foreach ($target in $a) {
        $flag = $null
        $path = $target -split ("\\")
        $count = $path.count
        # Only consider targets whose last segment contains a wildcard
        If ($path[$path.count -1] -match "\*") {
            # Flag any wildcard that appears in an earlier segment
            Foreach ($index in 0..($path.count -2)) {
                If ($path[$index] -match "\*") {$flag = 1}
            }
            If (!($flag)){$target}
        }
    }
    PS C:\Users\user> $a = "\\path\folder\job*.*","\\path\fo*lder\star.tar","c:\path\folder\bo*lder\f*","c:\blah\blah.txt","\\blah\blagaa\blahgahah\a\f\*"
    PS C:\Users\user> Foreach ($target in $a) {
    >> $flag = $null
    >> $path = $target -split ("\\")
    >> $count = $path.count
    >> If ($path[$path.count -1] -match "\*") {
    >> Foreach ($index in 0..($path.count -2)) {
    >> If ($path[$index] -match "\*") {$flag = 1}
    >> }
    >> If (!($flag)){$target}
    >> }
    >> }
    >>
    \\path\folder\job*.*
    \\blah\blagaa\blahgahah\a\f\*
    That being said, I'm eager to see the 10 character or less regex expression to do it that someone is sure to post soon and make me feel silly.
    I hope this post has helped!
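    For what it's worth, the same check can probably be done with one regex. Below is a sketch in Java, under the assumptions that * is the only wildcard character, that backslash is the only separator, and that the path contains at least one backslash. The class name WildcardPath is made up for the example. The idea: everything up to the last backslash must be free of *, and the last segment must contain at least one *.

```java
import java.util.regex.Pattern;

final class WildcardPath {
    // [^*]*      any prefix without a wildcard
    // \\         the last path separator (a literal backslash)
    // [^\\]*     start of the final segment (no further separators)
    // \*         at least one wildcard in the final segment
    // [^\\]*     rest of the final segment
    private static final Pattern WILDCARD_IN_LAST_SEGMENT =
        Pattern.compile("[^*]*\\\\[^\\\\]*\\*[^\\\\]*");

    static boolean isValidWildcardPath(String path) {
        // matches() requires the whole path to fit the pattern
        return WILDCARD_IN_LAST_SEGMENT.matcher(path).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "C:\\Path\\*.*",
            "\\\\SRV\\Path\\*.txt",
            "\\\\*\\Path\\*.*",
            "c:\\blah\\blah.txt"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + isValidWildcardPath(sample));
        }
    }
}
```

    In PowerShell the equivalent pattern should work with -match if it is anchored with ^ and $, though I have only reasoned through the Java form here.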

  • Cyclomatic Complexity Using Regex

    /*
     * Cyclomatic Complexity Program
     * Program does not ignore comments in pattern
     * Program looks for 1 pattern keyword then moves down a line for next search
     * java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html
     * Using Java Regular Expression Class
     */
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import java.io.*;
    class Cyclomatic {
        // uses the java i/o.*
        static BufferedReader keyboard = new BufferedReader(new InputStreamReader(System.in));
        protected String txtFileName;
        protected int count = 1; // start count at one then no need to add 1 to count!
        private static BufferedReader reader; // Uses the java.io.*;
        public void Cyclomatic() {
            try {
                // open the file
                System.out.println("------------------------------");
                System.out.println("CYCLOMATIC COMPLEXITY PROGRAM ");
                System.out.println("------------------------------\n\n");
                System.out.println("Enter file name to be read: ");
                // Read the name of the text file from the keyboard
                txtFileName = keyboard.readLine();
                System.out.println("\n \n");
                // the buffered reader object
                reader = new BufferedReader(new FileReader(txtFileName));
                // Create a pattern object and split the key words using pipes |||
                Pattern pattern = Pattern.compile("if|for|while|case|switch", Pattern.MULTILINE);
                // The matcher is reset against each line of the file below
                Matcher m = pattern.matcher("");
                String line = null;
                while ((line = reader.readLine()) != null) {
                    m.reset(line);
                    // Only the first keyword on each line is counted
                    if (m.find()) {
                        count = count + 1;
                        System.out.println("KeyWord " + " found " + " start of line: " + m.start() + " ends at line: " + m.end() + " Keyword count = " + count);
                    }
                }
                reader.close(); // close buffered reader!
                if (count > 10) {
                    System.out.println("\n \nThis program according to McCabe has a COMPLEXITY OF: " + count + " \n");
                } else {
                    System.out.println("\n \n This program is NOT COMPLEX \n \n");
                }
            } catch (IOException e) {
                System.out.println("Error : " + e.getMessage());
            }
        }
        // Run the thing!
        public static void main(String[] args) {
            // Create object Complex
            Cyclomatic Complex = new Cyclomatic();
            Complex.Cyclomatic();
        }
    }
    Does anyone have ideas on how to improve this program so that it can
    ignore keywords inside comments? It finds the first keyword on a line,
    then jumps down to the next line to search. I know how to implement
    this program using the StreamTokenizer class and its slashStar
    comments; I'm just interested in this alternative that I thought of. It works fine
    when given the following test file. I just want to iron out the few problems.
    Test File:
    // SAVE AS A TEXT FILE AND OPEN WITH PROGRAM //
    // Cyclomatic Complexity for this file is 17 //
    1.     if
    2. if
    3.     while
    4.     for
    5.     if
    6.     case
    7.     case
    8.     if
    9.     switch
    10.     for
    11.     if
    12.     while
    13.     if
    14.     if
    no
         dont
         count
         this
    15.     if
    16.     for
    Gives Correct CC for this layout.

    Please use [code] tags when posting source code.
    End-of-line comments are easy to handle, but the multiline varieties complicate the task quite a bit. They can span multiple lines, but they don't have to, and keywords can occur after the end of a multiline comment. Since you're reading the file line-by-line, you need to use a flag to handle comments that actually span multiple lines. For the rest, you've got capturing groups and the find(int) method:
          // Pattern for keywords and the start of comments
          Pattern p1 = Pattern.compile("(/\\*)|(//)|(if|for|while|case|switch)");
          Matcher m1 = p1.matcher("");
          // Pattern for the end of multiline comments
          Pattern p2 = Pattern.compile("\\*/");
          Matcher m2 = p2.matcher("");
          boolean inComment = false;
          int lineNum = 0;
          String line = null;
          while ((line = reader.readLine()) != null)
          {
            lineNum++;
            int startAt = 0;
            if (inComment)
            {
              // In multiline comment; see if it ends in this line
              if (m2.reset(line).find())
              {
                inComment = false;
                startAt = m2.end();
              }
              else
              {
                continue;
              }
            }
            m1.reset(line);
            while (m1.find(startAt))
            {
              if (m1.start(1) != -1)
              {
                // Start of multiline comment
                if (m2.reset(line).find(m1.end()))
                {
                  // If it ends in this line, we'll keep looking for keywords
                  startAt = m2.end();
                }
                else
                {
                  // ...otherwise, just set the flag
                  inComment = true;
                  break;
                }
              }
              else if (m1.start(2) != -1)
              {
                // End-of-line comment
                break;
              }
              else
              {
                // It's a keyword
                count++;
                // If you aren't using Java 5, go back to the old way
                System.out.printf("Keyword found in line %2d at position %2d; Keyword count = %2d\n",
                                  lineNum, m1.start(), count);
                // We only care about the first one
                break;
              }
            }
          }
    Here's the test data I used:
    1. if
    2. if
    3. while
    4. don't count this // switch
    5. for
    6. if
    7. don't count this /* case
    8. for */ ...or this
    9. case
    10. /* yes count this */ case
    11. if
    12. switch
    13. for
    14. if
    15. while
    16. if
    17. if
    18. no
    19. dont
    20. count
    21. this
    22. if
    23. for

  • Need Help for RegEx

    Hi all,
    I want a regular expression for the following script:
    <script language="javascript1.1" type="text/javascript">
    <!--
    cmCreatePageviewTag("checkout2/shippinginfo.tmpl", "850");
    //-->
    </script>
    I want to print everything from <script> to </script>. I wrote a regular expression like this:
    <script.*>.*</script>
    but it doesn't work; it only prints the first line. It should print the whole script. I think I'm having a problem with newline characters. Can anyone please tell me what the regular expression should be? It should work for any script.
    Please reply as soon as possible.
    Thanx,
    Vinayak

    http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html
    There you'll find things that could be of some help, such as :
    - DOTALL flag
    - reluctant quantifiers
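    To make those two hints concrete, here is a minimal sketch, assuming the page is already in a String. DOTALL lets the dot cross line breaks, and the reluctant .*? stops at the first </script> instead of the last one. The class name ScriptExtractor and the sample page are invented for the example, and a regex like this is not a full HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ScriptExtractor {
    // DOTALL: '.' also matches newlines; CASE_INSENSITIVE: also matches <SCRIPT>.
    private static final Pattern SCRIPT = Pattern.compile(
        "<script\\b[^>]*>.*?</script>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Returns every <script>...</script> block, tags included.
    static List<String> scriptBlocks(String html) {
        List<String> blocks = new ArrayList<>();
        Matcher m = SCRIPT.matcher(html);
        while (m.find()) {
            blocks.add(m.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        String html = "<html>\n"
            + "<script language=\"javascript1.1\" type=\"text/javascript\">\n"
            + "<!--\n"
            + "cmCreatePageviewTag(\"checkout2/shippinginfo.tmpl\", \"850\");\n"
            + "//-->\n"
            + "</script>\n"
            + "<p>text</p>\n"
            + "<script>var x = 1;</script>\n"
            + "</html>";
        for (String block : scriptBlocks(html)) {
            System.out.println(block);
            System.out.println("----");
        }
    }
}
```

    With a greedy .* and DOTALL, the two scripts in the sample would come back as one big match, which is why the reluctant quantifier matters here.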
