Regex: UNGREEDY flag or (?U)
Hi,
I'd like to port a generic text processing tool, Texy!, from PHP to Java.
This tool does ungreedy matching everywhere, using `preg_match_all("/.../U")`. So I am looking for a library, which has some `UNGREEDY` flag.
I know I could use the `.*?` syntax, but there are really many regular expressions I would have to overwrite, and check them with every updated version.
I've checked
* ORO - seems to be abandoned
* Jakarta Regexp - no support
* java.util.regex - no support
Is there any such library?
Thanks, Ondra
Edited by: OndraZizka on 12.10.2009 2:48
dcminter wrote:
I know I could use the `.*?` syntax, but there are really many regular expressions I would have to overwrite, and check them with every updated version.
I'm not being funny, but couldn't you write a regex to rewrite your regexes?
I thought about this when I first read the post, but I can't see it being easy: 'star' and '+' are not always metacharacters, and one also has to look for the case where the reluctant quantifier is already applied. 'uncle_alice' might be able to do it, but us mere mortals could find it dangerous to get too close to the sun.
!!! How the ???? does one get a flippin single 'star' char in this silly markup?
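As far as I know, java.util.regex has no flag equivalent to PCRE's `/U`, so the rewriting idea discussed above is the usual fallback. The following is only a rough sketch of such a rewriter, not a complete solution: `UngreedyRewriter` and `flip` are hypothetical names, and the code assumes plain patterns (no COMMENTS mode, no literal `]` placed first in a character class). It skips escapes, `\Q...\E` sections and character classes, then swaps greedy and reluctant quantifiers, leaving possessive ones alone.

```java
class UngreedyRewriter {
    /** Swaps greedy and reluctant quantifiers outside escapes and character classes. */
    static String flip(String regex) {
        StringBuilder out = new StringBuilder();
        int classDepth = 0;              // nesting depth of [...] classes
        int n = regex.length();
        int i = 0;
        while (i < n) {
            char c = regex.charAt(i);
            if (c == '\\') {
                if (regex.startsWith("\\Q", i)) {
                    // copy a quoted section verbatim, up to and including \E
                    int end = regex.indexOf("\\E", i + 2);
                    int stop = (end < 0) ? n : end + 2;
                    out.append(regex, i, stop);
                    i = stop;
                } else {
                    // copy the backslash and the escaped character
                    out.append(regex, i, Math.min(i + 2, n));
                    i += 2;
                }
                continue;
            }
            if (c == '[') { classDepth++; out.append(c); i++; continue; }
            if (c == ']' && classDepth > 0) { classDepth--; out.append(c); i++; continue; }
            if (classDepth > 0) { out.append(c); i++; continue; }
            if (c == '(' && i + 1 < n && regex.charAt(i + 1) == '?') {
                // "(?" opens a special group; this '?' is not a quantifier
                out.append("(?");
                i += 2;
                continue;
            }
            int qEnd = quantifierEnd(regex, i);
            if (qEnd < 0) { out.append(c); i++; continue; }
            out.append(regex, i, qEnd);
            i = qEnd;
            if (i < n && regex.charAt(i) == '?') {
                i++;                     // reluctant becomes greedy: drop the '?'
            } else if (i < n && regex.charAt(i) == '+') {
                out.append('+');         // possessive: left unchanged
                i++;
            } else {
                out.append('?');         // greedy becomes reluctant
            }
        }
        return out.toString();
    }

    /** Returns the index just past a quantifier starting at i, or -1 if there is none. */
    static int quantifierEnd(String regex, int i) {
        char c = regex.charAt(i);
        if (c == '*' || c == '+' || c == '?') return i + 1;
        if (c == '{') {
            int close = regex.indexOf('}', i);
            if (close > i && regex.substring(i + 1, close).matches("\\d+(,\\d*)?")) {
                return close + 1;
            }
        }
        return -1;
    }
}
```

With this sketch, `flip("<b>.*</b>")` yields `<b>.*?</b>`. Any rewritten pattern should still be checked against the PHP original's test cases, since the edge cases listed above are not covered.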
Similar Messages
-
Can anyone tell me why my regex is not working on my cfinput textbox, please? I only want to allow a-z as valid characters, and the code below fails.
<cfinput type="text"
id="surname"
name="surname"
class="txt"
title="Surname"
value="#presentsurname#"
validate="regex"
pattern="[a-z]"
message="Please enter a valid Surname."
maxlength="60" />
If I type in $$$, the regex check fails and the error message is displayed.
But if I type in a$$, I get no error message.
Thanks,
G
Thanks for the quick replies guys, you're really helpful, and both regex examples you have given work.
To cut a long story short, I was given the following regex to use against my surname textbox, and I need to follow 'their' standards, so I must use this pattern:
([A-Z'\-]*)|([A-Z'\-][A-Z '\-]*[A-Z'\-])
This should only allow A-Z (in capitals) and the other special characters that are mentioned in the regex. However, this pattern lets me enter A$$, and I do not understand why the $ sign is accepted when it is not listed in the regex.
-
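For the surname question above: the behaviour described ($$$ rejected but a$$ accepted with `[a-z]`) is what an unanchored search would produce, where the pattern only has to match somewhere in the input. Whether cfinput really searches that way is an assumption here, not something the thread confirms. The sketch below reproduces the difference in Java with the quoted pattern; `SurnameCheck` and its method names are made up for illustration.

```java
import java.util.regex.Pattern;

class SurnameCheck {
    // The pattern quoted in the post, unchanged.
    static final Pattern GIVEN =
            Pattern.compile("([A-Z'\\-]*)|([A-Z'\\-][A-Z '\\-]*[A-Z'\\-])");

    /** Unanchored search: true if any substring (even an empty one) matches. */
    static boolean foundAnywhere(Pattern p, String input) {
        return p.matcher(input).find();
    }

    /** Whole-string match: true only if the entire input matches. */
    static boolean matchesWhole(Pattern p, String input) {
        return p.matcher(input).matches();
    }
}
```

Because the first alternative can match the empty string, an unanchored search with this pattern succeeds on any input. If the validator does search unanchored, wrapping the pattern as `^(...)$` would be the usual way to force a whole-string check.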
Design Question - Command Line Argument Processor
Folks,
I'm a java and OO newbie... I've been going through Sun's java tutorials
I've "enhanced" Sun's RegexTestHarness.java (using Aaron Renn's gnu.getopt package) to expose the various Pattern.FLAGS on the command line.
Whilst it does work, the argument-processing code is awkward, so I want to rewrite it... but I'm pretty new to OO, so before I spend days hacking away at a badly designed ArgsProcessor package I thought I'd run my design ideas past the gurus... and at least see if my ideas are impossible, or just plain bad.
Any comments would be greatly appreciated.
The starting point is RegexTestHarness.java
/**
*@source : C:\Java\src\Tutorials\Sun\RegexTestHarness.java
*@compile : C:\Java\src\Tutorials\Sun>javac -classpath ".;C:\Java\lib\java-getopt-1.0.13.jar" RegexTestHarness.java
*@run : C:\Java\src\Tutorials\Sun>java -classpath ".;C:\Java\lib\java-getopt-1.0.13.jar" RegexTestHarness -i
*@usage : RegexTestHarness [-vcixmslud]
*/
//http://java.sun.com/j2se/1.5.0/docs/api/java/io/package-summary.html
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.BufferedReader;
//http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/package-summary.html
import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.util.regex.PatternSyntaxException;
//http://www.urbanophile.com/arenn/hacking/getopt/gnu.getopt.Getopt.html
import gnu.getopt.Getopt;
import gnu.getopt.LongOpt;
* private command line options interpreter class
class Options {
public boolean verbose = false;
public int flags = 0;
public Options(String progname, String[] argv) throws IllegalArgumentException {
LongOpt[] longopts = new LongOpt[9];
longopts[0] = new LongOpt("verbose", LongOpt.NO_ARGUMENT, null, 'v');
longopts[1] = new LongOpt("CANON_EQ", LongOpt.NO_ARGUMENT, null, 'c');
longopts[2] = new LongOpt("CASE_INSENSITIVE", LongOpt.NO_ARGUMENT, null, 'i');
longopts[3] = new LongOpt("COMMENTS", LongOpt.NO_ARGUMENT, null, 'x');
longopts[4] = new LongOpt("MULTILINE", LongOpt.NO_ARGUMENT, null, 'm');
longopts[5] = new LongOpt("DOTALL", LongOpt.NO_ARGUMENT, null, 's');
longopts[6] = new LongOpt("LITERAL", LongOpt.NO_ARGUMENT, null, 'l');
longopts[7] = new LongOpt("UNICODE_CASE", LongOpt.NO_ARGUMENT, null, 'u');
longopts[8] = new LongOpt("UNIX_LINES", LongOpt.NO_ARGUMENT, null, 'd');
Getopt opts = new Getopt(progname, argv, "vcixmslud", longopts);
opts.setOpterr(false);
int c;
//String arg;
while ( (c=opts.getopt()) != -1 ) {
//arg = opts.getOptarg();
//(char)(new Integer(sb.toString())).intValue()
switch(c) {
case 'v': verbose = true; break;
//http://java.sun.com/docs/books/tutorial/essential/regex/pattern.html
case 'c': this.flags |= Pattern.CANON_EQ; break;
case 'i': this.flags |= Pattern.CASE_INSENSITIVE; break;
case 'x': this.flags |= Pattern.COMMENTS; break;
case 'm': this.flags |= Pattern.MULTILINE; break;
case 's': this.flags |= Pattern.DOTALL; break;
case 'l': this.flags |= Pattern.LITERAL; break;
case 'u': this.flags |= Pattern.UNICODE_CASE; break;
case 'd': this.flags |= Pattern.UNIX_LINES; break;
case '?': throw new IllegalArgumentException("bad switch '"+(char)opts.getOptopt()+"'"); //nb: getopt() spits
public String toString() {
StringBuffer s = new StringBuffer(128);
if (verbose) s.append("verbose, ");
if ((this.flags & Pattern.CANON_EQ) != 0) s.append("CANON_EQ, ");
if ((this.flags & Pattern.CASE_INSENSITIVE) != 0) s.append("CASE_INSENSITIVE, ");
if ((this.flags & Pattern.COMMENTS) != 0) s.append("COMMENTS, ");
if ((this.flags & Pattern.MULTILINE) != 0) s.append("MULTILINE, ");
if ((this.flags & Pattern.DOTALL) != 0) s.append("DOTALL, ");
if ((this.flags & Pattern.LITERAL) != 0) s.append("LITERAL, ");
if ((this.flags & Pattern.UNICODE_CASE) != 0) s.append("UNICODE_CASE, ");
if ((this.flags & Pattern.UNIX_LINES) != 0) s.append("UNIX_LINES, ");
if (s.length() > 0) { // a StringBuffer never equals(""), so test the length instead
s.insert(0,"{");
s.replace(s.length()-2,s.length(),"");
s.append("}");
return(s.toString());
* public regular expression test harness
public class RegexTestHarness {
public static void main(String[] argv){
BufferedReader in = null;
try {
Options options = new Options("RegexTestHarness", argv);
//System.out.println(options);
in = new BufferedReader(new InputStreamReader(System.in));
System.out.println("RegexTestHarness");
System.out.println("----------------");
System.out.println();
System.out.println("usage: Enter your regex (none to exit), then the string to search.");
System.out.println("from: http://java.sun.com/docs/books/tutorial/essential/regex/index.html");
String regex = null;
while(true) {
try {
System.out.println();
System.out.print("regex: ");
regex = in.readLine();
if (regex.equals("")) break;
Pattern pattern = Pattern.compile(regex, options.flags);
System.out.print("string: ");
Matcher matcher = pattern.matcher(in.readLine());
if (options.verbose) System.out.printf("groupCount=%d%n", matcher.groupCount());
while (matcher.find()) {
System.out.printf("%d-%d:'%s'%n", matcher.start()+1, matcher.end(), matcher.group());
//start is a zero based offset, but one based is more meaningful to the user, Me.
} catch (PatternSyntaxException e) {
System.out.println("Pattern.compile("+regex+") " + e);
} catch (IllegalStateException e) {
System.out.println("matcher.group() " + e);
} //wend
} catch (IllegalArgumentException e) {
System.out.println(e);
} catch (Exception e) {
e.printStackTrace();
} finally {
try {in.close();} catch(Exception e){}
}
... I haven't got a clue if it's possible, but I want my ArgProcessor.getArgs method to return a hash (keyed on name) of Objects of the requested "mixed" types... for example a boolean, a String, and a String[].
I want the client code of my new fangled ArgProcessor to look something like this:
class testArgProcessor {
public static void main(String[] args) {
//usage testArgProcessor [-v] [-o outfile] file ...
try {
HashMap<Arguement> args = ArgProcessor.getArgs( args,
{ //hasArg value, letter, name, type, value, default
{hasArg.NONE, 'v', 'verbose', 'boolean', true, false}
, {hasArg.REQUIRED, 'o', 'outfile', 'String', null, null}
, {hasArg.ARRAY, '', 'filelist', 'String[]', null, null}
if (args.outfile != null) {
out = new BufferedWriter(......);
} else {
out = System.out;
for (String file : filelist) {
if (args.verbose) System.out.println("processingFile: " + file)
... process the file ...
} catch (IllegalArgumentException e) { //from ArgProcessor.getArgs()
System.out.println(e);
}
Paul,
What are you trying to do, and why?
Sorry I should have made myself a lot clearer...
What I'm really trying to do is learn Java, and good OO design... so I'm going through the Sun tutorials, and I see that the standard Pattern class has a few handy switches, so I wanted to expose them to the command line... which I did using the handy gnu.getopt library...
Are you trying to write a general purpose
command-line processing library?
Yes, I'm trying to write a general purpose command-line processing library, one that's "cleaner" to use than gnu.getopt.
I've been hacking away for a few hours and haven't gotten very far... what I have discovered is that the gnu.getopt class is in fact very clever (surprise surprise)... and my idea to "simplify" its usage leads to loss of flexibility. So, I'm starting to think I'm completely barking up the wrong tree... and that I was somewhat vainglorious thinking that I (a newbie) could improve upon it.
Are you trying to write a command-line app to do
pattern matching?
Yep, that too... That's where I started... with an example from Sun's tutorials... where it's used to parse a long series of patterns and strings, exploring Java's regex capabilities.
I think I'll just give up on "improving" on gnu.getopt... my options processing code is ugly, and so be it.
Thanx for your interest anyway.
Keith.
-
Hi people!
Help needed once again. Basically I have a buffered reader which I use to read a file. After I read a line I send it to a method to break it down into sentences by full stop, which is done using the regular expression pattern [.]. However, when the text is written as follows:
Berline wall falls down
by Peter.
It sends the first line and then the second line. What I need it to do is keep reading until it reaches a full stop. I've posted my code below. Any help would be much appreciated.
import java.io.*;
import java.util.*;
import java.util.regex.*;
public class BreakSentence
Vector sentence = new Vector(500);
public BreakSentence(String fileName)
try
BufferedReader input = new BufferedReader(new FileReader(fileName));
String line = input.readLine();
while(line!=null)
makeSentence(line); //Call method to split text file into sentences based on full stop.
line = input.readLine();
input.close();
catch(FileNotFoundException e)
System.out.println(e);
catch(IOException e2)
System.out.println(e2);
private void makeSentence(String a)
Pattern p = Pattern.compile("[.]");
// Split input with the pattern
String[] sentences = p.split(a);
for( int i=0; i < sentences.length; i++ )
if(sentences[i] == null || sentences[i].length() == 1)
else
//String noPunc = removePunctuation(sentences[i]); //remove puncuations
//System.out.println( "With punctuation: " + sentences[i] );
//System.out.println( "Without punctuation: " + noPunc );
//sentence.add( sentences[i].trim() );
//sentence.add(noPunc.trim());
sentence.add(sentences[i]);
public void printInformation()
for(int i =0; i<sentence.size(); i++)
System.out.println("Position"+ i+ " " +sentence.get(i) );
public static void main(String[]args)
BreakSentence x = new BreakSentence( "output2.txt" );
x.printInformation();
Please use code tags.
Check out Pattern.compile(String regex, int flags) (http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html#compile(java.lang.String,%20int)) and MULTILINE (http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html#MULTILINE)
-
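For the sentence-splitting question above, one possible approach (a sketch, not the answer given in the thread) is to carry unfinished text over from one line to the next and only emit a sentence when a full stop turns up. `SentenceJoiner` and `join` are hypothetical names; the lines are assumed to have been read already, for example with `BufferedReader.readLine()`.

```java
import java.util.ArrayList;
import java.util.List;

class SentenceJoiner {
    /** Joins consecutive lines with a space and cuts a sentence at every full stop. */
    static List<String> join(List<String> lines) {
        List<String> sentences = new ArrayList<>();
        StringBuilder pending = new StringBuilder();   // text not yet closed by a full stop
        for (String line : lines) {
            if (pending.length() > 0) pending.append(' ');
            pending.append(line.trim());
            int stop;
            while ((stop = pending.indexOf(".")) >= 0) {
                String sentence = pending.substring(0, stop).trim();
                if (!sentence.isEmpty()) sentences.add(sentence);
                pending.delete(0, stop + 1);           // drop the sentence and its full stop
            }
        }
        String tail = pending.toString().trim();       // trailing text without a full stop
        if (!tail.isEmpty()) sentences.add(tail);
        return sentences;
    }
}
```

This treats every "." as a sentence end, exactly as the original `[.]` split does, so abbreviations and decimal numbers would still be cut in the wrong place.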
RegEx Problem with flag COMMENTS
Hello,
I have the following Exception:
java.util.regex.PatternSyntaxException: Unclosed group near index 9
when my program is running with these flags:
Pattern patt = Pattern.compile("^(@#@.+)$", Pattern.MULTILINE | Pattern.COMMENTS);
but when I run this:
Pattern patt = Pattern.compile("^(@#@.+)$", Pattern.MULTILINE);
it works fine.
Any COMMENTS ;-) for this problem? The entire RegEx is much bigger. I want to comment it.
Thanks sacrofano
Hi,
thanks for your help, but it did not work.
I did not suggest anything that would work! I was trying to point out that the Javadoc says that everything from # to the end of the line is treated as a comment.
I run this
Pattern patt =
Pattern.compile("^(?:(@#@.+))$",(Pattern.COMMENTS));
So why, based on reading the Javadoc, would you expect this RE to compile? Everything after the # is treated as a comment, so your effective regular expression is "^(?:(@", which is obviously an invalid RE!
with same exception as above.
Is there a problem with the Flag Pattern.COMMENTS
No! RTFD.
-
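For the COMMENTS question above, the usual way to keep a literal # in a commented pattern is to escape it. The sketch below shows the poster's pattern with `\#` (written `"\\#"` in Java source) plus a trailing comment; `CommentsFlagDemo` and `firstMarkedLine` are names made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentsFlagDemo {
    // In COMMENTS mode whitespace is ignored and an unescaped '#' starts a
    // comment, so the literal '#' of the marker is written as \#.
    static final Pattern MARKED_LINE = Pattern.compile(
            "^ ( @ \\# @ .+ ) $   # a line starting with the @#@ marker",
            Pattern.MULTILINE | Pattern.COMMENTS);

    /** Returns the first line that starts with @#@, or null if there is none. */
    static String firstMarkedLine(String text) {
        Matcher m = MARKED_LINE.matcher(text);
        return m.find() ? m.group(1) : null;
    }
}
```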
Regex: Multiple pattern flags
hi,
in java.util.regex.Pattern, to create a new Pattern, i have to use compile(String pattern, int flags) method and i need to use it with several flags ... but how ?
is it something like :
Pattern.compile(pattern, Pattern.MULTILINE + Pattern.CASE_INSENSITIVE);
or
Pattern.compile(pattern, Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
or
Pattern.compile(pattern, Pattern.MULTILINE & Pattern.CASE_INSENSITIVE);
or ... something else ?
thanks in advance
I got the same problem
and I used
Pattern.compile(pattern, Pattern.MULTILINE + Pattern.CASE_INSENSITIVE);
or
Pattern.compile(pattern, Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
or
Pattern.compile(pattern, Pattern.MULTILINE & Pattern.CASE_INSENSITIVE);
but I can't get the desirable result.
-
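For the flag question above: the Pattern flags are bit masks, so they are combined with bitwise OR. Addition happens to give the same number when each flag appears once, while `&` of two different flags gives 0, which means no flags at all. A small sketch (the class and method names are only for illustration):

```java
import java.util.regex.Pattern;

class FlagCombos {
    /** Both flags set: the bitwise OR of the two masks. */
    static int both() {
        return Pattern.MULTILINE | Pattern.CASE_INSENSITIVE;
    }

    /** True if "^hello$" is found in a two-line input whose second line is HELLO. */
    static boolean matchesSecondLine(int flags) {
        return Pattern.compile("^hello$", flags).matcher("first\nHELLO").find();
    }
}
```

Only the combination with both bits set finds the upper-case second line; MULTILINE alone fails on case, and the `&` result fails on both counts.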
Tell me how much my regex sucks, and help me make it better
uncle alice,
can you look at this and see if you see anything wrong with it, or better yet, do you know of a better solution using regex?
following regex is used to extract all links from an html page (href, img src) both absolute and relative:
(?im)(?:(?:(?:href)|(?:src))[ ]*?=[ ]*?[\"'])(((?:http|https)(?::\\/{2}[\\w]+)(?:[\\/|\\.]?)(?:[^\\s\"]*))|((?:\\/{0,1}[\\w\\.]+)+))[\"']
String absolute = m.group(2);
String relative = m.group(3);
There's a lot of good material in that regex, pedagogically speaking. :D Although you've solved your problem another way, I'd like to comment on some common errors I see.
`(?im)` : For anyone who doesn't know, these are inline flags, whose effect is the same as the compiler flags CASE_INSENSITIVE and MULTILINE. But you don't need the multiline flag. People often assume they have to use that flag when they're searching for strings that may span line breaks, but all it does is change the meaning of the ^ and $ metacharacters. (By default, they match the beginning and end, respectively, of the target string; with the multiline flag set, they also match the beginning and end of logical lines within the target string.) You aren't using those anchors, so that flag is irrelevant.
`(?:(?:href)|(?:src))` : The outer set of parentheses is needed to contain the effect of the pipe, but the inner sets are just noise. In fact, most of the parens in your regex are unnecessary. Excessive grouping can significantly affect the performance of the regex if you really get carried away with it, although it takes quite a bit more than you've got here. The real problem is the visual complexity they add; regexes don't need any help in that department! :-/
`[ ]*?` : You don't need to put the space character in square brackets to match it, although doing so can make the regex a little easier to read. More importantly, an HTML tag can contain any whitespace character at that point, not just spaces, so you should use \\s instead. Also, you shouldn't use a reluctant quantifier unless the thing it's quantifying can match something you don't want it to. Since they're inherently slower than normal (greedy) quantifiers, you should take care not to use them where they aren't needed, which is the case here.
`(?:http|https)` : Whenever you have two alternatives, of which one is a prefix of the other, you should list the longer one first. The alternatives are tried in the order they're listed, so listing them in the wrong order can reduce the efficiency of your regex in much the same way that using reluctant quantifiers inappropriately can. It isn't a problem here, since the next thing the regex has to match is so definite (i.e., "://"), but you should get in the habit of following that rule. In this case, you can just make the final letter optional: `https?`
`\\/{2}` and `\\/{0,1}` : You don't need to escape the forward-slash in Java regexes; that's only necessary in languages like Perl and JavaScript that have language-level support for regexes. They use the forward-slash by default as the quoting character for regex literals, so they have to escape it for the same reason we have to escape the double-quote (but some languages also let you choose different quoting characters each time). Also, I agree with paulcw that the {2} just adds unnecessary complexity in this case. As for the '{0,1}', its meaning is exactly the same as '?', so why not use that instead?
`[\\/|\\.]` : First, you don't need any of those backslashes. The forward-slash is never special, and the period loses its special meaning inside the square brackets. The pipe is just a pipe, too, so your character class matches a slash, a pipe, or a period, which is probably not what you meant. You need to understand that character classes are like a language within a language. A regex is effectively a set of linear instructions: match this AND then this AND then this, etc. If you want to create an OR branch, you have to do so explicitly, using a pipe or a quantifier. But the semantics change drastically when you go inside the square brackets. Since a character class only matches one character at a time, AND is irrelevant and OR is implicit: match this character OR this one OR one from this range, etc. The only metacharacters that are needed in character classes are those that are used for set operations: the caret for NOT, hyphen for ranges, etc.; everything else is just a character.
If you'd like me to keep going, I'll need to know more about your exact intentions. Do you want the protocol (e.g., "http://") to be optional? What about the quotes around the URL? Finally, I don't understand what the final pipe in your regex is supposed to do, but I'm pretty sure it isn't working. :D
-
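Applying the advice in the reply above, a slimmed-down version of the link regex could look like the sketch below. It is not the regex uncle_alice would have written (the thread never gets that far). It assumes attribute values are always quoted, and `LinkExtractor`, `links` and `isAbsolute` are hypothetical names. Like any regex run over HTML, it is fragile and only meant to illustrate the simplifications.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractor {
    // href or src, optional whitespace around '=', then a quoted value.
    // One capturing group holds the URL; no MULTILINE flag is needed.
    static final Pattern LINK = Pattern.compile(
            "(?:href|src)\\s*=\\s*[\"']([^\"'\\s]+)[\"']",
            Pattern.CASE_INSENSITIVE);

    /** Collects every href/src value in document order. */
    static List<String> links(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    /** Absolute here simply means "starts with http:// or https://". */
    static boolean isAbsolute(String url) {
        return url.matches("(?i)https?://.*");
    }
}
```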
Getting more than one result from regex groups
Thanks to everyone in advance -
I cannot seem to figure out why I don't receive multiple groups back from this match. I would assume I would receive:
[hello]
[sam]
instead I am getting:
[hello]
[hello]
It seems like the regex stops after the first match is found, which leads me to believe that it has to do with some sort of flag -
String Format = "[hello] my name is [sam]";
String RegexPattern = "(\\[.*?\\])";
Pattern MyPattern = Pattern.compile(RegexPattern, Pattern.CASE_INSENSITIVE | Pattern.DOTALL );
Matcher MyMatcher = MyPattern.matcher(Format);
if(MyMatcher.find()) {
for(int i = 0; i <= MyMatcher.groupCount(); i++) {
out.print(MyMatcher.group(i) +"<br>");
}
}
Thanks,
Sam
Groups are a static concept. You only have one group.
while(MyMatcher.find()) {
out.print(MyMatcher.group(1) +"<br>");
}
-
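To spell out the reply above: `groupCount()` depends only on the number of parentheses in the pattern, so looping over groups after a single `find()` prints the same match twice (group 0 and group 1). Each further match needs another `find()` call. A self-contained sketch, with `BracketTokens` as a made-up name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketTokens {
    static final Pattern TOKEN = Pattern.compile("(\\[.*?\\])");

    /** Returns every [bracketed] token, one per successful find(). */
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {          // each call advances to the next match
            result.add(m.group(1));
        }
        return result;
    }
}
```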
Boost regex not working inside indesign plugin
Hi, While i was writing the below code in a separate project inside visual studio express, It works fine!
Now when I use the same code in an Adobe InDesign plugin, boost::regex_search fails.
I am not getting the exact reason...
Any idea for resolving this will be great help.
void MTSTestFunctions::ParseAllMarker(std::wstring& inText, std::vector<MarkerInfo> &outMarkerInfoVec)
std::wstring::const_iterator start = inText.begin();
std::wstring::const_iterator end = inText.end();
boost::wregex pattern(L"((<.*?>)|(\\[[^[].*?[^]]\\])|(\\[\\[.*?\\]\\]))");
boost::wsmatch what;
boost::match_flag_type flags = boost::match_default;
int32 index = 0;
try
while(boost::regex_search(start, end, what, pattern, flags))
MarkerInfo tmpMarkerInfo;
tmpMarkerInfo.mMarkerText.assign(what[0]);
tmpMarkerInfo.mStartIndex = (what.position() + index);
index += what.position();
tmpMarkerInfo.mEndIndex = (index += what.position());
tmpMarkerInfo.mMarkerTextLength = (index + what.length());
index += what.length();
// update search position:
start = what[0].second;
// update flags:
flags |= boost::match_prev_avail;
flags |= boost::match_not_bob;
catch(std::runtime_error ex)
Thanks
Hi,
If I modify the above code as below to use std::tr1::regex, then I get the same crash and error.
void MTSTestFunctions::ParseAllMarker(std::wstring& inTextO, std::vector<MarkerInfo> &outMarkerInfoVec)
std::wstring inText;
inText.assign(inTextO);
std::wstring::const_iterator start = inText.begin();
std::wstring::const_iterator end = inText.end();
std::tr1::wregex pattern(L"((<.*?>)|(\\[[^[].*?[^]]\\])|(\\[\\[.*?\\]\\]))");
std::tr1::wsmatch what;
std::tr1::regex_constants::match_flag_type flags = std::tr1::regex_constants::match_default;
int32 index = 0;
try
while(std::tr1::regex_search(start, end, what, pattern, flags))
MarkerInfo tmpMarkerInfo;
tmpMarkerInfo.mMarkerText.assign(what[0]);
tmpMarkerInfo.mStartIndex = (what.position() + index);
index += what.position();
tmpMarkerInfo.mEndIndex = (index += what.position());
tmpMarkerInfo.mMarkerTextLength = (index + what.length());
index += what.length();
// update search position:
start = what[0].second;
// update flags:
flags |= std::tr1::regex_constants::match_prev_avail;
//flags |= std::tr1::regex_constants::match_not_bob;
catch(std::runtime_error ex) -
Regex & java.util.Scanner
I am trying to make a simple txt parser using regular expressions, but a problem has appeared.
The program's code is too long, so I have posted only the part implementing the method data_types(), which doesn't work properly: it reads only two types, (String) and (Boolean). If someone could help me I would be very grateful.
Why doesn't the method read the rest of the data types in my data_xml.xml file?
Here is the code:
class SimpleScann{
enum PARSE{
TABLE_NAME("(\\w*)"),COLUMN_NAME("(\\w*\\Q(\\E)"),DATA_TYPE("(\\Q(\\E\\w*\\Q)\\E)");
private String $pattern;
PARSE(String pattern){
$pattern=pattern;
public String PATTERN(){
return $pattern;
static void data_types() throws Exception{
File parse_file= new File("data_type.txt");
Scanner scann_input = new Scanner(parse_file);
int flag= Pattern.CASE_INSENSITIVE;
Pattern pattern=Pattern.compile(PARSE.DATA_TYPE.PATTERN(),flag);
Matcher matcher=null;
while(scann_input.hasNextLine()){
matcher=pattern.matcher(scann_input.nextLine());
if(matcher.find()){
System.out.printf("%s\n",matcher.group());
public static void main(String args[])
try{
data_types();
}catch(Exception e){
e.printStackTrace();
and here is the data_type.txt:
<table > Table radi
ako su zatvoreni tagovi <>
<column>
Ime(String), Prezime(String), JMBG(Integer) ,
Enabled(Boolean)
<\column>
best regards,
Nikola
The reason you're only matching two items is because you're reading the file one line at a time and applying the regex once per line. As Tim said, you can fix that by using while instead of if, but the real problem is much deeper: you're trying to write a scanner in the sense of a lexical analyzer, and that isn't what java.util.Scanner is for. I strongly recommend you start over, this time using Pattern and Matcher directly, not Scanner. If you happen to have a copy of MRE 3ed, there's an example of what you're trying to do on page 400. (Unfortunately, Friedl has just moved back to Japan, and hasn't had time to update the book's web site, or I could point you to the code online.) I don't have time to go into this right now, but you should pay particular attention to the find(int) method and the \G anchor.
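As a starting point for the "Pattern and Matcher directly" suggestion above, here is a minimal sketch that applies one regex with two capturing groups to the whole text and loops with `while (find())`. It is not the lexer from the book mentioned in the reply and does not use `\G`. The class name `TypedColumns` and the `Name(Type)` input shape are assumptions based on the sample file in the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TypedColumns {
    // group 1: column name, group 2: type in parentheses
    static final Pattern COLUMN = Pattern.compile("(\\w+)\\s*\\((\\w+)\\)");

    /** Maps each column name to its declared type, in order of appearance. */
    static Map<String, String> columns(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = COLUMN.matcher(text);
        while (m.find()) {          // 'while', not 'if': several columns can share a line
            result.put(m.group(1), m.group(2));
        }
        return result;
    }
}
```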
-
Hello,
I have parsed a text file and want to use a Java regex pattern to get statuses like "warning" and "ok" (only an "ok" that follows a "warning" needs to be parsed). Does anyone have an idea how to find the ok that follows the warning status? Thanks in advance!
text example
121; test test; test0; ok; test test
121; test test; test0; ok; test test
123; test test; test1; warning; test test
124; test test; test1; ok; test test
125; test test; test2; warning; test test
126; test test; test3; warning; test test
127; test test; test4; warning; test test
128; test test; test2; ok; test test
129; test test; test3; ok; test test
java code:
String flag= "warning";
while ((line= bs.readLine()) != null) {
String[] tokens = line.split(";");
for(int i=1; i<tokens.length; i++){
Pattern pattern = Pattern.compile(flag);
Matcher matcher = pattern.matcher(tokens[i].trim()); // match one field, without its surrounding spaces
if(matcher.matches()){
// save into a list
}
}
}
Sorry, I'll try to explain it in more detail. I want to parse this text file and save statuses like "warning" and "ok" into a list. The point is that I only need the "ok" that follows a "warning": if there is a "test1 warning", then I am looking for "test1 ok".
121; content; test0; ok; 12444 <-- that i don't want to have
123; content; test1; warning; 126767
124; content; test1; ok; 1265 <-- that i need to have
121; content; test9; ok; 12444 <-- that i don't want to have
125; content; test2; warning; 2376
126; content; test3; warning; 78787
128; content; test2; ok; 877666 <-- that i need to have
129; content; test3; ok; 877666 <-- that i need to have
// here maybe a regex pattern could be deal with my problem
// if "warning|ok" then list all element with the status "warning and ok"
String flag= "warning";
while ((line= bs.readLine()) != null) {
String[] tokens = line.split(";");
for(int i=1; i<tokens.length; i++){
Pattern pattern = Pattern.compile(flag);
Matcher matcher = pattern.matcher(tokens[i].trim()); // match one field, without its surrounding spaces
if(matcher.matches()){
// save into a list
}
}
}
-
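For the warning/ok question above, a regex alone may not be the easiest tool, because the "ok" only counts if the same test name had a warning earlier. One possible sketch, assuming the five-field `id; content; name; status; value` layout from the sample, remembers warned names in a set. `WarningRecovery` and `recovered` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class WarningRecovery {
    /** Returns the "ok" lines whose test name had an earlier, still open "warning". */
    static List<String> recovered(List<String> lines) {
        Set<String> warned = new HashSet<>();        // names with an open warning
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s*;\\s*");
            if (fields.length < 4) continue;         // not a status line
            String name = fields[2];
            String status = fields[3];
            if (status.equals("warning")) {
                warned.add(name);
            } else if (status.equals("ok") && warned.remove(name)) {
                result.add(line);                    // ok that follows a warning
            }
        }
        return result;
    }
}
```

Removing the name once its "ok" is seen means a later "ok" for the same test is ignored unless another warning comes first; whether that is the wanted behaviour is a guess.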
Java regex stop after first occurrence
When using code like the following:
while (matcher.find()) {
string1=matcher.group(1).trim();
System.out.println(charset);
}
the program goes on looking all through the input string and prints out the final match.
What should be done to find the first occurrence and to stop searching through the input string after the first match has been found? i.e. I want to exit the while loop after the first match is found.
The first .* in your regex matches as much as it can at first, and because you used the DOTALL flag, it's able to gobble up the whole remaining string. Then it starts backtracking, trying to match the rest of the regex, and it has to backtrack almost all the way to the beginning of the string again before it gets back to the META tag where it's supposed to match (unless it finds a false match elsewhere first). That's just an example of greedy quantifiers at work; by calling it a loop you sent us all barking up the wrong scent trail.
Making that dot-star reluctant is not the solution though; the regex would then match everything from the first occurrence of "<meta" to the first occurrence of "charset", where "charset" could be in a separate META tag or just hanging loose later in the string. Getting rid of the DOTALL flag might restrict the match to just one META tag, but you can't count on that. Try this:
REGEX = "<meta\\s[^<>]*?charset=([^\\s\"]+)";
pattern = Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE); // the only flag you need
Also, if you aren't familiar with this website, you'll probably find it useful:
http://www.regular-expressions.info/ -
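To stop after the first occurrence, as asked above, a single `find()` in an `if` (or a conditional expression) is enough. The sketch below wraps the regex from the reply in a small helper; `CharsetSniffer` and `firstCharset` are names invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharsetSniffer {
    // The regex suggested in the reply, with CASE_INSENSITIVE as the only flag.
    static final Pattern META_CHARSET = Pattern.compile(
            "<meta\\s[^<>]*?charset=([^\\s\"]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the charset of the first matching META tag, or null if none is found. */
    static String firstCharset(String html) {
        Matcher m = META_CHARSET.matcher(html);
        return m.find() ? m.group(1) : null;   // one find() call, no loop
    }
}
```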
RegEx to ID file path that uses Wildcard appropriately
I am looking for a regex to identify paths with wildcards, with wildcards only allowed in the file name and extension portion of the path. So both of these should return true
C:\Path\*.*
\\SRV\Path\*.txt
However, \\*\Path\*.* should return false since a wildcard anywhere but the end is invalid.
My Google Fu has failed me, as all my searches are coming up with discussions of wild cards in the RegEx, rather than * occurring in a specific pattern.
To give some context, I have a Copy function that is going to take Source and Destination arguments, as well as optional Breadcrumb and Overwrite arguments, and it will handle the actual copy differently depending on the source being a file, folder or wildcard
and destination being a file or folder. And of course it will return an error in the log if the destination is a wildcard.
I couldn't figure out a way to do it with regex, although I did learn a lot about regex in the process. For anyone else that might find this thread, here is the best link I've found explaining regex:
http://www.freeformatter.com/regex-tester.html
Unfortunately, it wasn't good enough (or I'm just not smart enough) to puzzle this one out. The following code works, but is obviously much less elegant than a straight up regex. I post it only because I spent the time on it and if it can ever possibly help anyone, it was worth the effort:
$a = "\\path\folder\job*.*","\\path\fo*lder\star.tar","c:\path\folder\bo*lder\f*","c:\blah\blah.txt","\\blah\blagaa\blahgahah\a\f\*"
Foreach ($target in $a) {
$flag = $null
$path = $target -split ("\\")
$count = $path.count
If ($path[$path.count -1] -match "\*") {
Foreach ($index in 0..($path.count -2)) {
If ($path[$index] -match "\*") {$flag = 1}
}
If (!($flag)){$target}
}
}
PS C:\Users\user> $a = "\\path\folder\job*.*","\\path\fo*lder\star.tar","c:\path\folder\bo*lder\f*","c:\blah\blah.txt","\\blah\blagaa\blahgahah\a\f\*"
PS C:\Users\user> Foreach ($target in $a) {
>> $flag = $null
>> $path = $target -split ("\\")
>> $count = $path.count
>> If ($path[$path.count -1] -match "\*") {
>> Foreach ($index in 0..($path.count -2)) {
>> If ($path[$index] -match "\*") {$flag = 1}
>> }
>> If (!($flag)){$target}
>> }
>> }
>>
\\path\folder\job*.*
\\blah\blagaa\blahgahah\a\f\*
That being said, I'm eager to see the 10 character or less regex expression to do it that someone is sure to post soon and make me feel silly.
I hope this post has helped! -
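For the wildcard-path question above, one regex that appears to do the job is: no `*` anywhere before the last backslash, then at least one `*` in the final segment. The sketch is in Java rather than PowerShell, only treats `*` as a wildcard (not `?`), and uses the made-up names `WildcardPath` and `hasValidWildcard`. The regex itself, without Java string escaping, is `^[^*]*\\[^\\]*\*[^\\]*$`.

```java
import java.util.regex.Pattern;

class WildcardPath {
    // [^*]*      everything up to the last backslash, with no '*'
    // \\         the last backslash
    // [^\\]*\*   the final segment, containing at least one '*'
    // [^\\]*$    the rest of the final segment (no further backslash)
    static final Pattern TRAILING_WILDCARD =
            Pattern.compile("^[^*]*\\\\[^\\\\]*\\*[^\\\\]*$");

    /** True if the path has a '*' in its last segment and nowhere else. */
    static boolean hasValidWildcard(String path) {
        return TRAILING_WILDCARD.matcher(path).matches();
    }
}
```

On the five sample paths from the PowerShell post this accepts the same two that the loop prints.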
Cyclomatic Complexity Using Regex
/* Cyclomatic Complexity Program */
/* Program does not ignore comments in pattern */
/* Program looks for 1 pattern keywords then moves down a line for next search */
/* java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html */
/* Using Java Regular Expression Class */
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.io.*;
class Cyclomatic
{
    // uses the java.io.*
    static BufferedReader keyboard = new BufferedReader(new InputStreamReader(System.in));
    protected String txtFileName;
    protected int count = + 1; // start count at one then no need to add 1 to count!
    private static BufferedReader reader; // Uses the java.io.*;
    public void Cyclomatic()
    {
        try {
            // open the file
            System.out.println("------------------------------" );
            System.out.println("CYCLOMATIC COMPLEXITY PROGRAM " );
            System.out.println("------------------------------\n\n" );
            System.out.println("Enter file name to be read: " );
            // Read the name of the text file from the keyboard
            txtFileName = new String(keyboard.readLine());
            System.out.println("\n \n");
            // the buffered reader object
            reader = new BufferedReader(new FileReader(txtFileName));
            // Create a pattern object and split the key words using pipes |||
            Pattern pattern = Pattern.compile("if|for|while|case|switch",Pattern.MULTILINE);
            Matcher m = pattern.matcher(txtFileName);
            boolean b = m.matches(); // return true if match found !
            String line = null;
            while((line = reader.readLine()) !=null)
            {
                m.reset(line);
                // only the first keyword on each line is counted
                if(m.find())
                {
                    count = count +1;
                    System.out.println("KeyWord " + " found " + " start of line: " + m.start() + " ends at line: " + m.end() + " Keyword count = "+ count);
                }
            }
            reader.close(); // close buffered reader!
            if(count >10)
                System.out.println("\n \nThis program according to McCabe has a COMPLEXITY OF: " + count +" \n");
            else
                System.out.println("\n \n This program is NOT COMPLEX \n \n");
        }
        catch(IOException e)
        {
            System.out.println("Error : " + e.getMessage());
        }
    }
    // Run the thing!
    public static void main(String[]args)
    {
        // Create object Complex
        Cyclomatic Complex = new Cyclomatic();
        Complex.Cyclomatic();
    }
}
Does anyone have ideas on how to improve this program so that it
ignores keywords inside comments? At present it finds the first keyword
on a line and then jumps down to the next line to search. I know how to
implement this program using the StreamTokenizer class with its slashStar
comments option; I'm just interested in this alternative that I thought of.
It works fine when given the following test file. I just want to iron out
the few remaining problems.
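For comparison, the StreamTokenizer route mentioned above might look roughly like the sketch below. This is not the poster's code: the class name TokenizerKeywordCounter and the method countKeywords are made up for illustration, and the tokenizer settings are one reasonable guess at a setup. It counts every keyword, not just the first one on each line, and it treats the input only approximately as Java source.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not from the original post.
final class TokenizerKeywordCounter {
    private static final Set<String> KEYWORDS =
        new HashSet<>(Arrays.asList("if", "for", "while", "case", "switch"));

    // Counts decision keywords outside comments and quoted strings.
    public static int countKeywords(String source) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(source));
        st.ordinaryChar('/');        // by default a lone '/' starts a comment; treat it as a plain character
        st.slashSlashComments(true); // skip // comments
        st.slashStarComments(true);  // skip /* ... */ comments
        st.wordChars('_', '_');      // keep identifiers such as if_ready in one token
        int count = 0;
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                // Quoted strings come back with ttype set to the quote character, so they are ignored here.
                if (st.ttype == StreamTokenizer.TT_WORD && KEYWORDS.contains(st.sval)) {
                    count++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = "if (a / b > 1) { while (x) {} }\n"
                      + "/* for case */ // switch\n"
                      + "s = \"if\"; case 1:";
        // Complexity is the keyword count plus one, as in the program above.
        System.out.println("Complexity: " + (countKeywords(sample) + 1));
    }
}
```

Because the tokenizer returns whole words, an identifier such as "different" is not mistaken for "if", which a bare if|for|while pattern would do.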
Test File:
// SAVE AS A TEXT FILE AND OPEN WITH PROGRAM //
// Cyclomatic Complexity for this file is 17 //
1. if
2. if
3. while
4. for
5. if
6. case
7. case
8. if
9. switch
10. for
11. if
12. while
13. if
14. if
no
dont
count
this
15. if
16. for
Gives correct CC for this layout.
Please use [code] tags when posting source code.
End-of-line comments are easy to handle, but the multiline varieties complicate the task quite a bit. They can span multiple lines, but they don't have to, and keywords can occur after the end of a multiline comment. Since you're reading the file line-by-line, you need to use a flag to handle comments that actually span multiple lines. For the rest, you've got capturing groups and the find(int) method:
// Pattern for keywords and the start of comments
Pattern p1 = Pattern.compile("(/\\*)|(//)|(if|for|while|case|switch)");
Matcher m1 = p1.matcher("");
// Pattern for the end of multiline comments
Pattern p2 = Pattern.compile("\\*/");
Matcher m2 = p2.matcher("");
boolean inComment = false;
int lineNum = 0;
String line = null;
while ((line = reader.readLine()) != null) {
  lineNum++;
  int startAt = 0;
  if (inComment) {
    // In multiline comment; see if it ends in this line
    if (m2.reset(line).find()) {
      inComment = false;
      startAt = m2.end();
    } else {
      continue;
    }
  }
  m1.reset(line);
  while (m1.find(startAt)) {
    if (m1.start(1) != -1) {
      // Start of multiline comment
      if (m2.reset(line).find(m1.end())) {
        // If it ends in this line, we'll keep looking for keywords
        startAt = m2.end();
      } else {
        // ...otherwise, just set the flag
        inComment = true;
        break;
      }
    } else if (m1.start(2) != -1) {
      // End-of-line comment
      break;
    } else {
      // It's a keyword
      count++;
      // If you aren't using Java 5, go back to the old way
      System.out.printf("Keyword found in line %2d at position %2d; Keyword count = %2d\n",
                        lineNum, m1.start(), count);
      // We only care about the first one
      break;
    }
  }
}
Here's the test data I used:
1. if
2. if
3. while
4. don't count this // switch
5. for
6. if
7. don't count this /* case
8. for */ ...or this
9. case
10. /* yes count this */ case
11. if
12. switch
13. for
14. if
15. while
16. if
17. if
18. no
19. dont
20. count
21. this
22. if
23. for -
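If the whole file can be read into one string, a variation on the same idea is to let a single pattern consume comments and keywords in one pass, so no inComment flag is needed. The sketch below is an assumption-laden alternative, not the code from the reply above: the class name CommentAwareKeywordCounter and the method countKeywords are invented for the example, it counts every keyword instead of only the first per line, and it adds \b word boundaries so that "different" is not counted as "if". It does not try to handle keywords inside string literals.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the original thread.
final class CommentAwareKeywordCounter {
    // Group 1: a block comment (or an unterminated one running to the end of input).
    // Group 2: an end-of-line comment.
    // Group 3: a keyword standing as a whole word.
    private static final Pattern TOKEN = Pattern.compile(
        "(/\\*.*?(?:\\*/|\\z))|(//[^\\n]*)|\\b(if|for|while|case|switch)\\b",
        Pattern.DOTALL);

    // Counts keywords that are not inside a comment.
    public static int countKeywords(String source) {
        int count = 0;
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            if (m.group(3) != null) {
                count++; // comments match groups 1 or 2 and are simply skipped
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = "if (a) { for (;;) {} } // if while\n"
                      + "/* case\n switch */ case 1: different";
        // Complexity is the keyword count plus one, as in the original program.
        System.out.println("Complexity: " + (countKeywords(sample) + 1));
    }
}
```

The reluctant .*? together with DOTALL lets a block comment span line breaks while still stopping at the first closing delimiter.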
Hi all,
I want a regular expression for the following script:
<script language="javascript1.1" type="text/javascript">
<!--
cmCreatePageviewTag("checkout2/shippinginfo.tmpl", "850");
//-->
</script>
I want to print everything between <script> and </script>. I wrote a regular expression like this:
<script.*>.*</script>
but it does not work; it only prints the first line. It should print the whole script. I think the problem is with the newline characters. Can anyone please tell me what the regular expression should be? It should work for any script.
Please reply as soon as possible.
Thanks,
Vinayak
http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html
There you'll find things that could be of some help, such as:
- DOTALL flag
- reluctant quantifiers
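Put together, those two hints might look like the sketch below. It is only an illustration: the class name ScriptBlockFinder and the method findScripts are made up, and CASE_INSENSITIVE is an extra assumption so that <SCRIPT> is matched too. Pattern.DOTALL lets the dot match line terminators, and the reluctant .*? stops at the first </script> instead of running on to the last one in the page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the original thread.
final class ScriptBlockFinder {
    // [^>]* consumes the attributes of the opening tag; .*? is the reluctant body match.
    private static final Pattern SCRIPT = Pattern.compile(
        "<script\\b[^>]*>.*?</script>",
        Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Returns every <script>...</script> block, tags included, in document order.
    public static List<String> findScripts(String html) {
        List<String> blocks = new ArrayList<>();
        Matcher m = SCRIPT.matcher(html);
        while (m.find()) {
            blocks.add(m.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        String html = "<p>x</p><script language=\"javascript1.1\" type=\"text/javascript\">\n"
                    + "<!--\n"
                    + "cmCreatePageviewTag(\"checkout2/shippinginfo.tmpl\", \"850\");\n"
                    + "//-->\n"
                    + "</script><script>b</script>";
        for (String block : findScripts(html)) {
            System.out.println(block);
            System.out.println("----");
        }
    }
}
```

A regex like this is fine for simple pages; for arbitrary HTML a real parser is usually the safer choice.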