Util.regex matcher.groupCount()

Hello all. I am trying to parse some text using regex. What I am parsing may have one or more matches per line, and I need access to each match independently. The code shown below finds all the matches, except that m.groupCount() always returns 0, so I can't do anything with the individual matches. How can I get groupCount() to work properly?
Thanks in advance.
if (currentLine.startsWith("LOCUSLINK")) {
    line++;
    String pattern = "[0-9]+";
    Pattern p = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(currentLine);
    while (m.find()) {
        int count = m.groupCount();
        for (int x = 0; x <= m.groupCount(); x++)
            System.out.println(line + "=" + x + "=" + count);
    }
}

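There are no capturing groups in your pattern, and you don't really need any in this case. groupCount() returns the number of capturing groups in the pattern, not the number of matches found.
Try this simple way: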
String re = "\\d+";
Matcher m = Pattern.compile(re).matcher(anyString);
for (int j=1; m.find(); j++) {
System.out.println("matching " + j + ": " + m.group(0));
..
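To make the difference concrete, here is a minimal sketch (the class name, helper names and sample line are made up for illustration). It shows that groupCount() depends only on the parentheses in the pattern, while the individual matches come from calling find() repeatedly:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    // Collects every match of the regex in the input, one list entry per find().
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group()); // group() is the same as group(0): the whole match
        }
        return matches;
    }

    // Number of capturing groups declared in the pattern; the input plays no part.
    static int groupsIn(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        String line = "LOCUSLINK 1234 5678 90"; // hypothetical sample line
        System.out.println(groupsIn("[0-9]+"));          // pattern without parentheses
        System.out.println(groupsIn("([0-9]+)-(\\w+)")); // pattern with two groups
        System.out.println(findAll("[0-9]+", line));
    }
}
```

If each match needs further processing, the list returned by findAll can be indexed like any other List.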

Similar Messages

[bug]Jdev 11g:NullPointerException at java.util.regex.Matcher.getTextLength

Hi,
Jdev 11.1.1.0.31.51.56
If any of you get the following stack trace when running a jspx that uses ViewCriteriaRow.setOperator, bug 7534359 and Metalink note 747353.1 are available:
java.lang.NullPointerException
at java.util.regex.Matcher.getTextLength(Matcher.java:1140)
at java.util.regex.Matcher.reset(Matcher.java:291)
at java.util.regex.Matcher.<init>(Matcher.java:211)
at java.util.regex.Pattern.matcher(Pattern.java:888)
at oracle.adfinternal.view.faces.model.binding.FacesCtrlSearchBinding._loadFilterCriteriaValues(FacesCtrlSearchBinding.java:3695)
Truncated. see log file for complete stacktrace
Workaround:
If you use
 vcr.setAttribute("Job",job);
or
 vcr.setAttribute("Job","="+job);
then add the following line of code:
 vcr.setOperator("Job","=");
Regards,
Peter

Hi,
It is useful to mention that this happens when setting the equals operator or the LIKE operator:
vcr.setAttribute("Job","= '"+job+"'");
or
vcr.setOperator("Job","=");
Frank

Parsing xhtml using java.util.regex

I am parsing an XHTML file using the java.util.regex package and I am perplexed at why the following doesn't work.
The lines I wish to match are either like this:
Some String.</td>
or
Some String.</td>
The code I use to try to achieve this is:
Pattern somePattern = Pattern.compile(".*()?(.*)[.]()?</td>.*");
String s = null;
while ((s = br.readLine()) != null) {
    // Keep the Matcher so that the groups can be read after a successful match
    Matcher eventMatcher = somePattern.matcher(s);
    if (eventMatcher.matches()) {
        System.out.println("0:" + eventMatcher.group(0));
        System.out.println("1:" + eventMatcher.group(1));
        System.out.println("2:" + eventMatcher.group(2));
        System.out.println("3:" + eventMatcher.group(3));
    }
}
I expect to get as output
0:Some String.</td>
1:
2:Some String
3:
or
0:Some String.</td>
1:null
2:Some String
3:null
depending on which lines provide the match as mentioned above. Instead I get:
0:Some String.</td>
1:null
2:(empty string)
3:
or
0:Some String.</td>
1:null
2:(empty string)
3:null
Any ideas? Thanks in advance.

Consider the terms of ".*()?(.*)[.]()?</td>.*"
.* - greedily collect characters
()? - optionally collect information that will always have been matched by the previous .* pattern, so it will be empty.
(.*) - greedily collect characters that will also have been swallowed by the first .*, so it will be empty
[.] - a single .
()? - an optional group
</td> - must be there
.* - collect the rest of the characters in the line.
Therefore in general groups 1 and 2 will be empty because the first .* will have collected the information you wanted to capture!
You could just make the first .* non-greedy by using .*? but this may fail for other reasons.
So, in general terms, what are you trying to extract?
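As a rough illustration of the breakdown above, here is a sketch under a loud assumption: the markup in the question did not survive, so the <td> and <b> tags below are stand-ins, and the class and method names are invented. The first pattern keeps the leading ".*" and leaves group 2 empty; the second replaces it with a literal anchor and a reluctant group:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TdCaptureDemo {
    // Assumed markup: the cell text may or may not be wrapped in <b>...</b>.
    private static final Pattern GREEDY =
            Pattern.compile(".*(<b>)?(.*)[.](</b>)?</td>.*");
    // A literal "<td>" instead of a leading ".*" leaves the text for group 2.
    private static final Pattern ANCHORED =
            Pattern.compile("<td>(<b>)?(.*?)[.](</b>)?</td>");

    // Group 2 under the original-style pattern, which must match the whole line.
    static String greedyGroup2(String line) {
        Matcher m = GREEDY.matcher(line);
        return m.matches() ? m.group(2) : null;
    }

    // Group 2 under the anchored pattern, searched for anywhere in the line.
    static String anchoredGroup2(String line) {
        Matcher m = ANCHORED.matcher(line);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String line = "<td><b>Some String.</b></td>"; // hypothetical input line
        System.out.println("[" + greedyGroup2(line) + "]");
        System.out.println("[" + anchoredGroup2(line) + "]");
    }
}
```

Whether an anchor like "<td>" is available depends on the real file, which is why the question of what exactly needs extracting still matters.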

Using the java.util.regex package

I am trying to use the regex package to grep out portions of a string that match a regular expression. Some code from my class
 String fileList = "1_tmp.txt 2_tmp.txt 3_tmp.txt 1_inpt.txt 2_inpt.txt 3_inpt.txt 1_out.txt 2_out.txt 3_out.txt";
 String regex = "[0-9].*_out.txt" ;
 Pattern pat = Pattern.compile(regex);
 Matcher m = pat.matcher(fileList);
 boolean succ = m.find();
 for (int i =0; i< m.groupCount(); i++) {
 System.out.println(m.group(i));
 }
I was expecting to see
1_out.txt
2_out.txt
3_out.txt
This doesn't work this way; nothing gets printed out. Does anybody have any idea what is wrong here?
I am basically trying to get the same functionality as
ls | grep "*_out.txt"
Also is there any way to do a OR in the regular expression, like
match [0-9]*_out.txt OR [0-9]*_out.txt
Thanks
-kn

I don't see a difference in your OR, but look in the Pattern JavaDocs under logical operators.
Second, groups are not appropriate in your situation. You have a single pattern which you wish to repeatedly find. Groups are used when you wish to pick out one subsection of a matched item. For example, a pattern dealing with a phone number might have three groups, the area code, the prefix, and the rest.
Still, group 0 is by definition the entire matched item, so if your pattern was right, it should have worked.
I think the critical piece is the ".*" section. * is a greedy quantifier, and combined with the 'any character' marker it will eat as many characters as it can. In this case it runs to the end of the string and then backs off only as far as the last "_out.txt", so a single match swallows almost the whole list. Try using *?, which is a reluctant quantifier and will eat the minimum number of characters it can.
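For what it's worth, here is a sketch of one way to get the grep-like result (class and method names are invented for the example). Two things are worth noting: in the posted loop, groupCount() is 0 because the pattern has no parentheses, so the loop body never runs; and even a reluctant ".*?" would still start matching at the first digit in the list. Using \d+ keeps each match inside one file name, and (?:a|b) is the OR asked about:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OutFileFinder {
    // Returns every substring of the input that matches the regex, in order.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // \d+ cannot cross a space or an underscore; \. matches a literal dot.
    static List<String> outFiles(String fileList) {
        return findAll("\\d+_out\\.txt", fileList);
    }

    // Alternation: accepts either suffix.
    static List<String> outOrInputFiles(String fileList) {
        return findAll("\\d+_(?:out|inpt)\\.txt", fileList);
    }

    public static void main(String[] args) {
        String fileList = "1_tmp.txt 2_tmp.txt 3_tmp.txt 1_inpt.txt 2_inpt.txt "
                + "3_inpt.txt 1_out.txt 2_out.txt 3_out.txt";
        for (String name : outFiles(fileList)) {
            System.out.println(name);
        }
    }
}
```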

Doubt in Regular Expressions : java.util.regex

I want to identify words starting with capital letters in a sentence and I want to replace the matched word with "#" added in front of it.... For example, if my input sentence is
"Christopher Williams asked Mike Muller a question"
my output should be,
"#Christopher #Williams asked #Mike #Muller a question"
How do I do that using java.util.regex ?
In Perl we can use "back references" in the "replacement string". Perl's replacement accepts back references, whereas Java's replacement method accepts only strings....
Please help me.....

Your replacement is swallowing the space before the uppercase character, and won't match at the beginning of the line.
Also, it's unnecessarily verbose. String has a replaceAll method (that calls the same methods of Pattern and Matcher under the covers):
sentence = s.replaceAll("(^| )([A-Z])", "$1#$2");
Disclaimer: I'm no prome, sabre or u/a :-) That can probably be simplified.
db
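A possible simplification, sketched with made-up class and method names: Java's replacement string does accept back references, written $1, $2 and so on, and a word boundary (\b) can stand in for the "start of line or space" group:

```java
class CapitalTagger {
    // \b matches at the start of a word, so no leading space has to be captured.
    static String tag(String sentence) {
        return sentence.replaceAll("\\b([A-Z])", "#$1");
    }

    // The version from the reply above, kept for comparison.
    static String tagWithSpaceGroup(String sentence) {
        return sentence.replaceAll("(^| )([A-Z])", "$1#$2");
    }

    public static void main(String[] args) {
        String input = "Christopher Williams asked Mike Muller a question";
        System.out.println(tag(input));
        System.out.println(tagWithSpaceGroup(input));
    }
}
```

The \b version also tags a capitalised word that follows punctuation such as an opening parenthesis, which may or may not be what is wanted.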

Regular expressions with java.util.regex

Hello Guys,
I wrote last time this
/*
 * Uses split to break up a string of input separated by
 * commas and/or whitespace.
 * See: http://developer.java.sun.com/developer/technicalArticles/releases/1.4regex/
 * Change: I have slightly modified the source
 */
import java.util.regex.*;
public class Splitter {
    public static void main(String[] args) throws Exception {
        // Create a pattern to match breaks
        Pattern p = Pattern.compile("[<>\\s]+");
        // Split input with the pattern
        String[] result =
            p.split("<element attributname1 = \"attributwert1\" attributname2 = \"attributwert2\">");
        for (int i = 0; i < result.length; i++)
            if (result[i].equals(""))
                System.out.println("EMPTY");
            else
                System.out.println(result[i]);
        int res = result.length - 1;
        System.out.println("\nStringlaenge: " + res);
    }
}
I wonder why I got an empty element in result[0]. Does anyone have an idea?
We'll come together next time
... Ünhan Inay
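On the empty first element: the input starts with "<", which itself matches the delimiter pattern, and split returns the (empty) text in front of that first delimiter. Trailing empty strings are dropped, which is why the closing ">" causes no such entry. A small sketch, with invented class and method names and a shortened input, of the behaviour and of one possible way around it:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitDemo {
    private static final Pattern BREAKS = Pattern.compile("[<>\\s]+");

    // Plain split: a delimiter at position 0 yields a leading empty string.
    static String[] rawSplit(String input) {
        return BREAKS.split(input);
    }

    // Removes a delimiter run at the very start, then splits as before.
    static String[] splitSkippingLeadingBreaks(String input) {
        return BREAKS.split(input.replaceFirst("^[<>\\s]+", ""));
    }

    public static void main(String[] args) {
        String tag = "<element a = \"b\">"; // shortened, hypothetical input
        System.out.println(Arrays.toString(rawSplit(tag)));
        System.out.println(Arrays.toString(splitSkippingLeadingBreaks(tag)));
    }
}
```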

What is wrong with this Pattern?
Pattern p = Pattern.compile("^<[a-zA-Z0-9_\\"=]+[\\s]*$>");
This time I used the following split:
p.split("<element attributname1=\"attributwert1\" attributname2=\"attributwert2\">");
I got a compilation error:
U:\qms_neu\htdocs\inay\Source\myWork\Regex-Samples>javac Splitter.java
Splitter.java:14: illegal start of expression
Pattern p = Pattern.compile("^<[a-zA-Z0-9_\\"=]+[\\s]*$>");
^
Splitter.java:14: illegal character: \92
Pattern p = Pattern.compile("^<[a-zA-Z0-9_\\"=]+[\\s]*$>");
^
Splitter.java:14: illegal character: \92
Pattern p = Pattern.compile("^<[a-zA-Z0-9_\\"=]+[\\s]*$>");
^
Splitter.java:14: unclosed string literal
Pattern p = Pattern.compile("^<[a-zA-Z0-9_\\"=]+[\\s]*$>");
^
Splitter.java:17: ')' expected
p.split("<element attributname1=\"attributwert1\" attributname2
=\"attributwert2\">");
^
Splitter.java:14: unexpected type
required: variable
found : value
Pattern p = Pattern.compile("^<[a-zA-Z0-9_\\"=]+[\\s]*$>");
^
Splitter.java:18: cannot resolve symbol
symbol : variable result
location: class Splitter
for (int i=0; i<result.length; i++)
^
Splitter.java:19: cannot resolve symbol
symbol : variable result
location: class Splitter
if (result.equals("")){
^
Splitter.java:21: cannot resolve symbol
symbol : variable result
location: class Splitter
System.out.println(result[0]);
^
Splitter.java:24: cannot resolve symbol
symbol : variable result
location: class Splitter
System.out.println(result[i]);
^
Splitter.java:25: cannot resolve symbol
symbol : variable result
location: class Splitter
int res = result.length - 1;
^
11 errors
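The first errors in that log point at the \\" inside the string literal: in Java source, \\ is one escaped backslash, so the double quote that follows it ends the literal early, and the remaining errors follow from that. A double quote inside a literal is written \". The sketch below is only a guess at the intent (a tag made of names, quotes, equals signs and whitespace); the class name is invented, the whitespace has been folded into one character class, and the "$" has been moved to the end, since placed before ">" it could not match:

```java
import java.util.regex.Pattern;

class QuoteInPattern {
    // \" is the Java escape for a double quote; the regex engine sees a plain ".
    // \\s is the Java escape for the regex token \s (whitespace).
    static final Pattern TAG = Pattern.compile("^<[a-zA-Z0-9_\"=\\s]+>$");

    // True when the whole string looks like <...> with only the allowed characters.
    static boolean isTag(String s) {
        return TAG.matcher(s).matches();
    }

    public static void main(String[] args) {
        String tag = "<element attributname1=\"attributwert1\" attributname2=\"attributwert2\">";
        System.out.println(isTag(tag));
        System.out.println(isTag("element"));
    }
}
```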

Util.regex.Pattern documentation

The 1.5 documentation for util.regex.Pattern defines quantifiers that are greedy, reluctant, or possessive. The definitions of these quantifiers seem to be the same. For example, X?, X??, and X?+ are each defined as "X, once or not at all." Is this a mistake? If not, what's that difference among greedy, reluctant, and possessive?

It's not a mistake, it's just incomplete. A normal (greedy) quantifier matches as many times as it can, but will back off if necessary to achieve an overall match. A reluctant quantifier matches the minimum number of times that it has to, and only tries to match more if that's the only way to achieve an overall match. A possessive quantifier matches as many times as it can and never backs off, even if that makes an overall match impossible. Here's a demonstration:
import java.util.regex.*;
public class Test
{
  public static void main(String[] args)
  {
    String input = "XXXXX";
    Pattern p1 = Pattern.compile("(X+)(X+)");
    Pattern p2 = Pattern.compile("(X+?)(X+)");
    Pattern p3 = Pattern.compile("(X++)(X+)");
    Matcher m = p1.matcher(input);
    if (m.matches())
       System.out.println("p1:\t" + m.group(1) + "\t" + m.group(2));
    m = p2.matcher(input);
    if (m.matches())
       System.out.println("p2:\t" + m.group(1) + "\t" + m.group(2));
    m = p3.matcher(input);
    if (m.matches())
       System.out.println("p3:\t" + m.group(1) + "\t" + m.group(2));
  }
}
p1:     XXXX    X
p2:     X       XXXX
In p1, the X+ in the first group initially matches all five X's, then hands off to the second group. The X+ there has to match at least one X, but there are none left. So the first group gives up one of its X's, the second group matches it, and Bob's your uncle.
In p2, the X+? has to match at least one X, so it does, then hands off to the second group, which happily gobbles up the rest of the input.
In p3, the X++ matches all the X's, but refuses to back off and give the X+ in the second group the one X it needs, so the match fails.
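The same distinction can be shown for the three "?" forms the question asks about. This is a small sketch with invented class and helper names; the patterns are chosen so that the only difference between them is the kind of quantifier:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalQuantifiers {
    // True when the regex matches the entire input.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Text captured by group 1 when the regex matches the entire input, else null.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Greedy X? takes the X, then gives it back so the final X can match.
        System.out.println(wholeMatch("X?X", "X"));
        // Possessive X?+ takes the X and keeps it, so the final X has nothing left.
        System.out.println(wholeMatch("X?+X", "X"));
        // Greedy takes one X if it can; reluctant takes none unless it must.
        System.out.println("[" + firstGroup("(X?)(X*)", "XX") + "]");
        System.out.println("[" + firstGroup("(X??)(X*)", "XX") + "]");
    }
}
```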

WHERE to find java.util.regex.* package?

Does anyone know where to obtain a copy of the java.util.regex.* package (separate package)? This is a new package included in version 1.4.0.

Does anyone know where to obtain a copy of the
java.util.regex.* package (separate package)?
This is a new package included in version 1.4.0.
Simple:
Go to your java sdk directory
$ jar xvf src.jar
$ cd java/util/regex
$ javac ASCII.java
$ javac Matcher.java
$ javac Pattern.java
$ javac PatternSyntaxException.java
$ cd ../../..
$ jar cvf regex.jar java/util/regex/*.class
et voila; how difficult can that be ? There is no native, JVM depending stuff in there, although I did not check for dependencies on other new stuff inside the jdk.
I'm also unaware if this isn't illegal under the agreement with Sun.

java.util.regex.Pattern and * - + /

hi...
I'm Korean... so I can't speak English well.. sorry..^^
but I have a problem..
import java.util.regex.*;
public class Operator
{
     public static void main(String args[])
     {
          String operator="/";
          // error point..
          Pattern pattern=Pattern.compile(operator);
          Matcher m=pattern.matcher("- ----* / */* /+");
          int count=0;
          while(m.find()) {
               count++;
          }
          System.out.println(count);
     }
}
Exception in thread "main" java.util.regex.PatternSyntaxException: Dangling meta character '+' near index 0
+
operator : / - : ok...
operator : * + : error...
I have to use + and *..
What's the problem??

Are you using matches()? Then keep in mind that it requires that the entire String is matched by the RE.
pattern.matcher("about:foobar").matches(); //will return false, as "foobar" is not matched by your pattern
pattern.matcher("about:").matches(); //will return true
pattern.matcher("about:foobar").find(); //will return true
pattern.matcher("notabout:foobar").find(); // will return false
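Regarding the "Dangling meta character '+'" exception in the question: + and * are quantifiers in regex syntax, so on their own they are not valid patterns, while / and - have no special meaning outside a character class. A minimal sketch (class and method names are made up) that counts any operator by quoting it first with Pattern.quote:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OperatorCounter {
    // Counts how often the operator occurs in the text, treating it as plain text.
    static int count(String operator, String text) {
        // Pattern.quote turns "+" or "*" into a literal, so compile() accepts it.
        Matcher m = Pattern.compile(Pattern.quote(operator)).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "- ----* / */* /+";
        for (String op : new String[] {"+", "*", "/", "-"}) {
            System.out.println(op + " : " + count(op, text));
        }
    }
}
```

Writing the escape by hand, as in Pattern.compile("\\+"), has the same effect for a single known operator.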

Pattern regex matching advice needed

Hi All,
Many thanks for any/all advice :)
Here's my problem. I'm trying to scan a text file for...
\foo(parm1|parm2)
...in which I want the sub-string "parm1|parm2"
So... [\\]foo matches the first section. No problem...
It's when I try adding the '(' or ')' that I'm getting errors.
java.util.regex.PatternSyntaxException: Unclosed character class near index
[\]foo(.*)
Basically, I'm trying to create a pattern, which can recognize \foo(parms), and extract the parms sections.
Any ideas?
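One way this could be written, as a sketch with invented class and method names: inside a character class, \] is an escaped bracket, so [\]foo never closes the class, which is what the exception reports. Escaping the backslash and the parentheses directly avoids the character class altogether:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FooParms {
    // Regex: \\foo\(([^)]*)\)  i.e. a literal backslash, "foo", "(", the parms, ")".
    // Each regex backslash is doubled again for the Java string literal.
    private static final Pattern FOO = Pattern.compile("\\\\foo\\(([^)]*)\\)");

    // Returns the text between the parentheses, or null when there is no \foo(...).
    static String extract(String text) {
        Matcher m = FOO.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("some text \\foo(parm1|parm2) more text"));
    }
}
```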

Yes, you can do this. It is not allowed in basic Java, but there are always ways around the syntax rules. What you can do is use the AspectJ plugin for Eclipse, define a cutpoint and make it extend from two classes. What it does is parse the byte code and insert the code directly into the byte code. It's pretty neat.
A simpler approach would be to have two classes A and B. Have A extend BASE and then have B extend A, so that B "is a" A and a BASE.
Hope this helps.

Ignore word (Java.util.regex )

Hello All,
Can anyone help me solve this problem?
Problem: I have a text file and I want to search it for a word or a combination of words using java.util.regex.
Example: in the sentence "things like the Forestry in the Commission (FC)." I want to find "Forestry Commission" while ignoring the words "in the". The ignore criterion is specific, i.e. the search should return true only when the ignored words are "in the", not any other words.
Also, how do I specify multiple words in the ignore condition?
Thanks in advance.

Try out this line of code:
System.out.println("Forestry in the Commission".replaceAll("Forestry\\s(.*?\\s)?Commission", "Hello"));
In EBNF, it looks like this:
match ::= "Forestry" <whitespace> [{<character>} <whitespace>] "Commission"
whitespace ::= <space> | \t | \n | \x0B | \f | \r
character ::= (any one character)
This, of course, is an ambiguous EBNF definition. The breakdown of the expression, however, reveals why this works. In the string, "\\s" refers to a character of whitespace. "(.*?\\s)?" is where the magic happens: it causes ".*?\\s" to happen either not at all or once. ".*?" will consume as few "." (any character) as possible to make the match, and the following "\\s" is to make sure that strings like "Forestry deCommission" don't match. The EBNF's ambiguity comes from EBNF's lack of ability to describe "reluctant quantifiers": quantifiers that indicate that as few of the given expression should be matched as possible.
Cheers!
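The pattern above skips any text between the two words. If only specific phrases such as "in the" may be skipped, a variant along these lines might be closer to what was asked. It is only a sketch: the class, the method names and the extra phrase "of the" are invented for the example:

```java
import java.util.regex.Pattern;

class IgnoreWords {
    // Builds: \bfirst\s+(?:(?:phrase1|phrase2)\s+)?second\b
    static Pattern phrase(String first, String second, String... ignorable) {
        StringBuilder alternatives = new StringBuilder();
        for (String words : ignorable) {
            if (alternatives.length() > 0) {
                alternatives.append('|');
            }
            alternatives.append(Pattern.quote(words)); // treat each phrase as plain text
        }
        String optional = alternatives.length() == 0
                ? ""
                : "(?:(?:" + alternatives + ")\\s+)?";
        return Pattern.compile("\\b" + Pattern.quote(first) + "\\s+" + optional
                + Pattern.quote(second) + "\\b");
    }

    // True when the text contains the two words, optionally separated by an ignorable phrase.
    static boolean contains(String text, String first, String second, String... ignorable) {
        return phrase(first, second, ignorable).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "things like the Forestry in the Commission (FC).";
        System.out.println(contains(text, "Forestry", "Commission", "in the"));
        System.out.println(contains("Forestry of the Commission", "Forestry", "Commission", "in the"));
    }
}
```

Several ignorable phrases are passed as further arguments, for example contains(text, "Forestry", "Commission", "in the", "of the").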

Regex - match a word except when it's preceded by another word

Does anyone know how to write a regular expression that will match an occurrence of a word except when it's preceded by another word? I'm trying to match all occurrences of the word "function" except when it's part of the phrase "end function". Is that possible in a single regular expression?

Maybe this is just how it works, but I'm not sure why a string
with one space wouldn't match but a string with two would.
At the beginning of the spaces, the lookbehind causes the match to fail, but then the Matcher bumps ahead one position and tries again. At that point, the lookbehind expression doesn't apply anymore, so you get a match. (You should be able to confirm this by counting the spaces in your output.) I tried using the "aggressive plus" to force it to treat all the spaces as one atom, but that didn't work:
Pattern p = Pattern.compile("(?<!end)(\\s++)function");
I don't see how to do this using "pure" lookaround, but if you don't mind matching the preceding word, this will work:
Pattern p = Pattern.compile("(^|(?!end\\b)\\b\\w+ +)function\\b",
 Pattern.MULTILINE);
Getting pretty hairy, I know, but it matches the word "function", either as the first thing on the line, or preceded by a word that is not "end" (those first couple of \b's are there to ensure that only the whole word "end" will block the match). Here's how you would use this pattern to replace "function" with "method", except when it's preceded by "end":
import java.util.regex.*;
public class Test
{
  public static void main(String[] args)
  {
    String target = "end function\n"
                  + "function test\n"
                  + "functioning test\n"
                  + "test function\n"
                  + "test function end\n"
                  + "end function\n"
                  + "ending function\n"
                  + "rend function\n"
                  + "end functioning\n";
    Pattern p = Pattern.compile("(^|(?!end\\b)\\b\\w+ +)function\\b",
                                Pattern.MULTILINE);
    Matcher m = p.matcher(target);
    target = m.replaceAll("$1method");
    System.out.println(target);
  }
}
Here's the output I get:
end function
method test
functioning test
test method
test method end
end function
ending method
rend method
end functioning
Of course, if you do know that there will always be exactly one space between "end" and "function", none of this is necessary; you can just use dcostakos's original lookbehind regex--except that I would add word boundaries:
Pattern p = Pattern.compile("(?<!end\\s)\\bfunction\\b");
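A short sketch of how that last pattern behaves (the class and method names are invented for the example). The two-space case is included on purpose, since it shows the limitation described earlier in this reply:

```java
import java.util.regex.Pattern;

class FunctionKeyword {
    // Negative lookbehind: "function" must not be directly preceded by "end" plus one whitespace.
    private static final Pattern P = Pattern.compile("(?<!end\\s)\\bfunction\\b");

    // True when the line contains a "function" that the lookbehind does not block.
    static boolean found(String line) {
        return P.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] lines = {
            "function test", "test function", "end function",
            "functioning test", "end  function" // last one has two spaces
        };
        for (String line : lines) {
            System.out.println(line + " -> " + found(line));
        }
    }
}
```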

Please help on java.util.regex.*

Hi all,
My RTF file looks like this:
Project Num\tab N/A
\par Project Name\tab Hook-up Installation and Service
\par
My intention is to read the file up to the \tab and store "Project Num" as a string in a
variable, and similarly to read up to \par and store the value of Project Num in another variable,
so that I can use those variables further in my program.
I used the java.util.regex.* package for this purpose. I can successfully split the sentence wherever it sees \tab and \par, but I don't know how to get the text before and after the delimiters and store it in variables.
The code which i wrote is:
import java.util.regex.*;
import java.io.*;
import java.nio.*;
import java.nio.charset.*;
import java.nio.channels.*;
public class RegexDemo {
    public static void main(String[] args) {
        // Create a pattern to match breaks
        Pattern p = Pattern.compile("\\\\.[a-z][a-z]", Pattern.DOTALL);
        try {
            File file = new File("sample.rtf");
            FileInputStream fis = new FileInputStream(file);
            FileChannel fc = fis.getChannel();
            // Get a CharBuffer from the source file
            ByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, (int) fc.size());
            Charset cs = Charset.forName("8859_1");
            CharsetDecoder cd = cs.newDecoder();
            CharBuffer cb = cd.decode(bb);
            // Run some matches
            Matcher m = p.matcher(cb);
            while (m.find())
                System.out.println("Found comment: " + m.group());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Please somebody help me in this regard. I have spent lot of time searching the forums but couldn't find any solution.
Thanks in advance
rnallu

Just put the target inside parentheses, with the delimiters at the boundaries.
Example: "(\\w+)\\t(\\d)\\s" will match occurrences of a word followed by a tab character, then a digit followed by a whitespace character. If the target string matches the pattern, then m.group(1) contains the word and m.group(2) contains the digit.
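Applied to the RTF snippet in the question, the same idea might look like the sketch below. The class and method names are invented, and it assumes that every record has the shape "label \tab value \par" and that neither the label nor the value contains a backslash:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RtfFields {
    // Group 1 is the label before \tab, group 2 the value before \par.
    // [^\\\\] in the Java literal is the regex [^\\]: any character except a backslash.
    private static final Pattern FIELD =
            Pattern.compile("\\s*([^\\\\]+?)\\\\tab\\s*([^\\\\]*?)\\s*\\\\par");

    // Returns label/value pairs in the order they appear in the text.
    static Map<String, String> parse(CharSequence rtf) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(rtf);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String rtf = "Project Num\\tab N/A\n"
                + "\\par Project Name\\tab Hook-up Installation and Service\n"
                + "\\par";
        parse(rtf).forEach((label, value) -> System.out.println(label + " = " + value));
    }
}
```

Because parse accepts a CharSequence, the CharBuffer produced by the decoder in the question could be passed to it directly.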

Java.util.regex error

Hello,
I checked the JavaDoc multiple times but do not see what is wrong with
myString.replaceAll("D:\\web\\mars","")
which results in
java.util.regex.PatternSyntaxException: Illegal/unsupported escape squence near index 7
D:\web\mars
 ^
 at java.util.regex.Pattern.error(Unknown Source)
 at java.util.regex.Pattern.escape(Unknown Source)
 at java.util.regex.Pattern.atom(Unknown Source)
 at java.util.regex.Pattern.sequence(Unknown Source)
 at java.util.regex.Pattern.expr(Unknown Source)
 at java.util.regex.Pattern.compile(Unknown Source)
 at java.util.regex.Pattern.<init>(Unknown Source)
 at java.util.regex.Pattern.compile(Unknown Source)
 at java.lang.String.replaceAll(Unknown Source)
 at ArticleImageImportProcessor.main(ArticleImageImportProcessor.java:40)
Exception in thread "main"
Please, every suggestion/hint is most appreciated

You have to escape the backslash twice: once for the Java string literal, and a second time because '\' has a special meaning in regular expressions.
It should look like
myString.replaceAll("D:\\\\web\\\\mars","")
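When the text to remove is a fixed string rather than a real pattern, the second level of escaping can be avoided altogether. A minimal sketch (class name, method names and sample path are made up) using String.replace, which takes its arguments literally, and Pattern.quote, which makes replaceAll treat the text literally:

```java
import java.util.regex.Pattern;

class LiteralReplace {
    // String.replace treats both arguments as plain text, so no regex escaping applies.
    static String stripLiteral(String text, String prefix) {
        return text.replace(prefix, "");
    }

    // Pattern.quote wraps the text so that replaceAll matches it character for character.
    static String stripQuoted(String text, String prefix) {
        return text.replaceAll(Pattern.quote(prefix), "");
    }

    public static void main(String[] args) {
        String path = "D:\\web\\mars\\img\\logo.gif"; // hypothetical path
        String prefix = "D:\\web\\mars";              // one level of escaping: Java only
        System.out.println(stripLiteral(path, prefix));
        System.out.println(stripQuoted(path, prefix));
        // The hand-escaped form from the reply above gives the same result.
        System.out.println(path.replaceAll("D:\\\\web\\\\mars", ""));
    }
}
```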

Who use sql-mapping with java.util.regex?

Hi everyone:
I use iBATIS SQL mapping and I think it is very good. Now I want to add a search function to my BBS forum. I also want to highlight the matches in the displayed content, like Jive does: if I search for the string "ibatis", every "ibatis" in the search results should be highlighted.
So I must use java.util.regex in J2SDK 1.4. But the problem is that what I get back from SQL mapping is a List. For example:
          String resource="conf/XML/sql/lyo-sql-map.xml";
          Reader reader=Resources.getResourceAsReader(resource);
          sqlmap=XmlSqlMapBuilder.buildSqlMap(reader);
          List articlelist=sqlmap.executeQueryForList("selectSiteArticle","%"+icontent+"%");
The result I get is a List, and there is no point at which I can apply the regex.
I don't know whether I could do this:
iterate over the List, apply the regex, and then put all the objects back into the List.
Is that right?
How do I use regex with SQL mapping? Thanks

Any idea? :(
