Strange result about regular expressions
Hello everybody,
I wrote this code to try out regular expressions in Java, but I get some strange results. I have read references such as the Sun Java Tutorials, but I can't find the problem.
Environment:
WindowsXP Home + NetBeans IDE 5.0 + JDK 1.5
Input String:
"I write these codes to try regular expressions in Java, but it doesn't work. I read some reference like Sun Java Tutorials. Then, always cann't find the problem. Could you help me? Thanks."
My codes:
public static void main(String[] args) throws Exception, IOException {
    P.rintln("Let's go!");
    Date start = new Date();
    if (args.length != 1) {
        P.rintln("Input Error! Input format: java javaclass [directory path]");
        System.exit(0);
    }
    StringBuffer sb = new StringBuffer();
    String input = TextFile.read(args[0]);
    sb = addSectionEelement(input, "re");
    P.rintln(sb.toString());
    P.rintln("Ok, it's over");
    Date end = new Date();
    System.out.println("It spends " + (end.getTime() - start.getTime()) + " ms.");
}

// Prints every match of regex in input and reports how many were found.
public static StringBuffer addSectionEelement(String input, String regex) {
    Matcher m = Pattern.compile(regex).matcher(input);
    StringBuffer sb = new StringBuffer();
    int count = 0;
    while (m.find()) {
        count++;
        P.rintln(m.group());
    }
    P.rintln("Found " + count + " fois.");
    return sb;
}
Output:
run:
Let's go!
Found 0 fois.
Ok, it's over
It spends 16 ms.
BUILD SUCCESSFUL (total time: 0 seconds)
However, if I change the line sb = addSectionEelement(input, "re"); to
sb = addSectionEelement(input, "r");
the results become:
run:
Let's go!
r
r
r
r
r
r
r
r
r
r
r
Found 11 fois.
Ok, it's over
It spends 15 ms.
BUILD SUCCESSFUL (total time: 0 seconds)
I have no idea what causes this. Do you?
Thanks
Hi guys,
I re-examined the code. In fact, the problem was the encoding of the input file.
See you
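An encoding mismatch would explain output like this. The sketch below assumes one plausible scenario that the thread does not confirm: the file was saved as UTF-16 and then decoded with a single-byte charset, so every letter is followed by a NUL character. In that case "r" is still found but "re" never is. The class name EncodingDemo, the sample text, and the use of StandardCharsets (Java 7 and later, newer than the JDK 1.5 in the post) are all illustrative assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EncodingDemo {
    // Counts the non-overlapping matches of regex in input.
    public static int countMatches(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "I read the reference.";
        // Simulate a file that was saved as UTF-16LE...
        byte[] fileBytes = text.getBytes(StandardCharsets.UTF_16LE);
        // ...but decoded with a single-byte charset: each letter is now followed by '\u0000'.
        String wrong = new String(fileBytes, StandardCharsets.ISO_8859_1);
        // Decoding with the charset the file was written in restores the text.
        String right = new String(fileBytes, StandardCharsets.UTF_16LE);

        System.out.println("wrong charset, \"re\": " + countMatches(wrong, "re"));
        System.out.println("wrong charset, \"r\":  " + countMatches(wrong, "r"));
        System.out.println("right charset, \"re\": " + countMatches(right, "re"));
    }
}
```

Reading the file with an explicit charset, rather than the platform default, is the usual way to avoid this kind of surprise.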
Similar Messages
-
Off Topic: Books about Regular Expression
Hi
Can somebody recommend books about regular expressions in Oracle?
Thanks
The regex tag of Volder's blog:
http://volder-notes.blogspot.com/search/label/Regular%20Expressions
This entry mentions my regex solution :-)
http://volder-notes.blogspot.com/2007/10/removing-duplicate-elements-from-string.html
By the way,
my regex homepage covers regex problems for Perl-like regex (the regex flavor of EmEditor).
http://www.geocities.jp/oraclesqlpuzzle/regex/
Example questions (written in Japanese):
http://www.geocities.jp/oraclesqlpuzzle/regex/regex-2-1.html
http://www.geocities.jp/oraclesqlpuzzle/regex/regex-3-5.html
http://www.geocities.jp/oraclesqlpuzzle/regex/regex-4-4.html
-
Help About Regular Expression.
Hello,
I am trying to parse string buffer by using Regular Expression.
Suppose my string buffer is:
Hi , How are you?
Hello: abc
hurrey : [ this is test msg
Pls reply to this mail
Hello: xyz
Test1
I want to find the string "Hello: anystring till end of line" only where it is
not enclosed in [].
So in the above example my regular expression should find only the
first "Hello: abc".
Is this possible with a regular expression?
Can we have a regular expression which will get both "Hello: string"?
Suppose my string buffer is:
Hi , How are you?
Hello: abc
hurrey : [ this is test msg
Pls reply to this mail
Hello: xyz
Test1
happy: [ test2
my test
Hello: abc
then result should be :
Hello: abc
[ this is test msg
Pls reply to this mail
Hello: xyz
Test1
[ test2
my test
Hello: abc
]
-
Question about Regular Expressions, please help!
I have created an app which reads files and extracts certain data using regular expressions in JDK 1.4, using the Pattern and Matcher classes.
However, it needs to run on JDK 1.2.2 (don't ask). The regular expression classes (Pattern and Matcher) are not available in 1.2.2, so I am looking for something similar that I can use.
I need something that loops through all the matches found in the file, the way Matcher works, i.e.
while (matcher.find()) {
    // do this
}
Help!
http://jakarta.apache.org/regexp/
-
This question was posted in response to the following article: http://help.adobe.com/en_US/ColdFusion/10.0/Developing/WSc3ff6d0ea77859461172e0811cbec0a38f-7fff.html
"ColdFusion supplies four functions that work with regular expressions" should be "ColdFusion supplies six functions that work with regular expressions,"
-
Basic question about regular expressions
Hello,
I am a beginner with regular expressions. I want to rewrite the following expression:
public static final String REGULAR_EXP_SOFTWARE_PART_NUMBER = "([0-9]{7}[a-z]{1})(\\-{1})([a-z]{1})";
I want THIS part
(\\-{1})
to match EITHER a hyphen OR a space (instead of just the hyphen).
How do I rewrite this?
Thanks in advance,
Julien.
Hello and thanks for your feedback,
I have created a small class as follows:
package regExpr;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * @author Martin
 */
public class RegExprTest {
    private static String stringToBeParsed = "3800157w-e26";

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("" +
                "([0-9]{7})" +
                "([a-z]{1})" +
                "(( |-){1})" +
                "([a-z]{1})" +
                "([0-9]{2})");
        Matcher matcher = pattern.matcher(stringToBeParsed);
        while (matcher.find()) {
            System.out.println(matcher.group(1));
            System.out.println(matcher.group(2));
            System.out.println(matcher.group(3));
            System.out.println(matcher.group(4));
            System.out.println(matcher.group(5));
            System.out.println(matcher.group(6));
        }
    }
}
The class is trying to break down the string "3800157w-e26" as follows:
3800157(seven digits)
w(one letter)
-(hyphen)
e(one letter)
26(two digits)
Oddly enough the output of the class is as follows:
3800157
w
-
-
e
26
I have to call the group method six times and I get two hyphens!
Can anyone help?
Thanks in advance,
Julien -
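On the RegExprTest question above: the second hyphen comes from the nested parentheses. (( |-){1}) opens two capturing groups, one for the outer pair and one for the inner ( |-), so the pattern has six groups in total and groups 3 and 4 both capture the hyphen. A minimal sketch of one way to get five groups, using the character class [ -] in place of the inner group (the class and method names are illustrative, not from the thread):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PartNumberGroups {
    // Five capturing groups: digits, letter, separator (space or hyphen), letter, digits.
    private static final Pattern PART =
            Pattern.compile("([0-9]{7})([a-z])([ -])([a-z])([0-9]{2})");

    // Returns the five captured parts, or null if the input does not match.
    public static String[] split(String input) {
        Matcher m = PART.matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] parts = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            parts[i - 1] = m.group(i);
        }
        return parts;
    }

    public static void main(String[] args) {
        for (String part : split("3800157w-e26")) {
            System.out.println(part);
        }
    }
}
```

If alternation is preferred over a character class, a non-capturing group such as (?: |-) inside the outer parentheses would also avoid the extra group.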
Simple question about regular expressions
Hi,
Using Java's regular expression syntax, what is the correct pattern string to detect strings like the following :-
AnnnnnA
where A = a single (fixed) alphabetic character and
n = at least one but possibly many digits [0-9].
Example strings to be searched :-
A45A (this should match)
A3A (this should match)
A3446655577A (this should match)
A hello world A (this should NOT match as no digits are present between the A's).
Thanks.
At least one digit: "A.*\\d.*A"
Only digits: "A\\d+A" -
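As a quick check, the "only digits" pattern can be run against the sample strings from the question. This is a small sketch; the class and method names are made up for illustration. String.matches tests the whole string against the pattern.

```java
public class DigitsBetweenA {
    // True if s is an 'A', then one or more digits, then an 'A', and nothing else.
    public static boolean isMatch(String s) {
        return s.matches("A\\d+A");
    }

    public static void main(String[] args) {
        String[] samples = {"A45A", "A3A", "A3446655577A", "A hello world A"};
        for (String s : samples) {
            System.out.println(s + " -> " + isMatch(s));
        }
    }
}
```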
One question about Regular Expression!!!
I need to create a regular expression that matches the format "[ ][ ][ ]".
For example, given the text
(1), " The project manager defines [1][0.400][+goals] for iterations."
suppose that some spaces or "\n" characters have crept in, like this:
(2), " The project manager defines [ 1 ] [ 0.400 ]
[ +goals] for iterations."
If the pattern matches successfully, string (2) should be replaced by string (1); in other words, the format of (1) is what I need in the end.
I tried a regular expression like \\[([^\n\s]]+)\\]\\[([^\n\s]]+)\\]\\[([^\n\s]]+)\\] , but it does not work well!
Do you know how to implement it in Java?
Thanks for any reply!
What I really need is for the regular expression to throw away all the spaces and \n characters
just inside the square brackets [ and ], and between ] and [.
For example,
Original:
1) "The project manager defines [ 1 ] [
0.400 ]
[ +goals] for iterations with the support"
After matching:
2) "The project manager defines [1][0.400][+goals]
for iterations with the support"
String 2) is what I need in the end!
Thanks for any reply!
Well, I gave you the answer to that one already :-)
If you need to preserve the spaces between words, use this one. I'm sure there's a better way to do it; I'm no regex master.
public static void main(String[] args)
{
    String s = "[ 1 ] [ 0.400 ]\n[ +go als]";
    System.out.println( "Before: " + s );
    System.out.println( "\n\n" );
    // Remove whitespace after "[", before "]", and between "]" and "["
    s = s.replaceAll( "\\[\\s+", "[" );
    s = s.replaceAll( "\\s+\\]", "]" );
    s = s.replaceAll( "\\]\\s+\\[", "][" );
    System.out.println( "After: " + s );
}
-
Beginner question about Regular expression
Hi all !
I'd like to use a regular expression to parse a string like this:
*<ID>4</ID><GROUP>5</GROUP>....*
So for example to retrieve the ID I have built the following regular expression:
Pattern p = Pattern.compile("<ID>(.*?)</ID>");
Matcher m = p.matcher(handle);
if (m.find()) {
System.out.println("->"+m.group());
} else {
System.out.println("No match!");
}
The function m.group() returns "<ID>4</ID>", but I want just the value (4) between the tags. Is there
a way to get it?
Thanks a lot,
Mark
fmarchioniscreen wrote:
thank you very much, that's exactly what I needed.
But it looks like you're parsing some XML-like data: probably better to use a proper parser on it.
Well, it's a very short string containing XML tags. It's used in a marginal area of the application, so I prefer just using a regular expression to fetch the values.
Thanks again,
Mark
You could use XPath to get the value. -
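For the regex route asked about above: group() with no argument returns the whole match, while the text captured by the first pair of parentheses is available as group(1). A minimal sketch, with TagValue and extractId as illustrative names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagValue {
    // The parentheses form capturing group 1: the text between the tags.
    private static final Pattern ID = Pattern.compile("<ID>(.*?)</ID>");

    // Returns the text between <ID> and </ID>, or null if the tag is absent.
    public static String extractId(String handle) {
        Matcher m = ID.matcher(handle);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println("->" + extractId("<ID>4</ID><GROUP>5</GROUP>"));
    }
}
```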
Simple question about regular expression
Hi
I have a little problem with
select regexp_substr('123 Mapla Avenue','[a-z]') my_test from dual;
answer: M
I execute this query in SQL*Plus and SQL Developer, and the result is the same.
select regexp_substr('123 Mapla Avenue','[M]') my_test from dual;
answer: M
select regexp_substr('123 Mapla Avenue','[a]') my_test from dual;
answer: a
I use Oracle 10g.
Thanks for your help
hm wrote:
In the oracle documentation of regexp_substr you can find:
Do not confuse pattern and sort. The pattern [a-z] means any lowercase letter. The REGEXP_SUBSTR match_param value 'i' tells the function to treat uppercase letters the same as lowercase letters and vice versa, and setting NLS_SORT can do the same. As you can see, it is not that straightforward. To make it transparent, use the exact pattern you need. In this particular case use:
select regexp_substr('123 Mapla Avenue','[[:alpha:]]') my_test from dual;
where class [:alpha:] is the POSIX predefined class of all letters (regardless of case). This way you do not depend on client-side settings like NLS_SORT, and the above will always return the first letter within a string. If you want the first uppercase letter, use:
select regexp_substr('123 Mapla Avenue','[[:upper:]]') my_test from dual;
Or, for first lowercase letter:
SQL> alter session set nls_sort=binary;
Session altered.
SQL> select regexp_substr('123 Mapla Avenue','[a-z]') my_test from dual;
M
a
SQL> select regexp_substr('123 Mapla Avenue','[[:lower:]]') my_test from dual;
M
a
SQL> alter session set nls_sort=binary_ci;
Session altered.
SQL> select regexp_substr('123 Mapla Avenue','[a-z]') my_test from dual;
M
M
SQL> select regexp_substr('123 Mapla Avenue','[[:lower:]]') my_test from dual;
M
a
SQL>
SY.
-
An additional question about regular expressions with String.matches
Does the String.matches() method succeed when some substring of the String matches, or does it have to match the entire String? So, if I have the String "123ABC" and I ask to match "1 or more letters", will it fail because there are non-letters in the String, but then pass if I ask for "1 or more letters AND 1 or more digits"? In the latter case every character in the String is accounted for, as opposed to the first. Is that correct, or are there ways to match JUST some substring of the String instead of the whole thing? I WILL make some examples too... but does that make sense?
It has to match the whole String. Use Matcher.find() to match on just a substring.
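The difference can be seen with the "123ABC" example from the question. The following is a small sketch; the class and method names are illustrative.

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    // String.matches succeeds only if the entire string matches the pattern.
    public static boolean wholeMatch(String s, String regex) {
        return s.matches(regex);
    }

    // Matcher.find succeeds if any substring matches the pattern.
    public static boolean partialMatch(String s, String regex) {
        return Pattern.compile(regex).matcher(s).find();
    }

    public static void main(String[] args) {
        String s = "123ABC";
        System.out.println(wholeMatch(s, "[A-Za-z]+"));        // letters only: whole string does not match
        System.out.println(partialMatch(s, "[A-Za-z]+"));      // "ABC" is found as a substring
        System.out.println(wholeMatch(s, "[0-9]+[A-Za-z]+"));  // digits then letters covers the whole string
    }
}
```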
-
Question about Regular Expressions
Hi everyone!
Could anyone help me create a regex for the string <object>?
Thanks!
Kind Regards, Dmitry.
"<object>"
-
Hi everyone,
I have a question about regular expressions.
Let's say I want my program to extract the last 10 digits from any URL that is found (every found URL will end in a 10-digit number!) and insert that number into the middle of another URL.
Would anyone please tell me how to do that?
Thank you
I am not sure how to do that either...
Actually, I just figured out that there is no guarantee that the URL will end in a 10-digit number.
Ok, my program is meant to search for the movie info on Yahoo (user enters keyword to search and chooses either 'title', 'actor', 'trailer', 'review' in drop-down menu). After the 'search' button is clicked the appropriate page is supposed to be found.
For example, if the user types in 'shrek' and chooses 'trailer', the result is supposed to be this link http://movies.yahoo.com/movie/1808405861/trailer and not the
following ones:
http://movies.yahoo.com/mv/search?p=shrek
or
http://movies.yahoo.com/shop?d=hv&cf=info&id=1808405861
So in my program the line for the 'title' search is
url = "http://movies.yahoo.com/mv/search?type=all&p=";
and it works for the titles. I thought that if the found link has a 10-digit number at the end, I could somehow 'catch' that number and insert it into another link, so the page with trailers would be pulled up (it's an ID number in the Yahoo database).
But now, since I am not sure whether a 10-digit number is going to be in the found link at all, I have no idea how to 'catch' that number.
Does anyone have any ideas for my case? -
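One possible approach, sketched under the assumption that the ID always appears as a run of exactly ten digits somewhere in the found link: search for that run with Matcher.find() and treat "no match" as a separate case. The class and method names are hypothetical, and the trailer URL format is simply copied from the example link in the post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MovieIdExtractor {
    // A run of exactly ten digits that is not part of a longer number.
    private static final Pattern ID = Pattern.compile("(?<!\\d)\\d{10}(?!\\d)");

    // Returns the first ten-digit ID in the URL, or null if there is none.
    public static String findId(String url) {
        Matcher m = ID.matcher(url);
        return m.find() ? m.group() : null;
    }

    // Builds the trailer link in the form quoted in the post, or null when no ID was found.
    public static String trailerUrl(String foundUrl) {
        String id = findId(foundUrl);
        return id == null ? null : "http://movies.yahoo.com/movie/" + id + "/trailer";
    }

    public static void main(String[] args) {
        System.out.println(trailerUrl("http://movies.yahoo.com/shop?d=hv&cf=info&id=1808405861"));
        System.out.println(trailerUrl("http://movies.yahoo.com/mv/search?p=shrek"));
    }
}
```

Returning null when no ID is present lets the caller decide what to do with links such as the plain search URL.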
Regular expressions in Format Definition add-on
Hello experts,
I have a question about regular expressions. I am a newbie in regular expressions and I could use some help on this one. I tried for some 6 hours, but I can't solve it myself.
Summary of my problem:
In SAP Business One (patch level 42) it is possible to use bank statement processing. A file (full of regular expressions) is to be selected, so it can match certain criteria to the bank statement file. The bank statement file consists of a certain pattern (look at the attached code snippet).
:61:071222D208,00N026
:86:P 12345678BELASTINGDIENST F8R03782497 $GH
$0000009 BETALINGSKENM. 123456789123456
0 1234567891234560
:61:071225C758,70N078
:86:0116664495 REGULA B.V. HELPMESTRAAT 243 B 5371 AM HARDCITY HARD
CITY 48772-54314
:61:071225C425,05N078
:86:0329883585 J. MANSSHOT PATTRIOTISLAND 38 1996 PT HELMEN BIJBETA
LING VOOR RELOOP RMP1 SET ORDERNR* 69866 / SPOEDIG LEVEREN
:61:071225C850,00N078
:86:0105327212 POSE TELEFOONSTRAAT 43 6448 SL S-ROTTERDAM MIJN OR
DERNR. 53846 REF. MAIL 21-02
- I am trying to find out which flavor of regular expression the Format Definition add-on uses (JavaScript, .NET, Perl, Java, Python, etc.).
Besides that, I need the regular expressions below, so the Format Definition will match the right lines from my bank file.
- a regular expression that selects lines starting with :61: and line :86: including next lines (if available), so in fact it has to select everything from :86: till :61: again.
- a regular expression that selects the bank account number (position 5-14) from lines starting with :86:
- a regular expression that selects all other info from lines starting with :86: (and following if any), so all positions that follow after the bank account number
I am looking forward to the right solutions. I can give more info if you need any.
Hello Hendri,
Q1: Which flavor of regular expression is used by the Format Definition add-on (JavaScript, .NET, Perl, Java, Python, etc.)?
Answer: Format Definition uses .NET regular expressions.
You may refer to the following examples. If necessary, I can send you a guide about how to use regular expressions in Format Definition. Thanks.
Example 6
Description:
To match a field with an optional field in front. For example, ":61:0711211121C216,08N051NONREF" or ":61:071121C216,08N051NONREF", which consists of a record identification ":61:", a date in the form YYMMDD, another optional date MMDD, one or two characters to signify the direction of money flow, a numeric amount value and some other information. The target to be matched is the numeric amount value.
Regular expression:
(?<=:61:\d(\d)?[a-zA-Z]{1,2})((\d(,\d*)?)|(,\d))
Text:
:61:0711211121C216,08N051NONREF
Matches:
1
Tips:
1. All the fields in front of the target field are described in the look-behind assertion enclosed by (?<= and ). In particular, the optional field is enclosed in parentheses followed by a "?" (question mark). The sub expression for amount is copied from example 1. You can compose your own regular expression for such cases in the form of (?<=REGEX_FOR_FIELDS_IN_FRONT)(REGEX_FOR_TARGET_FIELD), in which REGEX_FOR_FIELDS_IN_FRONT and REGEX_FOR_TARGET_FIELD are respectively the regular expression for the fields in front and the target field. Keep the parentheses therein.
Example 7
Description:
Find all numbers in the free text description, which are possibly document identifications, e.g. for invoices
Regular expression:
(?<=\b)(?<!\.)\d+(?=\b)(?!\.)
Text:
:86:GIRO 6890316
ENERGETICA NATURA BENELU
AFRIKAWEG 14
HULST
3187-A1176
TRANSACTIEDATUM* 03-07-2007
Matches:
6
Tips:
1. The regular expression given finds all digits between word boundaries except those with a prior dot or following dot; "." (dot) is escaped as \.
2. It may produce some inaccurate matches, like the date in the text. If you want to exclude "-" (hyphen) as the preceding or following character, in the same way as for "." (dot), the regular expression becomes (?<=\b)(?<!\.)(?<!-)\d+(?=\b)(?!\.)(?!-). The matches will be:
:86:GIRO 6890316
ENERGETICA NATURA BENELU
AFRIKAWEG 14
HULST
3187-A1176
TRANSACTIEDATUM* 03-07-2007
You may lose some real values like "3187" before the "-".
Example 8
Description:
Find BP account number in 9 digits with a prior "P" or "0" in the first position of free text description
Regular expression:
(?<=^(P|0))\d
Text:
0000006681 FORTIS ASR BETALINGSCENTRUM BV
Matches:
1
Tips:
1. Use positive look behind assertion (?<=PRIOR_KEYWORD) to express the prior keyword.
2. "^" means that the match starts at the beginning of the text. If the text includes the record identification, you may include it also in the look-behind assertion. For example,
:86:0000006681 FORTIS ASR BETALINGSCENTRUM BV
The regular expression becomes
(?<=:86:(P|0))\d
Example 9
Description:
Following example 8, to find the possible BP name after BP account number, which is composed of letter, dot or space.
Regular expression:
(?<=^(P|0)\d)[a-zA-Z. ]*
Text:
0000006681 FORTIS ASR BETALINGSCENTRUM BV
Matches:
1
Tips:
1. In this case, put BP account number regular expression into the look behind assertion.
Example 10
Description:
Find the possible document identifications in a sub-record of :86: record. Sub-record is like "?00", "?10" etc. A possible document identification sub-record is made up of the following parts:
- keyword "RE", "RG", "R", "INV", "NR", "NO", "RECHN" or "RECHNUNG", and
- an optional group made up of the following:
  a separator of either a dot, hyphen or slash, and
  an optional space, and
  an optional string starting with keyword "NR" or "NO" followed by a separator of either a dot, hyphen or slash, and
  an optional space
- and finally the document identification in digits
Regular expression:
(?<=\?\d(RE|RG|R|INV|NR|NO|RECHN|RECHNUNG)((\.|-|/)\s?((NR|NO)(\.|-|/))?\s?)?)\d+
Kind Regards
-Yatsea
-
Help with java regular expressions
Hi all ,
I am trying to match a pattern string against an input string and print the result. Here is my code:
import java.util.regex.*;
import java.util.*;

public class Main {
    private static final String CASE_INSENSITIVE = null;

    public static void main(String[] args) {
        CharSequence inputStr = "i have 5 years FMCG saLEs exp on java/j2ee and i worked on java and j2ee and 2 projects on telecom java j2ee domain with your with saLEs maNAger experience of java j2ee and c# having very good on c++ exposure in JAVA";
        String patternStr = "\"java j2ee\" and \"c#\"";
        // Split the pattern string into search terms and use each term as a regex
        StringTokenizer st = new StringTokenizer(patternStr, "\",OR");
        Matcher matcher = null;
        while (st.hasMoreTokens()) {
            String s = st.nextToken();
            Pattern pattern = Pattern.compile(s, Pattern.CASE_INSENSITIVE);
            matcher = pattern.matcher(inputStr);
            while (matcher.find()) {
                String result = matcher.group();
                if (!result.equalsIgnoreCase(" "))
                    System.out.println("result:" + result);
            }
        }
    }
}
When I run this code I get the expected result, i.e.
result:java j2ee
result:java j2ee
result: and
result: and
result: and
result: and
result: and
result: and
result:c#
But when I replace String patternStr = "\"java j2ee\" and \"c#\""; with
String patternStr = "\"java j2ee\" and \"c++\""; I get just c in the result instead of c++, i.e. I get this result:
result:java j2ee
result:java j2ee
result: and
result: and
result: and
result: and
result: and
result: and
result:C
result:c
result:c
result:c
result:c
result:c
result:c
In the last lines I should get result:c++ instead of result:c
Any ideas, please?
Thanks
In the last lines i should get result:c++ instead of result: c
The regular expression parser considers the plus sign '+' a special
character; it means: one or more times the previous regular expression.
So 'c++' is read as one or more 'c's (in Java the second '+' makes the
quantifier possessive); it never matches a literal plus sign. Obviously you don't
want that, you want a literal '+' plus sign. You can do that by preceding
each '+' with a backslash '\'. Unfortunately, the javac compiler considers
a backslash in a string literal a special character, and therefore you have to 'escape'
the backslash also, by adding another backslash. The result looks
like this:
"c\\+\\+"
kind regards,
Jos
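Besides escaping by hand, Pattern.quote (part of java.util.regex since Java 5) turns an arbitrary string into a literal pattern, which can help when the search terms come from user input as in the code above. A minimal sketch; the class and method names are illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiteralSearch {
    // Counts case-insensitive occurrences of term in text, treating term as plain text.
    public static int countLiteral(String text, String term) {
        Pattern p = Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "very good on c++ exposure in C++ and c";
        System.out.println(countLiteral(text, "c++"));
        // The hand-escaped pattern from the reply matches the same literal text.
        System.out.println(Pattern.compile("c\\+\\+").matcher("c++").matches());
    }
}
```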