REGEX: question about finding Overlapping matches using regular expressions
I have the following problem.
Say for my pattern I use:
Pattern pattern = Pattern.compile("AAA");
Matcher matcher = pattern.matcher("AAAAAA");
when I run a loop
while (matcher.find())
System.out.println("Match Found: "+matcher.start()+" "+matcher.end());
I get 2 hits, shown in the following output:
Match Found: 0 3
Match Found: 3 6
therefore the regex is seeing the first AAA then the second AAA.
I want it to find the other AAA's in there that are overlapping the other two finds i.e. I want the output to find
AAA from 0 to 3
AAA from 1 to 4
AAA from 2 to 5 and finally
AAA from 3 to 6
thereby including the overlapping finds.
How can I do this using regex? What am I missing that prevents the overlapping matches from being found? Do I need a quantifier?
Thanks for the help!
While the solutions above work fine with the given input, they don't really find all overlapping matches. They just find the longest possible match at each start position. Here's a more thorough approach:
import java.util.*;
import java.util.regex.*;
public class Test
{
  public static List<String> matchAllWays(String rgx, String str)
  {
    Pattern p = Pattern.compile(rgx);
    Matcher m = p.matcher(str);
    List<String> result = new ArrayList<String>();
    int len = str.length();
    int start = 0;
    int end = len;
    // Outer loop: find the next start position that has any match at all.
    while (start < len && m.region(start, len).find())
    {
      start = m.start();
      // Inner loop: shrink the region's end to collect shorter matches from the same start.
      do
      {
        result.add(m.group());
        end = m.end() - 1;
      } while (end > start && m.region(start, end).find());
      start++;
    }
    return result;
  }
  public static void main(String[] args)
  {
    List<String> matches = matchAllWays("a.*a", "abracadabra");
    System.out.println(matches);
  }
}
This approach requires JDK 1.5 or later; that's when the regions API was added to Matcher.
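For the fixed pattern in the question ("AAA" in "AAAAAA"), a common way to report overlapping matches is to wrap the pattern in a zero-width lookahead with a capturing group, so the matcher consumes nothing and moves on one character at a time. The following is only a minimal sketch: the class name OverlapDemo and the helper overlapping are made up for illustration, and, as noted above, this technique reports one match per start position, not every possible match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlapDemo {
    // Hypothetical helper: returns "start end" for each start position where regex matches.
    static List<String> overlapping(String regex, String input) {
        // The lookahead itself matches the empty string; group 1 holds the text
        // the pattern would have matched at that position.
        Matcher m = Pattern.compile("(?=(" + regex + "))").matcher(input);
        List<String> spans = new ArrayList<>();
        while (m.find()) {
            spans.add(m.start(1) + " " + m.end(1));
        }
        return spans;
    }

    public static void main(String[] args) {
        for (String span : overlapping("AAA", "AAAAAA")) {
            System.out.println("Match Found: " + span);
        }
    }
}
```

With the input from the question this lists the spans 0 3, 1 4, 2 5 and 3 6.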
Similar Messages
-
Pattern matching using Regular expression
Hi,
I am working on pattern matching using regular expressions. In the table, I have 2 columns, A and B.
A has value 'A499BPAU4A32A386KBCZ4C13C41D20E'
B has value like '*CZ4*M11*7NQ+RDR+RSM-R9A-R9B'
The requirement is that I have to match the values of B in A. If there is a value with a * sign, it must be present in A; for example, 'CZ4' should exist in string A.
The issue I am facing is that there are 2 values with a * sign. The code works fine for the first match (CZ4) but it does not look further, as M11 does not exist in A.
I used the condition
AND instr(A,substr(REGEXP_SUBSTR(B, '*[^*]{3}'),2) ,1)=0
First of all, is this possible to match multiple patterns in one condition?
If yes, please suggest.
Thanks
user2544469 wrote:
> Thanks a lot Frank. This query worked wonderful for the test data I have provided however I have some concerns:
> - query does not include the column BOOK which is a mandatory check.
Sorry, that was my mistake. It was a very easy mistake to make, since you posted sample data where it didn't matter. Instead of doing a cross-join between vn and got_must_have_cnt, do an inner join, using book. That means book will have to be in got_must_have_cnt, and all the sub-queries from which it descends. Look for comments that say "March 22".
If you want to treat '+' in test_cat.codes as '*', then the simplest thing is probably just to use REPLACE, so that when the table has '+', you use '*' instead.
WITH got_token_cnt AS
(
    SELECT cat
    , book -- Added March 22
    , REPLACE (codes, '+', '*') AS codes -- If desired. Changed March 22
    , LENGTH (codes) - LENGTH ( TRANSLATE ( codes
                                          , 'x*+-'
                                          , 'x'
                                          )
                              ) AS token_cnt
    FROM test_cat
)
, cntr AS
(
    SELECT LEVEL AS n
    FROM ( SELECT MAX (token_cnt) AS max_token_cnt
           FROM got_token_cnt
         )
    CONNECT BY LEVEL <= max_token_cnt
)
, got_tokens AS
(
    SELECT t.cat
    , t.book -- Added March 22
    , REGEXP_SUBSTR ( t.codes
                    , '[*+-]'
                    , 1
                    , c.n
                    ) AS token_type
    , SUBSTR ( REGEXP_SUBSTR ( t.codes
                             , '[*+-][^*+-]*'
                             , 1
                             , c.n
                             )
             , 2
             ) AS token
    FROM got_token_cnt t
    JOIN cntr c ON c.n <= t.token_cnt
)
, got_must_have_cnt AS
(
    SELECT cat, book -- Changed March 22
    , COUNT (CASE WHEN token_type = '*' THEN 1 END) AS must_have_cnt
    FROM got_tokens
    GROUP BY cat, book -- Changed March 22
)
SELECT mh.cat
, vn.vn_no
FROM got_must_have_cnt mh
JOIN vn ON mh.book = vn.book -- Changed March 22
LEFT OUTER JOIN got_tokens gt ON mh.cat = gt.cat
AND INSTR (vn.codes, gt.token) > 1
GROUP BY mh.cat
, mh.must_have_cnt
, vn.vn_no
HAVING COUNT (CASE WHEN gt.token_type = '*' THEN 1 END) = mh.must_have_cnt
AND COUNT (CASE WHEN gt.token_type = '-' THEN 1 END) = 0
ORDER BY mh.cat
;
> - query is very slow with 60000 records in vn table. Cost is somewhere around 36000.
See these threads:
When your query takes too long ...
HOW TO: Post a SQL statement tuning request - template posting
Relational databases were designed to have (at most) one piece of information in each column. If you decide to have multiple items in the same column (as you have a variable number of tokens in the codes column), don't be surprised if that makes things slower and more complicated. Most of the query I posted, and perhaps most of the time needed, is just to normalize the data. If you stored the data in a normalized form, perhaps something like got_tokens, then you wouldn't need the first 3 sub-queries that I posted.
Edited by: Frank Kulash on Mar 22, 2011 12:04 PM
-
A question about using regular expression
Hi,
This is a part of HTML file.
<SPAN>how</span>
<SPAN>are</span>
<SPAN>you</span>
I want to search string between each pair of <SPAN> , </SPAN> tags by using Regular Expression.
For example:
how
are
you
If I use the following method
String regx="<SPAN>(.+)</SPAN>";
Matcher m=Pattern.compile(regx).matcher(str);
int currentLoc=0;
while(currentLoc<str.length()){
if(m.find(currentLoc))
System.out.println(m.group(1));
currentLoc=m.end();
}
The content between the first <SPAN> and the last </SPAN> will be searched.
How to solve this problem?
Use a non-greedy match:
(?s)<SPAN>(.+?)</SPAN>
(?s) makes the dot match the line terminator (same as setting the DOTALL option)
-
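For the <SPAN> question above, the non-greedy match could be sketched roughly as follows. The class name SpanDemo and the helper spanContents are invented for illustration, and the CASE_INSENSITIVE flag is an assumption added because the sample mixes <SPAN> with </span>.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpanDemo {
    // Hypothetical helper: collects the text between each <span>...</span> pair.
    static List<String> spanContents(String html) {
        // .+? is reluctant, so each match stops at the nearest closing tag.
        // DOTALL lets the dot cross line breaks, like the (?s) prefix.
        Pattern p = Pattern.compile("<span>(.+?)</span>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher m = p.matcher(html);
        List<String> contents = new ArrayList<>();
        while (m.find()) {
            contents.add(m.group(1));
        }
        return contents;
    }

    public static void main(String[] args) {
        String html = "<SPAN>how</span>\n<SPAN>are</span>\n<SPAN>you</span>";
        System.out.println(spanContents(html)); // prints [how, are, you]
    }
}
```

Calling find() repeatedly without an index argument continues after the previous match, so no manual currentLoc bookkeeping is needed.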
Matching substrings between square brackets using regular expressions
Hello,
I am new at Java and have a problem with regular expressions. Let me describe the issue in 3 steps:
1.- I have an english sentence. Some words of the sentence stand between square brackets, for example "I [eat] and [sleep]"
2- I would like to match strings that are in square brackets using regular expressions (java.util.regex.*;) and here is the code I have written for the task
Pattern findStringinSquareBrackets = Pattern.compile("\\[.*\\]");
Matcher matcherOfWordInSquareBrackets = findStringinSquareBrackets.matcher("I [eat] and [sleep]");
//Iteration in the string
while ( matcherOfWordInSquareBrackets.find() )
{
System.out.println("Patter found! :"+ outputField.getText().substring(matcherOfWordInSquareBrackets.start(), matcherOfWordInSquareBrackets.end())+"");
}
3- the result I have after running the code described in 2 is the following:
Patter found!: [eat] and [sleep]
That is to say that not only the words between square brackets are found but also the substring "and". And this is not what I want.
What I would like to have as a result is:
Patter found!: [eat]
Patter found!: [sleep]
That is to say I want to match only the words between the square brackets and nothing else.
Does somebody know how to do this? Any help would be great.
Best regards,
Abou
You can find the words by looping through the sentence and then return the substring within the indexes.
int start=-1; // -1 means no opening bracket seen yet (0 is a valid index)
for(int i=0; i<string.length(); i++) {
if(string.substring(i,i+1).equals("["))
start=i;
if(start!=-1 && string.substring(i,i+1).equals("]"))
return string.substring(start,i+1);
}
return null; // no bracketed word found
something like that. This code will only find the first word however. I do not know much about regex so I cannot help any more.
Edited by: elasolova on Jun 16, 2009 6:45 AM
Edited by: elasolova on Jun 16, 2009 6:46 AM
-
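For the square-bracket question above, a regex-based alternative is to replace the greedy .* with a character class that cannot cross a closing bracket. This is a sketch only; the class BracketDemo and the helper bracketed are hypothetical names, not anything from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketDemo {
    // Hypothetical helper: returns every "[...]" group found in the sentence.
    static List<String> bracketed(String sentence) {
        // [^\]]* matches anything except ']', so a match cannot run past
        // the first closing bracket the way "\\[.*\\]" does.
        Matcher m = Pattern.compile("\\[[^\\]]*\\]").matcher(sentence);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        for (String word : bracketed("I [eat] and [sleep]")) {
            System.out.println("Pattern found!: " + word);
        }
    }
}
```

A reluctant quantifier ("\\[.*?\\]") would also stop at the first closing bracket; the negated class simply makes that limit explicit.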
Finding URLs using regular expression.
I have an requirement where user will type some text containing URLs like "Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747. Thank you". This text has to be modified as below before saving it to the database.
"Please visit this site <a href='http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747'>http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747</a>. Thank you"
I am using the regular expression (http|https)://.+?\\s which marks the end of the URL with a whitespace character. This pattern doesn't work if the URL is located at the end of the string, since there will be no space at the end.
For example if the string is "Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747" the regex will fail.
My actual problem is to find the URL irrespective of its position within the string.
Pattern urlPattern = Pattern.compile("(http|https)://.+?\\s", Pattern.CASE_INSENSITIVE);
Matcher matcher = urlPattern.matcher(plainText);
Map stringIndexMap = new HashMap();
//Searching the input string for urlPattern...
while(matcher.find()) {
String urlString = matcher.group();
//Storing the urls in a hashmap with their indices as keys....
stringIndexMap.put(new Integer(matcher.start()), urlString.trim());
}
Set keySet = stringIndexMap.keySet();
Iterator it = keySet.iterator();
//Iterating over the hashmap containing urls...
while(it.hasNext()) {
String urlString = (String) stringIndexMap.get(it.next());
/*
* Replacing the url string in the input text with <a href="#" onclick="window.open('<urlString>')"
* using String index
*/
clickableURLString.replace(clickableURLString.indexOf(urlString),
clickableURLString.indexOf(urlString) + urlString.length(),
"<a href=\"#\" onclick=\"window.open('" + urlString
+ "')\">" + urlString + "</a>");
}
return clickableURLString.toString();
The end of the input is '$' as a regex.
import java.util.regex.*;
public class Prasanna{
public static void main(String[] args){
String text
= "Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747";
// String regex = "(http|https)://.+?(?:\\s|$)"; // this works
String regex = "(http|https)://[^ ]+"; // this also works
Pattern pat = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
Matcher mat = pat.matcher(text);
while (mat.find()){
System.out.println(mat.group());
}
}
}
-
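For the URL thread above, the replacement step itself can be done in one pass with Matcher.appendReplacement instead of a map of indices. The sketch below assumes a URL ends at the next whitespace (the \S+ pattern); the class LinkifyDemo and the method linkify are invented names. Matcher.quoteReplacement matters here because URLs like the one in the question contain '$', which appendReplacement would otherwise treat as a group reference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkifyDemo {
    // Assumption: a URL runs from http:// or https:// up to the next whitespace.
    private static final Pattern URL =
            Pattern.compile("https?://\\S+", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: wraps every URL in the text in an anchor tag.
    static String linkify(String text) {
        Matcher m = URL.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String url = m.group();
            String anchor = "<a href='" + url + "'>" + url + "</a>";
            // quoteReplacement escapes '$' and '\' so they are inserted literally.
            m.appendReplacement(out, Matcher.quoteReplacement(anchor));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(linkify("Please visit https://example.org/a$b today"));
    }
}
```

One known limitation of this sketch: punctuation that directly follows a URL, such as a sentence-ending period, is captured as part of the URL and would need extra trimming.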
How to use regular expression to find string
hi,
who knows how to get all digits from the string "Alerts 4520 ( 227550 ) ( 98 Available )" by regular expression? thanks
br, AndrewLiu,
You can use RegEx as
\d+
If you are using the CL_ABAP_REGEX class then
report zars.
data: regex type ref to cl_abap_regex,
matcher type ref to cl_abap_matcher,
match type c length 1.
create object regex exporting pattern = '\d+'
ignore_case = ''.
matcher = regex->create_matcher( text = 'Test123tes456' ).
match = matcher->match( ).
write match.
You can find more details regarding REGEX and POSIX examples here
http://www.regular-expressions.info/tutorial.html
a® -
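For the digits question above, the same \d+ expression can be illustrated in Java rather than ABAP; this is only a sketch, and the class DigitsDemo and helper digitRuns are hypothetical. The point it shows is that collecting every number needs a find loop over the text, not a single whole-string match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitsDemo {
    // Hypothetical helper: returns each run of consecutive digits in the text.
    static List<String> digitRuns(String text) {
        Matcher m = Pattern.compile("\\d+").matcher(text);
        List<String> runs = new ArrayList<>();
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        // prints [4520, 227550, 98]
        System.out.println(digitRuns("Alerts 4520 ( 227550 ) ( 98 Available )"));
    }
}
```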
Question about Finder-Load-Beans flag
Hi all,
I've read that the Finder-Load-Beans flag could yield some valuable gains in performance
but:
1) why is it suggested to do individual gets of methods within the same Transaction
? (tx-Required).
2) this strategy is useful only for small sets of data, isn't it? I imagine I
would choose Finder-Load-Beans to false (or JDBC) for larger sets of data.
3) A last question: its default value is true or false ?
Thanks
Francesco
Because if there are different transactions where the get method is called, then the state/data of the bean would most likely be reloaded from the database. A new transaction causes the ejbLoad method to be invoked at the beginning and the ejbStore method at the end. That is the usual case, but there are other ways to modify this behavior.
Thanks
Gaurav
"Francesco" wrote in message:
Hi thorick,
I have found this in the newsgroup. It's from R. Woolen answering a question about the Finder-Load-Beans flag.
"Consider this case:
tx.begin();
Collection c = findAllEmployeesNamed("Rob");
Iterator it = c.iterator();
while (it.hasNext()) {
Employee e = (Employee) it.next();
System.out.println("Favorite color is:"+ e.getFavColor());
}
tx.commit();
With CMP (and finders-load-beans set to its default true value), the findAllEmployeesNamed finder will load all the employees with the name of rob. The getFavColor methods do not hit the db because they are in the same tx, and the beans are already loaded in the cache.
It's the big CMP performance advantage."
So I wonder why this performance gain can be achieved when the iteration is inside a transaction.
Thanks
regards
Francesco
thorick wrote:
> 1) why is it suggested to do individual gets of methods within the same Transaction? (tx-Required).
I'm not sure about the context of this question (in what document, paragraph is this mentioned).
> 2) this strategy is useful only for small sets of data, isn't it? I imagine I would choose Finder-Load-Beans to false (or JDBC) for larger sets of data.
If you know that you will be accessing the fields of all the Beans that you get back from a finder, then you will realize a significant performance gain. If one selects 100s or more beans using a finder, but only accesses the fields for a few, then there may be some performance cost. It could depend on how large some of the fields are. I'd guess that the cost of 1 hit to the DB per bean vs. the cost of 1 + maybe 1 more hit to the DB per bean, would usually be less. A performance test using your actual app's beans would be the only way to know for sure.
> 3) A last question: its default value is true or false ?
The default is 'True'
-thorick
-
Find text using regular expression and add highlight annotation
Hi Friends
Is it possible to find text using a regular expression and add a highlight annotation using a plugin?
A plugin can use the PDWordFinder to get a list of the words on a page, and their location. That's all that the API offers for searching. Of course, you can use a regular expression library to work with that word list.
-
Find/Replace Using Regular Expressions
Can someone help me with this...I am using Regular expressions to
FIND:
http.*lid=([^&"]*)[^"]*
REPLACE:
$set(\1,ID_id,code)$
So that in the following it will change this:
a href="http://www.test.com/shc/s/home_10153_12605?lid=Search" rilt="Search"
To this:
a href="$set(Search,ID_id,code)$" rilt="Search
Those expressions work in Notepad++ but when I use Dreamweaver it just replaces the http... with "$set(\1,ID_id,code)$" and doesn't reference the "Search"
Any help?
Thanks
Let me begin by saying I'm a complete idiot with DW's Reg Ex. I use Search Specific Tag whenever possible. See screenshot below.
Try this on your Current Document to see if it works. Then make a back-up copy of site before attempting it on Entire Local Site as you cannot "Undo" this process.
Good luck,
Nancy O. -
How to fetch substring using regular expression
Hi,
I am new to using regular expression and would like to know some basic details of how to use them in Java.
I have a String example= "http://www.google.com/foobar.html#*q*=database&aq=f&aqi=g10&fp=c9fe100d9e542c1e" and would like to get the value of "q" parameter (in bold) using regular expression in java.
For the same example, when we tried using javascript:
match = example.match("/^http:\/\/(?:(?!mail\.)[^\.]+?\.)?google\.[^\?#]+(?:.*[\?#&](?:as_q|q)=([^&]+))?/i}");
document.write('
' + match);
We are getting the output as: http://www.google.com/foobar.html#q=database,*database* where the bold text is the value of "q" parameter.
In Java we are trying to get the value of the q parameter separately, or at least something that resembles the output given by JavaScript. Please help me resolve this issue.
Regards
Praveen
BalusC wrote:
> Regex is a cumbersome solution for fixed patterns like URL's. String#substring() in combination with String#indexOf would most likely already suffice.
I usually agree, although, in this case, finding the exact parameter might be difficult without a small regex, perhaps:
"\\wq=\\s*"
in conjunction with Pattern/Matcher, used similarly to an indexOf() to find the start of the parameter value.
Winston
-
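For the "q" parameter question above, a small capturing regex in Java might look like the sketch below. It borrows the [?#&] delimiter idea from the JavaScript expression in the question; the class QueryParamDemo and the method qValue are hypothetical, and the pattern is not a full URL parser (it does no URL decoding, for instance).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QueryParamDemo {
    // q must directly follow '?', '#' or '&', so "aq=" is not mistaken for "q=".
    private static final Pattern Q_PARAM = Pattern.compile("[?#&]q=([^&]*)");

    // Hypothetical helper: returns the value of q, or null when it is absent.
    static String qValue(String url) {
        Matcher m = Q_PARAM.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String example =
            "http://www.google.com/foobar.html#q=database&aq=f&aqi=g10&fp=c9fe100d9e542c1e";
        System.out.println(qValue(example)); // prints database
    }
}
```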
Checking valid e-mail's using Regular expressions
Hey buddies ,
I desperately need some help here. I need to develop a generic method that will use regular expressions and patterns to validate an e-mail.
Does anyone have any idea on how to do this? And if possible please share some code with me.
Thanks a lot
You can do regular expressions in Java using java.util.regex.*:
import java.util.regex.*;
Pattern p = Pattern.compile("\\S++\\s++");
Matcher m = p.matcher(sInputLine);
if(m.find()){
sUser = m.group().trim();
}
For more info on regular expressions in Java, check out the javadocs:
http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html
-
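For the e-mail validation question above, a generic check method could be sketched as follows. The pattern is a deliberately simplified assumption (one "@", a dotted domain, a top-level part of at least two letters) and is not a complete implementation of the address syntax; the class EmailCheckDemo and method looksLikeEmail are made-up names.

```java
import java.util.regex.Pattern;

class EmailCheckDemo {
    // Simplified, assumed pattern: local part, '@', domain with at least one dot.
    private static final Pattern SIMPLE_EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Hypothetical helper: true when the whole string has the simplified shape.
    static boolean looksLikeEmail(String candidate) {
        // matches() requires the entire input to fit the pattern, not just a part of it.
        return candidate != null && SIMPLE_EMAIL.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeEmail("user@example.com")); // true
        System.out.println(looksLikeEmail("no-at-sign"));       // false
    }
}
```

A check like this only rejects obviously malformed input; whether an address actually exists can only be confirmed by sending mail to it.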
Searching for a substring using Regular Expression
I have a lengthy String similar to a repetition of the one below
String str="<option value='116813070'>Something1</option><option value='ABCDEF' selected>Something 2</option>";
I need to search for the substring "<option value='ABCDEF' selected>" (I need to get the starting index of the substring), but the value ABCDEF can be anything numeric with varying length.
Is there any way I can do it using regular expressions (I have no other options than regular expressions)?
thanks in advance.
If you go through the tutorial then you will find this on the second page:
import java.io.Console;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class RegexTestHarness {
    public static void main(String[] args){
        Console console = System.console();
        if (console == null) {
            System.err.println("No console.");
            System.exit(1);
        }
        while (true) {
            Pattern pattern =
            Pattern.compile(console.readLine("%nEnter your regex: "));
            Matcher matcher =
            pattern.matcher(console.readLine("Enter input string to search: "));
            boolean found = false;
            while (matcher.find()) {
                console.format("I found the text \"%s\" starting at " +
                "index %d and ending at index %d.%n",
                matcher.group(), matcher.start(), matcher.end());
                found = true;
            }
            if(!found){
                console.format("No match found.%n");
            }
        }
    }
}
It does everything you need and a bit more. Adapt it to your needs, then write a regular expression. Then if you have problems by all means come back and post them up here, but first at least attempt to solve it yourself.
-
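For the <option> question above, one possible expression is sketched below, assuming the value is always a run of digits as the question states. Matcher.start() gives the starting index asked for. The class OptionIndexDemo and method selectedOptionIndex are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionIndexDemo {
    // Assumption: the selected option's value consists only of digits.
    private static final Pattern SELECTED =
            Pattern.compile("<option value='\\d+' selected>");

    // Hypothetical helper: index of the first selected option tag, or -1 if none.
    static int selectedOptionIndex(String html) {
        Matcher m = SELECTED.matcher(html);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        String str = "<option value='116813070'>Something1</option>"
                   + "<option value='42' selected>Something 2</option>";
        System.out.println(selectedOptionIndex(str));
    }
}
```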
Matches from regular expression into collection
Hello,
I need to do the following:
I have a long string with some similar repeated data. I would like, using a regular expression, to extracts all matches in a collection. Is there a way of performing this task?
I have look through the owa_pattern package, but as far as I found out, I can extract only a simple match. Here is an exact quote:
"If multiple overlapping strings can match the regular expression, this function takes the longest match. " - http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28419/w_patt.htm
So what can I do if I want to get all the matches?
Thank you in anticipation. Any help would be appreciated.
Best regards,
beroetz
I think you need a tokenizer function.
If the string :in_str is delimited by :in_delimiter you could try this:
SELECT REGEXP_REPLACE(REGEXP_SUBSTR( :in_str || :in_delimiter, '(.*?)' || :in_delimiter, 1, LEVEL ), :in_delimiter, '') TOKEN
BULK COLLECT INTO :my_nested_table
FROM DUAL
CONNECT BY REGEXP_INSTR( :in_str || :in_delimiter, '(.*?)' || :in_delimiter, 1, LEVEL ) > 0
ORDER BY LEVEL ASC;
I wrote a string-to-textarray tokenizer (and its counterpart) some time ago, which is able to cut at certain positions within the string using regular expressions and return the elements in a nested table of varchar2. It looks like:
TYPE pos_arraytype IS TABLE OF POSITIVE ;
TYPE text_arraytype IS TABLE OF VARCHAR2(2000);
FUNCTION stringToTextarray(in_str IN VARCHAR2, in_pos_arr IN pos_arraytype, in_regexp_arr IN text_arraytype DEFAULT NULL, in_trim_strings IN BOOLEAN DEFAULT TRUE)
RETURN text_arraytype ;
in_str is the string to be tokenized
in_pos_arr is a table of positive values of positions in the string to be cut
in_regexp_arr is a table of regular expressions to use at each position declared by in_pos_arr
in_trim_strings is a flag that says whether each cut element should be trimmed
using above for example:
in_str = 'Markus van Muster 347651234XY Musterdaam ABCDE'
in_pos_arr = (1, 13, 35, 35, 42)
in_regexp_arr = ('(.?){12}', '([^[:digit:]]?){22}', '[[:digit:]]{4}', '[[:alpha:]]{2}', '(.?){14}')
in_trim_strings = TRUE
RETURN collection ('Markus','van Muster','1234','XY','Musterdaam')
If you need the code, then tell me! I'm looking for....
Cheers,
Martin
Edited by: Nuerni on 17.10.2008 08:49 -
Date Validation (yyyy/MM/dd) Using Regular Expression
Hi Friends,
I want to validate date entered by user in yyyy/MM/dd format and for this I want to use Regular Expressions only. Also is there any tool that can be used to generate Regular Expression (for Win2000, Win NT)?
Regards,
Himanshu Rathore
try this
public class Test
{
public static void main(String [] args)
{
String regex = "\\d{4}/[01]\\d/[0-3]\\d";
System.out.println("2003/12/11".matches(regex));
System.out.println("2djd/kj3".matches(regex));
System.out.println("22/12/12".matches(regex));
System.out.println("2003/23/05".matches(regex));
System.out.println("1999/12/51".matches(regex));
System.out.println("2007/05/07".matches(regex));
}
}
I'm not able to try it because I only have JDK 1.3.1 installed on my computer and this code requires J2SDK 1.4
-
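For the date validation thread above: the suggested expression checks the shape of the input but still accepts values such as 2003/19/39. A somewhat tighter regex-only variant is sketched below; the class DateFormatDemo and the method isDateShaped are made-up names, and the pattern still does not know month lengths, so 2003/02/31 passes.

```java
class DateFormatDemo {
    // Month limited to 01-12, day limited to 01-31; month lengths are not checked.
    private static final String DATE_REGEX =
            "\\d{4}/(0[1-9]|1[0-2])/(0[1-9]|[12]\\d|3[01])";

    // Hypothetical helper: true when the input has the yyyy/MM/dd shape with plausible ranges.
    static boolean isDateShaped(String input) {
        // String.matches() tests the entire string against the expression.
        return input != null && input.matches(DATE_REGEX);
    }

    public static void main(String[] args) {
        System.out.println(isDateShaped("2003/12/11")); // true
        System.out.println(isDateShaped("2003/19/05")); // false
        System.out.println(isDateShaped("2003/12/32")); // false
    }
}
```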
Validate cfinput using regular expression
Hi,
can somebody help me validate a cfinput field using regular expressions?
First digit must be a number or "R".
Digits 2-15 can be anything except special characters. Digits 2-15 can also be empty.
I tried this, but it doesn't work (sorry, I'm a beginner with regex).
<cfinput type="text" name="field1" required="yes" validate="regular_expression" pattern="[0-9Rr][0-9a-zA-Z]*" maxlength="15">
Thank you in advance!
Claudia
You haven't told your regex to match the entire string, so it will "pass" any string that has your match anywhere within it; as long as it's got an R or a digit in it, it's OK.
To tell it to match the entire string, anchor it to the start and end of the string with ^ and $ respectively.
Regular expression syntax: Using special characters
http://help.adobe.com/en_US/ColdFusion/9.0/Developing/WSc3ff6d0ea77859461172e0811cbec0a38f -7ffb.html#WSc3ff6d0ea77859461172e0811cbec0a38f-7fef
Adam
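The effect of the ^ and $ anchors described above can be illustrated with a short Java sketch (Java rather than ColdFusion, purely for demonstration). The class AnchorDemo and both helper methods are hypothetical, and the {0,14} bound is an assumption taken from the "digits 2-15" requirement in the question.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // Unanchored: succeeds if the pattern occurs anywhere in the input.
    private static final Pattern LOOSE = Pattern.compile("[0-9Rr][0-9a-zA-Z]*");
    // Anchored: the whole input must be a digit or R followed by up to 14 alphanumerics.
    private static final Pattern STRICT = Pattern.compile("^[0-9Rr][0-9a-zA-Z]{0,14}$");

    static boolean looseCheck(String s) {
        return LOOSE.matcher(s).find();
    }

    static boolean strictCheck(String s) {
        return STRICT.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(looseCheck("!!R1!!"));  // true: "R1" is found inside the string
        System.out.println(strictCheck("!!R1!!")); // false: the anchors reject the extra characters
        System.out.println(strictCheck("R1abc"));  // true
    }
}
```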