Pattern matching using Regular expression
Hi,
I am working on pattern matching using regular expression. I the table, I have 2 columns A and B
A has value 'A499BPAU4A32A386KBCZ4C13C41D20E'
B has value like '*CZ4*M11*7NQ+RDR+RSM-R9A-R9B'
the requirement is that I have to match the columns of B in A. If there is a value with * sign, this must be present in A like 'CZ4' should exit in string A.
The issue I am facing is that there are 2 values with * sign. The code works fine for first match (CZ4) but it does not look further as M11 does not exist in A.
I used the condition
AND instr(A,substr(REGEXP_SUBSTR(B, '*[^*]{3}'),2) ,1)=0
First of all, is this possible to match multiple patterns in one condition?
If yes, please suggest.
Thanks
user2544469 wrote:
Thanks a lot Frank. This query worked wonderful for the test data I have provided however I have some concerns:
- query doesnot include the column BOOK which is a mandatory check.Sorry, that was my mistake. It was a very easy mistake to make, since you posted sample data where it didn't matter. Instead of doing a cross-join between vn and got_must_have_cnt, do an inner join, using book. That means book will have to be in got_must_have_cnt, and all the sub-queries from which it descends. Look for comments that say "March 22".
If you want to treat '+' in test_cat.codes as '*', then the simplest thing is probably just to use REPLACE, so that when the table has '+', you use '*' instead.
WITH got_token_cnt AS
SELECT cat
, book -- Added March 22
, REPLACE (codes, '+', '*') AS codes -- If desired. Changed March 22
, LENGTH (codes) - LENGTH ( TRANSLATE ( codes
, 'x*+-'
, 'x'
) AS token_cnt
FROM test_cat
, cntr AS
SELECT LEVEL AS n
FROM ( SELECT MAX (token_cnt) AS max_token_cnt
FROM got_token_cnt
CONNECT BY LEVEL <= max_token_cnt
, got_tokens AS
SELECT t.cat
, t.book -- Added March 22
, REGEXP_SUBSTR ( t.codes
, '[*+-]'
, 1
, c.n
) AS token_type
, SUBSTR ( REGEXP_SUBSTR ( t.codes
, '[*+-][^*+-]*'
, 1
, c.n
, 2
) AS token
FROM got_token_cnt t
JOIN cntr c ON c.n <= t.token_cnt
, got_must_have_cnt AS
SELECT cat, book -- Changed March 22
, COUNT (CASE WHEN token_type = '*' THEN 1 END) AS must_have_cnt
FROM got_tokens
GROUP BY cat, book -- Changed March 22
SELECT mh.cat
, vn.vn_no
FROM got_must_have_cnt mh
JOIN vn ON mh.book = vn.book -- Changed March 22
LEFT OUTER JOIN got_tokens gt ON mh.cat = gt.cat
AND INSTR (vn.codes, gt.token) > 1
GROUP BY mh.cat
, mh.must_have_cnt
, vn.vn_no
HAVING COUNT (CASE WHEN gt.token_type = '*' THEN 1 END) = mh.must_have_cnt
AND COUNT (CASE WHEN gt.token_type = '-' THEN 1 END) = 0
ORDER BY mh.cat
- query is very slow with 60000 records in vn table. Cost is somewhere around 36000.See these threads:
When your query takes too long ...
HOW TO: Post a SQL statement tuning request - template posting
Relational databases were designed to have (at most) one piece of information in each column. If you decide to have multiple items in the same column (as you have a variable number of tokens in the codes column), don't be surprised if that makes things slower and more complicated. Most of the query I posted, and perhaps most of the time needed, is jsut to normalize the data. If you stored the data in a narmalized form, perhaps something like got_tokens, then you wouldn't need the first 3 sub-queries that I posted.
Edited by: Frank Kulash on Mar 22, 2011 12:04 PM
Similar Messages
-
REGEX: question about finding Overlapping matches using regular expressions
I have the following problem.
Say for my pattern I use:
Pattern pattern = Pattern.compile("AAA");
Matcher matcher = pattern.matcher("AAAAAA");when I run a loop
while (matcher.find())
System.out.println("Match Found: "+matcher.start()+" "+matcher.end());I get 2 Hits shown in the following output:
Match Found: 0 3
Match Found: 3 6
therefore the regex is seeing the first AAA then the second AAA.
I want it to find the other AAA's in there that are overlapping the other two finds i.e. I want the output to find
AAA from 0 to 3
AAA from 1 to 4
AAA from 2 to 5 and finally
AAA from 3 to 6
thereby including the overlapping finds.
How can I do this using regex? what am I missing that prevents the overlapping matches to be found? Do I need a quantifier?
Thanks for the help!While the solutions above work fine with the given input, they don't really find all overlapping matches. They just find the longest possible match at each start position. Here's a more thorough approach:import java.util.*;
import java.util.regex.*;
public class Test
public static List<String> matchAllWays(String rgx, String str)
Pattern p = Pattern.compile(rgx);
Matcher m = p.matcher(str);
List<String> result = new ArrayList<String>();
int len = str.length();
int start = 0;
int end = len;
while (start < len && m.region(start, len).find())
start = m.start();
do
result.add(m.group());
end = m.end() - 1;
} while (end > start && m.region(start, end).find());
start++;
return result;
public static void main(String[] args)
List<String> matches = matchAllWays("a.*a", "abracadabra");
System.out.println(matches);
}This approach requires JDK 1.5 or later; that's when the regions API was added to Matcher. -
Matching substrings between square brackets using regular expressions
Hello,
I am new at Java and have a problem with regular expressions. Let me describe the issue in 3 steps:
1.- I have an english sentence. Some words of the sentence stand between square brackets, for example "I [eat] and [sleep]"
2- I would like to match strings that are in square brackets using regular expressions (java.util.regex.*;) and here is the code I have written for the task
+Pattern findStringinSquareBrackets = Pattern.compile("\\[.*\\]");+
+ Matcher matcherOfWordInSquareBrackets = findStringinSquareBrackets.matcher("I [eat] and [sleep]");+
+//Iteration in the string+
+ while ( matcherOfWordInSquareBrackets.find() )+
+{+
+ System.out.println("Patter found! :"+ outputField.getText().substring(matcherOfWordInSquareBrackets.start(), matcherOfWordInSquareBrackets.end())+""); +
+ }+
3- the result I have after running the code described in 2 is the following: *Patter found!: [eat] and [sleep]*
That is to say that not only words between square brackets are found but also the substring "and". And this is not what I want.
What I would like to have as a result is:
*Patter found!: [eat]*
*Patter found!: [sleep]*
That is to say I want to match only the words between the square brackets and nothing else.
Does somebody know how to do this? Any help would be great.
Best regards,
AbouYou can find the words by looping through the sentence and then return the substring within the indexes.
int start=0;
int end=0;
for(int i=0; i<string.length(); i++)
if(string.substring(i,i+1).equals("[");
start=i;
if(start!=0)
if(string.substring(i,i+1).equlas("]");
end=i;
return string.substring(start,end+1);
}something like that. This code will only find the firt word however. I do not know much about regex so I cannot help anymore.
Edited by: elasolova on Jun 16, 2009 6:45 AM
Edited by: elasolova on Jun 16, 2009 6:46 AM -
Using regular expressions for validation in i18n
Can we use regular expressions for validation of inputs in a java application taking care of i18N aspects too. Zip code for different locales are different. Can we use regular expressions to validate zipcode inputs from different locales
hi,
For that shall i have to create individual patterns for matching the inputs from different locales or a single pattern will do in the case of validating phone nos. around the world, zip codes etc. In case different patterns are required, programmer should have a konwledge of difference in patters for different locales.
regards
sdas -
How to fetch substring using regular expression
Hi,
I am new to using regular expression and would like to know some basic details of how to use them in Java.
I have a String example= "http://www.google.com/foobar.html#*q*=database&aq=f&aqi=g10&fp=c9fe100d9e542c1e" and would like to get the value of "q" parameter (in bold) using regular expression in java.
For the same example, when we tried using javascript:
match = example.match("/^http:\/\/(?:(?!mail\.)[^\.]+?\.)?google\.[^\?#]+(?:.*[\?#&](?:as_q|q)=([^&]+))?/i}");
document.write('
' + match);
We are getting the output as: http://www.google.com/foobar.html#q=database,*database* where the bold text is the value of "q" parameter.
In Java we are trying to get the value of the q parameter separately or atleast resembles the output given by JavaScript. Please help me resolving this issue.
Regards
PraveenBalusC wrote:
Regex is a cumbersome solution for fixed patterns like URL's. String#substring() in combination with String#indexOf would most likely already suffice.I usually agree, although, in this case, finding the exact parameter might be difficult without a small regex, perhaps:
"\\wq=\\s*"in conjunction with Pattern/Matcher, used similarly to an indexOf() to find the start of the parameter value.
Winston -
Checking valid e-mail's using Regular expressions
Hey buddies ,
I desperately need some help here. I need to develop a generic method that wiull use regular expressions and patterns to validate an e-mail.
Does anyonw have any idea on how to do this. And if possible please share some code with me.
Thanks a lotYou can do regular expresions in java using java.util.regex.*:
import java.util.regex.*;
Pattern p = Pattern.compile("\\S++\\s++");
Matcher m = p.matcher(sInputLine);
if(m.find()){
sUser = m.group().trim();
}For more info on regular expressions in java, check out the javadocs:
http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html -
A question about using regular expression
Hi,
This is a part of HTML file.
<SPAN>how</span>
<SPAN>are</span>
<SPAN>you</span>
I want to search string between each pair of <SPAN> , </SPAN> tags by using Regular Expression.
For example:
how
are
you
If I use following method
String regx="<SPAN>(.+)<SPAN>";
Matcher m=Pattern.compile(regx).matcher(str);
int currentLoc=0;
while(currentLoc<str.length()){
if(m.find(currentLoc))
System.out.println(m.group(1));
currentLoc=m.end();
}The content between first <SPAN> and last </SPAN> will be searched.
How to solve this problem?Use a non-greedy match:
(?s)<SPAN>(.+?)<SPAN>(?s) makes the dot match the line terminator (same as setting the dot all option) -
Finding URLs using regular expression.
I have an requirement where user will type some text containing URLs like "Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747. Thank you". This text has to be modified as below before saving it to the database.
"Please visit this site <a href='http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747'>http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747</a>. Thank you"
I am using regular expression (http|https)://.+?\\s which marks the end of the url with a white space character.This pattern doesn't work if the URL is located at the end of the string since there will be no space at the end.
For example if the string is "Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747" the regex will fail.
My acutal problem is to find the URL irrespective its position within the string.
Pattern urlPattern = Pattern.compile("(http|https)://.+?\\s", Pattern.CASE_INSENSITIVE);
Matcher matcher = urlPattern.matcher(plainText);
Map stringIndexMap = new HashMap();
//Searching the input string for urlPattern...
while(matcher.find()) {
String urlString = matcher.group();
//Storing the urls in a hashmap with their indices as keys....
stringIndexMap.put(new Integer(matcher.start()), urlString.trim());
Set keySet = stringIndexMap.keySet();
Iterator it = keySet.iterator();
//Iterating over the hashmap containing urls...
while(it.hasNext()) {
String urlString = (String) stringIndexMap.get(it.next());
* Replacing the url string in the input text with <a href="#" onclick="window.open('<urlString>')"
* using String index
clickableURLString.replace(clickableURLString.indexOf(urlString),
clickableURLString.indexOf(urlString) + urlString.length(),
"<a href=\"#\" onclick=\"window.open('" + urlString
+ "')\">" + urlString + "</a>");
return clickableURLString.toString();The end of the input is '$' as a regex.
import java.util.regex.*;
public class Prasanna{
public static void main(String[] args){
String text
= "Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747";
// String regex = "(http|https)://.+?(?:\\s|$)"; // this works
String regex = "(http|https)://[^ ]+"; // this also works
Pattern pat = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
Matcher mat = pat.matcher(text);
while (mat.find()){
System.out.println(mat.group());
} -
Searching for a substring using Regular Expression
I have a lengthy String similar to repetetion of the one below
String str="<option value='116813070'>Something1</option><option value='ABCDEF' selected>Something 2</option>"I need to search for the Sub string "<option value='ABCDEF' selected>" (need to get the starting index of sub string) and but the value ABCDEF can be anything numberic with varying length.
Is there any way i can do it using regular expressions(I have no other options than regular expression)?
thanks in advance.If you go through the tutorial then you will find this on the second page:
import java.io.Console;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class RegexTestHarness {
public static void main(String[] args){
Console console = System.console();
if (console == null) {
System.err.println("No console.");
System.exit(1);
while (true) {
Pattern pattern =
Pattern.compile(console.readLine("%nEnter your regex: "));
Matcher matcher =
pattern.matcher(console.readLine("Enter input string to search: "));
boolean found = false;
while (matcher.find()) {
console.format("I found the text \"%s\" starting at " +
"index %d and ending at index %d.%n",
matcher.group(), matcher.start(), matcher.end());
found = true;
if(!found){
console.format("No match found.%n");
}It's does everything you need and a bit more. Adapt it to your needs then write a regular expression. Then if you have problems by all means come back and post them up here, but first at least attempt to solve it yourself. -
Validate cfinput using regular expression
Hi,
can somebody help me valditing an cfinput field using regular expressions?
First digit must be a number or "R".
Digit 2-15 can be everything without special characters. Digit 2-15 can also be empty.
I try this, but it doesn't work (Sorry I'm a beginner using regex).
<cfinput type="text" name="field1" required="yes" validate="regular_expression" pattern="[0-9Rr][0-9a-zA-Z]*" maxlength="15">
Thank you in advice!
ClaudiaYou haven't told your regex to match the entire string, so it will "pass"
any string that has your match anywhere within it, so like as long as it's
got an R or a digit in it: it's OK.
To tell it to match the entire string, anchor it to the start and end of the
string with ^ and $ respectively.
Regular expression syntax: Using special characters
http://help.adobe.com/en_US/ColdFusion/9.0/Developing/WSc3ff6d0ea77859461172e0811cbec0a38f -7ffb.html#WSc3ff6d0ea77859461172e0811cbec0a38f-7fef
Adam -
XML node name matching with regular expressions
Hello,
If i have an xml file that has the following:
<parameter>
<name>M2-WIDTH</name>
<value column="09" date="2004-10-31T19:56:30" row="03" waferID="PUK444150-20">10.4518</value>
</parameter>
<parameter>
<name>M2-GAP</name>
<value column="29" date="2004-10-31T19:56:30" row="06" waferID="PUK444150-03">2.864</value>
</parameter>
<parameter>
<name>RES-LENGTH</name>
<value column="29" date="2004-10-31T19:56:30" row="06" waferID="PUK444150-03">2.864</value>
</parameter>
Is there anyway i can get a list of nodes that match a certain pattern say where name=M2* ?
I cant seem to find any information where i can match a regular expression. I see how you can do:
String expression=/parameter[@name='M2-LENG']/value/text()";
NodeList nodes = (NodeList) xPath.evaluate(expression, inputSource, XPathConstants.NODESET);
But i want to be able to say:
String expression=/parameter[@name='M2-*']/value/text()";
Is this possible? if so how can i do this?
Thanks!As implemented in Java, XPath does not support regular expressions, but in most cases there are workarounds thanks to XPath functions. Correct me if I'm wrong, but setting your expression against the XML document (i.e. because there are no "name" attributes in the whole document) I think you mean to get the value of the <value> elements that have a <parameter> parent element and a <name> sibling element whose value starts with "M2-". If that is the case, you can use the following query expression:String expression = "//parameter/value[substring(../name,1,3)='M2-']";Sorry if I misunderstood the meaning of your expression, but I hope this will help you get the hang of using XPath functions as a substitute for regular expressions.
-
One for the Tekkies: How to get this output using REGULAR EXPRESSIONS?
How to get the below output using REGULAR EXPRESSIONS??
SQL> ed
Wrote file afiedt.buf
1* CREATE TABLE cus___addresses (full_address VARCHAR2(200 BYTE))
SQL> /
Table created.
SQL> PROMPT Address Format is: House #/Housename, street, City, Zip Code, COUNTRY
House #/Housename, street, City, Zip Code, COUNTRY
SQL> INSERT INTO cus___addresses VALUES('1, 3rd street, Lansing, MI 49001, USA');
1 row created.
SQL> INSERT INTO cus___addresses VALUES('3B, fifth street, Clinton, OK 74103, USA');
1 row created.
SQL> INSERT INTO cus___addresses VALUES('Rose Villa, Stanton Grove, Murray, TN 37183, USA');
1 row created.
SQL> SELECT * FROM cus___addresses;
FULL_ADDRESS
1, 3rd street, Lansing, MI 49001, USA
3B, fifth street, Clinton, OK 74103, USA
Rose Villa, Stanton Grove, Murray, TN 37183, USA
SQL> The REG EXP query shouLd output the ZIP codes: i.e. 49001, 74103, 37183 in 3 rows.Edited by: user12240205 on Jun 18, 2012 3:19 AMHi,
user12240205 wrote:
... Frank, ʃʃp's method, I understand. But your method, although correct, I find it difficult to understand.
Could you explain how you did this?? What does '.*(\d{5})\D*' and '\1' mean???
Your method is better because it uses only ONE reg expression function. ʃʃp's uses 2.In Oracle 10.2 (I believe) and higher, '\d' is equivalent to '[[:digit:]]', and '\D' is equivalent to '[^[:digit:]]'. I find '\d' and '\D' easier to type, but there's nothing wrong with using '[[:digit:]]' and '[^[:digit:]]'.
'.*' means "0 or more of any character".
'\D*' means "0 or more non-digits".
The whole expression, '.*(\d{5})\D*' means:
a. 0 or more characters (any characters)
b. 5 digits
c. 0 or more non-digits.
'\1' is a Backreference . It means the sub-string that matched the pattern after the 1st '(', up to (but not including) its matching ')'. In this case, that means the sub-string that matched '\d{5}', or b. using the explanation immediately above.
So the entire REGEXP_REPLACE call means "When you see a sub-string consisting of a., follwed immediately by b., followed immedately by c., replace that sub-string with b. alone." -
Matches from regular expression into collection
Hello,
I need to do the following:
I have a long string with some similar repeated data. I would like, using a regular expression, to extracts all matches in a collection. Is there a way of performing this task?
I have look through the owa_pattern package, but as far as I found out, I can extract only a simple match. Here is an exact quote:
"If multiple overlapping strings can match the regular expression, this function takes the longest match. " - http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28419/w_patt.htm
So what can I do if I want to get all the matches?
Thank you in anticipation. Any help would be appreciated.
Best regards,
beroetzI think your need a tokenizer-function.
If the string +:in_str+ is delimited by +:in_delimiter+ you could try this:
SELECT REGEXP_REPLACE(REGEXP_SUBSTR( :in_str || :in_delimiter, '(.*?)' || :in_delimiter, 1, LEVEL ), :in_delimiter, '') TOKEN
BULK COLLECT INTO :my_nested_table
FROM DUAL
CONNECT BY REGEXP_INSTR( :in_str || :in_delimiter, '(.*?)' || :in_delimiter, 1, LEVEL ) > 0
ORDER BY LEVEL ASC;
I wrote a string-to-textarray-tokenizer (and it's pendant) some times ago, being able to cut from certain positions within the string using regular expressions and return the elements into an nested table of varchar2. It looks like:
TYPE pos_arraytype IS TABLE OF POSITIVE ;
TYPE text_arraytype IS TABLE OF VARCHAR2(2000);
FUNCTION stringToTextarray(in_str IN VARCHAR2, in_pos_arr IN pos_arraytype, in_regexp_arr IN text_arraytype DEFAULT NULL, in_trim_strings IN BOOLEAN DEFAULT TRUE)
RETURN text_arraytype ;
in_str is the string to be tokenized
in_pos_arr is a table of positive values of positions in the string to be cut
in_regexp_arr is a table of regular expressions to use at each position declared by in_pos_arr
in_trim_strings is a flag, if the cutted element should be trimmed
using above for example:
in_str = 'Markus van Muster 347651234XY Musterdaam ABCDE'
in_pos_arr = (1, 13, 35, 35, 42)
in_regexp_arr = ('(.?){12}', '([^[:digit:]]?){22}', '[[:digit:]]{4}', '[[:alpha:]]{2}', '(.?){14}')
in_trim_strings = TRUE
RETURN collection ('Markus','van Muster','1234','XY','Musterdaam')
If you need the code, then tell me! I'm looking for....
Cheers,
Martin
Edited by: Nuerni on 17.10.2008 08:49 -
Using regular expressions in java
Does anyone of you know a good source or a tutorial for using regular expressions in java.
I just want to look at some examples....
Thanksthanks a lot... i have one more query
Boundary matchers
^ The beginning of a line
$ The end of a line
\b A word boundary
\B A non-word boundary
\A The beginning of the input
\G The end of the previous match
\Z The end of the input but for the final terminator, if any
\z The end of the input
if i want to use the $ for comparing with string(text) then how can i use it.
Eg if it is $120 i got a hit
but if its other than that if should not hit. -
Using regular expressions for validating time fields
Similar to my problem with converting a big chunk of validation into smaller chunks of functions I am trying to use Regular Expressions to handle the validation of many, many time fields in a flexible working time sheet.
I have a set of FormCalc scripts to calculate the various values for days, hours and the gain/loss of hours over a four week period. For these scripts to work the time format must be in HH:MM.
Accessibility guidelines nix any use of message box pop ups so I wanted to get around this by having a hidden/visible field with warning text but can't get it to work.
So far I have:
var r = new RegExp(); // Create a new Regular Expression Object
r.compile ("^[00-99]:\\] + [00-59]");
var result = r.test(this.rawValue);
if (result == true){
true;
form1.flow.page.parent.part2.part2body.errorMessage.presence = "visible";
else (result == false){
false;
form1.flow.page.parent.part2.part2body.errorMessage.presence = "hidden";
Any help would be appreciated!Date and time fields are tricky because you have to consider the formattedValue versus the rawValue. If I am going to use regular expressions to do validation I find it easier to make them text fields and ignore the time patterns (formattedValue). Something like this works (as far as my very brief testing goes) for 24 hour time where time format is HH:MM.
// form1.page1.subform1.time_::exit - (JavaScript, client)
var error = false;
form1.page1.subform1.errorMsg.rawValue = "";
if (!(this.isNull)) {
var time_ = this.rawValue;
if (time_.length != 5) {
error = true;
else {
var regExp = /^([01]?[0-9]|2[0-3]):[0-5][0-9]$/;
if (!(regExp.test(time_))) {
error = true;
if (error == true) {
form1.page1.subform1.errorMsg.rawValue = "The time must be in the format HH:MM where HH is 00-23 and MM is 00-59.";
form1.page1.subform1.errorMsg.presence = "visible";
Steve
Maybe you are looking for
-
How much ram can U410 support?
Hi, Just wanna check with anyone here knows what is the maximum ram that U410 can support under normal circumstances(no hack, upgrade, tweak option until the graphic card cannot be use)? THank U Solved! Go to Solution.
-
Iphone4 without a service plan
I have bought an Iphone4 in Orlando, but without a voice/data service plan. When I turn on it, there is a message to insert a SIM card to activate it. How can I buy this SIM card? Is it prepaid? Thanks, Janete
-
How is it possible to activate debug mode for AMIE (Asset Management Integration Engine) ?
-
I can't See AP 1550 in WLC 5508
Hi Experts, I have 1550 Access points and I can't see it inside wlc 5508 and in the log message I found this message aaa authentication failure for username:mac user type: wlan user Just to know I have 1142 and 1600 access points and see them in wlc
-
Popup "New Software is available"
Hello, is there a way to disable the information "New Software is available" at the clients!? This is a nice feature but some users think they have to install all software which are available for them, and this information on the top of the taskbar w