Regular expression's subexpression

Hi.
While reading the Oracle documentation, I came across this example:
The next statement returns the position in the source string of the first character in the fourth subexpression, which is '78':
SELECT REGEXP_INSTR('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4)
"REGEXP_INSTR" FROM DUAL;
REGEXP_INSTR
7
Here the 4th subexpression is '78', but I am confused about what the 2nd subexpression will be. Is it '4'? Also, here we have an expression inside an expression. In this case, how is the order of the subexpressions determined?
Thanks

> Here the 4th subexpression is '78', but I am confused about what the 2nd subexpression will be. Is it '4'? Also, here we have an expression inside an expression. In this case, how is the order of the subexpressions determined?

Your query failed on my 10g.
So I looked at the 11g documentation. Yep, the Oracle team has added one more parameter (subexpr) to the REGEXP_INSTR() function, which is why the query doesn't work on 10g.
And BTW, the order in which the subexpressions are evaluated is clearly explained, with an example, in the documentation: 11g REGEXP_INSTR
VB
http://volder-notes.blogspot.com/
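To answer the original question directly: subexpressions are numbered by the position of their opening parenthesis, counting from left to right. In '(123)(4(56)(78))', subexpression 1 is '123' and subexpression 2 is '45678' (the whole outer group, not just '4'). Subexpression 3 is '56' and subexpression 4 is '78'.

The sketch below illustrates this without a database. It uses Java's java.util.regex, which numbers capturing groups the same way. The class and helper names are made up for this example, and Java's 0-based offsets are shifted by one to mimic REGEXP_INSTR's 1-based positions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SubexpressionOrder {
    // Hypothetical helper, roughly REGEXP_INSTR(source, regex, 1, 1, 0, 'i', group):
    // 1-based start of capturing group `group` in the first case-insensitive match.
    // Returns 0 if there is no match or the group did not take part in it.
    static int groupPosition(String source, String regex, int group) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(source);
        if (!m.find()) {
            return 0;
        }
        return m.start(group) + 1; // start() is 0-based, and -1 for a non-participating group
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("(123)(4(56)(78))").matcher("1234567890");
        if (m.find()) {
            // Groups are numbered by their opening parenthesis, left to right.
            for (int g = 1; g <= m.groupCount(); g++) {
                System.out.println("subexpression " + g + " = '" + m.group(g)
                        + "' at position " + (m.start(g) + 1));
            }
        }
    }
}
```

It prints one line per group. The last line is subexpression 4 = '78' at position 7, the same answer REGEXP_INSTR gives above.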

Similar Messages

Introduction to regular expressions ...

I'm well aware that there are already some articles on this topic, but some people asked me to share some of my knowledge. Please take a look at this first part and let me know if you find it useful. If so, I'll continue writing more parts using more and more complicated expressions. If you have questions or problems that you think could be solved with regular expressions, please post them.
Introduction
Oracle has always provided some character/string functions in its PL/SQL command set, such as SUBSTR, REPLACE or TRANSLATE. With 10g, Oracle finally gave us (the users, the developers and of course the DBAs) regular expressions. However, regular expressions seem to be overlooked quite often because of their sometimes cryptic rules, despite some very interesting use cases. Being one of the advocates of regular expressions, I thought I'd give the interested audience an introduction to these new functions in several installments.
Having fun with regular expressions - Part 1
Oracle offers regular expressions through several functions: REGEXP_INSTR, REGEXP_SUBSTR, REGEXP_REPLACE and REGEXP_LIKE. The second part of each function name already gives away its purpose: INSTR for finding a position inside a string, SUBSTR for extracting part of a string, and REPLACE for replacing parts of a string. REGEXP_LIKE is a special case. It can be compared to the LIKE operator and is therefore usually used in conditions such as IF statements or WHERE clauses.
In my opinion, regular expressions excel at searching for and extracting strings, whether to find or replace certain strings or to check for certain formatting criteria. They're not very good at formatting strings themselves, except for some special cases I'm going to demonstrate.
If you're not familiar with regular expressions, you should take a look at the definition in Oracle's user guide, Using Regular Expressions With Oracle Database. Please note that there have been some changes and advancements in 10g Release 2. I'll provide examples that should work on both versions.
Some of you have probably already encountered this problem: checking for a number inside a string because, for whatever reason, a column was defined as VARCHAR2 and not as NUMBER, as one would have expected.
Let's check for all rows where column col1 does NOT contain an unsigned integer. I'll use this SELECT to demonstrate different values and search patterns:
WITH t AS (SELECT '456' col1
             FROM dual
            UNION
           SELECT '123x'
             FROM dual
            UNION
           SELECT 'x123'
             FROM dual
            UNION
           SELECT 'y'
             FROM dual
            UNION
           SELECT '+789'
             FROM dual
            UNION
           SELECT '-789'
             FROM dual
            UNION
           SELECT '159-'
             FROM dual
            UNION
           SELECT '-1-'
             FROM dual)
SELECT t.col1
FROM t
WHERE NOT REGEXP_LIKE(t.col1, '^[0-9]+$')
;
Let's take a look at the 2nd argument of this REGEXP function: '^[0-9]+$'. Translated, it means: starting at the beginning of the string, check that there are one or more characters in the range from '0' to '9', up to the end of the string. A bracket expression like "[0-9]" is called a matching character list. "^", "[", "]", "+" and "$" are all metacharacters.
To understand regular expressions, you have to "think" in regular expressions. Each regular expression tries to "fit" an available string into its pattern. Depending on the function, it then returns a result that tells you whether it succeeded. The "art" of using regular expressions is to construct the right search pattern for a given task. Functions like TRANSLATE or REPLACE have already taught you how to use search patterns; regular expressions are just an extension of this paradigm. Another side note: most of the search patterns are placeholders for single characters, not strings.
I'll take this example a bit further. What would happen if we removed the "$" from our example? "$" means: (until the) end of a string. Without it, the expression only requires one or more digits at the beginning of the string, and whatever follows them no longer matters. So this time, '123x' (and '159-' as well) would be removed from the selection, since it now fits the pattern.
Another change: we will keep the "$" but remove the "^". This character has several meanings, but in this case it declares: (start from the) beginning of a string. Without it, the function searches for a part of the string that consists only of digits up to the end of the string. 'x123' (along with '+789' and '-789') would now be removed from our selection.
Now there's a question: what happens if I remove both "^" and "$"? Well, just think about it. We now ask for any string that contains at least one digit. So '123x' and 'x123' will not show up in the result; in fact, only 'y' remains.
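To see all four variants side by side without a database, here is a small Java sketch. The class name AnchorDemo is made up. It assumes that REGEXP_LIKE behaves like java.util.regex's Matcher.find(), i.e. the pattern may match anywhere in the string unless it is anchored. That assumption holds for these simple patterns. notLike() plays the role of WHERE NOT REGEXP_LIKE. Unlike the UNION query, the results keep the input order.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class AnchorDemo {
    // The sample values from the WITH clause above
    static final List<String> VALUES =
            List.of("456", "123x", "x123", "y", "+789", "-789", "159-", "-1-");

    // WHERE NOT REGEXP_LIKE(col1, regex): keep the values the pattern does NOT find
    static List<String> notLike(String regex) {
        Pattern p = Pattern.compile(regex);
        return VALUES.stream()
                .filter(v -> !p.matcher(v).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (String regex : List.of("^[0-9]+$", "^[0-9]+", "[0-9]+$", "[0-9]+")) {
            System.out.println(regex + " -> " + notLike(regex));
        }
    }
}
```

Each anchor you drop lets more values "fit" the pattern:
- With both anchors, only '456' is filtered out.
- Without "$", '123x' and '159-' are filtered out as well.
- Without "^", 'x123', '+789' and '-789' are filtered out as well.
- Without either anchor, only 'y' is left in the result.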
So what if I want to look for signed integers? Since "+" is also a metacharacter in search expressions, escaping is the name of the game. We'll just use '^\+[0-9]+$'. Did you notice the "\" before the first "+"? It turns it into a search pattern for the literal plus sign.
Should signed integers include negative numbers as well? Of course they should, and I'll once again use a matching character list. Inside this list I don't need escaping, although it is possible. The search pattern would now look like this: '^[+-]?[0-9]+$'. Did you notice the "?"? This is another metacharacter, and it makes the placeholder for plus and minus optional. If there's a "+" or "-", that's OK; if there's none, that's also OK. Only if there's a different character will the search pattern fail.
Addendum: At this point I found a mistake in my earlier examples. Suppose you had tested them with data containing multiple signs in front of the digits, like "--", "-+" or "++". Those strings would have been accepted as valid integers, and so filtered out by the SELECT statement. I mistakenly used the "*" instead of the "?" operator. The reason why this is a bad idea can also be found in the user guide: the "*" metacharacter is defined as 0 to multiple occurrences.
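The difference between "?" and "*" shows up as soon as a value carries more than one sign. The Java sketch below checks this; the names and the multi-sign sample values are hypothetical. It treats REGEXP_LIKE like java.util.regex's Matcher.find(), which is fine for these anchored patterns. Note that the escaped plus sign needs a doubled backslash inside a Java string literal.

```java
import java.util.regex.Pattern;

class SignQuantifier {
    static final String PLUS_ONLY   = "^\\+[0-9]+$";   // escaped "+": a literal plus sign
    static final String AT_MOST_ONE = "^[+-]?[0-9]+$"; // optional single sign
    static final String ANY_NUMBER  = "^[+-]*[0-9]+$"; // the mistake: any number of signs

    // REGEXP_LIKE-style test: does the pattern match anywhere in s?
    static boolean like(String s, String regex) {
        return Pattern.compile(regex).matcher(s).find();
    }

    public static void main(String[] args) {
        String[] values = {"789", "+789", "-789", "--789", "-+789", "++789"};
        for (String v : values) {
            System.out.printf("%-6s plus-only=%-5b ?=%-5b *=%b%n", v,
                    like(v, PLUS_ONLY), like(v, AT_MOST_ONE), like(v, ANY_NUMBER));
        }
    }
}
```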
Looking at the values, one could ask: what about integers with a trailing sign? Quite simple, right? Let's just add another '[+-]?', and the search pattern would look like this: '^[+-]?[0-9]+[+-]?$'.
Wait a minute, what happened to the row with the column value "-1-"?
You probably already guessed it: the new pattern qualifies this one as a valid string, too. I could now split this pattern into several conditions combined through a logical OR. But there's something even better: a logical OR inside the regular expression. Its symbol is "|", the pipe sign.
Changing the search pattern again to '^[+-]?[0-9]+$|^[0-9]+[+-]?$' now returns the "-1-" value. Do I have to duplicate the same elements like "^" and "$"? And what about more complicated, repeating elements in future examples? That's where subexpressions (grouping) come into play. If I want to apply the OR operator only to certain parts of the search pattern, I can put those parts inside round brackets. '^([+-]?[0-9]+|[0-9]+[+-]?)$' serves the same purpose and allows for further checks without duplicating the whole pattern.
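Here are the three patterns from the last two paragraphs as a Java sketch. The names are hypothetical, and REGEXP_LIKE is modeled with java.util.regex's Matcher.find().

```java
import java.util.regex.Pattern;

class TrailingSign {
    static final String SIGN_BOTH_ENDS = "^[+-]?[0-9]+[+-]?$";          // also accepts "-1-"
    static final String OR_DUPLICATED  = "^[+-]?[0-9]+$|^[0-9]+[+-]?$"; // "|" with repeated anchors
    static final String OR_GROUPED     = "^([+-]?[0-9]+|[0-9]+[+-]?)$"; // same, using a subexpression

    static boolean like(String s, String regex) {
        return Pattern.compile(regex).matcher(s).find();
    }

    public static void main(String[] args) {
        String[] values = {"456", "+789", "-789", "159-", "-1-"};
        for (String v : values) {
            System.out.printf("%-5s %-5b %-5b %b%n", v,
                    like(v, SIGN_BOTH_ENDS), like(v, OR_DUPLICATED), like(v, OR_GROUPED));
        }
    }
}
```

Only "-1-" separates the first pattern from the other two. The two OR variants agree on every value.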
Now, looking for integers is nice, but what about decimal numbers? Those may be a bit more complicated, but all I have to do is, again, think in (meta)characters. I'll use an example where the decimal point is represented by ".". This again needs escaping, since "." is also the placeholder in regular expressions for "any character".
Valid decimals in my example would be ".0", "0.0", "0." and "0" (an integer, of course), but not ".". If you want, you can test this with the TO_NUMBER function. Finding such an unsigned decimal number could then be formulated like this. From the beginning of the string, allow either a decimal point followed by at least one digit, OR at least one digit optionally followed by a decimal point and any number of digits. The match must then reach the end of the string. Think about it for a minute: how would you formulate such a search pattern?
Compare your solution to this one:
'^(\.[0-9]+|[0-9]+(\.[0-9]*)?)$'
Addendum: Here I have to use both "?" and "*". The "*" allows 0 to many digits after the decimal point, and the "?" allows only 0 or 1 occurrence of this substring. Otherwise, strings like "1.9.9.9" would be accepted if I wrote it like this:
'^(\.[0-9]+|[0-9]+(\.[0-9]*)*)$'
Some of you might now say: Hey, what about signed decimal numbers? You could of course combine all the ideas so far and end up with a very long and almost unreadable search pattern. Or you can start combining several regular expression functions. Think about it: why put all the search patterns into one function? Why not split them into several steps, like "check for a valid decimal" and "check for a sign"?
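A quick way to see why the "?" after the fractional group matters is to run both patterns over the example values. The Java sketch below uses hypothetical names and treats REGEXP_LIKE like Matcher.find(), which is fine for these anchored patterns.

```java
import java.util.regex.Pattern;

class DecimalPattern {
    // 0 or 1 fractional part: "?" after the group
    static final String ONE_FRACTION   = "^(\\.[0-9]+|[0-9]+(\\.[0-9]*)?)$";
    // the flawed variant: "*" lets the fractional part repeat
    static final String MANY_FRACTIONS = "^(\\.[0-9]+|[0-9]+(\\.[0-9]*)*)$";

    static boolean like(String s, String regex) {
        return Pattern.compile(regex).matcher(s).find();
    }

    public static void main(String[] args) {
        String[] values = {".0", "0.0", "0.", "0", ".", "1.9.9.9"};
        for (String v : values) {
            System.out.printf("%-8s ?=%-5b *=%b%n", v,
                    like(v, ONE_FRACTION), like(v, MANY_FRACTIONS));
        }
    }
}
```

Both patterns reject "." and accept the four valid decimals. They differ only on "1.9.9.9".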
I'll just use another SELECT to show what I want to do:
WITH t AS (SELECT '0' col1
             FROM dual
            UNION
           SELECT '0.'
             FROM dual
            UNION
           SELECT '.0'
             FROM dual
            UNION
           SELECT '0.0'
             FROM dual
            UNION
           SELECT '-1.0'
             FROM dual
            UNION
           SELECT '.1-'
             FROM dual
            UNION
           SELECT '.'
             FROM dual
            UNION
           SELECT '-1.1-'
             FROM dual)
SELECT t.*
FROM t
;
From this SELECT, the only rows I need to find are those with the column values "." and "-1.1-". I'll start with a check for valid signs. Since I want to combine this with the check for valid decimals, I'll first try to extract a substring with valid signs through the REGEXP_SUBSTR function:
NVL(REGEXP_SUBSTR(t.col1, '^([+-]?[^+-]+|[^+-]+[+-]?)$'), ' ')
Remember the OR operator and the matching character lists? But why several "^"? Some metacharacters inside a search pattern have different meanings, depending on their position and their combination with other metacharacters. Outside of square brackets, "^" anchors the pattern at the beginning of the string. As the first character inside square brackets, it negates the list, so "[^+-]" means any character except "+" or "-". In this case, the pattern translates into: from the beginning of the string, search for an optional "+" or "-" followed by at least one character that is not "+" or "-", up to the end of the string. The second pattern after the "|" OR operator does the same for a sign at the end of the string.
This only checks the signs, not whether the rest of the string consists only of digits and a decimal point. If the search fails, the function returns NULL; this happens, for example, when there is more than one sign, as in "-1.1-". NULL and LIKE don't go together very well, so we'll just add NVL with a default value that can never pass the following REGEXP_LIKE check, in this case a space. That way the row is reported as invalid instead of silently disappearing.
All we have to do now is to combine the check for the sign and the check for a valid decimal number, but don't forget an option for the signs at the beginning or end of the string, otherwise your second check will fail on the signed decimals. Are you ready?
Does your solution look a bit like this?
WHERE NOT REGEXP_LIKE(NVL(REGEXP_SUBSTR(t.col1,
                                        '^([+-]?[^+-]+|[^+-]+[+-]?)$'),
                          ' '),
                      '^[+-]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)[+-]?$'
                     )
Now the optional sign checks in the REGEXP_LIKE pattern can be added to both ends, since REGEXP_SUBSTR won't let through any string with signs on both ends. Thinking in regular expressions again.
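The two-step check is easy to try out outside Oracle as well. In the Java sketch below (names are hypothetical), REGEXP_SUBSTR is modeled with Matcher.find()/group(), and NVL's default with a single space. The patterns are the ones above, with the backslashes doubled for Java string literals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoStepCheck {
    // Step 1: at most one sign, either at the start or at the end
    static final Pattern SIGN = Pattern.compile("^([+-]?[^+-]+|[^+-]+[+-]?)$");
    // Step 2: a decimal number with optional signs on both ends
    static final Pattern DECIMAL =
            Pattern.compile("^[+-]?(\\.[0-9]+|[0-9]+(\\.[0-9]*)?)[+-]?$");

    // Mirrors REGEXP_LIKE(NVL(REGEXP_SUBSTR(s, SIGN), ' '), DECIMAL)
    static boolean isDecimal(String s) {
        Matcher m = SIGN.matcher(s);
        String candidate = m.find() ? m.group() : " "; // " " stands in for NVL's default
        return DECIMAL.matcher(candidate).find();
    }

    // The rows that WHERE NOT REGEXP_LIKE(...) would return
    static List<String> invalid(String... values) {
        List<String> result = new ArrayList<>();
        for (String v : values) {
            if (!isDecimal(v)) {
                result.add(v);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(invalid("0", "0.", ".0", "0.0", "-1.0", ".1-", ".", "-1.1-"));
    }
}
```

As in the SQL version, only "." and "-1.1-" are reported.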
Continued in Introduction to regular expressions ... continued.
C.
Fixed some embarrassing typos ... and mistakes.
cd

Excellent write-up, CD. Very nice indeed. Hopefully you'll be completing parts 2 and 3 some time soon. And with any luck, your article will encourage others to do the same... I know there are a few I'd like to see, and a few I'd like to have a go at writing too :-)

Introduction to regular expressions ... continued.

After some very positive feedback on Introduction to regular expressions ..., I'm now continuing on this topic for the interested audience. As always, if you have questions or problems that you think could be solved with regular expressions, please post them.
Having fun with regular expressions - Part 2
After finishing my example with decimal numbers, I thought about a way to test regular expressions. A question from another user, who was looking for a way to show all possible combinations, inspired me to write a small package.
CREATE OR REPLACE PACKAGE regex_utils AS
  -- Regular Expression Utilities
  -- Version 0.1
  TYPE t_outrec IS RECORD(
    data VARCHAR2(255)
  );
  TYPE t_outtab IS TABLE OF t_outrec;
  FUNCTION gen_data(
    p_charset IN VARCHAR2 -- character set that is used for generation
  , p_length  IN NUMBER   -- maximum length of the generated strings
  ) RETURN t_outtab PIPELINED;
END regex_utils;
/
CREATE OR REPLACE PACKAGE BODY regex_utils AS
  -- FUNCTION gen_data returns a collection of generated varchar2 elements
  FUNCTION gen_data(
    p_charset IN VARCHAR2 -- character set that is used for generation
  , p_length  IN NUMBER   -- maximum length of the generated strings
  ) RETURN t_outtab PIPELINED
  IS
    TYPE t_counter IS TABLE OF PLS_INTEGER INDEX BY PLS_INTEGER;
    v_counter t_counter;
    v_exit    BOOLEAN;
    v_string  VARCHAR2(255);
    v_outrec  t_outrec;
  BEGIN
    FOR max_length IN 1..p_length
    LOOP
      -- init counter loop
      FOR i IN 1..max_length
      LOOP
        v_counter(i) := 1;
      END LOOP;
      -- start data generation loop
      v_exit := FALSE;
      WHILE NOT v_exit
      LOOP
        -- start generation
        v_string := '';
        FOR i IN 1..max_length
        LOOP
          v_string := v_string || SUBSTR(p_charset, v_counter(i), 1);
        END LOOP;
        -- set outgoing record
        v_outrec.data := v_string;
        -- now pipe the result
        PIPE ROW(v_outrec);
        -- increment loop
        <<inc_loop>>
        FOR i IN REVERSE 1..max_length
        LOOP
          v_counter(i) := v_counter(i) + 1;
          IF v_counter(i) > LENGTH(p_charset) THEN
            IF i > 1 THEN
              v_counter(i) := 1;
            ELSE
              v_exit := TRUE;
            END IF;
          ELSE
            -- no further processing required
            EXIT inc_loop;
          END IF;
        END LOOP;
      END LOOP;
    END LOOP;
    RETURN;
  END gen_data;
END regex_utils;
/
This package is a brute-force string generator. It produces all possible combinations of the characters in a string, up to a maximum length. Together with the regular expressions, I can now show which combinations my solution lets pass. But see for yourself:
SELECT *
FROM (SELECT data col1
        FROM TABLE(regex_utils.gen_data('+-.0', 5))
     ) t
WHERE REGEXP_LIKE(NVL(REGEXP_SUBSTR(t.col1,
                                    '^([+-]?[^+-]+|[^+-]+[+-]?)$'),
                      ' '),
                  '^[+-]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)[+-]?$')
;
You will see some results that are perfectly valid by my definition of decimal numbers but haven't been mentioned so far, like '000' or '+.00'. From now on I will also use this package to verify the solutions I present to you, and hopefully reduce my share of typos.
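The brute-force idea is easy to reproduce outside the database, too. Below is a Java sketch with hypothetical names. genData mirrors gen_data's odometer-style counting, where the rightmost position varies fastest. It is combined with the two-step sign/decimal check, so you can list every accepted combination of '+-.0' up to length 5.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BruteForce {
    // Java counterpart of regex_utils.gen_data: all strings over `charset`
    // of length 1..maxLength, with the rightmost position varying fastest.
    static List<String> genData(String charset, int maxLength) {
        List<String> out = new ArrayList<>();
        for (int len = 1; len <= maxLength; len++) {
            int[] counter = new int[len]; // every position starts at the first character
            while (true) {
                StringBuilder sb = new StringBuilder(len);
                for (int c : counter) {
                    sb.append(charset.charAt(c));
                }
                out.add(sb.toString());
                // odometer-style increment from the right
                int i = len - 1;
                while (i >= 0 && ++counter[i] == charset.length()) {
                    counter[i] = 0;
                    i--;
                }
                if (i < 0) {
                    break; // the leftmost position wrapped around: all combinations done
                }
            }
        }
        return out;
    }

    static final Pattern SIGN = Pattern.compile("^([+-]?[^+-]+|[^+-]+[+-]?)$");
    static final Pattern DECIMAL =
            Pattern.compile("^[+-]?(\\.[0-9]+|[0-9]+(\\.[0-9]*)?)[+-]?$");

    // REGEXP_LIKE(NVL(REGEXP_SUBSTR(s, SIGN), ' '), DECIMAL)
    static boolean isDecimal(String s) {
        Matcher m = SIGN.matcher(s);
        return DECIMAL.matcher(m.find() ? m.group() : " ").find();
    }

    public static void main(String[] args) {
        for (String s : genData("+-.0", 5)) {
            if (isDecimal(s)) {
                System.out.println(s);
            }
        }
    }
}
```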
Counting and finding certain characters or words in a string can be a tedious task. I'll show you how it's done with regular expressions. I'll start with an easy example, count all spaces in the string "Having fun with regular expressions.":
SELECT NVL(LENGTH(REGEXP_REPLACE('Having fun with regular expressions', '[^ ]')), 0)
FROM dual
;
No surprise there. I'm replacing all characters except spaces with a null string. REGEXP_REPLACE uses NULL as the replacement string when that argument is omitted, so I can leave out the third argument. With it, the call would look like this:
REGEXP_REPLACE('Having fun with regular expressions', '[^ ]', '')
So REGEXP_REPLACE returns only the spaces, which we can count with the LENGTH function. If there aren't any, I get a NULL string, which is handled by the NVL function. If you want, you can play around by changing the space character to something else.
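Here is the same counting trick in Java, as a sketch with hypothetical names. String.replaceAll() takes the pattern and the replacement. Java returns an empty string rather than NULL, so no NVL counterpart is needed.

```java
class CountSpaces {
    // Like LENGTH(REGEXP_REPLACE(s, '[^ ]')): delete everything that is not a space,
    // then count what is left.
    static int countSpaces(String s) {
        return s.replaceAll("[^ ]", "").length();
    }

    public static void main(String[] args) {
        System.out.println(countSpaces("Having fun with regular expressions")); // 4
    }
}
```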
A variation of this theme could be counting the number of words. Counting spaces and adding 1 to this result could be misleading if there are duplicate spaces. Thanks to regular expressions, I can of course eliminate duplicates.
Using the old method on the string "Having fun with regular expressions" with several spaces between the words would return anything but the right number. This is where backreferences come into play. REGEXP_REPLACE uses them in the replacement argument as a backslash plus a single digit, like this: '\1'. They can be used in the search pattern as well. To reference part of the string, I have to use subexpressions (remember the round brackets?).
SELECT NVL(LENGTH(REGEXP_REPLACE('Having fun with regular expressions', '( )\1*|.', '\1')), 0)
FROM dual
;
You may have noticed that I switched from using "^" as a NOT operator to using the "|" OR operator and the "." any-character placeholder. This neat little trick filters out all characters except the one we're looking for in the first place. The "\1" backreference is placed outside of the subexpression, since I don't want to count the repeated spaces. It is used both in the search pattern and in the replacement argument.
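Here is a Java sketch of the duplicate-space trick, with hypothetical names. The sample string with several spaces between the words is my own, since that is where the naive count goes wrong. Java writes the replacement backreference as $1 instead of \1. A group that did not take part in the match is replaced by nothing, just like in the SQL version.

```java
class CountSpaceRuns {
    // '( )\1*|.' replaced by '$1': a space plus any repeats of it (backreference \1)
    // becomes a single space; any other character matches "." and, because group 1
    // did not take part, is replaced by nothing.
    static int countSpaceRuns(String s) {
        return s.replaceAll("( )\\1*|.", "$1").length();
    }

    public static void main(String[] args) {
        String s = "Having  fun   with regular    expressions"; // repeated spaces on purpose
        System.out.println(s.replaceAll("[^ ]", "").length()); // 10 spaces in total
        System.out.println(countSpaceRuns(s));                 // 4 gaps between 5 words
    }
}
```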
Still, I'm not satisfied with this. What about leading/trailing blanks? What if there are special characters, numbers, etc.? Finally, it's time to count only words. For the purpose of this demonstration, I define a word as one or more consecutive letters. If by now you're already thinking in regular expressions, the solution is not far away. One hint: you may want to check out the "i" match parameter, which allows case-insensitive search. Another one: you won't need a backreference in the search pattern this time.
Let's compare our solutions then, shall we?
SELECT NVL(LENGTH(REGEXP_REPLACE('Having fun with regular expressions. !',
                                 '([a-z])+|.', '\1', 1, 0, 'i')), 0)
FROM dual;
This time I don't use a backreference in the search pattern; the "+" operator (remember? 1 or more) will suffice. I want to count the words, not the letters, so I moved the "+" metacharacter outside of the subexpression. As a result, each word is replaced by its last letter only. The "|." trick again proved to be useful.
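The word count translates to Java like this (a sketch with hypothetical names). Pattern.CASE_INSENSITIVE stands in for the 'i' match parameter. Pattern.DOTALL is my addition, so that line breaks are removed too; without it, "." does not match line terminators in Java.

```java
import java.util.regex.Pattern;

class CountWords {
    // '([a-z])+|.' replaced by '$1': each run of letters becomes its last letter
    // (a repeated group keeps its last iteration), everything else becomes nothing,
    // so the remaining length is the number of words.
    private static final Pattern WORD_OR_OTHER =
            Pattern.compile("([a-z])+|.", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static int countWords(String s) {
        return WORD_OR_OTHER.matcher(s).replaceAll("$1").length();
    }

    public static void main(String[] args) {
        System.out.println(countWords("Having fun with regular expressions. !")); // 5
    }
}
```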
Case-insensitive search does have its merits. It only affects the search; it does not transform any substring that is found. Say I want to extract any occurrence of the word fun. I'll just use the "i" match parameter and get this substring, whether it's written as "Fun", "FUN" or "fun". This can be very useful if you're looking for names of customers, streets, etc.
Enough about counting; how about finding? What if I want to know the last occurrence of a certain character or string? For example, the position of the last space in the string "Where is the last space?"
Addendum: Thanks to another forum member, I should mention that the INSTR function can do a reverse search by itself, using a negative start position:
WITH t AS (SELECT 'Where is the last space?' col1
             FROM dual)
SELECT INSTR(col1, ' ', -1)
FROM t;
Now, regular expressions are powerful, but there is no parameter that reverses the search direction. However, we have the "$" metacharacter, which means (until the) end of string. All I have to do is use a search pattern that looks for a space followed only by non-space characters up to the end of the string. Now compare the REGEXP_INSTR function to the previous solution:
SELECT REGEXP_INSTR(t.col1, ' [^ ]*$')
FROM t;
So in this case, it remains a matter of taste which one you use. If the search has to find the last occurrence of another regular expression, however, this is the way to solve such a requirement.
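Here are both approaches side by side in Java, as a sketch with hypothetical names. Adding 1 turns Java's 0-based offsets into Oracle's 1-based positions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastSpace {
    // INSTR(s, ' ', -1): 1-based position of the last space, 0 if there is none
    static int viaLastIndexOf(String s) {
        return s.lastIndexOf(' ') + 1;
    }

    // REGEXP_INSTR(s, ' [^ ]*$'): a space followed only by non-spaces up to the end
    static int viaRegex(String s) {
        Matcher m = Pattern.compile(" [^ ]*$").matcher(s);
        return m.find() ? m.start() + 1 : 0;
    }

    public static void main(String[] args) {
        String s = "Where is the last space?";
        System.out.println(viaLastIndexOf(s) + " " + viaRegex(s)); // 18 18
    }
}
```

To find the last occurrence of some other pattern, replace the literal space in ' [^ ]*$' with that pattern. Only the regex version can be adapted this way.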
One more thing about backreferences: they can be used for a sort of primitive "string swapping". Say you have to transform column values by swapping the first and last name. Backreferences are your friend here. Here's an example:
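SELECT REGEXP_REPLACE('John Doe', '^(.*) (.*)$', '\2, \1')
FROM dual
;
What about middle names, for example 'John J. Doe'? See for yourself: it still works. The greedy first subexpression takes everything up to the last space, so the result is 'Doe, John J.'.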
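In Java, the same swap is a one-liner with String.replaceAll(). Note that the replacement string refers to groups as $1 and $2 instead of \1 and \2. Here is a sketch with hypothetical names:

```java
class SwapNames {
    // '^(.*) (.*)$' -> '\2, \1': the greedy first group runs up to the LAST space,
    // so middle names stay with the first name.
    static String swap(String fullName) {
        return fullName.replaceAll("^(.*) (.*)$", "$2, $1");
    }

    public static void main(String[] args) {
        System.out.println(swap("John Doe"));    // Doe, John
        System.out.println(swap("John J. Doe")); // Doe, John J.
    }
}
```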
You can even use this for strings with delimiters, for example to reverse delimited "fields" and turn the string '10~20~30~40~50' into '50~40~30~20~10'. Using REVERSE, I would get '05~04~03~02~01', so there has to be another way. Backreferences, however, are limited to 9 subexpressions. That limits the following solution a bit if you need to process strings with more than 9 fields. If you want, you can think this example through and see if your solution matches mine.
SELECT REGEXP_REPLACE('10~20~30~40~50',
                      '^(.*)~(.*)~(.*)~(.*)~(.*)$',
                      '\5~\4~\3~\2~\1')
FROM dual;
After what you've learned so far, that wasn't too hard, was it? Enough for now ...
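Here is a Java sketch of both ideas, with hypothetical names. The first method is the regex version from above. The second is a non-regex alternative for any number of fields: it simply splits on '~', reverses the list and joins it again. The 9-backreference limit is Oracle's. java.util.regex accepts higher group numbers, but the split-based version stays readable whatever the field count.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ReverseFields {
    // Regex version: one subexpression per field, exactly five fields expected
    static String reverseFive(String s) {
        return s.replaceAll("^(.*)~(.*)~(.*)~(.*)~(.*)$", "$5~$4~$3~$2~$1");
    }

    // Non-regex alternative: split, reverse, join; works for any number of fields
    static String reverseAny(String s) {
        List<String> fields = Arrays.asList(s.split("~", -1)); // -1 keeps empty trailing fields
        Collections.reverse(fields);
        return String.join("~", fields);
    }

    public static void main(String[] args) {
        System.out.println(reverseFive("10~20~30~40~50"));         // 50~40~30~20~10
        System.out.println(reverseAny("1~2~3~4~5~6~7~8~9~10~11")); // 11~10~9~8~7~6~5~4~3~2~1
    }
}
```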
Continued in Introduction to regular expressions ... last part..
C.
Fixed some typos and a flawed example ...
cd

Thank you very much, C. Awaiting the other parts... keep going.
One German typo :-)
> I'm replacing all characters except spaces mit a null string.

I received a functional spec from my Dutch analyst in which it is written:
tnsnames voor EDWH:
PCESCRD1 = (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)
 (HOST=blah.blah.blah.com)
 (PORT=5227)))
 (CONNECT_DATA=(SID=pcescrd1)))
db user: BW_I2_VIEWER / BW_I2_VIEWER_SCRD1
Had to look for translators.
Cheers
Sarma.

Those darn regular expressions again ...

Greetings,
I feel reluctant to ask this question because I sincerely hate regular expressions, but here is a regular expression question:
Suppose I receive a byte stream from a device. After I receive, say, an 'X' byte, some more bytes follow, and then a number: two or three digits, a dot, and then some more digits. I am interested in that number, so I cooked up this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
     public static void main(String[] args) {
          Pattern pat= Pattern.compile("X.*(\\d{2,3}\\.\\d*)");
          String str= "fooXbar123.456baz";
          Matcher mat= pat.matcher(str);
          if (mat.find())
               System.out.println(str.substring(mat.start(1), mat.end(1)));
     }
}
The regular expression represents what I want to see: a capital X, some bytes and the number. BTW, I convert those bytes to chars, so there is no problem there. The output of this little snippet is "23.456", i.e. it takes the shortest variant of group #1 (the '1' is eaten by the "dot-star" subexpression).
I hope you understand why I hate those regular expressions so much; they're the devil's invention. My question boils down to: how can I find the longest variant of that group #1 expression, i.e. "123.456"?
A bit more info: the 'bar' part can contain almost anything, including digits. It can even be totally empty. The capital 'X' has to be present and "bar" doesn't contain a capital 'X'.
kind regards,
Jos

JosAH wrote:
>> Never did get my head round the 'pimping lemon'!
> If a word x y z t u is in a language L and if x y^n z t^n u is also in that language the pumping lemma applies and L is a context free language; that is so trivial ... ;-)

Was that a low flying jet or just the 'pimping lemon' going straight over my head!

>> I used to teach APL to non-(scientists/engineers/mathematicians). One of the most unsatisfying jobs ever. I love APL but how do you teach APL to someone who does not understand matrices? How do you teach APL to someone who expects 1-1-1-1-1 to be -3? I still get agencies contacting me about APL jobs even though I last used it 25 years ago!
> I used to program APL on an old DEC LA/36 paper terminal. You had to lean over backwards because those APL symbols were printed on the front side of the keys. Therefore my knees blocked the paper feed so all my printouts turned into a mess ...

I was lucky. I used an IBM (5110 springs to mind but at my age!) box and some IBM terminals attached to an IBM370. No silly paper tape. It just cost a fortune for every second one was connected to the IBM370.

>> How's the 'limp' Jos? Any better?
> Not really; I put too much weight on my left leg (the 'good' one) and my body thought something was wrong so it started a bad inflammation in that leg. I'm on inflammation suppressing pills now. dammit.

We are a real pair of crocks! I mixed and laid 6 ton of concrete and disposed of 8 ton of hardcore during Dec, Jan and Feb. My knee is swollen like a balloon and I'm on anti-inflammatory pills right now. BUT I have to eat a full meal before taking them and that means I have to take more pills to suppress my excess stomach acid. God I wish I was 50 again!
I shall have to visit Holland sometime before we are both confined to wheelchairs.
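As for the regex question itself: the greedy .* grabs as much as it can and gives back only what the group needs at minimum, which is why the '1' ends up outside group #1. This Java sketch shows two common remedies; the pattern constants and the helper are hypothetical. The first is a reluctant .*? that stops at the first place where the number can start. The second keeps the greedy .* but uses the lookbehind (?<!\d) to forbid the group from starting right after a digit. They differ when 'bar' itself contains something number-like: the reluctant version picks the first candidate, and the lookbehind version the last one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LongestNumber {
    static final String GREEDY          = "X.*(\\d{2,3}\\.\\d*)";
    static final String RELUCTANT       = "X.*?(\\d{2,3}\\.\\d*)";
    static final String NO_DIGIT_BEFORE = "X.*(?<!\\d)(\\d{2,3}\\.\\d*)";

    // group #1 of the first match, or null if there is none
    static String extract(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "fooXbar123.456baz";
        System.out.println(extract(GREEDY, str));          // 23.456
        System.out.println(extract(RELUCTANT, str));       // 123.456
        System.out.println(extract(NO_DIGIT_BEFORE, str)); // 123.456
    }
}
```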

Fun fun regular expressions

My goal: to create a regular expression that will match a four character hexadecimal value that is left-padded with spaces.
Examples: "   1", "  A4", " 51C", "FFAB". All the alpha characters will be uppercase.
My solution: "[0-9A-F]{4}| [0-9A-F]{3}|  [0-9A-F]{2}|   [0-9A-F]"
My question is:
Is there a more concise way to represent this? I am parsing a line of text that is formatted into columns and delimited with spaces, so this is actually a subexpression in a capturing group and the spaces are important.
Thank you for any help you can provide.

It's not easy taken out of context; I had to put in the two delimiting spaces to get a grip on it.
I can make an alternative regex, but it is not "better" as such.
Sorry for using Perl, but when people talk regex ...
#!/bin/perl/bin/perl.exe
use strict;
$^W = 1;
$\ = "\n";
my (@good) = ( " 1", " AB", " 123", "cafe", );
my (@bad) = ("123 ", " ", "123", "12345", "gggg", " 123", " ", "",
             "a a ", " a a", "1 1", );
my $s;
for $s (@good) {
    ismatch($s);
}
for $s (@bad) {
    ismatch($s);
}
sub ismatch {
    $_ = ' '.$_[0].' ';
    # if (/^ ([\dA-Fa-f]{4}| [\dA-Fa-f]{3}| [\dA-Fa-f]{2}| [\dA-Fa-f]) $/) {
    if (/^ (?! *[\d\w]+ +[\d\w])[ \dA-Fa-f]{4}(?<! ) $/) {
        print "Match: \"$_\"";
    } else {
        print "No match: \"$_\"";
    }
}
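For comparison, here is the lookaround idea translated into a Java sketch. The lookaround pattern and the names are my own, not from the thread. The pattern requires:
- four characters from [ 0-9A-F],
- no hex digit followed by a space inside the field,
- a last character that is not a space.

The negative lookahead only looks at the first four characters, so a delimiter space after the field does not interfere. The pattern is a bit shorter than the alternation, though arguably not more readable.

```java
import java.util.regex.Pattern;

class HexField {
    // The alternation from the question: 1 to 4 hex digits, left-padded with spaces to 4 characters
    static final Pattern ALTERNATION =
            Pattern.compile("[0-9A-F]{4}| [0-9A-F]{3}|  [0-9A-F]{2}|   [0-9A-F]");
    // Lookaround variant: 4 chars from [ 0-9A-F], no "hex digit + space" within
    // the first 4 chars, and the 4th char is not a space
    static final Pattern LOOKAROUND =
            Pattern.compile("(?!.{0,2}[0-9A-F] )[ 0-9A-F]{4}(?<! )");

    public static void main(String[] args) {
        String[] samples = {"   1", "  A4", " 51C", "FFAB", "1  1", "A4  ", "    ", " 5 C"};
        for (String s : samples) {
            System.out.printf("\"%s\" %b %b%n", s,
                    ALTERNATION.matcher(s).matches(), LOOKAROUND.matcher(s).matches());
        }
    }
}
```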

Regular Expression Trouble

Hello All, Happy Holidays.
I'm creating a component in an application that is similar to
Facebook's "News Feed" feature. This feature lists rows from a
database which may contain some 'special text' that needs to be
converted to HTML. Note: I am storing the 'special text' in the
database instead of the actual html code to save space in the
database in case the name of the object is very long.
Here's what I'm trying to do:
In the database a record might look like this:
On 12/25/06 [user:253]John Smith[/a] Logged On.
When I display the table I want to convert it to the
following:
On 12/25/06 <a href="user_info.cfm?userId=253">John Smith</a> Logged On.
I've been able to create the following regular expression
which does a good job converting the text to a link, but I am
unable to get the uniqueID (253) from the string. Note: Description
is the name of the column that is being processed.
<CFSET temp = ReReplace(Description,'\[User:.\]','<a href="user_info.cfm?userID=253">','ALL')>
<CFSET temp = ReReplace(temp,'\[/a\]','</a>','ALL')>
Is there a way I can convert the string in one regular
expression? Also, how can I get the number value after the colon
(:) and insert it into the replacement string?
Also, to complicate things, the string may have multiple
instances of 'special text'. For example:
On 12/25/06 [user:253]John Smith[/a] modified
[user:262]Captain Picard's[/a] account
Thanks for your help!

If I read your requirement correctly, you should replace
this:
\[user:([^\]]*)](.*?)\[/a]
With this:
<a href="user_info.cfm?userId=\1">\2</a>
The bit you were missing from your regex was capturing the
subexpressions
for the ID and the name. Can I suggest you read this:
http://livedocs.macromedia.com/coldfusion/7/htmldocs/00000990.htm,
and have
a bit of an experiment.
Also, Regex Coach is good for testing stuff:
http://weitz.de/regex-coach/
Adam
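Here is the same replacement as a small Java sketch (class and method names are hypothetical), run against the two-link example from the question. Java writes backreferences in the replacement as $1/$2 rather than \1/\2. The reluctant (.*?) matters once a row contains several links. A greedy (.*) would run from the first [user:...] to the last [/a] and merge both into one link. CASE_INSENSITIVE is my addition, so that "[User:" (as in the question's ReReplace call) is matched as well. In ColdFusion, ReReplaceNoCase would be the counterpart.

```java
import java.util.regex.Pattern;

class SpecialTextToHtml {
    // [user:<id>]<name>[/a]  ->  <a href="user_info.cfm?userId=<id>"><name></a>
    // The reluctant (.*?) stops at the first [/a], so several links in one
    // description are converted separately instead of being merged into one.
    private static final Pattern USER_LINK =
            Pattern.compile("\\[user:([^\\]]*)\\](.*?)\\[/a\\]", Pattern.CASE_INSENSITIVE);

    static String convert(String description) {
        return USER_LINK.matcher(description)
                .replaceAll("<a href=\"user_info.cfm?userId=$1\">$2</a>");
    }

    public static void main(String[] args) {
        System.out.println(convert("On 12/25/06 [user:253]John Smith[/a] modified "
                + "[user:262]Captain Picard's[/a] account"));
    }
}
```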

Regular expression help needed

Hello experts,
I am looking to implement a search & replace regular expression
my regular expressions are as follows:
search regular expression = (test\\s+--\\s*)?this is a test(.*)?
replace regular expression = (new) brand new test$2
i.e. The results I require are
case 1
input string = test -- this is a test 1999
correct result = (new) brand new test 1999
or (since the "test --" part of the regular expression is optional)
case 2
input string = this is a test
correct result = brand new test
How do I implement this using pattern and matcher? Sample code would be useful
I am having difficulties because matcher.appendReplacement always inserts the full replacement, including "(new)", even when the optional part of my regular expression did not match (which is incorrect).
i.e. I am getting the following incorrect result ((new) is being appended)
input string = this is a test
incorrect result = (new) brand new test
At the moment my non working code is
StringBuffer sb = new StringBuffer();
Pattern pattern = Pattern.compile("(test\\s+--\\s*)?this is a test(.*)?");
Matcher matcher = pattern.matcher("this is a test");
if (matcher.find()) {
    matcher.appendReplacement(sb, "(new) brand new test$2");
}
String result = sb.toString();
System.out.println(result);
In the above scenario I want the output to be 'brand new test' without the (new), because the input string did not contain 'test --'.
Hope this makes sense
Thanks

For example:
StringBuffer sb = new StringBuffer();
Pattern pattern = Pattern.compile("(test\\s+--\\s*)?this is a test(.*)");
Matcher matcher = pattern.matcher("this is a test");
if (matcher.find()) {
    matcher.appendReplacement(sb, ""); // copy everything before the match
    if (matcher.start(1) != -1) {
        sb.append("(new) ");
    }
    sb.append("brand new test");
    sb.append(matcher.group(2));
}
matcher.appendTail(sb); // copy everything after the match
System.out.println(sb.toString());
Because the first group is optional, you need to find out whether it participated in the match before you add the "(new) " bit. The second group doesn't need to be optional, for two reasons. First, the subexpression inside the group can match nothing. Second, you don't need to perform a different action depending on what that group did. You just append the captured text, which may be an empty string.
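Wrapped into a method, the same approach can be checked against both cases from the question. This is a sketch; the class and method names are hypothetical. The pattern is the one from the reply, with \\s escaped properly for a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalPrefix {
    private static final Pattern PATTERN =
            Pattern.compile("(test\\s+--\\s*)?this is a test(.*)");

    static String rewrite(String input) {
        Matcher m = PATTERN.matcher(input);
        StringBuffer sb = new StringBuffer();
        if (m.find()) {
            m.appendReplacement(sb, "");   // copy the text before the match
            if (m.start(1) != -1) {        // group 1 took part: "test --" was present
                sb.append("(new) ");
            }
            sb.append("brand new test").append(m.group(2));
        }
        m.appendTail(sb);                  // copy the text after the match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite("test -- this is a test 1999")); // (new) brand new test 1999
        System.out.println(rewrite("this is a test"));              // brand new test
    }
}
```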

Regular Expression Subgroups

In .NET regular expressions, you can query for the values of a subgroup. For instance, for the reg. exp. "A(B+C+)*" and the input "ABBCCCBBBCBC", the values for subgroup 1 would be "BBCCC", "BBBC", and "BC". In the Jakarta library, there is a method RE.getParen() method, but this accepts only a single integer parameter, the "Nesting level of subexpression." Thus, I cannot ask it for the third instance of subexpression 1. Calling getParen() returns the first instance only.
Are there any regular expression libraries for Java which provide a way to get every instance of a subexpression?

No. The only regex implementations I know of that have that capability are .NET and Perl 6.
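If you are stuck with java.util.regex, there is a workaround; it is a sketch, not a library feature. Capture the whole repeated part once, then walk over it with a second Matcher for the repeated subpattern. For "A(B+C+)*", this reproduces the three captures from the question. The names below are hypothetical. For patterns where the repetition could be split in more than one way, a separate left-to-right scan is not guaranteed to split the text the same way the engine did.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedCaptures {
    // Capture all repetitions of (B+C+) as one block, then re-scan that block
    static List<String> captures(String input) {
        List<String> result = new ArrayList<>();
        Matcher outer = Pattern.compile("A((?:B+C+)*)").matcher(input);
        if (outer.find()) {
            Matcher inner = Pattern.compile("B+C+").matcher(outer.group(1));
            while (inner.find()) {
                result.add(inner.group());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(captures("ABBCCCBBBCBC")); // [BBCCC, BBBC, BC]
    }
}
```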

Regular expressions URGENT !!!

I am using the gnu.regexp package for regular expression matching. Can anybody tell me how to get the index of the subexpression (regular expression) that the string matched, without iterating?
Detailed problem description:
package: gnu.regexp
regular expression: lots of regular expressions combined with "|"
method: getAllMatches
requirement: I need to get the index of the (regular expression's) subexpression that the current string matched. It should come from REMatch or some other class, without iterating or another call to getAllMatches.
note: If there is no solution, can it be done using java.util.regex or jregex? (If so, I need the equivalent of getAllMatches.)
Thanks in advance to all Java gurus

Why don't you use groups?
In java.util.regex you can access the content of a group with "String group(int index)". Or did I misunderstand the meaning of your "subexpression index"?
Regards,
Finn
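java.util.regex has no direct "which alternative matched" call, but the groups suggestion above can answer it: wrap each alternative in its own capturing group and check which group is non-null after each match. A sketch, assuming the alternatives contain no capturing groups of their own (otherwise the numbering shifts; non-capturing `(?:...)` groups are fine). The alternatives and helper names are hypothetical. This still loops over the groups of one match, but needs no second matching pass.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhichAlternative {
    // Hypothetical alternatives, each later wrapped in its own group.
    static final String[] ALTERNATIVES = {"\\d+", "[a-z]+", "[A-Z]+"};

    // Builds (alt0)|(alt1)|(alt2)... so group i+1 belongs to alternative i.
    static Pattern combine(String[] alternatives) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < alternatives.length; i++) {
            if (i > 0) {
                sb.append('|');
            }
            sb.append('(').append(alternatives[i]).append(')');
        }
        return Pattern.compile(sb.toString());
    }

    // 0-based index of the alternative behind the current match, or -1.
    static int matchedAlternative(Matcher m) {
        for (int g = 1; g <= m.groupCount(); g++) {
            if (m.group(g) != null) {
                return g - 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Matcher m = combine(ALTERNATIVES).matcher("abc 123 XYZ");
        while (m.find()) {
            System.out.println(m.group() + " -> alternative " + matchedAlternative(m));
        }
    }
}
```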

Logical AND in Java Regular Expressions

I'm trying to implement logical AND using Java Regular Expressions.
I couldn't figure out how to do it after reading the Java docs and textbooks. I can do something like "abc.*def", which means that I'm looking for strings which have "abc", then anything, then "def", but it is not a "pure" logical AND: I will not find "def.*abc" this way.
Any ideas how to do it?
Baken

First off, it looks like you're really talking about an "OR", not an "AND": you want it to match abc.*def OR def.*abc, right? If you tried to match abc.*def AND def.*abc as whole-string matches, nothing would ever match, as no string can begin with both "abc" and "def", just like no numeric value can be both 2 and 5.
Anyway, maybe regex isn't the right tool for this job. Can you not simply programmatically match it yourself using String methods? You want it to match if the string "starts with" abc and "ends with" def, or vice-versa. Just write some simple code.
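For the "contains both, in any order" case, one option not mentioned in the thread is a pair of lookaheads, each checking for one substring without consuming input; the reply's plain String-method approach is shown alongside for comparison. A java.util.regex sketch; the class and method names are hypothetical, and `DOTALL` is assumed so that `.` also crosses line breaks.

```java
import java.util.regex.Pattern;

public class RegexAnd {
    // Each (?=.*X) asserts that X occurs somewhere after the start,
    // without consuming characters, so the two checks combine as AND.
    private static final Pattern BOTH =
            Pattern.compile("^(?=.*abc)(?=.*def)", Pattern.DOTALL);

    static boolean containsBothRegex(String s) {
        return BOTH.matcher(s).find();
    }

    // The same test with plain String methods.
    static boolean containsBothPlain(String s) {
        return s.contains("abc") && s.contains("def");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abc...def", "def...abc", "abc only"}) {
            System.out.println(s + ": " + containsBothRegex(s) + " / " + containsBothPlain(s));
        }
    }
}
```

The lookahead form is handy when the check must be a single pattern (for example, passed to an API that only accepts a regex); otherwise the String version is simpler to read.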

Help in Regular expression

Hello,
I want to write a regular expression to match the following string:
 
 NEW ORLEANS, Louisiana (CNN) 
-- Two years after Hurricane Katrina devastated coastal areas of Louisiana and Mississippi, residents say much of America has forgotten their plight.
 
I tried:
Matcher matcher = Pattern.compile(" ([^<^>]+?)", Pattern.CASE_INSENSITIVE).matcher(story);
It's not working.
Is there any other solution?

There's probably a better way to do this, but here's a way that works.
import java.util.regex.*;

public class RegexTester {
  public static void main(String[] args) {
    String text =
      " " +
      " NEW ORLEANS, Louisiana (CNN) " +
      "-- Two years after Hurricane Katrina devastated coastal areas of Louisiana and Mississippi, " +
      "residents say much of America has forgotten their plight." +
      " ";
    // Group 1: a lazily matched run of tokens (non-whitespace characters
    // other than '<' and '>', with surrounding whitespace) between '>' and '<'.
    String regex = ">((?:\\s*[\\S&&[^<>]]+\\s*)*?)<";
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(text);
    while (m.find()) {
      System.out.println("Match: '" + m.group(1) + "'");
    }
  }
}
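As posted, the sample string contains no '<' or '>', so here is a sketch against a hypothetical tagged string instead. It uses a simplified variant of the reply's pattern: `>([^<>]+)<` takes the text between a `>` and the next `<`, and runs that are only whitespace are dropped. The `<b>`/`<p>` tags and the helper name are assumptions for illustration; avoiding the nested quantifier of the original pattern also keeps backtracking low on inputs that fail to match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BetweenTags {
    // Text between a '>' and the next '<', excluding the brackets.
    private static final Pattern BETWEEN = Pattern.compile(">([^<>]+)<");

    static List<String> textBetweenTags(String html) {
        List<String> out = new ArrayList<>();
        Matcher m = BETWEEN.matcher(html);
        while (m.find()) {
            String t = m.group(1).trim();
            if (!t.isEmpty()) { // skip whitespace-only gaps between tags
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical markup around the sample text.
        String html = "<b>NEW ORLEANS, Louisiana (CNN)</b> <p>-- Two years after "
                + "Hurricane Katrina devastated coastal areas of Louisiana and "
                + "Mississippi, residents say much of America has forgotten their plight.</p>";
        for (String s : textBetweenTags(html)) {
            System.out.println("Match: '" + s + "'");
        }
    }
}
```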

Help in regular expression matching

I have three expressions like
1) [(y2009)(y2011)]
2) [(y2008M5)(y2011M3)] or [(y2009M5)(y2010M12)]
3) [(y2009M1d20)(y2011M12d31)]
I want a regular expression pattern for each of the above three expressions.
I am using:
REGEXP_LIKE(timedomainexpression, '???[:digit:]{4}*[:digit:]{1,2}???[:digit:]{4}*[:digit:]{1,2}??', 'i');
but it matches all of the above expressions, while I want a different expression for each.
I have used * after [:digit:]{4}; when I use ? or . instead, it gives no results. Please help with this situation ASAP.
Thanks

I don't get your question. Can you post your desired output, and also give some sample data?
Please consider the following when you post a question.
1. New features keep coming in every Oracle version, so please provide your Oracle DB version to get the best possible answer.
You can use the following query and copy and paste the output.
select * from v$version
2. This forum has a very good search feature. Please use it before posting your question, because for most of the questions
that are asked, the answer is already there.
3. We don't know your DB structure or what your data looks like, so you need to let us know. The best way would be to give some sample data like this.
I have the following table called sales
with sales
as
(
 select 1 sales_id, 1 prod_id, 1001 inv_num, 120 qty from dual
 union all
 select 2 sales_id, 1 prod_id, 1002 inv_num, 25 qty from dual
)
select *
from sales
4. Rather than describing what you want in words, it's easier when you give your expected output.
For example, in the above sales table, I want to know the total quantity and number of invoices for each product.
The output should look like this
Prod_id sum_qty count_inv
1 145 2
5. Whenever you get an error message, post the entire error message, with the error number, the message and the line number.
6. Next is a very important thing to remember: please post only well-formatted code. Unformatted code is very hard to read.
Your code formatting gets lost when you post it in the Oracle Forum, so in order to preserve it you need to
use the {noformat}{noformat} tags.
The usage of the tag is like this.
<place your code here>\
7. If you are posting a *Performance Related Question*, please read
 {thread:id=501834} and {thread:id=863295}.
 Following those guides will be very helpful.
8. Please keep in mind that this is a public forum. Here, no question is URGENT.
 So the use of words like *URGENT* or *ASAP* (As Soon As Possible) is considered rude.
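Returning to the three formats in the original question: one approach is a separate, fully anchored pattern per format. The sketch below is in Java rather than Oracle SQL and assumes the inputs look exactly like the samples given (four-digit years, one- or two-digit months and days); the class and method names are hypothetical. In REGEXP_LIKE the same idea would need `^...$` anchors and the double-bracket form `[[:digit:]]`, since in POSIX syntax a class such as `[:digit:]` is only recognized inside a bracket expression.

```java
import java.util.regex.Pattern;

public class TimeDomainFormats {
    // \[ \] \( \) are literal brackets; matches() anchors the whole string.
    static final Pattern YEARS = Pattern.compile(
            "\\[\\(y\\d{4}\\)\\(y\\d{4}\\)\\]");
    static final Pattern YEAR_MONTH = Pattern.compile(
            "\\[\\(y\\d{4}M\\d{1,2}\\)\\(y\\d{4}M\\d{1,2}\\)\\]");
    static final Pattern YEAR_MONTH_DAY = Pattern.compile(
            "\\[\\(y\\d{4}M\\d{1,2}d\\d{1,2}\\)\\(y\\d{4}M\\d{1,2}d\\d{1,2}\\)\\]");

    // Returns 1, 2 or 3 for the matching format, 0 if none matches.
    static int format(String s) {
        if (YEARS.matcher(s).matches()) return 1;
        if (YEAR_MONTH.matcher(s).matches()) return 2;
        if (YEAR_MONTH_DAY.matcher(s).matches()) return 3;
        return 0;
    }

    public static void main(String[] args) {
        String[] samples = {
            "[(y2009)(y2011)]",
            "[(y2008M5)(y2011M3)]",
            "[(y2009M5)(y2010M12)]",
            "[(y2009M1d20)(y2011M12d31)]",
        };
        for (String s : samples) {
            System.out.println(s + " -> format " + format(s));
        }
    }
}
```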

Regular expression alphabets

Hi
I want to retrieve the data with a SELECT query if it contains a letter, a space or '-'.
Please help me write the combination of these three as a regular expression.
Thanks!!

VT wrote:
Hi,
Try this
SELECT *
FROM <TABLE> WHERE REGEXP_LIKE(<COLUMN>, '[a-z -][A-Z -]');
cheers
VT
That won't work, as it expects at least two characters, with the first being a-z (lower case), space or "-", followed by A-Z (upper case), space or "-".
The correct way is either:
[a-zA-Z -]
or
[[:alpha:] -]
Using the alpha set is often preferable, as it can work differently with different character sets/languages rather than restricting to just the a-zA-Z ranges.
Generating a reference for your own database characterset/language can be useful...
SQL> select level-1 as asc_code, decode(chr(level-1), regexp_substr(chr(level-1), '[[:print:]]'), CHR(level-1)) as chr,
  decode(chr(level-1), regexp_substr(chr(level-1), '[[:graph:]]'), 1) is_graph,
  decode(chr(level-1), regexp_substr(chr(level-1), '[[:blank:]]'), 1) is_blank,
  decode(chr(level-1), regexp_substr(chr(level-1), '[[:alnum:]]'), 1) is_alnum,
  decode(chr(level-1), regexp_substr(chr(level-1), '[[:alpha:]]'), 1) is_alpha,
  decode(chr(level-1), regexp_substr(chr(level-1), '[[:digit:]]'), 1) is_digit,
  decode(chr(level-1), regexp_substr(chr(level-1), '[[:cntrl:]]'), 1) is_cntrl,
  decode(chr(level-1), regexp_substr(chr(level-1), '[[:lower:]]'), 1) is_lower,
  decode(chr(level-1), regexp_substr(chr(level-1), '[[:upper:]]'), 1) is_upper,
  decode(chr(level-1), regexp_substr(chr(level-1), '[[:print:]]'), 1) is_print,
  decode(chr(level-1), regexp_substr(chr(level-1), '[[:punct:]]'), 1) is_punct,
  decode(chr(level-1), regexp_substr(chr(level-1), '[[:space:]]'), 1) is_space,
  decode(chr(level-1), regexp_substr(chr(level-1), '[[:xdigit:]]'), 1) is_xdigit
  from dual
  connect by level <= 256
/
ASC_CODE C IS_GRAPH IS_BLANK IS_ALNUM IS_ALPHA IS_DIGIT IS_CNTRL IS_LOWER IS_UPPER IS_PRINT IS_PUNCT IS_SPACE IS_XDIGIT
 0 1
 1 1
 2 1
 3 1
 4 1
 5 1
 6 1
 7 1
 8 1
 9 1 1
 10 1 1
 11 1 1
 12 1 1
 13 1 1
 14 1
 15 1
 16 1
 17 1
 18 1
 19 1
 20 1
 21 1
 22 1
 23 1
 24 1
 25 1
 26 1
 27 1
 28 1
 29 1
 30 1
 31 1
 32 1 1 1
 33 ! 1 1 1
 34 " 1 1 1
 35 # 1 1 1
 36 $ 1 1 1
 37 % 1 1 1
 38 & 1 1 1
 39 ' 1 1 1
 40 ( 1 1 1
 41 ) 1 1 1
 42 * 1 1 1
 43 + 1 1 1
 44 , 1 1 1
 45 - 1 1 1
 46 . 1 1 1
 47 / 1 1 1
 48 0 1 1 1 1 1
 49 1 1 1 1 1 1
 50 2 1 1 1 1 1
 51 3 1 1 1 1 1
 52 4 1 1 1 1 1
 53 5 1 1 1 1 1
 54 6 1 1 1 1 1
 55 7 1 1 1 1 1
 56 8 1 1 1 1 1
 57 9 1 1 1 1 1
 58 : 1 1 1
 59 ; 1 1 1
 60 < 1 1 1
 61 = 1 1 1
 62 > 1 1 1
 63 ? 1 1 1
 64 @ 1 1 1
 65 A 1 1 1 1 1 1
 66 B 1 1 1 1 1 1
 67 C 1 1 1 1 1 1
 68 D 1 1 1 1 1 1
 69 E 1 1 1 1 1 1
 70 F 1 1 1 1 1 1
 71 G 1 1 1 1 1
 72 H 1 1 1 1 1
 73 I 1 1 1 1 1
 74 J 1 1 1 1 1
 75 K 1 1 1 1 1
 76 L 1 1 1 1 1
 77 M 1 1 1 1 1
 78 N 1 1 1 1 1
 79 O 1 1 1 1 1
 80 P 1 1 1 1 1
 81 Q 1 1 1 1 1
 82 R 1 1 1 1 1
 83 S 1 1 1 1 1
 84 T 1 1 1 1 1
 85 U 1 1 1 1 1
 86 V 1 1 1 1 1
 87 W 1 1 1 1 1
 88 X 1 1 1 1 1
 89 Y 1 1 1 1 1
 90 Z 1 1 1 1 1
 91 [ 1 1 1
 92 \ 1 1 1
 93 ] 1 1 1
 94 ^ 1 1 1
 95 _ 1 1 1
 96 ` 1 1 1
 97 a 1 1 1 1 1 1
 98 b 1 1 1 1 1 1
 99 c 1 1 1 1 1 1
 100 d 1 1 1 1 1 1
 101 e 1 1 1 1 1 1
 102 f 1 1 1 1 1 1
 103 g 1 1 1 1 1
 104 h 1 1 1 1 1
 105 i 1 1 1 1 1
 106 j 1 1 1 1 1
 107 k 1 1 1 1 1
 108 l 1 1 1 1 1
 109 m 1 1 1 1 1
 110 n 1 1 1 1 1
 111 o 1 1 1 1 1
 112 p 1 1 1 1 1
 113 q 1 1 1 1 1
 114 r 1 1 1 1 1
 115 s 1 1 1 1 1
 116 t 1 1 1 1 1
 117 u 1 1 1 1 1
 118 v 1 1 1 1 1
 119 w 1 1 1 1 1
 120 x 1 1 1 1 1
 121 y 1 1 1 1 1
 122 z 1 1 1 1 1
 123 { 1 1 1
 124 | 1 1 1
 125 } 1 1 1
 126 ~ 1 1 1
 127 1
 128 Ç 1 1 1
etc.
{code}
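The reply's two points (VT's pattern needs two characters; a class-based set is less tied to a-zA-Z) can be illustrated with a Java analogue. This is not Oracle's own behaviour: Java's `\p{L}` (any Unicode letter) stands in for `[:alpha:]`, `Matcher.find()` mimics REGEXP_LIKE's match-anywhere semantics, and the sample strings and names are hypothetical.

```java
import java.util.regex.Pattern;

public class LetterSpaceHyphen {
    // The reply's corrected set: one letter a-z/A-Z, a space, or '-'.
    static final Pattern ASCII_SET = Pattern.compile("[a-zA-Z -]");
    // Rough analogue of [[:alpha:] -]: \p{L} is any Unicode letter.
    static final Pattern UNICODE_SET = Pattern.compile("[\\p{L} -]");
    // VT's original pattern: needs two consecutive characters.
    static final Pattern TWO_CHARS = Pattern.compile("[a-z -][A-Z -]");

    // find() matches anywhere in the string, like an unanchored REGEXP_LIKE.
    static boolean contains(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {"123", "12-3", "abc", "aB", "\u00e9"};
        for (String s : samples) {
            System.out.println(s + ": ascii=" + contains(ASCII_SET, s)
                    + " unicode=" + contains(UNICODE_SET, s)
                    + " two=" + contains(TWO_CHARS, s));
        }
    }
}
```

Here "abc" satisfies the single-character sets but not VT's two-character pattern, and "\u00e9" (é) is only accepted by the Unicode-aware set.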

Help in query using regular expression

HI,
I need help getting the output below with a regular expression query. Please help me.
SELECT REGEXP_SUBSTR ('PWRPKG(P/W+P/L+CC)', '[^+]+', 1, lvl) val, lvl
FROM DUAL,(SELECT LEVEL lvl FROM DUAL
CONNECT BY LEVEL <=(SELECT MAX ( LENGTH ('PWRPKG(P/W+P/L+CC)') - LENGTH (REPLACE ('PWRPKG(P/W+P/L+CC)','+',NULL))+ 1) FROM DUAL));
I need the output as
correct result:
==============
val lvl
P/W 1
P/L 2
CC 3
But when I run the query above, I don't get that result. Please help me find where I made a mistake.
Thanks in advance

Frank gave you a solution in your other thread. You could simplify it if you are on 11g:
SQL> select * from table_x
2 /
TXT
TECHPKG(INTELLI CC+FRT SONAR)
PWRPKG(P/W+P/L+CC)
select txt,
       regexp_substr(
         txt,
         '(.*\()*([^+)]+)',
         1,
         column_value,
         null,
         2
       ) element,
       column_value element_number
from table_x,
     table(
       cast(
         multiset(
           select level
           from dual
           connect by level <= regexp_count(txt,'\+') + 1
         )
         as sys.OdciNumberList
       )
     )
order by rowid,
         column_value
TXT ELEMENT ELEMENT_NUMBER
TECHPKG(INTELLI CC+FRT SONAR) INTELLI CC 1
TECHPKG(INTELLI CC+FRT SONAR) FRT SONAR 2
PWRPKG(P/W+P/L+CC) P/W 1
PWRPKG(P/W+P/L+CC) P/L 2
PWRPKG(P/W+P/L+CC) CC 3
SQL>
SY.
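The last argument of the 11g REGEXP_SUBSTR call (the `2`) selects a subexpression, which corresponds to a numbered group in other engines. A Java sketch using the same pattern, with successive `Matcher.find()` calls standing in for the occurrence counter; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PackageElements {
    // Same pattern as the query: group 1 skips everything up to the
    // last '(', group 2 is one element (no '+' and no ')').
    private static final Pattern ELEMENT = Pattern.compile("(.*\\()*([^+)]+)");

    static List<String> elements(String txt) {
        List<String> out = new ArrayList<>();
        Matcher m = ELEMENT.matcher(txt);
        while (m.find()) {        // each find() is the next occurrence
            out.add(m.group(2));  // subexpression 2
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(elements("TECHPKG(INTELLI CC+FRT SONAR)"));
        System.out.println(elements("PWRPKG(P/W+P/L+CC)"));
    }
}
```

After the first occurrence, group 1 finds no further '(' and simply matches zero times, so only group 2 does the work.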

Query help in regular expression

Hi all,
SELECT * FROM emp11
WHERE INSTR(ENAME,'A',1,2) >0;
Please let me know the equivalent query using regular expressions.
I have tried this after going through the Oracle regular expressions documentation.
SELECT * FROM emp11
WHERE regexp_LIKE(ename,'A{2}')
Any help in this regard would be highly appreciated .
Thanks,
P Prakash

please go here
Introduction to regular expressions ...
Thanks,
P Prakash
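The attempted `A{2}` means two adjacent A's ("AA"), whereas `INSTR(ENAME,'A',1,2) > 0` tests for a second A anywhere in the name, which `A.*A` expresses; in Oracle that condition would presumably be `REGEXP_LIKE(ename, 'A.*A')`. A Java sketch comparing the three checks, with hypothetical sample names and helper names.

```java
import java.util.regex.Pattern;

public class TwoOccurrences {
    // "AA": two A's next to each other.
    static final Pattern ADJACENT = Pattern.compile("A{2}");
    // An A, then anything, then another A: at least two A's anywhere.
    static final Pattern AT_LEAST_TWO = Pattern.compile("A.*A");

    // Plain analogue of INSTR(s, 'A', 1, 2) > 0: is there a second 'A'?
    static boolean hasSecondA(String s) {
        int first = s.indexOf('A');
        return first >= 0 && s.indexOf('A', first + 1) >= 0;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"ADAMS", "ALLEN", "ISAAC", "KING"}) {
            System.out.printf("%-6s instr=%b A{2}=%b A.*A=%b%n", name,
                    hasSecondA(name),
                    ADJACENT.matcher(name).find(),
                    AT_LEAST_TWO.matcher(name).find());
        }
    }
}
```

"ADAMS" shows the difference: it has two A's, so the INSTR test and `A.*A` agree, while `A{2}` finds no adjacent pair.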
