Bracket in Regular Expression constant?

I am a bit puzzled by the behavior I am experiencing in LV 2011. I hope to get some light from experts out there.
I am trying to parse a messy ASCII header file and after having split it into individual lines (strings), I use the "Match Regular Expression" function to remove some of the info before the substantial information.
Some of the strings include square brackets ([, ]), which are special characters for the function, therefore, as documented in the help, one needs to precede them with a backslash.
Example:
I want to parse the following line:
#PR [PR_DEV,I,2]
One way (which I am using because of considerations related to the rest of the header) is the following:
Note that the first string constant is using "Code Display" whereas the second one is using "Normal Display".
Why did I not put a backslash in front of the bracket in the first string, you may ask? Well, I did, but it disappeared after I typed the other characters. And reverting to "Normal Display" did not restore it.
Of course, the first version does not parse the input string correctly, whereas the second one does it fine.
In other words, the custom display string (which is convenient for cryptic codes such as \s*, or to distinguish between space and tab... or simply to ENTER tabs!) seems to mess up the \[ combo (likewise the \] one).
It is not a huge deal. I can use the "Normal Display" mode, but I tend to think that this qualifies as a hidden "feature". And again, it is still a pain in the ... when dealing with special characters such as tabs, etc...

I think that [ is a special character which needs to be preceded by a backslash, but it is not one of the defined backslash characters (like \s). So, you need to put in two \\ to get one \ while in '\' Codes Display.
You can put in any character by using \xx where the xx is a hex character using only upper case letters for A..F. I converted the strings to byte arrays and tried to see what made the arrays match and the Match work.
Lynn

Similar Messages

Matching substrings between square brackets using regular expressions

Hello,
I am new at Java and have a problem with regular expressions. Let me describe the issue in 3 steps:
1.- I have an english sentence. Some words of the sentence stand between square brackets, for example "I [eat] and [sleep]"
2- I would like to match strings that are in square brackets using regular expressions (java.util.regex.*;) and here is the code I have written for the task
Pattern findStringinSquareBrackets = Pattern.compile("\\[.*\\]");
Matcher matcherOfWordInSquareBrackets = findStringinSquareBrackets.matcher("I [eat] and [sleep]");
// Iteration in the string
while ( matcherOfWordInSquareBrackets.find() )
{
    System.out.println("Patter found! :" + outputField.getText().substring(matcherOfWordInSquareBrackets.start(), matcherOfWordInSquareBrackets.end()) + "");
}
3- the result I have after running the code described in 2 is the following: *Patter found!: [eat] and [sleep]*
That is to say that not only words between square brackets are found but also the substring "and". And this is not what I want.
What I would like to have as a result is:
*Patter found!: [eat]*
*Patter found!: [sleep]*
That is to say I want to match only the words between the square brackets and nothing else.
Does somebody know how to do this? Any help would be great.
Best regards,
Abou

You can find the words by looping through the sentence and then returning the substring between the indexes.
int start = -1; // index of the first '[', -1 while none has been seen
int end = -1;   // index of the first ']' after it
for (int i = 0; i < string.length(); i++) {
    char c = string.charAt(i);
    if (c == '[' && start == -1) {
        start = i;
    } else if (c == ']' && start != -1) {
        end = i;
        break;
    }
}
if (end == -1) {
    return ""; // no complete [..] pair in the string
}
return string.substring(start, end + 1);
Something like that. This code will only find the first word, however. I do not know much about regex, so I cannot help any more.
Edited by: elasolova on Jun 16, 2009 6:45 AM
Edited by: elasolova on Jun 16, 2009 6:46 AM
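For the regex route the question asked about, here is one possible sketch. The `.*` in the original pattern is greedy, so it runs on to the last `]` in the sentence; a negated character class `[^\]]*` cannot cross a closing bracket, so each bracketed word becomes its own match. The class and method names (`BracketWords`, `findBracketed`) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketWords {
    // "\\[" and "\\]" are literal brackets; "[^\\]]*" is any run of
    // characters that are not ']', so a match stops at the first ']'.
    private static final Pattern BRACKETED = Pattern.compile("\\[[^\\]]*\\]");

    static List<String> findBracketed(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = BRACKETED.matcher(text);
        while (m.find()) {
            found.add(m.group()); // the matched text, brackets included
        }
        return found;
    }

    public static void main(String[] args) {
        for (String s : findBracketed("I [eat] and [sleep]")) {
            System.out.println("Pattern found: " + s);
        }
    }
}
```

A reluctant quantifier (`\\[.*?\\]`) would probably work here too; the negated class just makes the stopping condition explicit.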

Regular Expression Q.

Dear all,
I have been trying to remove non-printable characters from a string but want to keep CHR(10). How can it be done? I have tried the statement below, but it strips out every non-printable character, including the line-feed character. I have also tried \0xA0, but it is ignored by Oracle, since the documentation says that Oracle evaluates by byte and not by the displayed character. Any help is much appreciated. Thanks.
SELECT regexp_replace( address, '[[:cntrl:]]','') FROM emp_data;
Regards,
Kueh.

KA Kueh wrote:
I wanted to strip out all control character except the chr(10) and with the [:cntrl:] character class it will strip out all control characters inclusive of the chr(10). So your solution still does not do the trick. Thanks.
Oops, I missed that. You could use regexp_replace to produce a list of the control characters in the address string, then strip CHR(10) from that list, and also strip CHR(0), which either has a special meaning to regexp or is a bug:
SQL> select regexp_replace('A'||chr(0)||'B',chr(0)) from dual;
REG
A B
SQL>
SQL> select regexp_replace('A'||chr(0)||'B','['||chr(0)||']') from dual;
select regexp_replace('A'||chr(0)||'B','['||chr(0)||']') from dual
ERROR at line 1:
ORA-12726: unmatched bracket in regular expression
SQL> select regexp_replace('A'||chr(0)||'B','[\'||chr(0)||']') from dual;
select regexp_replace('A'||chr(0)||'B','[\'||chr(0)||']') from dual
ERROR at line 1:
ORA-12726: unmatched bracket in regular expression
Anyway:
SQL> with t as (
             select 'ABC'||CHR(0)||CHR(1)||CHR(10)||CHR(11)||'DEF'||CHR(5)||CHR(1)||'GHI' str from dual
            )
     select regexp_replace(
                           replace(str,chr(0)),
                           '['||replace(regexp_replace(replace(str,chr(0)),'[^[:cntrl:]]'),chr(10))||']'
                          )
       from t
     /
REGEXP_REP
ABC
DEFGHI
So you can try:
select regexp_replace(
                       replace(address,chr(0)),
                       '['||replace(regexp_replace(replace(address,chr(0)),'[^[:cntrl:]]'),chr(10))||']'
                      )
from emp_data
/
SY.
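For comparison only, here is how the same requirement ("all control characters except line feed") might look outside Oracle, in java.util.regex. This is not Oracle syntax: it relies on Java's character class intersection (`&&`), which Oracle's regexp functions are not assumed to support. The class and method names are invented for the sketch.

```java
public class ControlCharStripper {
    // \p{Cntrl} is the POSIX control-character class in java.util.regex.
    // Intersecting it with [^\n] keeps every control character in the class
    // except line feed, so CHR(10) survives the replacement.
    static String stripControlExceptLineFeed(String s) {
        return s.replaceAll("[\\p{Cntrl}&&[^\\n]]", "");
    }

    public static void main(String[] args) {
        // Same test data as the query above: CHR(0), CHR(1), CHR(10), CHR(11), CHR(5), CHR(1)
        String str = "ABC\u0000\u0001\n\u000BDEF\u0005\u0001GHI";
        System.out.println(stripControlExceptLineFeed(str));
    }
}
```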

How to write the regular expression for Square brackets?

Hi,
I want regular expression for the [] Square brackets.
I have tried to insert in the below code but the expression not validate the [] square brackets.
If anyone knows please help me how to write the regular expression for [] Square brackets.
private static final Pattern DESC_PATTERN = Pattern.compile("({1}[a-zA-Z])" +"([a-zA-Z0-9\\s.,_():}{/&#-]+)$");
Thanks
Raghav

Since square brackets are metacharacters in regex, they need to be escaped when they are meant as literal characters. In a Java string literal that means prefixing each one with \\ (the regex escape character \, doubled for Java), i.e. \\[ and \\].
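A minimal sketch of what that could look like for the pattern in the question. It assumes the intent of `({1}[a-zA-Z])` was "exactly one leading letter" and simply adds `\\[` and `\\]` to the character class; the class name and the `isValid` helper are hypothetical.

```java
import java.util.regex.Pattern;

public class DescPatternDemo {
    // Each regex backslash is doubled inside the Java string literal, so the
    // regex \[ is written "\\[". The '-' stays last in the class so that it
    // is taken literally instead of forming a range.
    static final Pattern DESC_PATTERN =
            Pattern.compile("^[a-zA-Z][a-zA-Z0-9\\s.,_():}{/&#\\[\\]-]+$");

    static boolean isValid(String description) {
        return DESC_PATTERN.matcher(description).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("Item [A1], rev (2)")); // brackets inside the text
        System.out.println(isValid("[A1] item"));          // does not start with a letter
    }
}
```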

RW_NO_STL & extended regular expressions

Hi
I wonder if anyone can help me. We are working on a solaris 2.6 and using the SUN Workshop C++ (v5.1) compiler. I'm trying to compile one of the roguewave examples
?/SUNWspro/WS6/examples/Tools.h++/rw/manual/extrgexp.cpp
this is an extended regular expression example. My compile statement is
CC -I/opt/SUNWspro/WS6/include/CC/Cstd -I/opt/SUNWspro/WS6/include/CC/Cstd -o -c extrgexp.o extrgexp.cpp -library=rwtools7,iostream
I get the following error
"/opt/SUNWspro/WS6/include/CC/rw7/rw/re.h", line 184: Warning: #error You must have both Standard Library and exceptions to use this class..
"extrgexp.cpp", line 43: Error: RWCRExpr is not defined.
"extrgexp.cpp", line 43: Error: Cannot use const char* to initialize int.
"extrgexp.cpp", line 45: Error: match is not a member of RWCString.
The 3 errors are generated as a result of the warning, which results from RW_NO_STL being defined in compiler.h. To get the definition of RWCRExpr, RW_NO_STL must be undefined. I've read previous threads and these seem to suggest that I need to use the -compat switch. I have tried this and get the same result, so I examined compiler.h; the relevant bit is
#if __SUNPRO_CC == 0x420 || __SUNPRO_CC_COMPAT == 4
# define RW_NO_STL 1
# define RW_NO_TYPENAME 1
# define RW_HEADER_WITH_EXTENSION 1
# define RW_NO_IO_SENTRY 1
# define RW_NO_IOSTD 1
#endif
#if __SUNPRO_CC_COMPAT == 5 || __SUNPRO_CC_COMPAT == 6
# define RW_NO_STL 1
/* # define RW_NO_UNBUFFERED 1 */
/* # define RW_STD_RELOPS_IN_NAMESPACE 1 */
# define RW_NO_IO_SENTRY 1
# define RW_NO_IOSTD 1
#endif
It strikes me that whichever compat option I choose, RW_NO_STL will be defined. Has anyone had more luck than me with this example?
cheers
Gareth

If I read between the lines, it looks like you expect \b to match a blank character, right?
According to man re_format:
the bracket expressions `[[:<:]]' and `[[:>:]]' match the null string at the beginning and end of a word respectively.
Therefore, this seems to be what you're after:
sed -E -e 's/[[:<:]]in/aaa/g' testfile
The same man page states that [:upper:] and [:lower:] are the appropriate character classes for upper and lowercase characters, respectively.

Introduction to regular expressions ...

I'm well aware that there are already some articles on that topic, some people asked me to share some of my knowledge on this topic. Please take a look at this first part and let me know if you find this useful. If yes, I'm going to continue on writing more parts using more and more complicated expressions - if you have questions or problems that you think could be solved through regular expression, please post them.
Introduction
Oracle has always provided some character/string functions in its PL/SQL command set, such as SUBSTR, REPLACE or TRANSLATE. With 10g, Oracle finally gave us, the users, the developers and of course the DBAs, regular expressions. However, regular expressions, due to their sometimes cryptic rules, seem to be overlooked quite often, despite the existence of some very interesting use cases. Being one of the advocates of regular expressions, I thought I'd give the interested audience an introduction to these new functions in several installments.
Having fun with regular expressions - Part 1
Oracle offers the use of regular expression through several functions: REGEXP_INSTR, REGEXP_SUBSTR, REGEXP_REPLACE and REGEXP_LIKE. The second part of each function already gives away its purpose: INSTR for finding a position inside a string, SUBSTR for extracting a part of a string, REPLACE for replacing parts of a string. REGEXP_LIKE is a special case since it could be compared to the LIKE operator and is therefore usually used in comparisons like IF statements or WHERE clauses.
Regular expressions excel, in my opinion, at searching and extracting strings: finding or replacing certain strings, or checking for certain formatting criteria. They're not very good at formatting strings themselves, except for some special cases I'm going to demonstrate.
If you're not familiar with regular expression, you should take a look at the definition in Oracle's user guide Using Regular Expressions With Oracle Database, and please note that there have been some changes and advancements in 10g2. I'll provide examples, that should work on both versions.
Some of you probably already encountered this problem: checking a number inside a string, because, for whatever reason, a column was defined as VARCHAR2 and not as NUMBER as one would have expected.
Let's check for all rows where column col1 does NOT include an unsigned integer. I'll use this SELECT for demonstrating different values and search patterns:
WITH t AS (SELECT '456' col1
             FROM dual
            UNION
           SELECT '123x'
             FROM dual
            UNION
           SELECT 'x123'
             FROM dual
            UNION
           SELECT 'y'
             FROM dual
            UNION
           SELECT '+789'
             FROM dual
            UNION
           SELECT '-789'
             FROM dual
            UNION
           SELECT '159-'
             FROM dual
            UNION
           SELECT '-1-'
             FROM dual
           )
SELECT t.col1
FROM t
WHERE NOT REGEXP_LIKE(t.col1, '^[0-9]+$')
;
Let's take a look at the 2nd argument of this REGEXP function: '^[0-9]+$'. Translated it would mean: start at the beginning of the string, check if there's one or more characters in the range between '0' and '9' (also called a matching character list) until the end of this string. "^", "[", "]", "+", "$" are all metacharacters.
To understand regular expressions, you have to "think" in regular expressions. Each regular expression tries to "fit" an available string into its pattern and returns a result, successful or not, depending on the function. The "art" of using regular expressions is to construct the right search pattern for a certain task. Using functions like TRANSLATE or REPLACE already taught you to use search patterns; regular expressions are just an extension of this paradigm. Another side note: most of the search patterns are placeholders for single characters, not strings.
I'll take this example a bit further. What would happen if we would remove the "$" in our example? "$" means: (until the) end of a string. Without this, this expression would only search digits from the beginning until it encounters either another character or the end of the string. So this time, '123x' would be removed from the SELECTION since it does fit into the pattern.
Another change: we will keep the "$" but remove the "^". This character has several meanings, but in this case it declares: (start from the) beginning of a string. Without it, the function will search for a part of a string that has only digits until the end of the searched string. 'x123' would now be removed from our selection.
Now there's a question: what happens if I remove both, "^" and "$"? Well, just think about it. We now ask to find any string that contains at least one or more digits, so both '123x' and 'x123' will not show up in the result.
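To see the effect of each anchor on the sample values, here is a rough sketch in Java rather than Oracle SQL. It assumes `Matcher.find()` behaves like REGEXP_LIKE for these simple patterns (a match anywhere in the value counts); `AnchorDemo` and `rejected` are names made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class AnchorDemo {
    // The col1 values from the WITH clause above.
    static final List<String> VALUES =
            Arrays.asList("456", "123x", "x123", "y", "+789", "-789", "159-", "-1-");

    // Mimics WHERE NOT REGEXP_LIKE(col1, regex): keep the values in which
    // the pattern cannot be found anywhere.
    static List<String> rejected(String regex) {
        Pattern p = Pattern.compile(regex);
        List<String> out = new ArrayList<>();
        for (String v : VALUES) {
            if (!p.matcher(v).find()) {
                out.add(v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String regex : new String[] {"^[0-9]+$", "^[0-9]+", "[0-9]+$", "[0-9]+"}) {
            System.out.println(regex + " rejects " + rejected(regex));
        }
    }
}
```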
So what if I want to look for signed integers? "+" is also a metacharacter in search patterns, so escaping is the name of the game. We'll just use '^\+[0-9]+$'. Did you notice the "\" before the first "+"? This is now a search pattern for the plus sign.
Should signed integers include negative numbers as well? Of course they should, and I'll once again use a matching character list. In this list, I don't need to do escaping, although it is possible. The result string would now look like this: '^[+-]?[0-9]+$'. Did you notice the "?"? This is another metacharacter that changes the placeholder for plus and minus to an optional placeholder, which means: if there's a "+" or "-", that's ok, if there's none, that's also ok. Only if there's a different character, then again the search pattern will fail.
Addendum: From this on, I found a mistake in my examples. If you would have tested my old examples with test data that would have included multiple signs strings, like "--", "-+", "++", they would have been filtered by the SELECT statement. I mistakenly used the "*" instead of the "?" operator. The reason why this is a bad idea, can also be found in the user guide: the "*" meta character is defined as 0 to multiple occurrences.
Looking at the values, one could ask the question: what about the integers with a trailing sign? Quite simple, right? Let's just add another '[+-]?' and the search pattern would look like this: '^[+-]?[0-9]+[+-]?$'.
Wait a minute, what happened to the row with the column value "-1-"?
You probably already guessed it: the new pattern also qualifies this one as a valid string. I could now split this pattern into several conditions combined through a logical OR, but there's something even better: a logical OR inside the regular expression. Its symbol is "|", the pipe sign.
Changing the search pattern again to something like this '^[+-]?[0-9]+$|^[0-9]+[+-]?$' would now return the "-1-" value. Do I have to duplicate the same elements like "^" and "$"? What about more complicated, repeating elements in future examples? That's where subexpressions/grouping come into play. If I want only certain parts of the search pattern to use an OR operator, I can put those inside round brackets. '^([+-]?[0-9]+|[0-9]+[+-]?)$' serves the same purpose and allows for further checks without duplicating the whole pattern.
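The difference between the two sign patterns can be tried out quickly. The sketch below uses Java's regex engine as a stand-in for Oracle's, on the assumption that both treat these particular patterns the same way; the constant and class names are invented.

```java
import java.util.regex.Pattern;

public class SignedIntDemo {
    // Optional sign on both ends at once: this also lets "-1-" through.
    static final Pattern LOOSE = Pattern.compile("^[+-]?[0-9]+[+-]?$");
    // Grouped alternation: a leading sign OR a trailing sign, never both.
    static final Pattern STRICT = Pattern.compile("^([+-]?[0-9]+|[0-9]+[+-]?)$");

    static boolean loose(String s) {
        return LOOSE.matcher(s).find();
    }

    static boolean strict(String s) {
        return STRICT.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"456", "+789", "159-", "-1-"}) {
            System.out.println(s + " loose=" + loose(s) + " strict=" + strict(s));
        }
    }
}
```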
Now looking for integers is nice, but what about decimal numbers? Those may be a bit more complicated, but all I have to do is again to think in (meta) characters. I'll just use an example where the decimal point is represented by ".", which again needs escaping, since it's also the place holder in regular expressions for "any character".
Valid decimals in my example would be ".0", "0.0", "0.", "0" (integer of course) but not ".". If you want, you can test it with the TO_NUMBER function. Finding such an unsigned decimal number could then be formulated like this: from the beginning of a string we will either allow a decimal point plus at least one digit OR at least one digit plus an optional decimal point followed by any number of optional digits. Think about it for a minute, how would you formulate such a search pattern?
Compare your solution to this one:
'^(\.[0-9]+|[0-9]+(\.[0-9]*)?)$'
Addendum: Here I have to use both "?" and "*" to make sure, that I can have 0 to many digits after the decimal point, but only 0 to 1 occurrence of this substrings. Otherwise, strings like "1.9.9.9" would be possible, if I would write it like this:
'^(\.[0-9]+|[0-9]+(\.[0-9]*)*)$'
Some of you now might say: Hey, what about signed decimal numbers? You could of course combine all the ideas so far and you will end up with a very long and almost unreadable search pattern, or you start combining several regular expression functions. Think about it: Why put all the search patterns into one function? Why not split those into several steps like "check for a valid decimal" and "check for sign".
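The "?" versus "*" difference in the two decimal patterns above can be checked with a few strings. Again this is a Java sketch standing in for Oracle's engine, with made-up names, and it only covers the unsigned case.

```java
import java.util.regex.Pattern;

public class DecimalDemo {
    // At most one ".digits" group after the integer part.
    static final Pattern ONCE = Pattern.compile("^(\\.[0-9]+|[0-9]+(\\.[0-9]*)?)$");
    // Any number of ".digits" groups: this is the variant that admits "1.9.9.9".
    static final Pattern MANY = Pattern.compile("^(\\.[0-9]+|[0-9]+(\\.[0-9]*)*)$");

    static boolean isDecimal(String s) {
        return ONCE.matcher(s).find();
    }

    static boolean isDecimalRepeated(String s) {
        return MANY.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {".0", "0.0", "0.", "0", ".", "1.9.9.9"}) {
            System.out.println(s + " ?=" + isDecimal(s) + " *=" + isDecimalRepeated(s));
        }
    }
}
```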
I'll just use another SELECT to show what I want to do:
WITH t AS (SELECT '0' col1
             FROM dual
            UNION
           SELECT '0.'
             FROM dual
            UNION
           SELECT '.0'
             FROM dual
            UNION
           SELECT '0.0'
             FROM dual
            UNION
           SELECT '-1.0'
             FROM dual
            UNION
           SELECT '.1-'
             FROM dual
            UNION
           SELECT '.'
             FROM dual
            UNION
           SELECT '-1.1-'
             FROM dual
           )
SELECT t.*
FROM t
;
From this select, the only rows I need to find are those with the column values "." and "-1.1-". I'll start this with a check for valid signs. Since I want to combine this with the check for valid decimals, I'll first try to extract a substring with valid signs through the REGEXP_SUBSTR function:
NVL(REGEXP_SUBSTR(t.col1, '^([+-]?[^+-]+|[^+-]+[+-]?)$'), ' ')
Remember the OR operator and the matching character collections? But several "^"? Some of the metacharacters inside a search pattern can have different meanings, depending on their positions and combination with other metacharacters. In this case, the pattern translates into: from the beginning of the string search for an optional "+" or "-" followed by at least one other character that is not "+" or "-". The second pattern after the "|" OR operator does the same for a sign at the end of the string.
This only checks for a sign, but not whether there are also only digits and a decimal point inside the string. If the search fails, for example when we have more than one sign as in "-1.1-", the function returns NULL. NULL and LIKE don't go together very well, so we'll just add NVL with a default value that tells the LIKE to ignore this string, in this case a space.
All we have to do now is to combine the check for the sign and the check for a valid decimal number, but don't forget an option for the signs at the beginning or end of the string, otherwise your second check will fail on the signed decimals. Are you ready?
Does your solution look a bit like this?
WHERE NOT REGEXP_LIKE(NVL(REGEXP_SUBSTR(t.col1,
                           '^([+-]?[^+-]+|[^+-]+[+-]?)$'),
                          ' '),
                       '^[+-]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)[+-]?$'
                      )
Now the optional sign checks in the REGEXP_LIKE argument can be added to both ends, since the SUBSTR won't allow any string with signs on both ends. Thinking in regular expressions again.
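The two-step idea (first the sign check, then the decimal check) can be sketched in Java as two patterns that must both accept the value. This is only an approximation of the NVL(REGEXP_SUBSTR(...)) construction: a failed first step is modelled by returning false instead of substituting a space. All names are hypothetical.

```java
import java.util.regex.Pattern;

public class SignedDecimalDemo {
    // Step 1: at most one sign, at the start or at the end.
    static final Pattern SIGN = Pattern.compile("^([+-]?[^+-]+|[^+-]+[+-]?)$");
    // Step 2: a valid decimal, with the optional sign allowed on either end.
    static final Pattern DECIMAL =
            Pattern.compile("^[+-]?(\\.[0-9]+|[0-9]+(\\.[0-9]*)?)[+-]?$");

    static boolean isSignedDecimal(String s) {
        // Both steps must succeed, like REGEXP_LIKE applied to the result
        // of the REGEXP_SUBSTR sign check.
        return SIGN.matcher(s).find() && DECIMAL.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] values = {"0", "0.", ".0", "0.0", "-1.0", ".1-", ".", "-1.1-"};
        for (String v : values) {
            System.out.println(v + " -> " + isSignedDecimal(v));
        }
    }
}
```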
Continued in Introduction to regular expressions ... continued.
C.
Fixed some embarrassing typos ... and mistakes.
cd

Excellent write up CD. Very nice indeed. Hopefully you'll be completing parts 2 and 3 some time soon. And with any luck, your article will encourage others to do the same....I know there's a few I'd like to see and a few I'd like to have a go at writing too :-)

Regular expressions-how to replace [ and ] characters from a string

Hi,
my input String is "sdf938 [98033]". Now from this given string, i would like to replace the characters occurring within square brackets to empty string, including the square brackets too.
my output String needs to be "sdf938" in this case.. How should I do it using regular expressions? I tried several possible combinations but didn't get the expected results.

"\\s*\\[[^\\]]+\\]"

Help needed regarding regular expressions

hello
i need to write a program that recieves a matematical expression and evaluates
it...in other words a calculator :)
i know i need to use regular expressions inorder to determine if the input is legal or not ,but i'm really having trouble setting the pattern
the expression can be in the form : Axxze2223+log(5)+(2*3)*(5+4)
where Axxze2223 is a variable(i.e a combination of letters and numbers.)
where as: l o g (5) or log() or Axxx33aaaa or () are illegal
i tried to set the pattern but i got exceptions or it just didnt work the way i wanted it .
here's what i tried to do at least for the varibale form:
"\\s*(*([a-zA-Z]+\\d)+)*\\s*";
i'm really new to this...and i can't seem to set the pattern by using regular expressions,how can i combine all the rules to one string?
any help or references would be appreciated
thanks

so i'll explain
let's say i got token "abc22c"(let's call it "token")
i wan't to check if it's legal
i define:
String varPattern = "\\s*[a-zA-Z]+\\d+\\s*";
If you want to check for one or more ASCII letters followed by one or more digits, the whole possibly surrounded by whitespace -- yes. Note that "abc22c" itself would not match, because of the trailing letter.
>
now i want to check if it's o.k
so i check:
token.matches(varPattern);
am i correct?
Quite. It's better to compile the Pattern (Pattern.compile(String)), create a java.util.regex.Matcher (Pattern#matcher(CharSequence)), and test the Matcher for Matcher#matches().
(Class.method -> static method, Class#method -> instance method)
>
now i'm having problem defining pattern for log()
sin() cos()
that brackets are mandatory ,and there must be an
expression inside
how do i do that?
First, I'd check the overall function syntax (a valid name, brackets), then whether what's inside the brackets is a valid expression (maybe empty), then whether that expression is valid for that function (presumably always?).
I might add I'm no expert on parsing, so that's more a supposition than a guide.
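A small sketch of the compile-then-match approach suggested above. It assumes a variable is "letters followed by digits" (as in the poster's pattern) and only checks the outer shape of a function call, i.e. a known name and a non-empty bracket pair; it does not validate the expression inside. The class and method names are made up.

```java
import java.util.regex.Pattern;

public class TokenCheck {
    // Compiled once and reused, instead of calling String.matches() each time.
    static final Pattern VARIABLE = Pattern.compile("\\s*[a-zA-Z]+\\d+\\s*");
    // A function name, "(", at least one character, ")". The inner text would
    // still have to be checked separately as an expression.
    static final Pattern FUNCTION_CALL =
            Pattern.compile("\\s*(log|sin|cos)\\((.+)\\)\\s*");

    static boolean isVariable(String token) {
        return VARIABLE.matcher(token).matches();
    }

    static boolean isFunctionCall(String token) {
        return FUNCTION_CALL.matcher(token).matches();
    }

    public static void main(String[] args) {
        System.out.println(isVariable("Axxze2223"));
        System.out.println(isVariable("Axxx33aaaa"));
        System.out.println(isFunctionCall("log(5)"));
        System.out.println(isFunctionCall("log()"));
    }
}
```

For a real calculator a small recursive-descent parser is probably a better fit than one big regex, since nested brackets cannot be matched by a plain regular expression.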

Pllllllease help!!!! Regular expressions

Hi... I've been trying for almost 40 hours to write a regular expression and I haven't succeeded.
I need a regex that matches a polynomial.
The polynomial is split into bracketed terms, each with a complex number between the brackets.
The environment I'm using is Java in Eclipse with the REGEX library.
Example for a correct input:
avi=(25.0+5.0i)x^2+(15.3+2.85i)x^1
this is the regex i wrote
^[\w]+=[\$]{1}[-+]?[\d]+[\\.]{1}[\d]++[+-]{1}[\d]+[\\.]{1}[0-9]+i]*[\$]{1}[xX]{1}[\\^]{1}[\d]+$
The regex has to start with a name, then "=", then one opening bracket, then possibly a + or -, then a number with a decimal part, then + or -, then another decimal number, then i, then one closing bracket, then x or X, then one "^" sign, then at least one digit. After that I want the pattern to repeat, e.g. (15.3+2.85i)x^1
the regex currently supports only this case : avi=(25.0+5.0i)x^2
but i want it to support this: avi=(25.0+5.0i)x^2+(15.3+2.85i)x^1
and the "+" sign between the two polinoms must be a "+" and not a "-"
How do I define that the pattern repeats itself once or more? When I say the pattern, I mean this one: (25.0+5.0i)x^2
In conclusion: how do I fix it?
Please help me, I'm going nuts. Java's API and I are close buddies after this weekend and I still haven't succeeded...
tnx alot in advance...........
(:

Arg! Why do you double-post???
I've just taken considerable time in answering your other post ( http://forum.java.sun.com/thread.jspa?messageID=10018850 ) and then found out that you posted here as well with additional information.
It's considered rude to make people duplicate their effort by posting the same question twice. Keep to one thread.
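For readers landing on this thread, one way the repetition could be expressed: build the pattern for a single term once, then allow `+term` to repeat with a group and `*`. This is only a sketch under the rules stated in the question (terms joined by "+" only, every number has a decimal part); the names are invented, and it is not taken from the answer in the other thread.

```java
import java.util.regex.Pattern;

public class PolynomialCheck {
    // One term such as (25.0+5.0i)x^2: bracket, optional sign, decimal,
    // + or -, decimal, i, bracket, x or X, ^, exponent digits.
    static final String TERM =
            "\\([-+]?\\d+\\.\\d+[+-]\\d+\\.\\d+i\\)[xX]\\^\\d+";
    // name = term, followed by zero or more "+term" repetitions.
    static final Pattern POLYNOMIAL =
            Pattern.compile("^\\w+=" + TERM + "(\\+" + TERM + ")*$");

    static boolean isPolynomial(String s) {
        return POLYNOMIAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPolynomial("avi=(25.0+5.0i)x^2"));
        System.out.println(isPolynomial("avi=(25.0+5.0i)x^2+(15.3+2.85i)x^1"));
        System.out.println(isPolynomial("avi=(25.0+5.0i)x^2-(15.3+2.85i)x^1"));
    }
}
```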

Introduction to regular expressions ... continued.

After some very positive feedback from Introduction to regular expressions ... I'm now continuing on this topic for the interested audience. As always, if you have questions or problems that you think could be solved through regular expression, please post them.
Having fun with regular expressions - Part 2
Finishing my example with decimal numbers, I thought about a method to test regular expressions. A question from another user who was looking for a way to show all possible combinations inspired me in writing a small package.
CREATE OR REPLACE PACKAGE regex_utils AS
-- Regular Expression Utilities
-- Version 0.1
TYPE t_outrec IS RECORD(
 data VARCHAR2(255)
);
TYPE t_outtab IS TABLE OF t_outrec;
FUNCTION gen_data(
 p_charset IN VARCHAR2 -- character set that is used for generation
, p_length IN NUMBER -- maximum length of the generated strings
) RETURN t_outtab PIPELINED;
END regex_utils;
/
CREATE OR REPLACE PACKAGE BODY regex_utils AS
-- FUNCTION gen_data returns a collection of generated varchar2 elements
FUNCTION gen_data(
 p_charset IN VARCHAR2 -- character set that is used for generation
, p_length IN NUMBER -- length of the generated
) RETURN t_outtab PIPELINED
IS
 TYPE t_counter IS TABLE OF PLS_INTEGER INDEX BY PLS_INTEGER;
 v_counter t_counter;
 v_exit BOOLEAN;
 v_string VARCHAR2(255);
 v_outrec t_outrec;
BEGIN
 FOR max_length IN 1..p_length
 LOOP
 -- init counter loop
 FOR i IN 1..max_length
 LOOP
 v_counter(i) := 1;
 END LOOP;
 -- start data generation loop
 v_exit := FALSE;
 WHILE NOT v_exit
 LOOP
 -- start generation
 v_string := '';
 FOR i IN 1..max_length
 LOOP
 v_string := v_string || SUBSTR(p_charset, v_counter(i), 1);
 END LOOP;
 -- set outgoing record
 v_outrec.data := v_string;
 -- now pipe the result
 PIPE ROW(v_outrec);
 -- increment loop
 <<inc_loop>>
 FOR i IN REVERSE 1..max_length
 LOOP
 v_counter(i) := v_counter(i) + 1;
 IF v_counter(i) > LENGTH(p_charset) THEN
 IF i > 1 THEN
 v_counter(i) := 1;
 ELSE
 v_exit := TRUE;
 END IF;
 ELSE
 -- no further processing required
 EXIT inc_loop;
 END IF;
 END LOOP;
 END LOOP;
 END LOOP;
 RETURN; -- a pipelined function ends with an empty RETURN
END gen_data;
END regex_utils;
/
This package is a brute force string generator using all possible combinations of the characters in a string up to a maximum length. Together with the regular expressions, I can now show what combinations my solution would allow to pass. But see for yourself:
SELECT *
FROM (SELECT data col1
 FROM TABLE(regex_utils.gen_data('+-.0', 5))
 ) t
WHERE REGEXP_LIKE(NVL(REGEXP_SUBSTR(t.col1,
 '^([+-]?[^+-]+|[^+-]+[+-]?)$'),
 ' '),
 '^[+-]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)[+-]?$'
 )
;
You will see some results which are perfectly valid for my definition of decimal numbers but haven't been mentioned, like '000' or '+.00'. From now on I will also use this package to verify the solutions I'll present to you and hopefully reduce my share of typos.
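The same brute-force idea, sketched in Java for anyone who wants to try it without a database: generate every string over a character set up to a maximum length and keep those the two patterns accept. This is an illustration of the technique, not a port of the package; `genData` and `accepted` are hypothetical names, and Java's regex engine is assumed to agree with Oracle's on these patterns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class RegexBruteForce {
    static final Pattern SIGN = Pattern.compile("^([+-]?[^+-]+|[^+-]+[+-]?)$");
    static final Pattern DECIMAL =
            Pattern.compile("^[+-]?(\\.[0-9]+|[0-9]+(\\.[0-9]*)?)[+-]?$");

    // All strings over charset with length 1..maxLength, shortest first.
    static List<String> genData(String charset, int maxLength) {
        List<String> all = new ArrayList<>();
        List<String> previous = new ArrayList<>();
        previous.add(""); // seed: the empty prefix
        for (int len = 1; len <= maxLength; len++) {
            List<String> current = new ArrayList<>();
            for (String prefix : previous) {
                for (char c : charset.toCharArray()) {
                    current.add(prefix + c); // extend every shorter string by one char
                }
            }
            all.addAll(current);
            previous = current;
        }
        return all;
    }

    static boolean accepted(String s) {
        return SIGN.matcher(s).find() && DECIMAL.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : genData("+-.0", 5)) {
            if (accepted(s)) {
                System.out.println(s);
            }
        }
    }
}
```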
Counting and finding certain characters or words in a string can be a tedious task. I'll show you how it's done with regular expressions. I'll start with an easy example, count all spaces in the string "Having fun with regular expressions.":
SELECT NVL(LENGTH(REGEXP_REPLACE('Having fun with regular expressions', '[^ ]')), 0)
FROM dual
;
No surprise there. I'm replacing all characters except spaces with a null string. Since REGEXP_REPLACE assumes a NULL string as the replacement argument, I can save on adding a third argument, which would look like this:
REGEXP_REPLACE('Having fun with regular expressions', '[^ ]', '')
So REPLACE will return all the spaces, which we can count with the LENGTH function. If there aren't any, I will get a NULL string, which is checked by the NVL function. If you want, you can play around by changing the space character to something else.
A variation of this theme could be counting the number of words. Counting spaces and adding 1 to this result could be misleading if there are duplicate spaces. Thanks to regular expressions, I can of course eliminate duplicates.
Using the old method on the string "Having fun with regular expressions" would return anything but the right number. This is, where Backreferences come into play. REGEXP_REPLACE uses them in the replacement argument, a backslash plus a single digit, like this: '\1'. To reference a string in a search pattern, I have to use subexpressions (remember the round brackets?).
SELECT NVL(LENGTH(REGEXP_REPLACE('Having fun with regular expressions', '( )\1*|.', '\1')), 0)
FROM dual
;
You may have noticed that I changed from using the "^" as a NOT operator to using the "|" OR operator and the "." any character placeholder. This neat little trick allows us to filter out all other characters except the one we're looking for in the first place. "\1" as backreference is outside of our subexpression since I don't want to count the trailing spaces, and it is used both in the search pattern and the replacement argument.
Still I'm not satisfied with this: What about leading/trailing blanks, what if there are any special characters, numbers, etc.? Finally, it's time to only count words. For the purpose of this demonstration, I define a word as one or more consecutive letters. If by now you're already thinking in regular expressions, the solution is not far away. One hint: you may want to check on the "i" match parameter which allows for case insensitive search. Another one: You won't need a back reference in the search pattern this time.
Let's compare our solutions then, shall we?
SELECT NVL(LENGTH(REGEXP_REPLACE('Having fun with regular expressions. !',
 '([a-z])+|.', '\1', 1, 0, 'i')), 0)
FROM dual;
This time I don't use a backreference, the "+" operator (remember? 1 or more) will suffice. And since I want to count the occurrences, not the letters, I moved the "+" metacharacter outside of the subexpression. The "|." trick again proved to be useful.
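The counting tricks above carry over to other regex engines. Below is a hedged Java sketch: `countSpaces` mirrors the '[^ ]' replacement, `countWords` mirrors the '([a-z])+|.' trick (in Java an unmatched group referenced as `$1` contributes nothing to the replacement), and `countWordsWithFind` is a plainer alternative that just counts matches. The names are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCount {
    // Remove everything that is not a space, then measure what is left.
    static int countSpaces(String s) {
        return s.replaceAll("[^ ]", "").length();
    }

    // Each run of letters collapses to its last letter (group 1), every
    // other character is dropped, so the remaining length is the word count.
    static int countWords(String s) {
        return s.replaceAll("(?i)([a-z])+|.", "$1").length();
    }

    // The same count without the replacement trick: count the matches.
    static int countWordsWithFind(String s) {
        Matcher m = Pattern.compile("[a-z]+", Pattern.CASE_INSENSITIVE).matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Having fun with regular expressions. !";
        System.out.println(countSpaces(text));
        System.out.println(countWords(text));
        System.out.println(countWordsWithFind(text));
    }
}
```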
Case insensitive search does have its merits. It will only search but not transform any found substring. If I want, for example, to extract any occurrence of the word fun, I'll just use the "i" match parameter and get this substring, whether it's written as "Fun", "FUN" or "fun". This can be very useful if you're looking, for example, for names of customers, streets, etc.
Enough about counting, how about finding? What if I want to know the last occurrence of a certain character or string, for example the position of the last space in this string "Where is the last space?"?
Addendum: Thanks to another forum member, I should mention that the INSTR function can do a reverse search by itself.
WITH t AS (SELECT 'Where is the last space?' col1
 FROM dual)
SELECT INSTR(col1, ' ', -1)
FROM t;
Now regular expressions are powerful, but there is no parameter that allows us to reverse the search direction. However, remembering that we have the "$" metacharacter that means (until the) end of string, all I have to do is use a search pattern that looks for a combination of space and non-space characters including the end of a string. Now compare the REGEXP_INSTR function to the previous solution:
SELECT REGEXP_INSTR(t.col1, ' [^ ]*$')
FROM t;
So in this case, it'll remain a matter of taste what you want to use. If the search pattern has to look for the last occurrence of another regular expression, this is the way to solve such a requirement.
One more thing about backreferences. They can be used for a sort of primitive "string swapping". If for example you have to transform column values like swapping first and last name, backreferenc is your friend. Here's an example:
SELECT REGEXP_REPLACE('John Doe', '^(.*) (.*)$', '\2, \1')
FROM dual;
What about middle names, for example 'John J. Doe'? See for yourself, it still works.
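The same swap can be tried in Java as a quick sanity check of the pattern. This is only a sketch under the assumption that Java's greedy matching behaves like Oracle's here; NameSwapDemo and swap are hypothetical names. Note that Java writes group references in the replacement string as $1 and $2 where Oracle uses \1 and \2.

```java
// Hypothetical demo class, not part of the original post.
final class NameSwapDemo {
    // The first (.*) is greedy, so everything up to the LAST space lands in group 1.
    static String swap(String fullName) {
        return fullName.replaceAll("^(.*) (.*)$", "$2, $1");
    }

    public static void main(String[] args) {
        System.out.println(swap("John Doe"));
        System.out.println(swap("John J. Doe"));
    }
}
```

Because the first group is greedy, a middle name or initial stays with the first name, which is why the 'John J. Doe' case still works.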
You can even use that for strings with delimiters, for example reversing delimited "fields" like in this string '10~20~30~40~50' into '50~40~30~20~10'. Using REVERSE, I would get '05~04~03~02~01', so there has to be another way. Using backreferences however is limited to 9 subexpressions, which limits the following solution a bit, if you need to process strings with more than 9 fields. If you want, you can think this example through and see if your solution matches mine.
SELECT REGEXP_REPLACE('10~20~30~40~50',
 '^(.*)~(.*)~(.*)~(.*)~(.*)$',
 '\5~\4~\3~\2~\1')
FROM dual;
After what you've learned so far, that wasn't too hard, was it? Enough for now ...
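If the reversal is allowed to happen outside REGEXP_REPLACE, the nine-subexpression limit can be sidestepped by splitting on the delimiter and reversing the pieces. The sketch below does that in Java next to a five-group version of the pattern above. It swaps the backreference technique for a plain split and reverse, and ReverseFieldsDemo and its method names are invented for this illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original post.
final class ReverseFieldsDemo {
    // Backreference version: works only for exactly five fields.
    static String reverseFiveFields(String s) {
        return s.replaceAll("^(.*)~(.*)~(.*)~(.*)~(.*)$", "$5~$4~$3~$2~$1");
    }

    // Split-and-reverse version: works for any number of fields.
    static String reverseFields(String s, String delimiter) {
        // Pattern.quote treats the delimiter literally; limit -1 keeps trailing empty fields.
        List<String> fields = Arrays.asList(s.split(Pattern.quote(delimiter), -1));
        Collections.reverse(fields);
        return String.join(delimiter, fields);
    }

    public static void main(String[] args) {
        System.out.println(reverseFiveFields("10~20~30~40~50"));
        System.out.println(reverseFields("1~2~3~4~5~6~7~8~9~10~11~12", "~"));
    }
}
```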
Continued in Introduction to regular expressions ... last part..
C.
Fixed some typos and a flawed example ...
cd

Thank you very much C. Awaiting other parts.... keep going.
One German typo :-)
I'm replacing all characters except spaces mit anull string.
I received a functional spec from my Dutch analyst in which it is written
tnsnames voor EDWH:
PCESCRD1 = (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)
 (HOST=blah.blah.blah.com)
 (PORT=5227)))
 (CONNECT_DATA=(SID=pcescrd1)))
db user: BW_I2_VIEWER / BW_I2_VIEWER_SCRD1
Had to look for translators.
Cheers
Sarma.

Regular Expressions in ABAP

Hi, all!
Are there any possibilities to make use of regular expressions in 4.6C (FMs, classes)?
Regards,
Maxim.

Hi Maxim and everyone else who may read this,
try the following code, but be patient and leave my (c) where it is:
You may also want to have a look at the specialities of JavaScript RegEx.
Yours,
Johannes
* an Example Call:
DATA return_value TYPE string.
DATA: match type ztmatch,
lastindex TYPE i,
leftcontext TYPE string,
rightcontext TYPE string,
index TYPE i,
searchstring TYPE string,
modifier TYPE string,
regex TYPE string,
found TYPE boolean,
 error_message type string.
regex = 'b+(a)*(b+)'.
searchstring = 'abbbbabbaa'.
modifier = ''.
CALL METHOD ztr_bw_tools=>regex
IMPORTING
 LASTINDEX = lastindex
 LEFTCONTEXT = leftcontext
 RIGHTCONTEXT = rightcontext
 INDEX = index
 FOUND = found
 MATCH = match
 RETURN_VALUE = return_value
 ERROR_MESSAGE = error_message
CHANGING
 SEARCHSTRING = searchstring
 MODIFIER = modifier
 REGEX = regex.
* Parameters of method REGEX:
Changing SEARCHSTRING TYPE STRING DEFAULT '' "string to be regex applicated
Changing MODIFIER TYPE STRING DEFAULT '' "/gims/
Changing REGEX TYPE STRING DEFAULT '' "regular expression
Exporting LASTINDEX TYPE I
Exporting LEFTCONTEXT TYPE STRING
Exporting RIGHTCONTEXT TYPE STRING
Exporting INDEX TYPE I
Exporting FOUND TYPE BOOLEAN "boolean variable (X=true, -=false, space=unknown)
Exporting MATCH TYPE ZTMATCH "For use with regular expressions
Exporting RETURN_VALUE TYPE STRING
Exporting ERROR_MESSAGE TYPE STRING
method REGEX .
* (c) by Johannes Rumpf - 2006 -
* Matching-Table of part matches of brackets
*DATA: BEGIN OF ztmatch,
* comp TYPE string,
* END OF ztmatch.
DATA source TYPE string.
DATA js_processor TYPE REF TO cl_java_script.
js_processor = cl_java_script=>create( ).
* JavaScript --> ABAP variable mapping
js_processor->bind( EXPORTING name_obj = ' '
 name_prop = 'regex'
 CHANGING data = regex ).
js_processor->bind( EXPORTING name_obj = ' '
 name_prop = 'searchstring'
 CHANGING data = searchstring ).
js_processor->bind( EXPORTING name_obj = ' '
 name_prop = 'modifier'
 CHANGING data = modifier ).
js_processor->bind( EXPORTING name_obj = ' '
 name_prop = 'index'
 CHANGING data = index ).
js_processor->bind( EXPORTING name_obj = 'abap'
 name_prop = 'match'
 CHANGING data = match ).
js_processor->bind( EXPORTING name_obj = ' '
 name_prop = 'lastindex'
 CHANGING data = lastindex ).
js_processor->bind( EXPORTING name_obj = ' '
 name_prop = 'leftcontext'
 CHANGING data = leftcontext ).
js_processor->bind( EXPORTING name_obj = ' '
 name_prop = 'rightcontext'
 CHANGING data = rightcontext ).
js_processor->bind( EXPORTING name_obj = ' '
 name_prop = 'found'
 CHANGING data = found ).
* add an empty line
DATA: wa like line of match.
wa-comp = ' '.
append wa to match.
* JavaScript Code *REGEX*
CONCATENATE
'var re = new RegExp(regex, modifier);'
'var m = re.exec(searchstring);'
' if (m == null) {'
' found = false;'
' } else {'
' found = true; '
' index = m.index;'
' lastindex = re.lastIndex;'
' leftcontext = RegExp.leftContext;'
' rightcontext = RegExp.rightContext; '
' var len = abap.match.length;'
' for (i = 0; i < m.length; i++) {'
' abap.match[len-1].comp = m[i];'
' abap.match.appendLine();'
' len++;'
' }'
' }'
INTO source SEPARATED BY cl_abap_char_utilities=>cr_lf.
return_value = js_processor->evaluate( source ).
error_message = js_processor->LAST_ERROR_MESSAGE.
endmethod.

Regular Expressions and String

How do I return a String array as follows using a regular expression?
String[] strArray = {"Now is the time", "you can optionally preview your post","message by using a number of special tokens."}
from this string
<separator>Now is the time</separator><separator>you can optionally preview your post</separator><separator>message by using a number of special tokens.</separator>
Note: The string has the <separator> XML tag

This cannot be done with a single regular-expression match, at least not if the number of <separator>s is arbitrary, which is what you seem to imply.
A single match is one-off: it can give you a String array as a result, but only with as many entries as there are capture groups (parentheses) in the regex.
A regex like:
<separator>([^<]*)</separator><separator>([^<]*)</separator><separator>([^<]*)</separator>
would return what you want, but I doubt that it is as flexible as you want it to be.
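If repeated matching is acceptable, a pattern for a single element can be applied in a loop, which removes the dependence on the number of groups. Here is a minimal sketch, assuming the element text never contains a '<' and the tags are not nested; SeparatorExtractDemo and extract are hypothetical names. For real XML a proper parser is the safer choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original thread.
final class SeparatorExtractDemo {
    // One capture group for the text of a single <separator> element.
    private static final Pattern ITEM = Pattern.compile("<separator>([^<]*)</separator>");

    // Collects the text of every <separator> element, however many there are.
    static String[] extract(String input) {
        List<String> items = new ArrayList<>();
        Matcher m = ITEM.matcher(input);
        while (m.find()) {          // each find() continues after the previous match
            items.add(m.group(1));
        }
        return items.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String input = "<separator>Now is the time</separator>"
                + "<separator>you can optionally preview your post</separator>"
                + "<separator>message by using a number of special tokens.</separator>";
        for (String s : extract(input)) {
            System.out.println(s);
        }
    }
}
```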

Using regular expressions

Hi Experts,
After going through some documentation on regular expressions in Oracle, I have tried to draw some conclusions about them. As I wasn’t very confident about how the patterns are built, I have tried to interpret them by looking at the output. It’s basically reverse engineering.
Please let me know if my interpretations are correct. Any additions /suggestions/corrections are most welcome.
Some of the examples may lack conclusions, please ignore those.
select regexp_substr('1PSN/231_3253/ABc','^([[:alnum:]]*)') from dual;
Output: 1PSN
Interpreted as:
^ From the start of the source string
([[:alnum:]]*) zero or more occurrences of alphanumeric characters
select regexp_substr('@@/231_3253/ABc','@*([[:alnum:]]+)') from dual;
Output: 231
Interpreted as:
@* Search for zero or more occurrences of @
([[:alnum:]]+) followed by one or more occurrences of alphanumeric characters
Note: In the above example Oracle looks for @ (zero or more times) immediately followed by alphanumeric characters.
Since a '/' comes between @@ and 231, the output is zero occurrences of @ plus one or more alphanumerics.
select regexp_substr('1@/231_3253/ABc','@+([[:alnum:]]*)') from dual;
Output: @
Interpreted as:
@+ one or more occurrences of @
([[:alnum:]]*) followed by 0 or more occurrences of alphanumerics
select regexp_substr('1@/231_3253/ABc','@+([[:alnum:]]+)') from dual;
Output: Null
Interpreted as:
@+ one or more occurrences of @
([[:alnum:]]+) followed by one or more occurrences of alphanumerics
select regexp_substr('@1PSN/231_3253/ABc125','([[:digit:]]+)$') from dual;
Output: 125
Interpreted as:
([[:digit:]]+) one or more occurrences of digits only
$ at the end of the string
select regexp_substr('@1PSN/231_3253/ABc','([^[:digit:]]+)$') from dual;
output: /ABc
Interpreted as:
([^[:digit:]]+)$ one or more occurrences of non-digit literals at the end of the string
'^' inside square brackets marks the negation of the class
Look for http:// followed by a substring of one or more alphanumeric characters and optionally, a period (.)
SELECT REGEXP_SUBSTR('Go to http://www.oracle.com/products and click on database','http://([[:alnum:]]+\.?){3,4}/?') RESULT
FROM dual;
Output: http://www.oracle.com/
Interpreted as:
[[:alnum:]]+ one or more occurrences of alphanumeric characters
\.? optionally a dot (the backslash escapes the dot, ? means optional)
{3,4} 3 or 4 times
/? optionally followed by a forward slash
If you have www.oracle.co.uk, {3,4} extracts it for you as well
Validate email:
select case when
       REGEXP_LIKE('[email protected]',
                   '^([[:alnum:]]+(\_?|\.))([[:alnum:]]*)@([[:alnum:]]+)(.([[:alnum:]]+)){1,2}$') then 'Match Found'
       else 'No Match Found'
       end
as output from dual;
Interpreted as:
([[:alnum:]]+(\_?|\.)) one or more occurrences of alphanumerics optionally followed by an underscore or dot
([[:alnum:]]*) followed by 0 or more occurrences of alphanumerics
@ followed by @
([[:alnum:]]+) followed by one or more occurrences of alphanumerics
(.([[:alnum:]]+)){1,2} followed by a dot followed by alphanumerics, once or at most twice (e.g. .com or .co.uk)
Output: Match Found
Input: [email protected]
Output: Match Found
Input: [email protected]
Output: No Match Found
Truncate the part, ending with digits
select regexp_substr('Yahoo11245@US','^.*[[:digit:]]',1) from dual;
Output: Yahoo11245
select regexp_substr('*Yahoo*11245@US','^.*[[:digit:]]',1) from dual;
Output: *Yahoo*11245
Interpreted as:
.* zero or more occurrences of any characters (dot signifies any character)
Replace 2 to 8 spaces with single space
select regexp_replace('Hello   you      OPs       there','[[:space:]]{2,8}',' ')
from dual;
Search for control characters
select case when
       regexp_like('Super' || chr(13) || 'Star' ,'[[:cntrl:]]')
              then 'Match Found'
       else 'No Match Found'
       end
as output from dual;
Output: Match Found
Search for lower case letters only with a string length varying from a min of 3 to max of 12
select case when
       regexp_like('terminator' ,'^[[:lower:]]{3,12}$')
              then 'Match Found'
       else 'No Match Found'
       end
as output from dual;
4th character must be a special character
select case when
       regexp_like('ter*minator' ,'^...[^[:alnum:]]')
              then 'Match Found'
       else 'No Match Found'
       end
as output from dual;
Output: Match Found
Case Sensitive Search
select case when
       regexp_like('Republic Of Africa' ,'of','c')
              then 'Match Found'
       else 'No Match Found'
       end
as output from dual;
Output: No match found
c stands for case sensitive
select case when
       regexp_like('Republic Of africa' ,'of','i')
              then 'Match Found'
       else 'No Match Found'
       end
as output from dual;
Output: Match Found
i stands for case insensitive
Two consecutive occurrences of the same character from a to z
select regexp_substr('Republicc Of Africaa' ,'([a-z])\1', 1,1,'i') from dual;
Output: cc
Interpreted as:
([a-z]) any one character from a to z, captured as group 1
\1 immediately followed by the same character again (backreference to group 1)
1 starting from the 1st character in the string
1 first occurrence
i case insensitive
Three consecutive occurrences of the same digit from 7 to 9
select case when
       regexp_like('Patch 10888 applied' ,'([7-9])\1\1')
              then 'Match Found'
       else 'No Match Found'
       end
as output from dual;
Output: Match Found
Phone validator:
select case when
       regexp_like('123-44-5555' ,'^[0-9]{3}-[0-9]{2}-[0-9]{4}$')
              then 'Match Found'
       else 'No Match Found'
       end
as output from dual;
Output: Match Found
Input: 111-222-3333
Output: No match found
Interpreted as:
^ start of the string
[0-9]{3} three occurrences of digits from 0-9
- followed by hyphen
[0-9]{2} two occurrences of digits from 0-9
- followed by hyphen
[0-9]{4} four occurrences of digits from 0-9
$ end of the string
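The breakdown above can be replayed in Java to confirm which inputs pass. This is just a sketch: Java's regex engine is not Oracle's, and DashedNumberDemo and check are made-up names. The pattern string itself is the one from the query.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original post.
final class DashedNumberDemo {
    // 3 digits, hyphen, 2 digits, hyphen, 4 digits, anchored at both ends.
    private static final Pattern FORMAT = Pattern.compile("^[0-9]{3}-[0-9]{2}-[0-9]{4}$");

    // find() searches like REGEXP_LIKE does; the anchors force a whole-string match.
    static String check(String input) {
        return FORMAT.matcher(input).find() ? "Match Found" : "No Match Found";
    }

    public static void main(String[] args) {
        System.out.println(check("123-44-5555"));
        System.out.println(check("111-222-3333"));
    }
}
```

The second input fails because its middle block has three digits where {2} allows exactly two.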
************************************************************************
Source Links:
http://www.psoug.org/reference/regexp.html
http://www.oracle.com/technology/obe/obe10gdb/develop/regexp/regexp.htm
Edited by: Preta on Feb 25, 2010 4:38 PM
Corrected the example for www.oracle.com
Edited by: Preta
Incorporated Logan's comments

Hi,
It looks like you have a good understanding of how regular expressions work.
You can put comments like the ones in your message directly in the code. For example, your validate e-mail code could be re-written
select      case
         when REGEXP_LIKE ( '[email protected]'
                    , '^'          || -- Starting from the beginning of the string
                    '('          || -- Begin \1
                      '[[:alnum:]]+'|| --     1 or more alphanumerics
                      '(\_?|\.)'     || --     optional underscore or dot
                    ')'          || -- End \1
                    '([[:alnum:]]*)'|| -- 0 or more alphanumerics
                    '@'          || -- @ sign
                    '([[:alnum:]]+)'|| -- 1 or more alphanumerics
                    '('          || -- Begin \5
                      '\.'          || --   dot
                      '([[:alnum:]]+)'
                              || --   1 or more alphanumerics
                    ')'          || -- End \5
                    '{1,2}'          || -- \5 can occur 1 or 2 times
                    '$'             -- End of string
         then 'Match Found'
                else 'No Match Found'
            end          as output
from      dual;
I find this easier to debug and maintain.
There's no denying, it does make the code very long. You be the judge of when to do this.
You use parentheses and \ unnecessarily sometimes. That's not really an error; if you find they make the code easier to develop and maintain, use them as much as you like.
For example, about the 4th line of the regular expression as I formatted it above:
'(\_?|\.)'     || --     optional underscore or dot
Underscore has no special meaning in regular expressions (only in LIKE), so you don't have to escape it.
I might write that line:
'(_|\.)?'     || --     optional underscore or dot
just because I think it's clearer.
I think you forgot a \ about 7 lines later:
'\.'          || --   dot
Be very careful about testing patterns that include literal dots; always make sure that a random character, like ~ , fails in a place where a dot is expected.
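To make the point about literal dots concrete, here is a small Java sketch that runs the suggested test: a value with ~ where the dot should be. The two patterns are simplified stand-ins rather than the full e-mail expression, and DotEscapeDemo and its method names are hypothetical. Java is used only as a convenient test bench; the escaping rule for the dot is the same in Oracle.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original reply.
final class DotEscapeDemo {
    // Unescaped dot: matches ANY single character between the two words.
    private static final Pattern UNESCAPED = Pattern.compile("^[a-z]+.[a-z]+$");
    // Escaped dot: matches only a literal '.' ("\\." is the Java literal for \.).
    private static final Pattern ESCAPED = Pattern.compile("^[a-z]+\\.[a-z]+$");

    static boolean acceptsWithUnescapedDot(String s) {
        return UNESCAPED.matcher(s).find();
    }

    static boolean acceptsWithEscapedDot(String s) {
        return ESCAPED.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(acceptsWithUnescapedDot("example~com")); // the bug: ~ is accepted
        System.out.println(acceptsWithEscapedDot("example~com"));
    }
}
```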

Regular Expressions for converting HTML to Structured Plain Text

I'm writing a PL/SQL function that will convert HTML to plain text, but still preserve some of the formatting/line breaks. One of my challenges is in writing a regular expression to capture the text blocks while ignoring the markup. I'm trying to write an expression that will grab all of the text between start/end tags, but discard the tags. For example, to find all of the text between a start/end paragraph, I want to do something like:
REGEXP_REPLACE('This is the body of the paragraph', '<p.*>(.*)', '\1||v_crlf' )
where \1 returns the contents of the paragraph and v_crlf (declared earlier in the function) inserts a line break. I know there are more general expressions that will remove all tags, but I want to specifically identify the tags so I can process them appropriately. This way I can easily convert HTML to plain text for email and reporting without having to keep two versions around. Any help would be greatly appreciated. Once I get this worked out, I will repost with the function code for others to use. Thanks.
Edited by: jritschel on Oct 26, 2010 9:58 AM

Here's a function I wrote for an app. I'm not making any promises on its accuracy as the app was just a proof of concept and never made it to production.
function strip_html( p_clob in clob )
return clob
is
 l_out clob;
 l_test number := 0;
 l_max_loops constant number := 20;
 i pls_integer := 0;
begin
 l_out := regexp_replace(p_clob,' | ',chr(13)||chr(10),1,0,'imn');
 l_out := regexp_replace(l_out,'',chr(13)||chr(10),1,0,'imn');
 l_out := replace(l_out,'<li>',chr(13)||chr(10)||'*<li>');
 l_out := regexp_replace(l_out,'(.+?)','*\1*',1,0,'imn');
 l_out := regexp_replace(l_out,'(.+?)','_\1_',1,0,'imn');
 loop
 l_test := regexp_instr(l_out,'<([A-Z][A-Z0-9]*)[^>]*>.*?</\1>',1,1,0,'imn');
 exit when l_test = 0 or i > l_max_loops;
 l_out := regexp_replace(l_out,'<([A-Z][A-Z0-9]*)[^>]*>(.*?)</\1>','\2',1,0,'imn');
 i := i + 1;
 end loop;
 return l_out;
end strip_html;
The loop is there to handle nested HTML.
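To see why the loop handles nesting, here is the same idea as a small Java sketch. It is not a port of the function above: only the paired-tag loop is reproduced, the 'imn' parameters are approximated with CASE_INSENSITIVE and DOTALL, and StripTagsDemo and stripPairedTags are made-up names. Each pass removes the outermost layer of matched tags, so nested markup needs one pass per level.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original post.
final class StripTagsDemo {
    // An opening tag, lazily matched content, and the closing tag with the same name (\1).
    private static final Pattern PAIRED_TAG = Pattern.compile(
            "<([A-Z][A-Z0-9]*)[^>]*>(.*?)</\\1>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final int MAX_LOOPS = 20; // safety cap, as in the PL/SQL version

    static String stripPairedTags(String html) {
        String out = html;
        for (int i = 0; i <= MAX_LOOPS; i++) {
            Matcher m = PAIRED_TAG.matcher(out);
            if (!m.find()) {
                break;                   // no paired tags left
            }
            out = m.replaceAll("$2");    // keep only the content between the tags
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(stripPairedTags("<div><p>Hello <b>world</b></p></div>"));
    }
}
```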
Tyler Muth
http://tylermuth.wordpress.com
"Applied Oracle Security: Developing Secure Database and Middleware Environments": http://sn.im/aos.book
Edited by: Tyler on Oct 26, 2010 10:03 AM

Regular Expression in Java problem

what is wrong with the following regex?
query = query.replaceAll("SELECT.*?([WHERE.*?|GROUP BY.*?|HAVING.*?|ORDER BY.*?|LIMIT.*?]*?)","\\1");
when I put in this:
"SELECT WHERE MlsNumber=\'555555\', AdType=\'MyAdType\'"
I get this:
"1 WHERE MlsNumber='5100093', AdType='NytClass'"
Where is the 1 coming from? I know the backreference must be working or I wouldn't get the WHERE statement back.

There's a pretty good regex tutorial at this site: http://www.regular-expressions.info/ (I meant to include that in my first reply).
Basically what I'm trying to do is cut the SELECT and
anything after it up until it reaches the WHERE (and
text), GROUP BY, HAVING, ORDER BY, or LIMIT. I want to
remove the SELECT and text, but keep all instances of
WHERE text GROUP BY text, etc. I also want to
keep anything that is past the end of these
expressions (in case there are options I haven't
foreseen).
Try this:
query = query.replaceFirst("SELECT.*?(?=WHERE|GROUP BY|HAVING|ORDER BY|LIMIT)", "");
The (?=...) part is a lookahead; it will cause the .*? to stop matching at the first instance of "WHERE", "GROUP BY", "HAVING", "ORDER BY", or "LIMIT". (Check out the "Lookahead & Lookbehind" section in the tutorial for an explanation.) However, if the first thing after the SELECT clause is one of the unknown options you mentioned, it will be removed too. If you know that keywords will always be in all caps, and that none of the other text will be, you could try generalizing the regex like this:
query = query.replaceFirst("SELECT.*? (?=[A-Z]{3,})", "");
This regex assumes that any sequence of three or more capital letters preceded by a space is a keyword, which is probably not a safe assumption, but it gives you an idea of the kind of thing you can try.
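Here is the first suggestion wrapped in a small runnable class, so the lookahead behaviour can be checked against a few queries. The regex is the one quoted above; the class name SelectTrimDemo, the method name and the sample queries are invented for the illustration.

```java
// Hypothetical demo class wrapping the replaceFirst call from the reply.
final class SelectTrimDemo {
    // Removes "SELECT" and everything up to (but not including) the first keyword.
    static String dropSelectClause(String query) {
        // (?=...) only checks that a keyword follows; the keyword itself is not consumed.
        return query.replaceFirst("SELECT.*?(?=WHERE|GROUP BY|HAVING|ORDER BY|LIMIT)", "");
    }

    public static void main(String[] args) {
        System.out.println(dropSelectClause("SELECT a, b FROM t WHERE x = 1 ORDER BY a"));
        System.out.println(dropSelectClause("SELECT a FROM t"));
    }
}
```

When none of the keywords follows, the lookahead never succeeds, nothing matches, and the query comes back unchanged.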
I thought the brackets were there to allow you to
select choices from a group of items, if not I can
remove them.
Alternation doesn't require special bracketing characters. You usually want to enclose it in parentheses, but that's just to isolate it from the rest of the regex (e.g., "abc(?:foo|bar)xyz"). Square brackets are used for character classes, which are a completely different breed of animal; look them up in the tutorial.
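The difference is easy to see by running the same alternatives once inside a group and once inside square brackets. A minimal sketch follows; the grouped pattern is the example from the paragraph above, while the bracketed variant and the names AlternationDemo, matchesGrouped and matchesCharClass are made up for the comparison.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original reply.
final class AlternationDemo {
    // Group with alternation: "foo" OR "bar" as whole words.
    private static final Pattern GROUPED = Pattern.compile("abc(?:foo|bar)xyz");
    // Character class: exactly ONE character out of f, o, |, b, a, r.
    private static final Pattern CHAR_CLASS = Pattern.compile("abc[foo|bar]xyz");

    static boolean matchesGrouped(String s) {
        return GROUPED.matcher(s).matches();
    }

    static boolean matchesCharClass(String s) {
        return CHAR_CLASS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesGrouped("abcfooxyz"));   // whole word accepted
        System.out.println(matchesCharClass("abcfooxyz")); // three characters do not fit one slot
        System.out.println(matchesCharClass("abc|xyz"));   // '|' is just a literal inside [...]
    }
}
```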
Apparently the JDK docs are wrong, since they tell you
to use the \\1 instead of $1 (which works).
If you want to use a backreference within the regex, you use \1. For instance, if you want to match a complete HTML element, you might use
String regex = "<(\\w+)[^>]*>.*?</\\1>";
But in the replacement string, you use $1. BTW, it's the Matcher docs that tell you that, not the Pattern docs.
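A short sketch showing both halves of that rule side by side, including what happens when "\\1" ends up in the replacement string: the backslash there only escapes the next character, so a literal 1 is inserted, which would explain the stray 1 in the original output. BackrefDemo and its method names are invented; the element pattern is the one above with a second group added so the content can be kept.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original reply.
final class BackrefDemo {
    // In the PATTERN, \1 (written "\\1" in Java source) refers back to group 1.
    private static final Pattern ELEMENT = Pattern.compile("<(\\w+)[^>]*>(.*?)</\\1>");

    // In the REPLACEMENT, groups are written $1, $2, ...
    static String unwrap(String html) {
        return ELEMENT.matcher(html).replaceAll("$2");
    }

    // Correct: $1 inserts the text captured by group 1.
    static String bracketRuns(String s) {
        return s.replaceAll("(b+)", "[$1]");
    }

    // Mistake: "\\1" in a replacement is an escaped '1', i.e. the literal character 1.
    static String wrongReplacement(String s) {
        return s.replaceAll("(b+)", "\\1");
    }

    public static void main(String[] args) {
        System.out.println(unwrap("<b>bold</b> and <i>it</i>"));
        System.out.println(bracketRuns("abbc"));
        System.out.println(wrongReplacement("abbc"));
    }
}
```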
