Writing a regular expression

Hi All,
I want to use a regular expression for the following statement, so how should I write it?
If a string is one of the titles (mr, Mr, mr., Mr., Mr ., mr .) it should be treated as Male or 'M'. Likewise for female.
DECODE(title,'MR','M','MR.','M','MISS','F','MRS','F','MRS.','F','MS.','F','Unknown') AS male_female
Please can anyone help to write the REGEXP?
Thanks.

Hi Sid,
Assuming 'Mr .' is OK but 'Mr ' is not, you can use
with t as (
select 'MR' as title from dual union all
select 'Mr.' as title from dual union all
select 'Mr .' as title from dual union all
select 'Mr ' as title from dual union all
select 'MISS' as title from dual union all
select 'MRS' as title from dual union all
select 'MRS.' as title from dual union all
select 'ms.' as title from dual union all
select 'MISSTAKE' as title from dual
)
select
  title
, case
    when regexp_like(title,'^MR( \.|\.?)$','i') then 'M'
    when regexp_like(title,'^(MISS|MRS\.?|MS\.)$','i') then 'F'
    else 'Unknown'
  end as Male_Female
from t
TITLE    MALE_FEMALE
MR       M          
Mr.      M          
Mr .     M          
Mr       Unknown    
MISS     F          
MRS      F          
MRS.     F          
ms.      F          
MISSTAKE Unknown
Regards,
Bob

Similar Messages

  • Need help in writing regular expressions involving \w

    Hi,
    Here is my requirement .
    I have a string : GTA - 12AB TRA - 12AB
I need a regex that represents the above string.
GTA - Constant - this won't change
12AB - This will be \w (alphanumeric)
Here I cannot have TRA within these 4 characters.
The question is:
How can I write an expression which says it can be a word (the positions where I have 12AB in the example) but not TRA in sequence?
    Is this doable?
    Thanks in advance.

    Use lookarounds: [http://www.regular-expressions.info/lookaround.html]
    The regex:
(?!.?TRA).{4}
matches any 4 characters (except line breaks) that do not contain 'TRA'.
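For comparison, Oracle's POSIX-based REGEXP functions (used in most of the other threads here) don't support lookarounds, so the same check in SQL can be approximated by combining a positive match with a negated one. A minimal sketch, assuming the fixed 'GTA - ' prefix from the example:
select txt
  from (select 'GTA - 12AB' txt from dual union all
        select 'GTA - TRA1' txt from dual)
 where regexp_like(txt, '^GTA - [[:alnum:]]{4}$')
   and not regexp_like(txt, '^GTA - .?TRA');
-- keeps 'GTA - 12AB' and rejects 'GTA - TRA1'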

  • Writing Regular Expression with a character ^, too difficult

I want to change the sentence "^1Mandrake ^3Style ^4DM" to "Mandrake Style DM".
(^ with a number means a color code)
So I used the String.replaceAll() method with a regular expression.
But however hard I try, I can't find any solution for this.
In PHP I could use \^ as a ^ character, but Java doesn't support \^.
    How can I solve this problem?

Use \\^ in your regex (you have to escape the backslash, too).
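The same stripping can be sketched in Oracle SQL for comparison, where a single backslash escapes the caret inside the pattern (a minimal sketch, not the Java solution above):
select regexp_replace('^1Mandrake ^3Style ^4DM', '\^[0-9]', '') as cleaned
  from dual;
-- -> Mandrake Style DM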

  • Writing a particular regular expression

    I'm trying to come up with a regular expression that I would use to scan a list of strings and store the ones I want. I have most of it working properly but a section of it stumps me.
    In order to make my problem as clear as possible, I'll assume that I have some strings that look like the following:
    <catapult>
    <yay>
    <griffin>
    <goo>
    <dog_man>
    <ok>
    <>
    What I would like to do, is to make the entire string a match, as long as it doesn't contain certain characters and certain substrings.
    For instance, say, as long as it doesn't contain the characters "y" or "x", and doesn't contain "cat" or "riff", and is not "goo" then it's a good match. So, the first four in my list should not match and the last three should.
What I tried to do was create a set negating the characters and the grouped substrings which I didn't want. It looked something like this:
    <(?:[[^yx]&&[^(?:.*cat.*)]&&[^(?:.*riff.*)]&&[^(?:goo)]]*)>
    Apparently this did not produce the results I desired. I assume this can be done but I just can't figure out how.
    I hope I made sense. Does anyone have any idea what I need to do?

    Wow, thanks so much. That's really helpful. No wonder I couldn't get it. I looked over the tutorials so that helped to clear things up a bit.
Actually, sjasja, I would've done something like that, but what I mentioned was just a portion of the patterns that I wanted to match, so although I'm inexperienced with complicated regexes, I figured it would be much neater and more concise if I could get it to do what I wanted.
    There is a slightly longer, but similar, expression that I would like to match but I suspect I might've made a mistake.
What I want is to have it examine a line, and group every instance of what I find within '[ ]'. The substring that would match would be found in between < >. The first part I would like to be a group of characters (or no characters at all) which don't contain cat, CAT, x or y (anywhere in this area). This time though, I would like something else before the end bracket, so I don't know if I should include the '>' like in the original expression you provided, or not. Following it should be 'cat' or 'CAT' and then '[ ] >'.
    What I'm trying to get this pattern to match, are strings like these:
    <cat [yay]>
    < something CAT [this] >
    < dog cat [cat3]>
    but not like these:
    < cat CAT [yay] >
    <catch [HI]>
    < CAT CAT [ ] >
    What I attempted to do was write the regex as
    <\\s*(?:(?!cat|CAT)[^xy])*(?:cat|CAT)\\s*\\[(.*)\\]\\s*>
    but I think that (?:(?!cat|CAT)[^xy]) might also match up with the trailing characters, which would result in bad matches. Am I mistaken in this?
    I'm sure you're all busy people, but a tip in the right direction would be great.
    Thanks again.

  • Regular expression alphabets

    Hi
I want to retrieve the data if the data contains a character or a space or '-' through a select query.
Please help me in writing the combination of the 3 with a regular expression.
    Thanks!!

    VT wrote:
    Hi,
    Try this
    SELECT *
FROM <TABLE> WHERE REGEXP_LIKE(<COLUMN>, '[a-z -][A-Z -]');
cheers
VT
That won't work, as it's expecting at least two characters, with the first having to be a-z (lower case) or space or "-", followed by A-Z (upper case) or space or "-".
    The correct way is either:
[a-zA-Z -]
or
[[:alpha:] -]
Using the alpha set is often preferable, as it can work differently with different character sets/languages rather than restricting to just the a-z/A-Z ranges.
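To make that concrete, a quick check against dual, assuming the requirement is that the whole string consists only of letters, spaces and hyphens (a minimal sketch):
select txt,
       case when regexp_like(txt, '^[[:alpha:] -]+$') then 'OK' else 'rejected' end as chk
  from (select 'Mary-Jane Smith' txt from dual union all
        select 'O2 Arena' txt from dual);
-- 'Mary-Jane Smith' is OK; 'O2 Arena' is rejected because of the digit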
    Generating a reference for your own database characterset/language can be useful...
select level-1 as asc_code, decode(chr(level-1), regexp_substr(chr(level-1), '[[:print:]]'), CHR(level-1)) as chr,
       decode(chr(level-1), regexp_substr(chr(level-1), '[[:graph:]]'), 1) is_graph,
       decode(chr(level-1), regexp_substr(chr(level-1), '[[:blank:]]'), 1) is_blank,
       decode(chr(level-1), regexp_substr(chr(level-1), '[[:alnum:]]'), 1) is_alnum,
       decode(chr(level-1), regexp_substr(chr(level-1), '[[:alpha:]]'), 1) is_alpha,
       decode(chr(level-1), regexp_substr(chr(level-1), '[[:digit:]]'), 1) is_digit,
       decode(chr(level-1), regexp_substr(chr(level-1), '[[:cntrl:]]'), 1) is_cntrl,
       decode(chr(level-1), regexp_substr(chr(level-1), '[[:lower:]]'), 1) is_lower,
       decode(chr(level-1), regexp_substr(chr(level-1), '[[:upper:]]'), 1) is_upper,
       decode(chr(level-1), regexp_substr(chr(level-1), '[[:print:]]'), 1) is_print,
       decode(chr(level-1), regexp_substr(chr(level-1), '[[:punct:]]'), 1) is_punct,
       decode(chr(level-1), regexp_substr(chr(level-1), '[[:space:]]'), 1) is_space,
       decode(chr(level-1), regexp_substr(chr(level-1), '[[:xdigit:]]'), 1) is_xdigit
  from dual
connect by level <= 256
/
      ASC_CODE C   IS_GRAPH   IS_BLANK   IS_ALNUM   IS_ALPHA   IS_DIGIT   IS_CNTRL   IS_LOWER   IS_UPPER   IS_PRINT   IS_PUNCT   IS_SPACE  IS_XDIGIT
             0                                                                   1
             1                                                                   1
             2                                                                   1
             3                                                                   1
             4                                                                   1
             5                                                                   1
             6                                                                   1
             7                                                                   1
             8                                                                   1
             9                                                                   1                                              1
            10                                                                   1                                              1
            11                                                                   1                                              1
            12                                                                   1                                              1
            13                                                                   1                                              1
            14                                                                   1
            15                                                                   1
            16                                                                   1
            17                                                                   1
            18                                                                   1
            19                                                                   1
            20                                                                   1
            21                                                                   1
            22                                                                   1
            23                                                                   1
            24                                                                   1
            25                                                                   1
            26                                                                   1
            27                                                                   1
            28                                                                   1
            29                                                                   1
            30                                                                   1
            31                                                                   1
            32                       1                                                                            1                     1
            33 !          1                                                                                       1          1
            34 "          1                                                                                       1          1
            35 #          1                                                                                       1          1
            36 $          1                                                                                       1          1
            37 %          1                                                                                       1          1
            38 &          1                                                                                       1          1
            39 '          1                                                                                       1          1
            40 (          1                                                                                       1          1
            41 )          1                                                                                       1          1
            42 *          1                                                                                       1          1
            43 +          1                                                                                       1          1
            44 ,          1                                                                                       1          1
            45 -          1                                                                                       1          1
            46 .          1                                                                                       1          1
            47 /          1                                                                                       1          1
            48 0          1                     1                     1                                           1                                1
            49 1          1                     1                     1                                           1                                1
            50 2          1                     1                     1                                           1                                1
            51 3          1                     1                     1                                           1                                1
            52 4          1                     1                     1                                           1                                1
            53 5          1                     1                     1                                           1                                1
            54 6          1                     1                     1                                           1                                1
            55 7          1                     1                     1                                           1                                1
            56 8          1                     1                     1                                           1                                1
            57 9          1                     1                     1                                           1                                1
            58 :          1                                                                                       1          1
            59 ;          1                                                                                       1          1
            60 <          1                                                                                       1          1
            61 =          1                                                                                       1          1
            62 >          1                                                                                       1          1
            63 ?          1                                                                                       1          1
            64 @          1                                                                                       1          1
            65 A          1                     1          1                                           1          1                                1
            66 B          1                     1          1                                           1          1                                1
            67 C          1                     1          1                                           1          1                                1
            68 D          1                     1          1                                           1          1                                1
            69 E          1                     1          1                                           1          1                                1
            70 F          1                     1          1                                           1          1                                1
            71 G          1                     1          1                                           1          1
            72 H          1                     1          1                                           1          1
            73 I          1                     1          1                                           1          1
            74 J          1                     1          1                                           1          1
            75 K          1                     1          1                                           1          1
            76 L          1                     1          1                                           1          1
            77 M          1                     1          1                                           1          1
            78 N          1                     1          1                                           1          1
            79 O          1                     1          1                                           1          1
            80 P          1                     1          1                                           1          1
            81 Q          1                     1          1                                           1          1
            82 R          1                     1          1                                           1          1
            83 S          1                     1          1                                           1          1
            84 T          1                     1          1                                           1          1
            85 U          1                     1          1                                           1          1
            86 V          1                     1          1                                           1          1
            87 W          1                     1          1                                           1          1
            88 X          1                     1          1                                           1          1
            89 Y          1                     1          1                                           1          1
            90 Z          1                     1          1                                           1          1
            91 [          1                                                                                       1          1
            92 \          1                                                                                       1          1
            93 ]          1                                                                                       1          1
            94 ^          1                                                                                       1          1
            95 _          1                                                                                       1          1
            96 `          1                                                                                       1          1
            97 a          1                     1          1                                1                     1                                1
            98 b          1                     1          1                                1                     1                                1
            99 c          1                     1          1                                1                     1                                1
           100 d          1                     1          1                                1                  1                           1
           101 e          1                     1          1                                1                  1                           1
           102 f          1                     1          1                                1                  1                           1
           103 g          1                     1          1                                1                  1
           104 h          1                     1          1                                1                  1
           105 i          1                     1          1                                1                  1
           106 j          1                     1          1                                1                  1
           107 k          1                     1          1                                1                  1
           108 l          1                     1          1                                1                  1
           109 m          1                     1          1                                1                  1
           110 n          1                     1          1                                1                  1
           111 o          1                     1          1                                1                  1
           112 p          1                     1          1                                1                  1
           113 q          1                     1          1                                1                  1
           114 r          1                     1          1                                1                  1
           115 s          1                     1          1                                1                  1
           116 t          1                     1          1                                1                  1
           117 u          1                     1          1                                1                  1
           118 v          1                     1          1                                1                  1
           119 w          1                     1          1                                1                  1
           120 x          1                     1          1                                1                  1
           121 y          1                     1          1                                1                  1
           122 z          1                     1          1                                1                  1
           123 {          1                                                                                    1     1
           124 |          1                                                                                    1     1
           125 }          1                                                                                    1     1
           126 ~          1                                                                                    1     1
           127                                                                   1
           128 Ç          1                                                                                    1     1
    etc.

  • Help with regular expression to find a pattern in clob

Can someone help me write a regular expression to query a CLOB that contains XML-type data?
I want to query for multiple occurrences of a variable string (i.e. <EMPID-XX> - XX can be any number). If <EMPID-01> appears twice in the CLOB I want the result as EMPID-01,2 and if EMPID-02 appears 4 times I want the result as EMPID-02,4.

    with
    ofx_clob as
    (select q'~
    <EMPID>1
    < UNQID>123456
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
    <EMPID>2
    < UNQID>123457
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
    <EMPID>1
    < UNQID>123458
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
~' ofx from dual
)
    select '<EMPID>' || to_char(ids) || '(' || to_char(count(*)) || ')' multi_empid
      from (select replace(regexp_substr(ofx,'<EMPID>\d*',1,level),'<EMPID>') ids
              from ofx_clob
        connect by level <= regexp_count(ofx,'<EMPID>')
       )
group by ids having count(*) > 1
    MULTI_EMPID
    <EMPID>1(2)
    with
    ofx_clob as
    (select q'~
    <EMPID>1
    < UNQID>123456
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
    <EMPID>2
    < UNQID>123457
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
    <EMPID>1
    < UNQID>123456
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
    <EMPID>2
    < UNQID>123456
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
    <EMPID>1
    < UNQID>123458
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
~' ofx from dual
)
    select '<EMPID>' || listagg(to_char(ids) || '(' || to_char(count(*)) || ')',',') within group (order by ids) multi_empid
      from (select replace(regexp_substr(ofx,'<EMPID>\d*',1,level),'<EMPID>') ids
              from ofx_clob
        connect by level <= regexp_count(ofx,'<EMPID>')
       )
group by ids having count(*) > 1
    MULTI_EMPID
    <EMPID>1(3),2(2)
    Regards
    Etbin
    Message was edited by: Etbin
    used listagg to report more than one multiple <EMPID>
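If only one known id needs counting, REGEXP_COUNT (11g and up, as used above) can also be applied directly; the trailing guard keeps '<EMPID>1' from also matching '<EMPID>12'. A minimal sketch against the same ofx_clob data:
select regexp_count(ofx, '<EMPID>1($|[^0-9])') as empid_1_count
  from ofx_clob;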

  • How do I have to define a regular expression to filter out data from file?

    Hi all,
I need to extract parts of lines of an ASCII file and didn't get it done with my limited knowledge of regular expressions.
The file contains hundreds of lines and I am just interested in a few of them, and within those lines I just need a part of the data.
    One original line looks like that:
    TP3| |TP_SMD|Nicht in Stueckliste|~TP TP_SMD TESTPUNKT|-|0|87.770|157.950|0|top|c| |other|TP_SMD|TP_SMD_60RF-TP
Only the first field (TP3) and the coordinates (87.770, 157.950, 0) are of interest; I don't need the rest.
I can open the file and read in each line, but then I am struggling to pick out only the lines of interest (starting with TP), taking that TP with its number and the coordinates that follow later on, and then writing these shortened lines to a new text file. So the new line should look like this:
    TP3; 87.770;157.950;0 (It doesn't matter if the separator will be ; or |)
    I thought of using regular expressions - is that the right way or is there a better approach?
    Thanks & regards,
    gedi, using LabVIEW 8.5

    Hi max,
    for finding a specific part of a string you can use the "Match Pattern" VI, it is located in the Strings Palette.
    Maybe the Extract Numbers.vi example in the examples browser library can help you.
What I did to filter out my data of interest is first to sort out only the columns which I want to have -
then there are still a lot of lines remaining that I don't need (this is the thing described above).
The rest I am going to filter out with a (then easy) regular expression using the "Match Pattern" VI.
    Regards,
    gedi

  • Introduction to regular expressions ...

I'm well aware that there are already some articles on this topic, but some people asked me to share some of my knowledge of it. Please take a look at this first part and let me know if you find it useful. If yes, I'm going to continue writing more parts using more and more complicated expressions - if you have questions or problems that you think could be solved through regular expressions, please post them.
    Introduction
Oracle has always provided some character/string functions in its PL/SQL command set, such as SUBSTR, REPLACE or TRANSLATE. With 10g, Oracle finally gave us, the users, the developers and of course the DBAs, regular expressions. However, regular expressions, due to their sometimes cryptic rules, seem to be overlooked quite often, despite the existence of some very interesting use cases. Being one of the advocates of regular expressions, I thought I'd give the interested audience an introduction to these new functions in several installments.
    Having fun with regular expressions - Part 1
Oracle offers the use of regular expressions through several functions: REGEXP_INSTR, REGEXP_SUBSTR, REGEXP_REPLACE and REGEXP_LIKE. The second part of each function name already gives away its purpose: INSTR for finding a position inside a string, SUBSTR for extracting a part of a string, REPLACE for replacing parts of a string. REGEXP_LIKE is a special case, since it can be compared to the LIKE operator and is therefore usually used in comparisons such as IF statements or WHERE clauses.
Regular expressions excel, in my opinion, at searching and extracting strings: finding or replacing certain strings, or checking for certain formatting criteria. They're not very good at formatting strings themselves, except for some special cases I'm going to demonstrate.
If you're not familiar with regular expressions, you should take a look at the definition in Oracle's user guide, Using Regular Expressions With Oracle Database, and please note that there have been some changes and enhancements in 10gR2. I'll provide examples that should work on both versions.
Some of you have probably already encountered this problem: checking for a number inside a string because, for whatever reason, a column was defined as VARCHAR2 and not as NUMBER as one would have expected.
    Let's check for all rows where column col1 does NOT include an unsigned integer. I'll use this SELECT for demonstrating different values and search patterns:
    WITH t AS (SELECT '456' col1
                 FROM dual
                UNION
               SELECT '123x'
                 FROM dual
                UNION  
               SELECT 'x123'
                 FROM dual
                UNION 
               SELECT 'y'
                 FROM dual
                UNION 
               SELECT '+789'
                 FROM dual
                UNION 
               SELECT '-789'
                 FROM dual
                UNION 
               SELECT '159-'
                 FROM dual
                UNION 
               SELECT '-1-'
             FROM dual
)
SELECT t.col1
      FROM t
    WHERE NOT REGEXP_LIKE(t.col1, '^[0-9]+$')
;
Let's take a look at the 2nd argument of this REGEXP function: '^[0-9]+$'. Translated, it means: start at the beginning of the string, check for one or more characters in the range between '0' and '9' (also called a matching character list) until the end of the string. "^", "[", "]", "+" and "$" are all metacharacters.
To understand regular expressions, you have to "think" in regular expressions. Each regular expression tries to "fit" an available string into its pattern and returns a result of being successful or not, depending on the function. The "art" of using regular expressions is to construct the right search pattern for a certain task. Using functions like TRANSLATE or REPLACE has already taught you to use search patterns; regular expressions are just an extension of this paradigm. Another side note: most of the search patterns are placeholders for single characters, not strings.
I'll take this example a bit further. What would happen if we removed the "$" in our example? "$" means: (until the) end of a string. Without it, the expression would only search for digits from the beginning until it encounters either another character or the end of the string. So this time, '123x' would be removed from the selection since it does fit the pattern.
    Another change: we will keep the "$" but remove the "^". This character has several meanings, but in this case it declares: (start from the) beginning of a string. Without it, the function will search for a part of a string that has only digits until the end of the searched string. 'x123' would now be removed from our selection.
Now there's a question: what happens if I remove both "^" and "$"? Well, just think about it. We now ask to find any string that contains at least one digit anywhere, so both '123x' and 'x123' will not show up in the result.
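A quick way to convince yourself of the three anchor variants (a minimal sketch using '123x'):
select case when regexp_like('123x', '^[0-9]+$') then 'match' else 'no match' end as anchored_both,
       case when regexp_like('123x', '^[0-9]+')  then 'match' else 'no match' end as start_only,
       case when regexp_like('123x', '[0-9]+$')  then 'match' else 'no match' end as end_only
  from dual;
-- anchored_both: no match, start_only: match, end_only: no match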
So what if I want to look for signed integers, since "+" is also used as a search expression? Escaping is the name of the game: we'll just use '^\+[0-9]+$'. Did you notice the "\" before the first "+"? This is now a search pattern for the plus sign.
Should signed integers include negative numbers as well? Of course they should, and I'll once again use a matching character list. In this list, I don't need to escape, although it is possible. The pattern would now look like this: '^[+-]?[0-9]+$'. Did you notice the "?"? This is another metacharacter that makes the placeholder for plus and minus optional, which means: if there's a "+" or "-", that's OK; if there's none, that's also OK. Only if there's a different character will the search pattern fail.
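Tested against a few values (a minimal sketch):
select col1,
       case when regexp_like(col1, '^[+-]?[0-9]+$') then 'valid' else 'invalid' end as chk
  from (select '+789' col1 from dual union all
        select '-789' from dual union all
        select '789' from dual union all
        select '++9' from dual);
-- '+789', '-789' and '789' are valid; '++9' is invalid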
Addendum: after first posting this, I found a mistake in my examples. If you had tested my old examples with test data that included multiple-sign strings, like "--", "-+" or "++", they would have been filtered by the SELECT statement. I mistakenly used the "*" instead of the "?" operator. The reason why this is a bad idea can also be found in the user guide: the "*" metacharacter is defined as 0 to multiple occurrences.
Looking at the values, one could ask: what about integers with a trailing sign? Quite simple, right? Let's just add another '[+-]?' and the search pattern looks like this: '^[+-]?[0-9]+[+-]?$'.
    Wait a minute, what happened to the row with the column value "-1-"?
You probably already guessed it: the new pattern qualifies this one as a valid string too. I could now split this pattern into several conditions combined through a logical OR, but there's something even better: a logical OR inside the regular expression. Its symbol is "|", the pipe sign.
Changing the search pattern again to something like '^[+-]?[0-9]+$|^[0-9]+[+-]?$' would now return the "-1-" value. Do I have to duplicate the same elements like "^" and "$", and what about more complicated, repeating elements in future examples? That's where subexpressions/grouping come into play. If I want the OR operator to apply to only certain parts of the search pattern, I can put those inside round brackets. '^([+-]?[0-9]+|[0-9]+[+-]?)$' serves the same purpose and allows for further checks without duplicating the whole pattern.
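Checking the grouped pattern against "-1-" (a minimal sketch):
select case when regexp_like('-1-', '^([+-]?[0-9]+|[0-9]+[+-]?)$')
            then 'valid' else 'invalid' end as chk
  from dual;
-- -> 'invalid', so NOT REGEXP_LIKE would return the row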
Now looking for integers is nice, but what about decimal numbers? Those may be a bit more complicated, but all I have to do is again think in (meta) characters. I'll just use an example where the decimal point is represented by ".", which again needs escaping, since it's also the regular-expression placeholder for "any character".
Valid decimals in my example would be ".0", "0.0", "0." and "0" (an integer, of course), but not ".". If you want, you can test them with the TO_NUMBER function. Finding such an unsigned decimal number could then be formulated like this: from the beginning of the string we allow either a decimal point plus any number of digits OR at least one digit plus an optional decimal point followed by optionally any number of digits. Think about it for a minute: how would you formulate such a search pattern?
    Compare your solution to this one:
    '^(\.[0-9]+|[0-9]+(\.[0-9]*)?)$'
Addendum: Here I have to use both "?" and "*" to make sure that I can have 0 to many digits after the decimal point, but only 0 to 1 occurrences of this subexpression. Otherwise, strings like "1.9.9.9" would be possible if I wrote it like this:
'^(\.[0-9]+|[0-9]+(\.[0-9]*)*)$'
Some of you now might say: hey, what about signed decimal numbers? You could of course combine all the ideas so far, and you would end up with a very long and almost unreadable search pattern, or you could start combining several regular expression functions. Think about it: why put all the search patterns into one function? Why not split them into several steps like "check for a valid decimal" and "check for sign"?
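The unsigned-decimal pattern, run over the edge cases just mentioned (a minimal sketch):
select col1,
       case when regexp_like(col1, '^(\.[0-9]+|[0-9]+(\.[0-9]*)?)$')
            then 'valid' else 'invalid' end as chk
  from (select '.0' col1 from dual union all
        select '0.' from dual union all
        select '.' from dual union all
        select '1.9.9' from dual);
-- '.0' and '0.' are valid; '.' and '1.9.9' are invalid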
    I'll just use another SELECT to show what I want to do:
    WITH t AS (SELECT '0' col1
                 FROM dual
                UNION
               SELECT '0.' 
                 FROM dual
                UNION
               SELECT '.0' 
                 FROM dual
                UNION
               SELECT '0.0' 
                 FROM dual
                UNION
               SELECT '-1.0' 
                 FROM dual
                UNION
               SELECT '.1-' 
                 FROM dual
                UNION
               SELECT '.' 
                 FROM dual
                UNION
               SELECT '-1.1-' 
             FROM dual
)
SELECT t.*
      FROM t
;
From this select, the only rows I need to find are those with the column values "." and "-1.1-". I'll start with a check for valid signs. Since I want to combine this with the check for valid decimals, I'll first try to extract a substring with valid signs through the REGEXP_SUBSTR function:
NVL(REGEXP_SUBSTR(t.col1, '^([+-]?[^+-]+|[^+-]+[+-]?)$'), ' ')
Remember the OR operator and the matching character lists? But why several "^"? Some of the metacharacters inside a search pattern can have different meanings, depending on their position and combination with other metacharacters. In this case, the pattern translates into: from the beginning of the string, search for "+" or "-" followed by at least one other character that is not "+" or "-". The second pattern after the "|" OR operator does the same for a sign at the end of the string.
This only checks for a sign, but not whether there are also only digits and a decimal point inside the string. If the search fails, for example when we have more than one sign as in "-1.1-", the function returns NULL. NULL and LIKE don't go together very well, so we'll just add NVL with a default value that tells the LIKE to ignore this string, in this case a space.
    All we have to do now is to combine the check for the sign and the check for a valid decimal number, but don't forget an option for the signs at the beginning or end of the string, otherwise your second check will fail on the signed decimals. Are you ready?
    Does your solution look a bit like this?
WHERE NOT REGEXP_LIKE(NVL(REGEXP_SUBSTR(t.col1,
                                        '^([+-]?[^+-]+|[^+-]+[+-]?)$'), ' '),
                      '^[+-]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)[+-]?$'
                     )
Now the optional sign checks in the REGEXP_LIKE argument can be added to both ends, since the SUBSTR won't allow any string with signs on both ends. Thinking in regular expressions again.
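Put together as a complete statement over a few of the test values (a minimal sketch):
WITH t AS (SELECT '-1.0' col1 FROM dual UNION
           SELECT '-1.1-' FROM dual UNION
           SELECT '.' FROM dual)
SELECT t.col1
  FROM t
 WHERE NOT REGEXP_LIKE(NVL(REGEXP_SUBSTR(t.col1,
                                         '^([+-]?[^+-]+|[^+-]+[+-]?)$'), ' '),
                       '^[+-]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)[+-]?$');
-- returns '.' and '-1.1-', the two invalid values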
    Continued in Introduction to regular expressions ... continued.
    C.
    Fixed some embarrassing typos ... and mistakes.
    cd

Excellent write-up, CD. Very nice indeed. Hopefully you'll be completing parts 2 and 3 some time soon. And with any luck, your article will encourage others to do the same... I know there are a few I'd like to see and a few I'd like to have a go at writing too :-)

  • Regular expression in FIND statement

    Hi All,
I am writing regular expressions, but I don't properly understand how to write them.
I have one internal table with five fields.
Example: wa-mandt = '800'.
             wa_number = '3768'
             wa_path = '/usr/tmp/sapuser/3768/test.txt.'
append wa to itab.
Loop at itab into wa.
Here I need to find the client, number and system ID from WA using a regular expression in a single line.
endloop.
Can anybody please explain how to write this?
    Thanks,

    Hi,
    What do you mean by FIND?
If I got it right, you can use a READ statement with KEY f1 f2 etc. BINARY SEARCH. Mention all the fields you want in the KEY fields.
Don't forget to SORT this itab before the loop.
    Thanks
    Kiran

  • Java Regular Expressions and Pattern

I have a file from which I first want to get all the lines that match a given pattern. Then, from these matching lines, I want to extract two values.
Example line for the pattern to match:
    INFO | jvm 1 | 2006/11/07 15:14:09 | INFO | Tue Nov 07 15:14:09 CET 2006 | XLDB PPS Data Dumper: MESSAGE:- 406 Processing .. '[ /opt/nexus/horizon/raw_data/network/pp_CE01S4H_sta_20050703T015717_SYDP3001_546.bdf ]'
From all the lines like this one, I want to extract two values:
    2006/11/07 15:14:09
    and
    /opt/nexus/horizon/raw_data/network/pp_CE01S4H_sta_20050703T015717_SYDP3001_546.bdf
so I can store them in a database.
Can someone help me with writing the pattern to match and the regular expression to extract? Also, if anyone has a better way of doing this, I am all ears - I have a lot of log files to go through.

import java.util.regex.*;

class Main
{
  public static void main(String[] args)
  {
    String txt="INFO | jvm 1 | 2006/11/07 15:14:09 | INFO | Tue Nov 07 15:14:09 CET 2006 | XLDB PPS Data Dumper: MESSAGE:- 406 Processing .. '[ /opt/nexus/horizon/raw_data/network/pp_CE01S4H_sta_20050703T015717_SYDP3001_546.bdf ]'";
    String re1=".*?";     // Non-greedy match on filler
    String re2="((?:2|1)\\d{3}(?:-|\\/)(?:(?:0[1-9])|(?:1[0-2]))(?:-|\\/)(?:(?:0[1-9])|(?:[1-2][0-9])|(?:3[0-1]))(?:T|\\s)(?:(?:[0-1][0-9])|(?:2[0-3])):(?:[0-5][0-9]):(?:[0-5][0-9]))";     // Time Stamp 1
    String re3=".*?";     // Non-greedy match on filler
    String re4="((?:\\/[\\w\\.]+)+)";     // Unix Path 1
    Pattern p = Pattern.compile(re1+re2+re3+re4,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    Matcher m = p.matcher(txt);
    if (m.find())
    {
        String timestamp1=m.group(1);   // first capturing group: the timestamp
        String unixpath1=m.group(2);    // second capturing group: the file path
        System.out.print("("+timestamp1+")"+"("+unixpath1+")"+"\n");
    }
  }
}

  • Introduction to regular expressions ... continued.

After some very positive feedback from Introduction to regular expressions ... I'm now continuing on this topic for the interested audience. As always, if you have questions or problems that you think could be solved through regular expressions, please post them.
Having fun with regular expressions - Part 2
Finishing my example with decimal numbers, I thought about a method to test regular expressions. A question from another user who was looking for a way to show all possible combinations inspired me to write a small package.
    CREATE OR REPLACE PACKAGE regex_utils AS
      -- Regular Expression Utilities
      -- Version 0.1
      TYPE t_outrec IS RECORD(
    data VARCHAR2(255)
  );
  TYPE t_outtab IS TABLE OF t_outrec;
      FUNCTION gen_data(
        p_charset IN VARCHAR2 -- character set that is used for generation
      , p_length  IN NUMBER   -- length of the generated
      ) RETURN t_outtab PIPELINED;
END regex_utils;
/
    CREATE OR REPLACE PACKAGE BODY regex_utils AS
    -- FUNCTION gen_data returns a collection of generated varchar2 elements
      FUNCTION gen_data(
        p_charset IN VARCHAR2 -- character set that is used for generation
      , p_length  IN NUMBER   -- length of the generated
      ) RETURN t_outtab PIPELINED
      IS
        TYPE t_counter IS TABLE OF PLS_INTEGER INDEX BY PLS_INTEGER;
        v_counter t_counter;
        v_exit    BOOLEAN;
        v_string  VARCHAR2(255);
        v_outrec  t_outrec;
      BEGIN
        FOR max_length IN 1..p_length 
        LOOP
          -- init counter loop
          FOR i IN 1..max_length
          LOOP
            v_counter(i) := 1;
          END LOOP;
          -- start data generation loop
          v_exit := FALSE;
          WHILE NOT v_exit
          LOOP
            -- start generation
            v_string := '';
            FOR i IN 1..max_length
            LOOP
              v_string := v_string || SUBSTR(p_charset, v_counter(i), 1);
            END LOOP;
            -- set outgoing record
            v_outrec.data := v_string;
            -- now pipe the result
            PIPE ROW(v_outrec);
            -- increment loop
            <<inc_loop>>
            FOR i IN REVERSE 1..max_length
            LOOP
              v_counter(i) := v_counter(i) + 1;     
              IF v_counter(i) > LENGTH(p_charset) THEN
                 IF i > 1 THEN
                    v_counter(i) := 1;
                 ELSE
                    v_exit := TRUE;  
                 END IF;
              ELSE
                 -- no further processing required
                 EXIT inc_loop;  
              END IF;  
            END LOOP;        
          END LOOP; 
        END LOOP; 
      END gen_data;
    END regex_utils;
/
This package is a brute force string generator using all possible combinations of characters in a string up to a maximum length. Together with the regular expressions, I can now show which combinations my solution would allow to pass. But see for yourself:
SELECT *
  FROM (SELECT data col1
          FROM TABLE(regex_utils.gen_data('+-.0', 5))
       ) t
 WHERE REGEXP_LIKE(NVL(REGEXP_SUBSTR(t.col1,
                                     '^([+-]?[^+-]+|[^+-]+[+-]?)$'), ' '),
                   '^[+-]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)[+-]?$'
                  )
;
You will see some results which are perfectly valid for my definition of decimal numbers but haven't been mentioned yet, like '000' or '+.00'. From now on I will also use this package to verify the solutions I present to you, and hopefully reduce my share of typos.
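By the way, the generator itself is easy to sanity-check with a small character set (a quick sketch):
SELECT data
  FROM TABLE(regex_utils.gen_data('01', 2));
This returns 0, 1, 00, 01, 10 and 11 - all combinations up to length 2.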
Counting and finding certain characters or words in a string can be a tedious task. I'll show you how it's done with regular expressions. I'll start with an easy example: counting all spaces in the string "Having fun with regular expressions":
    SELECT NVL(LENGTH(REGEXP_REPLACE('Having fun with regular expressions', '[^ ]')), 0)
      FROM dual
;
No surprise there. I'm replacing all characters except spaces with a null string. Since REGEXP_REPLACE assumes a NULL string as the replacement argument, I can save on adding a third argument, which would look like this:
REGEXP_REPLACE('Having fun with regular expressions', '[^ ]', '')
So REPLACE will return all the spaces, which we can count with the LENGTH function. If there aren't any, I will get a NULL string, which is caught by the NVL function. If you want, you can play around by changing the space character to something else.
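For example, counting the letter "e" instead works exactly the same way (a sketch):
SELECT NVL(LENGTH(REGEXP_REPLACE('Having fun with regular expressions', '[^e]')), 0)
  FROM dual;
This returns 3, one for each "e" in "regular" and "expressions".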
    A variation of this theme could be counting the number of words. Counting spaces and adding 1 to this result could be misleading if there are duplicate spaces. Thanks to regular expressions, I can of course eliminate duplicates.
Using the old method on a string like "Having  fun  with  regular  expressions" (note the double spaces) would return anything but the right number. This is where backreferences come into play. REGEXP_REPLACE uses them in the replacement argument, a backslash plus a single digit, like this: '\1'. To reference a string in a search pattern, I have to use subexpressions (remember the round brackets?).
SELECT NVL(LENGTH(REGEXP_REPLACE('Having  fun  with  regular  expressions', '( )\1*|.', '\1')), 0)
  FROM dual
  ;
You may have noticed that I changed from using the "^" as a NOT operator to using the "|" OR operator and the "." any-character placeholder. This neat little trick allows me to filter out all characters except the one I'm looking for in the first place. The "\1" backreference sits outside of our subexpression ("\1*") so that duplicate spaces are not counted, and it is used both in the search pattern and the replacement argument.
    Still I'm not satisfied with this: What about leading/trailing blanks, what if there are any special characters, numbers, etc.? Finally, it's time to only count words. For the purpose of this demonstration, I define a word as one or more consecutive letters. If by now you're already thinking in regular expressions, the solution is not far away. One hint: you may want to check on the "i" match parameter which allows for case insensitive search. Another one: You won't need a back reference in the search pattern this time.
Let's compare our solutions then, shall we?
    SELECT NVL(LENGTH(REGEXP_REPLACE('Having  fun  with  regular  expressions.  !',
                                     '([a-z])+|.', '\1', 1, 0, 'i')), 0)
  FROM dual;
This time I don't use a backreference; the "+" operator (remember? 1 or more) will suffice. And since I want to count the occurrences, not the letters, I moved the "+" meta character outside of the subexpression. The "|." trick again proved to be useful.
Case insensitive search does have its merits. It will only search for but not transform any found substring. If I want, for example, to extract any occurrence of the word "fun", I'll just use the "i" match parameter and get this substring, whether it's written as "Fun", "FUN" or "fun". This can be very useful if you're looking, for example, for names of customers, streets, etc.
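For example (a quick sketch of such an extraction):
SELECT REGEXP_SUBSTR('Having FUN with regular expressions', 'fun', 1, 1, 'i')
  FROM dual;
This returns 'FUN' - the substring exactly as it was found, not as it was searched for.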
Enough about counting, how about finding? What if I want to know the last occurrence of a certain character or string, for example the position of the last space in the string "Where is the last space?"?
Addendum: Thanks to another forum member, I should mention that the INSTR function can do a reverse search by itself.
    WITH t AS (SELECT 'Where is the last space?' col1
                 FROM dual)
    SELECT INSTR(col1, ' ', -1)
  FROM t;
Now regular expressions are powerful, but there is no parameter that allows us to reverse the search direction. However, remembering that we have the "$" meta character that means (until the) end of string, all I have to do is use a search pattern that looks for a combination of space and non-space characters including the end of the string. Now compare the REGEXP_INSTR function to the previous solution:
    SELECT REGEXP_INSTR(t.col1, ' [^ ]*$')                       
  FROM t;
So in this case, it'll remain a matter of taste which one you want to use. But if you have to look for the last occurrence of another regular expression, this is the way to solve such a requirement.
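The same trick works for the last occurrence of any pattern, for example the starting position of the last word (a sketch, reusing my definition of a word from above):
SELECT REGEXP_INSTR('Where is the last word', '[[:alpha:]]+[^[:alpha:]]*$')
  FROM dual;
This returns 19, the position where 'word' begins.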
One more thing about backreferences. They can be used for a sort of primitive "string swapping". If, for example, you have to transform column values like swapping first and last name, the backreference is your friend. Here's an example:
    SELECT REGEXP_REPLACE('John Doe', '^(.*) (.*)$', '\2, \1')
      FROM dual
  ;
What about middle names, for example 'John J. Doe'? Look for yourself, it still works.
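If you want to check it (a quick sketch):
SELECT REGEXP_REPLACE('John J. Doe', '^(.*) (.*)$', '\2, \1')
  FROM dual;
The result is 'Doe, John J.': the first greedy (.*) swallows everything up to the last space, so the middle name stays with the first name.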
You can even use that for strings with delimiters, for example reversing delimited "fields" like the string '10~20~30~40~50' into '50~40~30~20~10'. Using REVERSE, I would get '05~04~03~02~01', so there has to be another way. Backreferences, however, are limited to 9 subexpressions, which limits the following solution a bit if you need to process strings with more than 9 fields. If you want, you can think this example through and see if your solution matches mine.
SELECT REGEXP_REPLACE('10~20~30~40~50',
                      '^(.*)~(.*)~(.*)~(.*)~(.*)$',
                      '\5~\4~\3~\2~\1'
                     )
  FROM dual;
After what you've learned so far, that wasn't too hard, was it? Enough for now ...
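Addendum: If you really have to process more than 9 fields, here is one possible workaround (just a sketch, splitting by occurrence instead of using backreferences):
WITH t AS (SELECT '10~20~30~40~50~60~70~80~90~100' AS col1 FROM dual)
SELECT LTRIM(MAX(SYS_CONNECT_BY_PATH(
               REGEXP_SUBSTR(col1, '[^~]+', 1,
                             LENGTH(col1) - LENGTH(REPLACE(col1, '~')) + 2 - LEVEL),
               '~')), '~') AS reversed
  FROM t
CONNECT BY LEVEL <= LENGTH(col1) - LENGTH(REPLACE(col1, '~')) + 1;
This walks the fields from last to first and glues them back together, returning '100~90~80~70~60~50~40~30~20~10'.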
    Continued in Introduction to regular expressions ... last part..
    C.
    Fixed some typos and a flawed example ...
    cd

    Thank you very much C. Awaiting other parts.... keep going.
One German typo :-)
I'm replacing all characters except spaces mit anull string.
I received a functional spec from my Dutch analyst in which it is written:
    tnsnames voor EDWH:
    PCESCRD1 = (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)
                                                   (HOST=blah.blah.blah.com)
                                                   (PORT=5227)))
               (CONNECT_DATA=(SID=pcescrd1)))
db user: BW_I2_VIEWER  / BW_I2_VIEWER_SCRD1
Had to look for translators.
    Cheers
    Sarma.

  • BUG?? Syntax Highlighting Regular Expressions

I'm working on a rather long regular expression in Dreamweaver code view. It has about 200+ characters in it. I wrote it in another application and copied it over for use in JS, and it wouldn't highlight correctly (sort of like when you leave off a quote on a string). Naturally I thought I screwed up my regexp or forgot to escape a character or something, so I went through it re-writing it and checking everything, but it was correct. I saved the page and ran it in the browser and sure enough it worked.
So I started typing the regexp from scratch this time in Dreamweaver to see when it stopped highlighting correctly. It stopped at exactly 100 characters including opening and closing forward slashes. I tried writing another exp this time filled with just one letter. Again 100 characters exactly - not one more.
Example:
var reg1 = /sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss/
var reg2 = /ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss/
reg1 will highlight correctly, while reg2 won't.
Is this a bug or am I missing something? The highlight rule (in Configuration\CodeColoring\CodeColoring.xml) is
(\s*/\e*\\/
-or-
=\s*/ \e*\\ /
There seems to be no limitation on length there.
While this is not a huge deal, I might as well be coding in notepad (or my other script editors which highlight this correctly) because the highlighting is worthless from this point on.
I could make these strings instead of literals, but I have a lot of these long expressions and would rather not go through them and escape all of my backslashes (there are tons) as well as quotes - and make them even more unmaintainable than they already are.
Anyone have this problem? Or a solution to it?

random_acts wrote:
> I'm working on a rather long regular expression in Dreamweaver code view. It has about 200+ characters in it.
I have never written a regex as long as that, but was fascinated by your question, so attempted to replicate your problem.
I gave up at 614 characters, but the syntax coloring didn't. I suggest that you submit a bug report with the details of your actual code:
http://www.adobe.com/cfusion/mmform/index.cfm?name=wishform
David Powers
Author, "Foundation PHP for Dreamweaver 8" (friends of ED)
Author, "Foundation PHP 5 for Flash" (friends of ED)
http://foundationphp.com/

  • Regular expressions with multi character separator

I have data like the following,
where |`| is the separator for distinguishing two fields of data. I am having trouble writing a regular expression to display the data correctly.
    Connected to:
    Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
    With the Partitioning, OLAP, Data Mining and Real Application Testing options
    SQL> declare
      2  l_string varchar2 (200) :='123` 456 |`|789 10 here|`||223|`|5434|`}22|`|yes';
      3  v varchar2(40);
      4  begin
      5  v:=regexp_substr(l_string, '[^(|`|)]+', 1, 1);
      6  dbms_output.put_line(v);
      7  v:=regexp_substr(l_string, '[^(|`|)]+', 1, 2);
      8  dbms_output.put_line(v);
      9  v:=regexp_substr(l_string, '[^(|`|)]+', 1, 3);
    10  dbms_output.put_line(v);
    11  v:=regexp_substr(l_string, '[^(|`|)]+', 1, 4);
    12  dbms_output.put_line(v);
    13  v:=regexp_substr(l_string, '[^(|`|)]+', 1, 5);
    14  dbms_output.put_line(v);
    15  end;
    16  /
    123
    456
    789 10 here
    223
5434
I need it to display:
    123` 456
    789 10 here
    |223
    5434|`}22
yes
I am not sure how to handle multi-character separators in data using regular expressions.
    Edited by: Clearance 6`- 8`` on Apr 1, 2011 3:35 PM
    Edited by: Clearance 6`- 8`` on Apr 1, 2011 3:37 PM

    Hi,
    Actually, using non-greedy matching, you can do what you want with regular expressions:
    VARIABLE     l_string     VARCHAR2 (100)
    EXEC  :l_string := '123` 456 |`|789 10 here|`||223|`|5434|`}22|`|yes'
    SELECT     LEVEL
,     REPLACE ( REGEXP_SUBSTR ( '|`|' || REPLACE ( :l_string
                                                 , '|`|'
                                                 , '|`||`|'
                                                 ) || '|`|'
                              , '\|`\|.*?\|`\|'
                              , 1
                              , LEVEL
                              )
              , '|`|'
              )     AS ITEM
    FROM     dual
    CONNECT BY     LEVEL     <= 7
;
Output:
    LEVEL ITEM
        1 123` 456
        2 789 10 here
        3 |223
        4 5434|`}22
        5 yes
        6
    7
Here's how it works:
The pattern
~.*?~
is non-greedy; it matches the smallest possible string that begins and ends with a '~'. So
REGEXP_SUBSTR ('~SHALL~I~COMPARE~THEE~', '~.*?~', 1, 1)
returns '~SHALL~'. However,
REGEXP_SUBSTR ('~SHALL~I~COMPARE~THEE~', '~.*?~', 1, 2)
returns '~COMPARE~'. Why not '~I~'? Because the '~' between 'SHALL' and 'I' was part of the 1st match, so it can't be part of the 2nd match. So the first thing we have to do is double the delimiters; that's what the inner REPLACE does. Then we add delimiters to the beginning and end of the list. Once we've prepared the string like that, we can use the non-greedy REGEXP_SUBSTR to bring back the delimited items, with a delimiter at either end. We don't want those delimiters, so the outer REPLACE removes them.
    I'm not sure this is any better than Sri's solution.
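Another possible approach (just a sketch, assuming the data can never contain the CHR(1) character): collapse the multi-character delimiter into a single character first, then split with the familiar [^...]+ pattern.
SELECT LEVEL
     , REGEXP_SUBSTR(REPLACE(:l_string, '|`|', CHR(1)),
                     '[^' || CHR(1) || ']+', 1, LEVEL) AS item
  FROM dual
CONNECT BY LEVEL <= (LENGTH(:l_string)
                   - LENGTH(REPLACE(:l_string, '|`|'))) / 3 + 1;
Note that the [^...]+ pattern skips empty fields, so this variant only works when every field has a value.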

  • Regular Expressions for converting HTML to Structured Plain Text

    I'm writing a PL/SQL function that will convert HTML to plain text, but still preserve some of the formatting/line breaks. One of my challenges is in writing a regular expression to capture the text blocks while ignoring the markup. I'm trying to write an expression that will grab all of the text between start/end tags, but discard the tags. For example, to find all of the text between a start/end paragraph, I want to do something like:
REGEXP_REPLACE('<p style="text-align:center;">This is the body of the paragraph</p>', '<p.*>(.*)</p>', '\1' || v_crlf)
    where \1 returns the contents of the paragraph and v_crlf (declared earlier in the function) inserts a line break. I know there are more general expressions that will remove all tags, but I want to specifically identify the tags so I can process them appropriately. This way I can easily convert HTML to plain text for email and reporting without having to keep two versions around. Any help would be greatly appreciated. Once I get this worked out, I will repost with the function code for others to use. Thanks.
    Edited by: jritschel on Oct 26, 2010 9:58 AM

Here's a function I wrote for an app. I'm not making any promises on its accuracy, as the app was just a proof of concept and never made it to production.
function strip_html( p_clob in clob )
return clob
is
    l_out clob;
    l_test  number := 0;
    l_max_loops constant number := 20;  -- safety net against endless looping
    i   pls_integer := 0;
begin
    -- line breaks and paragraphs become CR/LF
    l_out := regexp_replace(p_clob,'<br>|<br />',chr(13)||chr(10),1,0,'imn');
    l_out := regexp_replace(l_out,'<p>',chr(13)||chr(10),1,0,'imn');
    -- list items become bulleted lines
    l_out := replace(l_out,'<li>',chr(13)||chr(10)||'*<li>');
    -- bold and underlined text get plain-text markers
    l_out := regexp_replace(l_out,'<b>(.+?)</b>','*\1*',1,0,'imn');
    l_out := regexp_replace(l_out,'<u>(.+?)</u>','_\1_',1,0,'imn');
    -- strip any remaining paired tags, repeating for nested HTML
    loop
        l_test := regexp_instr(l_out,'<([A-Z][A-Z0-9]*)[^>]*>.*?</\1>',1,1,0,'imn');
        exit when l_test = 0 or i > l_max_loops;
        l_out := regexp_replace(l_out,'<([A-Z][A-Z0-9]*)[^>]*>(.*?)</\1>','\2',1,0,'imn');
        i := i + 1;
    end loop;
    return l_out;
end strip_html;
    The loop is there to handle nested HTML.
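A quick way to try it out (my own sketch, not from the original post):
SELECT strip_html('before<br>after <b>bold</b> <i>italic</i>') FROM dual;
Here the <br> becomes a CR/LF, <b>bold</b> becomes *bold*, and the <i>...</i> pair is stripped by the generic loop, leaving just 'italic'.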
    Tyler Muth
    http://tylermuth.wordpress.com
    "Applied Oracle Security: Developing Secure Database and Middleware Environments": http://sn.im/aos.book
Edited by: Tyler on Oct 26, 2010 10:03 AM

  • Regular Expressions - Dictionary List

    Good day all -
    I am trying to create a signature(s) to provide a minimalistic "content management" scenario. We have a list of about 150 words that we need to flag if they are seen in user data. I know how to create the regex string for a single word ... and can use the | pipe to separate the words to allow me to combine multiple words into a single signature ... but just how large is the STRING field? 255? 128? unlimited?
    The idea hopefully is to use only 10 - 20 signatures to cover the whole list. Certainly hope to avoid having to write a new signature for each word!
    Looking for suggestions and/or experiences of anyone else having attempted to do something like this.
    Maybe someone found that you could insert unlimited words in the list but by doing so they overtaxed the sensor ... or that it appeared that using more than 10 words in a list was an iffy proposition.
    All your inputs will be appreciated - whether I like what I hear or not! Thanks everyone.
    Hank Schupp

    It all depends on how many states the regular expression will create in the engine. The maximum is 64K bytes, which is a pretty long string. You will have to experiment to find the maximum number of words you can pipe into a single signature. I would recommend dividing the 150 words into different categories and writing one signature for each category. In general, writing one signature for 20 words will make it easy to manage.
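For example, one category's signature could collapse its words into a single alternation group (a sketch with made-up words; the parentheses keep the alternation from bleeding into the rest of the pattern):
(confidential|payroll|salary|secret)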
