Writing a particular regular expression

I'm trying to come up with a regular expression that I can use to scan a list of strings and store the ones I want. I have most of it working properly, but one section of it stumps me.
In order to make my problem as clear as possible, I'll assume that I have some strings that look like the following:
<catapult>
<yay>
<griffin>
<goo>
<dog_man>
<ok>
<>
What I would like to do is make the entire string a match, as long as it doesn't contain certain characters or certain substrings.
For instance, say it's a good match as long as it doesn't contain the characters "y" or "x", doesn't contain "cat" or "riff", and is not "goo". So the first four in my list should not match and the last three should.
What I tried to do was build a character class negating the characters and the substrings (contained in groups) that I didn't want. It looked something like this:
<(?:[[^yx]&&[^(?:.*cat.*)]&&[^(?:.*riff.*)]&&[^(?:goo)]]*)>
Apparently this did not produce the results I desired. I assume this can be done, but I just can't figure out how.
I hope I made sense. Does anyone have any idea what I need to do?
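
A working expression here needs lookarounds rather than character classes: a character class can only exclude single characters, never substrings. The usual technique is a "tempered" scan that asserts a negative lookahead before consuming each character, plus one more lookahead for the exact-string exclusion. A minimal Java sketch of the idea (the exact expression from the reply may differ):

import java.util.regex.Pattern;

public class TagFilter {
    public static void main(String[] args) {
        // (?!goo>) rules out exactly "goo"; (?!cat|riff) blocks the two
        // substrings at every position; [^xy>] forbids x and y outright.
        Pattern p = Pattern.compile("<(?!goo>)(?:(?!cat|riff)[^xy>])*>");
        for (String s : new String[] {"<catapult>", "<yay>", "<griffin>",
                                      "<goo>", "<dog_man>", "<ok>", "<>"}) {
            System.out.println(s + " -> " + p.matcher(s).matches());
        }
        // Prints false for the first four samples, true for the last three.
    }
}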

Wow, thanks so much. That's really helpful. No wonder I couldn't get it. I looked over the tutorials so that helped to clear things up a bit.
Actually, sjasja, I would've done something like that, but what I mentioned was just a portion of the patterns that I want to match, so although I'm inexperienced with complicated regexes, I figured it would be much neater and more concise if I could get it to do what I wanted.
There is a slightly longer, but similar, expression that I would like to match but I suspect I might've made a mistake.
What I want is to examine a line and capture every instance of what I find within '[ ]'. The substring that matches would be found between < >. The first part should be a group of characters (or no characters at all) which doesn't contain cat, CAT, x or y anywhere in this area. This time, though, I would like something else before the end bracket, so I don't know whether I should include the '>' like in the original expression you provided or not. Following it should be 'cat' or 'CAT', and then the '[ ]' part and the closing '>'.
What I'm trying to get this pattern to match, are strings like these:
<cat [yay]>
< something CAT [this] >
< dog cat [cat3]>
but not like these:
< cat CAT [yay] >
<catch [HI]>
< CAT CAT [ ] >
What I attempted to do was write the regex as
<\\s*(?:(?!cat|CAT)[^xy])*(?:cat|CAT)\\s*\\[(.*)\\]\\s*>
but I think that (?:(?!cat|CAT)[^xy]) might also match the trailing characters, which would result in bad matches. Am I mistaken about this?
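For what it's worth, the pattern as posted appears to reject all three bad samples, because the required (?:cat|CAT) bounds the tempered section. A quick harness to check (a sketch; matches() is used since each sample is a whole string):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketCapture {
    public static void main(String[] args) {
        // The poster's pattern verbatim; note that (.*) is greedy, so [^\]]*
        // would be safer if a line could contain more than one ']'.
        Pattern p = Pattern.compile(
            "<\\s*(?:(?!cat|CAT)[^xy])*(?:cat|CAT)\\s*\\[(.*)\\]\\s*>");
        String[] samples = {"<cat [yay]>", "< something CAT [this] >",
                            "< dog cat [cat3]>", "< cat CAT [yay] >",
                            "<catch [HI]>", "< CAT CAT [ ] >"};
        for (String s : samples) {
            Matcher m = p.matcher(s);
            System.out.println(s + " -> "
                + (m.matches() ? "captured [" + m.group(1) + "]" : "no match"));
        }
    }
}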
I'm sure you're all busy people, but a tip in the right direction would be great.
Thanks again.

Similar Messages

  • Difference between regular expressions and Spry character masking?

    Hi,
    This is my first time writing my own regular expressions. Often, though, they seem to work in various testing widgets but then do not perform as expected in Spry. I have no idea how to even begin to debug this.
    For example, this string:
    ^\#?[A-Fa-f0-9]{3}([A-Fa-f0-9]{3})?$
    Does a perfect job enforcing hex colors in a regexp testing widget. But it doesn't work in Spry. It won't let me type a darn thing in.
    Can somebody throw me a bone here?

    Hi!
    Thank you for the response. I read that article prior to posting, and it seems to relate more to Spry's custom pattern function than to regular expressions. Here's the code I have:
    <script type="text/javascript">
         <!--     
              var text_1 =
              new Spry.Widget.ValidationTextField(
                   "text_1",
                   "none",
                   {regExpFilter:/^#[A-Fa-f0-9]{6,};$/,
                   useCharacterMasking:true,
                   validateOn:["change"]})
         //-->
    </script>
    Expected behavior:  I should be able to type in a valid hex color and have Spry perform validation.
    Actual behavior:  I can't type anything in, at all.  I immediately get the invalid Spry feedback (in my case a little red .png image and an error message).
    Simpler expressions like this work fine in Spry:
    <script type="text/javascript">
         <!--
              var text_1 =
                   new Spry.Widget.ValidationTextField(
                   "text_1",
                   "none",
                   {regExpFilter:/[a-z]/,
                   useCharacterMasking:true,
                   validateOn:["change"]})
         //-->
    </script>
    I think if I can figure out what the special rules are for one somewhat robust regular expression in Spry, then I will be off and running.
    Can anyone help?
    Scott

  • Regular expressions for xml parsing

    I have an XML parsing problem that I have to solve using regular expressions. It's not possible for me to use any method other than regular expressions. But there is a problem that I cannot seem to wrap my head around. I want to extract the contents of a tag, but this tag occurs several times in the XML file and I only want the contents of one particular occurrence. Basically the problem is as follows:
    I want to extract
    <bp:NAME ***stufff***>(I want this part)</bp:NAME>
    This tag can occur in several places. For example here:
    <bp:ORGANISM>
    ***bunch of tags***
    <bp:NAME ***stufff***>***stufff***</bp:NAME>
    ***bunch of tags***
    </bp:ORGANISM>
    or here:
    <bp:DATABASE>
    ***bunch of tags***
    <bp:NAME ***stufff***>***stufff***</bp:NAME>
    ***bunch of tags***
    </bp:DATABASE>
    I do not want the content of those tags. I want the content of the <NAME> tag that is not between either the <ORGANISM> tags or the <DATABASE> tags. These tags can be in any order. I for the life of me cannot seem to figure this problem out. I tried several different approaches. For example, I tried using the following regex:
    (?:<bp:NAME [^>]*>([^<]*).*?<bp:ORGANISM>.*?</bp:ORGANISM>|
    <bp:ORGANISM>.*?</bp:ORGANISM>.*?<bp:NAME [^>]*>([^<]*))
    This kind of works: the information I want is either in the first captured group or in the second one, so I just check which group is not empty and that is the one I want. But this only works if there is only one other tag containing the name tag (in this particular regular expression, that is the organism tag). Since there is another tag (the database tag) I have to work around, and these tags can be in any order, the regular expression then becomes three times as large, and then there are six different groups in which the information I want can occur. This does not seem like a good idea to me. There has to be another way to do this. So I tried using the following regex:
    (?:</bp:ORGANISM>)?.*?(?:</bp:DATABASE>)?.*?<bp:NAME [^>]*>([^<]*)
    I thought this would get rid of any occurrences of the other tags in front of the name tag, but it doesn't work either. It seems like it is not greedy enough. Well, I think you get the point. I don't know what to try next, so I really need some help.
    Here is an example of the type of data I will run into. The tags can be in any order and they do not always have to occur. In the example below the <DATABASE> tag is not part of the data, and the name tag I want just happens to be in front of the organism tag, but this is not always the case. The name tag I want is the first name tag in the file, namely:
    <bp:NAME rdf:datatype="xsd:string">Progesterone receptor</bp:NAME>
    So I don't want the name tag that is in between the organism tags.
    <bp:protein rdf:ID="CPATH-27885">
    <bp:COMMENT rdf:datatype="xsd:string">
    Belongs to the nuclear hormone receptor family. NR3 subfamily. SIMILARITY: Contains 1 nuclear receptor DNA-binding domain. WEB RESOURCE: Name=NIEHS-SNPs; URL="http://egp.gs.washington.edu/data/pgr/"; WEB RESOURCE: Name=Wikipedia; Note=Progesterone receptor entry; URL="http://en.wikipedia.org/wiki/Progesterone_receptor"; GENE SYNONYMS: NR3C3. COPYRIGHT:  Protein annotation is derived from the UniProt Consortium (http://www.uniprot.org/).  Distributed under the Creative Commons Attribution-NoDerivs License.
    </bp:COMMENT>
    <bp:SYNONYMS rdf:datatype="xsd:string">Nuclear receptor subfamily 3 group C member 3</bp:SYNONYMS>
    <bp:SYNONYMS rdf:datatype="xsd:string">PR</bp:SYNONYMS>
    <bp:NAME rdf:datatype="xsd:string">Progesterone receptor</bp:NAME>
    <bp:ORGANISM>
    <bp:bioSource rdf:ID="CPATH-LOCAL-112384">
    <bp:NAME rdf:datatype="xsd:string">Homo sapiens</bp:NAME>
    <bp:TAXON-XREF>
    <bp:unificationXref rdf:ID="CPATH-LOCAL-112385">
    <bp:DB rdf:datatype="xsd:string">NCBI_TAXONOMY</bp:DB>
    <bp:ID rdf:datatype="xsd:string">9606</bp:ID>
    </bp:unificationXref>
    </bp:TAXON-XREF>
    </bp:bioSource>
    </bp:ORGANISM>
    <bp:SHORT-NAME rdf:datatype="xsd:string">PRGR_HUMAN</bp:SHORT-NAME>
    <bp:XREF>
    <bp:relationshipXref rdf:ID="CPATH-LOCAL-112386">
    <bp:DB rdf:datatype="xsd:string">ENTREZ_GENE</bp:DB>
    <bp:ID rdf:datatype="xsd:string">5241</bp:ID>
    </bp:relationshipXref>
    </bp:XREF>
    <bp:XREF>
    <bp:unificationXref rdf:ID="CPATH-LOCAL-112387">
    <bp:DB rdf:datatype="xsd:string">UNIPROT</bp:DB>
    <bp:ID rdf:datatype="xsd:string">P06401</bp:ID>
    </bp:unificationXref>
    </bp:XREF>
    <bp:XREF>
    <bp:unificationXref rdf:ID="CPATH-LOCAL-112388">
    <bp:DB rdf:datatype="xsd:string">UNIPROT</bp:DB>
    <bp:ID rdf:datatype="xsd:string">A7X8B0</bp:ID>
    </bp:unificationXref>
    </bp:XREF>
    <bp:XREF>
    <bp:relationshipXref rdf:ID="CPATH-LOCAL-112389">
    <bp:DB rdf:datatype="xsd:string">GENE_SYMBOL</bp:DB>
    <bp:ID rdf:datatype="xsd:string">PGR</bp:ID>
    </bp:relationshipXref>
    </bp:XREF>
    <bp:XREF>
    <bp:relationshipXref rdf:ID="CPATH-LOCAL-112390">
    <bp:DB rdf:datatype="xsd:string">REF_SEQ</bp:DB>
    <bp:ID rdf:datatype="xsd:string">NP_000917</bp:ID>
    </bp:relationshipXref>
    </bp:XREF>
    <bp:XREF>
    <bp:unificationXref rdf:ID="CPATH-LOCAL-112391">
    <bp:DB rdf:datatype="xsd:string">UNIPROT</bp:DB>
    <bp:ID rdf:datatype="xsd:string">Q9UPF7</bp:ID>
    </bp:unificationXref>
    </bp:XREF>
    <bp:XREF>
    <bp:unificationXref rdf:ID="CPATH-LOCAL-113580">
    <bp:DB rdf:datatype="http://www.w3.org/2001/XMLSchema#string">CPATH</bp:DB>
    <bp:ID rdf:datatype="http://www.w3.org/2001/XMLSchema#string">27885</bp:ID>
    </bp:unificationXref>
    </bp:XREF>
    </bp:protein>
    Edited by: Dani3ll3 on Nov 19, 2009 2:51 AM
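
    One way to sidestep the ordering problem entirely (a sketch, not necessarily the fix applied in the reply below) is to delete the unwanted enclosing blocks before matching, so the only NAME tag left is the top-level one. This assumes the ORGANISM and DATABASE blocks are never nested inside one another:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class TopLevelName {
        public static void main(String[] args) {
            String xml = "<bp:protein rdf:ID=\"CPATH-27885\">"
                + "<bp:NAME rdf:datatype=\"xsd:string\">Progesterone receptor</bp:NAME>"
                + "<bp:ORGANISM><bp:NAME rdf:datatype=\"xsd:string\">Homo sapiens</bp:NAME></bp:ORGANISM>"
                + "</bp:protein>";
            // (?s) lets . cross line breaks; the reluctant .*? keeps each
            // deletion to a single block.
            String stripped = xml
                .replaceAll("(?s)<bp:ORGANISM>.*?</bp:ORGANISM>", "")
                .replaceAll("(?s)<bp:DATABASE>.*?</bp:DATABASE>", "");
            Matcher m = Pattern.compile("<bp:NAME[^>]*>([^<]*)</bp:NAME>").matcher(stripped);
            if (m.find()) {
                System.out.println(m.group(1)); // Progesterone receptor
            }
        }
    }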

    Dani3ll3 wrote:
    Thanks a lot, after I did that the regular expression worked. :)

    Good. But remember that in real life, you would then have to apply the XML rules to get the actual contents of the text node. For example, it might be a CDATA section or it might include characters like ampersands which have been escaped and which you need to unescape. That's why it's better to use a proper parser, as already suggested.
    It seems to me this forum is full of posts where people are doing homework questions which teach them to do things the wrong way. But of course there's nothing the student can do about that.

  • Need help in writing regular expressions involving \w

    Hi,
    Here is my requirement.
    I have a string: GTA - 12AB TRA - 12AB
    I need a regex that represents the above string.
    GTA - Constant - this won't change
    12AB - This will be \w (alphanumeric)
    Here I cannot have TRA within these 4 characters.
    The question is:
    How can I write an expression which says it can be a word (the positions where I have 12AB in the example) but not TRA in sequence?
    Is this doable?
    Thanks in advance.

    Use lookarounds: [http://www.regular-expressions.info/lookaround.html]
    The regex:
    (?!.?TRA).{4}
    matches any 4 characters (except line breaks) that do not contain 'TRA'.
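
    Putting the constants and the lookaround together (a sketch: it assumes both 4-character slots carry the same no-TRA restriction, and uses \w rather than . to mirror the alphanumeric requirement):

    import java.util.regex.Pattern;

    public class LookaroundDemo {
        public static void main(String[] args) {
            // (?!\w?TRA) fails the slot if TRA starts at offset 0 or 1,
            // the only places it can fit inside four characters.
            Pattern p = Pattern.compile("GTA - (?!\\w?TRA)\\w{4} TRA - (?!\\w?TRA)\\w{4}");
            System.out.println(p.matcher("GTA - 12AB TRA - 12AB").matches()); // true
            System.out.println(p.matcher("GTA - 1TRA TRA - 12AB").matches()); // false
        }
    }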

  • Writing Regular Expression with a character ^, too difficult

    I want to change the sentence "^1Mandrake ^3Style ^4DM" to "Mandrake Style DM".
    (^ with a number means a color code)
    So I used the String.replaceAll() method with a regular expression.
    But however hard I try, I can't find any solution for this.
    In PHP I could use \^ for a ^ character, but Java doesn't support \^.
    How can I solve this problem?

    Use \\^ in your regex (you have to escape the backslash, too).
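
    Putting that together for the example string (a sketch assuming a color code is always a caret followed by a single digit):

    public class ColorCodeStripper {
        public static void main(String[] args) {
            String s = "^1Mandrake ^3Style ^4DM";
            // "\\^" is the regex escape \^ written as a Java string literal;
            // \d consumes the color-code digit after the caret.
            System.out.println(s.replaceAll("\\^\\d", ""));
            // prints: Mandrake Style DM
        }
    }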

  • Writing Regular expression

    Hi All,
    I want to use a regular expression for the following statement, so how should I write it?
    Any string that is one of (mr, Mr, mr., Mr., Mr ., mr .) should be treated as Male, or 'M'. Likewise for female.
    DECODE(title,'MR','M','MR.','M','MISS','F','MRS','F','MRS.','F','MS.','F','Unknown') AS male_female
    Please can anyone help write the REGEXP?
    thanks.

    Hi Sid,
    Assuming 'Mr .' is OK but 'Mr ' is not, you can use
    with t as (
    select 'MR' as title from dual union all
    select 'Mr.' as title from dual union all
    select 'Mr .' as title from dual union all
    select 'Mr ' as title from dual union all
    select 'MISS' as title from dual union all
    select 'MRS' as title from dual union all
    select 'MRS.' as title from dual union all
    select 'ms.' as title from dual union all
    select 'MISSTAKE' as title from dual
    )
    select
      title
    , case
        when regexp_like(title,'^MR( \.|\.?)$','i') then 'M'
        when regexp_like(title,'^(MISS|MRS\.?|MS\.)$','i') then 'F'
        else 'Unknown'
      end as Male_Female
    from t
    TITLE    MALE_FEMALE
    MR       M          
    Mr.      M          
    Mr .     M          
    Mr       Unknown    
    MISS     F          
    MRS      F          
    MRS.     F          
    ms.      F          
    MISSTAKE Unknown

    Regards,
    Bob

  • Regular expression alphabets

    Hi
    I want to retrieve the data if the data contains a letter or a space or '-', through a select query.
    Please help me write the combination of the 3 with a regular expression.
    Thanks!!

    VT wrote:
    Hi,
    Try this
    SELECT *
    FROM <TABLE> WHERE REGEXP_LIKE(<COLUMN>, '[a-z -][A-Z -]');
    cheers
    VT

    That won't work, as it's expecting at least two characters, with the first having to be a-z (lower case) or space or "-", followed by A-Z (upper case) or space or "-".
    The correct way is either:
    [a-zA-Z -]
    or
    [[:alpha:] -]
    Using the alpha set is often preferable, as it can work differently with different character sets/languages rather than restricting to just the a-zA-Z ranges.
    Generating a reference for your own database characterset/language can be useful...
    SQL> select level-1 as asc_code, decode(chr(level-1), regexp_substr(chr(level-1), '[[:print:]]'), CHR(level-1)) as chr,
      2         decode(chr(level-1), regexp_substr(chr(level-1), '[[:graph:]]'), 1) is_graph,
      3         decode(chr(level-1), regexp_substr(chr(level-1), '[[:blank:]]'), 1) is_blank,
      4         decode(chr(level-1), regexp_substr(chr(level-1), '[[:alnum:]]'), 1) is_alnum,
      5         decode(chr(level-1), regexp_substr(chr(level-1), '[[:alpha:]]'), 1) is_alpha,
      6         decode(chr(level-1), regexp_substr(chr(level-1), '[[:digit:]]'), 1) is_digit,
      7         decode(chr(level-1), regexp_substr(chr(level-1), '[[:cntrl:]]'), 1) is_cntrl,
      8         decode(chr(level-1), regexp_substr(chr(level-1), '[[:lower:]]'), 1) is_lower,
      9         decode(chr(level-1), regexp_substr(chr(level-1), '[[:upper:]]'), 1) is_upper,
    10         decode(chr(level-1), regexp_substr(chr(level-1), '[[:print:]]'), 1) is_print,
    11         decode(chr(level-1), regexp_substr(chr(level-1), '[[:punct:]]'), 1) is_punct,
    12         decode(chr(level-1), regexp_substr(chr(level-1), '[[:space:]]'), 1) is_space,
    13         decode(chr(level-1), regexp_substr(chr(level-1), '[[:xdigit:]]'), 1) is_xdigit
    14    from dual
    15  connect by level <= 256
    16  /
      ASC_CODE C   IS_GRAPH   IS_BLANK   IS_ALNUM   IS_ALPHA   IS_DIGIT   IS_CNTRL   IS_LOWER   IS_UPPER   IS_PRINT   IS_PUNCT   IS_SPACE  IS_XDIGIT
             0                                                                   1
             1                                                                   1
             2                                                                   1
             3                                                                   1
             4                                                                   1
             5                                                                   1
             6                                                                   1
             7                                                                   1
             8                                                                   1
             9                                                                   1                                              1
            10                                                                   1                                              1
            11                                                                   1                                              1
            12                                                                   1                                              1
            13                                                                   1                                              1
            14                                                                   1
            15                                                                   1
            16                                                                   1
            17                                                                   1
            18                                                                   1
            19                                                                   1
            20                                                                   1
            21                                                                   1
            22                                                                   1
            23                                                                   1
            24                                                                   1
            25                                                                   1
            26                                                                   1
            27                                                                   1
            28                                                                   1
            29                                                                   1
            30                                                                   1
            31                                                                   1
            32                       1                                                                            1                     1
            33 !          1                                                                                       1          1
            34 "          1                                                                                       1          1
            35 #          1                                                                                       1          1
            36 $          1                                                                                       1          1
            37 %          1                                                                                       1          1
            38 &          1                                                                                       1          1
            39 '          1                                                                                       1          1
            40 (          1                                                                                       1          1
            41 )          1                                                                                       1          1
            42 *          1                                                                                       1          1
            43 +          1                                                                                       1          1
            44 ,          1                                                                                       1          1
            45 -          1                                                                                       1          1
            46 .          1                                                                                       1          1
            47 /          1                                                                                       1          1
            48 0          1                     1                     1                                           1                                1
            49 1          1                     1                     1                                           1                                1
            50 2          1                     1                     1                                           1                                1
            51 3          1                     1                     1                                           1                                1
            52 4          1                     1                     1                                           1                                1
            53 5          1                     1                     1                                           1                                1
            54 6          1                     1                     1                                           1                                1
            55 7          1                     1                     1                                           1                                1
            56 8          1                     1                     1                                           1                                1
            57 9          1                     1                     1                                           1                                1
            58 :          1                                                                                       1          1
            59 ;          1                                                                                       1          1
            60 <          1                                                                                       1          1
            61 =          1                                                                                       1          1
            62 >          1                                                                                       1          1
            63 ?          1                                                                                       1          1
            64 @          1                                                                                       1          1
            65 A          1                     1          1                                           1          1                                1
            66 B          1                     1          1                                           1          1                                1
            67 C          1                     1          1                                           1          1                                1
            68 D          1                     1          1                                           1          1                                1
            69 E          1                     1          1                                           1          1                                1
            70 F          1                     1          1                                           1          1                                1
            71 G          1                     1          1                                           1          1
            72 H          1                     1          1                                           1          1
            73 I          1                     1          1                                           1          1
            74 J          1                     1          1                                           1          1
            75 K          1                     1          1                                           1          1
            76 L          1                     1          1                                           1          1
            77 M          1                     1          1                                           1          1
            78 N          1                     1          1                                           1          1
            79 O          1                     1          1                                           1          1
            80 P          1                     1          1                                           1          1
            81 Q          1                     1          1                                           1          1
            82 R          1                     1          1                                           1          1
            83 S          1                     1          1                                           1          1
            84 T          1                     1          1                                           1          1
            85 U          1                     1          1                                           1          1
            86 V          1                     1          1                                           1          1
            87 W          1                     1          1                                           1          1
            88 X          1                     1          1                                           1          1
            89 Y          1                     1          1                                           1          1
            90 Z          1                     1          1                                           1          1
            91 [          1                                                                                       1          1
            92 \          1                                                                                       1          1
            93 ]          1                                                                                       1          1
            94 ^          1                                                                                       1          1
            95 _          1                                                                                       1          1
            96 `          1                                                                                       1          1
            97 a          1                     1          1                                1                     1                                1
            98 b          1                     1          1                                1                     1                                1
            99 c          1                     1          1                                1                     1                                1
           100 d          1                     1          1                                1                  1                           1
           101 e          1                     1          1                                1                  1                           1
           102 f          1                     1          1                                1                  1                           1
           103 g          1                     1          1                                1                  1
           104 h          1                     1          1                                1                  1
           105 i          1                     1          1                                1                  1
           106 j          1                     1          1                                1                  1
           107 k          1                     1          1                                1                  1
           108 l          1                     1          1                                1                  1
           109 m          1                     1          1                                1                  1
           110 n          1                     1          1                                1                  1
           111 o          1                     1          1                                1                  1
           112 p          1                     1          1                                1                  1
           113 q          1                     1          1                                1                  1
           114 r          1                     1          1                                1                  1
           115 s          1                     1          1                                1                  1
           116 t          1                     1          1                                1                  1
           117 u          1                     1          1                                1                  1
           118 v          1                     1          1                                1                  1
           119 w          1                     1          1                                1                  1
           120 x          1                     1          1                                1                  1
           121 y          1                     1          1                                1                  1
           122 z          1                     1          1                                1                  1
           123 {          1                                                                                    1     1
           124 |          1                                                                                    1     1
           125 }          1                                                                                    1     1
           126 ~          1                                                                                    1     1
           127                                                                   1
           128 Ç          1                                                                                    1     1
    etc.

  • Help with regular expression to find a pattern in clob

    Can someone help me write a regular expression to query a CLOB that contains XML-type data?
    The query needs to find multiple occurrences of a variable string (i.e. <EMPID-XX> - XX can be any number). If <EMPID-01> appears twice in the CLOB I want the result as EMPID-01,2 and if EMPID-02 appears 4 times I want the result as EMPID-02,4.

    with
    ofx_clob as
    (select q'~
    <EMPID>1
    < UNQID>123456
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
    <EMPID>2
    < UNQID>123457
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
    <EMPID>1
    < UNQID>123458
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
    ~' ofx from dual)
    select '<EMPID>' || to_char(ids) || '(' || to_char(count(*)) || ')' multi_empid
      from (select replace(regexp_substr(ofx,'<EMPID>\d*',1,level),'<EMPID>') ids
              from ofx_clob
            connect by level <= regexp_count(ofx,'<EMPID>'))
    group by ids having count(*) > 1
    MULTI_EMPID
    <EMPID>1(2)
    with
    ofx_clob as
    (select q'~
    <EMPID>1
    < UNQID>123456
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
    <EMPID>2
    < UNQID>123457
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
    <EMPID>1
    < UNQID>123456
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
    <EMPID>2
    < UNQID>123456
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
    <EMPID>1
    < UNQID>123458
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
    ~' ofx from dual)
    select '<EMPID>' || listagg(to_char(ids) || '(' || to_char(count(*)) || ')',',') within group (order by ids) multi_empid
      from (select replace(regexp_substr(ofx,'<EMPID>\d*',1,level),'<EMPID>') ids
              from ofx_clob
            connect by level <= regexp_count(ofx,'<EMPID>'))
    group by ids having count(*) > 1
    MULTI_EMPID
    <EMPID>1(3),2(2)
    Regards
    Etbin
    Message was edited by: Etbin
    used listagg to report more than one multiple <EMPID>

  • Unix Log Monitoring regular expression not picking up alerts

    Hi,
    We are moving our unix monitoring to SCOM 2012 SP1 rollup 4.
    What I have got working is individual alert logging of Unix log alerts, by exporting the MP, changing the <IndividualAlerts> value to true, removing the suppression XML section, then reimporting the MP.
    What I am trying to do is use the regular expression to perform the suppression of specific events (such as event codes).
    The expression is:
    ((?i:warning)(?!(.*1222)|(.*1001)))
    i.e. search the log for "warning" (not case sensitive), then check whether events 1222 or 1001 exist; if so, return no match, and if they don't, return true.
    I use the built-in test function in SCOM when creating the rule and the tests come back as expected, but when I inject test lines into the Unix log, no alerts get generated.
    I suspect it could be the syntax not being accepted on the system (it's running RedHat 6).
    I have tested this with regex tools and it works.
    When I try and test it on the server I get:
    [root@bld02 ~]# grep ((?i:Warning)(?!(.*1222)|(.*1001))) /var/log/messages
    -bash: !: event not found
    [root@bld02 ~]# tail /var/log/messages
    Nov 13 15:07:26 bld02 root: SCOM Test Warning Event ID 1001 Round 18
    Nov 13 15:07:29 bld02 root: SCOM Test Warning Event ID 1000 Round 18
    Nov 13 15:07:35 bld02 root: SCOM Test Warning Event ID 1002 Round 18
    So I am expecting 2 alerts to be generated.
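    (As an aside, the "-bash: !: event not found" error above is just bash history expansion hitting the unquoted !; wrapping the pattern in single quotes avoids it, and stock grep has no lookahead support anyway, so testing this pattern from the shell would need GNU grep's -P (PCRE) mode. That only affects command-line testing, not how SCOM evaluates the expression.)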
    SCOM tests to show expression working:
    Test 1 Matching
    Test 2 to exclude
    Need some help with this. Thank you in advance :)

    Hello,
    Here's an example of modifying the MP to exclude particular events.  Firstly, I created a log file rule using the MP template that is fairly inclusive - matching the string Warning (with either a lower or upper case W).
    I then exported the MP and modified the rule. I set IndividualAlerts = true and removed the AlertSuppression element, so that every matched line will fire a unique alert. You don't have to remove the AlertSuppression, but you should use individual alerts so that the exclusion logic doesn't exclude concurrent events that you actually want to match.
    Implementing the exclusion logic involves the addition of a System.ExpressionFilter definition in the rule. This will use a conditional evaluation of the //row element of the data item.  Here's an example of a dataitem matching an individual row:
    <DataItem type="System.Event.Data"time="2013-11-15T10:33:14.8839662-08:00"sourceHealthServiceId="667FF365-70DD-6607-5B66-F9F95253B29F">
    <EventOriginId>{86AB962D-2F44-29FD-A909-B99FF6FEB2C5}</EventOriginId>
    <PublisherId>{EC7EA4B1-0EA5-7E8E-701F-82FEF3367BC4}</PublisherId>
    <PublisherName>WSManEventProvider</PublisherName>
    <EventSourceName>WSManEventProvider</EventSourceName>
    <Channel>WSManEventProvider</Channel>
    <LoggingComputer/>
    <EventNumber>0</EventNumber>
    <EventCategory>3</EventCategory>
    <EventLevel>0</EventLevel>
    <UserName/>
    <RawDescription>Detected Entry: warning 1002</RawDescription>
    <CollectDescription Type="Boolean">true</CollectDescription>
    <EventData>
    <DataItem type="SCXLogProviderDataSourceData"time="2013-11-15T10:33:14.8839662-08:00"sourceHealthServiceId="667FF365-70DD-6607-5B66-F9F95253B29F">
    <SCXLogProviderDataSourceData>
    <row>warning 1002</row>
    </SCXLogProviderDataSourceData>
    </DataItem>
    </EventData>
    <EventDisplayNumber>0</EventDisplayNumber>
    <EventDescription>Detected Entry: warning 1002</EventDescription>
    </DataItem>
    Here is the rule in the MP XML.  The <ConditionDetection>...</ConditionDetection> content was what I added to do the exclusion filtering:
    <Rule ID="LogFileTemplate_66b86eaded094c309ffd2631b8367a32.Alert" Enabled="false" Target="Unix!Microsoft.Unix.Computer" ConfirmDelivery="false" Remotable="true" Priority="Normal" DiscardLevel="100">
    <Category>EventCollection</Category>
    <DataSources>
    <DataSource ID="EventDS" TypeID="Unix!Microsoft.Unix.SCXLog.VarPriv.DataSource">
    <Host>$Target/Property[Type="Unix!Microsoft.Unix.Computer"]/PrincipalName$</Host>
    <LogFile>/tmp/test</LogFile>
    <UserName>$RunAs[Name="Unix!Microsoft.Unix.ActionAccount"]/UserName$</UserName>
    <Password>$RunAs[Name="Unix!Microsoft.Unix.ActionAccount"]/Password$</Password>
    <RegExpFilter>warning</RegExpFilter>
    <IndividualAlerts>true</IndividualAlerts>
    </DataSource>
    </DataSources>
    <ConditionDetection TypeID="System!System.ExpressionFilter" ID="Filter">
    <Expression>
    <RegExExpression>
    <ValueExpression>
    <XPathQuery Type="String">//row</XPathQuery>
    </ValueExpression>
    <Operator>DoesNotContainSubstring</Operator>
    <Pattern>1001</Pattern>
    </RegExExpression>
    </Expression>
    </ConditionDetection>
    <WriteActions>
    <WriteAction ID="GenerateAlert" TypeID="Health!System.Health.GenerateAlert">
    <Priority>1</Priority>
    <Severity>2</Severity>
    <AlertName>Log File Alert: ExclusionExample</AlertName>
    <AlertDescription>$Data/EventDescription$</AlertDescription>
    </WriteAction>
    </WriteActions>
    </Rule>
    I traced this with the Workflow Analyzer as I tested, which shows the exclusion logic being applied: matched lines containing the excluded string are dropped before the alert write action fires.
    Here's more info on the definition of an ExpressionFilter:
    http://msdn.microsoft.com/en-us/library/ee692979.aspx
    And more information on Regular Expressions in MPs:
    http://support.microsoft.com/kb/2702651/en-us
    You can also have multiple Expressions in the ExpressionFilter joined by OR or AND operators.
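    For example, to exclude two different strings with one rule, the Expression can be wrapped in an And group. This is a sketch following the same System.ExpressionFilter schema as above, not taken from a working MP; the second pattern, 1003, is just a placeholder:
    <Expression>
    <And>
    <Expression>
    <RegExExpression>
    <ValueExpression>
    <XPathQuery Type="String">//row</XPathQuery>
    </ValueExpression>
    <Operator>DoesNotContainSubstring</Operator>
    <Pattern>1001</Pattern>
    </RegExExpression>
    </Expression>
    <Expression>
    <RegExExpression>
    <ValueExpression>
    <XPathQuery Type="String">//row</XPathQuery>
    </ValueExpression>
    <Operator>DoesNotContainSubstring</Operator>
    <Pattern>1003</Pattern>
    </RegExExpression>
    </Expression>
    </And>
    </Expression>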
    Also, if you are comfortable with MP authoring, you can skip the step of creating the rules in the MP template and author your own MP with the VSAE tool:
    http://social.technet.microsoft.com/wiki/contents/articles/18085.scom-2012-authoring-unixlinux-log-file-monitoring-rules.aspx
    www.operatingquadrant.com

  • Re: [iPlanet-JATO] Re: Use Of models in utility classes - Please don't forget about the regular expression potential

    Namburi,
    When you said you used the Reg Exp tool, did you use it only as
    preconfigured by the iMT migrate application wizard?
    Because the default configuration of the regular expression tool will only
    target the files in your ND project directories. If you wish to target
    classes outside of the normal directory scope, you have to either modify the
    "Source Directory" property OR create another instance of the regular
    expression tool. See the "Tool" menu in the iMT to create additional tool
    instances which can each be configured to target different sets of files
    using different sets of rules.
    Usually, I utilize 3 different sets of rules files on a given migration:
    spider2jato.xml
    these are the generic conversion rules (but includes the optimized rules for
    ViewBean and Model based code, i.e. these rules do not utilize the
    RequestManager since it is not needed for code running inside the ViewBean
    or Model classes)
    I run these rules against all files.
    See the file download section of this forum for periodic updates to these
    rules.
    nonProjectFileRules.xml
    these include rules that add the necessary
    "RequestManager.getRequestContext()." etc. prefixes to many of the common
    calls.
    I run these rules against user modules and any other classes that are
    not ModuleServlet, ContainerView, or Model classes.
    appXRules.xml
    these rules include application specific changes that I discover while
    working on the project. A common thing here is changing import statements
    (since the migration tool moves ND project code into a different jato
    packaging structure, you sometimes need to adjust imports in non-project
    classes that previously imported ND project specific packages)
    So you see, you are not limited to one set of rules at all. Just be careful
    to keep track of your backups (the regexp tool provides several options in
    its Expert Properties related to backup strategies).
    ----- Original Message -----
    From: <vnamboori@y...>
    Sent: Wednesday, August 08, 2001 6:08 AM
    Subject: [iPlanet-JATO] Re: Use Of models in utility classes - Please don't
    forget about the regular expression potential
    Thanks Matt, Mike, Todd
    This is great input for our migration. Though we used the existing
    Regular Expression Mapping tool, we did not change this to meet our
    own needs as mentioned by Mike.
    We would certainly incorporate this to ease our migration.
    Namburi
    --- In iPlanet-JATO@y..., "Todd Fast" <toddwork@c...> wrote:
    All--
    Great response. By the way, the Regular Expression Tool uses the Perl5 RE
    syntax as implemented by Apache OROMatcher. If you're doing lots of these
    sorts of migration changes manually, you should definitely buy the O'Reilly
    book "Mastering Regular Expressions" and generate some rules to automate the
    conversion. Although they are definitely confusing at first, regular
    expressions are fairly easy to understand with some documentation, and are
    superbly effective at tackling this kind of migration task.
    Todd
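    (For anyone wanting to test rule patterns outside the iMT, here is a minimal Java sketch using the Jakarta ORO API mentioned above. The class names are from ORO 2.x and the pattern is the getPage rule quoted later in this thread; treat the details as assumptions to verify against your ORO version.)
    import org.apache.oro.text.regex.*;

    public class OroDemo {
        public static void main(String[] args) throws MalformedPatternException {
            PatternCompiler compiler = new Perl5Compiler();
            // Same Perl5 pattern as the getPage mapping rule quoted below:
            Pattern pattern = compiler.compile("CSpider[.\\s]*getPage[\\s]*\\(\"([^\"]*)\"");
            PatternMatcher matcher = new Perl5Matcher();
            if (matcher.contains("CSpider.getPage(\"Login\")", pattern)) {
                MatchResult result = matcher.getMatch();
                System.out.println(result.group(1)); // prints: Login
            }
        }
    }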
    ----- Original Message -----
    From: "Mike Frisino" <Michael.Frisino@S...>
    Sent: Tuesday, August 07, 2001 5:20 PM
    Subject: Re: [iPlanet-JATO] Use Of models in utility classes - Please don't
    forget about the regular expression potential
    Also, (and Matt's document may mention this)
    Please bear in mind that this statement is not totally correct:
    "Since the migration tool does not do much of conversion for these
    utilities we have to do manually."
    Remember, the iMT is a SUITE of tools. There is the extraction tool,
    the translation tool, the regular expression tool, and several other
    smaller tools (like the jar and compilation tools). It is correct to state
    that the extraction and translation tools only significantly convert the
    primary ND project objects (the pages, the data objects, and the project
    classes). The extraction and translation tools do minimal translation of the
    User Module objects (i.e. they repackage the user module classes in the new
    jato module packages). It is correct that for all other utility classes
    which are not formally part of the ND project, the extraction and
    translation tools do not perform any migration.
    However, the regular expression tool can "migrate" any arbitrary file
    (utility classes etc.) to the degree that the regular expression rules
    correlate to the code present in that file. So first and foremost,
    if you have a lot of spider code in your non-project classes you should
    consider using the regular expression tool and, if warranted, adding
    additional rules to reduce the amount of manual adjustments that need to be
    made. I can't stress this enough. We can even help you write the regular
    expression rules if you simply identify the code pattern you wish to
    convert. Just because there is not already a regular expression rule to
    match your need does not mean it can't be written. We have not nearly
    exhausted the possibilities.
    For example if you say, we need to convert
    CSpider.getDataObject("X");
    To
    RequestManager.getRequestContext().getModelManager().getModel(XModel.class);
    Maybe we or somebody else in the list can help write that regular
    expression if it has not already been written. For instance, in the last
    updated spider2jato.xml file there is already a CSpider.getCommonPage("X")
    rule:
    <!--getPage to getViewBean-->
    <mapping-rule>
    <mapping-rule-primarymatch>
    <![CDATA[CSpider[.\s]*getPage[\s]*\(\"([^"]*)\"]]>
    </mapping-rule-primarymatch>
    <mapping-rule-replacement>
    <mapping-rule-match>
    <![CDATA[CSpider[.\s]*getPage[\s]*\(\"([^"]*)\"]]>
    </mapping-rule-match>
    <mapping-rule-substitute>
    <![CDATA[getViewBean($1ViewBean.class]]>
    </mapping-rule-substitute>
    </mapping-rule-replacement>
    </mapping-rule>
    Following this example a getDataObject to getModel would look
    like this:
    <mapping-rule>
    <mapping-rule-primarymatch>
    <![CDATA[CSpider[.\s]*getDataObject[\s]*\(\"([^"]*)\"]]>
    </mapping-rule-primarymatch>
    <mapping-rule-replacement>
    <mapping-rule-match>
    <![CDATA[CSpider[.\s]*getDataObject[\s]*\(\"([^"]*)\"]]>
    </mapping-rule-match>
    <mapping-rule-substitute>
    <![CDATA[getModel($1Model.class]]>
    </mapping-rule-substitute>
    </mapping-rule-replacement>
    </mapping-rule>
    In fact, one migration developer already wrote that rule and submitted it
    for inclusion in the basic set. I will post another upgrade to the basic
    regular expression rule set; look for a "file uploaded" posting. Also,
    please consider contributing any additional generic rules that you have
    written for inclusion in the basic set.
    Please note that in some cases (utility classes in particular) the rule
    application may be more effective as TWO sequential rules rather than one
    monolithic rule. Again using the example above, it will convert
    CSpider.getDataObject("Foo");
    to
    getModel(FooModel.class);
    Now that is the most effective conversion for that code if that code is in
    a page or data object class file. But if that code is in a utility class you
    really want:
    RequestManager.getRequestContext().getModelManager().getModel(FooModel.class);
    So to go from
    getModel(FooModel.class);
    to
    RequestManager.getRequestContext().getModelManager().getModel(FooModel.class);
    you would apply a second rule AND you would ONLY run this rule against
    your utility classes, so that you would not otherwise affect your ViewBean
    and Model classes, which are completely fine with the simple getModel call.
    <mapping-rule>
    <mapping-rule-primarymatch>
    <![CDATA[getModel\(]]>
    </mapping-rule-primarymatch>
    <mapping-rule-replacement>
    <mapping-rule-match>
    <![CDATA[getModel\(]]>
    </mapping-rule-match>
    <mapping-rule-substitute>
    <![CDATA[RequestManager.getRequestContext().getModelManager().getModel(]]>
    </mapping-rule-substitute>
    </mapping-rule-replacement>
    </mapping-rule>
    A similar rule can be applied to getSession and other CSpider API calls.
    For instance, here is the rule for converting getSession calls to leverage
    the RequestManager.
    <mapping-rule>
    <mapping-rule-primarymatch>
    <![CDATA[getSession\(\)\.]]>
    </mapping-rule-primarymatch>
    <mapping-rule-replacement>
    <mapping-rule-match>
    <![CDATA[getSession\(\)\.]]>
    </mapping-rule-match>
    <mapping-rule-substitute>
    <![CDATA[RequestManager.getSession().]]>
    </mapping-rule-substitute>
    </mapping-rule-replacement>
    </mapping-rule>
    ----- Original Message -----
    From: "Matthew Stevens" <matthew.stevens@e...>
    Sent: Tuesday, August 07, 2001 12:56 PM
    Subject: RE: [iPlanet-JATO] Use Of models in utility classes
    Namburi,
    I will post a document to the group site this evening which has the details
    on various tactics for migrating these types of utilities. Essentially, you
    either need to convert these utilities to Models themselves, or keep the
    utilities as is and simply use
    RequestManager.getRequestContext().getModelManager().getModel()
    to statically access Models.
    For CSpSelect.executeImmediate() I have an example of a custom helper method
    as a replacement which uses JDBC results instead of CSpDBResult.
    matt
    -----Original Message-----
    From: vnamboori@y...
    Sent: Tuesday, August 07, 2001 3:24 PM
    Subject: [iPlanet-JATO] Use Of models in utility classes
    Hi All,
    In the present ND project we have lots of utility classes. These classes
    are in a different directory, not part of ND pages.
    In these classes we access the dataobjects and do the manipulations.
    So we access dataobjects directly, like
    CSpider.getDataObject("do....");
    and then execute it.
    Since the migration tool does not do much of the conversion for these
    utilities, we have to do it manually.
    My question is: can we access the models after migration the same way, or
    do we need a requestContext?
    We have lots of utility classes which are DataObject-intensive. Can
    someone suggest a better way to migrate this kind of code?
    Thanks
    Namburi


  • How do I have to define a regular expression to filter out data from file?

    Hi all,
    I need to extract parts of lines of an ASCII file and couldn't get it done with my limited knowledge of regular expressions.
    The file contains hundreds of lines and I am just interested in a few of them; within those lines I just need a part of the data.
    One original line looks like that:
    TP3| |TP_SMD|Nicht in Stueckliste|~TP TP_SMD TESTPUNKT|-|0|87.770|157.950|0|top|c| |other|TP_SMD|TP_SMD_60RF-TP
    Only the highlighted information is of interest: the test point name (TP3) and the three coordinate fields (87.770, 157.950, 0). I don't need the rest.
    I can open that file and read in each line, but then I am struggling to pick out only the lines of interest (starting with TP), take that TP with its number and the coordinates that follow later on, and then write these shortened lines to a new text file. So the new line should look like this:
    TP3; 87.770;157.950;0 (It doesn't matter if the separator will be ; or |)
    I thought of using regular expressions - is that the right way or is there a better approach?
    Thanks & regards,
    gedi, using LabVIEW 8.5
    Regards,
    gedi

    Hi max,
    for finding a specific part of a string you can use the "Match Pattern" VI; it is located in the String palette.
    Maybe the Extract Numbers.vi example in the examples browser library can help you.
    What I did to filter out my data of interest was first to keep only the columns which I want to have -
    after that there are still a lot of lines remaining that I don't need (this is the problem described above).
    The rest I am going to filter out with a (then easy) regular expression using the "Match Pattern" VI.
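    For reference, here is a minimal sketch of doing the whole extraction in one pass (written in Java; the same pattern string should also work in LabVIEW's Match Regular Expression VI). It assumes the name is field 1 and the coordinates are fields 8-10 of the pipe-separated line, as in the sample above:
    import java.util.regex.*;

    public class ExtractTP {
        public static void main(String[] args) {
            String line = "TP3| |TP_SMD|Nicht in Stueckliste|~TP TP_SMD TESTPUNKT|-|0|87.770|157.950|0|top|c| |other|TP_SMD|TP_SMD_60RF-TP";
            // Field 1 must look like TP<number>; skip fields 2-7; capture fields 8-10.
            Pattern p = Pattern.compile("^(TP\\d+)\\|(?:[^|]*\\|){6}([\\d.]+)\\|([\\d.]+)\\|(\\d+)\\|");
            Matcher m = p.matcher(line);
            if (m.find()) {
                System.out.println(m.group(1) + ";" + m.group(2) + ";" + m.group(3) + ";" + m.group(4));
            }
        }
    }
    This prints TP3;87.770;157.950;0; lines that don't start with TP<number> simply fail the match and can be skipped.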
    Regards,
    gedi

  • How to use special characters in regular expression

    Hi all, I am new to regular expressions.
    Can anyone please tell me how to match the characters that are used in regular expression syntax, like " [ ( { . etc.? Is there any particular character to prefix before using these characters?

    Expression:
    <td .*?bgcolor=\"([^\"]+)\"\\s*.*?>(.+?)</td>
    It will search for an expression starting with <td ,
    .*? means any characters, zero or more times (non-greedy),
    then it will find bgcolor = , then the literal \" - the escaped quote matches that specific character.
    [^\"]+ means any character except the literal \", one or more times (+ is 1 or more), so there has to be something between the quotes "...."; if it is empty it won't match.
    Then again the literal \", after that \\s* means zero or more spaces,
    then again .*? means any characters, zero or more times (non-greedy),
    it will search for the literal >, then (.+?) captures the cell content,
    and finally </td> will be matched.
    So all expressions having this particular structure will be
    matched.
    Example of a matching string:
    <td align="left" valign="top" bgcolor="ffffff" width="177">bla bla bla</td>

  • Introduction to regular expressions ...

    I'm well aware that there are already some articles on this topic, but some people asked me to share some of my knowledge. Please take a look at this first part and let me know if you find it useful. If yes, I'm going to continue writing more parts using more and more complicated expressions - if you have questions or problems that you think could be solved through regular expressions, please post them.
    Introduction
    Oracle has always provided some character/string functions in its PL/SQL command set, such as SUBSTR, REPLACE or TRANSLATE. With 10g, Oracle finally gave us (the users, the developers and of course the DBAs) regular expressions. However, regular expressions, due to their sometimes cryptic rules, seem to be overlooked quite often, despite the existence of some very interesting use cases. Being one of the advocates of regular expressions, I thought I'd give the interested audience an introduction to these new functions in several installments.
    Having fun with regular expressions - Part 1
    Oracle offers the use of regular expression through several functions: REGEXP_INSTR, REGEXP_SUBSTR, REGEXP_REPLACE and REGEXP_LIKE. The second part of each function already gives away its purpose: INSTR for finding a position inside a string, SUBSTR for extracting a part of a string, REPLACE for replacing parts of a string. REGEXP_LIKE is a special case since it could be compared to the LIKE operator and is therefore usually used in comparisons like IF statements or WHERE clauses.
    Regular expressions excel, in my opinion, at search and extraction of strings, which can be used for finding or replacing certain strings or checking for certain formatting criteria. They're not very good at formatting strings themselves, except for some special cases I'm going to demonstrate.
    If you're not familiar with regular expressions, you should take a look at the definition in Oracle's user guide (Using Regular Expressions With Oracle Database), and please note that there have been some changes and enhancements in 10g Release 2. I'll provide examples that should work on both versions.
    Some of you probably already encountered this problem: checking a number inside a string, because, for whatever reason, a column was defined as VARCHAR2 and not as NUMBER as one would have expected.
    Let's check for all rows where column col1 is NOT an unsigned integer. I'll use this SELECT for demonstrating different values and search patterns:
    WITH t AS (SELECT '456' col1
                 FROM dual
                UNION
               SELECT '123x'
                 FROM dual
                UNION  
               SELECT 'x123'
                 FROM dual
                UNION 
               SELECT 'y'
                 FROM dual
                UNION 
               SELECT '+789'
                 FROM dual
                UNION 
               SELECT '-789'
                 FROM dual
                UNION 
               SELECT '159-'
                 FROM dual
                UNION 
               SELECT '-1-'
                 FROM dual)
    SELECT t.col1
      FROM t
     WHERE NOT REGEXP_LIKE(t.col1, '^[0-9]+$');
    Let's take a look at the 2nd argument of this REGEXP function: '^[0-9]+$'. Translated it would mean: start at the beginning of the string, check if there's one or more characters in the range between '0' and '9' (also called a matching character list) until the end of this string. "^", "[", "]", "+", "$" are all metacharacters.
    To understand regular expressions, you have to "think" in regular expressions. Each regular expression tries to "fit" an available string into its pattern and returns a result, successful or not, depending on the function. The "art" of using regular expressions is to construct the right search pattern for a certain task. Using functions like TRANSLATE or REPLACE already taught you about search patterns; regular expressions are just an extension of this paradigm. Another side note: most of the search patterns are placeholders for single characters, not strings.
    I'll take this example a bit further. What would happen if we removed the "$" in our example? "$" means: (until the) end of the string. Without it, the expression would only match digits from the beginning until it encounters either another character or the end of the string. So this time, '123x' would be removed from the selection since it now fits the pattern.
    Another change: we will keep the "$" but remove the "^". This character has several meanings, but in this case it declares: (start from the) beginning of a string. Without it, the function will search for a part of a string that has only digits until the end of the searched string. 'x123' would now be removed from our selection.
    Now there's a question: what happens if I remove both, "^" and "$"? Well, just think about it. We now ask to find any string that contains at least one or more digits, so both '123x' and 'x123' will not show up in the result.
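    A condensed way to see all three variants side by side (a quick sketch; CASE is used because a SQL SELECT list can't return booleans directly):
    WITH t AS (SELECT '123x' col1 FROM dual UNION
               SELECT 'x123' FROM dual UNION
               SELECT 'y' FROM dual)
    SELECT col1,
           CASE WHEN REGEXP_LIKE(col1, '^[0-9]+') THEN 'Y' ELSE 'N' END starts_with_digits,
           CASE WHEN REGEXP_LIKE(col1, '[0-9]+$') THEN 'Y' ELSE 'N' END ends_with_digits,
           CASE WHEN REGEXP_LIKE(col1, '[0-9]+')  THEN 'Y' ELSE 'N' END contains_digits
      FROM t;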
    So what if I want to look for signed integer, since "+" is also used for a search expression. Escaping is the name of the game. We'll just use '^\+[0-9]+$' Did you notice the "\" before the first "+"? This is now a search pattern for the plus sign.
    Should signed integers include negative numbers as well? Of course they should, and I'll once again use a matching character list. In this list, I don't need to do escaping, although it is possible. The result string would now look like this: '^[+-]?[0-9]+$'. Did you notice the "?"? This is another metacharacter that makes the placeholder for plus and minus optional, which means: if there's a "+" or "-", that's ok; if there's none, that's also ok. Only if there's a different character will the search pattern fail.
    Addendum: From here on, I found a mistake in my examples. If you had tested my old examples with test data that included multiple-sign strings, like "--", "-+", "++", they would have been treated as valid and filtered by the SELECT statement. I mistakenly used the "*" instead of the "?" operator. The reason why this is a bad idea can also be found in the user guide: the "*" metacharacter is defined as 0 to multiple occurrences.
    Looking at the values, one could ask: what about integers with a trailing sign? Quite simple, right? Let's just add another '[+-]?' and the search pattern would look like this: '^[+-]?[0-9]+[+-]?$'.
    Wait a minute, what happened to the row with the column value "-1-"?
    You probably already guessed it: the new pattern also qualifies this one as a valid string. I could now split this pattern into several conditions combined through a logical OR, but there's something even better: a logical OR inside the regular expression. Its symbol is "|", the pipe sign.
    Changing the search pattern again to something like '^[+-]?[0-9]+$|^[0-9]+[+-]?$' would now return the "-1-" value. But do I have to duplicate the same elements like "^" and "$", and what about more complicated, repeating elements in future examples? That's where subexpressions/grouping come into play. If I want an OR operator to apply to only certain parts of the search pattern, I can put those parts inside round brackets. '^([+-]?[0-9]+|[0-9]+[+-]?)$' serves the same purpose and allows for further checks without duplicating the whole pattern.
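    As a quick sanity check (a condensed sketch; only a few of the test values are repeated here):
    WITH t AS (SELECT '456' col1 FROM dual UNION
               SELECT '-789' FROM dual UNION
               SELECT '159-' FROM dual UNION
               SELECT '-1-' FROM dual)
    SELECT t.col1
      FROM t
     WHERE NOT REGEXP_LIKE(t.col1, '^([+-]?[0-9]+|[0-9]+[+-]?)$');
    -- returns only '-1-': a sign on both ends no longer qualifies as a number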
    Now looking for integers is nice, but what about decimal numbers? Those may be a bit more complicated, but all I have to do is again to think in (meta) characters. I'll just use an example where the decimal point is represented by ".", which again needs escaping, since it's also the place holder in regular expressions for "any character".
    Valid decimals in my example would be ".0", "0.0", "0.", "0" (integer of course) but not ".". If you want, you can test it with the TO_NUMBER function. Finding such an unsigned decimal number could then be formulated like this: from the beginning of the string we will either allow a decimal point plus any number of digits OR at least one digit plus an optional decimal point followed optionally by any number of digits. Think about it for a minute: how would you formulate such a search pattern?
    Compare your solution to this one:
    '^(\.[0-9]+|[0-9]+(\.[0-9]*)?)$'
    Addendum: Here I have to use both "?" and "*" to make sure that I can have 0 to many digits after the decimal point, but only 0 to 1 occurrences of this substring. Otherwise, strings like "1.9.9.9" would be possible if I wrote it like this:
    '^(\.[0-9]+|[0-9]+(\.[0-9]*)*)$'
    Some of you now might say: Hey, what about signed decimal numbers? You could of course combine all the ideas so far and you will end up with a very long and almost unreadable search pattern, or you start combining several regular expression functions. Think about it: Why put all the search patterns into one function? Why not split those into several steps like "check for a valid decimal" and "check for sign"?
    I'll just use another SELECT to show what I want to do:
    WITH t AS (SELECT '0' col1
                 FROM dual
                UNION
               SELECT '0.' 
                 FROM dual
                UNION
               SELECT '.0' 
                 FROM dual
                UNION
               SELECT '0.0' 
                 FROM dual
                UNION
               SELECT '-1.0' 
                 FROM dual
                UNION
               SELECT '.1-' 
                 FROM dual
                UNION
               SELECT '.' 
                 FROM dual
                UNION
               SELECT '-1.1-' 
                 FROM dual)
    SELECT t.*
      FROM t;
    From this select, the only rows I need to find are those with the column values "." and "-1.1-". I'll start with a check for valid signs. Since I want to combine this with the check for valid decimals, I'll first try to extract a substring with valid signs through the REGEXP_SUBSTR function:
    NVL(REGEXP_SUBSTR(t.col1, '^([+-]?[^+-]+|[^+-]+[+-]?)$'), ' ')
    Remember the OR operator and the matching character lists? But why several "^"? Some of the metacharacters inside a search pattern can have different meanings, depending on their position and combination with other metacharacters. In this case, the pattern translates into: from the beginning of the string search for "+" or "-" followed by at least one other character that is not "+" or "-". The second pattern after the "|" OR operator does the same for a sign at the end of the string.
    This only checks for a sign, not whether there are only digits and a decimal point inside the string. If the search fails, for example when we have more than one sign like in "-1.1-", the function returns NULL. NULL and LIKE don't go together very well, so we'll just add NVL with a default value that tells the LIKE to ignore this string, in this case a space.
    All we have to do now is to combine the check for the sign and the check for a valid decimal number, but don't forget an option for the signs at the beginning or end of the string, otherwise your second check will fail on the signed decimals. Are you ready?
    Does your solution look a bit like this?
    WHERE NOT REGEXP_LIKE(NVL(REGEXP_SUBSTR(t.col1,
                               '^([+-]?[^+-]+|[^+-]+[+-]?)$'),
                               ' '),
                          '^[+-]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)[+-]?$'
                         )
    Now the optional sign checks in the REGEXP_LIKE argument can be added to both ends, since the SUBSTR won't allow any string with signs on both ends. Thinking in regular expressions again.
    Continued in Introduction to regular expressions ... continued.
    C.
    Fixed some embarrassing typos ... and mistakes.
    cd

    Excellent write-up, CD. Very nice indeed. Hopefully you'll be completing parts 2 and 3 some time soon. And with any luck, your article will encourage others to do the same... I know there's a few I'd like to see and a few I'd like to have a go at writing too :-)

  • Regular expression in FIND statement

    Hi All,
    I am writing regular expressions, but I don't properly understand how to write them.
    I have one internal table with five fields.
    Example: wa-mandt  = '800'.
             wa-number = '3768'.
             wa-path   = '/usr/tmp/sapuser/3768/test.txt'.
    append wa to itab.
    Loop at itab into wa.
    Here I need to find the client, number and system id from the WA using a regular expression in a single line.
    endloop.
    Can anybody please explain how to write this.
    Thanks,

    Hi,
    What do you mean by FIND?
    If I got it right, you can use a READ statement with KEY f1 f2 etc. BINARY SEARCH. Mention all the fields you want in the KEY fields.
    Don't forget to SORT this itab before the loop.
    Thanks
    Kiran
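    If the goal really is to pull the pieces out of the path with a regex, a minimal ABAP sketch could look like this (wa/itab and the field names are taken from the question; the pattern is an assumption about the path layout, so verify it against your data):
    DATA lv_number TYPE string.
    LOOP AT itab INTO wa.
      " Capture the number from the last directory in the path, e.g. '3768'.
      FIND REGEX '/(\d+)/[^/]+$' IN wa-path SUBMATCHES lv_number.
      IF sy-subrc = 0.
        WRITE: / wa-mandt, lv_number.
      ENDIF.
    ENDLOOP.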

  • Java Regular Expressions and Pattern

    I have a file from which I first want to get all the lines that match a given pattern. Then, from the lines that match, I want to extract two values.
    Example line for the pattern to match
    INFO | jvm 1 | 2006/11/07 15:14:09 | INFO | Tue Nov 07 15:14:09 CET 2006 | XLDB PPS Data Dumper: MESSAGE:- 406 Processing .. '[ /opt/nexus/horizon/raw_data/network/pp_CE01S4H_sta_20050703T015717_SYDP3001_546.bdf ]'
    From all the lines like this one, I want to extract two variables:
    2006/11/07 15:14:09
    and
    /opt/nexus/horizon/raw_data/network/pp_CE01S4H_sta_20050703T015717_SYDP3001_546.bdf
    so I can store these variables in a database.
    Can someone help me with writing the pattern to match and the regular expression to extract? Also, if anyone else has a better way of doing this, I am all ears; I have a lot of log files to go through.

    import java.util.regex.*;

    class Main {
        public static void main(String[] args) {
            String txt = "INFO | jvm 1 | 2006/11/07 15:14:09 | INFO | Tue Nov 07 15:14:09 CET 2006 | XLDB PPS Data Dumper: MESSAGE:- 406 Processing .. '[ /opt/nexus/horizon/raw_data/network/pp_CE01S4H_sta_20050703T015717_SYDP3001_546.bdf ]'";
            String re1 = ".*?";     // non-greedy match on filler
            String re2 = "((?:2|1)\\d{3}(?:-|\\/)(?:(?:0[1-9])|(?:1[0-2]))(?:-|\\/)(?:(?:0[1-9])|(?:[1-2][0-9])|(?:3[0-1]))(?:T|\\s)(?:(?:[0-1][0-9])|(?:2[0-3])):(?:[0-5][0-9]):(?:[0-5][0-9]))";     // timestamp
            String re3 = ".*?";     // non-greedy match on filler
            String re4 = "((?:\\/[\\w\\.]+)+)";     // Unix path
            Pattern p = Pattern.compile(re1 + re2 + re3 + re4, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
            Matcher m = p.matcher(txt);
            if (m.find()) {
                String timestamp1 = m.group(1);
                String unixpath1 = m.group(2);
                System.out.print("(" + timestamp1 + ")(" + unixpath1 + ")\n");
            }
        }
    }
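    And since there are a lot of log files to go through, here is a sketch of applying the same idea line by line to a whole file. The simplified pattern, the pre-filter string taken from the sample line, and the place where the database insert would go are all assumptions to adapt:
    import java.io.*;
    import java.util.regex.*;

    public class LogScan {
        public static void main(String[] args) throws IOException {
            // Simpler pattern: capture "yyyy/MM/dd HH:mm:ss" and the first Unix path after it.
            Pattern p = Pattern.compile(
                "(\\d{4}/\\d{2}/\\d{2} \\d{2}:\\d{2}:\\d{2}).*?((?:/[\\w.]+)+)");
            try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
                String line;
                while ((line = in.readLine()) != null) {
                    if (!line.contains("XLDB PPS Data Dumper")) continue; // cheap pre-filter
                    Matcher m = p.matcher(line);
                    if (m.find()) {
                        // store m.group(1) (timestamp) and m.group(2) (path) in the database here
                        System.out.println(m.group(1) + " -> " + m.group(2));
                    }
                }
            }
        }
    }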
