Introduction to regular expressions ...

I'm well aware that there are already some articles on that topic, some people asked me to share some of my knowledge on this topic. Please take a look at this first part and let me know if you find this useful. If yes, I'm going to continue on writing more parts using more and more complicated expressions - if you have questions or problems that you think could be solved through regular expression, please post them.
Introduction
Oracle has always provided some character/string functions in its PL/SQL command set, such as SUBSTR, REPLACE or TRANSLATE. With 10g, Oracle finally gave us, the users, the developers and of course the DBAs regular expressions. However, regular expressions, due to their sometimes cryptic rules, seem to be overlooked quite often, despite the existence of some very interesing use cases. Beeing one of the advocates of regular expression, I thought I'll give the interested audience an introduction to these new functions in several installments.
Having fun with regular expressions - Part 1
Oracle offers the use of regular expression through several functions: REGEXP_INSTR, REGEXP_SUBSTR, REGEXP_REPLACE and REGEXP_LIKE. The second part of each function already gives away its purpose: INSTR for finding a position inside a string, SUBSTR for extracting a part of a string, REPLACE for replacing parts of a string. REGEXP_LIKE is a special case since it could be compared to the LIKE operator and is therefore usually used in comparisons like IF statements or WHERE clauses.
Regular expressions excel, in my opinion, in search and extraction of strings, using that for finding or replacing certain strings or check for certain formatting criterias. They're not very good at formatting strings itself, except for some special cases I'm going to demonstrate.
If you're not familiar with regular expression, you should take a look at the definition in Oracle's user guide Using Regular Expressions With Oracle Database, and please note that there have been some changes and advancements in 10g2. I'll provide examples, that should work on both versions.
Some of you probably already encountered this problem: checking a number inside a string, because, for whatever reason, a column was defined as VARCHAR2 and not as NUMBER as one would have expected.
Let's check for all rows where column col1 does NOT include an unsigned integer. I'll use this SELECT for demonstrating different values and search patterns:
WITH t AS (SELECT '456' col1
             FROM dual
            UNION
           SELECT '123x'
             FROM dual
            UNION  
           SELECT 'x123'
             FROM dual
            UNION 
           SELECT 'y'
             FROM dual
            UNION 
           SELECT '+789'
             FROM dual
            UNION 
           SELECT '-789'
             FROM dual
            UNION 
           SELECT '159-'
             FROM dual
            UNION 
           SELECT '-1-'
             FROM dual
SELECT t.col1
  FROM t
WHERE NOT REGEXP_LIKE(t.col1, '^[0-9]+$')
;Let's take a look at the 2nd argument of this REGEXP function: '^[0-9]+$'. Translated it would mean: start at the beginning of the string, check if there's one or more characters in the range between '0' and '9' (also called a matching character list) until the end of this string. "^", "[", "]", "+", "$" are all Metacharacters.
To understand regular expressions, you have to "think" in regular expressions. Each regular expression tries to "fit" an available string into its pattern and returns a result beeing successful or not, depending on the function. The "art" of using regular expressions is to construct the right search pattern for a certain task. Using functions like TRANSLATE or REPLACE did already teach you using search patterns, regular expressions are just an extension to this paradigma. Another side note: most of the search patterns are placeholders for single characters, not strings.
I'll take this example a bit further. What would happen if we would remove the "$" in our example? "$" means: (until the) end of a string. Without this, this expression would only search digits from the beginning until it encounters either another character or the end of the string. So this time, '123x' would be removed from the SELECTION since it does fit into the pattern.
Another change: we will keep the "$" but remove the "^". This character has several meanings, but in this case it declares: (start from the) beginning of a string. Without it, the function will search for a part of a string that has only digits until the end of the searched string. 'x123' would now be removed from our selection.
Now there's a question: what happens if I remove both, "^" and "$"? Well, just think about it. We now ask to find any string that contains at least one or more digits, so both '123x' and 'x123' will not show up in the result.
So what if I want to look for signed integer, since "+" is also used for a search expression. Escaping is the name of the game. We'll just use '^\+[0-9]+$' Did you notice the "\" before the first "+"? This is now a search pattern for the plus sign.
Should signed integers include negative numbers as well? Of course they should, and I'll once again use a matching character list. In this list, I don't need to do escaping, although it is possible. The result string would now look like this: '^[+-]?[0-9]+$'. Did you notice the "?"? This is another metacharacter that changes the placeholder for plus and minus to an optional placeholder, which means: if there's a "+" or "-", that's ok, if there's none, that's also ok. Only if there's a different character, then again the search pattern will fail.
Addendum: From this on, I found a mistake in my examples. If you would have tested my old examples with test data that would have included multiple signs strings, like "--", "-+", "++", they would have been filtered by the SELECT statement. I mistakenly used the "*" instead of the "?" operator. The reason why this is a bad idea, can also be found in the user guide: the "*" meta character is defined as 0 to multiple occurrences.
Looking at the values, one could ask the question: what about the integers with a trailing sign? Quite simple, right? Let's just add another '[+-] and the search pattern would look like this: '^[+-]?[0-9]+[+-]?$'.
Wait a minute, what happened to the row with the column value "-1-"?
You probably already guessed it: the new pattern qualifies this one also as a valid string. I could now split this pattern into several conditions combined through a logical OR, but there's something even better: a logical OR inside the regular expression. It's symbol is "|", the pipe sign.
Changing the search pattern again to something like this '^[+-]?[0-9]+$|^[0-9]+[+-]?$' [1] would return now the "-1-" value. Do I have to duplicate the same elements like "^" and "$", what about more complicated, repeating elements in future examples? That's where subexpressions/grouping comes into play. If I want only certain parts of the search pattern using an OR operator, we can put those inside round brackets. '^([+-]?[0-9]+|[0-9]+[+-]?)$' serves the same purpose and allows for further checks without duplicating the whole pattern.
Now looking for integers is nice, but what about decimal numbers? Those may be a bit more complicated, but all I have to do is again to think in (meta) characters. I'll just use an example where the decimal point is represented by ".", which again needs escaping, since it's also the place holder in regular expressions for "any character".
Valid decimals in my example would be ".0", "0.0", "0.", "0" (integer of course) but not ".". If you want, you can test it with the TO_NUMBER function. Finding such an unsigned decimal number could then be formulated like this: from the beginning of a string we will either allow a decimal point plus any number of digits OR at least one digits plus an optional decimal point followed by optional any number of digits. Think about it for a minute, how would you formulate such a search pattern?
Compare your solution to this one:
'^(\.[0-9]+|[0-9]+(\.[0-9]*)?)$'
Addendum: Here I have to use both "?" and "*" to make sure, that I can have 0 to many digits after the decimal point, but only 0 to 1 occurrence of this substrings. Otherwise, strings like "1.9.9.9" would be possible, if I would write it like this:
'^(\.[0-9]+|[0-9]+(\.[0-9]*)*)$'Some of you now might say: Hey, what about signed decimal numbers? You could of course combine all the ideas so far and you will end up with a very long and almost unreadable search pattern, or you start combining several regular expression functions. Think about it: Why put all the search patterns into one function? Why not split those into several steps like "check for a valid decimal" and "check for sign".
I'll just use another SELECT to show what I want to do:
WITH t AS (SELECT '0' col1
             FROM dual
            UNION
           SELECT '0.' 
             FROM dual
            UNION
           SELECT '.0' 
             FROM dual
            UNION
           SELECT '0.0' 
             FROM dual
            UNION
           SELECT '-1.0' 
             FROM dual
            UNION
           SELECT '.1-' 
             FROM dual
            UNION
           SELECT '.' 
             FROM dual
            UNION
           SELECT '-1.1-' 
             FROM dual
SELECT t.*
  FROM t
;From this select, the only rows I need to find are those with the column values "." and "-1.1-". I'll start this with a check for valid signs. Since I want to combine this with the check for valid decimals, I'll first try to extract a substring with valid signs through the REGEXP_SUBSTR function:
NVL(REGEXP_SUBSTR(t.col1, '^([+-]?[^+-]+|[^+-]+[+-]?)$'), ' ')Remember the OR operator and the matching character collections? But several "^"? Some of the meta characters inside a search pattern can have different meanings, depending on their positions and combination with other meta characters. In this case, the pattern translates into: from the beginning of the string search for "+" or "-" followed by at least another character that is not "+" or "-". The second pattern after the "|" OR operator does the same for a sign at the end of the string.
This only checks for a sign but not if there also only digits and a decimal point inside the string. If the search string fails, for example when we have more than one sign like in the "-1.1-", the function returns NULL. NULL and LIKE don't go together very well, so we'll just add NVL with a default value that tells the LIKE to ignore this string, in this case a space.
All we have to do now is to combine the check for the sign and the check for a valid decimal number, but don't forget an option for the signs at the beginning or end of the string, otherwise your second check will fail on the signed decimals. Are you ready?
Does your solution look a bit like this?
WHERE NOT REGEXP_LIKE(NVL(REGEXP_SUBSTR(t.col1,
                           '^([+-]?[^+-]+|[^+-]+[+-]?)$'),
                       '^[+-]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)[+-]?$'
                      )Now the optional sign checks in the REGEXP_LIKE argument can be added to both ends, since the SUBSTR won't allow any string with signs on both ends. Thinking in regular expression again.
Continued in Introduction to regular expressions ... continued.
C.
Fixed some embarrassing typos ... and mistakes.
cd

Excellent write up CD. Very nice indeed. Hopefully you'll be completing parts 2 and 3 some time soon. And with any luck, your article will encourage others to do the same....I know there's a few I'd like to see and a few I'd like to have a go at writing too :-)

Similar Messages

  • Introduction to regular expressions ... continued.

    After some very positive feedback from Introduction to regular expressions ... I'm now continuing on this topic for the interested audience. As always, if you have questions or problems that you think could be solved through regular expression, please post them.
    Having fun with regular expressions - Part 2
    Finishing my example with decimal numbers, I thought about a method to test regular expressions. A question from another user who was looking for a way to show all possible combinations inspired me in writing a small package.
    CREATE OR REPLACE PACKAGE regex_utils AS
      -- Regular Expression Utilities
      -- Version 0.1
      TYPE t_outrec IS RECORD(
        data VARCHAR2(255)
      TYPE t_outtab IS TABLE OF t_outrec;
      FUNCTION gen_data(
        p_charset IN VARCHAR2 -- character set that is used for generation
      , p_length  IN NUMBER   -- length of the generated
      ) RETURN t_outtab PIPELINED;
    END regex_utils;
    CREATE OR REPLACE PACKAGE BODY regex_utils AS
    -- FUNCTION gen_data returns a collection of generated varchar2 elements
      FUNCTION gen_data(
        p_charset IN VARCHAR2 -- character set that is used for generation
      , p_length  IN NUMBER   -- length of the generated
      ) RETURN t_outtab PIPELINED
      IS
        TYPE t_counter IS TABLE OF PLS_INTEGER INDEX BY PLS_INTEGER;
        v_counter t_counter;
        v_exit    BOOLEAN;
        v_string  VARCHAR2(255);
        v_outrec  t_outrec;
      BEGIN
        FOR max_length IN 1..p_length 
        LOOP
          -- init counter loop
          FOR i IN 1..max_length
          LOOP
            v_counter(i) := 1;
          END LOOP;
          -- start data generation loop
          v_exit := FALSE;
          WHILE NOT v_exit
          LOOP
            -- start generation
            v_string := '';
            FOR i IN 1..max_length
            LOOP
              v_string := v_string || SUBSTR(p_charset, v_counter(i), 1);
            END LOOP;
            -- set outgoing record
            v_outrec.data := v_string;
            -- now pipe the result
            PIPE ROW(v_outrec);
            -- increment loop
            <<inc_loop>>
            FOR i IN REVERSE 1..max_length
            LOOP
              v_counter(i) := v_counter(i) + 1;     
              IF v_counter(i) > LENGTH(p_charset) THEN
                 IF i > 1 THEN
                    v_counter(i) := 1;
                 ELSE
                    v_exit := TRUE;  
                 END IF;
              ELSE
                 -- no further processing required
                 EXIT inc_loop;  
              END IF;  
            END LOOP;        
          END LOOP; 
        END LOOP; 
      END gen_data;
    END regex_utils;
    /This package is a brute force string generator using all possible combinations of a characters in a string up to a maximum length. Together with the regular expressions, I can now show what combinations my solution would allow to pass. But see for yourself:
    SELECT *
      FROM (SELECT data col1
              FROM TABLE(regex_utils.gen_data('+-.0', 5))
           ) t
    WHERE REGEXP_LIKE(NVL(REGEXP_SUBSTR(t.col1,
                                         '^([+-]?[^+-]+|[^+-]+[+-]?)$'
                       '^[+-]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)[+-]?$'
    ;You will see some results, which are perfectly valid for my definition of decimal numbers but haven't been mentioned, like '000' or '+.00'. From now on I will also use this package to verify the solutions I'll present to you and hopefully reduce my share of typos.
    Counting and finding certain characters or words in a string can be a tedious task. I'll show you how it's done with regular expressions. I'll start with an easy example, count all spaces in the string "Having fun with regular expressions.":
    SELECT NVL(LENGTH(REGEXP_REPLACE('Having fun with regular expressions', '[^ ]')), 0)
      FROM dual
      ;No surprise there. I'm replacing all characters except spaces with a null string. Since REGEXP_REPLACE assumes a NULL string as replacement argument, I can save on adding a third argument, which would look like this:
    REGEXP_REPLACE('Having fun with regular expressions', '[^ ]', '')So REPLACE will return all the spaces which we can count with the LENGTH function. If there aren't any, I will get a NULL string, which is checked by the NVL function. If you want you can play around by changing the space character to somethin else.
    A variation of this theme could be counting the number of words. Counting spaces and adding 1 to this result could be misleading if there are duplicate spaces. Thanks to regular expressions, I can of course eliminate duplicates.
    Using the old method on the string "Having fun with regular expressions" would return anything but the right number. This is, where Backreferences come into play. REGEXP_REPLACE uses them in the replacement argument, a backslash plus a single digit, like this: '\1'. To reference a string in a search pattern, I have to use subexpressions (remember the round brackets?).
    SELECT NVL(LENGTH(REGEXP_REPLACE('Having  fun  with  regular  expressions', '( )\1*|.', '\1')))
      FROM dual
      ;You may have noticed that I changed from using the "^" as a NOT operator to using the "|" OR operator and the "." any character placeholder. This neat little trick allows to filter all other characters except the one we're looking in the first place. "\1" as backreference is outside of our subexpression since I don't want to count the trailing spaces and is used both in the search pattern and the replacement argument.
    Still I'm not satisfied with this: What about leading/trailing blanks, what if there are any special characters, numbers, etc.? Finally, it's time to only count words. For the purpose of this demonstration, I define a word as one or more consecutive letters. If by now you're already thinking in regular expressions, the solution is not far away. One hint: you may want to check on the "i" match parameter which allows for case insensitive search. Another one: You won't need a back reference in the search pattern this time.
    Let's compare our solutions than, shall we?
    SELECT NVL(LENGTH(REGEXP_REPLACE('Having  fun  with  regular  expressions.  !',
                                     '([a-z])+|.', '\1', 1, 0, 'i')), 0)
      FROM dual;This time I don't use a backreference, the "+" operator (remember? 1 or more) will suffice. And since I want to count the occurences, not the letters, I moved the "+" meta character outside of the subexpression. The "|." trick again proved to be useful.
    Case insensitive search does have its merits. It will only search but not transform the any found substring. If I want, for example, extract any occurence of the word fun, I'll just use the "i" match parameter and get this substring, whether it's written as "Fun", "FUN" or "fun". Can be very useful if you're looking for example for names of customers, streets, etc.
    Enough about counting, how about finding? What if I want to know the last occurence of a certain character or string, for example the postition of the last space in this string "Where is the last space?"?
    Addendum: Thanks to another forum member, I should mention that using the INSTR function can do a reverse search by itself.[i]
    WITH t AS (SELECT 'Where is the last space?' col1
                 FROM dual)
    SELECT INSTR(col1, ' ', -1)
      FROM DUAL;Now regular expressions are powerful, but there is no parameter that allows us to reverse the search direction. However, remembering that we have the "$" meta character that means (until the) end of string, all I have to do is use a search pattern that looks for a combination of space and non-space characters including the end of a string. Now compare the REGEXP_INSTR function to the previous solution:
    SELECT REGEXP_INSTR(t.col1, ' [^ ]*$')                       
      FROM t;So in this case, it'll remain a matter of taste what you want to use. If the search pattern has to look for the last occurrence of another regular expression, this is the way to solve such a requirement.
    One more thing about backreferences. They can be used for a sort of primitive "string swapping". If for example you have to transform column values like swapping first and last name, backreferenc is your friend. Here's an example:
    SELECT REGEXP_REPLACE('John Doe', '^(.*) (.*)$', '\2, \1')
      FROM dual
      ;What about middle names, for example 'John J. Doe'? Look for yourself, it still works.
    You can even use that for strings with delimiters, for example reversing delimited "fields" like in this string '10~20~30~40~50' into '50~40~30~20~10'. Using REVERSE, I would get '05~04~03~02~01', so there has to be another way. Using backreferences however is limited to 9 subexpressions, which limits the following solution a bit, if you need to process strings with more than 9 fields. If you want, you can think this example through and see if your solution matches mine.
    SELECT REGEXP_REPLACE('10~20~30~40~50',
                          '^(.*)~(.*)~(.*)~(.*)~(.*)$',
                          '\5~\4~\3~\2~\1'
      FROM dual;After what you've learned so far, that wasn't too hard, was it? Enough for now ...
    Continued in Introduction to regular expressions ... last part..
    C.
    Fixed some typos and a flawed example ...
    cd

    Thank you very much C. Awaiting other parts.... keep going.
    One german typo :-)
    I'm replacing all characters except spaces mit anull string.I received a functional spec from my Dutch analyst in which it is written
    tnsnames voor EDWH:
    PCESCRD1 = (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)
                                                   (HOST=blah.blah.blah.com)
                                                   (PORT=5227)))
               (CONNECT_DATA=(SID=pcescrd1)))
    db user: BW_I2_VIEWER  / BW_I2_VIEWER_SCRD1Had to look for translators.
    Cheers
    Sarma.

  • Introduction to regular expressions ... last part.

    Continued from Introduction to regular expressions ... continued., here's the third and final part of my introduction to regular expressions. As always, if you find mistakes or have examples that you think could be solved through regular expressions, please post them.
    Having fun with regular expressions - Part 3
    In some cases, I may have to search for different values in the same column. If the searched values are fixed, I can use the logical OR operator or the IN clause, like in this example (using my brute force data generator from part 2):
    SELECT data
      FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE data IN ('abc', 'xyz', '012');There are of course some workarounds as presented in this asktom thread but for a quick solution, there's of course an alternative approach available. Remember the "|" pipe symbol as OR operator inside regular expressions? Take a look at this:
    SELECT data
      FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE REGEXP_LIKE(data, '^(abc|xyz|012)$')
    ;I can even use strings composed of values like 'abc, xyz ,  012' by simply using another regular expression to replace "," and spaces with the "|" pipe symbol. After reading part 1 and 2 that shouldn't be too hard, right? Here's my "thinking in regular expression": Replace every "," and 0 or more leading/trailing spaces.
    Ready to try your own solution?
    Does it look like this?
    SELECT data
      FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE REGEXP_LIKE(data, '^(' || REGEXP_REPLACE('abc, xyz ,  012', ' *, *', '|') || ')$')
    ;If I wouldn't use the "^" and "$" metacharacter, this SELECT would search for any occurence inside the data column, which could be useful if I wanted to combine LIKE and IN clause. Take a look at this example where I'm looking for 'abc%', 'xyz%' or '012%' and adding a case insensitive match parameter to it:
    SELECT data
      FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE REGEXP_LIKE(data, '^(abc|xyz|012)', 'i')
    ; An equivalent non regular expression solution would have to look like this, not mentioning other options with adding an extra "," and using the INSTR function:
    SELECT data
      FROM (SELECT data, LOWER(DATA) search
              FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE search LIKE 'abc%'
        OR search LIKE 'xyz%'
        OR search LIKE '012%'
    SELECT data
      FROM (SELECT data, SUBSTR(LOWER(DATA), 1, 3) search
              FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE search IN ('abc', 'xyz', '012')
    ;  I'll leave it to your imagination how a complete non regular example with 'abc, xyz ,  012' as search condition would look like.
    As mentioned in the first part, regular expressions are not very good at formatting, except for some selected examples, such as phone numbers, which in my demonstration, have different formats. Using regular expressions, I can change them to a uniform representation:
    WITH t AS (SELECT '123-4567' phone
                 FROM dual
                UNION
               SELECT '01 345678'
                 FROM dual
                UNION
               SELECT '7 87 8787'
                 FROM dual
    SELECT t.phone, REGEXP_REPLACE(REGEXP_REPLACE(phone, '[^0-9]'), '(.{3})(.*)', '(\1)-\2')
      FROM t
    ;First, all non digit characters are beeing filtered, afterwards the remaining string is put into a "(xxx)-xxxx" format, but not cutting off any phone numbers that have more than 7 digits. Using such a conversion could also be used to check the validity of entered data, and updating the value with a uniform format afterwards.
    Thinking about it, why not use regular expressions to check other values about their formats? How about an IP4 address? I'll do this step by step, using 127.0.0.1 as the final test case.
    First I want to make sure, that each of the 4 parts of an IP address remains in the range between 0-255. Regular expressions are good at string matching but they don't allow any numeric comparisons. What valid strings do I have to take into consideration?
    Single digit values: 0-9
    Double digit values: 00-99
    Triple digit values: 000-199, 200-255 (this one will be the trickiest part)
    So far, I will have to use the "|" pipe operator to match all of the allowed combinations. I'll use my brute force generator to check if my solution works for a single value:
    SELECT data
      FROM TABLE(regex_utils.gen_data('0123456789', 3))
    WHERE REGEXP_LIKE(data, '^(25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})$') 
    ; More than 255 records? Leading zeros are allowed, but checking on all the records, there's no value above 255. First step accomplished. The second part is to make sure, that there are 4 such values, delimited by a "." dot. So I have to check for 0-255 plus a dot 3 times and then check for another 0-255 value. Doesn't sound to complicated, does it?
    Using first my brute force generator, I'll check if I've missed any possible combination:
    SELECT data
      FROM TABLE(regex_utils.gen_data('03.', 15))
    WHERE REGEXP_LIKE(data,
                       '^((25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})$'
    ;  Looks good to me. Let's check on some sample data:
    WITH t AS (SELECT '127.0.0.1' ip
                 FROM dual
                UNION 
               SELECT '256.128.64.32'
                 FROM dual            
    SELECT t.ip
      FROM t WHERE REGEXP_LIKE(t.ip,
                       '^((25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})$'
    ;  No surprises here. I can take this example a bit further and try to format valid addresses to a uniform representation, as shown in the phone number example. My goal is to display every ip address in the "xxx.xxx.xxx.xxx" format, using leading zeros for 2 and 1 digit values.
    Regular expressions don't have any format models like for example the TO_CHAR function, so how could this be achieved? Thinking in regular expressions, I first have to find a way to make sure, that each single number is at least three digits wide. Using my example, this could look like this:
    WITH t AS (SELECT '127.0.0.1' ip
                 FROM dual
    SELECT t.ip, REGEXP_REPLACE(t.ip, '([0-9]+)(\.?)', '00\1\2')
      FROM t
    ;  Look at this: leading zeros. However, that first value "00127" doesn't look to good, does it? If you thought about using a second regular expression function to remove any excess zeros, you're absolutely right. Just take the past examples and think in regular expressions. Did you come up with something like this?
    WITH t AS (SELECT '127.0.0.1' ip
                 FROM dual
    SELECT t.ip, REGEXP_REPLACE(REGEXP_REPLACE(t.ip, '([0-9]+)(\.?)', '00\1\2'),
                                '[0-9]*([0-9]{3})(\.?)', '\1\2'
      FROM t
    ;  Think about the possibilities: Now you can sort a table with unformatted IP addresses, if that is a requirement in your application or you find other values where you can use that "trick".
    Since I'm on checking INET (internet) type of values, let's do some more, for example an e-mail address. I'll keep it simple and will only check on the
    "[email protected]", "[email protected]" and "[email protected]" format, where x represents an alphanumeric character. If you want, you can look up the corresponding RFC definition and try to build your own regular expression for that one.
    Now back to this one: At least one alphanumeric character followed by an "@" at sign which is followed by at least one alphanumeric character followed by a "." dot and exactly 3 more alphanumeric characters or 2 more characters followed by a "." dot and another 2 characters. This should be an easy one, right? Use some sample e-mail addresses and my brute force generator, you should be able to verify your solution.
    Here's mine:
    SELECT data
      FROM TABLE(regex_utils.gen_data('a1@.', 9))
    WHERE REGEXP_LIKE(data, '^[[:alnum:]]+@[[:alnum:]]+(\.[[:alnum:]]{3,4}|(\.[[:alnum:]]{2}){2})$', 'i'); Checking on valid domains, in my opinion, should be done in a second function, to keep the checks by itself simple, but that's probably a discussion about readability and taste.
    How about checking a valid URL? I can reuse some parts of the e-mail example and only have to decide what type of URLs I want, for example "http://", "https://" and "ftp://", any subdomain and a "/" after the domain. Using the case insensitive match parameter, this shouldn't take too long, and I can use this thread's URL as a test value. But take a minute to figure that one out for yourself.
    Does it look like this?
    WITH t AS (SELECT 'Introduction to regular expressions ... last part. URL
                 FROM dual
                UNION
               SELECT 'http://x/'
                 FROM dual
    SELECT t.URL
      FROM t
    WHERE REGEXP_LIKE(t.URL, '^(https*|ftp)://(.+\.)*[[:alnum:]]+(\.[[:alnum:]]{3,4}|(\.[[:alnum:]]{2}){2})/', 'i')
    Update: Improvements in 10g2
    All of you, who are using 10g2 or XE (which includes some of 10g2 features) may want to take a look at several improvements in this version. First of all, there are new, perl influenced meta characters.
    Rewriting my example from the first lesson, the WHERE clause would look like this:
    WHERE NOT REGEXP_LIKE(t.col1, '^\d+$')Or my example with searching decimal numbers:
    '^(\.\d+|\d+(\.\d*)?)$'Saves some space, doesn't it? However, this will only work in 10g2 and future releases.
    Some of those meta characters even include non matching lists, for example "\S" is equivalent to "[^ ]", so my example in the second part could be changed to:
    SELECT NVL(LENGTH(REGEXP_REPLACE('Having fun with regular expressions', '\S')), 0)
      FROM dual
      ;Other meta characters support search patterns in strings with newline characters. Just take a look at the link I've included.
    Another interesting meta character is "?" non-greedy. In 10g2, "?" not only means 0 or 1 occurrence, it means also the first occurrence. Let me illustrate with a simple example:
    SELECT REGEXP_SUBSTR('Having fun with regular expressions', '^.* +')
      FROM dual
      ;This is old style, "greedy" search pattern, returning everything until the last space.
    SELECT REGEXP_SUBSTR('Having fun with regular expressions', '^.* +?')
      FROM dual
      ;In 10g2, you'd get only "Having " because of the non-greedy search operation. Simulating that behavior in 10g1, I'd have to change the pattern to this:
    SELECT REGEXP_SUBSTR('Having fun with regular expressions', '^[^ ]+ +')
      FROM dual
      ;Another new option is the "x" match parameter. It's purpose is to ignore whitespaces in the searched string. This would prove useful in ignoring trailing/leading spaces for example. Checking on unsigned integers with leading/trailing spaces would look like this:
    SELECT REGEXP_SUBSTR(' 123 ', '^[0-9]+$', 1, 1, 'x')
      FROM dual
      ;However, I've to be careful. "x" would also allow " 1 2 3 " to qualify as valid string.
    I hope you enjoyed reading this introduction and hope you'll have some fun with using regular expressions.
    C.
    Fixed some typos ...
    Message was edited by:
    cd
    Included 10g2 features
    Message was edited by:
    cd

    Can I write this condition with only one reg expr in Oracle (regexp_substr in my example)?I meant to use only regexp_substr in select clause and without regexp_like in where clause.
    but for better understanding what I'd like to get
    next example:
    a have strings of two blocks separated by space.
    in the first block 5 symbols of [01] in the second block 3 symbols of [01].
    In the first block it is optional to meet one (!), in the second block it is optional to meet one (>).
    The idea is to find such strings with only one reg expr using regexp_substr in the select clause, so if the string does not satisfy requirments should be passed out null in the result set.
    with t as (select '10(!)010 10(>)1' num from dual union all
    select '1112(!)0 111' from dual union all --incorrect because of '2'
    select '(!)10010 011' from dual union all
    select '10010(!) 101' from dual union all
    select '10010 100(>)' from dual union all
    select '13001 110' from dual union all -- incorrect because of '3'
    select '100!01 100' from dual union all --incorrect because of ! without (!)
    select '100(!)1(!)1 101' from dual union all -- incorrect because of two occurencies of (!)
    select '1001(!)10 101' from dual union all --incorrect because of length of block1=6
    select '1001(!)10 1011' from dual union all) --incorrect because of length of block2=4
    select '10110 1(>)11(>)0' from dual union all)--incorrect because of two occurencies of (>)
    select '1001(>)1 11(!)0' from dual)--incorrect because (!) and (>) are met not in their blocks
    --end of test data

  • Introduction to regular expressions part4

    from Introduction to regular expressions ... last part.
    Hi cd_2.
    You has not introduced 11gR1 regex new features.
    Therefore I will introduce it B-)
    RegExp_Count
    http://download.oracle.com/docs/cd/E11882_01/server.112/e10592/functions145.htm
    Since Oracle11gR1 There is new function RegExp_Count
    RegExp_Count counts how many strings which has match pattern.
    sampleSQL
    select
    RegExp_Count('abc','[a-c]') as cnt1,
    RegExp_Count('abc','[ac]')  as cnt2,
    RegExp_Count('abc','[0-9]') as cnt3
    from dual;
    cnt1  cnt2  cnt3
       3     2     0***********************************************************************************
    6th parameter of RegExp_SubStr
    http://download.oracle.com/docs/cd/E11882_01/server.112/e10592/functions148.htm
    Since Oracle11gR1 There is new 6th parameter at RegExp_SubStr.
    This 6th parameter can emulate regex Lookahead and lookbehind.
    but This can emulate easy case only.
    for ex (?=.*abc)ghi can emulate.
    But (?=.*abc)(?=.*def)ghi cannot emulate.
    sampleSQL
    select
    RegExp_Substr('abc1def2','([a-z])[0-9]',1,1,null,1)
    as "emulate [a-z](?=[0-9])",
    RegExp_Substr('1def2','[a-z]([0-9])',1,1,null,1)
    as "emulate (?<=[a-z])[0-9]"
    from dual;
    e  e
    c  2

    This function has been known to the community for years. It has been presented at conferences by Tom Kyte, discussed in Oracle Magazine, and demonstrated in Morgan's Library on the Regular Expressions page for several years at least.
    http://www.morganslibrary.org/reference/regexp.html
    http://www.morganslibrary.org/reference/builtin_functions.html
    It is a little late to be introducing it. ;)

  • Query help in regular expression

    Hi all,
    SELECT * FROM emp11
    WHERE INSTR(ENAME,'A',1,2) >0;
    Please let me know the equivalent query using regular expressions.
    i have tried this after going through oracle regular expressions documentation.
    SELECT * FROM emp11
    WHERE regexp_LIKE(ename,'A{2}')
    Any help in this regard would be highly appreciated .
       Thanks,
    P Prakash

    please go here
    Introduction to regular expressions ...
    Thanks,
    P Prakash

  • Regular expressions in 10g

    Can any body provide good Oracle 10g regular expression tutorial with example link ?

    Here below links from nice post entries from cd :
    Introduction to regular expressions ...
    Introduction to regular expressions ... continued.
    Nicolas.

  • Oracle regular expressions REGEXP_SUBSTR

    Hi,
    I'm relatively new here and this is might be a kind of silly.
    Start using reg expressiona and do not know how to get the second pattern from the end (7 in this case)?
    select REGEXP_SUBSTR('1/2/3/4/5/6/7/8' ,'[^/]+$',1, 1),
    REGEXP_SUBSTR('1/2/3/4/5/6/7/8' ,'[^/]+$',1, 2),
    REGEXP_SUBSTR('1/2/3/4/5/6/7/8' ,'[^/]+$')
    from dual;
    Please help.
    Edited by: lsy_nn on Jul 21, 2010 1:51 PM

    RegExp_Replace is useful ;-)
    Let us read these threads.
    I have created part4 :8}
    Introduction to regular expressions part1 to part4
    Introduction to regular expressions ...
    Introduction to regular expressions ... continued.
    Introduction to regular expressions ... last part.
    Introduction to regular expressions part4
    col extStr for a10
    select
    RegExp_Replace('1/2/3/4/5/6/7/8',
                   '^.*([^/]+)/.*$',
                   '\1') as extStr
    from dual;
    EXTSTR
    7

  • Regular Expressions in Oracle

    Hello All,
    I come from Perl scripting language background. Perl's regular expressions are rock solid, robust and very fast.
    Now I am planning to master Regular Expressions in Oracle.
    Could someone please point the correct place to start with it like official Oracle documentation on Regular Expressions, any good book on Regex or may be any online link etc.
    Cheers,
    Parag
    Edited by: Parag Kalra on Dec 19, 2009 11:03 AM

    Hi, Parag,
    Look under [R in the index of the SQL language manual|http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/index.htm#R]. All the regular expression functions and operators start with "REGEXP", and there are a couple of entries under "regular expressions".
    That applies to the Oracle 11 and 10.2 documentation. Regular expressions were hidden in the Oracle 10.1 SQL Language manual; you had to look up some similar function (like REGR_SYY, itself hidden under S for "SQL Functions", and then step through the pages one at a time.
    Sorry, I don't know a good tutorial or introduction.
    If you find something hopeful, please post a reference here. I think a lot of people would be interrested.

  • FM9 SDL Authoring Assitant Regular Expression Syntax?

    I'm trying to trick SDL into identifying words that are not approved by STE.
    Under "Configure|Style and Linguistic Checks|User Defined Rules" the program allows regular expressions to create custom rules.
    I have all other options in the Utility unchecked.
    I am by no means a pro at regular expressions but was able to create a pretty solid command at http://regexlib.com/RETester.aspx.
    The idea is to create an expression that looks for any word other than those seperated by vertical bars.
    For the test text "this is not the way that should work. this is not the way that should work."
    \b(?:(?!should|not|way|this|is|that).)+
    returns: the work the work
    At that website, I can change the excluded words and it works every time. Change the test text, same thing, still works.
    Perfect! I ripped every approved word in STE into the formula and it (SDL) only returns words at the end of the sentence that are followed by a periods and question marks. So I added"\." to the exclusion list in the expression and it only found words next to question marks. I excluded question marks and now it finds nothing. I don't understand this as I wasn't aware that I had any criteria in the expression that dictates functionality only at the end of the sentence.
    I have an O'reilly book to refer to, if anyone can give me a shove in the right direction as to which set of rules to adhere to, I would appreciate it. Why did negative word matching have to be my introduction to this subject?

    I tried your expression in a couple of regex tools and it seems to parse as you wanted it to. I suspect that the SDL implementation doesn't follow the unix/linux standards. I haven't used the tool and the usage documentation is non-existant, except for the limited flash-based demo.
    From the SDL knowledgebase, it states that their regex filter uses the .NET regex flavour and I believe that the differences on this are explained in the "Mastering Regular Expressions" book.

  • Regular Expression for replacing Leading Spaces

    I don't claim to be any expert on Regular Expressions and even after reading CD's introduction to Reg Exp, I still can't figure out this one which I'm sure must be very basic.
    I want to replace all the leading spaces in a string with "." chrs. I could do this using the common replace/substr/instr functions, but I reckoned it would be possible in a single regular regexp_replace call.
    So far I've got this...
    SQL> select regexp_replace('      FRED BLOGS    WAS HERE    ', '^([:space:])*', '.')
      2  as result
      3  from dual;
    RESULT
    .      FRED BLOGS    WAS HERE
    SQL>Which is replacing the start of line with a "." and not the spaces.
    But I want my result to be:-
    RESULT
    ......FRED BLOGS    WAS HERE
    SQL>Cheers

    That was very good solution .
    Can you explain me the significance of "| " in the code, other things I could trace out.
    I try to run the code with the 2 cases
    when I give a space after | symbol it prints the * many times
    SQL> SELECT col1, REGEXP_REPLACE(col1, ' ([^ ]+.*)|','*\1')
    2 FROM (SELECT ' FRED BLOGS WAS HERE ' col1 FROM dual );
    COL1
    REGEXP_REPLACE(COL1,'([^]+.*)|','*\1')
    FRED BLOGS WAS HERE
    *FRED BLOGS    WAS HERE
    SQL> SELECT col1, REGEXP_REPLACE(col1, ' ([^ ]+.*)| ','*\1')
    2 FROM (SELECT ' FRED BLOGS WAS HERE ' col1 FROM dual );
    COL1
    REGEXP_REPLACE(COL1,'([^]+.*)|','*\1')
    FRED BLOGS WAS HERE
    ******FRED BLOGS WAS HERE

  • The regular expression.

    Hello all, I have need to learn the regular expression for teh Oracle database. Can
    anyones tell to me where I find the informations - good sites that give the secrets ;)
    I have already look but would like some advices. Nóra

    Nóra wrote:
    Hello all, I have need to learn the regular expression for teh Oracle database. Can
    anyones tell to me where I find the informations - good sites that give the secrets ;)
    I have already look but would like some advices. NóraHi Nóra,
    I have been looking at this recently myself and I have a few
    sites bookmarked. Enjoy the intricacies of regexen - with any
    luck, you won't even have to learn any PL/SQL ;)
    Good introdution
    http://www.zytrax.com/tech/web/regex.htm
    Simple summary of Oracle's regular expressions
    http://www.regular-expressions.info/oracle.html
    Not forgetting the Wiki (contains table with vi
    equivalents - for those of us who use a real editor ;)
    http://en.wikipedia.org/wiki/Regular_expression
    This site should be one of your main ports of call for any Oracle issues.
    Good forums also - better software than Oracle themselves.
    http://www.orafaq.com/node/2404
    Morgan's Library - used to be psoug.org - don't know what happened - anyone?
    Daniel posts here from time to time
    http://www.morganslibrary.org/reference/regexp.html
    And,not forgetting the Oracle docco.
    http://docs.oracle.com/cd/B19306_01/appdev.102/b14251/adfns_regexp.htm
    http://docs.oracle.com/cd/B12037_01/appdev.101/b10795/adfns_re.htm
    Also look on OTN for a .pdf for introduction.HTH,
    Paul...

  • Regular expression for DBCC

    Hi Everyone,
    I need help with regular expression for following scenario .
    If any of the DBCC command is run against sql server i want it to be captured . Something like if it starts with DBCC it should be captured .
    For instance following is expression for
    DBCC CheckDB : part="dbcc checkdb", rgxp="(\s|;|^)dbcc\scheckdb\s?\((((to[a-z0-9_\$\#\.\@])|(t[abcdefghijklmnpqrstuvwxyz0-9_\$\#\.\@])|([abcdefghijklmnopqrsuvwxyz]))[a-z0-9_\$\#\.\@]*)\s?\)(\s|;|$)"
    But instead of writing expression for all DBCC commands is there any way to write a regular expression say if  DBCC i want it to be captured .
    Thanks
    Suhas Vallala

    Hi Suhas,
    As Olaf said, regular expressions is not supported in SQL Server. To trace the DBCC event, you can use the SQL Sever Profiler, just tick the Security Audit--> Audit DBCC Event. You can set up a trace in the profiler by following the below tutorial.
    SQL SERVER – Introduction to SQL Server 2008 Profiler
    If you have any question, feel free to let me know.
    Eric Zhang
    TechNet Community Support

  • Logical AND in Java Regular Expressions

    I'm trying to implement logical AND using Java Regular Expressions.
    I couldn't figure out how to do it after reading Java docs and textbooks. I can do something like "abc.*def", which means that I'm looking for strings which have "abc", then anything, then "def", but it is not "pure" logical AND - I will not find "def.*abc" this way.
    Any ideas, how to do it ?
    Baken

    First off, looks like you're really talking about an "OR", not an "AND" - you want it to match abc.*def OR def.*abc right? If you tried to match abc.*def AND def.*abc nothing would ever match that, as no string can begin with both "abc" and "def", just like no numeric value can be both 2 and 5.
    Anyway, maybe regex isn't the right tool for this job. Can you not simply programmatically match it yourself using String methods? You want it to match if the string "starts with" abc and "ends with" def, or vice-versa. Just write some simple code.

  • Help in Regular expression

    Hello..
    I wanted to write a regular expression to match the foll string..
    <!--endclickprintexclude--><!--startclickprintexclude--> <!--endclickprintexclude-->
    <p> <b>NEW ORLEANS, Louisiana (CNN) </b>
    -- Two years after Hurricane Katrina devastated coastal areas of Louisiana and Mississippi, residents say much of America has forgotten their plight.
    </p> <!--startclickprintexclude-->
    I tried doing..
    Matcher matcher= Pattern.compile("<!--endclickprintexclude--> <p><b>([^<^>]+?)</p><!--startclickprintexclude-->", Pattern.CASE_INSENSITIVE).matcher(story);
    Its not working...
    is there any other soln?

    Theres probably a better way to do this but here's a way that works.
    import java.util.regex.*;
    public class RegexTester{
    public static void main(String[] args){
         String text =
         "<!--endclickprintexclude--><!--startclickprintexclude--> <!--endclickprintexclude-->" +
         "<p> <b>NEW ORLEANS, Louisiana (CNN) </b>" +
         "-- Two years after Hurricane Katrina devastated coastal areas of Louisiana and Mississippi," +
         "residents say much of America has forgotten their plight." +
         "</p> <!--startclickprintexclude-->";
         String regex = ">((?:\\s*[\\S&&[^<>]]+\\s*)*?)<";
         Pattern p = Pattern.compile(regex);
         Matcher m = p.matcher(text);
         while(m.find()){
         System.out.println("Match: '" + m.group(1) + "'");
    }

  • Help in regular expression matching

    I have three expressions like
    1) [(y2009)(y2011)]
    2) [(y2008M5)(y2011M3)] or [(y2009M5)(y2010M12)]
    3) [(y2009M1d20)(y2011M12d31)]
    i want regular expression pattern for the above three expressions
    I am using :
    REGEXP_LIKE(timedomainexpression, '???[:digit:]{4}*[:digit:]{1,2}???[:digit:]{4}*[:digit:]{1,2}??', 'i');
    but its giving results for all above expressions while i want different expression for each.
    i hav used * after [:digit:]{4}, when i am using ? or . then its giving no results. Please help in this situation ASAP.
    Thanks

    I dont get your question Can you post your desired output? and also give some sample data.
    Please consider the following when you post a question.
    1. New features keep coming in every oracle version so please provide Your Oracle DB Version to get the best possible answer.
    You can use the following query and do a copy past of the output.
    select * from v$version 2. This forum has a very good Search Feature. Please use that before posting your question. Because for most of the questions
    that are asked the answer is already there.
    3. We dont know your DB structure or How your Data is. So you need to let us know. The best way would be to give some sample data like this.
    I have the following table called sales
    with sales
    as
          select 1 sales_id, 1 prod_id, 1001 inv_num, 120 qty from dual
          union all
          select 2 sales_id, 1 prod_id, 1002 inv_num, 25 qty from dual
    select *
      from sales 4. Rather than telling what you want in words its more easier when you give your expected output.
    For example in the above sales table, I want to know the total quantity and number of invoice for each product.
    The output should look like this
    Prod_id   sum_qty   count_inv
    1         145       2 5. When ever you get an error message post the entire error message. With the Error Number, The message and the Line number.
    6. Next thing is a very important thing to remember. Please post only well formatted code. Unformatted code is very hard to read.
    Your code format gets lost when you post it in the Oracle Forum. So in order to preserve it you need to
    use the {noformat}{noformat} tags.
    The usage of the tag is like this.
    <place your code here>\
    7. If you are posting a *Performance Related Question*. Please read
       {thread:id=501834} and {thread:id=863295}.
       Following those guide will be very helpful.
    8. Please keep in mind that this is a public forum. Here No question is URGENT.
       So use of words like *URGENT* or *ASAP* (As Soon As Possible) are considered to be rude.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   

Maybe you are looking for

  • BPM Scenario how to wait til file deleted in Target Directory

    Hi Frnds, I am working on one JMS to File scenario,my requirement is i am sending JMS message to File Directory,when i placed meg in target directory some vent wil read the message it wil process in to SAP. When i receive once again mesage in JMS sce

  • CS4 projector

    I have a project created on a PC in CS4 that is published as a projector file which will be distributed on CD. Works very well with player 10's hardware acceleration. I need to write it out for a Mac as well, so I created the .app in the publish opti

  • Back Ground job errors.

    Hi, we ve run the backgrnd job but it is giving errors that job is for generating orders for prices the orders were able to be generated in the foreground but all have list price validation errors eg like 2006416602. Can anyone tell me how to check 

  • 8.02 upgrade froze my phone

    I downloaded the 8.02 last night and its now swapping through a blank blue screen and the apple screen on my iphone 5S, how do I fix this?

  • Kernel doesn't support ppp

    This is the message I get when using 'pppd call MyInternetProvider' Cannot find this as a module. Installed v 0.6, kernel 2.6.3 IDE I am using 'Peer-to-Peer Dial Up on ArchLinux HOWTO version 3' by Rousian Solomakhin  which is posted to the forum Pac