Regular Expressions - unsetting greedy possible?

Hi,
I'm currently working on a parser and got some problems with the regular expressions in ABAP.
Let's say I want to calculate (2+2)*(3+3).
The RegExp \(.+\) finds everything between brackets. The problem is that the engine matches from the first opening bracket to the last closing bracket, whereas it should match from the first opening bracket to the first closing bracket.
Is there a way to tell the engine to work ungreedy?
Thanks for your help
Chris

Hi Prashant,
unfortunately this won't work either.
I'd better give some more information on the topic to increase understanding.
In order to calculate this string mathematically I created a function that works recursively. It calculates the (math) value of a string.
So let's say we want to calculate (2+2)*(3+3); the function is supposed to work this way:
math ( "(2+2)*(3+3)")
-> math ("2+2") (Calculating and returning value: 4)
The formula now is "4*(3+3)"
-> math("3+3") (Calculating and returning value: 6)
The formula now is "4*6"
-> math("4*6") (calculating and returning value 24)
Since ABAP does not support ungreedy searches in regular expressions, the function would actually work this way:
math ( "(2+2)*(3+3)")
->math( "2+2)*(3+3" ) (using the wrong brackets...)
... leading to a math error.
Your solution, Prashant, would work for the first recursive call. Then, the formula would be "(2+2)*(3+3)" again.
Thanks though
Regards
Christian
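
One way around the missing ungreedy quantifier is to exclude brackets from the match, so the pattern can only ever hit an innermost pair. The following is a minimal sketch in Python rather than ABAP, for illustration only: `evaluate` and `eval_flat` are hypothetical names, and only `+` and `*` on non-negative integers are handled. The pattern uses nothing but a negated character class, so it should carry over to engines that lack ungreedy matching.

```python
import re

# One innermost bracket pair: "(" then anything except brackets, then ")".
INNERMOST = re.compile(r"\(([^()]*)\)")


def eval_flat(expr):
    """Evaluate a bracket-free expression of non-negative integers with + and *."""
    total = 0
    for term in expr.split("+"):
        product = 1
        for factor in term.split("*"):
            product *= int(factor)
        total += product
    return total


def evaluate(expr):
    """Replace innermost bracket groups by their value until none are left."""
    match = INNERMOST.search(expr)
    while match:
        value = eval_flat(match.group(1))
        expr = expr[:match.start()] + str(value) + expr[match.end():]
        match = INNERMOST.search(expr)
    return eval_flat(expr)


print(evaluate("(2+2)*(3+3)"))  # 24
print(evaluate("((1+2)*3)+4"))  # 13
```

Because `[^()]*` cannot cross a bracket, the first hit is always a complete innermost group, which is the behaviour the recursive function needs.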

Similar Messages

  • Regular expression: how to match "[somestuff]"?

    I have a problem with the following code.
    I meant to catch "[fm1,-]". But I got "[fm1,-] funder [fm2,-] of our country. [sn8,s-]" instead.
    import java.util.*;
    import java.util.regex.*;
    public class regPractice {
    public static void main(String[] args) {
    String s="<TITLE Getting to Know> I hope suitabe [fm1,-] funder [fm2,-] of our country. [sn8,s-]";
    Pattern p=Pattern.compile("\\[(.*)\\]");
    Matcher m=p.matcher(s);
    if (m.find() ){
    System.out.println(m.group(0) ) ;
    }else{
    System.out.println("Nothing");
    }
    }
    }

    Regular expressions are greedy - that is (.*) will grab as much as it
    possibly can before a ]. Hence you see what you see.
    What you want is a reluctant quantifier, in this case (.*?)
    These grab as little as they possibly can. The parentheses are also
    not needed in your example, but you may want group(1) for some
    other reason.
    So we end up with:
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    public class regPractice {
        public static void main(String[] args) {
            String s = "<TITLE Getting to Know> I hope suitabe [fm1,-] funder [fm2,-] of our country. [sn8,s-]";
            // Reluctant quantifier: stop at the first closing bracket
            Pattern p = Pattern.compile("\\[(.*?)\\]");
            Matcher m = p.matcher(s);
            if (m.find()) {
                System.out.println(m.group(0));
            } else {
                System.out.println("Nothing");
            }
        }
    }
    which gives the desired output.
    The different types of quantifier are described here:
    http://java.sun.com/docs/books/tutorial/extra/regex/quant.html
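
The same difference can be reproduced outside Java. Here is a small sketch in Python (chosen only for illustration; its `re` module uses the same `*?` syntax), run against the string from the question:

```python
import re

s = "<TITLE Getting to Know> I hope suitabe [fm1,-] funder [fm2,-] of our country. [sn8,s-]"

# Greedy: .* runs to the last "]" in the string.
greedy = re.search(r"\[(.*)\]", s).group(0)
# Reluctant: .*? stops at the first "]" that lets the match succeed.
lazy = re.search(r"\[(.*?)\]", s).group(0)
# With the reluctant form, every tag can be collected in one call.
all_tags = re.findall(r"\[(.*?)\]", s)

print(greedy)    # [fm1,-] funder [fm2,-] of our country. [sn8,s-]
print(lazy)      # [fm1,-]
print(all_tags)  # ['fm1,-', 'fm2,-', 'sn8,s-']
```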

  • Regular expressions in Rename Wizard Utility

    Hi guys,
    Does anyone know how to use regular expressions (if that is possible at all) in OBIEE's internal "Rename Wizard" utility? For example, I have the following columns:
    % Change Variance ColumnName1
    % Change Variance ColumnName2
    % Change Variance ColumnName3
    and I want them to be
    Variance ColumnName1 % Change
    Variance ColumnName2 % Change
    Variance ColumnName3 % Change
    Is there another way to do this other than by hand? Your help is appreciated.

    user447618,
    Regular Expressions were first introduced in SQL and PL/SQL as of Oracle Database 10g. What version of the database are you running?
    Sergio

  • Regular Expression he!!  =)

    Okay, the regexp needs to dig through an HTML file and print out all the links. Here's what I've got:
    Pattern p = Pattern.compile("a\\shref=\"(.)+\"", Pattern.MULTILINE);
    Matcher m = p.matcher(fileData);
    while (m.find()) {
    System.out.println(m.group() + "\n");
    }
    Yes, I know the regexp I'm using isn't very good to find links, but I'm starting simple. What (I think) the above should match, is "a", followed by a space, followed by "href=", followed by a quote, any text, and another quote.
    I have the text:
    <a href="test.html">This is a "test"</a>
    What gets printed off for this text is:
    a href="test.html">This is a "test"
    Instead of quitting after the first " it finds, it continues for more. It's not always the third quote either, sometimes it's 5, etc.
    I have no idea why this is occurring. It's either a bad regexp or I'm not utilizing the Java language properly.
    Any help would be greatly appreciated!

    You think you are searching for a (") followed by (some_text) followed immediately by a (").
    But that's not what you are doing...
    Java's regular expressions are "greedy" by default. This means the (+) operator will take characters until it cannot take any more. In your case, it takes all the characters until the end of your string, because you used a (.), which means any character! Then it looks for a ("), but it cannot find one, because it has already eaten all the characters and is at the end of your string.
    Now it backs up one character at a time until it finds a (") it can use. It does not have to back up far, because the last (") is close to the end of your string. So it takes that (") as the last part of the match.
    This isn't what you intended...
    You wanted it to start at a quote and keep going until it finds the very next quote. The pattern you should use is something akin to the following:
    a\\s+href=\"([^\"])+\"
    Now it is saying: start at the first (") and eat everything that is not a ("). Then grab the next (") it finds.
    Note the stuff in the parentheses. I used ([^\"]), which means "anything that is not a ("). This is different from what you had. (.) means "anything at all...including a quote".
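
To see both patterns side by side, here is a short sketch in Python (used only for illustration, not the poster's Java). The sample line is an assumption reconstructed from the output quoted in the question, and the group is written as `([^"]+)` so that it captures the whole attribute value rather than its last character.

```python
import re

# Assumed sample line, reconstructed from the output shown in the question.
html = '<a href="test.html">This is a "test"</a>'

# (.)+ is greedy and "." also matches quotes, so the match runs to the last quote.
greedy = re.search(r'a\shref="(.)+"', html).group()
# [^"]+ cannot cross a quote, so the match ends at the first closing quote.
bounded = re.search(r'a\s+href="([^"]+)"', html)

print(greedy)            # a href="test.html">This is a "test"
print(bounded.group())   # a href="test.html"
print(bounded.group(1))  # test.html
```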

  • Regular expression - Replace not like

    Hi All,
    Is there a regexp pattern to replace anything other than 'oracle' from the below string.
    "oracle sdsd oracle xyd fgh oracle idmdh asasas trtrt"
    The result will be "oracleoracleoracle"
    If I want to write like regexp_replace('oracle sdsd oracle xyd fgh oracle idmdh asasas trtrt',<pattern>), what should be the pattern?
    I know how to get the result by nesting regexp and other functions.
    But is there any single pattern for this?
    Thanks in advance.
    Note; This is not a business requirement, trying to learn regexp..

    884476 wrote:
    Could you please explain what does that pattern mean?
    We can look at your string as a sequence of substrings, where each substring is a set of characters (part A) followed by the word oracle or, in the last substring, by end-of-line (part B). We want to replace each such substring with its part B, thus removing part A.
    Dot (.) means any character. Asterisk (*) means repeated 0 or more times. This is our part A (I'll get back to ? later). Pipe (|) means OR. Dollar sign ($) means end-of-line. Parentheses mean grouping. So ((oracle)|$) means the string oracle or end-of-line. This is our part B. Now back to the question mark. By default Oracle regular expressions are "greedy". We say any character repeated any number of times followed by the word oracle. Since the word oracle itself matches the definition of any character repeated any number of times (6 in this case), the regexp will match from the beginning of the string to the last occurrence of the word oracle; that's why it is called "greedy". Question mark (?) tells the regexp not to use greedy matching, therefore '.*?oracle' will stop at the first occurrence of the word oracle. Now the replacement string. Notation \1 is a group backreference. Group 1 is ((oracle)|$) and, as I already noted, means the string oracle or end-of-line (whichever was found).
    SY.
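
The pattern being explained is not quoted in this excerpt. Judging from the description it appears to be `.*?((oracle)|$)` with `\1` as the replacement; that reading is an assumption. Under that assumption, the idea can be tried in Python (used here only for illustration, not Oracle's `regexp_replace`) like this:

```python
import re

s = "oracle sdsd oracle xyd fgh oracle idmdh asasas trtrt"

# Part A: ".*?" lazily eats characters; part B: "oracle" or end of string.
# Each match (A followed by B) is replaced by B alone via the backreference \1.
result = re.sub(r".*?(oracle|$)", r"\1", s)

print(result)  # oracleoracleoracle
```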

  • Non-greedy regular expression search/replace

    i want to be able to do non-greedy regular expressions as
    dreamweaver defaults to greedy and eats all of the match it can
    match.
    Is there an add-on to add a checkbox or something in the
    Search and Replace Dialog for non-greediness?
    i guess i could throw something in there to match some simple
    expressions like...
    text to search n' replace:
    $Query .= ", dateReceived='".addslashes($dateReceived)."'" .
    ", dateReleased='".addslashes($dateReleased)."'";
    Instead of using this now
    Find: '"\.addslashes\(\$([^\)]*)\)\."'
    Replace: ". dateInsert($$1) ."
    i want to just use this without it getting greedy with the .*
    Find: '"\.addslashes\(\$(.*)\)\."'
    Replace: ". dateInsert($$1) ."

    In a regular expression, if you want the .* not to be greedy,
    you add a '?' after it, so you have '.*?'
    Find: '".addslashes($dateReleased)."'
    Hope that helps,
    Chris
    Adobe Dreamweaver Engineering

  • Replace replace with regular expression possible?

    Hi,
    Just another regular expression question!
    I want to replace the words "AND", "OR", "NOT" in a sentence with the symbols '&', '|', '~' respectively.
    Obviously very simple to do with 3 replace statements, just wonder if I can use regexp_replace to achieve it?
    Cheers

    Hi,
    Metacharacter symbols can be used in place of regular expressions.
    Here's an implementation scenario:
    http://www.oracle.com/technology/obe/obe10gdb/develop/regexp/regexp.htm
    Hope it helps.
    Regards,
    Naveed.

  • Regular expressions and capture groups

    Hi everyone :)
    Is there a way to override the default behaviour of capture groups in regular expressions? More specifically I want to override this:
    "The captured input associated with a group is always the subsequence that the group most recently matched."
    For example, if I have a string that is this:
    /* <comment one> */
    /* <comment two> */
    <some text>
    I have a pattern of the form "(.*)(/\\*.*\\*/)(.*)" which will match multi-line comments. I have also specified the flag DOTALL so that the predefined character class '.' matches over line-breaks.
    If I apply this pattern to the above string I get comment two being captured, not comment one. This is because of the stipulation that I cited above.
    I need to be able to capture only the first match, and prevent the capture group from being overwritten by more recent matches.
    Is this possible? Any ideas?
    Thanks in advance.
    Kind regards,
    Ben Deany

    Is there a way to override the default behaviour of
    capture groups in regular expressions? More
    specifically I want to override this:
    No, but you don't need to.
    I have a pattern of the form "(.*)(/\\*.*\\*/)(.*)"
    which will match multi-line comments.
    Comment two is captured by the second group because comment one is eaten by the first group. Use the reluctant quantifier "*?" on the . in the first group instead of the greedy quantifier "*" to get what is apparently the behavior you want. Then the first group will contain nothing, the second group will contain comments one and two, and the third group will contain the following text.
    .* is a very powerful thing to use. It will match everything in its path, guzzling text like moonshine at Mardi Gras. The only reason it doesn't match comment two as well is because then the expression as a whole would not match.
    The parentheses surrounding the first and third groups are not needed (unless you want to use group methods on them too).
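
A quick way to check what each group ends up holding is to run the variants side by side. This is a sketch in Python rather than Java (the quantifier syntax is the same), and the sample text with two `/* ... */` comments is an assumption. The third variant, with a reluctant quantifier inside the comment group as well, isolates comment one on its own.

```python
import re

# Assumed sample input: two comments followed by some text.
text = "/* comment one */\n/* comment two */\nsome text"

greedy_first = re.match(r"(.*)(/\*.*\*/)(.*)", text, re.DOTALL)
lazy_first = re.match(r"(.*?)(/\*.*\*/)(.*)", text, re.DOTALL)
lazy_both = re.match(r"(.*?)(/\*.*?\*/)(.*)", text, re.DOTALL)

print(repr(greedy_first.group(2)))  # '/* comment two */'
print(repr(lazy_first.group(2)))    # '/* comment one */\n/* comment two */'
print(repr(lazy_both.group(2)))     # '/* comment one */'
```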

  • Regular Expressions in CS5.5 - something is wrong

    Hello Everybody,
    Please correct me, but I think, I found a serious problem with regular Expressions in Indesign CS5.5 (and possibly in other apps from CS5.5).
    Let's start with simple example:
    var range = "a-a,a,a-a,a";
    var regEx = /(a+-a+|a+)(,(a+-a+|a+))*/;
    alert( "Match:" +regEx.test(range)+"\nLeftContext: "+RegExp.leftContext+"\nRightContext: "+RegExp.rightContext );
    What I expected was true match and the left  and the right context should be empty. In Indesign CS3 that is correct BUT NOT in CS5.5.
    In CS 5.5 it seems that the only first "a-a" is matched and the rest is return as the rightContext - looks like big change (if not parsing error in RegExp engine).
    Please correct me if I am wrong.
    The second example - how to freeze ID CS5.5:
    var range = "a-a,a,a-a,a";
    var regEx = /(a+-a+|a+)(,(a+-a+|a+)){8,}/;
    alert( "Match:" +regEx.test(range)+"\nLeftContext: "+RegExp.leftContext+"\nRightContext: "+RegExp.rightContext );
    As you can see it differs only with the {8,} part instead of *
    Run it in CS5.5 and you will see that ID hangs (in CS3 of course it runs flawlessly).
    The third example - how to freeze ID 5.5 in one line (I posted it earlier in the Photoshop forum because a similar problem was raised there earlier):
    alert((/(n|s)? /gmi).test('s') );
    As you can guess - it freezes the CS5.5 (CS3 passes the test).
    Please correct me if I am doing something wrong or it's the problem of Adobe.
    Best regards,
    Daniel Brylak

    Hi Daniel,
    Thanks for sharing. Really annoying indeed.
    Just to complete your diagnosis, what you describe about CS5.5 is the same in CS5, while CS4 behaves as CS3.
    var range = "a-a,a,a-a,a";
    var regEx = /(a+-a+|a+)(,(a+-a+|a+))*/;
    alert([
        "Match:" +regEx.test(range),
        "LeftContext: "+RegExp.leftContext.toSource(),        // => CS3/4: EMPTY -- CS5+: EMPTY
        "RightContext: "+RegExp.rightContext.toSource()        // => CS3/4: EMPTY -- CS5+: ",a,a-a,a"
        ].join('\r'));
    So there is a serious implementation problem of the RegExp object from ExtendScript CS5.
    I don't think it's related to the greedy modes. By default, JS RegExp quantifiers are greedy, and /a*/ still entirely captures "aaaaaa" in CS5+.
    By the way, you can make any quantifier non-greedy by adding ? after the quantifier, e.g.: /a*?/, /a+?/, etc.
    I guess that Adobe ExtendScript has a generic issue in updating the RegExp.lastIndex property in certain contexts—see http://forums.adobe.com/message/3719879#3719879 —which could explain several bugs such as the Negative Class bug —see http://forums.adobe.com/message/3510078#3510078 — or the problems you are mentioning today.
    @+
    Marc

  • Regular expressions help

    I'm using a RegExp class (http://www.jurjans.lv/flash/RegExp.html) to do some regular expression in AS2. But I'm not very good at it.
    var str:String="What if there are other variables, such as possible <a class='gloss' href='asfunction:_root.handle, confounding variables'><b>confounding variables</b></a> which could explain at least some of the relationship between the two variables? Here <a href='' target='_blank'>is another link</a>.\n<a class='gloss' href='asfunction:_root.handle, confounding variables'>confounded variables</a>"
    var reg1:RegExp = new RegExp("<a.*gloss.*href=[\'\"]?([^\\\'\">]+)>+(.*</a>)", "ig");
    var obj:Object = reg1.exec(str);
    while (obj != null) {
              for(var a in obj){
                        if(!isNaN(a)){
                                  trace(a+": "+obj[a]);
                        }
              }
              trace(newline);
              obj = reg1.exec(str);
    }
    And this traces:
    2: <b>confounding variables</b></a>
    1: asfunction:_root.handle, confounding variables'
    0: <a class='gloss' href='asfunction:_root.handle, confounding variables'><b>confounding variables</b></a>
    2: confounded variables</a>
    1: asfunction:_root.handle, confounding variables'
    0: <a class='gloss' href='asfunction:_root.handle, confounding variables'>confounded variables</a>
    I'm trying to get the href and the "friendly link" part of the anchor tag (but only for anchors that have a class of gloss).
    As you can see I'm almost there, but I'm getting the extra </a> and the extra ' on the two examples. I tried putting the ) before the </a> but that just broke it. (Of course that could be because this class doesn't work properly, but I'm guessing that isn't the case.)
    Anybody really good with regular expressions who can help me out?

    Looks like there is a "greedy" bug with the () in that AS2 implementation.
    I also have a problem with the expression matching not the next occurrence of the closing </a> but the final one.
    Anybody have any ideas of other ways to do this?
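
One other way, sketched here in Python rather than AS2 and not tested against the RegExp class mentioned above, is to keep every part of the pattern from running past its own tag: `[^>]*` stays inside the opening tag, the closing quote is matched outside the capture group, and a reluctant `.*?` stops at the first `</a>`. The sample string is shortened from the question, and the pattern assumes that `class` comes before `href`.

```python
import re

html = (
    "possible <a class='gloss' href='asfunction:_root.handle, confounding variables'>"
    "<b>confounding variables</b></a> which could explain. "
    "Here <a href='' target='_blank'>is another link</a>.\n"
    "<a class='gloss' href='asfunction:_root.handle, confounding variables'>"
    "confounded variables</a>"
)

GLOSS_LINK = re.compile(
    r"<a\s[^>]*class=['\"]gloss['\"][^>]*href=['\"]([^'\">]+)['\"][^>]*>(.*?)</a>",
    re.IGNORECASE,
)

# Group 1 is the href without the trailing quote, group 2 the link text without </a>.
for href, label in GLOSS_LINK.findall(html):
    print(href, "|", label)
```

The anchor without `class='gloss'` is skipped, because `[^>]*` cannot reach past the end of its opening tag to find a `class=` elsewhere.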

  • Introduction to regular expressions ... last part.

    Continued from Introduction to regular expressions ... continued., here's the third and final part of my introduction to regular expressions. As always, if you find mistakes or have examples that you think could be solved through regular expressions, please post them.
    Having fun with regular expressions - Part 3
    In some cases, I may have to search for different values in the same column. If the searched values are fixed, I can use the logical OR operator or the IN clause, like in this example (using my brute force data generator from part 2):
    SELECT data
      FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE data IN ('abc', 'xyz', '012');
    There are of course some workarounds as presented in this asktom thread but for a quick solution, there's of course an alternative approach available. Remember the "|" pipe symbol as OR operator inside regular expressions? Take a look at this:
    SELECT data
      FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE REGEXP_LIKE(data, '^(abc|xyz|012)$');
    I can even use strings composed of values like 'abc, xyz ,  012' by simply using another regular expression to replace "," and spaces with the "|" pipe symbol. After reading part 1 and 2 that shouldn't be too hard, right? Here's my "thinking in regular expression": Replace every "," and 0 or more leading/trailing spaces.
    Ready to try your own solution?
    Does it look like this?
    SELECT data
      FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE REGEXP_LIKE(data, '^(' || REGEXP_REPLACE('abc, xyz ,  012', ' *, *', '|') || ')$');
    If I didn't use the "^" and "$" metacharacters, this SELECT would search for any occurrence inside the data column, which could be useful if I wanted to combine the LIKE and IN clauses. Take a look at this example where I'm looking for 'abc%', 'xyz%' or '012%' and adding a case insensitive match parameter to it:
    SELECT data
      FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE REGEXP_LIKE(data, '^(abc|xyz|012)', 'i');
    An equivalent non regular expression solution would have to look like this, not mentioning other options with adding an extra "," and using the INSTR function:
    SELECT data
      FROM (SELECT data, LOWER(DATA) search
              FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
           )
    WHERE search LIKE 'abc%'
        OR search LIKE 'xyz%'
        OR search LIKE '012%';
    SELECT data
      FROM (SELECT data, SUBSTR(LOWER(DATA), 1, 3) search
              FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
           )
    WHERE search IN ('abc', 'xyz', '012');
    I'll leave it to your imagination what a complete non regular expression example with 'abc, xyz ,  012' as search condition would look like.
    As mentioned in the first part, regular expressions are not very good at formatting, except for some selected examples, such as phone numbers, which in my demonstration, have different formats. Using regular expressions, I can change them to a uniform representation:
    WITH t AS (SELECT '123-4567' phone
                 FROM dual
                UNION
               SELECT '01 345678'
                 FROM dual
                UNION
               SELECT '7 87 8787'
                 FROM dual
              )
    SELECT t.phone, REGEXP_REPLACE(REGEXP_REPLACE(phone, '[^0-9]'), '(.{3})(.*)', '(\1)-\2')
      FROM t;
    First, all non-digit characters are filtered out; afterwards the remaining string is put into a "(xxx)-xxxx" format, without cutting off any phone numbers that have more than 7 digits. Such a conversion could also be used to check the validity of entered data, and to update the value with a uniform format afterwards.
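
For readers without an Oracle instance at hand, the two nested replacements can be tried in Python. This is a rough equivalent for illustration only; `format_phone` is a hypothetical helper name, and the patterns are the ones from the query above.

```python
import re


def format_phone(raw):
    """Strip non-digits, then wrap the first three digits in brackets."""
    digits = re.sub(r"[^0-9]", "", raw)
    return re.sub(r"(.{3})(.*)", r"(\1)-\2", digits)


for phone in ["123-4567", "01 345678", "7 87 8787"]:
    print(phone, "->", format_phone(phone))
# 123-4567 -> (123)-4567
# 01 345678 -> (013)-45678
# 7 87 8787 -> (787)-8787
```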
    Thinking about it, why not use regular expressions to check other values about their formats? How about an IP4 address? I'll do this step by step, using 127.0.0.1 as the final test case.
    First I want to make sure, that each of the 4 parts of an IP address remains in the range between 0-255. Regular expressions are good at string matching but they don't allow any numeric comparisons. What valid strings do I have to take into consideration?
    Single digit values: 0-9
    Double digit values: 00-99
    Triple digit values: 000-199, 200-255 (this one will be the trickiest part)
    So far, I will have to use the "|" pipe operator to match all of the allowed combinations. I'll use my brute force generator to check if my solution works for a single value:
    SELECT data
      FROM TABLE(regex_utils.gen_data('0123456789', 3))
    WHERE REGEXP_LIKE(data, '^(25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})$');
    More than 255 records? Leading zeros are allowed, but checking on all the records, there's no value above 255. First step accomplished. The second part is to make sure that there are 4 such values, delimited by a "." dot. So I have to check for 0-255 plus a dot 3 times and then check for another 0-255 value. Doesn't sound too complicated, does it?
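
The brute-force check can also be reproduced without the `regex_utils.gen_data` generator. The sketch below is Python, for illustration only, and assumes the candidates are all digit strings of length 1 to 3; the author's generator may produce a different set. It then combines four octets into the full address pattern described above.

```python
import re
from itertools import product

OCTET = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})"
OCTET_RE = re.compile(rf"^{OCTET}$")
# Three times "octet plus dot", then one final octet.
IPV4_RE = re.compile(rf"^({OCTET}\.){{3}}{OCTET}$")

# Every digit string of length 1, 2 and 3.
candidates = ["".join(p) for n in (1, 2, 3) for p in product("0123456789", repeat=n)]
accepted = [c for c in candidates if OCTET_RE.match(c)]

print(len(accepted))                          # 366 (leading zeros are allowed)
print(max(int(c) for c in accepted))          # 255
print(bool(IPV4_RE.match("127.0.0.1")))       # True
print(bool(IPV4_RE.match("256.128.64.32")))   # False
```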
    Using first my brute force generator, I'll check if I've missed any possible combination:
    SELECT data
      FROM TABLE(regex_utils.gen_data('03.', 15))
    WHERE REGEXP_LIKE(data,
                       '^((25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})$'
                      );
    Looks good to me. Let's check on some sample data:
    WITH t AS (SELECT '127.0.0.1' ip
                 FROM dual
                UNION 
               SELECT '256.128.64.32'
                 FROM dual
              )
    SELECT t.ip
      FROM t WHERE REGEXP_LIKE(t.ip,
                       '^((25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})$'
                      );
    No surprises here. I can take this example a bit further and try to format valid addresses to a uniform representation, as shown in the phone number example. My goal is to display every IP address in the "xxx.xxx.xxx.xxx" format, using leading zeros for 2 and 1 digit values.
    Regular expressions don't have any format models like for example the TO_CHAR function, so how could this be achieved? Thinking in regular expressions, I first have to find a way to make sure, that each single number is at least three digits wide. Using my example, this could look like this:
    WITH t AS (SELECT '127.0.0.1' ip
                 FROM dual
              )
    SELECT t.ip, REGEXP_REPLACE(t.ip, '([0-9]+)(\.?)', '00\1\2')
      FROM t;
    Look at this: leading zeros. However, that first value "00127" doesn't look too good, does it? If you thought about using a second regular expression function to remove any excess zeros, you're absolutely right. Just take the past examples and think in regular expressions. Did you come up with something like this?
    WITH t AS (SELECT '127.0.0.1' ip
                 FROM dual
              )
    SELECT t.ip, REGEXP_REPLACE(REGEXP_REPLACE(t.ip, '([0-9]+)(\.?)', '00\1\2'),
                                '[0-9]*([0-9]{3})(\.?)', '\1\2'
                               )
      FROM t;
    Think about the possibilities: Now you can sort a table with unformatted IP addresses, if that is a requirement in your application or you find other values where you can use that "trick".
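
The pad-then-trim trick and the sorting it enables can be sketched in Python as follows. This is for illustration only; `pad_ip` is a hypothetical name, the two patterns are the ones from the query above, and the list of addresses is made up.

```python
import re


def pad_ip(ip):
    """Prefix every number with "00", then keep only its last three digits."""
    padded = re.sub(r"([0-9]+)(\.?)", r"00\1\2", ip)
    return re.sub(r"[0-9]*([0-9]{3})(\.?)", r"\1\2", padded)


print(pad_ip("127.0.0.1"))  # 127.000.000.001

# With uniform width, plain string comparison sorts the addresses numerically.
addresses = ["10.0.0.2", "9.255.1.1", "127.0.0.1"]
print(sorted(addresses, key=pad_ip))  # ['9.255.1.1', '10.0.0.2', '127.0.0.1']
```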
    Since I'm on checking INET (internet) type of values, let's do some more, for example an e-mail address. I'll keep it simple and will only check on the
    "x@x.xxx", "x@x.xxxx" and "x@x.xx.xx" format, where x represents an alphanumeric character. If you want, you can look up the corresponding RFC definition and try to build your own regular expression for that one.
    Now back to this one: At least one alphanumeric character followed by an "@" at sign which is followed by at least one alphanumeric character followed by a "." dot and 3 or 4 more alphanumeric characters, or 2 more characters followed by a "." dot and another 2 characters. This should be an easy one, right? Using some sample e-mail addresses and my brute force generator, you should be able to verify your solution.
    Here's mine:
    SELECT data
      FROM TABLE(regex_utils.gen_data('a1@.', 9))
    WHERE REGEXP_LIKE(data, '^[[:alnum:]]+@[[:alnum:]]+(\.[[:alnum:]]{3,4}|(\.[[:alnum:]]{2}){2})$', 'i');
    Checking on valid domains, in my opinion, should be done in a second function, to keep the checks themselves simple, but that's probably a discussion about readability and taste.
    How about checking a valid URL? I can reuse some parts of the e-mail example and only have to decide what type of URLs I want, for example "http://", "https://" and "ftp://", any subdomain and a "/" after the domain. Using the case insensitive match parameter, this shouldn't take too long, and I can use this thread's URL as a test value. But take a minute to figure that one out for yourself.
    Does it look like this?
    WITH t AS (SELECT 'Introduction to regular expressions ... last part. URL
                 FROM dual
                UNION
               SELECT 'http://x/'
                 FROM dual
    SELECT t.URL
      FROM t
    WHERE REGEXP_LIKE(t.URL, '^(https*|ftp)://(.+\.)*[[:alnum:]]+(\.[[:alnum:]]{3,4}|(\.[[:alnum:]]{2}){2})/', 'i')
    Update: Improvements in 10g2
    All of you, who are using 10g2 or XE (which includes some of 10g2 features) may want to take a look at several improvements in this version. First of all, there are new, perl influenced meta characters.
    Rewriting my example from the first lesson, the WHERE clause would look like this:
    WHERE NOT REGEXP_LIKE(t.col1, '^\d+$')
    Or my example with searching decimal numbers:
    '^(\.\d+|\d+(\.\d*)?)$'
    Saves some space, doesn't it? However, this will only work in 10g2 and future releases.
    Some of those meta characters even include non matching lists, for example "\S" is equivalent to "[^ ]", so my example in the second part could be changed to:
    SELECT NVL(LENGTH(REGEXP_REPLACE('Having fun with regular expressions', '\S')), 0)
      FROM dual;
    Other meta characters support search patterns in strings with newline characters. Just take a look at the link I've included.
    Another interesting meta character is the non-greedy "?". In 10g2, "?" not only means 0 or 1 occurrence; placed directly after another quantifier, it makes that quantifier match as little as possible. Let me illustrate with a simple example:
    SELECT REGEXP_SUBSTR('Having fun with regular expressions', '^.* +')
      FROM dual;
    This is the old style, "greedy" search pattern, returning everything until the last space.
    SELECT REGEXP_SUBSTR('Having fun with regular expressions', '^.*? +')
      FROM dual;
    In 10g2, you'd get only "Having " because of the non-greedy search operation. Simulating that behavior in 10g1, I'd have to change the pattern to this:
    SELECT REGEXP_SUBSTR('Having fun with regular expressions', '^[^ ]+ +')
      FROM dual
      ;
    Another new option is the "x" match parameter. Its purpose is to ignore whitespace characters in the pattern (not in the searched string), which makes it possible to space out a long expression for readability. Checking on unsigned integers with a spaced-out pattern would look like this:
    SELECT REGEXP_SUBSTR('123', '^ [0-9]+ $', 1, 1, 'x')
      FROM dual;
    However, I have to be careful: with "x", a blank in the pattern no longer matches a blank in the searched string.
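
Coming back to the greedy versus non-greedy comparison above: the three variants can be compared quickly in Python. This is an illustration only and says nothing about Oracle's engine itself. In Python's `re`, the `?` has to follow the quantifier it modifies, so the non-greedy form is written `^.*? +`.

```python
import re

s = "Having fun with regular expressions"

greedy = re.match(r"^.* +", s).group()      # .* runs up to the last space
lazy = re.match(r"^.*? +", s).group()       # .*? stops before the first space
negated = re.match(r"^[^ ]+ +", s).group()  # workaround without "?"

print(repr(greedy))   # 'Having fun with regular '
print(repr(lazy))     # 'Having '
print(repr(negated))  # 'Having '
```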
    I hope you enjoyed reading this introduction and hope you'll have some fun with using regular expressions.
    C.
    Fixed some typos ...
    Message was edited by:
    cd
    Included 10g2 features
    Message was edited by:
    cd

    Can I write this condition with only one regular expression in Oracle (regexp_substr in my example)? I mean using only regexp_substr in the select clause, without regexp_like in the where clause.
    For a better understanding of what I'd like to get, here is the
    next example:
    I have strings of two blocks separated by a space.
    The first block has 5 symbols of [01], the second block has 3 symbols of [01].
    In the first block there may optionally be one (!), in the second block there may optionally be one (>).
    The idea is to find such strings with only one regular expression using regexp_substr in the select clause, so that if a string does not satisfy the requirements, null is returned in the result set.
    with t as (select '10(!)010 10(>)1' num from dual union all
    select '1112(!)0 111' from dual union all --incorrect because of '2'
    select '(!)10010 011' from dual union all
    select '10010(!) 101' from dual union all
    select '10010 100(>)' from dual union all
    select '13001 110' from dual union all -- incorrect because of '3'
    select '100!01 100' from dual union all --incorrect because of ! without (!)
    select '100(!)1(!)1 101' from dual union all -- incorrect because of two occurrences of (!)
    select '1001(!)10 101' from dual union all --incorrect because of length of block1=6
    select '1001(!)10 1011' from dual union all --incorrect because of length of block2=4
    select '10110 1(>)11(>)0' from dual union all --incorrect because of two occurrences of (>)
    select '1001(>)1 11(!)0' from dual)--incorrect because (!) and (>) are met not in their blocks
    --end of test data

  • Regular expressions with multi character separator

    I have data like the
    where |`| is the separator for distinguishing two fields of data. I am having trouble writing a regular expression to display the data correctly.
    Connected to:
    Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
    With the Partitioning, OLAP, Data Mining and Real Application Testing options
    SQL> declare
      2  l_string varchar2 (200) :='123` 456 |`|789 10 here|`||223|`|5434|`}22|`|yes';
      3  v varchar2(40);
      4  begin
      5  v:=regexp_substr(l_string, '[^(|`|)]+', 1, 1);
      6  dbms_output.put_line(v);
      7  v:=regexp_substr(l_string, '[^(|`|)]+', 1, 2);
      8  dbms_output.put_line(v);
      9  v:=regexp_substr(l_string, '[^(|`|)]+', 1, 3);
    10  dbms_output.put_line(v);
    11  v:=regexp_substr(l_string, '[^(|`|)]+', 1, 4);
    12  dbms_output.put_line(v);
    13  v:=regexp_substr(l_string, '[^(|`|)]+', 1, 5);
    14  dbms_output.put_line(v);
    15  end;
    16  /
    123
    456
    789 10 here
    223
    5434
    I need it to display
    123` 456
    789 10 here
    |223
    5434|`}22
    yes
    I am not sure how to handle multi-character separators in data using regular expressions.
    Edited by: Clearance 6`- 8`` on Apr 1, 2011 3:35 PM
    Edited by: Clearance 6`- 8`` on Apr 1, 2011 3:37 PM

    Hi,
    Actually, using non-greedy matching, you can do what you want with regular expressions:
    VARIABLE     l_string     VARCHAR2 (100)
    EXEC  :l_string := '123` 456 |`|789 10 here|`||223|`|5434|`}22|`|yes'
    SELECT     LEVEL
    ,     REPLACE ( REGEXP_SUBSTR ( '|`|' || REPLACE ( :l_string
                                                   , '|`|'
                                                   , '|`||`|'
                                                   ) || '|`|'
                                  , '\|`\|.*?\|`\|'
                                  , 1
                                  , LEVEL
                                  )
                  , '|`|'
                  )     AS ITEM
    FROM     dual
    CONNECT BY     LEVEL     <= 7
    ;
    Output:
    LEVEL ITEM
        1 123` 456
        2 789 10 here
        3 |223
        4 5434|`}22
        5 yes
        6
        7
    Here's how it works:
    The pattern
    ~.*?~
    is non-greedy; it matches the smallest possible string that begins and ends with a '~'. So
    REGEXP_SUBSTR ('~SHALL~I~COMPARE~THEE~', '~.*?~', 1, 1) returns '~SHALL~'. However,
    REGEXP_SUBSTR ('~SHALL~I~COMPARE~THEE~', '~.*?~', 1, 2) returns '~COMPARE~'. Why not '~I~'? Because the '~' between 'SHALL' and 'I' was part of the 1st match, so it can't be part of the 2nd match. So the first thing we have to do is double the delimiters; that's what the inner REPLACE does. Then we add delimiters to the beginning and end of the list. Once we've prepared the string like that, we can use the non-greedy REGEXP_SUBSTR to bring back the delimited items, with a delimiter at either end. We don't want those delimiters, so the outer REPLACE removes them.
    I'm not sure this is any better than Sri's solution.
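    For readers more at home in Java, the same trick can be sketched with java.util.regex. This is only an illustration of the technique described above, not part of the Oracle solution; the class and method names (DelimiterSplitSketch, matches, items) are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class: mirrors the double-the-delimiters idea in Java.
final class DelimiterSplitSketch {

    // Returns every non-overlapping match of regex in input, in order.
    static List<String> matches(String input, String regex) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Splits list on a multi-character delimiter using a reluctant quantifier.
    static List<String> items(String list, String delim) {
        String q = Pattern.quote(delim); // treat the delimiter literally
        // Double the inner delimiters, then add one at each end,
        // so every item has its own opening and closing delimiter.
        String prepared = delim + list.replace(delim, delim + delim) + delim;
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(q + "(.*?)" + q).matcher(prepared);
        while (m.find()) {
            out.add(m.group(1)); // group 1 is the item without its delimiters
        }
        return out;
    }

    public static void main(String[] args) {
        // Shows why the undoubled string skips every other item.
        System.out.println(matches("~SHALL~I~COMPARE~THEE~", "~.*?~"));
        System.out.println(items("123` 456 |`|789 10 here|`||223|`|5434|`}22|`|yes", "|`|"));
    }
}
```

    The capturing group does the job of the outer REPLACE in the SQL version: it returns each item with the surrounding delimiters already removed.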

  • Looping through files with Regular expressions ?

    Hi,
    My Question is:
    if i have the following Regular Expression,
    String regrex = "tree\\s\\w{1,4}.+\\s=\\s(.*;)";
    The file in which i am looking for the string has multiple entries, is it
    possible to do another regular expression on the captured group (.*;)
    which is in the original Regular expression ?
    The text that is captured by the RE is of the type "(1,(2,((3,5),4)));" for
    each entry, and different entries in the file have slightly different syntax
    is it possible to loop through the file and first of all check for the presence
    of the original RE in each entry of the file
    and then secondly, check for the presence of another RE on the captured group?
    [ e.g. to check for something like, if the captured group has a 1 followed by a 3
    followed by a 5 followed by a and so on ].
    Thanks Very much for any help, i've been struggling with this for a while!!
    Much appreciated
    The code that i have so far is as follows:
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import javax.swing.JOptionPane;

    public class ExpressReg {
        public String Edit;

        public ExpressReg(String editorEx) {
            Edit = editorEx; // text taken from the JTextArea
            String regrex = "tree\\s\\w{1,4}.+\\s=\\s(.*;)";
            //String regrex1 = "(.*;)";
            Pattern p = Pattern.compile(regrex);
            Matcher m = p.matcher(editorEx);
            boolean result = m.find(); // true if the pattern occurs anywhere in the text
            if (result) {
                JOptionPane.showMessageDialog(null, "String Present in Editor");
            } else {
                JOptionPane.showMessageDialog(null, "String Not Present In Editor");
            }
        }
    }

    if i have the following Regular Expression,
    String regrex = "tree\\s\\w{1,4}.+\\s=\\s(.*;)";
    The file in which i am looking for the string has multiple entries, is it
    possible to do another regular expression on the captured group (.*;)
    which is in the original Regular expression ?
    Yes. The capturing group is $1 (the only one), referenced in source code as m.group(1).
    m.group() returns the entire match.
    Simply use it this way:
    String result = m.group(1);
    // your stuff: could be another validation
    The text that is captured by the RE is of the type "(1,(2,((3,5),4)));" for
    each entry, and different entries in the file have slightly different syntax
    is it possible to loop through the file and first of all check for the presence
    of the original RE in each entry of the file
    and then secondly, check for the presence of another RE on the captured group?
    Again, yes, no limits!
    You don't need to create another Matcher; just call m.reset(anotherSourceString) and loop using the same pattern.
    Note: Take care with ".*" because quantifiers are greedy by default; be more specific where you can, e.g. "\\d" matches only digits (0-9).
    Can you give us a sample of the "slight differences"?
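    As a rough sketch of the two-stage check discussed above: loop over every entry with find(), take group(1), and run a second pattern on it. The second pattern (a 1, later a 3, later a 5) and the names NestedMatchSketch and groupsInOrder are assumptions made up for illustration, since the real "slightly different syntax" of the entries was never posted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class: first regex finds entries, second checks the captured group.
final class NestedMatchSketch {

    // The pattern from the question; group 1 is the tree description ending in ';'.
    static final Pattern ENTRY = Pattern.compile("tree\\s\\w{1,4}.+\\s=\\s(.*;)");

    // Assumed second check: the number 1, later 3, later 5 (word boundaries keep 13 or 35 out).
    static final Pattern ORDER = Pattern.compile("\\b1\\b.*\\b3\\b.*\\b5\\b");

    // Returns the captured groups of all entries that also satisfy ORDER.
    static List<String> groupsInOrder(String text) {
        List<String> hits = new ArrayList<>();
        Matcher entry = ENTRY.matcher(text);
        Matcher order = ORDER.matcher(""); // reused for every captured group via reset()
        while (entry.find()) {             // one iteration per entry in the text
            String captured = entry.group(1);
            if (order.reset(captured).find()) {
                hits.add(captured);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "tree t1 [&R] = (1,(2,((3,5),4)));\n"
                    + "tree t2 [&R] = (1,(5,(3,4)));\n";
        System.out.println(groupsInOrder(text));
    }
}
```

    Because "." does not match line terminators by default, the greedy ".+" and ".*" in the first pattern stay within one line, so each line of the sample is treated as a separate entry.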

  • Regular expressions... they are not regular! =)

    So,
    I've been pulling my hair out with regular expressions. I'm sure there is a logical explanation for this; I've read a bunch of explanations and THOUGHT I understood it, but I don't. Here goes:
    I have a string "2010PETE". I tried matching it against "\\d{1,}" (this is how I entered it in Java). This returns FALSE. However, it seems to me the result should be TRUE: as I read it, the greedy quantifier {1,} looks for the preceding element AT LEAST n times, where in this case n=1, so I interpret the pattern as "if a digit (\\d) is found at least once within the string, then this string matches the regular expression". This does NOT seem to be the case.
    Can someone clear this up for me?

    THANK YOU. I think that is what I was missing, the part about
    "would only match if the input consisted of at least one digit, possibly multiple digits, and nothing else."
    I read the documentation and some of it didn't seem to be clear on that point.
    I'll play around with this and see how far I can get. If I still have questions I will post some code for sure, and try to get a nice, rounded set of examples.
    Thanks!
    ONE OTHER QUESTION I JUST THOUGHT OF: does the .matches() method succeed when some substring of the String matches, or does it have to match the entire String? So, if I have the String "123ABC" and I ask to match "1 or more letters", will it fail because there are non-letters in the String, but then pass if I ask for "1 or more digits followed by 1 or more letters"? In the latter case every character in the String is accounted for by the pattern, as opposed to the first. Is that correct, or are there ways to match JUST some substring of the String instead of the whole thing? I WILL make some examples too... but does that make sense?
    Edited by: pedron on Jan 12, 2012 3:23 PM
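    A minimal sketch of the difference asked about above: String.matches() succeeds only if the whole string matches the pattern, while Matcher.find() looks for a match anywhere in the string. The class and method names here (MatchesVsFindSketch and its helpers) are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class contrasting whole-string matching with substring search.
final class MatchesVsFindSketch {

    // True only if the ENTIRE input matches the regex.
    static boolean wholeStringMatches(String input, String regex) {
        return input.matches(regex);
    }

    // True if ANY substring of the input matches the regex.
    static boolean containsMatch(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Returns the first matching substring, or null if there is none.
    static String firstMatch(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(wholeStringMatches("2010PETE", "\\d{1,}"));      // whole string is not all digits
        System.out.println(containsMatch("2010PETE", "\\d{1,}"));           // digits occur somewhere
        System.out.println(firstMatch("2010PETE", "\\d{1,}"));              // the digit run itself
        System.out.println(wholeStringMatches("123ABC", "[A-Za-z]+"));      // letters only: fails
        System.out.println(wholeStringMatches("123ABC", "\\d+[A-Za-z]+"));  // digits then letters: every character covered
    }
}
```

    So for "123ABC", "1 or more letters" fails with matches() but succeeds with find(), and find() together with group() is the way to pull out just the matching part.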

  • Regular expressions for xml parsing

    I have an XML parsing problem that I have to solve using regular expressions. It's not possible for me to use any method other than regular expressions. But there is a problem that I cannot seem to wrap my head around. I want to extract the contents of a tag, but this tag occurs several times in the XML file and I only want the contents of one particular occurrence. Basically the problem is as follows:
    I want to extract
    <bp:NAME ***stufff***>(I want this part)</bp:NAME>
    This tag can occur in several places. For example here;
    <bp:ORGANISM>
    ***bunch of tags***
    <bp:NAME ***stufff***>***stufff***</bp:NAME>
    ***bunch of tags***
    </bp:ORGANISM>
    or here;
    <bp:DATABASE>
    ***bunch of tags***
    <bp:NAME ***stufff***>***stufff***</bp:NAME>
    ***bunch of tags***
    </bp:DATABASE>
    I do not want the content of those tags. I want the content of the <NAME> tag that is not between either the <ORGANISM> tags or the <DATABASE> tags. These tags can be in any order. For the life of me I cannot seem to figure this problem out. I tried several different approaches. For example, I tried using the following regex
    (?:<bp:NAME [^>]*>([^<]*).*?<bp:ORGANISM>.*?</bp:ORGANISM>|
    <bp:ORGANISM>.*?</bp:ORGANISM>.*?<bp:NAME [^>]*>([^<]*))
    This kind of works: the information I want is either in the first captured group or in the second one, so I just check which group is not empty and that is the one I want. But this only works if there is only one other tag containing the name tag (in this particular regular expression that is the organism tag). Since there is another tag (the database tag) I have to work around, and these tags can be in any order, the regular expression then becomes three times as large and there are six different groups in which the information I want can occur. This does not seem like a good idea to me. There has to be another way to do this. So I tried using the following regex;
    (?:</bp:ORGANISM>)?.*?(?:</bp:DATABASE>)?.*?<bp:NAME [^>]*>([^<]*)
    I thought this would get rid of any occurrences of the other tags in front of the name tag, but it doesn't work either. It seems like it is not greedy enough. Well, I think you get the point. I don't know what to try next so I really need some help.
    Here is an example of the type of data I will run into. The tags can be in any order and they do not always have to occur. In the example below the <DATABASE> tag is not part of the data, and the name tag I want just happens to be in front of the organism tag, but this is not always the case. The name tag I want is the first name tag in the file, namely;
    <bp:NAME rdf:datatype="xsd:string">Progesterone receptor</bp:NAME>
    So I don't want the name tag that is in between the organism tags.
    <bp:protein rdf:ID="CPATH-27885">
      <bp:COMMENT rdf:datatype="xsd:string">
        Belongs to the nuclear hormone receptor family. NR3 subfamily. SIMILARITY: Contains 1 nuclear receptor DNA-binding domain. WEB RESOURCE: Name=NIEHS-SNPs; URL="http://egp.gs.washington.edu/data/pgr/"; WEB RESOURCE: Name=Wikipedia; Note=Progesterone receptor entry; URL="http://en.wikipedia.org/wiki/Progesterone_receptor"; GENE SYNONYMS: NR3C3. COPYRIGHT:  Protein annotation is derived from the UniProt Consortium (http://www.uniprot.org/).  Distributed under the Creative Commons Attribution-NoDerivs License.
      </bp:COMMENT>
      <bp:SYNONYMS rdf:datatype="xsd:string">Nuclear receptor subfamily 3 group C member 3</bp:SYNONYMS>
      <bp:SYNONYMS rdf:datatype="xsd:string">PR</bp:SYNONYMS>
      <bp:NAME rdf:datatype="xsd:string">Progesterone receptor</bp:NAME>
      <bp:ORGANISM>
        <bp:bioSource rdf:ID="CPATH-LOCAL-112384">
          <bp:NAME rdf:datatype="xsd:string">Homo sapiens</bp:NAME>
          <bp:TAXON-XREF>
            <bp:unificationXref rdf:ID="CPATH-LOCAL-112385">
              <bp:DB rdf:datatype="xsd:string">NCBI_TAXONOMY</bp:DB>
              <bp:ID rdf:datatype="xsd:string">9606</bp:ID>
            </bp:unificationXref>
          </bp:TAXON-XREF>
        </bp:bioSource>
      </bp:ORGANISM>
      <bp:SHORT-NAME rdf:datatype="xsd:string">PRGR_HUMAN</bp:SHORT-NAME>
      <bp:XREF>
        <bp:relationshipXref rdf:ID="CPATH-LOCAL-112386">
          <bp:DB rdf:datatype="xsd:string">ENTREZ_GENE</bp:DB>
          <bp:ID rdf:datatype="xsd:string">5241</bp:ID>
        </bp:relationshipXref>
      </bp:XREF>
      <bp:XREF>
        <bp:unificationXref rdf:ID="CPATH-LOCAL-112387">
          <bp:DB rdf:datatype="xsd:string">UNIPROT</bp:DB>
          <bp:ID rdf:datatype="xsd:string">P06401</bp:ID>
        </bp:unificationXref>
      </bp:XREF>
      <bp:XREF>
        <bp:unificationXref rdf:ID="CPATH-LOCAL-112388">
          <bp:DB rdf:datatype="xsd:string">UNIPROT</bp:DB>
          <bp:ID rdf:datatype="xsd:string">A7X8B0</bp:ID>
        </bp:unificationXref>
      </bp:XREF>
      <bp:XREF>
        <bp:relationshipXref rdf:ID="CPATH-LOCAL-112389">
          <bp:DB rdf:datatype="xsd:string">GENE_SYMBOL</bp:DB>
          <bp:ID rdf:datatype="xsd:string">PGR</bp:ID>
        </bp:relationshipXref>
      </bp:XREF>
      <bp:XREF>
        <bp:relationshipXref rdf:ID="CPATH-LOCAL-112390">
          <bp:DB rdf:datatype="xsd:string">REF_SEQ</bp:DB>
          <bp:ID rdf:datatype="xsd:string">NP_000917</bp:ID>
        </bp:relationshipXref>
      </bp:XREF>
      <bp:XREF>
        <bp:unificationXref rdf:ID="CPATH-LOCAL-112391">
          <bp:DB rdf:datatype="xsd:string">UNIPROT</bp:DB>
          <bp:ID rdf:datatype="xsd:string">Q9UPF7</bp:ID>
        </bp:unificationXref>
      </bp:XREF>
      <bp:XREF>
        <bp:unificationXref rdf:ID="CPATH-LOCAL-113580">
          <bp:DB rdf:datatype="http://www.w3.org/2001/XMLSchema#string">CPATH</bp:DB>
          <bp:ID rdf:datatype="http://www.w3.org/2001/XMLSchema#string">27885</bp:ID>
        </bp:unificationXref>
      </bp:XREF>
    </bp:protein>
    Edited by: Dani3ll3 on Nov 19, 2009 2:51 AM

    Dani3ll3 wrote:
    Thanks a lot after I did that the regular expression worked. :)
    Good. But remember that in real life, you would then have to apply the XML rules to get the actual contents of the text node. For example it might be a CDATA section or it might include characters like ampersands which have been escaped and which you need to unescape. That's why it's better to use a proper parser, as already suggested.
    It seems to me this forum is full of posts where people are doing homework questions which teach them to do things the wrong way. But of course there's nothing the student can do about that.
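    For anyone stuck with the same regex-only constraint, one possible approach (not necessarily the one that worked in this thread, which isn't shown) is to do it in two passes: first delete the ORGANISM and DATABASE blocks, then look for the NAME tag in what is left. The sketch below assumes those blocks are never nested inside a block of the same name, and the names TopLevelNameSketch and topLevelName are made up. The caveats above about CDATA and escaped characters still apply.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical two-pass sketch: strip unwanted blocks, then match the remaining NAME tag.
final class TopLevelNameSketch {

    // A whole ORGANISM or DATABASE element; \1 ties the closing tag to the opening one.
    // DOTALL lets ".*?" run across line breaks; the reluctant quantifier stops at the first close tag.
    private static final Pattern UNWANTED = Pattern.compile(
            "<bp:(ORGANISM|DATABASE)\\b[^>]*>.*?</bp:\\1>", Pattern.DOTALL);

    // A NAME element; group 1 is its text content.
    private static final Pattern NAME = Pattern.compile(
            "<bp:NAME\\b[^>]*>([^<]*)</bp:NAME>");

    // Returns the text of the first NAME outside ORGANISM/DATABASE, or null if none.
    static String topLevelName(String xml) {
        String stripped = UNWANTED.matcher(xml).replaceAll("");
        Matcher m = NAME.matcher(stripped);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String xml = "<bp:protein>\n"
                + "<bp:ORGANISM>\n<bp:bioSource>\n"
                + "<bp:NAME rdf:datatype=\"xsd:string\">Homo sapiens</bp:NAME>\n"
                + "</bp:bioSource>\n</bp:ORGANISM>\n"
                + "<bp:NAME rdf:datatype=\"xsd:string\">Progesterone receptor</bp:NAME>\n"
                + "</bp:protein>";
        System.out.println(topLevelName(xml));
    }
}
```

    Because the unwanted blocks are removed before the NAME search, the order of the tags no longer matters and the pattern does not grow with each extra tag that has to be excluded.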
