Optional regular expression groups

Hi,
I wonder if you could help me parse some name value pairs with RE?
I have a string of name value pairs and would like to parse out certain parts of certain pairs efficiently.
The string is like:
"name1=value1; name2=value2; FOO=value3:xyz; BAR_123=value4;"
or
The FOO and BAR pairs may or may not exist.
I am trying to extract the 'xyz' and '123' strings into groups.
I have tried umpteen capturing group and non-capturing group possibilities but none seem to either match or give null values.
One try that didn't seem to work at all was:
String regex =
               "\"" +
                    "(BAR[\\d]+=[\\S&&[^;]]+; |" +
                    "-|" +
                    "FOO=[\\S&&[^:]]+:[\\w]+; |" +
                    "[\\S&&[^=]]+=[\\S&&[^;]]+; )+" +
                    "" +
another slightly more successful one that returned nulls in the BAR pair was:
          String regex =
               "\"[\\w\\d\\s;_=]*?" +
               "(?:(?:FOO=[\\S&&[^:]]+:)([\\w]+))?" +
                    "[\\w\\d\\s;_=]*?" +
                    "(?:(?:BAR_)([\\d]*)=)?" +
                    "[\\w\\d\\s;_=]*?" +
I just couldn't seem to get any values out for the 'BAR' nvp....
Thanks,

Let's start by getting some obvious errors out of the way. First, the lookingAt() method returns true iff the regex matches starting at the beginning of the target string. That works okay with this test data (as would the matches() method), but it sounds like you're going to be searching for these quote-delimited sequences within a larger string; for that you need to use find(). Second, in your FOO and BAR alternatives, you're trying to match "FOO" or "BAR" immediately after the opening quote, but there seems to be other text in there that has to be matched first. There's also text following the part you're interested in, but preceding the closing quote, which you should also match. Finally, the pipe (or alternation operator) has lower precedence than any other regex metacharacter, so you need to enclose it in parentheses to limit its effect to just the text within the quotes. Taking your regex as a starting point, this is how that would look: "\"(?:([-])|FOO=\\w+:([^;]+)|BAR_(\\d+))\"" Notice also how I used non-capturing parens so as not to mess up the capturing-group numbers.
Now to the actual problem. The first three cases can be handled with a relatively simple regex (ugly as hell, but conceptually pretty straightforward). "\"(?:(-)|[^\"]*?(?:FOO=\\w+:([^;]+)|BAR_(\\d+))[^\"]*)\"" Plug that into your code and you should see that it yields the correct result for the first three strings, but not the fourth. In fact, it isn't really handling the second and third strings correctly. It's just testing for "X or Y" when it's supposed to be testing for "X and not Y" or "Y and not X" or "X and Y". Testing for the absence of a multicharacter sequence is fairly tricky, especially when you're also testing for the presence of another sequence. However, if you know that the FOO segment will always appear earlier in the string than the BAR segment, it will simplify things a great deal. The regex can take the form "X optionally followed by Y, or just Y": "\"(?:(-)|[^\"]*?(?:FOO=\\w+:([^;]+)(?:[^\"]*?BAR_(\\d+))?|BAR_(\\d+))[^\"]*)\""{code} This regex always captures the FOO value in group(2).  If there is also a BAR value, it will be captured in group(3); otherwise that group will be null.  If there is only a BAR value, you'll find it in group(4).
If you don't know in what order the FOO and BAR segments will appear, the regex will be twice as ugly.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

Similar Messages

  • Get all groups from a regular expression match

    Please help me understand how to use Java regular expressions:
    I have an expression similar to this:
    {noformat}"([^X]+)(X[^X]*)+"{noformat}This should match stuff like "asaasaXdfdfdfXXsdsfd".
    How does one access all the matches for the second group (the second groups has a Kleene operator
    added so it is not really just one group --- but match.groupCount() is always 2)
    Here is roughly the code:
    {noformat}java.util.regex.Pattern pattern = {noformat}{noformat}java.util.regex.Pattern.compile({noformat}{noformat}"([^X]+)(X[^X]*)+",{noformat}{noformat}java.util.regex.Pattern.MULTILINE{noformat}{noformat});{noformat}{noformat}java.util.regex.Matcher matcher = pattern.matcher(text);{noformat}{noformat}matcher.find();{noformat}{noformat}int groupcount = matcher.groupCount();{noformat}
    Also, without matcher.find() I get an illegalStateException .. which I also get if I use matcher.matches() instead
    of matcher.find().
    I am obviously missing something here. There is always at least one "X" in the string so shouldn't that pattern always
    match the whole string? Since there are often multiple X, shouldnt I get a group for each occurrence of X, followed
    by 0 or more other characters?
    {noformat}But when I try to match everything by using "^([^X]+)(X[^X]*)+$" I get an "IllegalStateException: No match available" again.{noformat}
    What is the correct way to do this?
    Edited by: johann_p on May 16, 2008 10:39 AM

    I am sorry I messed this up. Here is a SSCCE:
    import java.util.regex.Pattern;
    import java.util.regex.Matcher;
    class RegExp1 {
        public static void main(String[] args) {
          String testString = "first|aaaa | bbbb\n|cccc|ddddd";
          Pattern pattern = Pattern.compile("^([^|]+)(\\|[^|]*)+$");
          Matcher matcher = pattern.matcher(testString);
          matcher.find();
          int groupcount = matcher.groupCount();
          System.out.println("Found "+groupcount+" groups");
          System.out.println("Matcher: "+matcher);
          for (int i = 1; i <= groupcount; i++) {
            System.out.println("Match "+i+": "+testString.substring(matcher.start(i),matcher.end(i)));
    }I figured out a small bug in my first code that explains some of the exception oddities, but my principal question remains:
    how do I access all the matches that correspond to the second capturing group?
    In the example I would get "first" for Match 1 and "|ddddd" for Match 2, but how do I access all the matches??
    Thank you for your help!

  • Regular expressions and capture groups

    Hi everyone :)
    Is there a way to override the default behaviour of capture groups in regular expressions? More specifically I want to override this:
    "The captured input associated with a group is always the subsequence that the group most recently matched."
    For example, if I have a string that is this:
    * <comment one>
    * <comment two>
    <some text>
    I have a pattern of the form "(.*)(/\\*.*\\*/)(.*)" which will match multi-line comments. I have also specified the flag DOTALL so that the predefined character class '.' matches over line-breaks.
    If I apply this pattern to the above string I get comment two being captured, not comment one. This is because of the stipulation that I cited above.
    I need to be able to capture only the first match, and prevent the capture group from being overwritten by more recent matches.
    Is this possible? Any ideas?
    Thanks in advance.
    Kind regards,
    Ben Deany

    Is there a way to override the default behaviour of
    capture groups in regular expressions? More
    specifically I want to override this:No, but you don't need to.
    I have a pattern of the form "(.*)(/\\*.*\\*/)(.*)"
    which will match multi-line comments.Comment two is captured by the second group because comment one is eaten by the first group. Use the reluctant quantifier "*?" on the . in the first group instead of the greedy quantifier "*" to get what is apparently the behavior you want. Then the first group will contain nothing, the second group will contain comments one and two, and the third group will contain the following text.
    .* is a very powerful thing to use. It will match everything in its path, guzzling text like moonshine at Mardi Gras. The only reason it doesn't match comment two as well is because then the expression as a whole would not match.
    The parentheses surrounding the first and third groups are not needed (unless you want to use group methods on them too).

  • Regular Expression  Multiple grouping

    Dear all
    I want to do a grouping of multiple criterias as expression .
    Single grouping is possible , but when group within a group comes things doesnt work for me.
    User would define the comibination
    For example i have 4 criterias
    C1
    C2
    C3
    C4
    Basic grouping works for example :
    (C1&C2) | (C3&C4)
    Complicated grouping i have no idea how to implement it.
    (((C1&C2) | (C3&C4))&(C1&C2) )
    I need validate if the expression is correct and need to pick up all the groups and do the AND or OR.
    Any help woul dbe appriciated
    Regards
    pacs

    You cannot use regular expressions for this since there can be nested parenthesis which is out of the scope of regex-es.
    Continued here: [http://forums.sun.com/thread.jspa?threadID=5418262]

  • Grouping & Back-references with regular expressions on Replace Text window

    I really appreciate the inclusion of the Regular Expressions in the search & replace feature. One thing I am missing is back-references in the replacement expression. For instance, in the unix tools vi or sed, I might do something like this:
    s/\(firstPart\) \(secondPart\) \(oldThirdPart\)/\2 \1 newThirdPart/g
    which would allow me to switch the places of firstPart and secondPart, and totally replace thirdPart. If grouping and back-references are already present in the Replace Text window, how does one correctly invoke them?

    duplicate of Grouping & Back-references with regular expressions on Replace Text window

  • Regular Expressions - replace inner group

    I'm trying to do the following:
    1. Given the following string:
    Truth, like tumors, require a cut to be removed.
    2. Grab the whole sentence and the following:
    like tumors
    3. Replace "like tumors" with "like posion" and add one line to the sentence like so:
    Truth, like posion, require a cut to be removed.
    To be free you must endure pain for a season to finally be free from it.
    I can grab the sentence with reg expressions, I can even grab the second search (#2) in a group, I'm just wondering if there is an elegant way to replace both at the same time. I could always replace one then search and replace the next, but that seems redundant since I can grab both the first time around. Any help?

    according to:
    http://javaalmanac.com/egs/java.util.regex/GroupInRep.html?l=find
    // Compile regular expression
    String patternStr = "\\((\\w+)\\)";
    String replaceStr = "<$1>";
    Pattern pattern = Pattern.compile(patternStr);
    // Replace all (\w+) with <$1>
    CharSequence inputStr = "a (b c) d (ef) g";
    Matcher matcher = pattern.matcher(inputStr);
    String output = matcher.replaceAll(replaceStr);
    // a (b c) d <ef> gthis should be exactly what you're looking for.

  • Re: [iPlanet-JATO] Re: Use Of models in utility classes - Pease don't forget about the regular expression potential

    Namburi,
    When you said you used the Reg Exp tool, did you use it only as
    preconfigured by the iMT migrate application wizard?
    Because the default configuration of the regular expression tool will only
    target the files in your ND project directories. If you wish to target
    classes outside of the normal directory scope, you have to either modify the
    "Source Directory" property OR create another instance of the regular
    expression tool. See the "Tool" menu in the iMT to create additional tool
    instances which can each be configured to target different sets of files
    using different sets of rules.
    Usually, I utilize 3 different sets of rules files on a given migration:
    spider2jato.xml
    these are the generic conversion rules (but includes the optimized rules for
    ViewBean and Model based code, i.e. these rules do not utilize the
    RequestManager since it is not needed for code running inside the ViewBean
    or Model classes)
    I run these rules against all files.
    See the file download section of this forum for periodic updates to these
    rules.
    nonProjectFileRules.xml
    these include rules that add the necessary
    RequestManager.getRequestContext(). etc prefixes to many of the common
    calls.
    I run these rules against user module and any other classes that do not are
    not ModuleServlet, ContainerView, or Model classes.
    appXRules.xml
    these rules include application specific changes that I discover while
    working on the project. A common thing here is changing import statements
    (since the migration tool moves ND project code into different jato
    packaging structure, you sometime need to adjust imports in non-project
    classes that previously imported ND project specific packages)
    So you see, you are not limited to one set of rules at all. Just be careful
    to keep track of your backups (the regexp tool provides several options in
    its Expert Properties related to back up strategies).
    ----- Original Message -----
    From: <vnamboori@y...>
    Sent: Wednesday, August 08, 2001 6:08 AM
    Subject: [iPlanet-JATO] Re: Use Of models in utility classes - Pease don't
    forget about the regular expression potential
    Thanks Matt, Mike, Todd
    This is a great input for our migration. Though we used the existing
    Regular Expression Mapping tool, we did not change this to meet our
    own needs as mentioned by Mike.
    We would certainly incorporate this to ease our migration.
    Namburi
    --- In iPlanet-JATO@y..., "Todd Fast" <toddwork@c...> wrote:
    All--
    Great response. By the way, the Regular Expression Tool uses thePerl5 RE
    syntax as implemented by Apache OROMatcher. If you're doing lotsof these
    sorts of migration changes manually, you should definitely buy theO'Reilly
    book "Mastering Regular Expressions" and generate some rules toautomate the
    conversion. Although they are definitely confusing at first,regular
    expressions are fairly easy to understand with some documentation,and are
    superbly effective at tackling this kind of migration task.
    Todd
    ----- Original Message -----
    From: "Mike Frisino" <Michael.Frisino@S...>
    Sent: Tuesday, August 07, 2001 5:20 PM
    Subject: Re: [iPlanet-JATO] Use Of models in utility classes -Pease don't
    forget about the regular expression potential
    Also, (and Matt's document may mention this)
    Please bear in mind that this statement is not totally correct:
    Since the migration tool does not do much of conversion for
    these
    utilities we have to do manually.Remember, the iMT is a SUITE of tools. There is the extractiontool, and
    the translation tool, and the regular expression tool, and severalother
    smaller tools (like the jar and compilation tools). It is correctto state
    that the extraction and translation tools only significantlyconvert the
    primary ND project objects (the pages, the data objects, and theproject
    classes). The extraction and translation tools do minimumtranslation of the
    User Module objects (i.e. they repackage the user module classes inthe new
    jato module packages). It is correct that for all other utilityclasses
    which are not formally part of the ND project, the extraction and
    translation tools do not perform any migration.
    However, the regular expression tool can "migrate" any arbitrary
    file
    (utility classes etc) to the degree that the regular expressionrules
    correlate to the code present in the arbitrary file. So first andforemost,
    if you have alot of spider code in your non-project classes youshould
    consider using the regular expression tool and if warranted adding
    additional rules to reduce the amount of manual adjustments thatneed to be
    made. I can stress this enough. We can even help you write theregular
    expression rules if you simply identify the code pattern you wish to
    convert. Just because there is not already a regular expressionrule to
    match your need does not mean it can't be written. We have notnearly
    exhausted the possibilities.
    For example if you say, we need to convert
    CSpider.getDataObject("X");
    To
    RequestManager.getRequestContext().getModelManager().getModel(XModel.class);
    Maybe we or somebody else in the list can help write that regularexpression if it has not already been written. For instance in thelast
    updated spider2jato.xml file there is already aCSpider.getCommonPage("X")
    rule:
    <!--getPage to getViewBean-->
    <mapping-rule>
    <mapping-rule-primarymatch>
    <![CDATA[CSpider[.\s]*getPage[\s]*\(\"([^"]*)\"]]>
    </mapping-rule-primarymatch>
    <mapping-rule-replacement>
    <mapping-rule-match>
    <![CDATA[CSpider[.\s]*getPage[\s]*\(\"([^"]*)\"]]>
    </mapping-rule-match>
    <mapping-rule-substitute>
    <![CDATA[getViewBean($1ViewBean.class]]>
    </mapping-rule-substitute>
    </mapping-rule-replacement>
    </mapping-rule>
    Following this example a getDataObject to getModel would look
    like this:
    <mapping-rule>
    <mapping-rule-primarymatch>
    <![CDATA[CSpider[.\s]*getDataObject[\s]*\(\"([^"]*)\"]]>
    </mapping-rule-primarymatch>
    <mapping-rule-replacement>
    <mapping-rule-match>
    <![CDATA[CSpider[.\s]*getDataObject[\s]*\(\"([^"]*)\"]]>
    </mapping-rule-match>
    <mapping-rule-substitute>
    <![CDATA[getModel($1Model.class]]>
    </mapping-rule-substitute>
    </mapping-rule-replacement>
    </mapping-rule>
    In fact, one migration developer already wrote that rule andsubmitted it
    for inclusion in the basic set. I will post another upgrade to thebasic
    regular expression rule set, look for a "file uploaded" posting.Also,
    please consider contributing any additional generic rules that youhave
    written for inclusion in the basic set.
    Please not, that in some cases (Utility classes in particular)
    the rule
    application may be more effective as TWO sequention rules ratherthan one
    monolithic rule. Again using the example above, it will convert
    CSpider.getDataObject("Foo");
    To
    getModel(FooModel.class);
    Now that is the most effective conversion for that code if that
    code is in
    a page or data object class file. But if that code is in a Utilityclass you
    really want:
    >
    RequestManager.getRequestContext().getModelManager().getModel(FooModel.class
    So to go from
    getModel(FooModel.class);
    To
    RequestManager.getRequestContext().getModelManager().getModel(FooModel.class
    You would apply a second rule AND you would ONLY run this rule
    against
    your utility classes so that you would not otherwise affect yourViewBean
    and Model classes which are completely fine with the simplegetModel call.
    <mapping-rule>
    <mapping-rule-primarymatch>
    <![CDATA[getModel\(]]>
    </mapping-rule-primarymatch>
    <mapping-rule-replacement>
    <mapping-rule-match>
    <![CDATA[getModel\(]]>
    </mapping-rule-match>
    <mapping-rule-substitute>
    <![CDATA[RequestManager.getRequestContext().getModelManager().getModel(]]>
    </mapping-rule-substitute>
    </mapping-rule-replacement>
    </mapping-rule>
    A similer rule can be applied to getSession and other CSpider APIcalls.
    For instance here is the rule for converting getSession calls toleverage
    the RequestManager.
    <mapping-rule>
    <mapping-rule-primarymatch>
    <![CDATA[getSession\(\)\.]]>
    </mapping-rule-primarymatch>
    <mapping-rule-replacement>
    <mapping-rule-match>
    <![CDATA[getSession\(\)\.]]>
    </mapping-rule-match>
    <mapping-rule-substitute>
    <![CDATA[RequestManager.getSession().]]>
    </mapping-rule-substitute>
    </mapping-rule-replacement>
    </mapping-rule>
    ----- Original Message -----
    From: "Matthew Stevens" <matthew.stevens@e...>
    Sent: Tuesday, August 07, 2001 12:56 PM
    Subject: RE: [iPlanet-JATO] Use Of models in utility classes
    Namburi,
    I will post a document to the group site this evening which has
    the
    details
    on various tactics of migrating these type of utilities.
    Essentially,
    you
    either need to convert these utilities to Models themselves or
    keep the
    utilities as is and simply use the
    RequestManager.getRequestContext.getModelManager().getModel()
    to statically access Models.
    For CSpSelect.executeImmediate() I have an example of customhelper
    method
    as a replacement whicch uses JDBC results instead of
    CSpDBResult.
    matt
    -----Original Message-----
    From: vnamboori@y... [mailto:<a href="/group/SunONE-JATO/post?protectID=081071113213093190112061186248100208071048">vnamboori@y...</a>]
    Sent: Tuesday, August 07, 2001 3:24 PM
    Subject: [iPlanet-JATO] Use Of models in utility classes
    Hi All,
    In the present ND project we have lots of utility classes.
    These
    classes in diffrent directory. Not part of nd pages.
    In these classes we access the dataobjects and do themanipulations.
    So we access dataobjects directly like
    CSpider.getDataObject("do....");
    and then execute it.
    Since the migration tool does not do much of conversion forthese
    utilities we have to do manually.
    My question is Can we access the the models in the postmigration
    sameway or do we need requestContext?
    We have lots of utility classes which are DataObjectintensive. Can
    someone suggest a better way to migrate this kind of code.
    Thanks
    Namburi
    [email protected]
    [email protected]
    [Non-text portions of this message have been removed]
    [email protected]
    [email protected]

    Namburi,
    When you said you used the Reg Exp tool, did you use it only as
    preconfigured by the iMT migrate application wizard?
    Because the default configuration of the regular expression tool will only
    target the files in your ND project directories. If you wish to target
    classes outside of the normal directory scope, you have to either modify the
    "Source Directory" property OR create another instance of the regular
    expression tool. See the "Tool" menu in the iMT to create additional tool
    instances which can each be configured to target different sets of files
    using different sets of rules.
    Usually, I utilize 3 different sets of rules files on a given migration:
    spider2jato.xml
    these are the generic conversion rules (but includes the optimized rules for
    ViewBean and Model based code, i.e. these rules do not utilize the
    RequestManager since it is not needed for code running inside the ViewBean
    or Model classes)
    I run these rules against all files.
    See the file download section of this forum for periodic updates to these
    rules.
    nonProjectFileRules.xml
    these include rules that add the necessary
    RequestManager.getRequestContext(). etc prefixes to many of the common
    calls.
    I run these rules against user module and any other classes that do not are
    not ModuleServlet, ContainerView, or Model classes.
    appXRules.xml
    these rules include application specific changes that I discover while
    working on the project. A common thing here is changing import statements
    (since the migration tool moves ND project code into different jato
    packaging structure, you sometime need to adjust imports in non-project
    classes that previously imported ND project specific packages)
    So you see, you are not limited to one set of rules at all. Just be careful
    to keep track of your backups (the regexp tool provides several options in
    its Expert Properties related to back up strategies).
    ----- Original Message -----
    From: <vnamboori@y...>
    Sent: Wednesday, August 08, 2001 6:08 AM
    Subject: [iPlanet-JATO] Re: Use Of models in utility classes - Pease don't
    forget about the regular expression potential
    Thanks Matt, Mike, Todd
    This is a great input for our migration. Though we used the existing
    Regular Expression Mapping tool, we did not change this to meet our
    own needs as mentioned by Mike.
    We would certainly incorporate this to ease our migration.
    Namburi
    --- In iPlanet-JATO@y..., "Todd Fast" <toddwork@c...> wrote:
    All--
    Great response. By the way, the Regular Expression Tool uses thePerl5 RE
    syntax as implemented by Apache OROMatcher. If you're doing lotsof these
    sorts of migration changes manually, you should definitely buy theO'Reilly
    book "Mastering Regular Expressions" and generate some rules toautomate the
    conversion. Although they are definitely confusing at first,regular
    expressions are fairly easy to understand with some documentation,and are
    superbly effective at tackling this kind of migration task.
    Todd
    ----- Original Message -----
    From: "Mike Frisino" <Michael.Frisino@S...>
    Sent: Tuesday, August 07, 2001 5:20 PM
    Subject: Re: [iPlanet-JATO] Use Of models in utility classes -Pease don't
    forget about the regular expression potential
    Also, (and Matt's document may mention this)
    Please bear in mind that this statement is not totally correct:
    Since the migration tool does not do much of conversion for
    these
    utilities we have to do manually.Remember, the iMT is a SUITE of tools. There is the extractiontool, and
    the translation tool, and the regular expression tool, and severalother
    smaller tools (like the jar and compilation tools). It is correctto state
    that the extraction and translation tools only significantlyconvert the
    primary ND project objects (the pages, the data objects, and theproject
    classes). The extraction and translation tools do minimumtranslation of the
    User Module objects (i.e. they repackage the user module classes inthe new
    jato module packages). It is correct that for all other utilityclasses
    which are not formally part of the ND project, the extraction and
    translation tools do not perform any migration.
    However, the regular expression tool can "migrate" any arbitrary
    file
    (utility classes etc) to the degree that the regular expressionrules
    correlate to the code present in the arbitrary file. So first andforemost,
    if you have alot of spider code in your non-project classes youshould
    consider using the regular expression tool and if warranted adding
    additional rules to reduce the amount of manual adjustments thatneed to be
    made. I can stress this enough. We can even help you write theregular
    expression rules if you simply identify the code pattern you wish to
    convert. Just because there is not already a regular expressionrule to
    match your need does not mean it can't be written. We have notnearly
    exhausted the possibilities.
    For example if you say, we need to convert
    CSpider.getDataObject("X");
    To
    RequestManager.getRequestContext().getModelManager().getModel(XModel.class);
    Maybe we or somebody else in the list can help write that regularexpression if it has not already been written. For instance in thelast
    updated spider2jato.xml file there is already aCSpider.getCommonPage("X")
    rule:
    <!--getPage to getViewBean-->
    <mapping-rule>
    <mapping-rule-primarymatch>
    <![CDATA[CSpider[.\s]*getPage[\s]*\(\"([^"]*)\"]]>
    </mapping-rule-primarymatch>
    <mapping-rule-replacement>
    <mapping-rule-match>
    <![CDATA[CSpider[.\s]*getPage[\s]*\(\"([^"]*)\"]]>
    </mapping-rule-match>
    <mapping-rule-substitute>
    <![CDATA[getViewBean($1ViewBean.class]]>
    </mapping-rule-substitute>
    </mapping-rule-replacement>
    </mapping-rule>
    Following this example a getDataObject to getModel would look
    like this:
    <mapping-rule>
    <mapping-rule-primarymatch>
    <![CDATA[CSpider[.\s]*getDataObject[\s]*\(\"([^"]*)\"]]>
    </mapping-rule-primarymatch>
    <mapping-rule-replacement>
    <mapping-rule-match>
    <![CDATA[CSpider[.\s]*getDataObject[\s]*\(\"([^"]*)\"]]>
    </mapping-rule-match>
    <mapping-rule-substitute>
    <![CDATA[getModel($1Model.class]]>
    </mapping-rule-substitute>
    </mapping-rule-replacement>
    </mapping-rule>
    In fact, one migration developer already wrote that rule andsubmitted it
    for inclusion in the basic set. I will post another upgrade to thebasic
    regular expression rule set, look for a "file uploaded" posting.Also,
    please consider contributing any additional generic rules that youhave
    written for inclusion in the basic set.
    Please not, that in some cases (Utility classes in particular)
    the rule
    application may be more effective as TWO sequention rules ratherthan one
    monolithic rule. Again using the example above, it will convert
    CSpider.getDataObject("Foo");
    To
    getModel(FooModel.class);
    Now that is the most effective conversion for that code if that
    code is in
    a page or data object class file. But if that code is in a Utilityclass you
    really want:
    >
    RequestManager.getRequestContext().getModelManager().getModel(FooModel.class
    So to go from
    getModel(FooModel.class);
    To
    RequestManager.getRequestContext().getModelManager().getModel(FooModel.class
    You would apply a second rule AND you would ONLY run this rule
    against
    your utility classes so that you would not otherwise affect yourViewBean
    and Model classes which are completely fine with the simplegetModel call.
    <mapping-rule>
    <mapping-rule-primarymatch>
    <![CDATA[getModel\(]]>
    </mapping-rule-primarymatch>
    <mapping-rule-replacement>
    <mapping-rule-match>
    <![CDATA[getModel\(]]>
    </mapping-rule-match>
    <mapping-rule-substitute>
    <![CDATA[RequestManager.getRequestContext().getModelManager().getModel(]]>
    </mapping-rule-substitute>
    </mapping-rule-replacement>
    </mapping-rule>
    A similer rule can be applied to getSession and other CSpider APIcalls.
    For instance here is the rule for converting getSession calls toleverage
    the RequestManager.
    <mapping-rule>
    <mapping-rule-primarymatch>
    <![CDATA[getSession\(\)\.]]>
    </mapping-rule-primarymatch>
    <mapping-rule-replacement>
    <mapping-rule-match>
    <![CDATA[getSession\(\)\.]]>
    </mapping-rule-match>
    <mapping-rule-substitute>
    <![CDATA[RequestManager.getSession().]]>
    </mapping-rule-substitute>
    </mapping-rule-replacement>
    </mapping-rule>
    ----- Original Message -----
    From: "Matthew Stevens" <matthew.stevens@e...>
    Sent: Tuesday, August 07, 2001 12:56 PM
    Subject: RE: [iPlanet-JATO] Use Of models in utility classes
    Namburi,
    I will post a document to the group site this evening which has
    the
    details
    on various tactics of migrating these type of utilities.
    Essentially,
    you
    either need to convert these utilities to Models themselves or
    keep the
    utilities as is and simply use the
    RequestManager.getRequestContext.getModelManager().getModel()
    to statically access Models.
    For CSpSelect.executeImmediate() I have an example of customhelper
    method
    as a replacement whicch uses JDBC results instead of
    CSpDBResult.
    matt
    -----Original Message-----
    From: vnamboori@y... [mailto:<a href="/group/SunONE-JATO/post?protectID=081071113213093190112061186248100208071048">vnamboori@y...</a>]
    Sent: Tuesday, August 07, 2001 3:24 PM
    Subject: [iPlanet-JATO] Use Of models in utility classes
    Hi All,
    In the present ND project we have lots of utility classes.
    These
    classes in diffrent directory. Not part of nd pages.
    In these classes we access the dataobjects and do themanipulations.
    So we access dataobjects directly like
    CSpider.getDataObject("do....");
    and then execute it.
    Since the migration tool does not do much of conversion forthese
    utilities we have to do manually.
    My question is Can we access the the models in the postmigration
    sameway or do we need requestContext?
    We have lots of utility classes which are DataObjectintensive. Can
    someone suggest a better way to migrate this kind of code.
    Thanks
    Namburi
    [email protected]
    [email protected]
    [Non-text portions of this message have been removed]
    [email protected]
    [email protected]

  • [SOLVED]ZSH and regular expressions

    Hi
    I am getting into regular expressions and i have noticed that with my .zshrc file i have some problem. In bash this expression works:
    \^\[^#]
    but not also in zsh. I have also noted that regular expression works fine with other zshrc configurations found in archwiki (like grml) but i want to have my configuration. And i really can't find what command make a difference
    My .zshrc file is pulled from this site https://github.com/slashbeast/things/bl … s/DOTzshrc.
    # .zshrc
    # Author: Piotr Karbowski <[email protected]>
    # License: beerware.
    # Basic zsh config.
    umask 077
    ZDOTDIR=${ZDOTDIR:-${HOME}}
    ZSHDDIR="${HOME}/.config/zsh.d"
    HISTFILE="${ZDOTDIR}/.zsh_history"
    HISTSIZE='10000'
    SAVEHIST="${HISTSIZE}"
    export EDITOR="/usr/bin/vim"
    export TMP="$HOME/tmp"
    export TEMP="$TMP"
    export TMPDIR="$TMP"
    export TMPPREFIX="${TMPDIR}/zsh"
    if [ ! -d "${TMP}" ]; then mkdir "${TMP}"; fi
    if ! [[ "${PATH}" =~ "^${HOME}/bin" ]]; then
    export PATH="${HOME}/bin:${PATH}"
    fi
    # Not all servers have terminfo for rxvt-256color. :<
    if [ "${TERM}" = 'rxvt-256color' ] && ! [ -f '/usr/share/terminfo/r/rxvt-256color' ] && ! [ -f '/lib/terminfo/r/rxvt-256color' ] && ! [ -f "${HOME}/.terminfo/r/rxvt-256color" ]; then
    export TERM='rxvt-unicode'
    fi
    # Colors.
    red='\e[0;31m'
    RED='\e[1;31m'
    green='\e[0;32m'
    GREEN='\e[1;32m'
    yellow='\e[0;33m'
    YELLOW='\e[1;33m'
    blue='\e[0;34m'
    BLUE='\e[1;34m'
    purple='\e[0;35m'
    PURPLE='\e[1;35m'
    cyan='\e[0;36m'
    CYAN='\e[1;36m'
    NC='\e[0m'
    # Functions
    if [ -f '/etc/profile.d/prll.sh' ]; then
    . "/etc/profile.d/prll.sh"
    fi
    run_under_tmux() {
    # Run $1 under session or attach if such session already exist.
    # $2 is optional path, if no specified, will use $1 from $PATH.
    # If you need to pass extra variables, use $2 for it as in example below..
    # Example usage:
    # torrent() { run_under_tmux 'rtorrent' '/usr/local/rtorrent-git/bin/rtorrent'; }
    # mutt() { run_under_tmux 'mutt'; }
    # irc() { run_under_tmux 'irssi' "TERM='screen' command irssi"; }
    # There is a bug in linux's libevent...
    # export EVENT_NOEPOLL=1
    command -v tmux >/dev/null 2>&1 || return 1
    if [ -z "$1" ]; then return 1; fi
    local name="$1"
    if [ -n "$2" ]; then
    local file_path="$2"
    else
    local file_path="command ${name}"
    fi
    if tmux has-session -t "${name}" 2>/dev/null; then
    tmux attach -d -t "${name}"
    else
    tmux new-session -s "${name}" "${file_path}" \; set-option status \; set set-titles-string "${name} (tmux@${HOST})"
    fi
    t() { run_under_tmux rtorrent; }
    irc() { run_under_tmux irssi "TERM='screen' command irssi"; }
    over_ssh() {
    if [ -n "${SSH_CLIENT}" ]; then
    return 0
    else
    return 1
    fi
    reload () {
    exec "${SHELL}" "$@"
    confirm() {
    local answer
    echo -ne "zsh: sure you want to run '${YELLOW}$@${NC}' [yN]? "
    read -q answer
    echo
    if [[ "${answer}" =~ ^[Yy]$ ]]; then
    command "${=1}" "${=@:2}"
    else
    return 1
    fi
    confirm_wrapper() {
    if [ "$1" = '--root' ]; then
    local as_root='true'
    shift
    fi
    local runcommand="$1"; shift
    if [ "${as_root}" = 'true' ] && [ "${USER}" != 'root' ]; then
    runcommand="sudo ${runcommand}"
    fi
    confirm "${runcommand}" "$@"
    poweroff() { confirm_wrapper --root $0 "$@"; }
    reboot() { confirm_wrapper --root $0 "$@"; }
    hibernate() { confirm_wrapper --root $0 "$@"; }
    detox() {
    if [ "$#" -ge 1 ]; then
    confirm detox "$@"
    else
    command detox "$@"
    fi
    has() {
    local string="${1}"
    shift
    local element=''
    for element in "$@"; do
    if [ "${string}" = "${element}" ]; then
    return 0
    fi
    done
    return 1
    begin_with() {
    local string="${1}"
    shift
    local element=''
    for element in "$@"; do
    if [[ "${string}" =~ "^${element}" ]]; then
    return 0
    fi
    done
    return 1
    termtitle() {
    case "$TERM" in
    rxvt*|xterm|nxterm|gnome|screen|screen-*)
    local prompt_host="${(%):-%m}"
    local prompt_user="${(%):-%n}"
    local prompt_char="${(%):-%~}"
    case "$1" in
    precmd)
    printf '\e]0;%s@%s: %s\a' "${prompt_user}" "${prompt_host}" "${prompt_char}"
    preexec)
    printf '\e]0;%s [%s@%s: %s]\a' "$2" "${prompt_user}" "${prompt_host}" "${prompt_char}"
    esac
    esac
    git_check_if_worktree() {
    # This function intend to be only executed in chpwd().
    # Check if the current path is in git repo.
    # We would want stop this function, on some big git repos it can take some time to cd into.
    if [ -n "${skip_zsh_git}" ]; then
    git_pwd_is_worktree='false'
    return 1
    fi
    # The : separated list of paths where we will run check for git repo.
    # If not set, then we will do it only for /root and /home.
    if [ "${UID}" = '0' ]; then
    # running 'git' in repo changes owner of git's index files to root, skip prompt git magic if CWD=/home/*
    git_check_if_workdir_path="${git_check_if_workdir_path:-/root:/etc}"
    else
    git_check_if_workdir_path="${git_check_if_workdir_path:-/home}"
    git_check_if_workdir_path_exclude="${git_check_if_workdir_path_exclude:-${HOME}/_sshfs}"
    fi
    if begin_with "${PWD}" ${=git_check_if_workdir_path//:/ }; then
    if ! begin_with "${PWD}" ${=git_check_if_workdir_path_exclude//:/ }; then
    local git_pwd_is_worktree_match='true'
    else
    local git_pwd_is_worktree_match='false'
    fi
    fi
    if ! [ "${git_pwd_is_worktree_match}" = 'true' ]; then
    git_pwd_is_worktree='false'
    return 1
    fi
    # todo: Prevent checking for /.git or /home/.git, if PWD=/home or PWD=/ maybe...
    # damn annoying RBAC messages about Access denied there.
    if [ -d '.git' ] || [ "$(git rev-parse --is-inside-work-tree 2> /dev/null)" = 'true' ]; then
    git_pwd_is_worktree='true'
    git_worktree_is_bare="$(git config core.bare)"
    else
    unset git_branch git_worktree_is_bare
    git_pwd_is_worktree='false'
    fi
    git_branch() {
    git_branch="$(git symbolic-ref HEAD 2>/dev/null)"
    git_branch="${git_branch##*/}"
    git_branch="${git_branch:-no branch}"
    git_dirty() {
    if [ "${git_worktree_is_bare}" = 'false' ] && [ -n "$(git status --untracked-files='no' --porcelain)" ]; then
    git_dirty='%F{green}*'
    else
    unset git_dirty
    fi
    precmd() {
    # Set terminal title.
    termtitle precmd
    if [ "${git_pwd_is_worktree}" = 'true' ]; then
    git_branch
    git_dirty
    git_prompt=" %F{blue}[%F{253}${git_branch}${git_dirty}%F{blue}]"
    else
    unset git_prompt
    fi
    preexec() {
    # Set terminal title along with current executed command pass as second argument
    termtitle preexec "${(V)1}"
    chpwd() {
    git_check_if_worktree
    man() {
    if command -v vimmanpager >/dev/null 2>&1; then
    PAGER="vimmanpager" command man "$@"
    else
    command man "$@"
    fi
    # Are we running under grsecurity's RBAC?
    rbac_auth() {
    local auth_to_role='admin'
    if [ "${USER}" = 'root' ]; then
    if ! grep -qE '^RBAC:' "/proc/self/status" && command -v gradm > /dev/null 2>&1; then
    echo -e "\n${BLUE}*${NC} ${GREEN}RBAC${NC} Authorize to '${auth_to_role}' RBAC role."
    gradm -a "${auth_to_role}"
    fi
    fi
    #rbac_auth
    # Check if we started zsh in git worktree, useful with tmux when your new zsh may spawn in source dir.
    git_check_if_worktree
    if [ "${git_pwd_is_worktree}" = 'true' ]; then
    git_branch
    git_dirty
    git_prompt=" %F{blue}[%F{253}${git_branch}${git_dirty}%F{blue}]"
    else
    unset git_prompt
    fi
    # Le features!
    # extended globbing, awesome!
    setopt extendedGlob
    # zmv - a command for renaming files by means of shell patterns.
    autoload -U zmv
    # zargs, as an alternative to find -exec and xargs.
    autoload -U zargs
    # Turn on command substitution in the prompt (and parameter expansion and arithmetic expansion).
    setopt promptsubst
    # Control-x-e to open current line in $EDITOR, awesome when writting functions or editing multiline commands.
    autoload -U edit-command-line
    zle -N edit-command-line
    bindkey '^x^e' edit-command-line
    # Include user-specified configs.
    if [ ! -d "${ZSHDDIR}" ]; then
    mkdir -p "${ZSHDDIR}" && echo "# Put your user-specified config here." > "${ZSHDDIR}/example.zsh"
    fi
    for zshd in $(ls -A ${HOME}/.config/zsh.d/^*.(z)sh$); do
    . "${zshd}"
    done
    # Completion.
    autoload -Uz compinit
    compinit
    zstyle ':completion:*' matcher-list 'm:{a-z}={A-Z}'
    zstyle ':completion:*' completer _expand _complete _ignored _approximate
    zstyle ':completion:*' menu select=2
    zstyle ':completion:*' select-prompt '%SScrolling active: current selection at %p%s'
    zstyle ':completion::complete:*' use-cache 1
    zstyle ':completion:*:descriptions' format '%U%F{cyan}%d%f%u'
    # If running as root and nice >0, renice to 0.
    if [ "$USER" = 'root' ] && [ "$(cut -d ' ' -f 19 /proc/$$/stat)" -gt 0 ]; then
    renice -n 0 -p "$$" && echo "# Adjusted nice level for current shell to 0."
    fi
    # Fancy prompt.
    if over_ssh && [ -z "${TMUX}" ]; then
    prompt_is_ssh='%F{blue}[%F{red}SSH%F{blue}] '
    elif over_ssh; then
    prompt_is_ssh='%F{blue}[%F{253}SSH%F{blue}] '
    else
    unset prompt_is_ssh
    fi
    case $USER in
    root)
    PROMPT='%B%F{cyan}%m%k %(?..%F{blue}[%F{253}%?%F{blue}] )${prompt_is_ssh}%B%F{blue}%1~${git_prompt}%F{blue} %# %b%f%k'
    PROMPT='%B%F{blue}%n@%m%k %(?..%F{blue}[%F{253}%?%F{blue}] )${prompt_is_ssh}%B%F{cyan}%1~${git_prompt}%F{cyan} %# %b%f%k'
    esac
    # Ignore lines prefixed with '#'.
    setopt interactivecomments
    # Ignore duplicate in history.
    setopt hist_ignore_dups
    # Prevent record in history entry if preceding them with at least one space
    setopt hist_ignore_space
    # Nobody need flow control anymore. Troublesome feature.
    #stty -ixon
    setopt noflowcontrol
    # Fix for tmux on linux.
    case "$(uname -o)" in
    'GNU/Linux')
    export EVENT_NOEPOLL=1
    esac
    # Aliases
    alias cp='cp -iv'
    alias rcp='rsync -v --progress'
    alias rmv='rsync -v --progress --remove-source-files'
    alias mv='mv -iv'
    alias rm='rm -iv'
    alias rmdir='rmdir -v'
    alias ln='ln -v'
    alias chmod="chmod -c"
    alias chown="chown -c"
    if command -v colordiff > /dev/null 2>&1; then
    alias diff="colordiff -Nuar"
    else
    alias diff="diff -Nuar"
    fi
    alias grep='grep --colour=auto'
    alias egrep='egrep --colour=auto'
    alias ls='ls --color=auto --human-readable --group-directories-first --classify'
    # Keys.
    case $TERM in
    rxvt*|xterm*)
    bindkey "^[[7~" beginning-of-line #Home key
    bindkey "^[[8~" end-of-line #End key
    bindkey "^[[3~" delete-char #Del key
    bindkey "^[[A" history-beginning-search-backward #Up Arrow
    bindkey "^[[B" history-beginning-search-forward #Down Arrow
    bindkey "^[Oc" forward-word # control + right arrow
    bindkey "^[Od" backward-word # control + left arrow
    bindkey "^H" backward-kill-word # control + backspace
    bindkey "^[[3^" kill-word # control + delete
    linux)
    bindkey "^[[1~" beginning-of-line #Home key
    bindkey "^[[4~" end-of-line #End key
    bindkey "^[[3~" delete-char #Del key
    bindkey "^[[A" history-beginning-search-backward
    bindkey "^[[B" history-beginning-search-forward
    screen|screen-*)
    bindkey "^[[1~" beginning-of-line #Home key
    bindkey "^[[4~" end-of-line #End key
    bindkey "^[[3~" delete-char #Del key
    bindkey "^[[A" history-beginning-search-backward #Up Arrow
    bindkey "^[[B" history-beginning-search-forward #Down Arrow
    bindkey "^[Oc" forward-word # control + right arrow
    bindkey "^[Od" backward-word # control + left arrow
    bindkey "^H" backward-kill-word # control + backspace
    bindkey "^[[3^" kill-word # control + delete
    esac
    bindkey "^R" history-incremental-pattern-search-backward
    bindkey "^S" history-incremental-pattern-search-forward
    if [ -f ~/.alert ]; then cat ~/.alert; fi
    Thanks for all the help.
    Last edited by Shark (2013-05-11 22:32:24)

    Raynman wrote:
    "This expression doesn't work", "It doesn't work" ...
    Could you try being a bit more specific?
    Firstly, i am sorry i didn't post the output. I should have know better.
    Secondly, chill out.
    I have used above regex with grep command. Output from terminal is:
    zsh: bad pattern: ^[^#]
    In bash it works perfectly.
    If i issue "setopt re_match_pcre" i have the same ouput as above.
    EDIT: If i issue "unsetopt no_match" it actually works but i have to change the regex from "\^\[^#]" to "\^[^#]" otherwise i get the same output as above. In bash both options work.
    Last edited by Shark (2013-05-11 22:07:21)

  • In a Regular expressions I can set up an "OR" statement?

    HI, I'm using e-tester version 8.2..My problem is about regular expessions.
    I need to catch a dynamic value from a Form Field (My-TxtBox).. this textbox gets pre populated data ( when the customer has info in the database).. this is the code:
    *<input name="My-TxtBox" type="text" value="XXXX" id="ID-TxtBox"*
    ---- ( XXXX = dynamic number prepopulated by the web app)
    So, I'm using this regular expression:
    *<input name="My-TxtBox" type="text" value="(.+?)" id="ID-TxtBox"*
    -----At this point, everything is fine.. but there is an exception
    I'm getting a big problem when the customer doesn't have data in the server, the code is NOT like this
    <input name="My-TxtBox" type="text" value="" id="ID-TxtBox"
    When the customer doesn't have data in the server, the code is more like this:
    *<input name="My-TxtBox" type="text" id="ID-TxtBox"*
    ---- (please note that there is not a value parameter now)
    So, I think there is no way to create a CDV that will work for both cases? any idea to solve this?
    i was thinking that maybe in the reg exps sintax you can create an "OR" statement.. my idea was to create a CDV
    that works for both cases.. when there is the "value=" string and when there is not.
    something like this
    This CDV returns the dynamic value when there is the "value=" string =
    <input name="My-TxtBox" type="text" value="(.+?)" id="ID-TxtBox"
    And this CDV returns "" when there is no "value=" string = Without value:
    <input name="My-TxtBox" type="text" (.*?)id="ID-TxtBox"
    My idea is to place something like this in some point of the CDV = *{* value="(.+?)" *OR* (.*?) *}*
    so my dream is to create a CDV similar to this:
    <input name="My-TxtBox" type="text" { value="(.+?)" *OR* (.*?) }id="ID-TxtBox
    I was searching on google but I simply don't get an answer....it is posible to place an OR statement into a Reg Exp and how the sintax is? ..
    Regards.. I appreciate your time.

    Hola,
    You can use a regular expression such as:
    <input name="My-TxtBox" type="text" ?v?a?l?u?e?=?"?(.*?)"? id="ID-TxtBox"
    Note that:
    * Note that there is a question mark sign (?) after each letter that may appear in the string. This means that the letter may or may not appear.
    * Note that instead of using (.+?) you should use (.*?).
    This means that will match any character that appears zero or mutliple times. The question mark here means that is non-greedy, meaning that it will not include in the .* matching anything like the rest of the pattern (in this case the rest of the pattern is "? id="ID-TxtBox").
    * Note that the question mark in front of the v of value is there to match a space that may or may not exists.
    Few other facts:
    * In regular expressions the parenthesys determine sequence of operations and mark groups. Such groups can be referenced in code (not in etester but in general). In eTester it will always get the value of the first group (first group = first set of parenthesys).
    * ORs in regular expressions can be expressed with the pipe "|" (without the quotes), but you will need parenthesys in this case which would not allow you to capture the group of characters that you want.
    Regards,
    [Z]{1}uriel C?
    Edited by: Zuriel on Oct 5, 2009 3:07 PM
    Edited to avoid having the text changed by the forum formatting options.

  • Introduction to regular expressions ...

    I'm well aware that there are already some articles on that topic, some people asked me to share some of my knowledge on this topic. Please take a look at this first part and let me know if you find this useful. If yes, I'm going to continue on writing more parts using more and more complicated expressions - if you have questions or problems that you think could be solved through regular expression, please post them.
    Introduction
    Oracle has always provided some character/string functions in its PL/SQL command set, such as SUBSTR, REPLACE or TRANSLATE. With 10g, Oracle finally gave us, the users, the developers and of course the DBAs regular expressions. However, regular expressions, due to their sometimes cryptic rules, seem to be overlooked quite often, despite the existence of some very interesing use cases. Beeing one of the advocates of regular expression, I thought I'll give the interested audience an introduction to these new functions in several installments.
    Having fun with regular expressions - Part 1
    Oracle offers the use of regular expression through several functions: REGEXP_INSTR, REGEXP_SUBSTR, REGEXP_REPLACE and REGEXP_LIKE. The second part of each function already gives away its purpose: INSTR for finding a position inside a string, SUBSTR for extracting a part of a string, REPLACE for replacing parts of a string. REGEXP_LIKE is a special case since it could be compared to the LIKE operator and is therefore usually used in comparisons like IF statements or WHERE clauses.
    Regular expressions excel, in my opinion, in search and extraction of strings, using that for finding or replacing certain strings or check for certain formatting criterias. They're not very good at formatting strings itself, except for some special cases I'm going to demonstrate.
    If you're not familiar with regular expression, you should take a look at the definition in Oracle's user guide Using Regular Expressions With Oracle Database, and please note that there have been some changes and advancements in 10g2. I'll provide examples, that should work on both versions.
    Some of you probably already encountered this problem: checking a number inside a string, because, for whatever reason, a column was defined as VARCHAR2 and not as NUMBER as one would have expected.
    Let's check for all rows where column col1 does NOT include an unsigned integer. I'll use this SELECT for demonstrating different values and search patterns:
    WITH t AS (SELECT '456' col1
                 FROM dual
                UNION
               SELECT '123x'
                 FROM dual
                UNION  
               SELECT 'x123'
                 FROM dual
                UNION 
               SELECT 'y'
                 FROM dual
                UNION 
               SELECT '+789'
                 FROM dual
                UNION 
               SELECT '-789'
                 FROM dual
                UNION 
               SELECT '159-'
                 FROM dual
                UNION 
               SELECT '-1-'
                 FROM dual
    SELECT t.col1
      FROM t
    WHERE NOT REGEXP_LIKE(t.col1, '^[0-9]+$')
    ;Let's take a look at the 2nd argument of this REGEXP function: '^[0-9]+$'. Translated it would mean: start at the beginning of the string, check if there's one or more characters in the range between '0' and '9' (also called a matching character list) until the end of this string. "^", "[", "]", "+", "$" are all Metacharacters.
    To understand regular expressions, you have to "think" in regular expressions. Each regular expression tries to "fit" an available string into its pattern and returns a result beeing successful or not, depending on the function. The "art" of using regular expressions is to construct the right search pattern for a certain task. Using functions like TRANSLATE or REPLACE did already teach you using search patterns, regular expressions are just an extension to this paradigma. Another side note: most of the search patterns are placeholders for single characters, not strings.
    I'll take this example a bit further. What would happen if we would remove the "$" in our example? "$" means: (until the) end of a string. Without this, this expression would only search digits from the beginning until it encounters either another character or the end of the string. So this time, '123x' would be removed from the SELECTION since it does fit into the pattern.
    Another change: we will keep the "$" but remove the "^". This character has several meanings, but in this case it declares: (start from the) beginning of a string. Without it, the function will search for a part of a string that has only digits until the end of the searched string. 'x123' would now be removed from our selection.
    Now there's a question: what happens if I remove both, "^" and "$"? Well, just think about it. We now ask to find any string that contains at least one or more digits, so both '123x' and 'x123' will not show up in the result.
    So what if I want to look for signed integer, since "+" is also used for a search expression. Escaping is the name of the game. We'll just use '^\+[0-9]+$' Did you notice the "\" before the first "+"? This is now a search pattern for the plus sign.
    Should signed integers include negative numbers as well? Of course they should, and I'll once again use a matching character list. In this list, I don't need to do escaping, although it is possible. The result string would now look like this: '^[+-]?[0-9]+$'. Did you notice the "?"? This is another metacharacter that changes the placeholder for plus and minus to an optional placeholder, which means: if there's a "+" or "-", that's ok, if there's none, that's also ok. Only if there's a different character, then again the search pattern will fail.
    Addendum: From this on, I found a mistake in my examples. If you would have tested my old examples with test data that would have included multiple signs strings, like "--", "-+", "++", they would have been filtered by the SELECT statement. I mistakenly used the "*" instead of the "?" operator. The reason why this is a bad idea, can also be found in the user guide: the "*" meta character is defined as 0 to multiple occurrences.
    Looking at the values, one could ask the question: what about the integers with a trailing sign? Quite simple, right? Let's just add another '[+-] and the search pattern would look like this: '^[+-]?[0-9]+[+-]?$'.
    Wait a minute, what happened to the row with the column value "-1-"?
    You probably already guessed it: the new pattern qualifies this one also as a valid string. I could now split this pattern into several conditions combined through a logical OR, but there's something even better: a logical OR inside the regular expression. It's symbol is "|", the pipe sign.
    Changing the search pattern again to something like this '^[+-]?[0-9]+$|^[0-9]+[+-]?$' [1] would return now the "-1-" value. Do I have to duplicate the same elements like "^" and "$", what about more complicated, repeating elements in future examples? That's where subexpressions/grouping comes into play. If I want only certain parts of the search pattern using an OR operator, we can put those inside round brackets. '^([+-]?[0-9]+|[0-9]+[+-]?)$' serves the same purpose and allows for further checks without duplicating the whole pattern.
    Now looking for integers is nice, but what about decimal numbers? Those may be a bit more complicated, but all I have to do is again to think in (meta) characters. I'll just use an example where the decimal point is represented by ".", which again needs escaping, since it's also the place holder in regular expressions for "any character".
    Valid decimals in my example would be ".0", "0.0", "0.", "0" (integer of course) but not ".". If you want, you can test it with the TO_NUMBER function. Finding such an unsigned decimal number could then be formulated like this: from the beginning of a string we will either allow a decimal point plus any number of digits OR at least one digits plus an optional decimal point followed by optional any number of digits. Think about it for a minute, how would you formulate such a search pattern?
    Compare your solution to this one:
    '^(\.[0-9]+|[0-9]+(\.[0-9]*)?)$'
    Addendum: Here I have to use both "?" and "*" to make sure, that I can have 0 to many digits after the decimal point, but only 0 to 1 occurrence of this substrings. Otherwise, strings like "1.9.9.9" would be possible, if I would write it like this:
    '^(\.[0-9]+|[0-9]+(\.[0-9]*)*)$'Some of you now might say: Hey, what about signed decimal numbers? You could of course combine all the ideas so far and you will end up with a very long and almost unreadable search pattern, or you start combining several regular expression functions. Think about it: Why put all the search patterns into one function? Why not split those into several steps like "check for a valid decimal" and "check for sign".
    I'll just use another SELECT to show what I want to do:
    WITH t AS (SELECT '0' col1
                 FROM dual
                UNION
               SELECT '0.' 
                 FROM dual
                UNION
               SELECT '.0' 
                 FROM dual
                UNION
               SELECT '0.0' 
                 FROM dual
                UNION
               SELECT '-1.0' 
                 FROM dual
                UNION
               SELECT '.1-' 
                 FROM dual
                UNION
               SELECT '.' 
                 FROM dual
                UNION
               SELECT '-1.1-' 
                 FROM dual
    SELECT t.*
      FROM t
    ;From this select, the only rows I need to find are those with the column values "." and "-1.1-". I'll start this with a check for valid signs. Since I want to combine this with the check for valid decimals, I'll first try to extract a substring with valid signs through the REGEXP_SUBSTR function:
    NVL(REGEXP_SUBSTR(t.col1, '^([+-]?[^+-]+|[^+-]+[+-]?)$'), ' ')Remember the OR operator and the matching character collections? But several "^"? Some of the meta characters inside a search pattern can have different meanings, depending on their positions and combination with other meta characters. In this case, the pattern translates into: from the beginning of the string search for "+" or "-" followed by at least another character that is not "+" or "-". The second pattern after the "|" OR operator does the same for a sign at the end of the string.
    This only checks for a sign but not if there also only digits and a decimal point inside the string. If the search string fails, for example when we have more than one sign like in the "-1.1-", the function returns NULL. NULL and LIKE don't go together very well, so we'll just add NVL with a default value that tells the LIKE to ignore this string, in this case a space.
    All we have to do now is to combine the check for the sign and the check for a valid decimal number, but don't forget an option for the signs at the beginning or end of the string, otherwise your second check will fail on the signed decimals. Are you ready?
    Does your solution look a bit like this?
    WHERE NOT REGEXP_LIKE(NVL(REGEXP_SUBSTR(t.col1,
                               '^([+-]?[^+-]+|[^+-]+[+-]?)$'),
                           '^[+-]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)[+-]?$'
                          )Now the optional sign checks in the REGEXP_LIKE argument can be added to both ends, since the SUBSTR won't allow any string with signs on both ends. Thinking in regular expression again.
    Continued in Introduction to regular expressions ... continued.
    C.
    Fixed some embarrassing typos ... and mistakes.
    cd

    Excellent write up CD. Very nice indeed. Hopefully you'll be completing parts 2 and 3 some time soon. And with any luck, your article will encourage others to do the same....I know there's a few I'd like to see and a few I'd like to have a go at writing too :-)

  • Regular expressions in Format Definition add-on

    Hello experts,
    I have a question about regular expressions. I am a newbie in regular expressions and I could use some help on this one. I tried some 6 hours, but I can't get solve it myself.
    Summary of my problem:
    In SAP Business One (patch level 42) it is possible to use bank statement processing. A file (full of regular expressions) is to be selected, so it can match certain criteria to the bank statement file. The bank statement file consists of a certain pattern (look at the attached code snippet).
    :61:071222D208,00N026
    :86:P  12345678BELASTINGDIENST       F8R03782497                $GH
    $0000009                         BETALINGSKENM. 123456789123456
    0 1234567891234560                                            
    :61:071225C758,70N078
    :86:0116664495 REGULA B.V. HELPMESTRAAT 243 B 5371 AM HARDCITY HARD
    CITY 48772-54314                                                  
    :61:071225C425,05N078
    :86:0329883585 J. MANSSHOT PATTRIOTISLAND 38 1996 PT HELMEN BIJBETA
    LING VOOR RELOOP RMP1 SET ORDERNR* 69866 / SPOEDIG LEVEREN    
    :61:071225C850,00N078
    :86:0105327212 POSE TELEFOONSTRAAT 43 6448 SL S-ROTTERDAM MIJN OR
    DERNR. 53846 REF. MAIL 21-02
    - I am in search of the right type of regular expression that is used by the Format Definition add-on (javascript, .NET, perl, JAVA, python, etc.)
    Besides that I need the regular expressions below, so the Format Definition will match the right lines from my bankfile.
    - a regular expression that selects lines starting with :61: and line :86: including next lines (if available), so in fact it has to select everything from :86: till :61: again.
    - a regular expression that selects the bank account number (position 5-14) from lines starting with :86:
    - a regular expression that selects all other info from lines starting with :86: (and following if any), so all positions that follow after the bank account number
    I am looking forward to the right solutions, I can give more info if you need any.

    Hello Hendri,
    Q1:I am in search of the right type of regular expression that is used by the Format Definition add-on (javascript, .NET, perl, JAVA, pythonetc.)
    Answer: Format Definition uses .Net regular expression.
    You may refer the following examples. If necessary, I can send you a guide about how to use regular expression in Format Defnition. Thanks.
    Example 6
    Description:
    To match a field with an optional field in front. For example, u201C:61:0711211121C216,08N051NONREFu201D or u201C:61:071121C216,08N051NONREFu201D, which comprises of a record identification u201C:61:u201D, a date in the form of YYMMDD, anther optional date MMDD, one or two characters to signify the direction of money flow, a numeric amount value and some other information. The target to be matched is the numeric amount value.
    Regular expression:
    (?<=:61:\d(\d)?[a-zA-Z]{1,2})((\d(,\d*)?)|(,\d))
    Text:
    :61:0711211121C216,08N051NONREF
    Matches:
    1
    Tips:
    1.     All the fields in front of the target field are described in the look behind assertion embraced by (?<= and ). Especially, the optional field is embraced by parentheses and then a u201C?u201D  (question mark). The sub expression for amount is copied from example 1. You can compose your own regular expression for such cases in the form of (?<=REGEX_FOR_FIELDS_IN_FRONT)(REGEX_FOR_TARGET_FIELD), in which REGEX_FOR_FIELDS_IN_FRONT and REGEX_FOR_TARGET_FIELD are respectively the regular expression for the fields in front and the target field. Keep the parentheses therein.
    Example 7
    Description:
    Find all numbers in the free text description, which are possibly document identifications, e.g. for invoices
    Regular expression:
    (?<=\b)(?<!\.)\d+(?=\b)(?!\.)
    Text:
    :86:GIRO  6890316
    ENERGETICA NATURA BENELU
    AFRIKAWEG 14
    HULST
    3187-A1176
    TRANSACTIEDATUM* 03-07-2007
    Matches:
    6
    Tips:
    1.     The regular expression given finds all digits between word boundaries except those with a prior dot or following dot; u201C.u201D (dot) is escaped as \.
    2.     It may find out some inaccurate matches, like the date in text. If you want to exclude u201C-u201D (hyphen) as prior or following character, resemble the case for u201C.u201D (dot), the regular expression becomes (?<=\b)(?<!\.)(?<!-)\d+(?=\b)(?!\.)(?!-). The matches will be:
    :86:GIRO  6890316
    ENERGETICA NATURA BENELU
    AFRIKAWEG 14
    HULST
    3187-A1176
    TRANSACTIEDATUM* 03-07-2007
    You may lose some real values like u201C3187u201D before the u201C-u201D.
    Example 8
    Description:
    Find BP account number in 9 digits with a prior u201CPu201D or u201C0u201D in the first position of free text description
    Regular expression:
    (?<=^(P|0))\d
    Text:
    0000006681 FORTIS ASR BETALINGSCENTRUM BV
    Matches:
    1
    Tips:
    1.     Use positive look behind assertion (?<=PRIOR_KEYWORD) to express the prior keyword.
    2.     u201C^u201D stands for that match starts from the beginning of the text. If the text includes the record identification, you may include it also in the look behind assertion. For example,
    :86:0000006681 FORTIS ASR BETALINGSCENTRUM BV
    The regular expression becomes
    (?<=:86:(P|0))\d
    Example 9
    Description:
    Following example 8, to find the possible BP name after BP account number, which is composed of letter, dot or space.
    Regular expression:
    (?<=^(P|0)\d)[a-zA-Z. ]*
    Text:
    0000006681 FORTIS ASR BETALINGSCENTRUM BV
    Matches:
    1
    Tips:
    1.     In this case, put BP account number regular expression into the look behind assertion.
    Example 10
    Description:
    Find the possible document identifications in a sub-record of :86: record. Sub-record is like u201C?00u201D, u201C?10u201D etc.  A possible document identification sub-record is made up of the following parts:
    u2022     keyword u201CREu201D, u201CRGu201D, u201CRu201D, u201CINVu201D, u201CNRu201D, u201CNOu201D, u201CRECHNu201D or u201CRECHNUNGu201D, and
    u2022     an optional group made up of following:
         a separator of either a dot, hyphen or slash, and
         an optional space, and
         an optional string starting with keyword u201CNRu201D or u201CNOu201D followed by a separator of either a dot, hyphen or slash, and
         an optional space
    u2022     and finally document identification in digits
    Regular expression:
    (?<=\?\d(RE|RG|R|INV|NR|NO|RECHN|RECHNUNG)((\.|-|/)\s?((NR|NO)(\.|-|/))?\s?)?)\d+
    Kind Regards
    -Yatsea

  • Regular Expression in Java problem

    what is wrong with the following regex?
    query = query.replaceAll("SELECT.*?([WHERE.*?|GROUP BY.*?|HAVING.*?|ORDER BY.*?|LIMIT.*?]*?)","\\1");
    when I put in this:
    "SELECT WHERE MlsNumber=\'555555\', AdType=\'MyAdType\'"
    I get this:
    "1 WHERE MlsNumber='5100093', AdType='NytClass'"
    Where is the 1 coming from? I know the backreference must be working or I wouldnt get the WHERE statement back.

    There's a pretty good regex tutorial at this site: http://www.regular-expressions.info/ (I meant to include that in my first reply).
    Basically what I'm trying to do is cut the SELECT and
    anything after it up until it reaches the WHERE (and
    text), GROUPBY, HAVING, ORDER BY, or LIMIT. I want to
    remove The SELECT and text, but keep all instances of
    WHERE text GROUP BY text, etc. I also want to
    keep anything that is past then end of these
    expressions (in case there are option I haven't
    forseen).Try this:  query = query.replaceFirst("SELECT.*?(?=WHERE|GROUP BY|HAVING|ORDER BY|LIMIT)", "");The (?=...) part is a lookahead; it will cause the .*? to stop matching at the first instance of "WHERE", "GROUP BY", "HAVING", "ORDER BY", or "LIMIT". (Check out the "Lookahead & Lookbehind" section in the tutorial for an explanation.) However, if the first thing after the SELECT clause is one of the unknown options you mentioned, it will be removed too. If you know that keywords will always be in all caps, and that none of the other text will be, you could try generalizing the regex like this:  query = query.replaceFirst("SELECT.*? (?=[A-Z]{3,})", "");This regex assumes that any sequence of three or more capital letters preceded by a space is a keyword, which is probably not a safe assumption, but it gives you an idea of the kind of thing you can try.
    I thought the brackets were there to allow you to
    select choices from a group of items, if not I can
    remove them.Alternation doesn't require special bracketing characters. You usually want to enclose it in parentheses, but that's just to isolate it from the rest of the regex (e.g., "abc(?:foo|bar)xyz"). Square brackets are used for character classes, which are a completely different breed of animal; look them up in the tutorial.
    Apparantly the JDK docs are wrong, since they tell you
    to use the \\1 instead of $1 (which works).If you want to use a backreference within the regex, you use \1. For instance, if you want to match a complete HTML element, you might use  String regex = "<(\\w+)[^>]*>.*?</\\1>";But in the replacement string, you use $1. BTW, it's the Matcher docs that tell you that, not the Pattern docs.

  • Regular Expressions find and replace

    Hi ,
    I have a question on using Regular Expressions in Java(java.util.regex).
    Problem Description:
    I have a string (say for example strHTML) which contains the whole HTML code of a webpage. I want to be able to search for all the image source tags and check whether they are absolute urls to the image source(for eg. <img src="www.google.com/images/logo.gif" >) or relative(for eg. <img src="../images/logo.gif" >).
    If they are realtive urls to the image path, then I wish to replace them with their absolute urls throughout the webpage(in this case inside string strHTML).
    I have to do it inside a servlet and hence have to use java.
    I tried . This is the code. It doesn't match and replace and goes inside an infinite loop i.e probably the pattern matches everything.
    //Change all images to actual http addresses FOR example change src="../images/logo.gif" to src="http://www.google.com/../images/logo.gif"
              String ddurl="http://www.google.com/";
    String strHTML=" < img src=\"../images/logo.gif\" alt=\"Google logo\">";
    Pattern p = Pattern.compile ("(?i)src[\\s]*=[\\s]*[\"\']([./]*.*)[\"\']");
    Matcher m = p.matcher (strHTML);
    while(m.find())
    m.replaceAll(ddurl+m.group(1));
    what is wrong in this?
    Thanks,
    Rajiv

    Right, here's the full monte (whatever that means):import java.util.regex.*;
    public class Test1
      public static void main(String[] args)
        String domain = "http://www.google.com/";
        String strHTML =
          " < img src=\"images/logo.gif\" alt=\"Google logo\">\n" +
          " <img alt=\"Google logo\" src=images/logo.gif >\n" +
          " <IMG SRC=\"/images/logo.gif\" alt=\"Google logo\">\n" +
          " <img alt=\"Google logo\" src=../images/logo.gif>\n" +
          " <img src=http://www.yahoo.com/images/logo.gif alt=\"Yahoo logo\">";
        String regex =
          "(<\\s*img.+?src\\s*=\\s*)   # Capture preliminaries in $1.  \n" +
          "(?:                         # First look for URL in quotes. \n" +
          "   ([\"\'])                 #   Capture open quote in $2.   \n" +
          "   (?!http:)                #   If it isn't absolute...     \n" +
          "   /?(.+?)                  #    ...capture URL in $3       \n" +
          "   \\2                      #   Match the closing quote     \n" +
          " |                          # Look for non-quoted URL.      \n" +
          "   (?!http:)                #   If it isn't absolute...     \n" +
          "   /?([^\\s>]+)             #    ...capture URL in $4       \n" +
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.COMMENTS);
        Matcher m = p.matcher(strHTML);
        StringBuffer sbuf = new StringBuffer();
        while (m.find())
          String relURL = m.group(3) != null ? m.group(3) : m.group(4);
          m.appendReplacement(sbuf, "$1\"" + domain + relURL + "\"");
        m.appendTail(sbuf);
        System.out.println(sbuf.toString());
    }First off, observe that I'm using free-spacing (or "COMMENTS") mode to make the regex easier to read--all the whitespace and comments will be ignored by the Pattern compiler. I also used the CASE_INSENSITIVE flag instead of an embedded (?i), just to remove some clutter. By the way, your second (?i) was redundant; the first one would remain in effect until "turned off" with a (?-i). Another way to localize a flag's effect by using it within a non-capturing group, e.g., (?i:img).
    As jaylogan said, the best way to filter out absolute URL's is by using a negative lookahead, and that's what I've done here. The problem of optional quotes I addressed by trying to match first with quotes, then without. The all-in-one approach might work with URL's, since they can't (AFAIK) contain whitespace anyway, but the alternation method can be used to match any attribute/value pair. It's also, I feel, easier to understand and maintain. Unfortunately, it also means that you can't use replaceAll(), since you have to determine which alternative matched before doing the replacement, but the long version is still pretty simple (especially when you can just copy it from the javadoc for the appendReplacement() method, as I did).

  • Regular Expressions Question

    I am using regular expressions to parse through a text file generated by a set of sensors attached to a SeaBird CTD profiler, what matter is that some lines use quotes as a special marker and some dont, but I need to treat both as the same data. Here is the example:
    SeaBird Datafile (just some part):
    "# name 0 = depS: depth, salt water [m]"
    "# name 1 = t068: temperature, IPTS-68 [deg C]"
    # name 2 = sal00: salinity, PSS-78 [PSU]
    "# name 3 = sigma-t00: density, sigma-t [kg/m^3]"
    "# name 4 = flS: fluorometer, sea tech"And I am using this regular expression to read this lines and save those names:
    private static final String ColumnName = "(^\"#|^# ) name (\\d)+ = ((.+)\"$|(.+)$)";There is a possiblity that the string either starts with "# and ends with ", or that it only starts with # and no special marker to the end. Can anyone enlighten me on the correct regex because the one I posted up there only works in the case of being surrounded by quotes. Thanks in advance!
    Christian A. Sueiras

    Try this: private static final String ColumnName = "^(\"?)# name (\\d+) = (.+)\\1$";{code} You match an optional quotation mark at the beginning, and capture it in group #1.  At the end, you match whatever was captured in group #1: either a quotation mark, or nothing.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   

  • Regular expressions for replacing text with sms language text

    Hi, I'm trying to write a function which converts normal, correctly spelled text into the shorter sms language format but struggling to come up with the regular expressions i need to do so, can anyone help?
    1: remove surplus white space at the beginning of a sentence and at the end of a sentence.
    e.g. " hello." --> "hello." OR "hello ." --> "hello."
    2: remove preceeding and/or proceeding space if there's a word then a number possibly followed by another word
    e.g. "come 2 me" --> "come2me" OR "dnt 4get" --> "dnt4get"
    3: remove "aeiou" if word starts and ends with "!aeiou"
    e.g. "text" --> "txt"

    You can make the whitespace on either side optional:   text = text.replaceAll("\\s*(\\d)\\s*", "$1");1. Use String's trim() method.
    3. This one has to be done in two steps: import java.util.regex.*;
    public class Test
      public static void main(String... args) throws Exception
        String text = "The quick brown fox jumps over the lazy dog.";
        System.out.println(devowelize(text));
      public static String devowelize(String str)
        Pattern p = Pattern.compile(
          "[a-z&&[^aeiou]]++(?:[aeiou]++[a-z&&[^aeiou]]++)+",
          Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(str);
        StringBuffer sb = new StringBuffer();
        while (m.find())
          m.appendReplacement(sb, m.group().replaceAll("[aeiou]+", ""));
        m.appendTail(sb);
        return sb.toString();
    }

Maybe you are looking for