Regular expression confusion

Hi,
I want to use regular expressions to parse a text data file that has the following structure:
key1=value<EOL>
key2=value<EOL>
keyn=value<EOL>
<EOR><EOL>
key1=value<EOL>
keyn=value<EOL>
<EOR><EOL>
etc.
...where <EOR> is a user specified string that defines the end of the record and <EOL> is a user defined line delimiter (i.e. \n). I use the following regular expression to extract a single line [^n]+ which returns a string upto but exclusive of the delimiter itself. I want to be able to do the same for a delimiter that is a string rather than a single character. For example I would like to extract the entire record from the file using a regular expression; that is, all the characters (including line delimiters) upto but exclusive of the <EOR> where <EOR> is a string such as "EOR".
Is there a pattern similar to [^n]+ where I could also specify a string rather than a single character?
Given the following data file (newline characters are shown for clarity)...
MAKE=FORD\n
MODEL=MUSTANG\n
YEAR=1969\n
EOR\n
MAKE=DODGE\n
MODEL=CHARGER\n
YEAR=1973\n
EOR\n
MAKE=CHEVROLET\n
MODEL=CORVETTE\n
YEAR=1977\n
<end of file>
I want to know of a regular expression that will extract an entire record such that the group() method would return, for example, the following string when applied to the start of the file:
MAKE=FORD\n
MODEL=MUSTANG\n
YEAR=1969\n

I might be missing the point of what you want to do here, but I would probably approach it this way (modifying alice's example.)
import java.util.regex.*;
public class Test
    public static void main(String[] args)
        String str = "MAKE=FORD\n" +
                       "MODEL=MUSTANG\n" +
                       "YEAR=1969\n" +
                       "EOR\n" +
                       "MAKE=DODGE\n" +
                       "MODEL=CHARGER\n" +
                       "YEAR=1973\n" +
                       "EOR\n" +
                       "MAKE=CHEVROLET\n" +
                       "MODEL=CORVETTE\n" +
                       "YEAR=1977\n";
          Pattern p = Pattern.compile("(.*?)(EOR|\\z)", Pattern.MULTILINE | Pattern.DOTALL);
          Matcher m = p.matcher(str);
          while (m.find())
              System.out.println();
              System.out.println(m.group(1));
}

Similar Messages

  • Confusion in a regular expression

    Hi,
    I am getting problem in understanding a regular expression "J.*\\d[0-35-9]-\\d\\d-\\d\\d" . The author says that it represents the pattern "Joe's Birthday is 12-17-77".
    My problem is that what is [0-35-9] here. I have practised before [1-9] which represents any no. between 1 and 9 but I m not getting the meaning of [0-35-9]
    Can anybody help me?
    Best,
    Arsalan

    [0-35-9]
    is 0123 56789 but not 4thanx Gino86! I could now understand this regular expression

  • Re: [iPlanet-JATO] Re: Use Of models in utility classes - Pease don't forget about the regular expression potential

    Namburi,
    When you said you used the Reg Exp tool, did you use it only as
    preconfigured by the iMT migrate application wizard?
    Because the default configuration of the regular expression tool will only
    target the files in your ND project directories. If you wish to target
    classes outside of the normal directory scope, you have to either modify the
    "Source Directory" property OR create another instance of the regular
    expression tool. See the "Tool" menu in the iMT to create additional tool
    instances which can each be configured to target different sets of files
    using different sets of rules.
    Usually, I utilize 3 different sets of rules files on a given migration:
    spider2jato.xml
    these are the generic conversion rules (but includes the optimized rules for
    ViewBean and Model based code, i.e. these rules do not utilize the
    RequestManager since it is not needed for code running inside the ViewBean
    or Model classes)
    I run these rules against all files.
    See the file download section of this forum for periodic updates to these
    rules.
    nonProjectFileRules.xml
    these include rules that add the necessary
    RequestManager.getRequestContext(). etc prefixes to many of the common
    calls.
    I run these rules against user module and any other classes that do not are
    not ModuleServlet, ContainerView, or Model classes.
    appXRules.xml
    these rules include application specific changes that I discover while
    working on the project. A common thing here is changing import statements
    (since the migration tool moves ND project code into different jato
    packaging structure, you sometime need to adjust imports in non-project
    classes that previously imported ND project specific packages)
    So you see, you are not limited to one set of rules at all. Just be careful
    to keep track of your backups (the regexp tool provides several options in
    its Expert Properties related to back up strategies).
    ----- Original Message -----
    From: <vnamboori@y...>
    Sent: Wednesday, August 08, 2001 6:08 AM
    Subject: [iPlanet-JATO] Re: Use Of models in utility classes - Pease don't
    forget about the regular expression potential
    Thanks Matt, Mike, Todd
    This is a great input for our migration. Though we used the existing
    Regular Expression Mapping tool, we did not change this to meet our
    own needs as mentioned by Mike.
    We would certainly incorporate this to ease our migration.
    Namburi
    --- In iPlanet-JATO@y..., "Todd Fast" <toddwork@c...> wrote:
    All--
    Great response. By the way, the Regular Expression Tool uses thePerl5 RE
    syntax as implemented by Apache OROMatcher. If you're doing lotsof these
    sorts of migration changes manually, you should definitely buy theO'Reilly
    book "Mastering Regular Expressions" and generate some rules toautomate the
    conversion. Although they are definitely confusing at first,regular
    expressions are fairly easy to understand with some documentation,and are
    superbly effective at tackling this kind of migration task.
    Todd
    ----- Original Message -----
    From: "Mike Frisino" <Michael.Frisino@S...>
    Sent: Tuesday, August 07, 2001 5:20 PM
    Subject: Re: [iPlanet-JATO] Use Of models in utility classes -Pease don't
    forget about the regular expression potential
    Also, (and Matt's document may mention this)
    Please bear in mind that this statement is not totally correct:
    Since the migration tool does not do much of conversion for
    these
    utilities we have to do manually.Remember, the iMT is a SUITE of tools. There is the extractiontool, and
    the translation tool, and the regular expression tool, and severalother
    smaller tools (like the jar and compilation tools). It is correctto state
    that the extraction and translation tools only significantlyconvert the
    primary ND project objects (the pages, the data objects, and theproject
    classes). The extraction and translation tools do minimumtranslation of the
    User Module objects (i.e. they repackage the user module classes inthe new
    jato module packages). It is correct that for all other utilityclasses
    which are not formally part of the ND project, the extraction and
    translation tools do not perform any migration.
    However, the regular expression tool can "migrate" any arbitrary
    file
    (utility classes etc) to the degree that the regular expressionrules
    correlate to the code present in the arbitrary file. So first andforemost,
    if you have alot of spider code in your non-project classes youshould
    consider using the regular expression tool and if warranted adding
    additional rules to reduce the amount of manual adjustments thatneed to be
    made. I can stress this enough. We can even help you write theregular
    expression rules if you simply identify the code pattern you wish to
    convert. Just because there is not already a regular expressionrule to
    match your need does not mean it can't be written. We have notnearly
    exhausted the possibilities.
    For example if you say, we need to convert
    CSpider.getDataObject("X");
    To
    RequestManager.getRequestContext().getModelManager().getModel(XModel.class);
    Maybe we or somebody else in the list can help write that regularexpression if it has not already been written. For instance in thelast
    updated spider2jato.xml file there is already aCSpider.getCommonPage("X")
    rule:
    <!--getPage to getViewBean-->
    <mapping-rule>
    <mapping-rule-primarymatch>
    <![CDATA[CSpider[.\s]*getPage[\s]*\(\"([^"]*)\"]]>
    </mapping-rule-primarymatch>
    <mapping-rule-replacement>
    <mapping-rule-match>
    <![CDATA[CSpider[.\s]*getPage[\s]*\(\"([^"]*)\"]]>
    </mapping-rule-match>
    <mapping-rule-substitute>
    <![CDATA[getViewBean($1ViewBean.class]]>
    </mapping-rule-substitute>
    </mapping-rule-replacement>
    </mapping-rule>
    Following this example a getDataObject to getModel would look
    like this:
    <mapping-rule>
    <mapping-rule-primarymatch>
    <![CDATA[CSpider[.\s]*getDataObject[\s]*\(\"([^"]*)\"]]>
    </mapping-rule-primarymatch>
    <mapping-rule-replacement>
    <mapping-rule-match>
    <![CDATA[CSpider[.\s]*getDataObject[\s]*\(\"([^"]*)\"]]>
    </mapping-rule-match>
    <mapping-rule-substitute>
    <![CDATA[getModel($1Model.class]]>
    </mapping-rule-substitute>
    </mapping-rule-replacement>
    </mapping-rule>
    In fact, one migration developer already wrote that rule andsubmitted it
    for inclusion in the basic set. I will post another upgrade to thebasic
    regular expression rule set, look for a "file uploaded" posting.Also,
    please consider contributing any additional generic rules that youhave
    written for inclusion in the basic set.
    Please not, that in some cases (Utility classes in particular)
    the rule
    application may be more effective as TWO sequention rules ratherthan one
    monolithic rule. Again using the example above, it will convert
    CSpider.getDataObject("Foo");
    To
    getModel(FooModel.class);
    Now that is the most effective conversion for that code if that
    code is in
    a page or data object class file. But if that code is in a Utilityclass you
    really want:
    >
    RequestManager.getRequestContext().getModelManager().getModel(FooModel.class
    So to go from
    getModel(FooModel.class);
    To
    RequestManager.getRequestContext().getModelManager().getModel(FooModel.class
    You would apply a second rule AND you would ONLY run this rule
    against
    your utility classes so that you would not otherwise affect yourViewBean
    and Model classes which are completely fine with the simplegetModel call.
    <mapping-rule>
    <mapping-rule-primarymatch>
    <![CDATA[getModel\(]]>
    </mapping-rule-primarymatch>
    <mapping-rule-replacement>
    <mapping-rule-match>
    <![CDATA[getModel\(]]>
    </mapping-rule-match>
    <mapping-rule-substitute>
    <![CDATA[RequestManager.getRequestContext().getModelManager().getModel(]]>
    </mapping-rule-substitute>
    </mapping-rule-replacement>
    </mapping-rule>
    A similer rule can be applied to getSession and other CSpider APIcalls.
    For instance here is the rule for converting getSession calls toleverage
    the RequestManager.
    <mapping-rule>
    <mapping-rule-primarymatch>
    <![CDATA[getSession\(\)\.]]>
    </mapping-rule-primarymatch>
    <mapping-rule-replacement>
    <mapping-rule-match>
    <![CDATA[getSession\(\)\.]]>
    </mapping-rule-match>
    <mapping-rule-substitute>
    <![CDATA[RequestManager.getSession().]]>
    </mapping-rule-substitute>
    </mapping-rule-replacement>
    </mapping-rule>
    ----- Original Message -----
    From: "Matthew Stevens" <matthew.stevens@e...>
    Sent: Tuesday, August 07, 2001 12:56 PM
    Subject: RE: [iPlanet-JATO] Use Of models in utility classes
    Namburi,
    I will post a document to the group site this evening which has
    the
    details
    on various tactics of migrating these type of utilities.
    Essentially,
    you
    either need to convert these utilities to Models themselves or
    keep the
    utilities as is and simply use the
    RequestManager.getRequestContext.getModelManager().getModel()
    to statically access Models.
    For CSpSelect.executeImmediate() I have an example of customhelper
    method
    as a replacement whicch uses JDBC results instead of
    CSpDBResult.
    matt
    -----Original Message-----
    From: vnamboori@y... [mailto:<a href="/group/SunONE-JATO/post?protectID=081071113213093190112061186248100208071048">vnamboori@y...</a>]
    Sent: Tuesday, August 07, 2001 3:24 PM
    Subject: [iPlanet-JATO] Use Of models in utility classes
    Hi All,
    In the present ND project we have lots of utility classes.
    These
    classes in diffrent directory. Not part of nd pages.
    In these classes we access the dataobjects and do themanipulations.
    So we access dataobjects directly like
    CSpider.getDataObject("do....");
    and then execute it.
    Since the migration tool does not do much of conversion forthese
    utilities we have to do manually.
    My question is Can we access the the models in the postmigration
    sameway or do we need requestContext?
    We have lots of utility classes which are DataObjectintensive. Can
    someone suggest a better way to migrate this kind of code.
    Thanks
    Namburi
    [email protected]
    [email protected]
    [Non-text portions of this message have been removed]
    [email protected]
    [email protected]

    Namburi,
    When you said you used the Reg Exp tool, did you use it only as
    preconfigured by the iMT migrate application wizard?
    Because the default configuration of the regular expression tool will only
    target the files in your ND project directories. If you wish to target
    classes outside of the normal directory scope, you have to either modify the
    "Source Directory" property OR create another instance of the regular
    expression tool. See the "Tool" menu in the iMT to create additional tool
    instances which can each be configured to target different sets of files
    using different sets of rules.
    Usually, I utilize 3 different sets of rules files on a given migration:
    spider2jato.xml
    these are the generic conversion rules (but includes the optimized rules for
    ViewBean and Model based code, i.e. these rules do not utilize the
    RequestManager since it is not needed for code running inside the ViewBean
    or Model classes)
    I run these rules against all files.
    See the file download section of this forum for periodic updates to these
    rules.
    nonProjectFileRules.xml
    these include rules that add the necessary
    RequestManager.getRequestContext(). etc prefixes to many of the common
    calls.
    I run these rules against user module and any other classes that do not are
    not ModuleServlet, ContainerView, or Model classes.
    appXRules.xml
    these rules include application specific changes that I discover while
    working on the project. A common thing here is changing import statements
    (since the migration tool moves ND project code into different jato
    packaging structure, you sometime need to adjust imports in non-project
    classes that previously imported ND project specific packages)
    So you see, you are not limited to one set of rules at all. Just be careful
    to keep track of your backups (the regexp tool provides several options in
    its Expert Properties related to back up strategies).
    ----- Original Message -----
    From: <vnamboori@y...>
    Sent: Wednesday, August 08, 2001 6:08 AM
    Subject: [iPlanet-JATO] Re: Use Of models in utility classes - Pease don't
    forget about the regular expression potential
    Thanks Matt, Mike, Todd
    This is a great input for our migration. Though we used the existing
    Regular Expression Mapping tool, we did not change this to meet our
    own needs as mentioned by Mike.
    We would certainly incorporate this to ease our migration.
    Namburi
    --- In iPlanet-JATO@y..., "Todd Fast" <toddwork@c...> wrote:
    All--
    Great response. By the way, the Regular Expression Tool uses thePerl5 RE
    syntax as implemented by Apache OROMatcher. If you're doing lotsof these
    sorts of migration changes manually, you should definitely buy theO'Reilly
    book "Mastering Regular Expressions" and generate some rules toautomate the
    conversion. Although they are definitely confusing at first,regular
    expressions are fairly easy to understand with some documentation,and are
    superbly effective at tackling this kind of migration task.
    Todd
    ----- Original Message -----
    From: "Mike Frisino" <Michael.Frisino@S...>
    Sent: Tuesday, August 07, 2001 5:20 PM
    Subject: Re: [iPlanet-JATO] Use Of models in utility classes -Pease don't
    forget about the regular expression potential
    Also, (and Matt's document may mention this)
    Please bear in mind that this statement is not totally correct:
    Since the migration tool does not do much of conversion for
    these
    utilities we have to do manually.Remember, the iMT is a SUITE of tools. There is the extractiontool, and
    the translation tool, and the regular expression tool, and severalother
    smaller tools (like the jar and compilation tools). It is correctto state
    that the extraction and translation tools only significantlyconvert the
    primary ND project objects (the pages, the data objects, and theproject
    classes). The extraction and translation tools do minimumtranslation of the
    User Module objects (i.e. they repackage the user module classes inthe new
    jato module packages). It is correct that for all other utilityclasses
    which are not formally part of the ND project, the extraction and
    translation tools do not perform any migration.
    However, the regular expression tool can "migrate" any arbitrary
    file
    (utility classes etc) to the degree that the regular expressionrules
    correlate to the code present in the arbitrary file. So first andforemost,
    if you have alot of spider code in your non-project classes youshould
    consider using the regular expression tool and if warranted adding
    additional rules to reduce the amount of manual adjustments thatneed to be
    made. I can stress this enough. We can even help you write theregular
    expression rules if you simply identify the code pattern you wish to
    convert. Just because there is not already a regular expressionrule to
    match your need does not mean it can't be written. We have notnearly
    exhausted the possibilities.
    For example if you say, we need to convert
    CSpider.getDataObject("X");
    To
    RequestManager.getRequestContext().getModelManager().getModel(XModel.class);
    Maybe we or somebody else in the list can help write that regularexpression if it has not already been written. For instance in thelast
    updated spider2jato.xml file there is already aCSpider.getCommonPage("X")
    rule:
    <!--getPage to getViewBean-->
    <mapping-rule>
    <mapping-rule-primarymatch>
    <![CDATA[CSpider[.\s]*getPage[\s]*\(\"([^"]*)\"]]>
    </mapping-rule-primarymatch>
    <mapping-rule-replacement>
    <mapping-rule-match>
    <![CDATA[CSpider[.\s]*getPage[\s]*\(\"([^"]*)\"]]>
    </mapping-rule-match>
    <mapping-rule-substitute>
    <![CDATA[getViewBean($1ViewBean.class]]>
    </mapping-rule-substitute>
    </mapping-rule-replacement>
    </mapping-rule>
    Following this example a getDataObject to getModel would look
    like this:
    <mapping-rule>
    <mapping-rule-primarymatch>
    <![CDATA[CSpider[.\s]*getDataObject[\s]*\(\"([^"]*)\"]]>
    </mapping-rule-primarymatch>
    <mapping-rule-replacement>
    <mapping-rule-match>
    <![CDATA[CSpider[.\s]*getDataObject[\s]*\(\"([^"]*)\"]]>
    </mapping-rule-match>
    <mapping-rule-substitute>
    <![CDATA[getModel($1Model.class]]>
    </mapping-rule-substitute>
    </mapping-rule-replacement>
    </mapping-rule>
    In fact, one migration developer already wrote that rule andsubmitted it
    for inclusion in the basic set. I will post another upgrade to thebasic
    regular expression rule set, look for a "file uploaded" posting.Also,
    please consider contributing any additional generic rules that youhave
    written for inclusion in the basic set.
    Please not, that in some cases (Utility classes in particular)
    the rule
    application may be more effective as TWO sequention rules ratherthan one
    monolithic rule. Again using the example above, it will convert
    CSpider.getDataObject("Foo");
    To
    getModel(FooModel.class);
    Now that is the most effective conversion for that code if that
    code is in
    a page or data object class file. But if that code is in a Utilityclass you
    really want:
    >
    RequestManager.getRequestContext().getModelManager().getModel(FooModel.class
    So to go from
    getModel(FooModel.class);
    To
    RequestManager.getRequestContext().getModelManager().getModel(FooModel.class
    You would apply a second rule AND you would ONLY run this rule
    against
    your utility classes so that you would not otherwise affect yourViewBean
    and Model classes which are completely fine with the simplegetModel call.
    <mapping-rule>
    <mapping-rule-primarymatch>
    <![CDATA[getModel\(]]>
    </mapping-rule-primarymatch>
    <mapping-rule-replacement>
    <mapping-rule-match>
    <![CDATA[getModel\(]]>
    </mapping-rule-match>
    <mapping-rule-substitute>
    <![CDATA[RequestManager.getRequestContext().getModelManager().getModel(]]>
    </mapping-rule-substitute>
    </mapping-rule-replacement>
    </mapping-rule>
    A similer rule can be applied to getSession and other CSpider APIcalls.
    For instance here is the rule for converting getSession calls toleverage
    the RequestManager.
    <mapping-rule>
    <mapping-rule-primarymatch>
    <![CDATA[getSession\(\)\.]]>
    </mapping-rule-primarymatch>
    <mapping-rule-replacement>
    <mapping-rule-match>
    <![CDATA[getSession\(\)\.]]>
    </mapping-rule-match>
    <mapping-rule-substitute>
    <![CDATA[RequestManager.getSession().]]>
    </mapping-rule-substitute>
    </mapping-rule-replacement>
    </mapping-rule>
    ----- Original Message -----
    From: "Matthew Stevens" <matthew.stevens@e...>
    Sent: Tuesday, August 07, 2001 12:56 PM
    Subject: RE: [iPlanet-JATO] Use Of models in utility classes
    Namburi,
    I will post a document to the group site this evening which has
    the
    details
    on various tactics of migrating these type of utilities.
    Essentially,
    you
    either need to convert these utilities to Models themselves or
    keep the
    utilities as is and simply use the
    RequestManager.getRequestContext.getModelManager().getModel()
    to statically access Models.
    For CSpSelect.executeImmediate() I have an example of customhelper
    method
    as a replacement whicch uses JDBC results instead of
    CSpDBResult.
    matt
    -----Original Message-----
    From: vnamboori@y... [mailto:<a href="/group/SunONE-JATO/post?protectID=081071113213093190112061186248100208071048">vnamboori@y...</a>]
    Sent: Tuesday, August 07, 2001 3:24 PM
    Subject: [iPlanet-JATO] Use Of models in utility classes
    Hi All,
    In the present ND project we have lots of utility classes.
    These
    classes in diffrent directory. Not part of nd pages.
    In these classes we access the dataobjects and do themanipulations.
    So we access dataobjects directly like
    CSpider.getDataObject("do....");
    and then execute it.
    Since the migration tool does not do much of conversion forthese
    utilities we have to do manually.
    My question is Can we access the the models in the postmigration
    sameway or do we need requestContext?
    We have lots of utility classes which are DataObjectintensive. Can
    someone suggest a better way to migrate this kind of code.
    Thanks
    Namburi
    [email protected]
    [email protected]
    [Non-text portions of this message have been removed]
    [email protected]
    [email protected]

  • Regular Expressions and Double Byte Characters ?

    Is it possible to use Java Regular Expressions to parse
    a file that will contain double byte characters ?
    For example, I want a regular expression to match the following line
    tag="double byte stuff" id="double byte stuff"

    The comments on the bytes/strings were helpful. Thanks.
    But I'm still confused as to what matching pattern could be used.
    For example a pattern like:
    [A-Za-z]
    I assume would not match any double byte characters.
    I also assume the following won't work either:
    [\\p{Alpah}]
    because it is posix - US-ASCII only.
    So how do you say "match the tag, then take any characters,
    double byte, ascii, whatever, then match the text tag - per the
    original example ?

  • Problems with java regular expressions

    Hi everybody,
    Could someone please help me sort out an issue with Java regular expressions? I have been using regular expressions in Python for years and I cannot figure out how to do what I am trying to do in Java.
    For example, I have this code in java:
    import java.util.regex.*;
    String text = "abc";
              Pattern p = Pattern.compile("(a)b(c)");
              Matcher m = p.matcher(text);
    if (m.matches())
                   int count = m.groupCount();
                   System.out.println("Groups found " + String.valueOf(count) );
                   for (int i = 0; i < count; i++)
                        System.out.println("group " + String.valueOf(i) + " " + m.group(i));
    My expectation is that group 0 would capture "abc", group 1 - "a" and group 2 - "c". Yet, I I get this:
    Groups found 2
    group 0 abc
    group 1 a
    I have tried other patterns and input text but the issue remains the same: no matter what, I cannot capture any paranthesized expression found in the pattern except for the first one. I tried the same example with Jakarta Regexp 1.5 and that works without any problems, I get what I expect.
    I am using Java 1.5.0 on Mac OS X 10.4.
    Thank to all who can help.

    paulcw wrote:
    If the group count is X, then there are X plus one groups to go through: 0 for the whole match, then 1 through X for the individual groups.It does seem confusing that the designers chose to exclude the zero-group from group count, but the documentation is clear.
    Matcher.groupCount():
    Group zero denotes the entire pattern by convention. It is not included in this count.

  • Help with Regular Expressions

    I have some sample code for string editing using regular expressions but I'm a little confused as to it's behavior. Here's what I have:
    public class WordFixer
        protected final String PUNC_MATCH = "[\\d\\p{Punct}]+";
        protected final String PUNC_PREFIX = "^" + PUNC_MATCH;
        protected final String PUNC_SUFFIX = PUNC_MATCH + "$";
        public String fixPrefix (String w)
            return w.replaceFirst(PUNC_PREFIX, "");
        public String fixSuffix (String w)
            return w.replaceFirst(PUNC_SUFFIX, "");
        }This replaces all leading and trailing punctuation with "" and it works. However, changing the replaceFirst's to just replace doesn't work. I don't understand why that is. Doesn't the ^ mean in the front, and doesn't the $ mean in the back, so shouldn't this work by just changing replaceFirst to replace?

    JFactor2004 wrote:
    I have some sample code for string editing using regular expressions but I'm a little confused as to it's behavior. Here's what I have:
    public class WordFixer
    protected final String PUNC_MATCH = "[\\d\\p{Punct}]+";
    protected final String PUNC_PREFIX = "^" + PUNC_MATCH;
    protected final String PUNC_SUFFIX = PUNC_MATCH + "$";
    public String fixPrefix (String w)
    return w.replaceFirst(PUNC_PREFIX, "");
    public String fixSuffix (String w)
    return w.replaceFirst(PUNC_SUFFIX, "");
    }This replaces all leading and trailing punctuation with "" and it works. However, changing the replaceFirst's to just replace doesn't work. I don't understand why that is. The first parameter of the replaceFirst(...) is a regex-String. And the ^, $ and [...] stuff all have a special meaning in the regex language. The replace(...) method takes plain-Strings as a parameter, so the String "^\[\\d\\p{Punct}\]+" is interpreted as just that, without any special meaning.
    JFactor2004 wrote:
    Doesn't the ^ mean in the front, and doesn't the $ mean in the back, Yes, ^ means the start of the String (sometimes the start of a line) and $ means the end of the String (sometimes the end of a line).
    JFactor2004 wrote:
    so shouldn't this work by just changing replaceFirst to replace?Nope, see the explanation above.

  • Regular Expression Character Sets with Pattern and Matcher

    Hi,
    I am a little bit confused about a regular expressions I am writing, it works in other languages but not in Java.
    The regular expressions is to match LaTeX commands from a file, and is as follows:
    \\begin{command}([.|\n\r\s]*)\\end{command}
    This does not work in Java but does in PHP, C, etc...
    The part that is strange is the . character. If placed as .* it works but if placed as [.]* it doesnt. Does this mean that . cannot be placed in a character range in Java?
    Any help very much appreciated.
    Kind Regards
    Paul Bain

    In PHP it seems that the "." still works as a all character operator inside character classes.
    The regular expression posted did not work, but it does if I do:
    \\begin{command}((.|[\n\r\s])*)?\\end{command}
    Basically what I'm trying to match is a block of LaTeX, so the \\begin{command} and \\end{command} in LaTeX, not regex, although the \\ is a single one in LaTeX. I basically want to match any block which starts with one of those and ends in the end command. so really the regular expression that counts is the bit in the middle, ((.|[\n\r\s])*)?
    Am I right it saying that the "?" will prevent the engine matching the first and last \\bein and \\end in the following example:
    \\begin{command}
    some stuff
    \\end{command}
    \\begin{command}
    some stuff
    \\end{command}

  • Using Regular Expressions for Completion

    I'm trying to build a text completer for a simple little editor. The general idea is that I have a regular expression which describes the syntax of an expression and a set of strings which are all semantically valid cases of the expression (the latter of which is not particularly important to my problem). I would like to be able to determine, using the expression described, whether or not a section of text is capable of beginning a syntactically valid expression, not matching it.
    For example, given the expression
    "#[A-Za-z0-9]#" the string "#name#" is syntactically valid, whereas the string "#_blarg" is not. What I would like to do is be able to determine that "#partial" has the potential to match the pattern with more input, even if it doesn't yet. Specifically, the eventual use will be in such a case as the string X=#partial+3. If the cursor is positioned before the "+" and my user presses the completion keystroke, I want to recognize that "#partial" is what I need to recognize. Also, positioning the cursor immediately after the "=" and pressing the keystroke will do nothing, since nothing before the "=" is capable of matching the pattern properly.
    Is this possible? I don't have to use this exact approach, but it is important that I be able to use the regular expression in detecting a partially completed expression. If I can, the set of regular expressions which already exist in the code can be used to drive the auto completer. Otherwise, I'll have to write a special recognition module for each case; that wouldn't be pretty.
    Thanks for your time! I'll provide other information upon request, if it'd help. :)

    Thank you both for discussing this; it has definitely helped me in reaching a better understanding of uncle_alice's answer to my problem. I've adjusted my code to use this approach and, for the most part, it seems to work.
    I say "for the most part" because I am compiling Patterns with the case insensitivity flag. This appears to do horrible, horrible things. Take a look at the following code, modified from uncle_alice's example:
    String[] str = {"#test#hello", "#tes", "blargblarg", "", "#test#", "S"};
    String rgx = "#[A-Za-z0-9]+#";
    Pattern pc = Pattern.compile(rgx);
    Pattern pi = Pattern.compile(rgx, Pattern.CASE_INSENSITIVE);
    for (String s : str)
        System.out.println("    For string: "+s);
        for (Pattern p : new Pattern[]{pc, pi}) // once for each pattern
            Matcher m = p.matcher(s);
            if (m.matches())
                System.out.printf("Matched '%s'", m.group());
            } else
                System.out.print("No match");
            System.out.println("; hitEnd = " + m.hitEnd());
    }That produces the following output:
        For string: #test#hello
    No match; hitEnd = false
    No match; hitEnd = true
        For string: #tes
    No match; hitEnd = true
    No match; hitEnd = true
        For string: blargblarg
    No match; hitEnd = false
    No match; hitEnd = true
        For string:
    No match; hitEnd = true
    No match; hitEnd = true
        For string: #test#
    Matched '#test#'; hitEnd = false
    Matched '#test#'; hitEnd = false
        For string: S
    No match; hitEnd = false
    No match; hitEnd = trueIt would seem that, with the case-insensitive flag set, hitEnd always returns true unless a match is found. Why is this? I find it quite confusing.
    I can adjust my design to accomodate if this problem cannot be circumvented; however, I'd like to understand what has going wrong here. :)
    Cheers! Thanks so much for all your help!

  • Regular Expression Rage.

    Hello,
    I'm currently being driven insane by the regular expressions in java. I want to use the matches(String aString) method of String to find out if the user has inputed a wilcard filename such as *.txt or *.bmp etc.
    The problem is I can't seem to put a * or . into the expression without it being considered as a special character. I want to match *. followed by one or more characters. It would seem logical that this should be "\*\..*" however the compiler complains that \* and \. are ilegal escape characters. In the API "\{" is given as an example of an escape character but when I try it I get the same compiler error.
    Also I was trying to match a single backslash which I thought should be "\\" This compiles okay but when I run the program I get runtime error from Pattern. More confusingly if I use ".*\\.*" it will match with any input.
    Am I doing something fundamentally wrong ? Some of my other matching patterns seem to work just fine.

    Well, for the Java compiler you need to escape \ making it \\.
    Additionally the regexp requires you to escape \, so you escape it once for the regexp engine and once for the Java compiler making it \\\\.

  • Regular Expressions with Call Policy on VCSe

    Hi Guys,
      I am working on firming up the call policy on my VCS Expressway to try to better intercept the SIP spam requests it is getting from internet ip numbers. Right now those spam requets are getting rejected by the loop detection but I want to intercept them before they even do a search on the Expressway. It seems that the call policy rules I create without regular expressions are functioning fine but I don't think I have the syntax correct for the regular expressions.
    The goal of this rule is to reject any incoming SIP request that has a destination alias format of 7 to 17 digits followed by an @VCSe_IP. so for example it would reject the following attempts: 0123456@VCSe_IP and 0123456789101112@VCSe_IP with one rule.
    The policy I created is this: source pattern: unauthenticated user, Destination pattern: \d{7,17}@xx\.xx\.xx\.xx (where xx is the individual octets of the VCSe IP address), Action: reject  
    However the above policy does not seem to be rejecting the calls before they do a search. I have checked the above expression with the check pattern tool on the VCSe and it comes up with a sucessful match when I try the destination alias of a request that made it through, hence my confusion. Any help you guys could provide would be appreciated.
    Thanks,
    Steven                

    Steven,
    Default Zone access rules do not relate to this at all and you can keep those set to 'No'.
    How exactly are you placing the test calls when attempting to verify this?
    I created the following CPL rule on my X7 VCS (With 10.10.10.10 being the IP address of my VCS):
    Source pattern:
    Destination pattern: \d{7,17}@10\.10\.10\.10
    Action: Reject
    I then proceeded with placing a SIP call from an unregistered C20, calling the URI '[email protected]' while running a diagnostics log on my VCS with Network log level set to 'DEBUG', and captured the following in that log:
    Incoming INVITE:
    2013-02-22T16:03:36+01:00 vcs02 tvcs: UTCTime="2013-02-22 15:03:36,598" Module="network.sip" Level="INFO":  Src-ip="10.x.x.x"  Src-port="5060"   Detail="Receive Request Method=INVITE, Request-URI=sip:[email protected], Call-ID=9dd19ad75b1063ecf716461b149e9e2a"
    2013-02-22T16:03:36+01:00 vcs02 tvcs: UTCTime="2013-02-22 15:03:36,598" Module="network.sip" Level="DEBUG":  Src-ip="10.x.x.x"  Src-port="5060"
    SIPMSG:
    |INVITE sip:[email protected] SIP/2.0
    Call processing logic, showing CPL matching:
    2013-02-22T16:03:36+01:00 vcs02 tvcs: Event="Search Attempted" Service="SIP" Src-alias-type="SIP" Src-alias="10.x.x.x" Dst-alias-type="SIP" Dst-alias="sip:[email protected]" Call-serial-number="0886391c-7d01-11e2-adf5-0050569a08fd" Tag="08863aac-7d01-11e2-bd2e-0050569a08fd" Detail="searchtype:INVITE" Level="1" UTCTime="2013-02-22 15:03:36,601"
    2013-02-22T16:03:36+01:00 vcs02 tvcs: Event="Call Attempted" Service="SIP" Src-ip="10.x.x.x" Src-port="5060" Src-alias-type="SIP" Src-alias="sip:10.x.x.x" Dst-alias-type="SIP" Dst-alias="sip:[email protected]" Call-serial-number="0886391c-7d01-11e2-adf5-0050569a08fd" Tag="08863aac-7d01-11e2-bd2e-0050569a08fd" Protocol="UDP" Auth="NO" Level="1" UTCTime="2013-02-22 15:03:36,601"
    2013-02-22T16:03:36+01:00 vcs02 tvcs: UTCTime="2013-02-22 15:03:36,602" Module="network.cpl" Level="DEBUG":  Remote-ip="10.x.x.x"  Remote-port="5060"   Detail="CPL: "
    2013-02-22T16:03:36+01:00 vcs02 tvcs: UTCTime="2013-02-22 15:03:36,602" Module="network.cpl" Level="DEBUG":  Remote-ip="10.x.x.x"  Remote-port="5060"   Detail="CPL:   "
    2013-02-22T16:03:36+01:00 vcs02 tvcs: UTCTime="2013-02-22 15:03:36,602" Module="network.cpl" Level="DEBUG":  Remote-ip="10.x.x.x"  Remote-port="5060"   Detail="CPL: matched "
    2013-02-22T16:03:36+01:00 vcs02 tvcs: UTCTime="2013-02-22 15:03:36,602" Module="network.cpl" Level="DEBUG":  Remote-ip="10.x.x.x"  Remote-port="5060"   Detail="CPL: "
    VCS responds to INVITE with 403 Forbidden:
    2013-02-22T16:03:36+01:00 vcs02 tvcs: UTCTime="2013-02-22 15:03:36,616" Module="network.sip" Level="DEBUG":  Dst-ip="10.x.x.x"  Dst-port="5060"
    SIPMSG:
    |SIP/2.0 403 Forbidden
    As you can see, on my VCS everything seems to work as expected. I'd recommend you capture a similar diagnostics log on your own VCS to check what is different in your test call compared to the output above.

  • Match beginning of line with Regular Expression

    I'm confused about dreamweaver's treatment of the characters
    ^ and $ (beginning of line, end of line) in regex searches. It
    seems that these characters match the beginning of the file, not
    the beginning of the various lines in the file. I would expect it
    to work the other way around. A search like:
    (^.)
    should match every line in the file, so that a find/replace
    could be performed at the beginning of each line, like this:
    HELLO$1
    which would add 'HELLO' at the start of each line in the
    file.
    Instead, this action only matches the first character of the
    file, sticks 'HELLO' in front of it, and then quits (or moves on to
    the next file). The endline character $ behaves in a similar
    fashion, matching only the end of the file, not the end of each
    line.
    I've searched, and all the literature about regular
    expressions in dreamweaver seems to indicate that I'm expecting the
    correct behavior:
    www.adobe.com/devnet/dreamweaver/articles/regular_expressions_03.html
    quote:
    ^ Beginning of input or line ^T matches "T" in "This good
    earth" but not in "Uncle Tom's Cabin"
    $ End of input or line h$ matches "h" in "teach" but not in
    "teacher"
    Thanks for any insight, folks.

    Hi Winston,
    I am still digesting the material from the regular expression book and will take sometime to become proficient with it.
    It seems that using groupCount() to eliminate the unwanted text does not work in this case, since all the lines returned the same value. Ie 3 posted earlier. This may be because the patterns are complex and only a few were grouped together. Otherwise, could you provide an example using the string posted as opposed to a hyperthetic one. In the meantime, at least one solution have been found by defining an additional special pattern “\\A[^%].*\\Z”, before combining / intersecting both existing and the new special pattern to get the best of both world. Another approach that should also work is to evaluate the size of String.split() and only accept those lines with a minimum number of tokens.
    Anyhow, I have come a crossed another minor stumbling block in the mean time with the following line, where some hidden characters is preventing the existing pattern from reading it:
    o;?Mervan Bay 40 Boyde St 7 br t $250,000 X West Park AE
    Below is the existing regular expression that works for other lines with the same pattern but not for special hidden characters such as “o;?”:
    \\A([A-Z][a-z]*){1,2} [0-9]{0,4}/?[0-9]{0,4}-?[0-9]{0,4} ([A-Z][a-z]*){1,2} St|Rd|Av|Sq|Cl|Pl|Cr|Gr|Dr|Hwy|Pde|Wy|La [0-9] br [h|u|t] \\$\\d+,\\d+|\\$\\d*\\,\\d+,\\d+ ([A-Z][a-z]*){1,}\\ZIs it possible to come up with a regular expression to ignore them so that this line could be picked up? Would also like to know whether I could combine both the special pattern “\\A[^%].*\\Z” with existing one as opposed to using 2 separate patterns altogether?
    Many thanks,
    Jack

  • AutoVue 20.2.1 - ABV - Regular Expressions

    All,
    We have implemented the ABV/HotSpots feature and there is a regular expression we have defined that should work but isn't.
    Regular Expression:
    ^[0-9]{5}$
    Text:
    00345
    Using http://www.regular-expressions.info/javascriptexample.html to test it works seems is fine. So a00345, 00345a, 0a0345, test 00345 ALL fail which is what i want.
    But when i setup the hotspot definition with the above it doesn't match the cases in the PDF document.
    If i remove the ^ and $ then it does match but it matches cases i do not want (example: DVA-AB-12345)
    I want 5 number and that is it, nothing before or after.
    I am confused as to why the regular expression tester is saying it is fine and then AutoVue HotSpots fails it.
    I have thought about case sensitive and and either way true or false we are using numbers so it shouldn't affect things.
    Can anyone help?
    Nick

    Hi Nick,
    Currently, there is no way for you to specify which characters to highlight within the regular expression.
    Theoretically, it could be done through regex groups: (^|\s)(\d{4})(\s|$), highlight group # 2.
    We would need to enhance our API to accept the group index to highlight.
    Please log an enhancement request for this feature. I'm sure it will be very useful!
    Thanks,
    George

  • Format string using Regular Expression

    Input string output format...
    SELECT q'<select ab_c "ABC", efg "EFG" from dual>' str FROM DUAL
    Output:
    STR                                 
    select ab_c "ABC", efg "EFG" from dual
    Required output format using regular expression...
    STR                                 
    select 'ab_c' "ABC", 'efg' "EFG" from dual

    Regular expressions have many limitations as parsing tools, and you didn't specify the rules you wanted. This expression puts quotes around the non blank string before a quoted string:
    SELECT regexp_replace(q'<select ab_c "ABC", efg "EFG" from dual>',
                          '([^" ]+)( +"[^ ]*")' , '''\1''\2' ) str FROM DUAL;
    STR
    select 'ab_c' "ABC", 'efg' "EFG" from dual
    {code}
    It is not robust - a missing " will confuse it, and you should be using bind variables anyway.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               

  • Regular expression to match incremental or consecutive digits

    I need to process a string containing all digits to ensure that it does not contain either
    (a) a group of 5 or more consecutive identical digits eg. 11111
    (b) a group of 5 or more incremental/decremental digits eg 12345, 98765
    Also, is there a processing difference between the reluctant and greedy qualifiers?
    Case (a) seems to be easy:
    Pattern pattern = Pattern.compile("[0-9]{5,}?");but what about (b) ? From the documentation it seems that using capturing groups might be helpful, but they are confusing me.
    Finally how do I merge multiple pattern matching strings into one overall regular expression so I can make one pass on the input to see whether it is valid or not?
    Thanks
    Chris

    give this code a try
    public class Test {
         static boolean check(String str) {
              loop : for (int x = 0, y; x < str.length() - 4; x += y) {
                   y = 1;
                     int dif = (str.charAt(x+y) - str.charAt(x)); //assuming you don't whant 90123
                     if (dif >= -1 && dif <= 1) {
                        for (; y < 4; y++) {
                             if ((str.charAt(x+y+1) - str.charAt(x+y)) != dif) {
                                  continue loop;
                        return true;
              return false;
         public static void main(String[] args) {
              System.out.println(check(args[0]));
    }

  • Regular expression to covert pascal case to underscores

    I am looking for a way to use the SQL regular expression support to convert some pascal text into underscore separated words.
    For example:
    IsComplianceActionPossible -> Is_Compliance_Action_Possible.
    What I am confused on is how to get the capital letter back in the output.
    select regexp_replace('IsComplianceActionPossible','[A-Z]','_') from dual
    REGEXP_REPLACE('ISCOMPLIAN
    _s_ompliance_ction_ossibleI am know I am missing something simple. Any help is appreciated.
    Regards, Tony

    ebrian wrote:
    Another option:
    SQL> select regexp_replace('IsComplianceActionPossible','(.)([A-Z])','\1_\2') from dual;
    REGEXP_REPLACE('ISCOMPLIANCEA
    Is_Compliance_Action_Possible
    Most likely it would fit OP's needs, howvever it will not work on one letter words in the middle:
    SQL> select regexp_replace('HereIAm','(.)([A-Z])','\1_\2') from dual
      2  /
    REGEXP_R
    Here_IAm
    SQL> select ltrim(regexp_replace('HereIAm','([A-Z])','_\1'),'_') from dual;
    LTRIM(REG
    Here_I_Am
    SQL> SY.

Maybe you are looking for