Need help starting with regular expressions

Hi,
looked through the tutorials, and some stuff online and still having trouble.
I've done a fair amount of modifying xml files with perl, and want to do the same thing with java.
I'm writing this at work, and we don't have the 1.5 jdk installed, so I can't use the Pattern and Match objects.
Here's some code I wrote:
package pack1;
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
public class PlantTest {
      * @param args
      * @throws Exception
     public static void main(String[] args) throws Exception {
          // get file to read from
          java.io.File infile = new java.io.File("plant.xml");
          FileReader infileReader = new FileReader(infile);
          BufferedReader bInfile = new BufferedReader(infileReader);
          //get basic file info
          System.out.println("does the file exist? " + infile.exists());
          System.out.println("file is located at: " + infile.getAbsolutePath());
          System.out.println("file was last modified: " + new java.util.Date(infile.lastModified()));
          // create a pattern
          String match = "<COMMON>";
          String line = bInfile.readLine();
          while (line != null){
               //System.out.println(bInfile.readLine());
               if (line.matches(match)){
                    System.out.println("match made: " + line);
          bInfile.close();
          infileReader.close();
}I want the "match" object to be the equivalent of this in perl:
(I realize I don't have the output set up in the code above, I was going to add it later, for now I just want to make the match and print something on the console)
     m/<COMMON>([^<]+)</COMMON>/i;
     $common = $1;
     print HTML "<p>Common Name: $common\n";here's a snippet of xml:
     <PLANT>
          <COMMON>Bloodroot</COMMON>
          <BOTANICAL>Sanguinaria canadensis</BOTANICAL>
          <ZONE>4</ZONE>
          <LIGHT>Mostly Shady</LIGHT>
          <PRICE>$2.44</PRICE>
          <AVAILABILITY>031599</AVAILABILITY>
     </PLANT>(I cribbed that xml from the w3c site)
thanks in advance,
bp
Message was edited by:
badperson

ouch. 1.3.1...
That's here at work, I'm kind of nervous about
downloading another jdk, will there be a conflict?Google for Jakarta ORO regex. It is about the same speed as Java regex and works well with JDK1.3.1 .

Similar Messages

  • Need help in unix regular expressions

    Hi All,
    I'm new to shell scripting. Please help me in achieving this
    I am trying to a find regular expression that need to pick a file with begin with the below format and mask variable is called in xml file.
    currently the script accepts:
    mask="CLIENT_ID+'_ADHSUITE_IN_'+date2str(now,'MMddyy','US/Eastern')+'.txt'"
    But it should accept in the below format
    2595_ADHSUITE_IN_ANNWEL_030309_2009-02-10_15-12-46-000_648.TXT715.outpgp_out
    where CLIENT_ID=2595. How to place wild card character '*' in the below to accept file in the above format. here is what i made changes.
    mask="CLIENT_ID+'_ADHSUITE_IN_'*+date2str(now,'MMddyy','US/Eastern')*+'.TXT'*+'.outpgp_out'"
    Please help.
    Thanks

    I believe your statement is being passed over twice:
    First Pass: (This is done by something like javascript)
    CLIENT_ID+'_ADHSUITE_IN_'+'.*'+date2str(now,'MMddyy','US/Eastern')+'.*'+'.TXT'+'.*'+'.outpgp_out'In this pass the variables and functions that are enclosed in literals are processed:
    (1) CLIENT_ID is replaced by 2595 or whatever is current value is:
    (2) date2str(now,'MMddyy','US/Eastern') gets replaced by 040609 (if the current time now is 4th april 2009).
    So at the end of this first pass we have a string:
    2595_ADHSUITE_IN_.\*040609.\*.TXT.*.outpgp_outThis string at the end of the first pass is a Posix basic regular expression. (ref: [http://en.wikipedia.org/wiki/Regular_expression] ) accessed at time of post).
    This is the string I put in the Regular Expression text box on [http://www.fileformat.info/tool/regex.htm]
    and it matches "2595_ADHSUITE_IN_ANNWEL_040609_2009-01-27_17-02-28-000_631.TXT715.outpgp_out" for me (though I prefer my egrep test).
    I hope this is somewhat clearer. Remember I have very little information about your system/application and I make big guesses.
    NB: (I should thank Frits earlier for pointing my sloppiness between wildcards (for eg unix shell filename expansion) and regular expressions).
    For the second pass this used to compared other strings to see

  • Need help in writing regular expressions involving \w

    Hi,
    Here is my requirement .
    I have a string : GTA - 12AB TRA - 12AB
    I need a regex that represent above string.
    GTA - Constant - This wont change
    12AB - This will be \w (alphanumeric)
    Here I cannot have TRA within this 4 characters.
    The question is :
    How can I write an expression which says it can be a word(the positions where I have 12AB in example) but not TRA in sequence.
    Is this doable?
    Thanks in advance.

    Use lookarounds: [http://www.regular-expressions.info/lookaround.html]
    The regex:
    (?!.?TRA).{4}matches any 4 characters (except line breaks) that does not contain 'TRA'.

  • Need to remove Commas REgular Expressions?

    How can I use java to remove commas from a number.
    1,000 string
    need it to be
    1000
    can I pass it through some sort of regular expression?

    I was attempting to do it with regular expressions to learn how to do them better.
    Thanks for your good comment.
    nupevic

  • Finding td /td tags with regular expressions

    hi
    first i thought that would be very easy and i used this expression:
    String pattern = "<td.*>.*</td>";
    but this didn't worke the way i wont. this expression returns hole body which is in between the first <td> and the last </td> tag t
    hat means:
    <td id=1>(some other tags and text)</td><td id=2>(some other tags and text)</td>
    it retruns the whole string and no the two <td id=1>,</td> sections. now i have searched for a wokearound but i didn't finde an useful exaple
    can anybody help me?
    blub4ever

    It will work as long as there are no linefeeds in the start tag or the content, and there are no TD elements inside other TD elements. The first concern can be adressed by setting the DOTALL flag: String pattern = "(?s)<td.*?>.*?</td>";But if the tags can be nested, you have a problem: regexes don't handle nesting very well. By the way, you might find this site helpful:
    http://www.regular-expressions.info/

  • How to search with regular expression

    I make pdx files so that I can search text quickly. But Acrobat doesn't provide a way to search with regular expression. I'm wondering if there is a way that I don't know to search for regular expression in Acrobat Pro 9?

    First, Acrobat must "mount" the PDX.
    As "Find" does not use the cataloged index, use Shift+Ctrl+F to open the advanced search dialog.
    It may be helpful to first enter Acrobat Preferences and for the Search category tick "Always use advanced search options".
    Back to the Search dialog - use the drop down menu for "Look In" to pick "Select Index" then, if no PDXs show, click the Add button.
    In the Open Index File dialog, browse to the location of the desired PDX and select it.
    OK out and use "Return results containing" to pick a "Match ..." requirement or Boolean.
    To become familiar with query syntax, for Acrobat, it is good to review Acrobat Help.
    http://help.adobe.com/en_US/Acrobat/9.0/Professional/WS58a04a822e3e50102bd615109794195ff-7 c4b.w.html
    Be well...

  • Problem with Regular Expression

    Hi There!!
    I have a problem with regular expression. I want to validate that one word and second word are same. For that I have written a regex
    Pattern p=Pattern.compile("([a-z][a-zA-Z]*)\\s\1");
    Matcher m=p.matcher("nikhil nikhil");
    boolean t=m.matches();
    if (t)
              System.out.println("There is a match");
         else
              System.out.println("There is no match");
    The result I am getting is always "There is no match
    Your timely help will be much appreciated.
    Regards

    Ram wrote:
    ErasP wrote:
    You are missing a backward slash in the regex
    Pattern p = Pattern.compile("([a-z][a-zA-Z]*)\\s\\1");
    But this will fail in this case.
    Matcher m = p.matcher("Nikhil Nikhil");It is the reason for that *[a-z]*.The OP had [a-z][a-zA-Z]* in his code, so presumably he know what that means and wants that String not to match.

  • Grouping & Back-references with regular expressions on Replace Text window

    I really appreciate the inclusion of the Regular Expressions in the search & replace feature. One thing I am missing is back-references in the replacement expression. For instance, in the unix tools vi or sed, I might do something like this:
    s/\(firstPart\) \(secondPart\) \(oldThirdPart\)/\2 \1 newThirdPart/g
    which would allow me to switch the places of firstPart and secondPart, and totally replace thirdPart. If grouping and back-references are already present in the Replace Text window, how does one correctly invoke them?

    duplicate of Grouping & Back-references with regular expressions on Replace Text window

  • Assistance with Regular Expression and Tcl

    Assistance with Regular Expression and Tcl
    Hello Everyone,
      I recently began learning Tcl to develop scripts for automating network switch deployments. 
    In my script, I want to name the device with a location and the last three octets of the base mac address.
    I can get the Base MAC address by : 
    show version | include Base
     Base ethernet MAC Address       : 00:00:00:DB:CE:00
    And I can get the last three octets of the MAC address using the following regular expression. 
    ([0-9a-f]{2}[:-]){2}([0-9a-f]{2}$)
    But I have not been able to figure out how to call the regular expression in the tcl script.
    I have checked several resources but have not been able to figure it out.  Suggestions?
    Ultimately, I want to set the last three octets to a variable (something like below) and then call the variable when I name the switch.
    set mac [exec "sh version | i Base"] (include the regular expression)
    ios_config "hostname location$mac"
    Thanks for any assistance in advance.
    Chris

    This worked for me.
    Switch_1(tcl)#set result [exec show ver | inc Base]   
    Base ethernet MAC Address       : 00:1B:D4:F8:B1:80
    Switch_1(tcl)#regexp {([0-9A-F:]{8}\r)} $result -> mac
    1
    Switch_1(tcl)#puts $mac                               
    F8:B1:80
    Switch_1(tcl)#ios_config "hostname location$mac"      
    %Warning! Hostname should contain at least one alphabet or '-' or '_' character
    locationF8:B1:80(tcl)#

  • Get the string between li tags, with regular expression

    I have a unordered list, and I want to store all the strings between the li tags (<li>.?</li>)in an array:
    <ul>
    <li>This is String One</li>
    <li>This is String Two</li>
    <li>This is String Three</li>
    </ul>
    This is what have so far:
    <li>(.*?)</li>
    but it is not correct, I only want the string without the li tags.
    Thanks.

    No one?
    Anoyone here experienced with Regular Expression?

  • Need help with regular expression

    I'm trying to use the java.util.regex package to extract URLs from html files.
    The URLs that I am interested in extracting from the HTML look like the following:
    <font color="#008000">http://forum.java.sun.com -
    So, the URL is always preceeded by:
    <font color="#008000">
    and then followed by a space character and then a hyphen character. I want to be able to put all these URLs in a Vector object. This doesn't seem like it should be too difficult but for some reason I can't get anywhere with it. Any help would be greatly appreciated. Thanks!

    hi gupta am not sure of the java syntax but i can tell u about the regular expression...try this....
    <font color="#008000">(http:\/\/[a-zA-Z0-9.]+) [-]
    i dont know the java methods to call...just the reg exp...
    Sanjay Acharya

  • Help with regular expression needed

    Hi,
    Perhaps someone here can help me with my regular expression I'm trying to build in my Java code.
    The regular expression that I'm looking to build consists of any non-whitespace character up until it finds one or two <>= symbols and then any character thereafter. So both these Strings would match the expression:
    City 1==London
    Age>=18
    The regular expression that I'm using is as follows:
    (\\S+)([><=]){1,2}(.+)However, group 1 always retrieves the first <>= symbol as in "City 1=". How can I make the <>= part greedy so that it retrieves both operator symbols?
    Thanks.

    Make the first group, the non-spaces, reluctant:
    "(\\S+?)([<>=]{1,2})(.+)"

  • Help with Regular Expression for field validation

    I'm fairly new to using regular expressions and using Acrobat. This is probably a simple question, but I've been unable to figure it out.
    I have a text field on a PDF that I would like to be 9 characters in length. The first 2 characters can only be alphanumeric, the last 7 characters can only be numeric.
    At first I was using the following, which allows all the characters to be alphanumeric:
    var re = /^[A-Za-z0-9 :\\_]$/;
    if (event.change.length >0) {
    if (event.willCommit == false) {
        if (!re.test(event.change)) {
            event.rc = false
    That works fine, but it's not quite what I needed. With some assistance I changed it (see below) to fit what I was looking for. However, this didn't work; it prevents anything from being entered in the field:
    var re = /^[A-Za-z0-9]{2}\d{7}$/;
    if (event.change.length >0) {
    if (event.willCommit == false) {
        if (!re.test(event.change)) {
            event.rc = false
    Any help would be greatly appreciated.
    Thanks...

    Here's a function you can call form the field's custom Format script. It should be placed in a document-level JavaScript:
    function custom_ks1() {
        // Define non-commited regular expression
        var re = /^[A-Za-z0-9]{0,2}([0-9]{0,7})?$/;
        // Get all of the characters the user has entered
        var value = AFMergeChange(event);
        // Allow field to be cleared
        if(!value) return;
        if (event.willCommit) {
            // Define commited regular expression
            var re = /^[A-Za-z0-9]{2}[0-9]{7}$/;
            if (!re.test(value)) {  // If final value doesn't match, alert user
                app.alert("Your error message goes here.");
                // event.rc = false
        } else {  // not commited
            // Only allow characters that match the regular expression
            event.rc = re.test(value);
    Call it like this:
    // Custom Keystroke script
    custom1_ks();

  • Help With Regular Expression In Apex Validation

    Apex 3.2
    There is a validation type of regular expression in apex, but I have never used regular expression before,
    so a little help is appreciated.
    I need to validate a field. It is only allowed to contain alpha characers, numbers, spaces and the - (dash) character.
    I have tried several times to get this working
    eg
    [[:alpha:]]*[[:digit:]]*[[:space:]]*[-]*
    ^[[:alpha:][:digit:][:space:]-]+?
    and others, but just can't to get the syntax correct.
    Can someone help me with this please
    Gus

    Example:
    SQL> ed
    Wrote file afiedt.buf
      1  with t as (select 'This is some example text' as txt from dual union all
      2             select 'And this is the 2nd one with numbers' from dual union all
      3             select 'And this allows double-barrelled words with hyphens' from dual union all
      4             select 'But this one shouldn''t be allowed!' from dual
      5            )
      6  --
      7  select *
      8  from t
      9* where regexp_like(txt, '^[[:alnum:] -]*$')
    SQL> /
    TXT
    This is some example text
    And this is the 2nd one with numbers
    And this allows double-barrelled words with hyphens

  • Help with Regular Expressions and regexp_replace

    Oh great Oracle Guru can I can gets some help
    I need to clean up the phone numbers that have been entered in Oracle eBusiness per_phones table. Some of the phone numbers have dashes, some have spaces and some have char. I would just like to take all the digits out and then re-format the number.
    Ex.
    914-123-1234 .. output (914) 123-1234
    9141231234 ..again (914) 123-1234
    914 123 1234 .. (914) 123-1234
    myphone ... just null
    (914)-123-1234.. (914) 123-1234
    I really tried to understand the regular expressions statments, but for some reason I just can't understand it.

    Hi,
    Welcome to the forum!
    I would create a user-defined function for this. I expect there will be a lot of exceptions to the regular rules (for example, strings that do not contain exactly 10 digits, such as '1-800-987-6543') that can be handled, but would require lots of nested fucntions and othwer complicted code if you had to do it in a single statement.
    If you really want to do it with a regular expression:
    SELECT     phone_txt
    ,     REGEXP_REPLACE ( phone_txt
                     , '^\D*'          || -- 0 or more non-digits at the beginning of the string
                           '(\d\d\d)'     || -- \1 = 3 consecutive digits
                    '\D*'          || -- 0 or more non-digits
                           '(\d\d\d)'     || -- \2 = 3 consecutive digits
                    '\D*'          || -- 0 or more non-digits
                           '(\d\d\d)'     || -- \3 = 4 consecutive digits
                    '\D*$'             -- 0 or more non-digits at the end of the string
                     , '(\1) \2-\3'
                     )          AS new_phone_txt
    FROM    table_x
    ;

Maybe you are looking for

  • Validating Profit Center in CJ20N Transaction

    Hi All, My requirement is to lock the Profit Center field(Make it grayed out) in the project using CJ20n transaction once it is created. Also when ever the user is trying to change the profit center by entering the project details, it should give an

  • Issue with Acrobat 10.1.2 and Windows 7 64-bit Preview Handler

    Hi. I work in an enterprise environment that is currently migrating to  64-bit Windows 7. The only version of Acrobat we have approved for Windows 7 is 10.1.2. We're seeing an issue with this version that causes the preview handler to not work. The p

  • Migration and relative path

    Hi people, I would like to migrate a Performance Manager environment from DEV to PROD but I'm facing problems with linked objects. For example, I have an "Interactive Metric" from which I can open a Webi report, using the commad : Detail Analysis||ht

  • Pavilion A620N with no audio

    Hi, I have an old Pavilion that I'm "restoring". There's no audio.  The audio properties page shows the audio slider grayed out.  Device Manager is happy...the thing just won't make a sound. I've removed the Realtek AC'97 driver and reinstalled the o

  • ME_PROCESS_REQ_CUST

    in method PROCESS_ACCOUNT, how to get the PR's item data as in EBAN/mereq_item internal table