Parsing with regular expressions

Hi
I'm developing an application that is trying to parse a text in a text file. It's looking for strings like
" 0551 TIMERHIN, S.L. "
Then I think I could lok with matches method for a string with many white chars followed by 4 numbers...
I did
line.matches("^\\s+[0-9]{4}")
but it didn't work...
Any help will be appreciated.
<jl>

The String.matches tries to match the regular express against the whole string.
You regex pattern "^\\s+[0-9]{4}" will match
"                          0551"but not
"                          0551 TIMERHIN, S.L. "You can do it using this code:
import java.sql.*;
import java.io.*;
import java.util.regex.*;
import java.util.*;
public class RegExTest {
public static void main (String args[]) {
   Pattern pat = null;
   Matcher m = null;
   String patternToMatch = "^\\s+[0-9]{4}";
   pat = Pattern.compile(patternToMatch);
   System.out.println("Pattern to match = " + patternToMatch);
   String line = "                          0551 TIMERHIN, S.L. ";
   m = pat.matcher(line);
   if (m.find()) { System.out.println("Line matched"); }
} // end main
} //End Class Test
  

Similar Messages

  • Grouping & Back-references with regular expressions on Replace Text window

    I really appreciate the inclusion of the Regular Expressions in the search & replace feature. One thing I am missing is back-references in the replacement expression. For instance, in the unix tools vi or sed, I might do something like this:
    s/\(firstPart\) \(secondPart\) \(oldThirdPart\)/\2 \1 newThirdPart/g
    which would allow me to switch the places of firstPart and secondPart, and totally replace thirdPart. If grouping and back-references are already present in the Replace Text window, how does one correctly invoke them?

    duplicate of Grouping & Back-references with regular expressions on Replace Text window

  • Assistance with Regular Expression and Tcl

    Assistance with Regular Expression and Tcl
    Hello Everyone,
      I recently began learning Tcl to develop scripts for automating network switch deployments. 
    In my script, I want to name the device with a location and the last three octets of the base mac address.
    I can get the Base MAC address by : 
    show version | include Base
     Base ethernet MAC Address       : 00:00:00:DB:CE:00
    And I can get the last three octets of the MAC address using the following regular expression. 
    ([0-9a-f]{2}[:-]){2}([0-9a-f]{2}$)
    But I have not been able to figure out how to call the regular expression in the tcl script.
    I have checked several resources but have not been able to figure it out.  Suggestions?
    Ultimately, I want to set the last three octets to a variable (something like below) and then call the variable when I name the switch.
    set mac [exec "sh version | i Base"] (include the regular expression)
    ios_config "hostname location$mac"
    Thanks for any assistance in advance.
    Chris

    This worked for me.
    Switch_1(tcl)#set result [exec show ver | inc Base]   
    Base ethernet MAC Address       : 00:1B:D4:F8:B1:80
    Switch_1(tcl)#regexp {([0-9A-F:]{8}\r)} $result -> mac
    1
    Switch_1(tcl)#puts $mac                               
    F8:B1:80
    Switch_1(tcl)#ios_config "hostname location$mac"      
    %Warning! Hostname should contain at least one alphabet or '-' or '_' character
    locationF8:B1:80(tcl)#

  • How to search with regular expression

    I make pdx files so that I can search text quickly. But Acrobat doesn't provide a way to search with regular expression. I'm wondering if there is a way that I don't know to search for regular expression in Acrobat Pro 9?

    First, Acrobat must "mount" the PDX.
    As "Find" does not use the cataloged index, use Shift+Ctrl+F to open the advanced search dialog.
    It may be helpful to first enter Acrobat Preferences and for the Search category tick "Always use advanced search options".
    Back to the Search dialog - use the drop down menu for "Look In" to pick "Select Index" then, if no PDXs show, click the Add button.
    In the Open Index File dialog, browse to the location of the desired PDX and select it.
    OK out and use "Return results containing" to pick a "Match ..." requirement or Boolean.
    To become familiar with query syntax, for Acrobat, it is good to review Acrobat Help.
    http://help.adobe.com/en_US/Acrobat/9.0/Professional/WS58a04a822e3e50102bd615109794195ff-7 c4b.w.html
    Be well...

  • Problem with Regular Expression

    Hi There!!
    I have a problem with regular expression. I want to validate that one word and second word are same. For that I have written a regex
    Pattern p=Pattern.compile("([a-z][a-zA-Z]*)\\s\1");
    Matcher m=p.matcher("nikhil nikhil");
    boolean t=m.matches();
    if (t)
              System.out.println("There is a match");
         else
              System.out.println("There is no match");
    The result I am getting is always "There is no match
    Your timely help will be much appreciated.
    Regards

    Ram wrote:
    ErasP wrote:
    You are missing a backward slash in the regex
    Pattern p = Pattern.compile("([a-z][a-zA-Z]*)\\s\\1");
    But this will fail in this case.
    Matcher m = p.matcher("Nikhil Nikhil");It is the reason for that *[a-z]*.The OP had [a-z][a-zA-Z]* in his code, so presumably he know what that means and wants that String not to match.

  • Get the string between li tags, with regular expression

    I have a unordered list, and I want to store all the strings between the li tags (<li>.?</li>)in an array:
    <ul>
    <li>This is String One</li>
    <li>This is String Two</li>
    <li>This is String Three</li>
    </ul>
    This is what have so far:
    <li>(.*?)</li>
    but it is not correct, I only want the string without the li tags.
    Thanks.

    No one?
    Anoyone here experienced with Regular Expression?

  • DOM Parser fails with regular expression using anchor (carat, dollar)

    I'm using version "Oracle XDK Java 9.0.4.0.0 Production"
    In trying to parse XML against schema: a regular expression fails to parse the data "8:00" with the following simple regular expression: "^.*$" (used to narrow the error)
    The error message is
    <Line 14, Column 25>: XSD-2025: (Error) Invalid text '8:00' in element: 'XYZ'
    If I remove the anchors and just have ".*", the data is parsed successfully.
    I dont understand why the parse fails when I use a anchors in the regular expression, and the java Pattern/Matcher classes succeed with the anchors?

    That "ns670" string is an xml namespace prefix. it should have a corresponding xml namespace declaration somewhere in the xml document (i'm guessing you have not shown the whole document). the actual value of an xml namespace prefix is meaningless. if you parse the xml with a namespace aware DOM parser, it should generate Nodes with the correct namespace. the namespace is the value you care about when extracting data from the document, not the namespace prefix.
    alternately, if you parse the document using a namespace aware DOM parser, you can just look for nodes based on their "local" name (the part after the ":" separator) and ignore the namespace/prefix.
    whatever you do, please do not parse the xml with a regex, see this http://stackoverflow.com/a/1732454/552759 for details (applies to xml as well).

  • Need help with regular expression

    I'm trying to use the java.util.regex package to extract URLs from html files.
    The URLs that I am interested in extracting from the HTML look like the following:
    <font color="#008000">http://forum.java.sun.com -
    So, the URL is always preceeded by:
    <font color="#008000">
    and then followed by a space character and then a hyphen character. I want to be able to put all these URLs in a Vector object. This doesn't seem like it should be too difficult but for some reason I can't get anywhere with it. Any help would be greatly appreciated. Thanks!

    hi gupta am not sure of the java syntax but i can tell u about the regular expression...try this....
    <font color="#008000">(http:\/\/[a-zA-Z0-9.]+) [-]
    i dont know the java methods to call...just the reg exp...
    Sanjay Acharya

  • Help with Regular Expression for field validation

    I'm fairly new to using regular expressions and using Acrobat. This is probably a simple question, but I've been unable to figure it out.
    I have a text field on a PDF that I would like to be 9 characters in length. The first 2 characters can only be alphanumeric, the last 7 characters can only be numeric.
    At first I was using the following, which allows all the characters to be alphanumeric:
    var re = /^[A-Za-z0-9 :\\_]$/;
    if (event.change.length >0) {
    if (event.willCommit == false) {
        if (!re.test(event.change)) {
            event.rc = false
    That works fine, but it's not quite what I needed. With some assistance I changed it (see below) to fit what I was looking for. However, this didn't work; it prevents anything from being entered in the field:
    var re = /^[A-Za-z0-9]{2}\d{7}$/;
    if (event.change.length >0) {
    if (event.willCommit == false) {
        if (!re.test(event.change)) {
            event.rc = false
    Any help would be greatly appreciated.
    Thanks...

    Here's a function you can call form the field's custom Format script. It should be placed in a document-level JavaScript:
    function custom_ks1() {
        // Define non-commited regular expression
        var re = /^[A-Za-z0-9]{0,2}([0-9]{0,7})?$/;
        // Get all of the characters the user has entered
        var value = AFMergeChange(event);
        // Allow field to be cleared
        if(!value) return;
        if (event.willCommit) {
            // Define commited regular expression
            var re = /^[A-Za-z0-9]{2}[0-9]{7}$/;
            if (!re.test(value)) {  // If final value doesn't match, alert user
                app.alert("Your error message goes here.");
                // event.rc = false
        } else {  // not commited
            // Only allow characters that match the regular expression
            event.rc = re.test(value);
    Call it like this:
    // Custom Keystroke script
    custom1_ks();

  • Help with regular expression needed

    Hi,
    Perhaps someone here can help me with my regular expression I'm trying to build in my Java code.
    The regular expression that I'm looking to build consists of any non-whitespace character up until it finds one or two <>= symbols and then any character thereafter. So both these Strings would match the expression:
    City 1==London
    Age>=18
    The regular expression that I'm using is as follows:
    (\\S+)([><=]){1,2}(.+)However, group 1 always retrieves the first <>= symbol as in "City 1=". How can I make the <>= part greedy so that it retrieves both operator symbols?
    Thanks.

    Make the first group, the non-spaces, reluctant:
    "(\\S+?)([<>=]{1,2})(.+)"

  • Help with regular expression to find a pattern in clob

    can someone help me writing a regular expression to query a clob that containts xml type data?
    query to find multiple occurrences of a variable string (i.e <EMPID-XX> - XX can be any number). If <EMPID-01> appears twice in the clob i want the result as EMPID-01,2 and if EMPID-02 appears 4 times i want the result as EMPID-02,4.

    with
    ofx_clob as
    (select q'~
    <EMPID>1
    < UNQID>123456
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
    <EMPID>2
    < UNQID>123457
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
    <EMPID>1
    < UNQID>123458
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
    ~' ofx from dual
    select '<EMPID>' || to_char(ids) || '(' || to_char(count(*)) || ')' multi_empid
      from (select replace(regexp_substr(ofx,'<EMPID>\d*',1,level),'<EMPID>') ids
              from ofx_clob
            connect by level <= regexp_count(ofx,'<EMPID>')
    group by ids having count(*) > 1
    MULTI_EMPID
    <EMPID>1(2)
    with
    ofx_clob as
    (select q'~
    <EMPID>1
    < UNQID>123456
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
    <EMPID>2
    < UNQID>123457
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
    <EMPID>1
    < UNQID>123456
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
    <EMPID>2
    < UNQID>123456
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
    <EMPID>1
    < UNQID>123458
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
    ~' ofx from dual
    select '<EMPID>' || listagg(to_char(ids) || '(' || to_char(count(*)) || ')',',') within group (order by ids) multi_empid
      from (select replace(regexp_substr(ofx,'<EMPID>\d*',1,level),'<EMPID>') ids
              from ofx_clob
            connect by level <= regexp_count(ofx,'<EMPID>')
    group by ids having count(*) > 1
    MULTI_EMPID
    <EMPID>1(3),2(2)
    Regards
    Etbin
    Message was edited by: Etbin
    used listagg to report more than one multiple <EMPID>

  • Help With Regular Expression In Apex Validation

    Apex 3.2
    There is a validation type of regular expression in apex, but I have never used regular expression before,
    so a little help is appreciated.
    I need to validate a field. It is only allowed to contain alpha characers, numbers, spaces and the - (dash) character.
    I have tried several times to get this working
    eg
    [[:alpha:]]*[[:digit:]]*[[:space:]]*[-]*
    ^[[:alpha:][:digit:][:space:]-]+?
    and others, but just can't to get the syntax correct.
    Can someone help me with this please
    Gus

    Example:
    SQL> ed
    Wrote file afiedt.buf
      1  with t as (select 'This is some example text' as txt from dual union all
      2             select 'And this is the 2nd one with numbers' from dual union all
      3             select 'And this allows double-barrelled words with hyphens' from dual union all
      4             select 'But this one shouldn''t be allowed!' from dual
      5            )
      6  --
      7  select *
      8  from t
      9* where regexp_like(txt, '^[[:alnum:] -]*$')
    SQL> /
    TXT
    This is some example text
    And this is the 2nd one with numbers
    And this allows double-barrelled words with hyphens

  • Help with Regular Expressions and regexp_replace

    Oh great Oracle Guru can I can gets some help
    I need to clean up the phone numbers that have been entered in Oracle eBusiness per_phones table. Some of the phone numbers have dashes, some have spaces and some have char. I would just like to take all the digits out and then re-format the number.
    Ex.
    914-123-1234 .. output (914) 123-1234
    9141231234 ..again (914) 123-1234
    914 123 1234 .. (914) 123-1234
    myphone ... just null
    (914)-123-1234.. (914) 123-1234
    I really tried to understand the regular expressions statments, but for some reason I just can't understand it.

    Hi,
    Welcome to the forum!
    I would create a user-defined function for this. I expect there will be a lot of exceptions to the regular rules (for example, strings that do not contain exactly 10 digits, such as '1-800-987-6543') that can be handled, but would require lots of nested fucntions and othwer complicted code if you had to do it in a single statement.
    If you really want to do it with a regular expression:
    SELECT     phone_txt
    ,     REGEXP_REPLACE ( phone_txt
                     , '^\D*'          || -- 0 or more non-digits at the beginning of the string
                           '(\d\d\d)'     || -- \1 = 3 consecutive digits
                    '\D*'          || -- 0 or more non-digits
                           '(\d\d\d)'     || -- \2 = 3 consecutive digits
                    '\D*'          || -- 0 or more non-digits
                           '(\d\d\d)'     || -- \3 = 4 consecutive digits
                    '\D*$'             -- 0 or more non-digits at the end of the string
                     , '(\1) \2-\3'
                     )          AS new_phone_txt
    FROM    table_x
    ;

  • Help with Regular Expressions

    I have some sample code for string editing using regular expressions but I'm a little confused as to it's behavior. Here's what I have:
    public class WordFixer
        protected final String PUNC_MATCH = "[\\d\\p{Punct}]+";
        protected final String PUNC_PREFIX = "^" + PUNC_MATCH;
        protected final String PUNC_SUFFIX = PUNC_MATCH + "$";
        public String fixPrefix (String w)
            return w.replaceFirst(PUNC_PREFIX, "");
        public String fixSuffix (String w)
            return w.replaceFirst(PUNC_SUFFIX, "");
        }This replaces all leading and trailing punctuation with "" and it works. However, changing the replaceFirst's to just replace doesn't work. I don't understand why that is. Doesn't the ^ mean in the front, and doesn't the $ mean in the back, so shouldn't this work by just changing replaceFirst to replace?

    JFactor2004 wrote:
    I have some sample code for string editing using regular expressions but I'm a little confused as to it's behavior. Here's what I have:
    public class WordFixer
    protected final String PUNC_MATCH = "[\\d\\p{Punct}]+";
    protected final String PUNC_PREFIX = "^" + PUNC_MATCH;
    protected final String PUNC_SUFFIX = PUNC_MATCH + "$";
    public String fixPrefix (String w)
    return w.replaceFirst(PUNC_PREFIX, "");
    public String fixSuffix (String w)
    return w.replaceFirst(PUNC_SUFFIX, "");
    }This replaces all leading and trailing punctuation with "" and it works. However, changing the replaceFirst's to just replace doesn't work. I don't understand why that is. The first parameter of the replaceFirst(...) is a regex-String. And the ^, $ and [...] stuff all have a special meaning in the regex language. The replace(...) method takes plain-Strings as a parameter, so the String "^\[\\d\\p{Punct}\]+" is interpreted as just that, without any special meaning.
    JFactor2004 wrote:
    Doesn't the ^ mean in the front, and doesn't the $ mean in the back, Yes, ^ means the start of the String (sometimes the start of a line) and $ means the end of the String (sometimes the end of a line).
    JFactor2004 wrote:
    so shouldn't this work by just changing replaceFirst to replace?Nope, see the explanation above.

  • Searching in reverse with regular expressions

    I have recently constructed a "Find" dialog for my editor and would like to support the use of regular expressions (for people who know more about them than I do.) I have radio buttons that allow the user to search down (from the current caret position to the end of the document) or up (from the current caret position to the beginning of the document.)
    I have it working fine with literal text via String.indexOf() and String.lastIndexOf(), but I'm not sure how to implement an "upwards" search using regular expressions. It seems that java.util.regex only provides for searching from the current caret position downwards. Any suggestions?

    two not-very-good-for-big-document suggestions
    1) search up by searching down (from the beginning) to the last match before the position searched from
    2) when the user asks for a search find all matches, number them and keep the index in the doc where they were found, then look up the largest index less than the carets index
    you might try constructing succesively larger strings that end at the current caret position, and matching them one by one. If you're near the bottom, and theres no matches, this is also likely to be sloow..
    asjf

Maybe you are looking for

  • How to sell a file using adobe muse?

    Hi, I am trying to create an Ebook store using Adobe Muse. The PDF files can each be uploaded from my HDD but I want to sell them. I want to use Paypal and I know how to initialise and place it from Widget bar. But what I can not figure out is how to

  • Question about protection mode in Stdby env.

    Hi, In a standby environment, can we have primary database working in two protection mode simultaneously. e.g. There are 2 standby databases for a primary database. One is situated in another city (Stdby_A) connected via a slow link and another is in

  • Placing/Embeding an eps/Ai file in flex panel

    Hi All, I want to display the preview of ai/eps file in flex panel Does there exiost a way using which I can do this? I have tried with Image control but it can embed only jpg/gif/png/swf format Please help me Thanks in advance Alam

  • Help --- Error installing 9i

    I have got an error message while installing Oracle 9i (9.0.1) over windows2000(server family). The message was "Unsuccessful install due to failure of java.NULLException " ...... Please help me soon Thanx in advance Farooq

  • WRT54G with MAC

    Hi, I have a MacBook. I bought a linksys WRT54G router. I plugged the new router and my MacBook was working fine (wireless) ...before I ran the setup wizard. Since it said on the papers that I needed to install the software first, I did so. But since