Regular Expression Counting

I want to match strings of the form AA####A (two letters, four digits, one letter) by giving an exact count for each part of the regular expression.
I tried
[a-zA-z](2)\d(4)[a-zA-z]
also tried <[a-zA-z](2)><\d(4)>[a-zA-z]
What am I doing wrong?

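Take a look at:
http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html
Repetition counts go in braces, not parentheses, and the letter range should be A-Z rather than A-z (which also matches a few punctuation characters). I believe the expression you were trying to write is this: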
"[a-zA-Z]{2}[\\d]{4}[a-zA-Z]"

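As a minimal sketch of the corrected pattern in use (the class name, method name and sample inputs below are made up for illustration), something like this should accept only whole strings of the form AA####A:

```java
import java.util.regex.Pattern;

class PlateFormatDemo {
    // Two letters, four digits, one letter; {n} is the exact-count quantifier.
    static final Pattern FORMAT = Pattern.compile("[a-zA-Z]{2}\\d{4}[a-zA-Z]");

    static boolean isValid(String s) {
        // matches() succeeds only if the entire input fits the pattern.
        return FORMAT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("AB1234C")); // true
        System.out.println(isValid("AB123C"));  // false: only three digits
        System.out.println(isValid("A_1234C")); // false: '_' is not a letter
    }
}
```

With the original range A-z the last input would have been accepted, because '_' sits between 'Z' and 'a' in ASCII.
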
Similar Messages

  • Regular expression- insert into a string phrase "new" into correct position

    I have sample data in table T below, and sample output showing what I want the query to return for that data.
    with T as (
    select 'CREATE OR REPLACE PACKAGE BODY "YYY"."PACKAGEONE" IS' s from dual union all
    select 'Create or REPLACE PACKAGE BODY "ZZZ"."PACKAGETWO" IS' s from dual
    )
    select REGEXP_REPLACE(T.s, '^.PACKAGE BODY$','(\1)_new',1,1,'i') as s from T;
    Expected output:
    CREATE OR REPLACE PACKAGE BODY "YYY"."PACKAGEONE_new" IS
    Create or REPLACE PACKAGE BODY "ZZZ"."PACKAGETWO_new" IS
    All data has the following pattern:
    CREATE OR REPLACE PACKAGE BODY "[owner]"."[name]" IS
    Where [owner] can be any string. In the sample data it has, for example, the values YYY and ZZZ.
    And [name] can be any string. In the sample data it has, for example, the values PACKAGEONE and PACKAGETWO.
    The other parts of the string are fixed and stay as you see them.
    As shown in the expected output, the query should replace the substring "[owner]"."[name]" with "[owner]"."[name]_new".
    How can I write such a query?
    I think the regular expression should somehow count the positions of the double quotes to achieve the expected result, but I don't know how.

    Thx, but your solution doesn't work; it doesn't put the phrase "new" into the string. I use Oracle 10g.
    with T as (
    select 'CREATE OR REPLACE PACKAGE BODY "YYY"."PACKAGEONE" IS...' s from dual union all
    select '...Create or REPLACE PACKAGE BODY "ZZZ"."PACKAGETWO" ..."a"."b" procedure a IS..' s from dual
    )
    select
    RegExp_Replace(s,'(" IS$)','_new\1') as new
    from t;
    CREATE OR REPLACE PACKAGE BODY "YYY"."PACKAGEONE" IS...
    ...Create or REPLACE PACKAGE BODY "ZZZ"."PACKAGETWO" ..."a"."b" procedure a IS..

  • Help with regular expression to find a pattern in clob

    Can someone help me write a regular expression to query a CLOB that contains XML-type data?
    I need a query to find multiple occurrences of a variable string (i.e. <EMPID-XX>, where XX can be any number). If <EMPID-01> appears twice in the CLOB I want the result EMPID-01,2, and if EMPID-02 appears 4 times I want the result EMPID-02,4.

    with
    ofx_clob as
    (select q'~
    <EMPID>1
    < UNQID>123456
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
    <EMPID>2
    < UNQID>123457
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
    <EMPID>1
    < UNQID>123458
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
    ~' ofx from dual
    )
    select '<EMPID>' || to_char(ids) || '(' || to_char(count(*)) || ')' multi_empid
      from (select replace(regexp_substr(ofx,'<EMPID>\d*',1,level),'<EMPID>') ids
              from ofx_clob
            connect by level <= regexp_count(ofx,'<EMPID>')
           )
     group by ids having count(*) > 1
    MULTI_EMPID
    <EMPID>1(2)
    with
    ofx_clob as
    (select q'~
    <EMPID>1
    < UNQID>123456
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
    <EMPID>2
    < UNQID>123457
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
    <EMPID>1
    < UNQID>123456
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
    <EMPID>2
    < UNQID>123456
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
    <EMPID>1
    < UNQID>123458
    < TIMESTAMP>...
    < ADDRINFO>
    < TITLE>^@~*
    < FIRST>ABCD
    < MI>
    < LAST>EFGH
    < ADDR1>ADDR1
    < ADDR2>^@~*
    < CITY>CITY
    ~' ofx from dual
    )
    select '<EMPID>' || listagg(to_char(ids) || '(' || to_char(count(*)) || ')',',') within group (order by ids) multi_empid
      from (select replace(regexp_substr(ofx,'<EMPID>\d*',1,level),'<EMPID>') ids
              from ofx_clob
            connect by level <= regexp_count(ofx,'<EMPID>')
           )
     group by ids having count(*) > 1
    MULTI_EMPID
    <EMPID>1(3),2(2)
    Regards
    Etbin
    Message was edited by: Etbin
    used listagg to report more than one multiple <EMPID>

  • Problems with java regular expressions

    Hi everybody,
    Could someone please help me sort out an issue with Java regular expressions? I have been using regular expressions in Python for years and I cannot figure out how to do what I am trying to do in Java.
    For example, I have this code in java:
    import java.util.regex.*;

    String text = "abc";
    Pattern p = Pattern.compile("(a)b(c)");
    Matcher m = p.matcher(text);
    if (m.matches()) {
        int count = m.groupCount();
        System.out.println("Groups found " + String.valueOf(count));
        for (int i = 0; i < count; i++) {
            System.out.println("group " + String.valueOf(i) + " " + m.group(i));
        }
    }
    My expectation is that group 0 would capture "abc", group 1 "a" and group 2 "c". Yet I get this:
    Groups found 2
    group 0 abc
    group 1 a
    I have tried other patterns and input text, but the issue remains the same: no matter what, I cannot capture any parenthesized expression in the pattern except for the first one. I tried the same example with Jakarta Regexp 1.5 and that works without any problems; I get what I expect.
    I am using Java 1.5.0 on Mac OS X 10.4.
    Thanks to all who can help.

    paulcw wrote:
    If the group count is X, then there are X plus one groups to go through: 0 for the whole match, then 1 through X for the individual groups.
    It does seem confusing that the designers chose to exclude the zero-group from group count, but the documentation is clear.
    Matcher.groupCount():
    Group zero denotes the entire pattern by convention. It is not included in this count.
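
    In code, that means the loop bound has to be "<=" rather than "<". A minimal sketch (the class and method names are only for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    // Returns group 0 (the whole match) followed by groups 1..groupCount().
    static List<String> allGroups(String regex, String text) {
        List<String> groups = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        if (m.matches()) {
            // Note "<=": groupCount() does not include group 0.
            for (int i = 0; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(allGroups("(a)b(c)", "abc")); // [abc, a, c]
    }
}
```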

  • Regular expression help to solve sys_refcursor for a record

    In reference to my thread "Question on sys_refcursor with record type", I thought it could be solved differently. That is:
    I have a string like '8:1706,1194,1817~1:1217,1613,1215,1250'
    I need to do some manipulation using regular expressions and achieve something like
    select * from <table> where
    c1 in (8,1)
    and c2 in (1706,1194,1817,1217,1613,1215,1250);
    Is it possible using regular expressions in a single select statement?

    Hi,
    Clearance 6`- 8`` wrote:
    Your understanding is absolutely correct. But unfortunately it did not work Frank.
    SELECT COUNT (*)
      FROM (SELECT sp.*
              FROM spml sp, spml_assignment spag
             WHERE sp.spml_id = spag.spml_id
               AND spag.class_of_svc_id = 8
               AND spag.service_type_id IN (1706, 1194, 1817)
               AND spag.carrier_id = 4445
               AND NVL (spag.haulage_type_id, -1) = NVL (NULL, -1)
               AND spag.effdate = TO_DATE ('01/01/2000', 'mm/dd/yyyy')
               AND spag.unit_id = 5
               AND sales_org_id = 1
            UNION ALL
            SELECT sp.*
              FROM spml sp, spml_assignment spag
             WHERE sp.spml_id = spag.spml_id
               AND spag.class_of_svc_id = 1
               AND spag.service_type_id IN (1217, 1613, 1215, 1250)
               AND spag.carrier_id = 4445
               AND NVL (spag.haulage_type_id, -1) = NVL (NULL, -1)
               AND spag.effdate = TO_DATE ('01/01/2000', 'mm/dd/yyyy')
               AND spag.unit_id = 5
               AND sales_org_id = 1);
    COUNT(*)
    88
    SELECT COUNT (*)
      FROM spml sp, spml_assignment spag
     WHERE sp.spml_id = spag.spml_id
       AND spag.carrier_id = 4445
       AND NVL (spag.haulage_type_id, -1) = NVL (NULL, -1)
       AND spag.effdate = TO_DATE ('01/01/2000', 'mm/dd/yyyy')
       AND spag.unit_id = 5
       AND sales_org_id = 1
       AND REGEXP_LIKE ('8:1706,1194,1817~1:1217,1613,1215,1250',
                        '(^|~)' || spag.class_of_svc_id || ':'
                       )
       AND REGEXP_LIKE ('8:1706,1194,1817~1:1217,1613,1215,1250',
                        '(:|,)' || spag.service_type_id || '(,|$)'
                       );
    COUNT(*)
    140
    Edited by: Clearance 6`- 8`` on Aug 11, 2009 8:04 PM
    Just serving what you ordered!
    Originally, you said you were looking for something that produced the same result as
    where   c1 in (8, 1)
    and     c2 in (1706, 1194, 1817, 1217, 1613, 1215, 1250)
    that is, any of the c1s could be paired with any of the c2s.
    Now it looks like what you want is
    where   (     c1 = 8
            and   c2 IN (1706, 1194, 1817)
            )
    or      (     c1 = 1
            and   c2 IN (1217, 1613, 1215, 1250)
            )
    that is, c1=8 and c2=1250 is no good; neither is c1=1 and c2=1706.
    In that case, try
    WHERE     REGEXP_LIKE ( s
                  , '(^|~)' || c1
                         || ':([0-9]+,)*'
                         || c2
                         || '(,|~|$)'
                  )
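
    The same pattern can be tried outside the database. This is only a Java sketch of the idea (java.util.regex is not Oracle's regex engine, and the class and method names are hypothetical), but it shows how the pattern ties each c2 to the c1 that precedes it:

```java
import java.util.regex.Pattern;

class PairedListDemo {
    // True if c2 appears in the comma-separated list that follows "c1:".
    static boolean isPaired(String s, int c1, int c2) {
        String regex = "(^|~)" + c1 + ":([0-9]+,)*" + c2 + "(,|~|$)";
        return Pattern.compile(regex).matcher(s).find();
    }

    public static void main(String[] args) {
        String s = "8:1706,1194,1817~1:1217,1613,1215,1250";
        System.out.println(isPaired(s, 8, 1194)); // true
        System.out.println(isPaired(s, 8, 1250)); // false: 1250 belongs to c1 = 1
        System.out.println(isPaired(s, 1, 1250)); // true
        System.out.println(isPaired(s, 1, 1706)); // false: 1706 belongs to c1 = 8
    }
}
```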

  • Introduction to regular expressions ... continued.

    After some very positive feedback from Introduction to regular expressions ... I'm now continuing on this topic for the interested audience. As always, if you have questions or problems that you think could be solved through regular expression, please post them.
    Having fun with regular expressions - Part 2
    Finishing my example with decimal numbers, I thought about a method to test regular expressions. A question from another user who was looking for a way to show all possible combinations inspired me to write a small package.
    CREATE OR REPLACE PACKAGE regex_utils AS
      -- Regular Expression Utilities
      -- Version 0.1
      TYPE t_outrec IS RECORD(
        data VARCHAR2(255)
      );
      TYPE t_outtab IS TABLE OF t_outrec;
      FUNCTION gen_data(
        p_charset IN VARCHAR2 -- character set that is used for generation
      , p_length  IN NUMBER   -- maximum length of the generated strings
      ) RETURN t_outtab PIPELINED;
    END regex_utils;
    /
    CREATE OR REPLACE PACKAGE BODY regex_utils AS
    -- FUNCTION gen_data returns a collection of generated varchar2 elements
      FUNCTION gen_data(
        p_charset IN VARCHAR2 -- character set that is used for generation
      , p_length  IN NUMBER   -- length of the generated
      ) RETURN t_outtab PIPELINED
      IS
        TYPE t_counter IS TABLE OF PLS_INTEGER INDEX BY PLS_INTEGER;
        v_counter t_counter;
        v_exit    BOOLEAN;
        v_string  VARCHAR2(255);
        v_outrec  t_outrec;
      BEGIN
        FOR max_length IN 1..p_length 
        LOOP
          -- init counter loop
          FOR i IN 1..max_length
          LOOP
            v_counter(i) := 1;
          END LOOP;
          -- start data generation loop
          v_exit := FALSE;
          WHILE NOT v_exit
          LOOP
            -- start generation
            v_string := '';
            FOR i IN 1..max_length
            LOOP
              v_string := v_string || SUBSTR(p_charset, v_counter(i), 1);
            END LOOP;
            -- set outgoing record
            v_outrec.data := v_string;
            -- now pipe the result
            PIPE ROW(v_outrec);
            -- increment loop
            <<inc_loop>>
            FOR i IN REVERSE 1..max_length
            LOOP
              v_counter(i) := v_counter(i) + 1;     
              IF v_counter(i) > LENGTH(p_charset) THEN
                 IF i > 1 THEN
                    v_counter(i) := 1;
                 ELSE
                    v_exit := TRUE;  
                 END IF;
              ELSE
                 -- no further processing required
                 EXIT inc_loop;  
              END IF;  
            END LOOP;        
          END LOOP; 
        END LOOP; 
      END gen_data;
    END regex_utils;
    /
    This package is a brute force string generator using all possible combinations of the characters in a string up to a maximum length. Together with the regular expressions, I can now show what combinations my solution would allow to pass. But see for yourself:
    SELECT *
      FROM (SELECT data col1
              FROM TABLE(regex_utils.gen_data('+-.0', 5))
           ) t
     WHERE REGEXP_LIKE(NVL(REGEXP_SUBSTR(t.col1,
                                         '^([+-]?[^+-]+|[^+-]+[+-]?)$'
                                        ),
                           ' '  -- fallback value is an assumption; any string that fails the second pattern would do
                          ),
                       '^[+-]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)[+-]?$'
                      )
    ;
    You will see some results which are perfectly valid for my definition of decimal numbers but haven't been mentioned, like '000' or '+.00'. From now on I will also use this package to verify the solutions I'll present to you and hopefully reduce my share of typos.
    Counting and finding certain characters or words in a string can be a tedious task. I'll show you how it's done with regular expressions. I'll start with an easy example, count all spaces in the string "Having fun with regular expressions.":
    SELECT NVL(LENGTH(REGEXP_REPLACE('Having fun with regular expressions', '[^ ]')), 0)
      FROM dual
      ;
    No surprise there. I'm replacing all characters except spaces with a null string. Since REGEXP_REPLACE assumes a NULL string as the replacement argument, I can save on adding a third argument, which would look like this:
    REGEXP_REPLACE('Having fun with regular expressions', '[^ ]', '')
    So REGEXP_REPLACE will return all the spaces, which we can count with the LENGTH function. If there aren't any, I will get a NULL string, which is handled by the NVL function. If you want, you can play around by changing the space character to something else.
    A variation of this theme could be counting the number of words. Counting spaces and adding 1 to this result could be misleading if there are duplicate spaces. Thanks to regular expressions, I can of course eliminate duplicates.
    Using the old method on the string "Having  fun  with  regular  expressions" would return anything but the right number. This is where backreferences come into play. REGEXP_REPLACE uses them in the replacement argument as a backslash plus a single digit, like this: '\1'. To reference a string in a search pattern, I have to use subexpressions (remember the round brackets?).
    SELECT NVL(LENGTH(REGEXP_REPLACE('Having  fun  with  regular  expressions', '( )\1*|.', '\1')), 0)
      FROM dual
      ;
    You may have noticed that I changed from using the "^" as a NOT operator to using the "|" OR operator and the "." any-character placeholder. This neat little trick allows us to filter out all characters except the one we're looking for in the first place. The "\1" backreference is outside of our subexpression, since I don't want to count the trailing duplicate spaces, and it is used both in the search pattern and in the replacement argument.
    Still I'm not satisfied with this: What about leading/trailing blanks, what if there are any special characters, numbers, etc.? Finally, it's time to only count words. For the purpose of this demonstration, I define a word as one or more consecutive letters. If by now you're already thinking in regular expressions, the solution is not far away. One hint: you may want to check on the "i" match parameter which allows for case insensitive search. Another one: You won't need a back reference in the search pattern this time.
    Let's compare our solutions then, shall we?
    SELECT NVL(LENGTH(REGEXP_REPLACE('Having  fun  with  regular  expressions.  !',
                                     '([a-z])+|.', '\1', 1, 0, 'i')), 0)
      FROM dual;
    This time I don't use a backreference; the "+" operator (remember? 1 or more) will suffice. And since I want to count the occurrences, not the letters, I moved the "+" metacharacter outside of the subexpression. The "|." trick again proved to be useful.
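
    For comparison, here is how the same counting could look in Java, where it is easier to count the matches directly than to strip everything else and measure what is left. This is only a sketch with made-up names, not part of the Oracle solution:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordCountDemo {
    // Counts non-overlapping matches of regex in text, ignoring case.
    static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // A "word" is one or more consecutive letters, as defined above.
        System.out.println(countMatches("[a-z]+", "Having  fun  with  regular  expressions.  !")); // 5
        // Counting single spaces works the same way.
        System.out.println(countMatches(" ", "Having fun with regular expressions")); // 4
    }
}
```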
    Case insensitive search does have its merits. It will only search, but not transform, any substring it finds. If I want, for example, to extract any occurrence of the word fun, I'll just use the "i" match parameter and get this substring, whether it's written as "Fun", "FUN" or "fun". This can be very useful if you're looking, for example, for names of customers, streets, etc.
    Enough about counting, how about finding? What if I want to know the last occurrence of a certain character or string, for example the position of the last space in this string "Where is the last space?"?
    Addendum: Thanks to another forum member, I should mention that the INSTR function can do a reverse search by itself.
    WITH t AS (SELECT 'Where is the last space?' col1
                 FROM dual)
    SELECT INSTR(col1, ' ', -1)
      FROM t;
    Now regular expressions are powerful, but there is no parameter that allows us to reverse the search direction. However, remembering that we have the "$" metacharacter that means (until the) end of string, all I have to do is use a search pattern that looks for a space followed by non-space characters up to the end of the string. Now compare the REGEXP_INSTR function to the previous solution:
    SELECT REGEXP_INSTR(t.col1, ' [^ ]*$')
      FROM t;
    So in this case, it remains a matter of taste which one you want to use. If the search pattern has to look for the last occurrence of another regular expression, this is the way to solve such a requirement.
    One more thing about backreferences. They can be used for a sort of primitive "string swapping". If, for example, you have to transform column values like swapping first and last name, the backreference is your friend. Here's an example:
    SELECT REGEXP_REPLACE('John Doe', '^(.*) (.*)$', '\2, \1')
      FROM dual
      ;
    What about middle names, for example 'John J. Doe'? Look for yourself, it still works.
    You can even use that for strings with delimiters, for example reversing delimited "fields" like in this string '10~20~30~40~50' into '50~40~30~20~10'. Using REVERSE, I would get '05~04~03~02~01', so there has to be another way. Using backreferences however is limited to 9 subexpressions, which limits the following solution a bit, if you need to process strings with more than 9 fields. If you want, you can think this example through and see if your solution matches mine.
    SELECT REGEXP_REPLACE('10~20~30~40~50',
                          '^(.*)~(.*)~(.*)~(.*)~(.*)$',
                          '\5~\4~\3~\2~\1'
                         )
      FROM dual;
    After what you've learned so far, that wasn't too hard, was it? Enough for now ...
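
    As a side note for readers coming from Java: the same swap works there with "$1" and "$2" in the replacement string, and the field reversal can be done by splitting instead, which is not tied to a fixed number of subexpressions. A rough sketch (class and method names are invented for this example):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

class SwapDemo {
    // Swaps "First Last" into "Last, First" using backreferences.
    static String swapNames(String name) {
        return name.replaceAll("^(.*) (.*)$", "$2, $1");
    }

    // Reverses delimited fields without relying on numbered groups.
    static String reverseFields(String s, String delimiter) {
        List<String> fields = Arrays.asList(s.split(Pattern.quote(delimiter)));
        Collections.reverse(fields);
        return String.join(delimiter, fields);
    }

    public static void main(String[] args) {
        System.out.println(swapNames("John Doe"));                  // Doe, John
        System.out.println(swapNames("John J. Doe"));               // Doe, John J.
        System.out.println(reverseFields("10~20~30~40~50", "~"));   // 50~40~30~20~10
    }
}
```

    The middle name stays with the first name because the first (.*) is greedy and takes everything up to the last space.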
    Continued in Introduction to regular expressions ... last part..
    C.
    Fixed some typos and a flawed example ...
    cd

    Thank you very much C. Awaiting other parts.... keep going.
    One German typo :-)
    I'm replacing all characters except spaces mit a null string.
    I received a functional spec from my Dutch analyst in which it is written
    tnsnames voor EDWH:
    PCESCRD1 = (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)
                                                   (HOST=blah.blah.blah.com)
                                                   (PORT=5227)))
               (CONNECT_DATA=(SID=pcescrd1)))
    db user: BW_I2_VIEWER  / BW_I2_VIEWER_SCRD1
    Had to look for translators.
    Cheers
    Sarma.

  • Regular Expressions and Collections

    Is it possible to use regular expressions when validating collection items?
    I have a loop that checks collection values for null, not null etc. I would also like to validate the format of some text fields in the same collection loop.
    i.e.
    for i in 1..htmldb_application.g_f02.count
    loop
      if replace(htmldb_application.g_f07(i), 'NULL', NULL) is null
         and TO_NUMBER(NVL(htmldb_application.g_f02(i), 0)) > 0
      then
        return 'A Demand Quantity Value Must Be Specified.';
      elsif
        -- check htmldb_application.g_f07(i) is correctly formatted using Reg Expressions
      then
        return 'Reg Expression error text goes here';
      end if;
    end loop;
    Regards
    Duncan

    Well, the obvious answer is "only write the data to the database if the input doesn't match the regular expression."
    Presumably you're really asking how to do that - but it depends upon how your application is structured in the first place, and you haven't told us anything at all about that.

  • Regular Expression Character Sets with Pattern and Matcher

    Hi,
    I am a little confused about a regular expression I am writing; it works in other languages but not in Java.
    The regular expression is meant to match LaTeX commands in a file, and is as follows:
    \\begin{command}([.|\n\r\s]*)\\end{command}
    This does not work in Java but does in PHP, C, etc.
    The strange part is the . character. If written as .* it works, but if written as [.]* it doesn't. Does this mean that . cannot be placed in a character class in Java?
    Any help very much appreciated.
    Kind Regards
    Paul Bain

    In PHP it seems that the "." still works as an any-character operator inside character classes.
    The regular expression I posted did not work, but it does if I write:
    \\begin{command}((.|[\n\r\s])*)?\\end{command}
    Basically what I'm trying to match is a block of LaTeX, so the \\begin{command} and \\end{command} are LaTeX, not regex, although the \\ is a single backslash in LaTeX. I want to match any block that starts with the begin command and ends with the end command, so the part of the regular expression that really counts is the bit in the middle, ((.|[\n\r\s])*)?
    Am I right in saying that the "?" will prevent the engine from matching from the first \\begin to the last \\end in the following example:
    \\begin{command}
    some stuff
    \\end{command}
    \\begin{command}
    some stuff
    \\end{command}
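
    A note on the question above, with a small sketch: in java.util.regex a "." inside a character class is a literal dot, and a "?" placed after a group makes the group optional rather than reluctant. One way to get one match per block is the reluctant quantifier ".*?" together with Pattern.DOTALL, so that "." also matches line breaks. The class and method names here are made up, and the input assumes a single backslash as in real LaTeX:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LatexBlocks {
    // DOTALL lets '.' match line terminators; '*?' stops at the first \end{command}.
    private static final Pattern BLOCK = Pattern.compile(
            "\\\\begin\\{command\\}(.*?)\\\\end\\{command\\}", Pattern.DOTALL);

    // Returns the text between each \begin{command} and its matching \end{command}.
    static List<String> bodies(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = BLOCK.matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "\\begin{command}\nfirst\n\\end{command}\n"
                + "\\begin{command}\nsecond\n\\end{command}\n";
        System.out.println(bodies(text).size()); // 2
    }
}
```

    With a greedy ".*" instead of ".*?" the same input would give a single match running from the first begin to the last end.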

  • Strange result about regular expressions

    Hello everybody,
    I wrote this code to try regular expressions in Java, but I get some strange results. I have read references such as the Sun Java Tutorials; however, I can't find the problem.
    Environment:
    WindowsXP Home + NetBeans IDE 5.0 + JDK 1.5
    Input String:
    "I write these codes to try regular expressions in Java, but it doesn't work. I read some reference like Sun Java Tutorials. Then, always cann't find the problem. Could you help me? Thanks."
    My code:
    public static void main(String[] args) throws Exception, IOException {
        P.rintln("Let's go!");
        Date start = new Date();
        if(args.length != 1) {
            P.rintln("Input Error! Input format: java javaclass [directory path]");
            System.exit(0);
        }
        StringBuffer sb = new StringBuffer();
        String input = TextFile.read(args[0]);
        sb = addSectionEelement(input, "re");
        P.rintln(sb.toString());
        P.rintln("Ok, it's over");
        Date end = new Date();
        System.out.println("It spends " + (end.getTime() - start.getTime()) + " ms.");
    }

    public static StringBuffer addSectionEelement(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        int count = 0;
        while(m.find()) {
            count++;
            P.rintln(m.group());
        }
        P.rintln("Found " + count + " fois.");
        return sb;
    }
    Output:
    run:
    Let's go!
    Found 0 fois.
    Ok, it's over
    It spends 16 ms.
    BUILD SUCCESSFUL (total time: 0 seconds)
    However, if I change the call to addSectionEelement to
    sb = addSectionEelement(input, "r");
    the results become:
    run:
    Let's go!
    r
    r
    r
    r
    r
    r
    r
    r
    r
    r
    r
    Found 11 fois.
    Ok, it's over
    It spends 15 ms.
    BUILD SUCCESSFUL (total time: 0 seconds)
    I have no idea about it. And you?
    Thanks

    Hi guys,
    I re-examined the code. In fact, the problem was the encoding of the input file.
    See you
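
    The post does not say which encoding was involved, so the following is only a guess at how such a result can come about: if a UTF-16 file is decoded as a single-byte charset, every letter is followed by a NUL character, so "r" is still found but "re" is not. A self-contained sketch (names are hypothetical):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

class EncodingDemo {
    // Decodes the raw bytes with the given charset and searches for the regex.
    static boolean found(byte[] fileBytes, Charset charset, String regex) {
        String text = new String(fileBytes, charset);
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // Stand-in for file content that was saved as UTF-16 (assumed here).
        byte[] bytes = "regular expressions".getBytes(StandardCharsets.UTF_16LE);

        // Wrong charset: letters are separated by NUL characters.
        System.out.println(found(bytes, StandardCharsets.ISO_8859_1, "re")); // false
        System.out.println(found(bytes, StandardCharsets.ISO_8859_1, "r"));  // true
        // Right charset: the two-letter pattern is found again.
        System.out.println(found(bytes, StandardCharsets.UTF_16LE, "re"));   // true
    }
}
```

    Passing the charset explicitly when reading the file avoids depending on the platform default.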

  • How to find substring with regular expression?

    How can I find a substring in a string with a regular expression?
    Example:
    I have an original string "<tr><th>RecordId: </th><td valign=middle>A4711</td></tr>"
    Now I want to extract the value "A4711" from this string with a regular expression. Everything except "A4711" is fixed; the id "A4711" itself is dynamic. How can I get the substring "A4711" out of the original string with a regular expression?

    I wrote a little method using the information above to get such results:
        /**
         * Get all substrings of a string that match a regular expression.
         * @param original String to inspect.
         * @param regExp Regular expression as search criteria.
         * @return All matches of <i>regExp</i> or null if one input parameter is null.
         */
        public static String[] getSubstrings(String original, String regExp) {
            String[] result = null;
            if (original != null && regExp != null) {
                Pattern pattern = Pattern.compile(regExp);
                Matcher matcher = pattern.matcher(original);
                boolean matchFound = matcher.find();
                Vector matches = new Vector();
                while (matchFound) {
                    String match = matcher.group();        
                    matches.addElement(match);
                    matchFound = matcher.find();
                }//next match
                int count = matches.size();
                result = new String[count];
                for (int i = 0; i < count; i++) {
                    result[i] = (String) matches.elementAt(i);
                }//next match
            }//else: input unavailable
            return result;
        }//getSubstrings()
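
    The method above returns whole matches. To get only the dynamic part, such as "A4711", a capturing group around it is one option. A minimal sketch, assuming the surrounding markup is exactly as fixed as in the question (class and method names are invented):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RecordIdDemo {
    // The parentheses capture only the dynamic part between the fixed tags.
    static final Pattern RECORD_ID = Pattern.compile(
            "<th>RecordId: </th><td valign=middle>([^<]*)</td>");

    // Returns the captured id, or null if the markup is not found.
    static String extractId(String html) {
        Matcher m = RECORD_ID.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<tr><th>RecordId: </th><td valign=middle>A4711</td></tr>";
        System.out.println(extractId(html)); // A4711
    }
}
```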

  • [DW CC] Regular Expression and Back Reference in "Replace with": Escape

    I have a problem with escaping a character in "Replace with".
    Let's assume the following situation:
    Content:
    foobar
    Search for:
    (foo)(bar)
    Replace by:
    $1zot$2
    Result:
    foozotbar
    Everything is fine.
    But I want to insert the number "2" in between foo and bar.
    When I use
    Replace by: $12$2
    The result is:
    $12$2
    It seems that the regex engine interprets the "$12" as a whole. And because there's no back reference number 12, DW cannot work correctly.
    The question:
    How can I "escape" the "2"?
    Thanks.

    Dreamweaver uses JavaScript for its Find and Replace with regex.
    I've just checked the Regular Expressions Cookbook by Jan Goyvaerts and Steven Levithan (O'Reilly). It deals with exactly this sort of situation where capturing groups can be ambiguous. This is what it says:
    "Java and JavaScript try to be clever with $10 [and above]. If a capturing group with the two-digit number exists in your regular expression, both digits are used for the capturing group. If fewer capturing groups exist, only the first digit is used to reference the group, leaving the second as a literal. Thus $23 is the 23rd capturing group, if it exists. Otherwise, it is the second capturing group followed by a literal 3."
    In other words, there is no way to escape the 2 in the Replace field. It would appear there's a bug in Dreamweaver's use of capturing groups.
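
    For what it's worth, the behaviour the book describes can be checked in plain Java. This sketch says nothing about Dreamweaver itself; it only shows java.util.regex, and the class and method names are made up:

```java
class ReplacementDemo {
    // Applies the given replacement string to "foobar" using two capturing groups.
    static String replace(String replacement) {
        return "foobar".replaceAll("(foo)(bar)", replacement);
    }

    public static void main(String[] args) {
        // Only two groups exist, so "$12" is read as group 1 followed by a literal 2.
        System.out.println(replace("$12$2"));    // foo2bar
        // A backslash makes the next character literal in a Java replacement string.
        System.out.println(replace("$1\\2$2"));  // foo2bar
        // The unambiguous case from the question.
        System.out.println(replace("$1zot$2"));  // foozotbar
    }
}
```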

  • Sed Request Regular Expression Format

    A quick question....
    There are lots of different syntaxes for regular expressions and lots for sed. With the sed-request and sed-response filters I have tried different syntaxes for marking word boundaries, but I don't know which to use. The \b syntax is accepted but doesn't seem to do anything, and the \< and \> syntax throws up errors when I start up the web server. I tried the more complex (?<!\w)(?=\w) and (?<=\w)(?!\w), but \w isn't supported. I am wondering if I just can't do this. I am trying to stop SQL injection attacks using a syntax such as
    s/\bselect\b.{1,100}?\bfrom\b.{1,100}?\bwhere\b//g
    Are word boundaries not supported?

    Actually, the entries should be written as \\< and \\>. That looks double escaped to me, but the entries are correct that way:
    Input fn="insert-filter"
    method="(GET|HEAD|POST)"
    filter="sed-request"
    sed="s/</\\</g"
    sed="s/%3c/\\</g"
    sed="s/%3C/\\</g"
    sed="s/>/\\>/g"
    sed="s/%3e/\\>/g"
    sed="s/%3E/\\>/g"
    sed="s/\x3C ?iframe//g"
    sed="s/\\<src\\>[^a-zA-Z_0-9]*?\\<javascript://g"
    sed="s/\\<src\\>[^a-zA-Z_0-9]*?\\<vbscript://g"
    sed="s/\\<href\\>[^a-zA-Z_0-9]*?\\<javascript://g"
    sed="s/\\<alert\\>[^a-zA-Z_0-9]*?\x28//g"
    sed="s/\\<src\\>[^a-zA-Z_0-9]*?\\<http://g"
    sed="s/\\<type\\>[^a-zA-Z_0-9]*?\\<text\\>[^a-zA-Z_0-9]*?\\<vbscript\\>//g"
    sed="s/\\<href\\>[^a-zA-Z_0-9]*?\\<vbscript://g"
    sed="s/\\<url\\>[^a-zA-Z_0-9]*?\\<javascript://g"
    sed="s/\x3C ?script\\>//g"
    sed="s/\\<type\\>[^a-zA-Z_0-9]*?\\<text\\>[^a-zA-Z_0-9]*?\\<javascript\\>//g"
    sed="s/\\<url\\>[^a-zA-Z_0-9]*?\\<vbscript://g"
    sed="s/(asfunction|javascript|vbscript|data|mocha|livescript)://g"
    sed="s/(?i:<object[ /+\t].*?((type)|(codetype)|(classid)|(code)|(data))[ /+\t]*=)//g"
    sed="s/(?i:[ /+\t\"\'`]datasrc[ +\t]*?=.)//g"
    sed="s/(?i:<link[ /+\t].*?href[ /+\t]*=)//g"
    sed="s/(?i:<meta[ /+\t].*?http-equiv[ /+\t]*=)//g"
    sed="s/(?i:<embed[ /+\t].*?SRC.*?=)//g"
    sed="s/(?i:[ /+\t\"\'`]on\x63\x63\x63+?[ +\t]*?=.)//g"
    sed="s/(?i:<?frame.*?[ /+\t]*?src[ /+\t]*=)//g"
    sed="s/(?i:<isindex[ /+\t>])//g"
    sed="s/(?i:<form.*?>)//g"
    sed="s/(?i:<script.*?[ /+\t]*?src[ /+\t]*=)//g"
    sed="s/(?i:<script.*?>)//g"
    sed="s/\\<select\\>.{0,40}buser\\>//g"
    sed="s/\\<select\\>.{0,40}\\<substring\\>//g"
    sed="s/\\<select\\>.{0,40}\\<ascii\\>//g"
    sed="s/\\<user_tables\\>//g"
    sed="s/\\<user_tab_columns\\>//g"
    sed="s/\\<all_objects\\>//g"
    sed="s/\\<drop\\>//g"
    sed="s/\\<substr\\>//g"
    sed="s/\\<sysdba\\>//g"
    sed="s/\\<user_password\\>//g"
    sed="s/\\<user_users\\>//g"
    sed="s/\\<user_constraints\\>//g"
    sed="s/\\<column_name\\>//g"
    sed="s/\\<substring\\>//g"
    sed="s/\\<object_type\\>//g"
    sed="s/\\<object_id\\>//g"
    sed="s/\\<user_ind_columns\\>//g"
    sed="s/\\<column_id\\>//g"
    sed="s/\\<table_name\\>//g"
    sed="s/\\<object_name\\>//g"
    sed="s/\\<rownum\\>//g"
    sed="s/\\<user_group\\>//g"
    sed="s/\\<utl_http\\>//g"
    sed="s/\\<select\\>.*?\\<to_number\\>//g"
    sed="s/\\<group\\>.*\\<byb.{1,100}?\\<having\\>//g"
    sed="s/\\<select\\>.*?\\<data_type\\>//g"
    sed="s/\\<isnull\\>[^a-zA-Z_0-9]*?\x28//g"
    sed="s/\\<union\\>.{1,100}?\\<select\\>//g"
    sed="s/\\<insert\\>[^a-zA-Z_0-9]*?\\<into\\>//g"
    sed="s/\\<select\\>.{1,100}?\\<count\\>.{1,100}?\\<from\\>//g"
    sed="s/\x3B[^a-zA-Z_0-9]*?\\<drop\\>//g"
    sed="s/\\<select\\>.*?\\<to_char\\>//g"
    sed="s/\\<dbms_java\\>//g"
    sed="s/\\<nvarchar\\>//g"
    sed="s/\\<utl_file\\>//g"
    sed="s/\\<inner\\>[^a-zA-Z_0-9]*?\\<join\\>//g"
    sed="s/\\<select\\>.{1,100}?\\<from\\>.{1,100}?\\<where\\>//g"
    sed="s/\\<into\\>[^a-zA-Z_0-9]*?\\<dumpfile\\>//g"
    sed="s/\\<delete\\>[^a-zA-Z_0-9]*?\\<from\\>//g"
    sed="s/\x3B[^a-zA-Z_0-9]*?\\<shutdown\\>//g"
    sed="s/\\<dba_users\\>//g"
    sed="s/\\<select\\>.{1,100}?\\<top\\>.{1,100}?\\<from\\>//g"

  • Introduction to regular expressions part4

    from Introduction to regular expressions ... last part.
    Hi cd_2.
    You have not introduced the new regex features of 11gR1.
    Therefore I will introduce them B-)
    RegExp_Count
    http://download.oracle.com/docs/cd/E11882_01/server.112/e10592/functions145.htm
    Since Oracle 11gR1 there is a new function, RegExp_Count.
    RegExp_Count counts how many times the pattern matches in the source string.
    sampleSQL
    select
    RegExp_Count('abc','[a-c]') as cnt1,
    RegExp_Count('abc','[ac]')  as cnt2,
    RegExp_Count('abc','[0-9]') as cnt3
    from dual;
    cnt1  cnt2  cnt3
       3     2     0
    ***********************************************************************************
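For readers without an 11g instance at hand, the counting behaviour above can be approximated outside the database. This is only a Python sketch of the same idea, not Oracle's implementation; the helper name regexp_count is made up for the illustration, and it counts non-overlapping matches the way re.finditer reports them.

```python
import re


def regexp_count(source, pattern):
    """Count non-overlapping matches of pattern in source (hypothetical helper)."""
    return sum(1 for _ in re.finditer(pattern, source))


# Same three cases as the sample SQL above.
print(regexp_count("abc", "[a-c]"),
      regexp_count("abc", "[ac]"),
      regexp_count("abc", "[0-9]"))  # prints: 3 2 0
```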
    6th parameter of RegExp_SubStr
    http://download.oracle.com/docs/cd/E11882_01/server.112/e10592/functions148.htm
    Since Oracle 11gR1 RegExp_SubStr has a new 6th parameter.
    This 6th parameter (the subexpression number) can emulate regex lookahead and lookbehind,
    but only in simple cases.
    For example, (?=.*abc)ghi can be emulated,
    but (?=.*abc)(?=.*def)ghi cannot.
    sampleSQL
    select
    RegExp_Substr('abc1def2','([a-z])[0-9]',1,1,null,1)
    as "emulate [a-z](?=[0-9])",
    RegExp_Substr('1def2','[a-z]([0-9])',1,1,null,1)
    as "emulate (?<=[a-z])[0-9]"
    from dual;
    e  e
    c  2
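The trick in the sample SQL is to match the surrounding context as well and then return only one capture group. A rough Python sketch of the same technique, next to real lookarounds for comparison, is shown here; sub_expr is a hypothetical helper, not an Oracle or Python built-in.

```python
import re


def sub_expr(source, pattern, group=1):
    """Return one capture group of the first match, or None (hypothetical helper)."""
    m = re.search(pattern, source)
    return m.group(group) if m else None


# Emulation: match letter+digit, keep only the captured part.
print(sub_expr("abc1def2", r"([a-z])[0-9]"))  # c
print(sub_expr("1def2", r"[a-z]([0-9])"))     # 2

# The real lookahead / lookbehind give the same results here.
print(re.search(r"[a-z](?=[0-9])", "abc1def2").group())  # c
print(re.search(r"(?<=[a-z])[0-9]", "1def2").group())    # 2
```

The emulation consumes the context characters, which is one reason it only covers simple cases.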

    This function has been known to the community for years. It has been presented at conferences by Tom Kyte, discussed in Oracle Magazine, and demonstrated in Morgan's Library on the Regular Expressions page for several years at least.
    http://www.morganslibrary.org/reference/regexp.html
    http://www.morganslibrary.org/reference/builtin_functions.html
    It is a little late to be introducing it. ;)

  • Pattern matching using Regular expression

    Hi,
    I am working on pattern matching using regular expressions. In the table, I have 2 columns, A and B.
    A has a value like 'A499BPAU4A32A386KBCZ4C13C41D20E'
    B has a value like '*CZ4*M11*7NQ+RDR+RSM-R9A-R9B'
    The requirement is that I have to match the values in B against A. If a value has a * sign, it must be present in A; for example, 'CZ4' should exist in string A.
    The issue I am facing is that there are multiple values with a * sign. The code works fine for the first match (CZ4), but it does not look further, so it never notices that M11 does not exist in A.
    I used the condition
    AND instr(A,substr(REGEXP_SUBSTR(B, '*[^*]{3}'),2) ,1)=0
    First of all, is this possible to match multiple patterns in one condition?
    If yes, please suggest.
    Thanks
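To make the requirement concrete: every *-prefixed value in B has to be checked, not just the first one that REGEXP_SUBSTR returns. A small Python sketch of that check, using the sample values from the question, might look like this. It assumes tokens are delimited by the characters *, + and -, and missing_required is a made-up name.

```python
import re


def missing_required(a, b):
    """Return the '*'-prefixed tokens of b that do not occur anywhere in a."""
    required = re.findall(r"\*([^*+-]+)", b)
    return [tok for tok in required if tok not in a]


a = "A499BPAU4A32A386KBCZ4C13C41D20E"
b = "*CZ4*M11*7NQ+RDR+RSM-R9A-R9B"
print(missing_required(a, b))  # ['M11', '7NQ']
```

A row would satisfy the rule only when the returned list is empty.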

    user2544469 wrote:
    Thanks a lot Frank. This query worked wonderfully for the test data I provided; however, I have some concerns:
    - query does not include the column BOOK, which is a mandatory check.
    Sorry, that was my mistake. It was a very easy mistake to make, since you posted sample data where it didn't matter. Instead of doing a cross-join between vn and got_must_have_cnt, do an inner join, using book. That means book will have to be in got_must_have_cnt, and all the sub-queries from which it descends. Look for comments that say "March 22".
    If you want to treat '+' in test_cat.codes as '*', then the simplest thing is probably just to use REPLACE, so that when the table has '+', you use '*' instead.
    WITH     got_token_cnt     AS
    (
         SELECT     cat
         ,     book                                        -- Added March 22
         ,     REPLACE (codes, '+', '*') AS codes          -- If desired.  Changed March 22
         ,     LENGTH (codes) - LENGTH ( TRANSLATE ( codes
                                                   , 'x*+-'
                                                   , 'x'
                                                   )
                                       ) AS token_cnt
         FROM    test_cat
    )
    ,     cntr     AS
    (
         SELECT     LEVEL     AS n
         FROM     (  SELECT  MAX (token_cnt)     AS max_token_cnt
                     FROM    got_token_cnt
                  )
         CONNECT BY     LEVEL     <= max_token_cnt
    )
    ,     got_tokens     AS
    (
         SELECT     t.cat
         ,     t.book                                      -- Added March 22
         ,     REGEXP_SUBSTR ( t.codes
                             , '[*+-]'
                             , 1
                             , c.n
                             )          AS token_type
         ,     SUBSTR ( REGEXP_SUBSTR ( t.codes
                                      , '[*+-][^*+-]*'
                                      , 1
                                      , c.n
                                      )
                      , 2
                      )          AS token
         FROM     got_token_cnt     t
         JOIN     cntr              c  ON     c.n     <= t.token_cnt
    )
    ,     got_must_have_cnt     AS
    (
         SELECT       cat, book                            -- Changed March 22
         ,       COUNT (CASE WHEN token_type = '*' THEN 1 END) AS must_have_cnt
         FROM       got_tokens
         GROUP BY  cat, book                               -- Changed March 22
    )
    SELECT       mh.cat
    ,       vn.vn_no
    FROM       got_must_have_cnt     mh
    JOIN                    vn  ON  mh.book     = vn.book               -- Changed March 22
    LEFT OUTER JOIN      got_tokens     gt  ON     mh.cat                  = gt.cat
                                     AND INSTR (vn.codes, gt.token) > 1
    GROUP BY  mh.cat
    ,            mh.must_have_cnt
    ,            vn.vn_no
    HAVING       COUNT (CASE WHEN gt.token_type = '*' THEN 1 END)     = mh.must_have_cnt
    AND       COUNT (CASE WHEN gt.token_type = '-' THEN 1 END)     = 0
    ORDER BY  mh.cat
    ;
    - query is very slow with 60000 records in vn table. Cost is somewhere around 36000.
    See these threads:
    When your query takes too long ...
    HOW TO: Post a SQL statement tuning request - template posting
    Relational databases were designed to have (at most) one piece of information in each column. If you decide to have multiple items in the same column (as you have a variable number of tokens in the codes column), don't be surprised if that makes things slower and more complicated. Most of the query I posted, and perhaps most of the time needed, is just to normalize the data. If you stored the data in a normalized form, perhaps something like got_tokens, then you wouldn't need the first 3 sub-queries that I posted.
    Edited by: Frank Kulash on Mar 22, 2011 12:04 PM
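The normalization step in the query (one row per token, tagged with its prefix) followed by the HAVING rules can be sketched outside SQL as well. The Python below is only an illustration under stated assumptions: it uses plain substring containment rather than the INSTR condition from the query, it treats '+' tokens as optional, and tokenize and codes_match are hypothetical names.

```python
import re


def tokenize(codes):
    """Split '*CZ4+RDR-R9A' into (prefix, token) pairs, like got_tokens."""
    return [(kind, tok) for kind, tok in re.findall(r"([*+-])([^*+-]*)", codes) if tok]


def codes_match(target, codes):
    """True if every '*' token occurs in target and no '-' token does."""
    tokens = tokenize(codes)
    must_have = [tok for kind, tok in tokens if kind == "*"]
    must_not = [tok for kind, tok in tokens if kind == "-"]
    return (all(tok in target for tok in must_have)
            and not any(tok in target for tok in must_not))


target = "A499BPAU4A32A386KBCZ4C13C41D20E"
print(tokenize("*CZ4+RDR-R9A"))             # [('*', 'CZ4'), ('+', 'RDR'), ('-', 'R9A')]
print(codes_match(target, "*CZ4+RDR-R9A"))  # True
print(codes_match(target, "*CZ4-C13"))      # False
```

Storing the tokens already split this way is the kind of normalized layout the post recommends.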

  • Validate Email by regular Expression... Need Help

    Dear All,
    Requirement:
    validate the email ID entered & throw error message, if it is invalid.
    DATA c_mailpattern TYPE c LENGTH 60 VALUE
    '[a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4} '.
    ** If @ is present, more than once. Error out
        find ALL OCCURRENCES OF '@' in P_email
        MATCH COUNT v_count.
        if v_count > 1.
          v_badpattern = 1.
        endif.
    ** If , is present, once, Error out
        find ALL OCCURRENCES OF ',' in P_Email
        MATCH COUNT v_count.
        if v_count > 0.
          v_badpattern = v_badpattern + 1.
        endif.
        FIND REGEX c_mailpattern IN P_Email IGNORING CASE .
        IF sy-subrc <> 0 OR v_badpattern > 0.
    Write:/ p_EMAIL, 'has invalid Email format'.
    ENDIF.
    Though this works fine, the tester needs me to reject an address whose domain name is something like "app.com.com" as an invalid email ID.
    The above regex does not catch that case.
    I searched & found
    {messageID=3706355}
    messageID=1657369}{
    https://wiki.sdn.sap.com/wiki/display/Snippets/E-MAIL+Validation
    doesn't help.
    I found this regex in a Perl program.
    [a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
    Can I get help to turn this into an ABAP string?
    1) I can't get past the *+ characters (shown in bold in the original post) using escape characters like #* or ''. Can someone help me assign this regex to a string variable?
    2) This regex is longer than the allowed length for a literal.
    It can be split into 2 strings, then concatenated and checked.
    Edited by: Mallikarjuna J on May 16, 2011 8:23 PM
    Edited by: Mallikarjuna J on May 16, 2011 8:26 PM
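The "split into 2 strings, then concatenate" idea from the post can be tried out quickly in another language before porting it. The Python sketch below is not ABAP and does not answer the escaping question for ABAP literals; it only shows the pattern assembled from two pieces (LOCAL and DOMAIN are names chosen for this illustration) and what it accepts.

```python
import re

# The long pattern from the post, cut at the '@' into two shorter pieces.
LOCAL = r"[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
DOMAIN = r"(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
EMAIL_RE = re.compile(LOCAL + "@" + DOMAIN, re.IGNORECASE)


def is_valid_email(address):
    """True if the whole address matches the concatenated pattern."""
    return EMAIL_RE.fullmatch(address) is not None


for addr in ["user@example.com", "a@@b.com", "a,b@c.com", "user@app.com.com"]:
    print(addr, is_valid_email(addr))
```

Note that "user@app.com.com" still matches: the pattern only checks syntax, and a repeated label is syntactically a valid host name, so that case needs a separate check.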

    Thanks Sebastian, Pratik & Keshav for the replies.
    SX_INTERNET_ADDRESS_TO_NORMAL doesn't validate a wrong email ID. It only splits the internet address into mail & domain.
    Prathik,
    just .com.com is not the point, Bad input could be .net.ent or .net.com or so....
    Amol, Thanks, but I keep receiving Error, not found in the 41 line response I get
    I think we need to check not line 2 but line 28.
    Taking cue from Prathik, I'm planning to put this
    *** ls_input_mail-mail is the email-id entered by user.
    ************ Check for Valid Regular Expression
    *****   DOT(.) is allowed more than once,
    *****   @ is allowed only once,
    *****   , is not allowed.
    ** If @ is present, more than once. Error out
        find ALL OCCURRENCES OF '@' in ls_input_mail-mail
        MATCH COUNT v_count.
        if v_count > 1.
          v_badpattern = 1.
        endif.
    ** If , is present, once, Error out
        find ALL OCCURRENCES OF ',' in ls_input_mail-mail
        MATCH COUNT v_count.
        if v_count > 0.
          v_badpattern = v_badpattern + 1.
        endif.
    **   Find if domain part i.e., after @ has errors.
        SPLIT ls_input_mail-mail at '@' into v_mailpart v_domain.
    *    there's a dot in the domain (CA = contains any; CO would mean "contains only dots").
        if v_domain CA '.' .
    *     split at the dot, not at '@'; v_domain2 receives everything after the first dot.
          SPLIT v_domain at '.' into v_domain1 v_domain2.
    *      v_domain2 can only be a country name, else error out
          select single landx from t005 into v_country
            where landx = v_domain2.
          if sy-subrc <> 0.
            v_badpattern = v_badpattern + 1.
          endif.
        ENDIF.
        FIND REGEX c_mailpattern IN ls_input_mail-mail IGNORING CASE .
        IF sy-subrc <> 0 OR v_badpattern > 0.
          Write:/ ls_input_mail-mail, 'has invalid email format'.
        ENDIF.
    However, I was wondering if there is a way to use escape characters and make the string below a valid regex variable to check the email ID.
    [a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
    Nevertheless, Thanks Friends for all your inputs.
    Edited by: Mallikarjuna J on May 17, 2011 2:23 PM
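The layered checks planned above (count the @ signs, reject commas, then inspect the domain) can be prototyped as follows. This is a Python sketch, not the ABAP solution: KNOWN_SUFFIXES is a made-up stand-in for the table lookup, and the repeated-label test is an extra assumption added here to catch inputs like "app.com.com".

```python
# Hypothetical stand-in for a lookup table of accepted final labels.
KNOWN_SUFFIXES = {"com", "net", "org", "in", "de"}


def check_address(address):
    """Return a list of problems found; an empty list means no check failed."""
    problems = []
    if address.count("@") != 1:
        problems.append("expected exactly one @")
    if "," in address:
        problems.append("comma not allowed")
    # Everything after the last '@' is treated as the domain.
    labels = address.rpartition("@")[2].lower().split(".")
    if len(labels) < 2 or labels[-1] not in KNOWN_SUFFIXES:
        problems.append("unknown domain suffix")
    # Assumption: two identical neighbouring labels (com.com) are an error.
    if any(x == y for x, y in zip(labels, labels[1:])):
        problems.append("repeated domain label")
    return problems


for addr in ["user@example.com", "user@app.com.com", "user@app.net.ent", "a,b@example.com"]:
    print(addr, check_address(addr))
```

An input such as "user@app.net.com" passes all of these checks, so rules like this narrow the problem rather than solve it.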

Maybe you are looking for