Regular Expression Character Occurrence.

Hello,
I have a name text field and I want certain characters like "-" to occur one time only.
For example, the text field should accept "Jean Luc" or "Jean-Luc" (a French name), but not "Jean--Luc" or "Jean------------Luc".
The regular expression /\-?/ looks correct, but it does not seem to work.
I put "\" in front of "-" to escape it, and I use "?" to make "-" occur 0 or 1 times only.
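A likely reason /\-?/ appears not to work: without anchors it succeeds on every input, because matching zero hyphens is always possible somewhere in the string. One way to enforce the rule is to describe the whole name and require a full match. The sketch below is in Java rather than the JavaScript literal syntax used above; the class name NameCheck, the method isValidName, and the restriction to ASCII letters are assumptions for illustration only.

```java
import java.util.regex.Pattern;

public class NameCheck {
    // One or more letters, then any number of groups made of exactly one
    // separator (space or hyphen) followed by one or more letters.
    private static final Pattern NAME =
            Pattern.compile("[A-Za-z]+(?:[ -][A-Za-z]+)*");

    // matches() requires the whole input to match, so ^ and $ are not needed.
    static boolean isValidName(String input) {
        return NAME.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"Jean Luc", "Jean-Luc", "Jean--Luc", "-----Jean--Luc"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValidName(s));
        }
    }
}
```

Because each separator must be followed by at least one letter, doubled, leading, and trailing hyphens are all rejected by the same pattern.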

I am making sure that the text field accepts only text like "Jean-Luc" and not "-----Jean--Luc".
My regular expression is already very long, so I will add 5 or more additional conditions to the if statement instead, which will not be a problem.
Below is a very common regular expression which looks simple, but it will not accept "[email protected]" or non-Western characters.
/^[a-zA-Z0-9._-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}$/
Email regular expressions are complicated. The range of permissible email addresses is wide, and some domains include non-Western characters, which is quite rare but certainly does exist. I guess there is no precise regular expression for user forms.
Checking user input becomes more complicated if some bored-to-death folks have a habit of playing drums on the keyboard, which results in "asdf;lkjasdf;lkjasdf;lkajsdf;lkajsdf;lkajsdf;lkj".

Similar Messages

  • Regular Expression Character Sets with Pattern and Matcher

    Hi,
    I am a little confused about a regular expression I am writing; it works in other languages but not in Java.
    The regular expression is meant to match LaTeX commands in a file, and is as follows:
    \\begin{command}([.|\n\r\s]*)\\end{command}
    This does not work in Java but does in PHP, C, etc...
    The part that is strange is the . character. If placed as .* it works, but if placed as [.]* it doesn't. Does this mean that . cannot be placed in a character range in Java?
    Any help very much appreciated.
    Kind Regards
    Paul Bain

    In PHP it seems that "." still works as an any-character operator inside character classes.
    The regular expression I posted did not work, but it does if I do:
    \\begin{command}((.|[\n\r\s])*)?\\end{command}
    Basically what I'm trying to match is a block of LaTeX, so the \\begin{command} and \\end{command} are LaTeX, not regex, although the \\ is a single backslash in LaTeX. I want to match any block which starts with a begin command and ends with the matching end command, so the part of the regular expression that really counts is the bit in the middle, ((.|[\n\r\s])*)?
    Am I right in saying that the "?" will prevent the engine from matching from the first \\begin to the last \\end in the following example:
    \\begin{command}
    some stuff
    \\end{command}
    \\begin{command}
    some stuff
    \\end{command}
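For reference, in java.util.regex a dot inside a character class is a literal dot, so [.]* only matches runs of dots. Also, the trailing "?" in ((.|[\n\r\s])*)? makes the group optional; it does not make the inner * reluctant, so the match can still run from the first begin to the last end. A minimal sketch of one alternative follows, assuming the LaTeX source contains single backslashes; the class name LatexBlocks and the method extractBodies are hypothetical. It uses Pattern.DOTALL so that "." also matches line terminators, and the reluctant quantifier .*? so each match stops at the nearest end command.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LatexBlocks {
    // The regex is \\begin\{command\}(.*?)\\end\{command\}: the backslashes
    // and braces are escaped so they are matched literally.
    private static final Pattern BLOCK = Pattern.compile(
            "\\\\begin\\{command\\}(.*?)\\\\end\\{command\\}", Pattern.DOTALL);

    // Returns the text between each begin/end pair, one entry per block.
    static List<String> extractBodies(String latex) {
        List<String> bodies = new ArrayList<>();
        Matcher m = BLOCK.matcher(latex);
        while (m.find()) {
            bodies.add(m.group(1));
        }
        return bodies;
    }

    public static void main(String[] args) {
        String block = "\\begin{command}\nsome stuff\n\\end{command}\n";
        System.out.println(extractBodies(block + block).size() + " blocks found");
    }
}
```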

  • Regular Expressions Character Class shortcuts

    I have been learning to use regular expressions to modify some of my text files. I noticed that on my ARCH box the Character Class shortcuts do not work e.g. [[:digit:]] in an expression works but \d does not. Is this normal or is my installation broken in some way?

    Bebo wrote:
    There are several regexp "dialects". It's quite painful, actually. For instance, as far as I know, \d works in Perl, but not in sed or grep.
    So, yes, this is normal.
    Yeah -- Henry Spencer's regexp stuff is generally considered the portable form for sed and awk, since they're all based on it. Newer versions of grep do allow a -P flag for Perl regexps, but this is obviously non-portable.
    -- Thomas Adam

  • Regular expressions with multi character separator

    I have data like the
    where |`| is the separator for distinguishing two fields of data. I am having trouble writing a regular expression to display the data correctly.
    Connected to:
    Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
    With the Partitioning, OLAP, Data Mining and Real Application Testing options
    SQL> declare
    l_string varchar2 (200) :='123` 456 |`|789 10 here|`||223|`|5434|`}22|`|yes';
    v varchar2(40);
    begin
    v:=regexp_substr(l_string, '[^(|`|)]+', 1, 1);
    dbms_output.put_line(v);
    v:=regexp_substr(l_string, '[^(|`|)]+', 1, 2);
    dbms_output.put_line(v);
    v:=regexp_substr(l_string, '[^(|`|)]+', 1, 3);
    dbms_output.put_line(v);
    v:=regexp_substr(l_string, '[^(|`|)]+', 1, 4);
    dbms_output.put_line(v);
    v:=regexp_substr(l_string, '[^(|`|)]+', 1, 5);
    dbms_output.put_line(v);
    end;
    /
    123
    456
    789 10 here
    223
    5434
    I need it to display
    123` 456
    789 10 here
    |223
    5434|`}22
    yes
    I am not sure how to handle multi-character separators in data using regular expressions.
    Edited by: Clearance 6`- 8`` on Apr 1, 2011 3:35 PM
    Edited by: Clearance 6`- 8`` on Apr 1, 2011 3:37 PM

    Hi,
    Actually, using non-greedy matching, you can do what you want with regular expressions:
    VARIABLE     l_string     VARCHAR2 (100)
    EXEC  :l_string := '123` 456 |`|789 10 here|`||223|`|5434|`}22|`|yes'
    SELECT     LEVEL
    ,     REPLACE ( REGEXP_SUBSTR ( '|`|' || REPLACE ( :l_string
                                     , '|`|'
                                     , '|`||`|'
                                     ) || '|`|'
                        , '\|`\|.*?\|`\|'
                        , 1
                        , LEVEL
                        )
               , '|`|'
               )     AS ITEM
    FROM     dual
    CONNECT BY     LEVEL     <= 7
    ;
    Output:
    LEVEL ITEM
        1 123` 456
        2 789 10 here
        3 |223
        4 5434|`}22
        5 yes
        6
        7
    Here's how it works:
    The pattern
    ~.*?~
    is non-greedy; it matches the smallest possible string that begins and ends with a '~'. So
    REGEXP_SUBSTR ('~SHALL~I~COMPARE~THEE~', '~.*?~', 1, 1) returns '~SHALL~'. However,
    REGEXP_SUBSTR ('~SHALL~I~COMPARE~THEE~', '~.*?~', 1, 2) returns '~COMPARE~'. Why not '~I~'? Because the '~' between 'SHALL' and 'I' was part of the 1st match, so it can't be part of the 2nd match. So the first thing we have to do is double the delimiters; that's what the inner REPLACE does. Then we add delimiters to the beginning and end of the list. Once we've prepared the string like that, we can use the non-greedy REGEXP_SUBSTR to bring back the delimited items, with a delimiter at either end. We don't want those delimiters, so the outer REPLACE removes them.
    I'm not sure this is any better than Sri's solution.
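The original attempt fails because [^(|`|)]+ is a bracket expression: it matches a run of characters other than "(", "|", "`" and ")", so every one of those characters acts as a separator on its own. For comparison, here is a minimal sketch of the same split outside the database, in Java rather than Oracle SQL; the class name MultiCharSplit and the method splitOn are hypothetical. Pattern.quote turns the three-character separator into a literal pattern.

```java
import java.util.regex.Pattern;

public class MultiCharSplit {
    // Splits data on the exact separator string. Pattern.quote makes every
    // character of the separator literal; the limit -1 keeps trailing
    // empty fields instead of dropping them.
    static String[] splitOn(String data, String separator) {
        return data.split(Pattern.quote(separator), -1);
    }

    public static void main(String[] args) {
        String data = "123` 456 |`|789 10 here|`||223|`|5434|`}22|`|yes";
        for (String field : splitOn(data, "|`|")) {
            System.out.println("[" + field + "]");
        }
    }
}
```

Note that the first field keeps its trailing space, since the space is part of the data and not of the separator.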

  • Regular Expression to Locate Words with Character

    I want to identify all the words in a document that are followed by the register mark (®) symbol.
    I built what I thought was a regular expression that would search for a register mark preceded by alphanumeric characters and a space. So if my text contained the sentence "Adobe InDesign® is a great product.", the regular expression would find "InDesign®".
    Below is the regular expression I composed. It grabs anything with a register mark, not just the register marks preceded by a space and alphanumeric characters. Where did I go wrong? I thought the \s would restrict the search to complete words with a register mark.
    \s[a-zA-Z0-9]|®

    \s is the special GREP code for "any kind of space" -- a regular space, a tab, hard return, or any of ID's own white space codes. It has nothing to do with "complete words", because a word can appear at the start of a story, without any preceding space. It would also not find "InDesign®" because there is no space before it, there is a double quote instead.
    Your GREP does not work because, well, you got the general idea (words may consist of the set of characters "a-z", "A-Z", and "0-9") but since you use the [..] without any other code, GREP will apply this rule once -- per character. If you want to find words of more than one character, you need to tell GREP "one or more of these, please": with a +.
    Second, where did that | come from? It's the OR operator. Essentially, you are looking for
          any space followed by one character from the set "a-z", "A-Z", and "0-9"
    OR
          the ® character
    The 'word break' you were looking for is this code: \b, so you could search for "\b[a-zA-Z0-9]+" (note the '+' to allow more than one instance) -- but it's not necessary, because by default GREP grabs as much as it can. The set 'a-zA-Z0-9' describes the allowed "word" characters, but you might prefer these: \l (ell) and \u for all lowercase and all uppercase characters -- they are shorter, and they automatically include accented characters, Greek, Russian, and a lot more. Similarly, \d (for "digits") is the shortcut for "0-9". And even better: \w is the shortcut for "word character", i.e., your set, but shorter and a bit better.
    Try this one:
    \w+~r
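The same idea (one or more word characters immediately followed by the mark) can be tried outside InDesign. The sketch below uses Java's java.util.regex, which is an assumption on my part and not InDesign GREP: there is no ~r code there, so the register mark is written as the Unicode escape \u00AE, and \w is ASCII-only by default. The class name MarkedWords and the method findMarked are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MarkedWords {
    // One or more word characters followed directly by the register mark.
    private static final Pattern MARKED = Pattern.compile("\\w+\u00AE");

    // Collects every word that carries a register mark, mark included.
    static List<String> findMarked(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = MARKED.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findMarked("Adobe InDesign\u00AE is a great product."));
    }
}
```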

  • Regular expression for 2nd occurrence of a substring in a string

    Hi,
    1)
    I want to find the second occurrence of a substring in a string with a regular expression so that I can modify only that one.
    Example: I have a string like ---> axe,afn,sdk,jdi,afn,mki,mki
    In this I want the second occurrence of afn, and to change that one only.
    Which regular expression do I have to use?
    Note that I have to use regular expressions only, with no string manipulation methods (strictly).
    2)
    How can I apply multiple regular expressions multiple times to a single string? In the above instance I have to apply the same 2nd-occurrence logic to the
    substring mki as well. For this I have to use a single regular expression string that contains validations for both the substrings mki and afn.
    Thanks in advance,
    Venkat

    javafreak666 wrote:
    I want to find the second occurrence of a substring in a string with a regular expression so that I can modify only that one.
    What do you mean by using a regex to get the index of a second substring? There is no method in Java which uses a regex to get the index of a substring.
    There are various indexOf(...) methods for this:
    String text = "axe,afn,sdk,jdi,afn,mki,mki";
    String target = "afn";
    int second = text.indexOf(target, text.indexOf(target)+1);
    System.out.println("second="+second);
    Of course you can find the index of a group like this:
    Matcher m = Pattern.compile(target+".*?("+target+")").matcher(text);
    System.out.println(m.find() ? "index="+m.start(1) : "nothing found");
    but there is no single method that handles this: you'll have to call find() and then the start(...) method on the Matcher instance, so the indexOf(...) approach is the favourable one, IMO.
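If the requirement really is regex-only replacement of the second occurrence, one possible sketch is to capture everything up to the second occurrence in a group and put it back unchanged. The class name SecondOccurrence, the method replaceSecond, and the replacement values XXX and YYY are made up for illustration; the same call can be chained to handle both afn and mki.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SecondOccurrence {
    // Replaces only the second occurrence of target in text.
    // Group 1 holds the first occurrence plus the shortest possible text
    // before the second one; "$1" writes that part back unchanged.
    static String replaceSecond(String text, String target, String replacement) {
        String t = Pattern.quote(target);
        return text.replaceFirst("(" + t + ".*?)" + t,
                "$1" + Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String text = "axe,afn,sdk,jdi,afn,mki,mki";
        String step1 = replaceSecond(text, "afn", "XXX");
        String step2 = replaceSecond(step1, "mki", "YYY");
        System.out.println(step2);
    }
}
```

If the target occurs fewer than twice, replaceFirst finds no match and the text is returned unchanged.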

  • Regular Expression Escaped Digit "\d" Illegal Escape Character

    Hello,
    I'm trying to write a regular expression to determine if a String matches a date format that is defined as YYYYMMDD. For example, March 11, 2009 would be "20090311".
    For the time being I don't care if an invalid month or day is entered. I've attempted both of the following:
    if (date.matches("(19|20)\d{4}")) {
      // warn the user
    }
    and
    if (java.util.regex.Pattern.matches("(19|20)\d{4}", date)) {
      // warn the user
    }
    Both yield "illegal escape character" compilation errors for the "\d" part of the regular expression.
    http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html#sum
    says that "\d" is the predefined digit character class. So at this point, I don't know what I'm doing wrong. I realize I could just define the character class myself, and use a pattern like "(19|20)[0-9]{4}", but I would like to know why "\d" isn't being recognized by the compiler.
    Thanks,
    Paul
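The compiler error comes from Java string literals, not from the regex engine: inside a string literal "\d" is not a valid escape, so the backslash itself has to be written as "\\". A minimal sketch follows; the class name DateFormatCheck and the method looksLikeDate are hypothetical. Note also that (19|20) followed by four digits is only six characters, so the sketch assumes {6} is what YYYYMMDD needs.

```java
public class DateFormatCheck {
    // "\\d" in Java source is the two-character regex \d (any digit).
    // (19|20) covers the century, \d{6} the remaining YYMMDD digits.
    static boolean looksLikeDate(String s) {
        return s.matches("(19|20)\\d{6}");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeDate("20090311"));
        System.out.println(looksLikeDate("2009031"));
    }
}
```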

    paulwooten wrote:
    Can someone give me an explanation of heuristics, as they might apply to SimpleDateFormat? Does this mean that if the format was similar the parser might figure it out? Say, if instead of "yyyyMMdd", it was "yyyyddMM", or "yyMMdd"?
    No. Since all of these are valid formats, there's no way for the parser to distinguish them.
    Or does this have to do with rejecting February 29, and other dates like that.
    That's the one. When setLenient(false) is called, then the 29th February is only accepted in leap years.
    It will also reject the 57th January when lenient is set to false (try parsing that with lenient=true, you'll be surprised).
    I've read some of the wikipedia article about heuristics, but I'm confused as to how it would apply to this example.
    Don't concentrate too much on the term heuristics. Just remember: lenient=true means that not-really-correct dates will be accepted, lenient=false means more strict checks.
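The lenient versus strict behaviour described above can be checked with a short sketch. The class name StrictDates and both method names are hypothetical; the calls themselves (SimpleDateFormat, setLenient, parse, format) are the standard java.text API.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class StrictDates {
    // Strict parsing: calendar dates that do not exist are rejected.
    static boolean isRealDate(String s) {
        SimpleDateFormat f = new SimpleDateFormat("yyyyMMdd");
        f.setLenient(false);
        try {
            f.parse(s);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    // Lenient parsing (the default): out-of-range fields roll over into
    // the following month, so formatting the result shows the adjusted date.
    static String lenientRoundTrip(String s) throws ParseException {
        SimpleDateFormat f = new SimpleDateFormat("yyyyMMdd");
        return f.format(f.parse(s));
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(isRealDate("20090229"));
        System.out.println(lenientRoundTrip("20090157"));
    }
}
```

With lenient parsing the "57th of January" 2009 is silently turned into a date in late February, which is the surprise mentioned above.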

  • Writing Regular Expression with a character ^, too difficult

    I want to change the sentence "^1Mandrake ^3Style ^4DM" to "Mandrake Style DM".
    (^ followed by a number is a color code.)
    So I used the String.replaceAll() method with a regular expression.
    But however hard I try, I can't find a solution for this.
    In PHP I could use \^ for a literal ^ character, but Java doesn't support \^.
    How can I solve this problem?

    Use \\^ in your regex (you have to escape the backslash, too).
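As a minimal sketch of that advice, assuming each color code is a caret followed by a single digit (the class name ColorCodes and the method strip are hypothetical):

```java
public class ColorCodes {
    // "\\^" is the regex \^ (a literal caret) and "\\d" is the regex \d
    // (one digit); each backslash is doubled for the Java string literal.
    static String strip(String s) {
        return s.replaceAll("\\^\\d", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("^1Mandrake ^3Style ^4DM"));
    }
}
```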

  • Difference between regular expressions and spry character masking?

    Hi,
    This is my first time writing my own regular expressions.  Often times though, they seem to work in various testing widgets, but then they do not perform as expected in Spry.  I have no idea how to even begin to debug this.
    For example, this string:
    ^\#?[A-Fa-f0-9]{3}([A-Fa-f0-9]{3})?$
    Does a perfect job enforcing hex colors in a regexp testing widget.  But it doesn't work in spry.  It won't let me type a darn thing in.
    Can somebody throw me a bone here?

    Hi!
    Thank you for the response.  I read that article prior to posting and it seems to relate more to Spry's custom pattern function rather than regular expressions.  Here's the code I have:
    <script type="text/javascript">
         <!--     
              var text_1 =
              new Spry.Widget.ValidationTextField(
                   "text_1",
                   "none",
                   {regExpFilter:/^#[A-Fa-f0-9]{6,};$/,
                   useCharacterMasking:true,
                   validateOn:["change"]})
         //-->
    </script>
    Expected behavior:  I should be able to type in a valid hex color and have Spry perform validation.
    Actual behavior:  I can't type anything in, at all.  I immediately get the invalid Spry feedback (in my case a little red .png image and an error message).
    Simpler expressions like this work fine in Spry:
                        <script type="text/javascript">
         <!--
              var text_1 =
                   new Spry.Widget.ValidationTextField(
                   "text_1",
                   "none",
                   {regExpFilter:/[a-z]/,
                   useCharacterMasking:true,
                   validateOn:["change"]})
         //-->
    </script>
    I think if I can figure out what the special rules are for one somewhat robust regular expression in Spry, then I will be off and running.
    Can anyone help?
    Scott

  • How to use regular expression to delete a character?

    Hello,
    I have a query,
    select partition_name from dba_tab_partitions where table_owner='xxx'and num_rows <>0 and table_name = 'xxx';
    P5
    P6
    P7
    P12
    P13
    P14
    P17
    P18
    P19
    P20
    P24
    How can I use a regular expression in the above SQL query to get the result without the letter 'P', like:
    5
    6
    7
    12
    13
    14
    17
    18
    19
    20
    24
    thank you

    I found the answer:
    select regexp_replace(partition_name,'P','')
    Thanks anyway.

  • Regular Expression weirdness - problem with $ character

    If I use the following KM code in Beanshell Technology - it works correctly and replaces "C$_0MYREMOTETABLE RMTALIAS, MYLOCALTABLE LOCALIAS, " with "C$_0MYREMOTETABLE_000111 RMTALIAS, MYLOCALTABLE LOCALIAS, "
    But when I try to use the same exact code in 'Undefined' technology - it does not match anything in the source string - and does not replace anything.
    If I change the regular expression to not use the $ it still does not work.
    But if I change the source string to remove the $ - then the regular expression works.
    If I use the same code in Beanshell technology - it works fine - but then I can't use the value in a later 'Undefined' technology step.
    Does anyone know if the java technology does something special with $ characters when ODI parses the KM code?
    Does anyone know if there is a way to use the value from a Beanshell variable in a 'Undefined' technology step?
    String newSourceTableList = "";
    String sessionNum ="<%=odiRef.getSession("SESS_NO") %>";
    String sourceTableList = "<%=odiRef.getSrcTablesList("", "[WORK_SCHEMA].[TABLE_NAME] [POP_TAB_ALIAS]" , ",", ",") %>";
    String matchExpr = "(C\\$_\\S*)"; // should end with two backslashes followed by 'S*' - this editor mangles it
    String replaceExpr = "$0_"+sessionNum+ " ";
    newSourceTableList = sourceTableList.replaceAll(matchExpr,replaceExpr);
    ---------------------------------------------------

    Phases of substitution in ODI:
    The way ODI works allows for three separate phases of substitution, and you can use them all. The three phases are:
    - First Phase: <% %> You will see these appear in the knowledge modules etc. They are substituted on generation (when you generate a scenario, or tell ODI to execute an interface directly). This phase is used to generate the column names, table names etc., which are known from the metadata at that point.
    - Second Phase: <? ?> This phase is substituted when the scenario is instantiated as an execution (session generation). At this point, ODI has the additional information which allows it to generate the schema names, as it has resolved the Logical/Physical Schemas through the use of the Context (which is provided for the execution to take place). All the substitutions at this point are written to the execution log.
    - Third Phase: <@ @> This phase is substituted when the execution code is read from the session log for execution. You will note that anything substituted in this phase is NEVER written to the execution log. (See PASSWORDS as a prime example; you don't want those written to the logs, with the security risks associated with that!)
    Anything in <@ @> is always interpreted for substitution by the Java BeanShell. It does not have to be a Java BeanShell step; it can be any kind of step, and it will be interpreted at that run-time point.

  • Using regular expressions

    Hi Experts,
    After going through some documentation on regular expressions in Oracle, I have tried to draw some conclusions about them. As I wasn’t very confident about how the patterns are built, I have tried to interpret them by looking at the output. It’s basically reverse engineering.
    Please let me know if my interpretations are correct. Any additions /suggestions/corrections are most welcome.
    Some of the examples may lack conclusions, please ignore those.
    select regexp_substr('1PSN/231_3253/ABc','^([[:alnum:]]*)') from dual;
    Output: 1PSN
    Interpreted as:
    ^ From the start of the source string
    ([[:alnum:]]*) zero or more occurrences of alphanumeric characters
    select regexp_substr('@@/231_3253/ABc','@*([[:alnum:]]+)') from dual;
    Output: 231
    Interpreted as:
    @* Search for zero or more occurrences of @
    ([[:alnum:]]+) followed by one or more occurrences of alphanumeric characters
    Note: In the above example Oracle looks for @ (zero or more times) immediately followed by alphanumeric characters.
    Since a '/' comes between @ and 231, the output is 0 occurrences of @ plus one or more occurrences of alphanumerics.
    select regexp_substr('1@/231_3253/ABc','@+([[:alnum:]]*)') from dual;
    Output: @
    Interpreted as:
    @+ one or more occurrences of @
    ([[:alnum:]]*) followed by 0 or more occurrences of alphanumerics
    select regexp_substr('1@/231_3253/ABc','@+([[:alnum:]]+)') from dual;
    Output: Null
    Interpreted as:
    @+ one or more occurrences of @
    ([[:alnum:]]+) followed by one or more occurrences of alphanumerics
    select regexp_substr('@1PSN/231_3253/ABc125','([[:digit:]]+)$') from dual;
    Output: 125
    Interpreted as:
    ([[:digit:]]+) one or more occurrences of digits only
    $ at the end of the string
    select regexp_substr('@1PSN/231_3253/ABc','([^[:digit:]]+)$') from dual;
    output: /ABc
    Interpreted as:
    ([^[:digit:]]+)$ one or more occurrences of non-digit literals at the end of the string
    '^' inside square brackets marks the negation of the class
    Look for http:// followed by a substring of one or more alphanumeric characters and optionally, a period (.)
    SELECT REGEXP_SUBSTR('Go to http://www.oracle.com/products and click on database','http://([[:alnum:]]+\.?){3,4}/?') RESULT
    FROM dual;
    Output: http://www.oracle.com/
    Interpreted as:
    [[:alnum:]]+ one or more occurrences of alphanumeric characters
    \.? an optional dot (the backslash escapes the dot, ? makes it optional)
    {3,4} 3 or 4 times
    /? followed by an optional forward slash, which is why the trailing slash is part of the output
    If you have www.oracle.co.uk, {3,4} extracts it for you as well
    Validate email:
    select case  when
           REGEXP_LIKE('[email protected]',
                       '^([[:alnum:]]+(\_?|\.))([[:alnum:]]*)@([[:alnum:]]+)(.([[:alnum:]]+)){1,2}$') then 'Match Found'
           else 'No Match Found'
           end
    as output from dual;
    Interpreted as:
    ([[:alnum:]]+(\_?|\.)) one or more occurrences of alphanumerics optionally followed by an underscore or dot
    ([[:alnum:]]*) followed by 0 or more occurrences of alphanumerics
    @ followed by @
    ([[:alnum:]]+) followed by one or more occurrences of alphanumerics
    (.([[:alnum:]]+)){1,2} followed by a dot followed by alphanumerics, once or at most twice (e.g. .com or .co.uk)
    Output: Match Found
    Input: [email protected]
    Output: Match Found
    Input: [email protected]
    Output: No Match Found
    Truncate the part, ending with digits
    select regexp_substr('Yahoo11245@US','^.*[[:digit:]]',1) from dual;
    Output: Yahoo11245
    select regexp_substr('*Yahoo*11245@US','^.*[[:digit:]]',1) from dual;
    Output: *Yahoo*11245
    Interpreted as:
    .* zero or more occurrences of any characters (dot signifies any character)
    Replace 2 to 8 spaces with single space
    select regexp_replace('Hello   you      OPs       there','[[:space:]]{2,8}',' ')
    from dual;
    Search for control characters
    select case  when
           regexp_like('Super' || chr(13) || 'Star' ,'[[:cntrl:]]')
                  then 'Match Found'
           else 'No Match Found'
           end
    as output from dual;
    Output: Match Found
    Search for lower case letters only with a string length varying from a min of 3 to max of 12
    select case  when
           regexp_like('terminator' ,'^[[:lower:]]{3,12}$')
                  then 'Match Found'
           else 'No Match Found'
           end
    as output from dual;
    4th character must be a special character
    select case  when
           regexp_like('ter*minator' ,'^...[^[:alnum:]]')
                  then 'Match Found'
           else 'No Match Found'
           end
    as output from dual;
    Output: Match Found
    Case Sensitive Search
    select case  when
           regexp_like('Republic Of  Africa' ,'of','c')
                  then 'Match Found'
           else 'No Match Found'
           end
    as output from dual;
    Output: No match found
    c stands for case sensitive
    select case  when
           regexp_like('Republic Of  africa' ,'of','i')
                  then 'Match Found'
           else 'No Match Found'
           end
    as output from dual;
    Output: Match Found
    i stands for case insensitive
    Two consecutive occurrences of the same character from a to z
    select regexp_substr('Republicc Of Africaa' ,'([a-z])\1', 1,1,'i') from dual;
    Output: cc
    Interpreted as:
    ([a-z]) one character from the set a-z, captured as group 1
    \1 the same character again (backreference to group 1)
    1 starting from the 1st character in the string
    1 first occurrence
    i case insensitive
    Three consecutive occurrences of the same character from 7 to 9
    select case  when
           regexp_like('Patch 10888 applied' ,'([7-9])\1\1')
                  then 'Match Found'
           else 'No Match Found'
           end
    as output from dual;
    Output: Match Found
    Phone validator:
    select case  when
           regexp_like('123-44-5555' ,'^[0-9]{3}-[0-9]{2}-[0-9]{4}$')
                  then 'Match Found'
           else 'No Match Found'
           end
    as output from dual;
    Output: Match Found
    Input: 111-222-3333
    Output: No match found
    Interpreted as:
    ^ start of the string
    [0-9]{3} three occurrences of digits from 0-9
    - followed by hyphen
    [0-9]{2} two occurrences of digits from 0-9
    - followed by hyphen
    [0-9]{4} four occurrences of digits from 0-9
    $ end of the string
    ************************************************************************
    Source Links:
    http://www.psoug.org/reference/regexp.html
    http://www.oracle.com/technology/obe/obe10gdb/develop/regexp/regexp.htm
    Edited by: Preta on Feb 25, 2010 4:38 PM
    Corrected the example for www.oracle.com
    Edited by: Preta Incorporated Logan's comments

    Hi,
    It looks like you have a good understanding of how regular expressions work.
    You can put comments like the ones in your message directly in the code. For example, your validate e-mail code could be re-written
    select      case 
             when REGEXP_LIKE ( '[email protected]'
                        , '^'          || -- Starting from the beginning of the string
                        '('          || -- Begin \1
                          '[[:alnum:]]+'|| --     1 or more alphanumerics
                          '(\_?|\.)'     || --     optional underscore or dot
                        ')'          || -- End \1
                        '([[:alnum:]]*)'|| -- 0 or more alphanumerics
                        '@'          || -- @ sign
                        '([[:alnum:]]+)'|| -- 1 or more alphanumerics
                        '('          || -- Begin \5
                          '\.'          || --   dot
                          '([[:alnum:]]+)'
                                  || --   1 or more alphanumerics
                        ')'          || -- End \5
                        '{1,2}'          || -- \5 can occur 1 or 2 times
                        '$'             -- End of string
             then 'Match Found'
                    else 'No Match Found'
                end          as output
    from      dual;
    I find this easier to debug and maintain.
    There's no denying that it makes the code very long. You be the judge of when to do this.
    You use parentheses and \ unnecessarily sometimes. That's not really an error; if you find they make the code easier to develop and maintain, use them as much as you like.
    For example, about the 4th line of the regular expression as I formatted it above:
    '(\_?|\.)'     || --     optional underscore or dot
    Underscore has no special meaning in regular expressions (only in LIKE), so you don't have to escape it.
    I might write that line:
    '(_|\.)?'     || --     optional underscore or dot
    just because I think it's clearer.
    I think you forgot a \ about 7 lines later:
    '\.'          || --   dot
    Be very careful about testing patterns that include literal dots; always make sure that a random character, like ~ , fails in a place where a dot is expected.
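That testing advice can be turned into a quick check. The sketch below is in Java rather than Oracle SQL and uses a deliberately simplified address pattern (ASCII alphanumerics only); the class name DotEscapeCheck, the field names LOOSE and STRICT, and the sample addresses are assumptions made for illustration. The only difference between the two patterns is whether the dot is escaped.

```java
import java.util.regex.Pattern;

public class DotEscapeCheck {
    // Unescaped dot: "." matches ANY character in that position.
    static final Pattern LOOSE = Pattern.compile(
            "[A-Za-z0-9]+@[A-Za-z0-9]+(.[A-Za-z0-9]+){1,2}");

    // Escaped dot: only a literal "." is accepted in that position.
    static final Pattern STRICT = Pattern.compile(
            "[A-Za-z0-9]+@[A-Za-z0-9]+(\\.[A-Za-z0-9]+){1,2}");

    public static void main(String[] args) {
        String bad = "user@example~com"; // "~" where the dot should be
        System.out.println("loose accepts bad input:  " + LOOSE.matcher(bad).matches());
        System.out.println("strict accepts bad input: " + STRICT.matcher(bad).matches());
    }
}
```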

  • OR ('|') in regular expressions (e.g. split a String into lines)

    Which match gets used when you use OR ('|') to specify multiple possible matches in a regex, and there are multiple matches among the supplied patterns? The first one (in the order written) which matches? Or the one which matches the most characters?
    To make this concrete, suppose that you want to split a String into lines, where the line delimiters are the same as the line terminators used by Java regex (http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html#lt):
         A newline (line feed) character ('\n'),
         A carriage-return character followed immediately by a newline character ("\r\n"),
         A standalone carriage-return character ('\r'),
         A next-line character ('\u0085'),
         A line-separator character ('\u2028'), or
         A paragraph-separator character ('\u2029')
    This problem has been considered before (http://forums.sun.com/thread.jspa?forumID=4&threadID=464846).
    If we ignore the idiotic Microsoft two-char \r\n sequence, then no problem; the Java code would be:
    String[] lines = s.split("[\\n\\r\\u0085\\u2028\\u2029]");
    How do we add support for \r\n? If we try
    String[] lines = s.split("[\\n\\r\\u0085\\u2028\\u2029]|\\r\\n");
    which pattern of the compound (OR) regex gets used if both match? The
    [\\n\\r\\u0085\\u2028\\u2029]
    or the
    \\r\\n?
    For instance, if the above code is called when
    s = "a\r\nb";
    and if the first pattern
    [\\n\\r\\u0085\\u2028\\u2029]
    is used for the match when the \r is encountered, then the tokens will be
    "a", "", "b"
    because there is an empty String between the \r and following \n. On the other hand, if the rule is use the pattern which matches the most characters, then the
    \\r\\n
    pattern will match that entire \r\n and the tokens will be
    "a", "b"
    which is what you want.
    On my particular box, using jdk 1.6.0_17, if I run this code
    String s = "a\r\nb";
    String[] lines = s.split("[\\n\\r\\u0085\\u2028\\u2029]|\\r\\n");
    System.out.print(lines.length + " lines: ");
    for (String line : lines) System.out.print(" \"" + line + "\"");
    System.out.println();
    if (true) return;
    the answer that I get is
    3 lines:  "a" "" "b"
    So it seems like the first listed pattern is used, if it matches.
    Therefore, to get the desired behavior, it seems like I should use
    "\\r\\n|[\\n\\r\\u0085\\u2028\\u2029]"
    instead as the pattern, since that will ensure that the 2-char sequence is tried first. Indeed, if I change the above code to use this pattern, it generates the desired output
    2 lines:  "a" "b"
    But what has me worried is that I cannot find any documentation concerning this "first pattern of an OR" rule. This means that maybe the Java regex engine could change in the future, which is worrisome.
    The only bulletproof way that I know of to do line splitting is the complicated regex
    "(?:(?<!\\r)\\n)" + "|" + "(?:\\r(?!\\n))" + "|" + "(?:\\r\\n)" + "|" + "\\u0085" + "|" + "\\u2028" + "|" + "\\u2029"
    Here, I use negative lookbehind and lookahead in the first two patterns to guarantee that they never match on the end or start of a \r\n, but only on isolated \n and \r chars. Thus, no matter which order the patterns above are applied by the regex engine, it will work correctly. I also used non-capturing groups
    (?:X)
    to avoid memory wastage (since I am only interested in grouping, and not capturing).
    Is the above complicated regex the only reliable way to do line splitting?

    bbatman wrote:
    Which match gets used when you use OR ('|') to specify multiple possible matches in a regex, and there are multiple matches among the supplied patterns? The first one (in the order written) which matches? Or the one which matches the most characters?
    Normally the longest match wins, but not for alternation (or), as can be read from this innocent sentence in the javadocs:
    The Pattern engine performs traditional NFA-based matching with ordered alternation as occurs in Perl 5.
    More information can be found in Friedl's book, the relevant page of which Google Books shows at
    http://books.google.de/books?id=GX3w_18-JegC&pg=PA175&lpg=PA175&dq=regular+expression+%22ordered+alternation%22&source=bl&ots=PHqgNmlnM-&sig=OcDjANZKl0VpJY0igVxkQ3LXplg&hl=de&ei=Dcg7S43NIcSi_AbX-83EDQ&sa=X&oi=book_result&ct=result&resnum=1&ved=0CA0Q6AEwAA#v=onepage&q=&f=false
    If this link does not survive, search Google for
    regular expression "ordered alternation"
    My first hit went right into Friedl's book.
    Harald.
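As a rough illustration of ordered alternation, the sketch below splits the same input with the two patterns from the question. The class and method names are made up for the example; the patterns and the input are taken from the posts above.

```java
class OrderedAlternationDemo {
    // Single-character class listed first: CR matches on its own before the
    // CR LF alternative is ever tried.
    static final String SINGLE_FIRST = "[\\n\\r\\u0085\\u2028\\u2029]|\\r\\n";

    // Two-character sequence listed first: CR LF is consumed as one separator.
    static final String PAIR_FIRST = "\\r\\n|[\\n\\r\\u0085\\u2028\\u2029]";

    // Number of tokens String.split produces for the given pattern.
    static int countTokens(String text, String regex) {
        return text.split(regex).length;
    }

    public static void main(String[] args) {
        String s = "a\r\nb";
        System.out.println(countTokens(s, SINGLE_FIRST)); // 3
        System.out.println(countTokens(s, PAIR_FIRST));   // 2
    }
}
```

Since the javadoc sentence quoted above documents ordered alternation, putting \r\n first is relying on specified behavior rather than on an accident of the implementation. On Java 8 or later, the predefined linebreak matcher \R (written "\\R" in a string literal) may be a simpler option, though it also treats \u000B and \u000C as line breaks.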

  • Regular Expression he!!  =)

    Okay, the regexp needs to dig through an HTML file and print out all the links. Here's what I've got:
    Pattern p = Pattern.compile("a\\shref=\"(.)+\"", Pattern.MULTILINE);
    Matcher m = p.matcher(fileData);
    while (m.find()) {
        System.out.println(m.group() + "\n");
    }
    Yes, I know the regexp I'm using isn't very good to find links, but I'm starting simple. What (I think) the above should match, is "a", followed by a space, followed by "href=", followed by a quote, any text, and another quote.
    I have the text:
    <a href="test.html">This is a "test"</a>
    What gets printed off for this text is:
    a href="test.html">This is a "test"
    Instead of quitting after the first " it finds, it continues for more. It's not always the third quote either, sometimes it's 5, etc.
    I have no idea why this is occurring. It's either a bad regexp or I'm not utilizing the Java language properly.
    Any help would be greatly appreciated!

    You think you are searching for a (") followed by (some_text) followed immediately by a (").
    But that's not what you are doing...
    Java's regular expressions are "greedy" by default. This means the (+) operator will take characters until it cannot take any more. In your case, it is taking all the characters until the end of your string. Then it looks for a ("), but it cannot find one, because it ate up all the characters. It's at the end of your string. This is because you used a (.), which means any character!!
    Now it is going to back up one character at a time until it finds a (") it can use. Well, it only has to back up one time, because there is a (") at the end of your string. So it takes that (") as its last part of the match.
    This isn't what you intended...
    You wanted it to start at a quote and keep going until it finds the very next quote. The pattern you should use is something akin to the following:
    a\\s+href=\"([^\"])+\"
    Now it is saying: start at the first (") and eat everything that is not a ("). Then grab the next (") it finds.
    Note the stuff in the parentheses. I used ([^\"]), which means "anything that is not a ("). This is different from what you had. (.) means "anything at all...including a quote".
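A minimal sketch comparing the greedy pattern with the negated character class, plus a reluctant quantifier (.+?) as a third option that the posts do not mention. The sample input is an assumed anchor tag consistent with the output shown in the question, and the class and method names are made up for illustration. The quantifier is placed inside the group here, ([^\"]+), so that group(1) holds the whole URL rather than only its last character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefQuantifierDemo {
    // Returns the first full match of the regex in the input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Assumed sample input, not taken verbatim from the post.
        String html = "<a href=\"test.html\">This is a \"test\"</a>";

        // Greedy: runs to the end of the line, then backs up to the last quote.
        System.out.println(firstMatch("a\\shref=\"(.)+\"", html));
        // prints: a href="test.html">This is a "test"

        // Negated class: cannot run past the closing quote.
        System.out.println(firstMatch("a\\s+href=\"([^\"]+)\"", html));
        // prints: a href="test.html"

        // Reluctant quantifier: stops at the first quote that lets the match succeed.
        System.out.println(firstMatch("a\\s+href=\"(.+?)\"", html));
        // prints: a href="test.html"
    }
}
```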

  • Regular Expression/Replace - Oracle 7.3

    Hi!
    I am trying to use the regular expression SQL functions of 10g in Oracle 7.3, and it seems the older version does not support this feature yet.
    "Aaaa,Bbbb" --> "Aaaa, Bbbb"
    REPLACE ",[0-9A-Za-z]" WITH ", "
    The pattern looks for commas that are not immediately followed by whitespace, so that I can replace each one with a comma followed by a space.
    Any workaround for this?

    Hi,
    Welcome to the forum!
    kitsune wrote:
        I am trying the regular expression SQL functions of 10g to Oracle 7.3 and it seems the older version does not cover this feature yet.
    You're right; regular expressions only work in Oracle 10.1 and higher.
    kitsune wrote:
        "Aaaa,Bbbb" --> "Aaaa, Bbbb"
        REPLACE ",[0-9A-Za-z]" WITH ", "
        The string pattern is to look for comma-punctuations that is not followed immediately by whitespaces so I can replace this with a comma followed by a whitespace.
        Any workaround for this?
    Your best bet in Oracle 7.3 would be a user-defined function. That's a very old version; don't expect much.
    Do you know anything else about the string? For example, is there some character (say ~) that never occurs in the string? Will there ever be two (or more) whitespace characters after punctuation? What characters do you consider to be whitespace? Which are punctuation? Depending on the answers, you might be able to do something with nested REPLACE and/or TRANSLATE functions.
