Regular Expression - Comments Elimination

Hello,
I'm trying to remove comment text from a plain-text value stored in an Oracle table.
There are no special cases, such as multiple comment patterns.
My Oracle Version is: 11.2.0.4.
Could you help me?

Hello
You can solve this problem with the regular expression function REGEXP_REPLACE.
For example:
SELECT strg
, REGEXP_REPLACE(strg, '/\*.*?\*/', '', 1, 0, 'n') AS STRG_NEW
FROM (SELECT
'TextBlock: /* Dog
Cat
''*/ normal /* ''people */
text' AS STRG FROM DUAL)
Original-Text:
==============
TextBlock: /* Dog
Cat
'*/ normal /* 'people */
text
Result-Text:
============
TextBlock: normal
text
This SELECT removes every comment block, but it only works correctly if the comment delimiters are not enclosed in single quotation marks!
It is a simple solution; maybe it is what you are looking for.
Regards,
David Berger

Similar Messages

Regular Expressions and comments

Hello,
I've got a problem with regular expressions. I want to find special words like "todo" in Java comments, but I haven't managed to do it satisfactorily. I hope one of you can give me some advice! :-)
example of a comment:
* comment text. todo: what is still todo.
First of all, I tried this:
/\*(.*?)todo(.*?)\*/
but this version seems to ignore the boundaries of the comment and also returns parts of the code.
Then I tried this:
/\*([^/]*?)todo(.*?)\*/
This version returns good results, but it doesn't handle HTML formatting like
* <code> .... </code> text. todo: text
So my idea now is to check whether a star precedes the slash, but I don't really know how to combine the two.
Or is there a simpler solution? Thanks for your hints!
Greetz Jan

Thanks for your answer, but when I use this expression the program seems to hang. An operation that usually finishes in 3 seconds hadn't finished after 10 minutes. :-( Do you have any idea what could be wrong?
By the way, can I use this:
pattern = Pattern.compile("/\\*(?:[^\\*]+|\\*(?!/))*todo.*?\\*/", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
matcher = pattern.matcher(text);
text = matcher.replaceAll("");
or do I have to use this instead of replaceAll():
while (matcher.find()) {
    text = text.substring(0, matcher.start(1)) + text.substring(matcher.end(1), text.length());
    matcher = pattern.matcher(text);
}
Any hints?
Greetz Jan
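One way to sidestep both problems described above (matches leaking past the comment, and runaway backtracking) is to split the work: first match each block comment with a pattern that cannot backtrack, then look for the keyword inside the matched text. This is only a sketch under that assumption; the names TodoFinder and findTodoComments are made up for illustration and do not come from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TodoFinder {
    // A block comment: "/*", then runs of non-stars or stars not followed by "/", then "*/".
    // The possessive quantifiers (++ and *+) stop the engine from retrying alternatives.
    private static final Pattern BLOCK_COMMENT =
            Pattern.compile("/\\*(?:[^*]++|\\*(?!/))*+\\*/");
    private static final Pattern TODO =
            Pattern.compile("todo", Pattern.CASE_INSENSITIVE);

    /** Returns every block comment in source that mentions "todo". */
    public static List<String> findTodoComments(String source) {
        List<String> hits = new ArrayList<String>();
        Matcher m = BLOCK_COMMENT.matcher(source);
        while (m.find()) {
            // The keyword search is confined to the text of one comment.
            if (TODO.matcher(m.group()).find()) {
                hits.add(m.group());
            }
        }
        return hits;
    }
}
```

Because the keyword search only ever sees one comment at a time, HTML such as <code> ... </code> inside the comment needs no special handling, and identifiers named "todo" in the code are ignored.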

Looking for a regular expression that doesnt take in account java comments

Does somebody know a valid regular expression that matches 'cat', but none of these:
//cat
// cat
// cat
//   cat
// something cat
I've tried some patterns without success:
[^/][^/](\\s)*cat
(?<!//)cat
I want to use this, for example, to search Java code for uses of "System.out" that are not commented out.
Thanks

The direct approach--finding the text first and then figuring out whether it's a valid match--won't work. Java supports two other kinds of comment plus string literals, any of which could contain the text you're looking for. Also, comments can contain literals, string literals can contain things that look like comments, and even char literals can contain quotation marks. The only way to be sure you're getting valid matches is to actively search for comments and literals so you can ignore them.
Pattern p = Pattern.compile(
      "//.*+|"                             // inline comment, or
    + "/\\*(?:[^*]++|\\*(?!/))*+\\*/|"     // multiline or javadoc comment, or
    + "\"(?:[^\"\\\\]++|(?:\\\\.))*+\"|"   // string literal, or
    + "\'(?:[^\'\\\\]++|(?:\\\\.))*+\'|"   // char literal, or
    + "(System\\.out)"                     // bingo
);
Matcher m = p.matcher(str);
while (m.find()) {
    if (m.start(1) != -1) {
        // bingo: group 1 took part in the match, so this
        // System.out is outside every comment and literal
    }
}

Regular Expression Question, Repetition Operators

These are valid entries for a field:
123456,
123456,123456,
123456,123456,123456,
123456,123456,123456,123456,
A group of 6 digits followed by "," can be repeated an unlimited number of times.
I found this in the documentation: "Repetition Operators; {m,} Match at least m times", and for my needs I tried this regular expression: "^[[[:digit:]]{6},]{1,}$", but it didn't work :(
Any comments?
Thank you very much :)
Tonguc

repeating exactly 6
{6}
repeating at least 1
+
repeating at least 6
{6,}
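OK, your problem is [ instead of (: square brackets define a set of single characters, while parentheses group a sequence so that it can be repeated, as in ^([[:digit:]]{6},)+$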
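The difference between a bracket expression and a group can be checked quickly in code. The sketch below uses Java's java.util.regex rather than Oracle's POSIX syntax (so \d stands in for [[:digit:]]), and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class SixDigitList {
    // One or more groups of exactly six digits, each followed by a comma.
    // The parentheses make "{6}," a unit that + can repeat.
    private static final Pattern FIELD = Pattern.compile("^(?:\\d{6},)+$");

    /** True when s is made of comma-terminated six-digit groups only. */
    public static boolean isValid(String s) {
        return FIELD.matcher(s).matches();
    }
}
```

With square brackets instead of parentheses, the engine would read the contents as a list of individual characters to choose from, not as a sequence to repeat.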

Regular expressions and capture groups

Hi everyone :)
Is there a way to override the default behaviour of capture groups in regular expressions? More specifically I want to override this:
"The captured input associated with a group is always the subsequence that the group most recently matched."
For example, if I have a string that is this:
* <comment one>
* <comment two>
<some text>
I have a pattern of the form "(.*)(/\\*.*\\*/)(.*)" which will match multi-line comments. I have also specified the flag DOTALL so that the predefined character class '.' matches over line-breaks.
If I apply this pattern to the above string I get comment two being captured, not comment one. This is because of the stipulation that I cited above.
I need to be able to capture only the first match, and prevent the capture group from being overwritten by more recent matches.
Is this possible? Any ideas?
Thanks in advance.
Kind regards,
Ben Deany

Is there a way to override the default behaviour of
capture groups in regular expressions? More
specifically I want to override this:
No, but you don't need to.
I have a pattern of the form "(.*)(/\\*.*\\*/)(.*)"
which will match multi-line comments.
Comment two is captured by the second group because comment one is eaten by the first group. Use the reluctant quantifier "*?" on the . in the first group instead of the greedy quantifier "*" to get what is apparently the behavior you want. Then the first group will contain nothing, the second group will contain comments one and two, and the third group will contain the following text.
.* is a very powerful thing to use. It will match everything in its path, guzzling text like moonshine at Mardi Gras. The only reason it doesn't match comment two as well is because then the expression as a whole would not match.
The parentheses surrounding the first and third groups are not needed (unless you want to use group methods on them too).
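If only the first comment is wanted, a related option is to drop the outer groups, make the quantifier between the comment delimiters reluctant as well, and call find() once. The following is a minimal sketch of that idea; FirstComment and firstComment are illustrative names, not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstComment {
    // The reluctant .*? between the delimiters stops at the first "*/";
    // DOTALL lets '.' cross line breaks inside a multi-line comment.
    private static final Pattern COMMENT =
            Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);

    /** Returns the first block comment in text, or null if there is none. */
    public static String firstComment(String text) {
        Matcher m = COMMENT.matcher(text);
        return m.find() ? m.group() : null;
    }
}
```

Calling find() repeatedly on the same Matcher would then return the remaining comments one at a time, so nothing is overwritten.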

Regular Expressions and Double Byte Characters ?

Is it possible to use Java Regular Expressions to parse
a file that will contain double byte characters ?
For example, I want a regular expression to match the following line
tag="double byte stuff" id="double byte stuff"

The comments on the bytes/strings were helpful. Thanks.
But I'm still confused as to what matching pattern could be used.
For example a pattern like:
[A-Za-z]
I assume would not match any double byte characters.
I also assume the following won't work either:
[\\p{Alpha}]
because it is a POSIX class, which covers US-ASCII only.
So how do you say "match the tag, then take any characters,
double byte, ASCII, whatever, then match the text tag", per the
original example?
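A possible direction, offered as a sketch rather than a definitive answer: Java regular expressions operate on the characters of a String, not on bytes, so once the file has been decoded with the correct charset a negated class such as [^"]* accepts any character, and \p{L} matches letters from any script. The class and method names below are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagMatcher {
    // [^"]* accepts any characters except the closing quote, ASCII or not.
    private static final Pattern LINE =
            Pattern.compile("tag=\"([^\"]*)\"\\s+id=\"([^\"]*)\"");

    /** Returns {tag, id} for a matching line, or null when the line does not match. */
    public static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    /** True when s consists only of letters, from any script. */
    public static boolean isAllLetters(String s) {
        return s.matches("\\p{L}+");
    }
}
```

The pattern itself needs no knowledge of the encoding; that concern belongs to the reader that turns the file's bytes into a String.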

Trying to use regular expressions to convert names to Title Case

I'm trying to change names to their proper case for most common names in North America (esp. the U.S.).
Some examples are in the comments of the included code below.
My problem is that *retName = retName.replaceAll("( [^ ])([^ ]+)", "$1".toUpperCase() + "$2");* does not work as I expect. It seems that the toUpperCase method call does not actually do anything to the identified group.
Everything else works as I expect.
I'm hoping that I do not have to iterate through each character of the string, upshifting the characters that follow spaces.
Any help from you RegEx experts will be appreciated.
{code}
/**
 * Converts names in some random case into proper Name Case. This method does not have the
 * extra processing that would be necessary to convert street addresses.
 * This method does not add or remove punctuation.
 * Examples:
 * DAN MARINO --> Dan Marino
 * old macdonald --> Old Macdonald <-- Can't capitalize the 'D' because of Ernst Mach
 * ROY BLOUNT, JR. --> Roy Blount, Jr.
 * CAROL mosely-BrAuN --> Carol Mosely-Braun
 * Tom Jones --> Tom Jones
 * ST.LOUIS --> St. Louis
 * ST.LOUIS, MO --> St. Louis, Mo <-- Avoid City Names plus State Codes
 * This is a work in progress that will need to be updated as new exceptions are found.
 */
public static String toNameCase(String name) {
    /*
     * Basic plan:
     * 1. Strategically create double spaces in front of characters to be capitalized
     * 2. Capitalize characters with preceding spaces
     * 3. Remove double spaces.
     */
    // Make the string all lower case
    String retName = name.trim().toLowerCase();
    // Collapse strings of spaces to single spaces
    retName = retName.replaceAll("[ ]+", " ");
    // "mc" names
    retName = retName.replaceAll("( mc)", "  $1");
    // Ensure there is one space after periods and commas
    retName = retName.replaceAll("(\\.|,)([^ ])", "$1 $2");
    // Add 2 spaces after periods, commas, hyphens and apostrophes
    retName = retName.replaceAll("(\\.|,|-|')", "$1  ");
    // Add a double space to the front of the string
    retName = "  " + retName;
    // Upshift each character that is preceded by a space
    // For some reason this doesn't work
    retName = retName.replaceAll("( [^ ])([^ ]+)", "$1".toUpperCase() + "$2");
    // Remove double spaces
    retName = retName.replaceAll("  ", "");
    return retName;
}
Edited by: FuzzyBunnyFeet on Jan 17, 2011 10:56 AM
Edited by: FuzzyBunnyFeet on Jan 17, 2011 10:57 AM

Hopefully someone will still be able to provide a RegEx solution, but until that time here is a working method.
Also, if people have suggestions of other rules for letter capitalization in names, I am interested in those too.
/**
 * Converts names in some random case into proper Name Case. This method does not have the
 * extra processing that would be necessary to convert street addresses.
 * This method does not add or remove punctuation.
 * Examples:
 * CAROL mosely-BrAuN --> Carol Mosely-Braun
 * carol o'connor --> Carol O'Connor
 * DAN MARINO --> Dan Marino
 * eD mCmAHON --> Ed McMahon
 * joe amcode --> Joe Amcode         <-- Embedded "mc"
 * mr.t --> Mr. T                    <-- Inserted space
 * OLD MACDONALD --> Old Macdonald   <-- Can't capitalize the 'D' because of Ernst Mach
 * old mac donald --> Old Mac Donald
 * ROY BLOUNT,JR. --> Roy Blount, Jr.
 * ST.LOUIS --> St. Louis
 * ST.LOUIS,MO --> St. Louis, Mo     <-- Avoid City Names plus State Codes
 * Tom Jones --> Tom Jones
 * This is a work in progress that will need to be updated as new exceptions are found.
 */
public static String toNameCase(String name) {
    /*
     * Basic plan:
     * 1. Strategically create double spaces in front of characters to be capitalized
     * 2. Capitalize characters with preceding spaces
     * 3. Remove double spaces.
     */
    // Make the string all lower case
    String workStr = name.trim().toLowerCase();
    // Collapse strings of spaces to single spaces
    workStr = workStr.replaceAll("[ ]+", " ");
    // "mc" names
    workStr = workStr.replaceAll("( mc)", "  $1  ");
    // Ensure there is one space after periods and commas
    workStr = workStr.replaceAll("(\\.|,)([^ ])", "$1 $2");
    // Add 2 spaces after periods, commas, hyphens and apostrophes
    workStr = workStr.replaceAll("(\\.|,|-|')", "$1  ");
    // Add a double space to the front of the string
    workStr = "  " + workStr;
    // Upshift each character that is preceded by a space and remove double spaces
    // Can't upshift using regular expressions and String methods
    // workStr = workStr.replaceAll("( [^ ])([^ ]+)", "$1".toUpperCase() + "$2");
    StringBuilder titleCase = new StringBuilder();
    for (int i = 0; i < workStr.length(); i++) {
        if (workStr.charAt(i) == ' ') {
            // Drop a double space entirely
            if (workStr.charAt(i+1) == ' ') {
                i += 2;
            }
            // Keep any spaces that remain after the dropped pair
            while (i < workStr.length() && workStr.charAt(i) == ' ') {
                titleCase.append(workStr.charAt(i++));
            }
            // Upshift the first character after the spaces
            if (i < workStr.length()) {
                titleCase.append(workStr.substring(i, i+1).toUpperCase());
            }
        } else {
            titleCase.append(workStr.charAt(i));
        }
    }
    return titleCase.toString();
}
{code}
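On why the replaceAll version has no effect: "$1".toUpperCase() is evaluated before replaceAll runs, on the literal two-character string "$1", which upper-casing leaves unchanged. A regex-based upshift is still possible with Matcher.appendReplacement, which lets ordinary Java code compute each replacement. The sketch below shows only that step, not the full name-casing rules; UpshiftDemo and upshiftAfterSpaces are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UpshiftDemo {
    // A non-space character at the start of the input or right after a space.
    private static final Pattern AFTER_SPACE = Pattern.compile("(^| )([^ ])");

    /** Upper-cases every character that starts the string or follows a space. */
    public static String upshiftAfterSpaces(String s) {
        Matcher m = AFTER_SPACE.matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(1) + m.group(2).toUpperCase();
            // quoteReplacement keeps '$' and '\' in the text from being read as group references
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

The same loop could replace the character-by-character pass in the method above, provided the double-space bookkeeping is adapted to it.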

Using regular expressions to find and replace code.

Hi! Semi-newbie coder here.
I'm trying to strip out code from multiple pages, I've tried regular expressions but I'm struggling to understand them. I also need to do it across a LOT of pages, so I need an automated way of doing it.
The best way I can explain is with an analogy:
I want to delete any string of characters that starts with c, ends with t and includes anything in between, so it would pick up "cat, cut, chat, coconut, can do it", whatever appears in the middle of those.
Except, instead of c and t, I want it to find strings of code starting with <div class="advert" and ending with Vote<br> while picking up everything in between (including spaces, code, comments, etc.), and then delete that whole string, including the start and end markers. Is there a regular expression I could use in Dreamweaver that could do this? Is there a way to do this at all?

Let me begin by saying I'm a complete idiot with DW's Reg Ex. I use Search Specific Tag whenever possible. See screenshot below.
Try this on your Current Document to see if it works. Then make a back-up copy of site before attempting it on Entire Local Site as you cannot "Undo" this process.
Good luck,
Nancy O.
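For readers who would rather script the clean-up than use the editor, the "starts with X, ends with Y, anything in between" idea maps onto a reluctant quantifier. This is a sketch in Java, not a Dreamweaver recipe; the class name is made up, and whether the same pattern works unchanged in Dreamweaver's Find and Replace is an assumption to verify on a backup copy.

```java
import java.util.regex.Pattern;

public class AdvertStripper {
    // (?s) lets '.' cross line breaks; the reluctant .*? stops at the first "Vote<br>",
    // so two advert blocks on one page are removed separately.
    private static final Pattern ADVERT =
            Pattern.compile("(?s)<div class=\"advert\".*?Vote<br>");

    /** Removes every advert block, including its start and end markers. */
    public static String strip(String html) {
        return ADVERT.matcher(html).replaceAll("");
    }
}
```

A greedy .* in the same position would delete everything from the first advert's opening tag to the last Vote<br> on the page, which is rarely what is wanted.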

Using Regular Expressions to Find Quoted Text

I have run into a couple problems with the following code.
1) Slash-Star and Slash-Slash commented text must be ignored.
2) It does not detect backslashed quotes, or if that backslash is backslashed.
Can this be accomplished with Regular Expressions, or should I implement this using if/indexOf logic?
Thank You in advance,
Brian
   /**
    * Finds position of next quoted string in a line
    * of source code.
    * If no strings exist, then a Pointer position of
    * (0,0) is returned.
    * @param startPos position to start search from
    * @param aString the line of text to search
    * @return next string position
    */
   public Pointer getQuotedStringPosition(int startPos, String aString) {
      String argText = new String( aString );
      Pattern p = Pattern.compile("[\"][^\"]+[\"]");
      Matcher m = p.matcher( argText.substring(startPos) );
      if( m.find() )
         return new Pointer( m.start() + startPos, m.end() + startPos );
      else
         return new Pointer( 0, 0 ); // indicates nothing was found
   }

YATArchivist was right about the regular expressions.
I think I've got it but somebody test it if you want. Let me know what you find.
I've included a barebones Position class as well...
import java.util.regex.*;
import java.io.*;
import java.util.*;

/**
 * @author Joshua A. Logan, Jr.
 */
public class RegexTest
{
   private static final String SLASH_SLASH = "(//.*)";
   private static final String SLASH_STAR =
                           "(/\\*(?:[^\\*]|(?:\\*(?!/)))+(\\*/)?)";
   private static final Pattern COMMENT_PATTERN =
                     Pattern.compile( SLASH_SLASH + "|" + SLASH_STAR );
   private static final Pattern QUOTED_STRING_PATTERN =
                  Pattern.compile( "\" ( (?:(\\\\.) | [^\\\\\"])*+ )     \"",
                                   Pattern.COMMENTS );
   // Breaking the above regular expression down, you'd have:
   //   " ( (?: (\\ .) | [^\\ "] ) *+ )   "
   //   ^          ^     ^     ^       ^      ^
   //   |          |     |     |       |      |
   //   1          2     3     4       5      6
   // which matches:
   // 1) The starting quote...
   // Followed by something that is either:
   // 2) some escaped sequence ( e.g. _\n_ or even _\"_ ),
   // 3)                ...or...
   // 4) a character that is neither a _\_ nor a _"_ .
   // 5) Keep searching this as much as possible, w/o giving up
   //                    any found text at the end.
   //        Note: the text found would be in group(1)
   // 6) Finally, find the ending quote!!

   public static Position [] getQuotedStringPosition( final String text )
   {
      Matcher cm = COMMENT_PATTERN.matcher( text ),
              qm = QUOTED_STRING_PATTERN.matcher( text );
      final int len = text.length();
      int startPos = 0;
      List<Position> positions = new ArrayList<Position>();
      while ( startPos < len )
      {
         if ( cm.find(startPos) )
         {
            int commStart = cm.start(),
                commEnd   = cm.end();
            // are we starting @ a comment?
            if ( commStart == startPos )
            {
               startPos = commEnd;
            }
            else if ( qm.find(startPos) )
            {
               // Search for unescaped strings in here.
               int stringStart = qm.start(1),
                   stringEnd   = qm.end(1);
               // Is the quote start after comment start?
               if ( stringStart > commStart )
               {
                  startPos = commEnd; // restart search after comment end...
               }
               else if ( (stringEnd > commEnd) ||
                         (stringEnd < commStart) )
               {
                  // In this case, the "comment" is actually part of
                  // the quoted string. We found a match.
                  positions.add( new Position(text, qm.group(1),
                                              stringStart,
                                              stringEnd) );
                  int quoteEnd = qm.end();
                  startPos = quoteEnd;
               }
               else
               {
                  throw new IllegalStateException( "illegal case" );
               }
            }
            else
            {
               startPos = commEnd;
            }
         }
         else
         {
            // no comments were found. Search for unescaped strings.
            int quoteEnd = len;
            if ( qm.find( startPos ) ) {
               quoteEnd = qm.end();
               positions.add( new Position(text,
                                           qm.group(1),
                                           qm.start(1),
                                           qm.end(1)) );
            }
            startPos = quoteEnd;
         }
      }
      return positions.isEmpty() ? Position.EMPTY_ARRAY
                                 : positions.toArray(
                                          Position.EMPTY_ARRAY);
   }

   public static void main( String [] args )
   {
      try
      {
         BufferedReader br = new BufferedReader(
                  new InputStreamReader(System.in) );
         String input = null;
         final String prompt = "\nText (q to quit): ";
         System.out.print( prompt );
         while ( (input = br.readLine()) != null )
         {
            if ( input.equals("q") ) return;
            Position [] matches = getQuotedStringPosition( input );
            // What does it do?
            for ( int i = 0, max = matches.length; i < max; i++ )
               System.out.println( "-->" + matches[i] );
            System.out.print( prompt );
         }
      }
      catch ( Exception e )
      {
         System.out.println ( "Exception caught: " + e.getMessage () );
      }
   }
}

class Position
{
   public Position( String target,
                    String match,
                    int start,
                    int end )
   {
      this.target = target;
      this.match = match;
      this.start = start;
      this.end = end;
   }
   public String toString()
   {
      return "match==" + match + ",{" + start + "," + end + "}";
   }
   final String target;
   final int start;
   final int end;
   final String match;
   public static final Position [] EMPTY_ARRAY = { };
}

Using regular expressions

Hi Experts,
After going through some documentation on regular expressions in Oracle, I have tried to draw some conclusions about them. As I wasn't very confident about how the patterns are built, I have tried to interpret them by looking at the output. It's basically reverse engineering.
Please let me know if my interpretations are correct. Any additions /suggestions/corrections are most welcome.
Some of the examples may lack conclusions, please ignore those.
select regexp_substr('1PSN/231_3253/ABc','^([[:alnum:]]*)') from dual;
Output: 1PSN
Interpreted as:
^ From the start of the source string
([[:alnum:]]*) zero or more occurrences of alphanumeric characters
select regexp_substr('@@/231_3253/ABc','@*([[:alnum:]]+)') from dual;
Output: 231
Interpreted as:
@* Search for zero or more occurrences of @
([[:alnum:]]+) followed by one or more occurrences of alphanumeric characters
Note: In the above example Oracle looks for @ (zero or more times) immediately followed by alphanumeric characters.
Since a '/' comes between the @ signs and 231, the output is zero occurrences of @ plus one or more occurrences of alphanumerics.
select regexp_substr('1@/231_3253/ABc','@+([[:alnum:]]*)') from dual;
Output: @
Interpreted as:
@+ one or more occurrences of @
([[:alnum:]]*) followed by 0 or more occurrences of alphanumerics
select regexp_substr('1@/231_3253/ABc','@+([[:alnum:]]+)') from dual;
Output: Null
Interpreted as:
@+ one or more occurrences of @
([[:alnum:]]+) followed by one or more occurrences of alphanumerics
select regexp_substr('@1PSN/231_3253/ABc125','([[:digit:]]+)$') from dual;
Output: 125
Interpreted as:
([[:digit:]]+) one or more occurrences of digits only
$ at the end of the string
select regexp_substr('@1PSN/231_3253/ABc','([^[:digit:]]+)$') from dual;
output: /ABc
Interpreted as:
([^[:digit:]]+)$ one or more occurrences of non-digit literals at the end of the string
'^' inside square brackets marks the negation of the class
Look for http:// followed by a substring of one or more alphanumeric characters and optionally, a period (.)
SELECT REGEXP_SUBSTR('Go to http://www.oracle.com/products and click on database','http://([[:alnum:]]+\.?){3,4}/?') RESULT
FROM dual;
Output: http://www.oracle.com/
Interpreted as:
[[:alnum:]]+ one or more occurrences of alphanumeric characters
\.? dot optionally (backslash is the escape character, ? means optional)
{3,4} 3 or 4 times
/? followed by forward slash optionally
If you have www.oracle.co.uk; {3,4} extracts it for you as well
Validate email:
select case when
       REGEXP_LIKE('[email protected]',
                   '^([[:alnum:]]+(\_?|\.))([[:alnum:]]*)@([[:alnum:]]+)(.([[:alnum:]]+)){1,2}$') then 'Match Found'
       else 'No Match Found'
       end
as output from dual;
Interpreted as:
([[:alnum:]]+(\_?|\.)) one or more occurrences of alphanumerics optionally followed by an underscore or dot
([[:alnum:]]*) followed by 0 or more occurrences of alphanumerics
@ followed by @
([[:alnum:]]+) followed by one or more occurrences of alphanumerics
(.([[:alnum:]]+)){1,2} followed by a dot followed by alphanumerics, once or at most twice (e.g. .com or .co.uk)
Output: Match Found
Input: [email protected]
Output: Match Found
Input: [email protected]
Output: No Match Found
Truncate the part, ending with digits
select regexp_substr('Yahoo11245@US','^.*[[:digit:]]',1) from dual;
Output: Yahoo11245
select regexp_substr('*Yahoo*11245@US','^.*[[:digit:]]',1) from dual;
Output: *Yahoo*11245
Interpreted as:
.* zero or more occurrences of any characters (dot signifies any character)
Replace 2 to 8 spaces with single space
select regexp_replace('Hello   you      OPs       there','[[:space:]]{2,8}',' ')
from dual;
Search for control characters
select case when
       regexp_like('Super' || chr(13) || 'Star' ,'[[:cntrl:]]')
              then 'Match Found'
       else 'No Match Found'
       end
as output from dual;
Output: Match Found
Search for lower case letters only with a string length varying from a min of 3 to max of 12
select case when
       regexp_like('terminator' ,'^[[:lower:]]{3,12}$')
              then 'Match Found'
       else 'No Match Found'
       end
as output from dual;
4th character must be a special character
select case when
       regexp_like('ter*minator' ,'^...[^[:alnum:]]')
              then 'Match Found'
       else 'No Match Found'
       end
as output from dual;
Output: Match Found
Case Sensitive Search
select case when
       regexp_like('Republic Of Africa' ,'of','c')
              then 'Match Found'
       else 'No Match Found'
       end
as output from dual;
Output: No match found
c stands for case sensitive
select case when
       regexp_like('Republic Of africa' ,'of','i')
              then 'Match Found'
       else 'No Match Found'
       end
as output from dual;
Output: Match Found
i stands for case insensitive
Two consecutive occurrences of the same character from a to z
select regexp_substr('Republicc Of Africaa' ,'([a-z])\1', 1,1,'i') from dual;
Output: cc
Interpreted as:
([a-z]) character set a-z
\1 back-reference: the same character that ([a-z]) just matched, repeated immediately
1 starting from 1st character in the string
1 First occurrence
i case insensitive
Three consecutive occurrences of the same digit from 7 to 9
select case when
       regexp_like('Patch 10888 applied' ,'([7-9])\1\1')
              then 'Match Found'
       else 'No Match Found'
       end
as output from dual;
Output: Match Found
Phone validator:
select case when
       regexp_like('123-44-5555' ,'^[0-9]{3}-[0-9]{2}-[0-9]{4}$')
              then 'Match Found'
       else 'No Match Found'
       end
as output from dual;
Output: Match Found
Input: 111-222-3333
Output: No match found
Interpreted as:
^ start of the string
[0-9]{3} three occurrences of digits from 0-9
- followed by hyphen
[0-9]{2} two occurrences of digits from 0-9
- followed by hyphen
[0-9]{4} four occurrences of digits from 0-9
$ end of the string
************************************************************************
Source Links:
http://www.psoug.org/reference/regexp.html
http://www.oracle.com/technology/obe/obe10gdb/develop/regexp/regexp.htm
Edited by: Preta on Feb 25, 2010 4:38 PM
Corrected the example for www.oracle.com
Edited by: Preta Incorporated Logan's comments

Hi,
It looks like you have a good understanding of how regular expressions work.
You can put comments like the ones in your message directly in the code. For example, your validate e-mail code could be re-written
select case
         when REGEXP_LIKE ( '[email protected]'
                          , '^'              || -- Starting from the beginning of the string
                            '('              || -- Begin \1
                            '[[:alnum:]]+'   || --   1 or more alphanumerics
                            '(\_?|\.)'       || --   optional underscore or dot
                            ')'              || -- End \1
                            '([[:alnum:]]*)' || -- 0 or more alphanumerics
                            '@'              || -- @ sign
                            '([[:alnum:]]+)' || -- 1 or more alphanumerics
                            '('              || -- Begin \5
                            '\.'             || --   dot
                            '([[:alnum:]]+)' || --   1 or more alphanumerics
                            ')'              || -- End \5
                            '{1,2}'          || -- \5 can occur 1 or 2 times
                            '$'                 -- End of string
                          )
         then 'Match Found'
         else 'No Match Found'
       end as output
from   dual;
I find this easier to debug and maintain.
There's no denying, it does make the code very long. You be the judge of when to do this.
You use parentheses and \ unnecessarily sometimes. That's not really an error; if you find they make the code easier to develop and maintain, use them as much as you like.
For example, about the 4th line of the regular expression as I formatted it above:
'(\_?|\.)'     || --   optional underscore or dot
Underscore has no special meaning in regular expressions (only in LIKE), so you don't have to escape it.
I might write that line:
'(_|\.)?'      || --   optional underscore or dot
just because I think it's clearer.
I think you forgot a \ about 7 lines later:
'\.'           || --   dot
Be very careful about testing patterns that include literal dots; always make sure that a random character, like ~ , fails in a place where a dot is expected.
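The "make sure ~ fails" test is easy to automate. The sketch below uses Java's regex engine instead of Oracle's, with a deliberately simplified address pattern, purely to show the shape of such a check; the class name, method names and sample addresses are invented for the illustration.

```java
import java.util.regex.Pattern;

public class DotCheck {
    // An unescaped '.' matches any character; "\\." matches only a literal dot.
    private static final Pattern LOOSE  = Pattern.compile("^[a-z]+@[a-z]+.[a-z]+$");
    private static final Pattern STRICT = Pattern.compile("^[a-z]+@[a-z]+\\.[a-z]+$");

    public static boolean loose(String s)  { return LOOSE.matcher(s).matches(); }
    public static boolean strict(String s) { return STRICT.matcher(s).matches(); }
}
```

A pattern that accepts user@example~com has an unescaped dot somewhere; the strict variant rejects that input while still accepting user@example.com.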

Pattern matching regular expressions

I'm attempting to determine if a string matches a pattern of containing no more than 100 alphanumeric characters a-z or 0-9, case insensitive. So my regular expression string looks like:
"^[a-zA-Z0-9]{0,100}$"
And I use something like...
Pattern pattern = Pattern.compile( regexString );
I'd like to modify my regex string to include the email 'at' symbol "@", so that the at symbol will be allowed. But my understanding of regex is very limited. How do I include an "or at symbol" in my regex expression?
Thanks for your help.
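For the narrow question as asked, adding a character to the allowed set only means listing it inside the square brackets. A minimal sketch, assuming the 0-to-100 length limit from the post stays unchanged; AlnumAtCheck and isAllowed are hypothetical names.

```java
import java.util.regex.Pattern;

public class AlnumAtCheck {
    // '@' has no special meaning inside a character class, so it can simply be listed.
    private static final Pattern ALLOWED = Pattern.compile("^[a-zA-Z0-9@]{0,100}$");

    /** True when s has at most 100 characters, all of them letters, digits or '@'. */
    public static boolean isAllowed(String s) {
        return ALLOWED.matcher(s).matches();
    }
}
```

This only widens the set of permitted characters; it does not check that the string is a well-formed e-mail address.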

// Code by sabre150
private static final Pattern emailMatcher;
    static {
        // Build up the regular expression according to RFC821
        // http://www.ietf.org/rfc/rfc0821.txt
        // <x> ::= any one of the 128 ASCII characters (no exceptions)
        String x_ = "\u0000-\u007f";
        // <special> ::= "<" | ">" | "(" | ")" | "[" | "]" | "\" | "."
        //              | "," | ";" | ":" | "@" """ | the control
        //              characters (ASCII codes 0 through 31 inclusive and
        //              127)
        String special_ = "<>()\\[\\]\\\\\\.,;:@\"\u0000-\u001f\u007f";
        // <c> ::= any one of the 128 ASCII characters, but not any
        //             <special> or <SP>
        String c_ = "[" + x_ + "&&" + "[^" + special_ + "]&&[^ ]]";
        // <char> ::= <c> | "\" <x>
        String char_ = "(?:" + c_ + "|\\\\[" + x_ + "])";
        // <string> ::= <char> | <char> <string>
        String string_ = char_ + "+";
        // <dot-string> ::= <string> | <string> "." <dot-string>
        String dot_string_ = string_ + "(?:\\." + string_ + ")*";
        // <q> ::= any one of the 128 ASCII characters except <CR>,
        //               <LF>, quote ("), or backslash (\)
        String q_ = "["+x_+"&&[^\r\n\"\\\\]]";
        // <qtext> ::= "\" <x> | "\" <x> <qtext> | <q> | <q> <qtext>
        String qtext_ = "(?:\\\\[" + x_ + "]|" + q_ + ")+";
        // <quoted-string> ::= """ <qtext> """
        String quoted_string_ = "\"" + qtext_ + "\"";
        // <local-part> ::= <dot-string> | <quoted-string>
        String local_part_ = "(?:(?:" + dot_string_ + ")|(?:" + quoted_string_ + "))";
        // <a> ::= any one of the 52 alphabetic characters A through Z
        //              in upper case and a through z in lower case
        String a_ = "[a-zA-Z]";
        // <d> ::= any one of the ten digits 0 through 9
        String d_ = "[0-9]";
        // <let-dig> ::= <a> | <d>
        String let_dig_ = "[" + a_ + d_ + "]";
        // <let-dig-hyp> ::= <a> | <d> | "-"
        String let_dig_hyp_ = "[-" + a_ + d_ + "]";
        // <ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>
        // String ldh_str_ = let_dig_hyp_ + "+";
        // RFC821 looks wrong since the production "<name> ::= <a> <ldh-str> <let-dig>"
        // forces a name to have at least 3 characters and country codes such as
        // uk,ca etc would be illegal! I shall change this to make the
        // second term of <name> optional by make a zero length ldh-str allowable.
        String ldh_str_ = let_dig_hyp_ + "*";
        // <name> ::= <a> <ldh-str> <let-dig>
        String name_ = "(?:" + a_ + ldh_str_ + let_dig_ + ")";
        // <number> ::= <d> | <d> <number>
        String number_ = d_ + "+";
        // <snum> ::= one, two, or three digits representing a decimal
        //              integer value in the range 0 through 255
        String snum_ = "(?:[01]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])";
        // <dotnum> ::= <snum> "." <snum> "." <snum> "." <snum>
        String dotnum_ = snum_ + "(?:\\." + snum_ + "){3}"; // + Dotted quad
        // <element> ::= <name> | "#" <number> | "[" <dotnum> "]"
        String element_ = "(?:" + name_ + "|#" + number_ + "|\\[" + dotnum_ + "\\])";
        // <domain> ::= <element> | <element> "." <domain>
        String domain_ = element_ + "(?:\\." + element_ + ")*";
        // <mailbox> ::= <local-part> "@" <domain>
        String mailbox_ = local_part_ + "@" + domain_;
        emailMatcher = Pattern.compile(mailbox_);
        System.out.println("Email address regex = " + emailMatcher);
    }
Wow. Sheesh, sabre150, that's pretty impressive. I like it for two reasons. First, it avoids some false negatives that I would have gotten using the regex I mentioned. For example, [email protected] is a valid email address which my regex pattern rejects and yours accepts. It's unusual but it's valid. Second, I like the way you have compartmentalized each rule so that custom changes, if any are desired, are easier to make, for example if I want to target a particular domain for whatever reason. And you've commented it so that it is easier to read for someone like myself who knows almost nothing about regex.
Thanks, Good stuff!

Regular Expressions find and replace

Hi ,
I have a question on using Regular Expressions in Java(java.util.regex).
Problem Description:
I have a string (say for example strHTML) which contains the whole HTML code of a webpage. I want to be able to search for all the image source tags and check whether they are absolute urls to the image source(for eg. <img src="www.google.com/images/logo.gif" >) or relative(for eg. <img src="../images/logo.gif" >).
If they are relative URLs to the image path, then I wish to replace them with their absolute URLs throughout the webpage (in this case inside the string strHTML).
I have to do it inside a servlet and hence have to use Java.
This is the code I tried. It doesn't match and replace, and it goes into an infinite loop, i.e. probably the pattern matches everything.
//Change all images to actual http addresses FOR example change src="../images/logo.gif" to src="http://www.google.com/../images/logo.gif"
          String ddurl="http://www.google.com/";
String strHTML=" < img src=\"../images/logo.gif\" alt=\"Google logo\">";
Pattern p = Pattern.compile ("(?i)src[\\s]*=[\\s]*[\"\']([./]*.*)[\"\']");
Matcher m = p.matcher (strHTML);
while(m.find())
m.replaceAll(ddurl+m.group(1));
what is wrong in this?
Thanks,
Rajiv

Right, here's the full monte (whatever that means):
import java.util.regex.*;

public class Test1
{
  public static void main(String[] args)
  {
    String domain = "http://www.google.com/";
    String strHTML =
      " < img src=\"images/logo.gif\" alt=\"Google logo\">\n" +
      " <img alt=\"Google logo\" src=images/logo.gif >\n" +
      " <IMG SRC=\"/images/logo.gif\" alt=\"Google logo\">\n" +
      " <img alt=\"Google logo\" src=../images/logo.gif>\n" +
      " <img src=http://www.yahoo.com/images/logo.gif alt=\"Yahoo logo\">";
    String regex =
      "(<\\s*img.+?src\\s*=\\s*)   # Capture preliminaries in $1. \n" +
      "(?:                         # First look for URL in quotes. \n" +
      "   ([\"\'])                 #   Capture open quote in $2.   \n" +
      "   (?!http:)                #   If it isn't absolute...     \n" +
      "   /?(.+?)                  #    ...capture URL in $3       \n" +
      "   \\2                      #   Match the closing quote     \n" +
      " |                          # Look for non-quoted URL.      \n" +
      "   (?!http:)                #   If it isn't absolute...     \n" +
      "   /?([^\\s>]+)             #    ...capture URL in $4       \n" +
      ")";
    Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.COMMENTS);
    Matcher m = p.matcher(strHTML);
    StringBuffer sbuf = new StringBuffer();
    while (m.find())
    {
      // Exactly one of groups 3 and 4 is set, depending on which alternative matched.
      String relURL = m.group(3) != null ? m.group(3) : m.group(4);
      m.appendReplacement(sbuf, "$1\"" + domain + relURL + "\"");
    }
    m.appendTail(sbuf);
    System.out.println(sbuf.toString());
  }
}
First off, observe that I'm using free-spacing (or "COMMENTS") mode to make the regex easier to read: all the whitespace and comments will be ignored by the Pattern compiler. I also used the CASE_INSENSITIVE flag instead of an embedded (?i), just to remove some clutter. By the way, your second (?i) was redundant; the first one would remain in effect until "turned off" with a (?-i). Another way to localize a flag's effect is to use it within a non-capturing group, e.g., (?i:img).
As jaylogan said, the best way to filter out absolute URL's is by using a negative lookahead, and that's what I've done here. The problem of optional quotes I addressed by trying to match first with quotes, then without. The all-in-one approach might work with URL's, since they can't (AFAIK) contain whitespace anyway, but the alternation method can be used to match any attribute/value pair. It's also, I feel, easier to understand and maintain. Unfortunately, it also means that you can't use replaceAll(), since you have to determine which alternative matched before doing the replacement, but the long version is still pretty simple (especially when you can just copy it from the javadoc for the appendReplacement() method, as I did).
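To make the point about flag scope concrete, here is a small sketch comparing a pattern-wide (?i), a group-local (?i:...), and a flag switched off again with (?-i). The class name, the patterns, and the sample strings are made up for illustration; they are not taken from the code above.

```java
import java.util.regex.Pattern;

public class FlagScopeDemo {
    // (?i) at the start stays in effect for the rest of the pattern.
    static final Pattern WHOLE = Pattern.compile("(?i)img src");
    // (?i:...) limits case-insensitivity to the group around "img".
    static final Pattern LOCAL = Pattern.compile("(?i:img) src");
    // (?i) turned on, then turned off again with (?-i) before " src".
    static final Pattern TOGGLED = Pattern.compile("(?i)img(?-i) src");

    public static void main(String[] args) {
        String[] samples = { "img src", "IMG src", "IMG SRC" };
        for (String s : samples) {
            System.out.println(s
                + " -> whole=" + WHOLE.matcher(s).matches()
                + " local=" + LOCAL.matcher(s).matches()
                + " toggled=" + TOGGLED.matcher(s).matches());
        }
    }
}
```

Only the first pattern should accept "IMG SRC", because in the other two the " src" part is still matched case-sensitively.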

ACE20 Module, webservices and regular expressions.

Hello All,
I am trying to load-balance requests for webservices in a serverfarm. But for some reason, the ACE20 module is not making matches on the requests.
We have a serverfarm Prod1 with 2 real servers and another serverfarm named WSDL with other 2 real servers.
The idea is the following, if we receive the following string, /App.WebService, the ACE should redirect it to serverfarm Prod1, but if it receives /App.WebService?wsdl, it should be redirected to WSDL.
Request with string /App.WebService --------------> ServerFarm Prod1
Request with string /App.WebService?wsdl -----> ServerFarm WSDL
We use regular expressions in L7 class maps to make the load balancing happen.
class-map type http loadbalance match-all APP.WEBSERVICES-L7-SLB
2 match http url /App\.WebService\?wsdl
class-map type http loadbalance match-all APP-L7-SLB
2 match http url /App\.WebService
policy-map type loadbalance first-match L7_SLB-POLICY
class APP.WEBSERVICES-L7-SLB
    serverfarm WSDL
class APP-L7-SLB
    serverfarm Prod1
class L4_SLB_DATAPOWER(9050)
    loadbalance vip inservice
    loadbalance policy L7_SLB-POLICY
    loadbalance vip icmp-reply
    appl-parameter http advanced-options HTTP_PARAM
    ssl-proxy server wildcard.test.org
    connection advanced-options TCP_PARAM
But the ACE20 Module seems to be removing the ?wsdl from the URL and only the class-map called APP-L7-SLB is being matched.
Any comments or suggestions on why this could be happening?
Thanks in advance,
Fernando

Hello Kanwal and all,
Finally, after a lot of reading, I found a fix for this problem. It seems that HTTP uses the question mark (?) character as a delimiter for data appended to the URL. So, if you get the following:
www.test1.org/App.WebService?wsdl
and you have configured an L7 class map to parse the URL, it will only parse up to the question mark (?).
So you need to create a parameter map that changes the URL delimiter start. Here is an example:
parameter-map type http HTTP_PARAMETER_MAP_WSDL
persistence-rebalance strict
set secondary-cookie-delimiters ;!@?
set secondary-cookie-start ;
I used the semicolon ( ; ) as delimiter.
Hope this helps.
Fernando

Need to remove Commas REgular Expressions?

How can I use Java to remove commas from a number?
I have the string
1,000
and need it to be
1000
Can I pass it through some sort of regular expression?

I was attempting to do it with regular expressions to learn how to do them better.
Thanks for your good comment.
nupevic
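For what it's worth, a minimal sketch of three ways this could be done in Java. The class and method names are made up for illustration, and the last variant assumes the input uses US-style grouping (comma as thousands separator).

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

public class StripCommas {
    // Regex version: String.replaceAll treats its first argument as a pattern.
    static String stripWithRegex(String s) {
        return s.replaceAll(",", "");
    }

    // Literal version: String.replace does plain text substitution, no regex.
    static String stripLiteral(String s) {
        return s.replace(",", "");
    }

    // Locale-aware version: let NumberFormat interpret the grouping separators.
    static long parseGrouped(String s) throws ParseException {
        return NumberFormat.getNumberInstance(Locale.US).parse(s).longValue();
    }

    public static void main(String[] args) throws ParseException {
        String input = "1,000";
        System.out.println(stripWithRegex(input));
        System.out.println(stripLiteral(input));
        System.out.println(parseGrouped(input));
    }
}
```

Since a comma has no special meaning in a regex, the first two behave the same here; the regex form only starts to matter once the pattern gets more selective, such as removing only commas that sit between digits.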

Trouble with tribbles, i mean regular expressions

Hi all,
I'm trying to make a regular expression that finds all the comments in a given string,
stuff like this:
/* this is my comment */
Then what I want to do is remove all those found comments from the string, leaving me with the original string without any comments.
Anyway, I know I need a pattern but I can't get my pattern right. This is what I have at the moment:
Pattern remComment = Pattern.compile(" ^\\*?[\\w\\s\\W]+?*\\ ", Pattern.DOTALL | Pattern.MULTILINE);
Can anyone help me and let me know where I'm going wrong?
I'm basically trying to say anything that starts with \* <any other text here> until a *\
Thanks

There is something else wrong. I just tried:
/*
 * Comments.java
 * version 1.0
 * 07/06/2005
 */
package samples;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * @author notivago
 */
public class Comments {
    /**
     * @param args
     */
    public static void main(String[] args) {
        String comment =
            "/*\r\n" +
            "does this do it won\r\n" +
            "*/\r\n" +
            ".activegrouptab\r\n" +
            "{\r\n" +
            "white-space:nowrap;\r\n" +
            "border-top-width:1pt;\r\n" +
            "cursor:hand;\r\n" +
            "}\r\n" +
            "/*\r\n" +
            "does this do it too\r\n" +
            "*/\r\n" +
            ".activegrouptabdisabled\r\n" +
            "{\r\n" +
            "font-family:verdana;\r\n" +
            "font-size:7pt;\r\n" +
            "}\r\n" +
            "/*\r\n" +
            "does this do it free\r\n" +
            "*/\r\n" +
            ".activesectiontab\r\n" +
            "{\r\n" +
            "width:90%;\r\n" +
            "background-image:url(hocbt.gif);\r\n" +
            "overflow:auto;\r\n" +
            "color:#ffffff;\r\n" +
            "}\r\n";
        // Reluctant .*? with DOTALL stops at the first closing */ after each /*.
        Pattern remComment = Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL | Pattern.MULTILINE);
        Matcher matcher = remComment.matcher(comment);
        while( matcher.find() ) {
            System.out.println( "The comment: \n" + matcher.group() );
            System.out.println("----");
        }
    }
}
with the output:
The comment:
/*
does this do it won
*/
----
The comment:
/*
does this do it too
*/
----
The comment:
/*
does this do it free
*/
----
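The original question was about removing the comments rather than listing them. A minimal sketch of that step, reusing the same pattern, might look like the following. The class name, the strip helper, and the shortened CSS sample are made up for illustration.

```java
import java.util.regex.Pattern;

public class StripComments {
    // Same pattern as above. DOTALL lets '.' cross line breaks; MULTILINE is
    // left out here because the pattern contains no ^ or $ anchors.
    private static final Pattern BLOCK_COMMENT =
        Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);

    // Returns the input with every /* ... */ block deleted.
    static String strip(String source) {
        return BLOCK_COMMENT.matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        String css =
            "/*\r\nfirst\r\n*/\r\n" +
            ".a\r\n{\r\ncolor:#fff;\r\n}\r\n" +
            "/* second */\r\n" +
            ".b { }\r\n";
        System.out.println(strip(css));
    }
}
```

Note that this treats every /* ... */ pair as a comment, so comment markers that appear inside a quoted string would be removed as well; handling that case needs more than this single pattern.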
