Using Regular Expressions to Find Quoted Text

I have run into a couple problems with the following code.
1) Slash-Star and Slash-Slash commented text must be ignored.
2) It does not detect backslashed quotes, or if that backslash is backslashed.
Can this be accomplished with Regular Expressions, or should I implement this using if/indexOf logic?
Thank You in advance,
Brian
    * Finds position of next quoted string in a line
    * of source code.
    * If no strings exist, then a Pointer position of
    * (0,0) is returned.
    * @param startPos position to start search from
    * @param argText  the line of text to search
    * @returns next string position
   public Pointer getQuotedStringPosition(int startPos, String aString) {
      String argText = new String( aString );
      Pattern p = Pattern.compile("[\"][^\"]+[\"]");
      Matcher m = p.matcher( argText.substring(startPos); );
      if( m.find() )
         return new Pointer( m.start() + startPos, m.end() + startPos );
      else
         return new Pointer( 0, 0 ); // indicates nothing was found
   }

YATArchivist was right about the regular expressions.
I think I've got it but somebody test it if you want. Let me know what you find.
I've included a barebones Position class as well...
import java.util.regex.*;
import java.io.*;
import java.util.*;
  @author Joshua A. Logan, Jr.
public class RegexTest
   private static final String SLASH_SLASH = "(//.*)";
   private static final String SLASH_STAR =
                           "(/\\*(?:[^\\*]|(?:\\*(?!/)))+(\\*/)?)";
   private static final Pattern COMMENT_PATTERN =
                     Pattern.compile( SLASH_SLASH + "|" + SLASH_STAR );
   private static final Pattern QUOTED_STRING_PATTERN =
                  Pattern.compile( "\"  ( (?:(\\\\.) | [^\\\"])*+  )     \"",
                                   Pattern.COMMENTS );
   // Breaking the above regular expression down, you'd have:
   //   "  ( (?: (\\ .)  |  [^\\ "]  ) *+ )   "
   //   ^          ^     ^     ^       ^      ^
   //   |          |     |     |       |      |
   //   1          2     3     4       5      6
   // which matches:
   // 1) The starting quote...
   // Followed by something that is either:
   // 2) some escaped sequence ( e.g. _\n_  or even _\"_ ),
   // 3)                ...or...
   // 4) a character that is neither a _\_ nor a _"_ .
   // 5) Keep searching this as much as possible, w/o giving up
   //                    any found text at the end.
   //        Note: the text found would be in group(1)
   // 6) Finally, find the ending quote!!
   public static Position [] getQuotedStringPosition( final String text )
      Matcher cm = COMMENT_PATTERN.matcher( text ),
              qm = QUOTED_STRING_PATTERN.matcher( text );
      final int len = text.length();
      int startPos = 0;
      List positions = new ArrayList();
      while ( startPos < len )
         if ( cm.find(startPos) )
            int commStart = cm.start(),
                commEnd   = cm.end();
            // are we starting @ a comment?
            if ( commStart == startPos )
               startPos = commEnd;
            else if ( qm.find(startPos) )
               // Search for unescaped strings in here.
               int stringStart = qm.start(1),
                   stringEnd   = qm.end(1);
               // Is the quote start after comment start?
               if ( stringStart > commStart )
                  startPos = commEnd; // restart search after comment end...
               else if ( (stringEnd > commEnd) ||
                         (stringEnd < commStart) )
                  // In this case, the "comment" is actually part of
                  // the quoted string. We found a match.
                  positions.add( new Position(text, qm.group(1),
                                              stringStart,
                                              stringEnd) );
                  int quoteEnd = qm.end();
                  startPos = quoteEnd;
               else
                  throw new IllegalStateException( "illegal case" );
            else
               startPos = commEnd;
         else
            // no comments were found. Search for unescaped strings.
            int quoteEnd = len;
            if ( qm.find( startPos ) ) {
               quoteEnd = qm.end();
               positions.add( new Position(text,
                                           qm.group(1),
                                           qm.start(1),
                                           qm.end(1)) );
            startPos = quoteEnd;
      return positions.isEmpty() ? Position.EMPTY_ARRAY
                                 : (Position[])positions.toArray(
                                          Position.EMPTY_ARRAY);
   public static void main( String [] args )
      try
         BufferedReader br = new BufferedReader(
                  new InputStreamReader(System.in) );
         String input = null;
         final String prompt = "\nText (q to quit): ";
         System.out.print( prompt );
         while ( (input = br.readLine()) != null )
            if ( input.equals("q") ) return;
            Position [] matches = getQuotedStringPosition( input );
            // What does it do?
            for ( int i = 0, max = matches.length; i < max; i++ )
               System.out.println( "-->" + matches[i] );
            System.out.print( prompt );
      catch ( Exception e )
         System.out.println ( "Exception caught: " + e.getMessage () );
class Position
   public Position( String target,
                    String match,
                    int start,
                    int end )
      this.target = target;
      this.match = match;
      this.start = start;
      this.end = end;
   public String toString()
      return "match==" + match + ",{" + start + "," + end + "}";
   final String target;
   final int start;
   final int end;
   final String match;
   public static final Position [] EMPTY_ARRAY = { };
}

Similar Messages

  • Using Regular Expressions to replace Quotes in Strings

    I am writing a program that generates Java files and there are Strings that are used that contain Quotes. I want to use regular expressions to replace " with \" when it is written to the file. The code I was trying to use was:
    String temp = "\"Hello\" i am a \"variable\"";
    temp = temp.replaceAll("\"","\\\\\"");
    however, this does not work and when i print out the code to the file the resulting code appears as:
    String someVar = ""Hello" i am a "variable"";
    and not as:
    String someVar = "\"Hello\" i am a \"variable\"";
    I am assumming my regular expression is wrong. If it is, could someone explain to me how to fix it so that it will work?
    Thanks in advance.

    Thanks, appearently I'm just doing something weird that I just need to look at a little bit harder.

  • Unable To Use Regular Expression To Find Function Names

    Hi,
    I am trying to create a regular expression to find function and procedure names without the static designation and the parameter list.  Sample source document:
    static function test
    static function test(i,j)
    function test
    function test(i,j)
    static procedure test
    static procedure test(i,j)
    procedure test
    procedure test(i,j)
    For each of the above samples, I would like only the word "test" to be found.
    Thanks

    I suggest starting with this expression:
    ^\s*(static\s+)?(function|procedure)\s+(?<NAME>\w+)
    Programmatically, the name can be extracted from the group called “NAME”.
    The expression can be improved.

  • How to use regular expression to find string

    hi,
    who know how to get all digits from the string "Alerts 4520 ( 227550 )  (  98 Available  )" by regular expression, thanks
    br, Andrew

    Liu,
    You can use RegEx as   
    d+
    Whether you are using CL_ABAP_REGEX class then
    report  zars.
    data: regex   type ref to cl_abap_regex,
          matcher type ref to cl_abap_matcher,
          match   type c length 1.
    create object regex exporting pattern = 'd+'
                                  ignore_case = ''.
    matcher = regex->create_matcher( text = 'Test123tes456' ).
    match = matcher->match( ).
    write match
    You can find more details regarding REGEX and POSIX examples here
    http://www.regular-expressions.info/tutorial.html

  • Using regular expressions to find and replace code.

    Hi! Semi-newbie coder here.
    I'm trying to strip out code from multiple pages, I've tried regular expressions but I'm struggling to understand them. I also need to do it across a LOT of pages, so I need an automated way of doing it.
    The best way I can explain is with an analogy:
    I want to delete any string of characters that start with c, ends with t and includes anything inbetween, so it would pick up "cat, cut, chat, coconut, can do it" whatever appears in the middle of those.
    Except, instead of c and t, I want it to find strings of code starting with <div class="advert" and ending with Vote<br> while picking up everything in between, (including spaces, code, comments, etc.). Then, deletes that whole string including the starting and ending.Is there a regular expression I could use in dreamweaver that could do this? Is there a way to do this at all?

    Let me begin by saying I'm a complete idiot with DW's Reg Ex.   I use Search Specific Tag whenever possible.  See screenshot below.
    Try this on your Current Document to see if it works. Then make a back-up copy of site before attempting it on Entire Local Site as you cannot "Undo" this process.
    Good luck,
    Nancy O.

  • Find/Replace Using Regular Expressions

    Can someone help me with this...I am using Regular expressions to
    FIND:
    http.*lid=([^&"]*)[^"]*
    REPLACE:
    $set(\1,ID_id,code)$
    So that in the following it will change this:
    a href="http://www.test.com/shc/s/home_10153_12605?lid=Search" rilt="Search"
    To this:
    a href="$set(Search,ID_id,code)$" rilt="Search
    Those expressions  work in Notepad++ but when i use dreamweaver it just replaces the http... with "$set(\1,ID_id,code)$" and doesnt reference the "search"
    Any help?
    Thanks

    Let me begin by saying I'm a complete idiot with DW's Reg Ex.   I use Search Specific Tag whenever possible.  See screenshot below.
    Try this on your Current Document to see if it works. Then make a back-up copy of site before attempting it on Entire Local Site as you cannot "Undo" this process.
    Good luck,
    Nancy O.

  • Find text using regular expression and add highlight annotation

    Hi Friends
                       Is it possible to find text using regular expression and add highlight annotation using plugin

    A plugin can use the PDWordFinder to get a list of the words on a page, and their location. That's all that the API offers for searching. Of course, you can use a regular expression library to work with that word list.

  • Finding URLs using regular expression.

    I have an requirement where user will type some text containing URLs like "Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747. Thank you". This text has to be modified as below before saving it to the database.
    "Please visit this site <a href='http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747'>http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747</a>. Thank you"
    I am using regular expression (http|https)://.+?\\s which marks the end of the url with a white space character.This pattern doesn't work if the URL is located at the end of the string since there will be no space at the end.
    For example if the string is "Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747" the regex will fail.
    My acutal problem is to find the URL irrespective its position within the string.
    Pattern urlPattern = Pattern.compile("(http|https)://.+?\\s", Pattern.CASE_INSENSITIVE);
    Matcher matcher = urlPattern.matcher(plainText);
    Map stringIndexMap = new HashMap();
    //Searching the input string for urlPattern...
    while(matcher.find()) {
    String urlString = matcher.group();
    //Storing the urls in a hashmap with their indices as keys....
    stringIndexMap.put(new Integer(matcher.start()), urlString.trim());
    Set keySet = stringIndexMap.keySet();
    Iterator it = keySet.iterator();
    //Iterating over the hashmap containing urls...
    while(it.hasNext()) {
    String urlString = (String) stringIndexMap.get(it.next());
    * Replacing the url string in the input text with <a href="#" onclick="window.open('<urlString>')"
    * using String index
    clickableURLString.replace(clickableURLString.indexOf(urlString),
    clickableURLString.indexOf(urlString) + urlString.length(),
    "<a href=\"#\" onclick=\"window.open('" + urlString
    + "')\">" + urlString + "</a>");
    return clickableURLString.toString();

    The end of the input is '$' as a regex.
    import java.util.regex.*;
    public class Prasanna{
      public static void main(String[] args){
        String text
    = "Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747";
    //    String regex = "(http|https)://.+?(?:\\s|$)"; // this works
        String regex = "(http|https)://[^ ]+";          // this also works
        Pattern pat = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        Matcher mat = pat.matcher(text);
        while (mat.find()){
          System.out.println(mat.group());
    }

  • Using regular expressions for validating time fields

    Similar to my problem with converting a big chunk of validation into smaller chunks of functions I am trying to use Regular Expressions to handle the validation of many, many time fields in a flexible working time sheet.
    I have a set of FormCalc scripts to calculate the various values for days, hours and the gain/loss of hours over a four week period. For these scripts to work the time format must be in HH:MM.
    Accessibility guidelines nix any use of message box pop ups so I wanted to get around this by having a hidden/visible field with warning text but can't get it to work.
    So far I have:
    var r = new RegExp(); // Create a new Regular Expression Object
    r.compile ("^[00-99]:\\] + [00-59]");
    var result = r.test(this.rawValue);
    if (result == true){
    true;
    form1.flow.page.parent.part2.part2body.errorMessage.presence = "visible";
    else (result == false){
    false;
    form1.flow.page.parent.part2.part2body.errorMessage.presence = "hidden";
    Any help would be appreciated!

    Date and time fields are tricky because you have to consider the formattedValue versus the rawValue. If I am going to use regular expressions to do validation I find it easier to make them text fields and ignore the time patterns (formattedValue). Something like this works (as far as my very brief testing goes) for 24 hour time where time format is HH:MM.
    // form1.page1.subform1.time_::exit - (JavaScript, client)
    var error = false;
    form1.page1.subform1.errorMsg.rawValue = "";
    if (!(this.isNull)) {
      var time_ = this.rawValue;
      if (time_.length != 5) {
        error = true;
      else {
        var regExp = /^([01]?[0-9]|2[0-3]):[0-5][0-9]$/;
        if (!(regExp.test(time_))) {
          error = true;
    if (error == true) {
      form1.page1.subform1.errorMsg.rawValue = "The time must be in the format HH:MM where HH is 00-23 and MM is 00-59.";
      form1.page1.subform1.errorMsg.presence = "visible";
    Steve

  • How to fetch substring using regular expression

    Hi,
    I am new to using regular expression and would like to know some basic details of how to use them in Java.
    I have a String example= "http://www.google.com/foobar.html#*q*=database&aq=f&aqi=g10&fp=c9fe100d9e542c1e" and would like to get the value of "q" parameter (in bold) using regular expression in java.
    For the same example, when we tried using javascript:
    match = example.match("/^http:\/\/(?:(?!mail\.)[^\.]+?\.)?google\.[^\?#]+(?:.*[\?#&](?:as_q|q)=([^&]+))?/i}");
    document.write('
    ' + match);
    We are getting the output as: http://www.google.com/foobar.html#q=database,*database* where the bold text is the value of "q" parameter.
    In Java we are trying to get the value of the q parameter separately or atleast resembles the output given by JavaScript. Please help me resolving this issue.
    Regards
    Praveen

    BalusC wrote:
    Regex is a cumbersome solution for fixed patterns like URL's. String#substring() in combination with String#indexOf would most likely already suffice.I usually agree, although, in this case, finding the exact parameter might be difficult without a small regex, perhaps:
    "\\wq=\\s*"in conjunction with Pattern/Matcher, used similarly to an indexOf() to find the start of the parameter value.
    Winston

  • Searching for a substring using Regular Expression

    I have a lengthy String similar to repetetion of the one below
    String str="<option value='116813070'>Something1</option><option value='ABCDEF' selected>Something 2</option>"I need to search for the Sub string "<option value='ABCDEF' selected>" (need to get the starting index of sub string) and but the value ABCDEF can be anything numberic with varying length.
    Is there any way i can do it using regular expressions(I have no other options than regular expression)?
    thanks in advance.

    If you go through the tutorial then you will find this on the second page:
    import java.io.Console;
    import java.util.regex.Pattern;
    import java.util.regex.Matcher;
    public class RegexTestHarness {
        public static void main(String[] args){
            Console console = System.console();
            if (console == null) {
                System.err.println("No console.");
                System.exit(1);
            while (true) {
                Pattern pattern =
                Pattern.compile(console.readLine("%nEnter your regex: "));
                Matcher matcher =
                pattern.matcher(console.readLine("Enter input string to search: "));
                boolean found = false;
                while (matcher.find()) {
                    console.format("I found the text \"%s\" starting at " +
                       "index %d and ending at index %d.%n",
                        matcher.group(), matcher.start(), matcher.end());
                    found = true;
                if(!found){
                    console.format("No match found.%n");
    }It's does everything you need and a bit more. Adapt it to your needs then write a regular expression. Then if you have problems by all means come back and post them up here, but first at least attempt to solve it yourself.

  • Regular expression in FIND statement

    Hi All,
    I am writing the regular expressions.
    But i didn't get properly how to write them.
    I have one internal table with the five fields.
    Exapmle wa-mandt = '800'.
                 wa_number = '3768'
                 wa_path = '/usr/tmp/sapuser/3768/test.txt.'
    append wa to itab.
    Loop at itab itno wa.
    Here i need to find client and number system id from the WA using regular expression in singe line
    endloop.
    Can anybody please explain how to write this.
    Thanks,

    Hi,
    What do you mean by FIND?
    If I got it right, you can use a READ statement with KEY f1 f2 etc BINARY SEARCH.Mention all the fields you want in the KEY fields.
    Dont forget to SORT this itab before the loop.
    Thanks
    Kiran

  • Rplacing space with &nbsb; in html using regular expressions

    Hi
    I want to replace space with &nbsb; in HTML.
    I used  the below method to replace space in my html file.
    var spacePattern11:RegExp =/(\s)/g; 
    str= str.replace(spacePattern," "
    Here str varaible contains below html file.In this html file i want to replace space present between " What number does this  represents" with &nbsb;
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>Untitled Document</title>
    </head>
    <body>
    <b><TEXTFORMAT LEADING="2"><P ALIGN="LEFT"><FONT FACE="Verdana" style = 'font-size:10px' COLOR="#0B333C" LETTERSPACING="0" KERNING="0"><B></B></FONT></P></TEXTFORMAT><TEXTFORMAT LEADING="2"><P ALIGN="LEFT"><FONT FACE="Verdana" style = 'font-size:10px' COLOR="#0B333C" LETTERSPACING="0" KERNING="0"><B> What number does this Roman numeral represents MDCCCXVIII ?</B></FONT></P></TEXTFORMAT></b>
    </body>
    </html>
    But by using the above regular expression i am getting like this.
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>Untitled Document</title>
    </head><body>
    <b><TEXTFORMAT LEADING="2"><P ALIGN="LEFT"><FONT FACE="Verdana" style = 'font-size:10px' COLOR="#0B333C" LETTERSPACING="0" KERNING="0"><B></B></FONT></P></TEXTFORMAT><TEXTFORMAT LEADING="2"><P A LIGN="LEFT"><FONT FACE="Verdana" style = 'font-size:10px' COLOR="#0B333C" LETTERSPACING="0 " KERNING="0"><B> What number does this represents</B></FONT></P></TEXTFORMAT></b>
    </body>
    </html>
    Here what happening means it was replacing space with &nbsb; in HTML tags also.But want to replace space with &nbsb; present in the outside of the HTML tags.I want like this using regular expressions in FLEX
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>Untitled Document</title>
    </head>
    <body>What number does this represents</body>
    </html>
    Hi,Please give me the solution to slove the above problem using regular expressions
    Thanks in Advance to all
    Regards
    ssssssss

    sorry i missed some information in above,The modified information was in red color
    Hi
    I want to replace space with &nbsb; in HTML.
    I used  the below method to replace space in my html file.
    var spacePattern11:RegExp =/(\s)/g; 
    str= str.replace(spacePattern," "
    Here str varaible contains below html file.In this html file i want to replace space present between " What number does this  represents" with &nbsb;
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>Untitled Document</title>
    </head>
    <body>
    <b><TEXTFORMAT LEADING="2"><P ALIGN="LEFT"><FONT FACE="Verdana" style = 'font-size:10px' COLOR="#0B333C" LETTERSPACING="0" KERNING="0"><B></B></FONT></P></TEXTFORMAT><TEXTFORMAT LEADING="2"><P ALIGN="LEFT"><FONT FACE="Verdana" style = 'font-size:10px' COLOR="#0B333C" LETTERSPACING="0" KERNING="0"><B> What number does this Roman numeral represents MDCCCXVIII ?</B></FONT></P></TEXTFORMAT></b>
    </body>
    </html>
    But by using the above regular expression i am getting like this.
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>Untitled Document</title>
    </head><body>
    <b><TEXTFORMAT LEADING="2"><P ALIGN="LEFT"><FONT FACE="Verdana" style = 'font-size:10px' COLOR="#0B333C" LETTERSPACING="0" KERNING="0"><B></B></FONT></P></TEXTFORMAT><TEXTFORMAT LEADIN G="2"><P ALIGN="LEFT"><FONT FACE="Verdana" style = 'font-size:10px' COLOR="#0B33 3C" LETTERSPACING="0" KERNING="0"><B> What number does this represents</B></FONT></P></TEXTFORMAT></b>
    </body>
    </html>
    Here what happening means it was replacing space with &nbsb; in HTML tags also.But want to replace space with &nbsb; present in the outside of the HTML tags.I want like this using regular expressions in FLEX
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>Untitled Document</title>
    </head>
    <body>What&nbsb;number&nbsb;does&nbsb;this&nbsb;represents</body>
    </html>
    Hi,Please give me the solution to slove the above problem using regular expressions
    Thanks in Advance to all
    Regards
    ssssssss

  • Using regular expressions in java

    Does anyone of you know a good source or a tutorial for using regular expressions in java.
    I just want to look at some examples....
    Thanks

    thanks a lot... i have one more query
    Boundary matchers
    ^      The beginning of a line
    $      The end of a line
    \b      A word boundary
    \B      A non-word boundary
    \A      The beginning of the input
    \G      The end of the previous match
    \Z      The end of the input but for the final terminator, if any
    \z      The end of the input
    if i want to use the $ for comparing with string(text) then how can i use it.
    Eg if it is $120 i got a hit
    but if its other than that if should not hit.

  • Using Regular Expressions in Numbers 09?

    Is there any way to use regular expressions in Numbers 09 in Find & Replace?

    kilowattradio wrote:
    Is there any way to use regular expressions in Numbers 09 in Find & Replace?
    NO !
    _Go to "Provide Numbers Feedback" in the "Numbers" menu_, describe what you wish.
    Then, cross your fingers, and wait _at least_ for iWork'10
    Yvan KOENIG (VALLAURIS, France) vendredi 25 septembre 2009 14:49:49

Maybe you are looking for

  • Outlook 2013 shared calendars show "no connection" and users can't see other's free / busy time in scheduling assistant

    Info: Outlook 2013 Pro Plus - Windows 7 64bit - Exchange 2007 Hosted by Rack Space - everything fully patched.... every user in our company on Office 2013 now has the same issue: shared calendars won't populate with the following error: "no connectio

  • Updating an Xcontol's appearance in a VI during edits

    I'm writing my first Xcontrol and am confused about it's appearance in the VI in which it's used. I dropped the Xcontrol on my VIs front panel. Then I made some changes to the Xcontrol's Facade. I would have thought that after making that change that

  • XMLBeans - how to get rid of xsi:nill="true"

    Hi, I am using the XMLBeans framework bundled with Weblogic. If any element is not set or set to null, the toString method of the complex element returns with an attribute xsi:nill="true" alongf wit the namespace definition for xsi in the XML header.

  • Cleanup OWB repository

    The OWB repository browser has become very slow so I want to cleanup the job output in the repository. Does anyone have a good script to do this?

  • New to Aperture: keywords and icons

    Hello, although i'm not a pro photographer, I decided to switch from iPhoto to Aperture (thanks to the cheaper price Aperture now has on the app store). I have imported one of my iphoto libraries, the main one with travel pictures, and I have a coupl