String splitting with regular expressions

Hello everyone
I need some help in splitting the string using regular expressions
Suppose my String is : abc def "ghi jkl mno" pqr stu
after splitting the reulsting string array should contain the elements
abc
def
ghi jkl mno
pqr
stu
what my regular expression should be

Since this is essentially the same as parsing CSV data, you might want to download a CSV parser and adapt it to your need. But if you want to use regexes, split() is not the way to go. This approach should work for your sample data:
Pattern p = Pattern.compile("\"[^\"]*+\"|\\S+");
Matcher m = p.matcher(input);
while (m.find())
  System.out.println(m.group());
}

Similar Messages

  • Need advice on negating a whole string line with regular expression

    Hi All,
    I am not able to ignore / get rid of the following line even though my Java 6 (Windows XP) String Pattern matching has not taken cater for it:
    *% Cleared: 61%*
    Below is the existing Java String Pattern matching in the simple program:
    Pattern pattern = Pattern.compile("(^.*[A-Z][a-z]*){1,2} \\d{0,4}/?\\d{0,4} ([A-Z][a-z]*){1,2} St|Rd|Av|Sq|Cl|Pl|Cr|Gr|Dr|Hwy|Pde|Wy|La \\d br [h|u|t] \\$\\d+,\\d+|\\$\\d*\\,\\d+,\\d+ ([A-Z][a-z]*){1,}.*$");This pattern is working for valid strings.
    The following pattern has included "^(?!.*\.\.).*$" into the existing one but had no luck still:
    Pattern pattern = Pattern.compile("^(?!.*\.\.).*$|((^.*[A-Z][a-z]*){1,2} \\d{0,4}/?\\d{0,4} ([A-Z][a-z]*){1,2} St|Rd|Av|Sq|Cl|Pl|Cr|Gr|Dr|Hwy|Pde|Wy|La \\d br [h|u|t] \\$\\d+,\\d+|\\$\\d*\\,\\d+,\\d+ ([A-Z][a-z]*){1,}.*$)");This picked up other rubbish including "*% Cleared: 61%*".
    I am looking for a single regular expression that applies to the whole line.
    I am quite new to regular expression but has read through Regular Expressions Cookbook (Oreilly - 2009) and is still not familiar with advance functions such as lookahead / lookbehind...
    Your assistance would be appreciated.
    Thanks,
    Jack

    Hi Winston,
    I am still digesting the material from the regular expression book and will take sometime to become proficient with it.
    It seems that using groupCount() to eliminate the unwanted text does not work in this case, since all the lines returned the same value. Ie 3 posted earlier. This may be because the patterns are complex and only a few were grouped together. Otherwise, could you provide an example using the string posted as opposed to a hyperthetic one. In the meantime, at least one solution have been found by defining an additional special pattern “\\A[^%].*\\Z”, before combining / intersecting both existing and the new special pattern to get the best of both world. Another approach that should also work is to evaluate the size of String.split() and only accept those lines with a minimum number of tokens.
    Anyhow, I have come a crossed another minor stumbling block in the mean time with the following line, where some hidden characters is preventing the existing pattern from reading it:
    o;?Mervan Bay 40 Boyde St 7 br t $250,000 X West Park AE
    Below is the existing regular expression that works for other lines with the same pattern but not for special hidden characters such as “o;?”:
    \\A([A-Z][a-z]*){1,2} [0-9]{0,4}/?[0-9]{0,4}-?[0-9]{0,4} ([A-Z][a-z]*){1,2} St|Rd|Av|Sq|Cl|Pl|Cr|Gr|Dr|Hwy|Pde|Wy|La [0-9] br [h|u|t] \\$\\d+,\\d+|\\$\\d*\\,\\d+,\\d+ ([A-Z][a-z]*){1,}\\ZIs it possible to come up with a regular expression to ignore them so that this line could be picked up? Would also like to know whether I could combine both the special pattern “\\A[^%].*\\Z” with existing one as opposed to using 2 separate patterns altogether?
    Many thanks,
    Jack

  • Get the string between li tags, with regular expression

    I have a unordered list, and I want to store all the strings between the li tags (<li>.?</li>)in an array:
    <ul>
    <li>This is String One</li>
    <li>This is String Two</li>
    <li>This is String Three</li>
    </ul>
    This is what have so far:
    <li>(.*?)</li>
    but it is not correct, I only want the string without the li tags.
    Thanks.

    No one?
    Anoyone here experienced with Regular Expression?

  • Help with Regular Expressions

    I have some sample code for string editing using regular expressions but I'm a little confused as to it's behavior. Here's what I have:
    public class WordFixer
        protected final String PUNC_MATCH = "[\\d\\p{Punct}]+";
        protected final String PUNC_PREFIX = "^" + PUNC_MATCH;
        protected final String PUNC_SUFFIX = PUNC_MATCH + "$";
        public String fixPrefix (String w)
            return w.replaceFirst(PUNC_PREFIX, "");
        public String fixSuffix (String w)
            return w.replaceFirst(PUNC_SUFFIX, "");
        }This replaces all leading and trailing punctuation with "" and it works. However, changing the replaceFirst's to just replace doesn't work. I don't understand why that is. Doesn't the ^ mean in the front, and doesn't the $ mean in the back, so shouldn't this work by just changing replaceFirst to replace?

    JFactor2004 wrote:
    I have some sample code for string editing using regular expressions but I'm a little confused as to it's behavior. Here's what I have:
    public class WordFixer
    protected final String PUNC_MATCH = "[\\d\\p{Punct}]+";
    protected final String PUNC_PREFIX = "^" + PUNC_MATCH;
    protected final String PUNC_SUFFIX = PUNC_MATCH + "$";
    public String fixPrefix (String w)
    return w.replaceFirst(PUNC_PREFIX, "");
    public String fixSuffix (String w)
    return w.replaceFirst(PUNC_SUFFIX, "");
    }This replaces all leading and trailing punctuation with "" and it works. However, changing the replaceFirst's to just replace doesn't work. I don't understand why that is. The first parameter of the replaceFirst(...) is a regex-String. And the ^, $ and [...] stuff all have a special meaning in the regex language. The replace(...) method takes plain-Strings as a parameter, so the String "^\[\\d\\p{Punct}\]+" is interpreted as just that, without any special meaning.
    JFactor2004 wrote:
    Doesn't the ^ mean in the front, and doesn't the $ mean in the back, Yes, ^ means the start of the String (sometimes the start of a line) and $ means the end of the String (sometimes the end of a line).
    JFactor2004 wrote:
    so shouldn't this work by just changing replaceFirst to replace?Nope, see the explanation above.

  • Converting String Characters into Regular Expression Automatically?

    Hi guys.... is there any program or sample coding which is available to convert string characters into regular expression automatically when the program is run?
    Example:
    String Character Input: fnffffffffffnnnnnnnnnffffffnnfnnnnnnnnnfnnfnfnfffnfnfnfnfnfnnnnd
    When the program runs, it automatically convert into this :
    Regular Expression Output: f*d

    hey guys.... i am sorry for not providing all the information that you guys need as i was rushing off to urgent meeting... for my string characters i only have a to n.. all these characters are collected from sensors and stored inside database... from many demos i have done... i found out that every demo has different strings of characters collected and these string of characters will not match with the regular expressions that i had created due to several unwanted inputs and stuff... i have a lot of different types of plan activities and therefore a lot of regular expressions.... if i put [a-z|0-9]*... it will capture all characters but in the same time it will be showing 1 plan only.... therefore, i am finding ways to get the strings i collected and let it form into regular expression by themselves in the program so that it will appear as different plans as output with comparing with the regular expression that i had created.... is there any way to do so?
    please post again if there is any questions u are still not familiar with... thank you...

  • How to create a list of string if a regular expression is given ?

    Hi folks,
    I have a regular expression say abcd[a-z]\\\.[0-9] . ( please ignore one '\')
    For this string i know that
    following string matches successfully
    1. abca.0
    2. abcb.1
    3. abcz.9 ......etc n number of combination are possible.
    is there any algorithm which will create some randomn strings from a regular expression.
    input to algorithm : some string pattern
    output to algorithm : some matching strings ( can be a single or an array of matching strings)
    Thanks in advance..
    Sethu
    Edited by: Sethumadhavan on Apr 16, 2008 6:32 AM

    Can u please give little more explanation...
    If i get some some values i can exit with the values ... and from the values i got i can ignore the duplicates ...
    But i am not getting the basic algorithm to get list of strings.....( DFA? or NFA?)
    thanks
    sethu

  • Problem with Regular Expression

    Hi There!!
    I have a problem with regular expression. I want to validate that one word and second word are same. For that I have written a regex
    Pattern p=Pattern.compile("([a-z][a-zA-Z]*)\\s\1");
    Matcher m=p.matcher("nikhil nikhil");
    boolean t=m.matches();
    if (t)
              System.out.println("There is a match");
         else
              System.out.println("There is no match");
    The result I am getting is always "There is no match
    Your timely help will be much appreciated.
    Regards

    Ram wrote:
    ErasP wrote:
    You are missing a backward slash in the regex
    Pattern p = Pattern.compile("([a-z][a-zA-Z]*)\\s\\1");
    But this will fail in this case.
    Matcher m = p.matcher("Nikhil Nikhil");It is the reason for that *[a-z]*.The OP had [a-z][a-zA-Z]* in his code, so presumably he know what that means and wants that String not to match.

  • Grouping & Back-references with regular expressions on Replace Text window

    I really appreciate the inclusion of the Regular Expressions in the search & replace feature. One thing I am missing is back-references in the replacement expression. For instance, in the unix tools vi or sed, I might do something like this:
    s/\(firstPart\) \(secondPart\) \(oldThirdPart\)/\2 \1 newThirdPart/g
    which would allow me to switch the places of firstPart and secondPart, and totally replace thirdPart. If grouping and back-references are already present in the Replace Text window, how does one correctly invoke them?

    duplicate of Grouping & Back-references with regular expressions on Replace Text window

  • Assistance with Regular Expression and Tcl

    Assistance with Regular Expression and Tcl
    Hello Everyone,
      I recently began learning Tcl to develop scripts for automating network switch deployments. 
    In my script, I want to name the device with a location and the last three octets of the base mac address.
    I can get the Base MAC address by : 
    show version | include Base
     Base ethernet MAC Address       : 00:00:00:DB:CE:00
    And I can get the last three octets of the MAC address using the following regular expression. 
    ([0-9a-f]{2}[:-]){2}([0-9a-f]{2}$)
    But I have not been able to figure out how to call the regular expression in the tcl script.
    I have checked several resources but have not been able to figure it out.  Suggestions?
    Ultimately, I want to set the last three octets to a variable (something like below) and then call the variable when I name the switch.
    set mac [exec "sh version | i Base"] (include the regular expression)
    ios_config "hostname location$mac"
    Thanks for any assistance in advance.
    Chris

    This worked for me.
    Switch_1(tcl)#set result [exec show ver | inc Base]   
    Base ethernet MAC Address       : 00:1B:D4:F8:B1:80
    Switch_1(tcl)#regexp {([0-9A-F:]{8}\r)} $result -> mac
    1
    Switch_1(tcl)#puts $mac                               
    F8:B1:80
    Switch_1(tcl)#ios_config "hostname location$mac"      
    %Warning! Hostname should contain at least one alphabet or '-' or '_' character
    locationF8:B1:80(tcl)#

  • How to search with regular expression

    I make pdx files so that I can search text quickly. But Acrobat doesn't provide a way to search with regular expression. I'm wondering if there is a way that I don't know to search for regular expression in Acrobat Pro 9?

    First, Acrobat must "mount" the PDX.
    As "Find" does not use the cataloged index, use Shift+Ctrl+F to open the advanced search dialog.
    It may be helpful to first enter Acrobat Preferences and for the Search category tick "Always use advanced search options".
    Back to the Search dialog - use the drop down menu for "Look In" to pick "Select Index" then, if no PDXs show, click the Add button.
    In the Open Index File dialog, browse to the location of the desired PDX and select it.
    OK out and use "Return results containing" to pick a "Match ..." requirement or Boolean.
    To become familiar with query syntax, for Acrobat, it is good to review Acrobat Help.
    http://help.adobe.com/en_US/Acrobat/9.0/Professional/WS58a04a822e3e50102bd615109794195ff-7 c4b.w.html
    Be well...

  • How to split a string with regular expression

    Hi.
    I need to split a string with a regular expression.
    Example
    String = "this is; a test";rune haavik;12345;
    And I want the output to be:
    "this is; a test"
    rune haavik
    12345
    If I use this code:
    private void test1()
    String str = "\"this is; a test\";rune haavik;12345;";
    int i=0;
    String[] tmp = str.split(";");
    while(i<tmp.length)
    System.out.println(tmp);
    i++;
    Then it splits also in the "" text.
    Regards
    Rune haavik

    Rune haavik:
    The most effective way to achieve the end result is, I believe, to read the characters one by one, using a flag that indicates if we are inside quotation or not.
    Well, if we are in a mind game, then the following should do.
           String[] tmp = str.split(";(?![^\"]*\";)");

  • Replace string with regular expression

    I am new to the regular expression. I am just trying to remove double quotes from a string.
    My sample string is sql statement(SELECT "SCHEMA"."TABLE1"."COLUMN1", "SCHEMA"."TABLE1"."COLUMN2" FROM "SCHEMA"."TABLE1", "SCHEMA"."TABLE2" WHERE "SCHEMA"."TABLE1"."COLUMN1" = "SCHEMA"."TABLE2"."COLUMN1" );
    This is what I came up with.
    String s = query.replaceAll("\\.\"", ".").replaceAll("\"\\.", ".").replaceAll("\\s\"", " ").replaceAll("\\s\"", " ").replaceAll("\",", ",").replaceAll("\"\\s", " ");
    But I want to optimize this line.
    Could anybody tell me how to call replaceAll once with using regular expression instead of calling replaceAll multiple times? Some table or column names can have double quotes, so I want to remove the double quotes without replacing the double quotes which is part of the table or column name.
    I'd appreciated it.

    caesarkim1 wrote:
    I am new to the regular expression. I am just trying to remove double quotes from a string.
    My sample string is sql statement(SELECT "SCHEMA"."TABLE1"."COLUMN1", "SCHEMA"."TABLE1"."COLUMN2" FROM "SCHEMA"."TABLE1", "SCHEMA"."TABLE2" WHERE "SCHEMA"."TABLE1"."COLUMN1" = "SCHEMA"."TABLE2"."COLUMN1" );
    This is what I came up with.
    String s = query.replaceAll("\\.\"", ".").replaceAll("\"\\.", ".").replaceAll("\\s\"", " ").replaceAll("\\s\"", " ").replaceAll("\",", ",").replaceAll("\"\\s", " ");
    ...Note that the following two lines are equivalent:
    s = s.replaceAll("a","").replaceAll("b","");
    s = s.replaceAll("a|b","");

  • Splitting html ul tags and their content into string arrays using regular expression

    <ul data-role="listview" data-filter="true" data-inset="true">
    <li data-role="list-divider"></li><li><a href="#"><h3>
    my title
    </h3><p><strong></strong></p></a></li>
    </ul>
    <ul data-role="listview" data-filter="true" data-inset="true">
    <li data-role="list-divider"></li><li>test.</li>
    </ul>
    I need to be able to slip this html into two arrays hold the entire <ul></ul> tag. Please help.
    Thanks.

    Hi friend.
    This forum is to discuss problems of C# development. Your question is not related to the topic of this forum.
    You'll need to post it in the dedicated Archived Forums N-R  > Regular Expressions
     for better support. Thanks for understanding.
    Best Regards,
    Kristin

  • Match beginning of line with Regular Expression

    I'm confused about dreamweaver's treatment of the characters
    ^ and $ (beginning of line, end of line) in regex searches. It
    seems that these characters match the beginning of the file, not
    the beginning of the various lines in the file. I would expect it
    to work the other way around. A search like:
    (^.)
    should match every line in the file, so that a find/replace
    could be performed at the beginning of each line, like this:
    HELLO$1
    which would add 'HELLO' at the start of each line in the
    file.
    Instead, this action only matches the first character of the
    file, sticks 'HELLO' in front of it, and then quits (or moves on to
    the next file). The endline character $ behaves in a similar
    fashion, matching only the end of the file, not the end of each
    line.
    I've searched, and all the literature about regular
    expressions in dreamweaver seems to indicate that I'm expecting the
    correct behavior:
    www.adobe.com/devnet/dreamweaver/articles/regular_expressions_03.html
    quote:
    ^ Beginning of input or line ^T matches "T" in "This good
    earth" but not in "Uncle Tom's Cabin"
    $ End of input or line h$ matches "h" in "teach" but not in
    "teacher"
    Thanks for any insight, folks.

    Hi Winston,
    I am still digesting the material from the regular expression book and will take sometime to become proficient with it.
    It seems that using groupCount() to eliminate the unwanted text does not work in this case, since all the lines returned the same value. Ie 3 posted earlier. This may be because the patterns are complex and only a few were grouped together. Otherwise, could you provide an example using the string posted as opposed to a hyperthetic one. In the meantime, at least one solution have been found by defining an additional special pattern “\\A[^%].*\\Z”, before combining / intersecting both existing and the new special pattern to get the best of both world. Another approach that should also work is to evaluate the size of String.split() and only accept those lines with a minimum number of tokens.
    Anyhow, I have come a crossed another minor stumbling block in the mean time with the following line, where some hidden characters is preventing the existing pattern from reading it:
    o;?Mervan Bay 40 Boyde St 7 br t $250,000 X West Park AE
    Below is the existing regular expression that works for other lines with the same pattern but not for special hidden characters such as “o;?”:
    \\A([A-Z][a-z]*){1,2} [0-9]{0,4}/?[0-9]{0,4}-?[0-9]{0,4} ([A-Z][a-z]*){1,2} St|Rd|Av|Sq|Cl|Pl|Cr|Gr|Dr|Hwy|Pde|Wy|La [0-9] br [h|u|t] \\$\\d+,\\d+|\\$\\d*\\,\\d+,\\d+ ([A-Z][a-z]*){1,}\\ZIs it possible to come up with a regular expression to ignore them so that this line could be picked up? Would also like to know whether I could combine both the special pattern “\\A[^%].*\\Z” with existing one as opposed to using 2 separate patterns altogether?
    Many thanks,
    Jack

  • Help with regular expression needed

    Hi,
    Perhaps someone here can help me with my regular expression I'm trying to build in my Java code.
    The regular expression that I'm looking to build consists of any non-whitespace character up until it finds one or two <>= symbols and then any character thereafter. So both these Strings would match the expression:
    City 1==London
    Age>=18
    The regular expression that I'm using is as follows:
    (\\S+)([><=]){1,2}(.+)However, group 1 always retrieves the first <>= symbol as in "City 1=". How can I make the <>= part greedy so that it retrieves both operator symbols?
    Thanks.

    Make the first group, the non-spaces, reluctant:
    "(\\S+?)([<>=]{1,2})(.+)"

Maybe you are looking for