Regular expression - splitting a string

I have a long string that I'm trying to split into a series of substrings. I would like each of the substrings to start with "TTL.. I'm fairly certain that I'm missing something very basic here. I've attached my code which yield NO GROUPS. I didn't see another method for returning the text that the regular expression matched.
String finalLongstring="TTL1,clip1+TTL2+clip3,TTL4,clip4,TTL5,clip5+TTL6+"+
   "clip6+TTL7+clip7,TTL8,clip8,TTL9,clip9,TTL10,clip10,TTL11,clip11,TTL12,clip12,"+
   "TTL13,clip13+TTL14+clip14,TTL15,clip15,TTL16,clip16,TTL17,clip17,"+
   "TTL18,clip18,TTL19,clip19,TTL20,clip20,TTL21,clip21,TTL22,clip22,"+
   "TTL23,clip23,TTL24,clip24,TTL25,clip25,TTL26,clip26,TTL27,clip27,"+
   "TTL28,clip28,TTL29,clip29"
List<String> chapters = new ArrayList<String>();
          chapters.clear();
          Pattern chapter=null;
          chapter=Pattern.compile("(TTL\\d+([+,]|clip\\d+)*)");
          //                      ||    |  |  | |  |    | |
          //                      ||    |  |  | |  |    | Repeat (commas pluses and clips group) 0 or more times
          //                      ||    |  |  | |  |    one or more digits following 'clip'
          //                      ||    |  |  | |  clip
          //                      ||    |  |  | or..
          //                      ||    |  |  plus or comma symbols
          //                      ||    |  group the +, and clip information together
          //                      ||    one or more digits
          //                      |Match clips starting with TTL
          //                      |
          Matcher cp = chapter.matcher(finalLongstring);  //NO MATCHES!!
          String [] temp = chapter.split(finalLongstring);  //temp =EMPTY STRING ARRAY
          do{
               String chapterPlus=cp.group(1);
               if(cp.hitEnd()){break;}
               chapters.add(chapterPlus);
          }while(true);Thanks in advance for the help.
Icesurfer

The main reason your matcher didn't work is because you never told it to do anything. You have to call one of the methods matches(), find() or lookingAt(), and make sure it returns true, before you can use the group() methods. When I did that, your regex worked, but then I modified it to demonstrate a better use of capturing groups, as shown here: import java.util.regex.*;
public class Test
  public static void main(String... args)
    String str="TTL1,clip1+TTL2+clip3,TTL4,clip4,TTL5,clip5+TTL6+clip6+"+
       "TTL7+clip7,TTL8,clip8,TTL9,clip9,TTL10,clip10,TTL11,clip11,TTL12,clip12,"+
       "TTL13,clip13+TTL14+clip14,TTL15,clip15,TTL16,clip16,TTL17,clip17,"+
       "TTL18,clip18,TTL19,clip19,TTL20,clip20,TTL21,clip21,TTL22,clip22,"+
       "TTL23,clip23,TTL24,clip24,TTL25,clip25,TTL26,clip26,TTL27,clip27,"+
       "TTL28,clip28,TTL29,clip29";
    Pattern p = Pattern.compile("(TTL\\d+)[+,](clip\\d+)[+,]");
    Matcher m = p.matcher(str);
    while (m.find())
      System.out.printf("%6s %s%n", m.group(1), m.group(2));
}The reason your split() attempt didn't work is because the regex matched all of the text; the split() regex is supposed to match the parts you don't want. In fact, it did split the text, creating a list of empty strings, but then it threw them all away, because split() discards trailing empty fields by default.
Finally, the hitEnd() method is not appropriate in this context. It and the requireEnd() method were added to support the Scanner class in JDK 1.5. If you want to see how they work, look at the source code for Scanner, but for now, just classify them as an advanced topic. When you're iterating through text with the find() method, you stop when find() returns false, plain and simple.

Similar Messages

A: remove regular expression from a string

Hi Krishna,
DATA : str TYPE STRING VALUE '@1test;"{}]input+',
            char,
            length TYPE i,
            index TYPE i.
length = STRLEN( str ).
WHILE length > index.
  char = str+index(1).
  WRITE char.
  if char CA '+-*/!`@#$%^&()_=[]{};'.               " Add/Remove here to include numbers
    REPLACE ALL OCCURRENCES OF char in str WITH ''.
    REPLACE ALL OCCURRENCES OF '"' in str WITH ''.  " characters "{}[] are not comparable
    REPLACE ALL OCCURRENCES OF '{' in str WITH ''.
    REPLACE ALL OCCURRENCES OF '}' in str WITH ''.
    REPLACE ALL OCCURRENCES OF '[' in str WITH ''.
    REPLACE ALL OCCURRENCES OF ']' in str WITH ''.
    length = STRLEN( str ).
    ENDIF.
  add 1 to index.
ENDWHILE.
WRITE str.
Add or remove special char from '+-*/!`@#$%^&()_=[]{};' in if part as per your requirement.
Hope it meets your requirement.
Do not forget to mark helpful/correct if ma answer is useful .
Thanks,
Karthik

Hi Krishna,
DATA : str TYPE STRING VALUE '@1test;"{}]input+',
            char,
            length TYPE i,
            index TYPE i.
length = STRLEN( str ).
WHILE length > index.
  char = str+index(1).
  WRITE char.
  if char CA '+-*/!`@#$%^&()_=[]{};'.               " Add/Remove here to include numbers
    REPLACE ALL OCCURRENCES OF char in str WITH ''.
    REPLACE ALL OCCURRENCES OF '"' in str WITH ''.  " characters "{}[] are not comparable
    REPLACE ALL OCCURRENCES OF '{' in str WITH ''.
    REPLACE ALL OCCURRENCES OF '}' in str WITH ''.
    REPLACE ALL OCCURRENCES OF '[' in str WITH ''.
    REPLACE ALL OCCURRENCES OF ']' in str WITH ''.
    length = STRLEN( str ).
    ENDIF.
  add 1 to index.
ENDWHILE.
WRITE str.
Add or remove special char from '+-*/!`@#$%^&()_=[]{};' in if part as per your requirement.
Hope it meets your requirement.
Do not forget to mark helpful/correct if ma answer is useful .
Thanks,
Karthik

  • Regular expression: Split String

    Hi everybody,
    I got to split a string when <b>'</b> occurs. But the string should <u>not</u> be splitted, when a <b>?</b> is leading the <b>'</b>. Means: Not split when <b>?'</b>
    How has the regular expression to look like?
    This does NOT work:
    String constHOCHKOMMA = "[^?']'";
    <u>Sample-String:</u>
    String edi = "UNA'UNB'UNH?'xyz";
    Result should be
    UNA
    UNB
    UNH?'xyz
    Thanks regards
    Mario

    Hi
    I think u can meke it in two ways
    1. using split function u r giving single quote as your delimiter, after each split funcction just add one more split function with delimiter as ?, if it returns true add the previous splited string and next one
    2.  you are gooing through each and every char  of your string  and split when next single quote occur, for this u are comparing each of your char with ['] i believe,
    just compare the char with '?' if it match ignore next single quote.
    Regards
    Abhijith YS

  • How to use regular expression to find string

    hi,
    who know how to get all digits from the string "Alerts 4520 ( 227550 )  (  98 Available  )" by regular expression, thanks
    br, Andrew

    Liu,
    You can use RegEx as   
    d+
    Whether you are using CL_ABAP_REGEX class then
    report  zars.
    data: regex   type ref to cl_abap_regex,
          matcher type ref to cl_abap_matcher,
          match   type c length 1.
    create object regex exporting pattern = 'd+'
                                  ignore_case = ''.
    matcher = regex->create_matcher( text = 'Test123tes456' ).
    match = matcher->match( ).
    write match
    You can find more details regarding REGEX and POSIX examples here
    http://www.regular-expressions.info/tutorial.html

  • Regular expression - find if string does NOT contain text....

    I have a string that I want to tokenize. The string can contain basically anything. I want to produce tokens for each "word" found, and for each "<=" or "," found. There does not need to be whitespace around a "<=" or a "," to consider it a token. So for example:
    joe schmoe<=jack, jane
    should become
    joe
    schmoe
    <=
    jack
    jane
    As a constraint, I do not want to use StringTokenizer at all, as "its use is discouraged in new code". http://java.sun.com/j2se/1.4.2/docs/api/java/util/StringTokenizer.html
    Here's the code I plan on using for this:
        public String[] getWords(String input) {
            Matcher matcher = WORD_PATTERN.matcher(input);
            ArrayList<String> words = new ArrayList<String>();
            while (matcher.find()) {
                words.add(matcher.group());
            return (String[]) words.toArray(new String[0]);
        }The trick, though, is coming up with a working regular expression. The closest I've found yet is:
    ([^\s]|^(,)|^(<=))+|,|<=
    but that produces the following:
    joe
    schmoe<=jack,
    jane
    I think what I need is to be able to find if a string does not contain the substring "<=" or "," using a regular expression. Anyone know how to do this, or another way to do this using regular expressions?

    Try:
    * Tokenizer.java
    * version 1.0
    * 01/06/2005
    package samples;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    * @author notivago
    public class StrangeTokenizer {
        public static void main(String[] args) {
            String text = "joe schmoe<=jack, jane";
            Pattern pattern = Pattern.compile( "((?:<=)|(?:,)|(?:\\w+))");
            Matcher matcher = pattern.matcher(text);
            while( matcher.find() ) {
                System.out.println( "Item: " + matcher.group(1) );
    }May the code be with you.

  • Email Regular Expression with a String.Match()

    I'm currently using a RichTextEditor for a user to build HTML
    for a site. However, I want the application to scan for emails and
    encode them so they are protected from spam bots when they go to
    the live site. I've written a regular expression to find an email
    and it seems to work, but it only returns one email at a time from
    the string. I have had to revert to a while loop to traverse the
    string until I'm satisfied. I don't particularly like that method
    and would like to just do one String.match() query to retrieve all
    of the emails. Can anyone see something here that I'm missing?

    Try adding the global flag (g):
    var emailPattern:RegExp =
    /[a-z][\w.-]+@\w[\w.-]+\.[\w.-]*[a-z][a-z]+/g;
    TS

  • Regular expression with delimited string

    Hi,
    I'm trying extract all characters in a string (as word or words) which is delimited by ' -- '
    Been playing around with regular expression and got as far as this;
    with t_vw
    as (select 'hello -- world' txt from dual
    union all
    select 'hello-world' from dual
    union all
    select 'hello' from dual
    union all
    select 'hello -- world -- bye' from dual
    union all
    select 'hello--worldbye' from dual
    select txt, regexp_substr(txt,'[^ -- ]+',1,1) word1,
    regexp_substr(txt,'[^ -- ]+',1,2) word2,
    regexp_substr(txt,'[^ -- ]+',1,3) word3
    from t_vw;
    It's returning;
    "TXT","WORD1","WORD2","WORD3"
    "hello -- world" "hello","world",""
    "hello-world"          "hello","world",""
    "hello"               "hello","",""
    "hello -- world -- bye"     "hello","world","bye"
    "hello--worldbye"      "hello","worldbye",""
    So it seems to work in all cases apart from when there are no spaces before/after "--".
    Any ideas?

    Please enclose your code in *{noformat}{noformat}* tags to preserve your formatting and to prevent the forum software from mangling your regular expressions.
    Also, you've given your input and show the output that you are getting, but I don't know what your issue is.  If you could include the desired output and explain how it differs from what you are getting so far that would help.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   

  • Match Regular Expression Function input string format

    Hi,
    I am new to labview and was having some difficulties using the Match Regular Experssion Function.  
    I am using labview to communicate with a sensor.  I have installed the NI device driver to do so.  The output of my sensor is in the format, 
    X20
    R40 P20 A123.  The numbers in this case are arbitrary.  I am trying to use Match Regular Expression Function to display and perform mathematical operations on the numbers.  I am having difficulties formatting the input string on the Match Regular Expression Function.  Could you please give me some tips on how to format the example I provided.  
    Thank

    MoAgha wrote:
    Hi,
    I am new to labview and was having some difficulties using the Match Regular Experssion Function.  
    I am using labview to communicate with a sensor.  I have installed the NI device driver to do so.  The output of my sensor is in the format, 
    X20
    R40 P20 A123.  The numbers in this case are arbitrary.  I am trying to use Match Regular Expression Function to display and perform mathematical operations on the numbers.  I am having difficulties formatting the input string on the Match Regular Expression Function.  Could you please give me some tips on how to format the example I provided.  
    Thank
    Here is a way to do it if the format is constant (X R P A followed by a positive integer number).
    Ben64

  • Regular Expressions with Unicode Strings - length restriction?

    Hi,
    I can't quite figure this one out. I am checking a String for the presence of a URL.. more specifically, a jpg or gif URL.
    Anyway, the following reg exp will work fine for me. However, when testing with unicode data (chinese text) the expression will only work up to a certain string length. Here's an example:
    boolean isURL = text.matches(".*http\\S*(jpg|gif).*");
    My thought is that since Unicode data takes up more space, there a limitation to dealing with Strings. Does anyone know what that number is? Or, is there another reason the reg exp fails??
    thanks,
    joe
    Example::
    This works for any length String I throw at it using standard ASCII text.. But a unicode string of a certain length won't recognize the URL (I doubt I can simply paste my example here and have it turn out correctly..)
    DOESN'T WORK: (length is reported via text.length() as 344
    "FWD: test_tancy: FWD: tancy: FWD: supporter:
    &#27973;&#28129;&#33394;&#24425;&#36896;&#28165;&#20937;
    &#35201;&#35753;&#23621;&#25152;&#30475;&#36215;&#26469;&#28165;&#29245;&#20937;&#24555;&#65292;&#21487;&#37319;&#29992;&#20197;&#30333;&#33394;&#20026;&#20027;&#35843;&#30340;&#24067;&#32622;&#12290;&#30333;&#33394;&#19981;&#20294;&#33021;&#22686;&#21152;&#31354;&#38388;&#24863;&#65292;&#36824;&#33021;&#33829;&#36896;&#26126;&#24555;&#23425;&#38745;&#30340;&#27668;&#27675;&#65292;&#35753;&#20154;&#24773;&#32490;&#31283;&#23450;&#12290;&#21478;&#22806;&#65292;&#26377;&#24847;&#35782;&#22320;&#22686;&#28155;&#19968;&#28857;&#20919;&#33394;&#65292;&#20063;&#33021;&#20196;&#20154;&#22312;&#35270;&#35273;&#19978;&#35273;&#24471;&#30021;&#24555;&#12290;&#19981;&#36807;&#65292;&#19968;&#38388;&#25151;&#20869;&#33509;&#20840;&#37096;&#20351;&#29992;&#20919;&#33394;&#65292;&#25110;&#20840;&#37096;&#37319;&#29992;&#26262;&#33394;&#65292;&#20250;&#20351;&#20154;&#24863;&#21040;&#19981;&#23433;&#12290;&#26368;&#22909;&#26159;&#30830;&#23450;&#20027;&#33394;&#21518;&#65292;&#23567;&#38754;&#31215;&#20351;&#29992;&#20123;&#21576;&#40092;&#26126;&#23545;&#27604;&#30340;&#33394;&#24425;&#12290;&#20837;&#22799;&#36141;&#32622;&#19968;&#20123;&#33394;&#35843;&#28165;&#20937;&#30340;&#39280;&#29289;&#25670;&#35774;&#65292;&#26159;&#26368;&#30465;&#38065;&#26377;&#25928;&#30340;&#19968;&#25307;&#65292;&#22914;&#20026;&#21488;&#28783;&#25442;&#20010;&#30333;&#33394;&#28783;&#32617;&#12289;&#22312;&#27927;&#25163;&#38388;&#25918;&#19968;&#22871;&#20912;&#34013;&#33394;&#30340;&#27792;&#28020;&#29992;&#20855;&#31561;&#12290;(UU&#20026;&#24744;&#25552;&#20379;&#29983;&#27963;&#21672;&#35759;&#24182;&#31069;&#24744;&#29983;&#27963;&#24841;&#24555;&#65281;&#22914;&#19981;&#24076;&#26395;&#25171;&#25200;&#35831;&#22238;&#22797;?NO?)http://www.blah.com/servlet/mailbox?item=fc-10Tq9aljw0w9.jpg"
    WORKS: (length is reported via text.length() as 296
    "FWD: Joe: &#35201;&#35753;&#23621;&#25152;&#30475;&#36215;&#26469;&#28165;&#29245;&#20937;&#24555;&#65292;&#21487;&#37319;&#29992;&#20197;&#30333;&#33394;&#20026;&#20027;&#35843;&#30340;&#24067;&#32622;&#12290;&#30333;&#33394;&#19981;&#20294;&#33021;&#22686;&#21152;&#31354;&#38388;&#24863;&#65292;&#36824;&#33021;&#33829;&#36896;&#26126;&#24555;&#23425;&#38745;&#30340;&#27668;&#27675;&#65292;&#35753;&#20154;&#24773;&#32490;&#31283;&#23450;&#12290;&#21478;&#22806;&#65292;&#26377;&#24847;&#35782;&#22320;&#22686;&#28155;&#19968;&#28857;&#20919;&#33394;&#65292;&#20063;&#33021;&#20196;&#20154;&#22312;&#35270;&#35273;&#19978;&#35273;&#24471;&#30021;&#24555;&#12290;&#19981;&#36807;&#65292;&#19968;&#38388;&#25151;&#20869;&#33509;&#20840;&#37096;&#20351;&#29992;&#20919;&#33394;&#65292;&#25110;&#20840;&#37096;&#37319;&#29992;&#26262;&#33394;&#65292;&#20250;&#20351;&#20154;&#24863;&#21040;&#19981;&#23433;&#12290;&#26368;&#22909;&#26159;&#30830;&#23450;&#20027;&#33394;&#21518;&#65292;&#23567;&#38754;&#31215;&#20351;&#29992;&#20123;&#21576;&#40092;&#26126;&#23545;&#27604;&#30340;&#33394;&#24425;&#12290;&#20837;&#22799;&#36141;&#32622;&#19968;&#20123;&#33394;&#35843;&#28165;&#20937;&#30340;&#39280;&#29289;&#25670;&#35774;&#65292;&#26159;&#26368;&#30465;&#38065;&#26377;&#25928;&#30340;&#19968;&#25307;&#65292;&#22914;&#20026;&#21488;&#28783;&#25442;&#20010;&#30333;&#33394;&#28783;&#32617;&#12289;&#22312;&#27927;&#25163;&#38388;&#25918;&#19968;&#22871;&#20912;&#34013;&#33394;&#30340;&#27792;&#28020;&#29992;&#20855;&#31561;&#12290;(UU&#20026;&#24744;&#25552;&#20379;&#29983;&#27963;&#21672;&#35759;&#24182;&#31069;&#24744;&#29983;&#27963;&#24841;&#24555;&#65281;&#22914;&#19981;&#24076;&#26395;&#25171;&#25200;&#35831;&#22238;&#22797;?NO?)http://www.blah.com/servlet/mailbox?item=fc-10Tq9aljw0w9.jpg"

    Perhaps you should check the version of Java you are using. I am using 1.4.2_04
    public class A {
        public static void main(String[] args) throws UnsupportedEncodingException {
            String text = "FWD: test_tancy: FWD: tancy: FWD: supporter:                   " +
                    new String(new char[]{(char) 35201, (char) 35753, (char) 23621, (char) 25152, (char) 30475, (char) 36215,
                                          (char) 26469, (char) 28165, (char) 29245, (char) 20937, (char) 24555, (char) 65292,
                                          (char) 21487, (char) 37319, (char) 29992, (char) 20197, (char) 30333, (char) 33394,
                                          (char) 20026, (char) 20027, (char) 35843, (char) 30340, (char) 24067, (char) 32622,
                                          (char) 12290, (char) 30333, (char) 33394, (char) 19981, (char) 20294, (char) 33021,
                                          (char) 22686, (char) 21152, (char) 31354, (char) 38388, (char) 24863, (char) 65292,
                                          (char) 36824, (char) 33021, (char) 33829, (char) 36896, (char) 26126, (char) 24555,
                                          (char) 23425, (char) 38745, (char) 30340, (char) 27668, (char) 27675, (char) 65292,
                                          (char) 35753, (char) 20154, (char) 24773, (char) 32490, (char) 31283, (char) 23450,
                                          (char) 12290, (char) 21478, (char) 22806, (char) 65292, (char) 26377, (char) 24847,
                                          (char) 35782, (char) 22320, (char) 22686, (char) 28155, (char) 19968, (char) 28857,
                                          (char) 20919, (char) 33394, (char) 65292, (char) 20063, (char) 33021, (char) 20196,
                                          (char) 20154, (char) 22312, (char) 35270, (char) 35273, (char) 19978, (char) 35273,
                                          (char) 24471, (char) 30021, (char) 24555, (char) 12290, (char) 19981, (char) 36807,
                                          (char) 65292, (char) 19968, (char) 38388, (char) 25151, (char) 20869, (char) 33509,
                                          (char) 20840, (char) 37096, (char) 20351, (char) 29992, (char) 20919, (char) 33394,
                                          (char) 65292, (char) 25110, (char) 20840, (char) 37096, (char) 37319, (char) 29992,
                                          (char) 26262, (char) 33394, (char) 65292, (char) 20250, (char) 20351, (char) 20154,
                                          (char) 24863, (char) 21040, (char) 19981, (char) 23433, (char) 12290, (char) 26368,
                                          (char) 22909, (char) 26159, (char) 30830, (char) 23450, (char) 20027, (char) 33394,
                                          (char) 21518, (char) 65292, (char) 23567, (char) 38754, (char) 31215, (char) 20351,
                                          (char) 29992, (char) 20123, (char) 21576, (char) 40092, (char) 26126, (char) 23545,
                                          (char) 27604, (char) 30340, (char) 33394, (char) 24425, (char) 12290, (char) 20837,
                                          (char) 22799, (char) 36141, (char) 32622, (char) 19968, (char) 20123, (char) 33394,
                                          (char) 35843, (char) 28165, (char) 20937, (char) 30340, (char) 39280, (char) 29289,
                                          (char) 25670, (char) 35774, (char) 65292, (char) 26159, (char) 26368, (char) 30465,
                                          (char) 38065, (char) 26377, (char) 25928, (char) 30340, (char) 19968, (char) 25307,
                                          (char) 65292, (char) 22914, (char) 20026, (char) 21488, (char) 28783, (char) 25442,
                                          (char) 20010, (char) 30333, (char) 33394, (char) 28783, (char) 32617, (char) 12289,
                                          (char) 22312, (char) 27927, (char) 25163, (char) 38388, (char) 25918, (char) 19968,
                                          (char) 22871, (char) 20912, (char) 34013, (char) 33394, (char) 30340, (char) 27792,
                                          (char) 28020, (char) 29992, (char) 20855, (char) 31561, (char) 12290, (char) 20026,
                                          (char) 24744, (char) 25552, (char) 20379, (char) 29983, (char) 27963, (char) 21672,
                                          (char) 35759, (char) 24182, (char) 31069, (char) 24744, (char) 29983, (char) 27963,
                                          (char) 24841, (char) 24555, (char) 65281, (char) 22914, (char) 19981, (char) 24076,
                                          (char) 26395, (char) 25171, (char) 25200, (char) 35831, (char) 22238, (char) 22797}) +
                    "?NO?)http://www.blah.com/servlet/mailbox?item=fc-10Tq9aljw0w9.jpg";
            boolean isURL = text.matches(".*http\\S*(jpg|gif).*");
            System.out.println("isURL="+isURL+", length="+text.length());
    }Prints
    isURL=true, length=344

  • Regular Expressions, split(), and the caret

    I have a string delimited by the ^ character:
    ITEM1^ITEM2^ITEM3
    I've tried using:
    split("^")
    split("\^")
    split("\x5E")
    All to no avail. I either get "Invalid escape sequence" or I get the whole string. I'm looking to avoid the StringTokenizer way, just because this is neater IMHO.
    Is this possible?

    \ is the escape character for both Java and regex. The first escape character is for Java, so that the second \ will be treated as a literal in Java and passed as-is to the pattern matcher. The second \ is for the regex, so the pattern matcher will treat the ^ as a literal.

  • String splitting with regular expressions

    Hello everyone
    I need some help in splitting the string using regular expressions
    Suppose my String is : abc def "ghi jkl mno" pqr stu
    after splitting the reulsting string array should contain the elements
    abc
    def
    ghi jkl mno
    pqr
    stu
    what my regular expression should be

    Since this is essentially the same as parsing CSV data, you might want to download a CSV parser and adapt it to your need. But if you want to use regexes, split() is not the way to go. This approach should work for your sample data:
    Pattern p = Pattern.compile("\"[^\"]*+\"|\\S+");
    Matcher m = p.matcher(input);
    while (m.find())
      System.out.println(m.group());
    }

  • How to split a string with regular expression

    Hi.
    I need to split a string with a regular expression.
    Example
    String = "this is; a test";rune haavik;12345;
    And I want the output to be:
    "this is; a test"
    rune haavik
    12345
    If I use this code:
    private void test1()
    String str = "\"this is; a test\";rune haavik;12345;";
    int i=0;
    String[] tmp = str.split(";");
    while(i<tmp.length)
    System.out.println(tmp);
    i++;
    Then it splits also in the "" text.
    Regards
    Rune haavik

    Rune haavik:
    The most effective way to achieve the end result is, I believe, to read the characters one by one, using a flag that indicates if we are inside quotation or not.
    Well, if we are in a mind game, then the following should do.
           String[] tmp = str.split(";(?![^\"]*\";)");

  • Command line style regular expression string parser

    Hi people,
    I am currently working on a program where I need to parse a file (or any input stream) line by line. I then need to parse every line for arguments. Each line is formatted similar to how arguments are passed to the command line. The regular expression needs to split every line by any encountered whitespace, but needs to be able to retain any whitespace within double quotes (i.e. "some spaced text here"). Arguments can be numbers, booleans and (quoted) strings. Quoted strings must also be able to have escaped quotes in it (as below). The quotes for the quoted string (the outer ones, obviously not the escaped ones) do not necessarily have to be retained.
    An example input line:
    arg1 arg2 "arg 3" "arg 4" 987 arg6 "arg \"arg \"arg 7"
    Desired example output:
    arg1
    arg2
    arg 3
    arg 4
    987
    arg6
    arg "arg "arg 7
    After the input line has been split up the program will handle any parsing (i.e. numbers, booleans, etc.). The program currently uses a simple for loop to iterate over all characters in the line and splits it up appropriately by checking every character. However, if this can be done automatically by using a regular expression passed to String.split() (or with some use of the regex package), it would remove quite a bit of redunant code and make the program that much more maintainable.
    I do not have much experience with regular expressions since I have never really had the need to use them, but if they can work in this case it would be great.
    Thanks in advance for any help.

    Almost any parsing problem can be solved if you throw a big enough and ugly enough regex at it, or so I'm told.
    I think what you are doing is also amenable to java.io.StreamTokenizer:
    import java.io.*;
    import static java.io.StreamTokenizer.*;
    public class StreamTokenizerExample {
        public static void main(String[] args) throws IOException {
            StringReader input = new StringReader("arg1 arg2 \"arg 3\" \"arg 4\" 987 arg6 \"arg \\\"arg\\\" arg 7\"\nnextline");
            StreamTokenizer in = new StreamTokenizer(input);
            in.eolIsSignificant(true);
            for(int ttype; (ttype = in.nextToken()) != TT_EOF; ) {
                switch (ttype) {
                    case TT_WORD:
                        System.out.println("String[" + in.sval + "]");
                        break;
                    case TT_NUMBER:
                        System.out.println("number[" + in.nval + "]");
                        break;
                    case TT_EOL:
                        System.out.println("[EOL]");
                        break;
                    case '"':
                        System.out.println("quoted[" + in.sval + "]");
                        break;
                    default:
                        System.out.println("unexpected " + ttype);
    {code}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   

  • Regular expressions its URGENT !!!

    i have a long string of regular expressions seperated by "|" and i need to know which regular expression the particular string matched how can i find that and can i do it using java .util.regex
    thanks in advance

    Consider to use "capturing groups" or a better solution should be to split this long regular expression with alternations in small ones that will cause considerable reduction in backtracking. Also in this way will be easier to find what regular expression matches the target string.
    Regards.

  • Regular expressions and limiting matched input

    Hi everyone :) I am trying to put together a regular expression that matches strings that contain elements of the form;
    {<some text>}
    However, each piece of text may contain multiple embedded instances of this pattern. I want to ensure that I am always getting the first (or outermost) instance.
    So, if I had;
    {OneStart}{TwoStart}{TwoEnd}{OneEnd}
    I want to make sure that I get 'One' first and 'Two' second. So I have to place a stipulation in my regular expression to match this pattern only if it has not located the patten previously.
    At the moment, I have this -
    ([^\\{]*?)(\\{TagStart\\})(.*?)(\\{TagEnd\\})(.*)
    What I think that I have to do is modify the first capture group '([^\\{]*?)' which at the moment only does not match if a preceding '{' is found to match only if a preceding '{<text>}' sequence is not found.
    Anyone got any idea how to do this?
    Thanks in advance.
    Ben

    Doesn't it work anyway, since you're using greedy operators? If not, won't it work if you remove the .* at the end and use find() rather than matches()? And finally, what's the (.*?) supposed to match? Looks to me like that should be .*

  • Maybe you are looking for