Regular expression - splitting a string

I have a long string that I'm trying to split into a series of substrings. I would like each of the substrings to start with "TTL". I'm fairly certain that I'm missing something very basic here. I've attached my code, which yields NO GROUPS. I didn't see another method for returning the text that the regular expression matched.
String finalLongstring="TTL1,clip1+TTL2+clip3,TTL4,clip4,TTL5,clip5+TTL6+"+
 "clip6+TTL7+clip7,TTL8,clip8,TTL9,clip9,TTL10,clip10,TTL11,clip11,TTL12,clip12,"+
 "TTL13,clip13+TTL14+clip14,TTL15,clip15,TTL16,clip16,TTL17,clip17,"+
 "TTL18,clip18,TTL19,clip19,TTL20,clip20,TTL21,clip21,TTL22,clip22,"+
 "TTL23,clip23,TTL24,clip24,TTL25,clip25,TTL26,clip26,TTL27,clip27,"+
 "TTL28,clip28,TTL29,clip29"
List<String> chapters = new ArrayList<String>();
 chapters.clear();
 Pattern chapter=null;
 chapter=Pattern.compile("(TTL\\d+([+,]|clip\\d+)*)");
 // || | | | | | | |
 // || | | | | | | Repeat (commas pluses and clips group) 0 or more times
 // || | | | | | one or more digits following 'clip'
 // || | | | | clip
 // || | | | or..
 // || | | plus or comma symbols
 // || | group the +, and clip information together
 // || one or more digits
 // |Match clips starting with TTL
 // |
 Matcher cp = chapter.matcher(finalLongstring); //NO MATCHES!!
 String [] temp = chapter.split(finalLongstring); //temp =EMPTY STRING ARRAY
 do{
 String chapterPlus=cp.group(1);
 if(cp.hitEnd()){break;}
 chapters.add(chapterPlus);
 }while(true);
Thanks in advance for the help.
Icesurfer

The main reason your matcher didn't work is that you never told it to do anything. You have to call one of the methods matches(), find() or lookingAt(), and make sure it returns true, before you can use the group() methods. When I did that, your regex worked, but then I modified it to demonstrate a better use of capturing groups, as shown here:
import java.util.regex.*;
public class Test
{
  public static void main(String... args)
  {
    String str="TTL1,clip1+TTL2+clip3,TTL4,clip4,TTL5,clip5+TTL6+clip6+"+
       "TTL7+clip7,TTL8,clip8,TTL9,clip9,TTL10,clip10,TTL11,clip11,TTL12,clip12,"+
       "TTL13,clip13+TTL14+clip14,TTL15,clip15,TTL16,clip16,TTL17,clip17,"+
       "TTL18,clip18,TTL19,clip19,TTL20,clip20,TTL21,clip21,TTL22,clip22,"+
       "TTL23,clip23,TTL24,clip24,TTL25,clip25,TTL26,clip26,TTL27,clip27,"+
       "TTL28,clip28,TTL29,clip29";
    // Group 1 is the TTL part, group 2 the clip part. The trailing separator
    // is optional so that the last pair, which has none, is matched too.
    Pattern p = Pattern.compile("(TTL\\d+)[+,](clip\\d+)[+,]?");
    Matcher m = p.matcher(str);
    while (m.find())
    {
      System.out.printf("%6s %s%n", m.group(1), m.group(2));
    }
  }
}
The reason your split() attempt didn't work is that the regex matched all of the text; the split() regex is supposed to match the parts you don't want. In fact, it did split the text, creating a list of empty strings, but then it threw them all away, because split() discards trailing empty fields by default.
Finally, the hitEnd() method is not appropriate in this context. It and the requireEnd() method were added to support the Scanner class in JDK 1.5. If you want to see how they work, look at the source code for Scanner, but for now, just classify them as an advanced topic. When you're iterating through text with the find() method, you stop when find() returns false, plain and simple.
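As a minimal sketch of the find() loop described above, the following collects every substring that starts with "TTL" into a list. The class and method names (ChapterFinder, chapters) are made up for this illustration, and the pattern is the original poster's regex with the groups made non-capturing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChapterFinder {
    // "TTL" plus digits, followed by any run of separators and clip entries.
    private static final Pattern CHAPTER =
            Pattern.compile("TTL\\d+(?:[+,]|clip\\d+)*");

    // Hypothetical helper: returns each chapter substring in order of appearance.
    static List<String> chapters(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = CHAPTER.matcher(text);
        // find() advances to the next match; the loop ends when it returns false.
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String chapter : chapters("TTL1,clip1+TTL2+clip3,TTL4,clip4")) {
            System.out.println(chapter);
        }
    }
}
```

Each match keeps its trailing separator (for example "TTL1,clip1+"), because the separators are part of the repeated group; strip them afterwards if they are not wanted.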

Similar Messages

Remove regular expression from a string

Hello,
I have a string like this
@1test;'"{input+
Please help me to remove special characters from the string.

A: remove regular expression from a string

Hi Krishna,
DATA : str TYPE STRING VALUE '@1test;"{}]input+',
            char,
            length TYPE i,
            index TYPE i.
length = STRLEN( str ).
WHILE length > index.
char = str+index(1).
WRITE char.
if char CA '+-*/!`@#$%^&()_=[]{};'.               " Add/Remove here to include numbers
    REPLACE ALL OCCURRENCES OF char in str WITH ''.
    REPLACE ALL OCCURRENCES OF '"' in str WITH ''. " characters "{}[] are not comparable
    REPLACE ALL OCCURRENCES OF '{' in str WITH ''.
    REPLACE ALL OCCURRENCES OF '}' in str WITH ''.
    REPLACE ALL OCCURRENCES OF '[' in str WITH ''.
    REPLACE ALL OCCURRENCES OF ']' in str WITH ''.
    length = STRLEN( str ).
    ENDIF.
add 1 to index.
ENDWHILE.
WRITE str.
Add or remove special char from '+-*/!`@#$%^&()_=[]{};' in if part as per your requirement.
Hope it meets your requirement.
Do not forget to mark helpful/correct if my answer is useful.
Thanks,
Karthik
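For comparison, the same clean-up can be expressed as a single regex replacement. This is a sketch in Java rather than ABAP, and it assumes that "special characters" means everything other than ASCII letters and digits; the name stripSpecials is hypothetical.

```java
public class StripSpecials {
    // Assumption: only ASCII letters and digits are kept; widen the
    // character class if other characters should survive.
    static String stripSpecials(String s) {
        return s.replaceAll("[^A-Za-z0-9]", "");
    }

    public static void main(String[] args) {
        // The sample string from the question.
        System.out.println(stripSpecials("@1test;'\"{input+")); // prints 1testinput
    }
}
```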

Regular expression: Split String

Hi everybody,
I have to split a string wherever a ' occurs. But the string should not be split when a ? precedes the '. In other words: no split at ?'
What should the regular expression look like?
This does NOT work:
String constHOCHKOMMA = "[^?']'";
Sample-String:
String edi = "UNA'UNB'UNH?'xyz";
Result should be
UNA
UNB
UNH?'xyz
Thanks regards
Mario

Hi
I think you can do it in two ways:
1. Use the split function with the single quote as your delimiter. After the split, check whether each piece ends with ?; if it does, join it back together with the next piece.
2. Go through the string character by character and split whenever a single quote occurs. For this you compare each character with ['], I believe;
just compare the character with '?' as well, and if it matches, ignore the next single quote.
Regards
Abhijith YS
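A third option, not mentioned in the reply, is a negative lookbehind, which java.util.regex supports. The sketch below splits on a ' only when the character before it is not a ?; the class and method names are invented for the example.

```java
import java.util.Arrays;

public class EdiSplit {
    // (?<!\?) is a negative lookbehind: the quote only counts as a
    // delimiter when it is not immediately preceded by '?'.
    static String[] splitSegments(String edi) {
        return edi.split("(?<!\\?)'");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitSegments("UNA'UNB'UNH?'xyz")));
        // prints [UNA, UNB, UNH?'xyz]
    }
}
```

The original attempt, [^?']', fails because the character class consumes the character before the quote, so that character becomes part of the delimiter and is lost.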

How to use regular expression to find string

hi,
Does anyone know how to get all the digits from the string "Alerts 4520 ( 227550 ) ( 98 Available )" using a regular expression? Thanks
br, Andrew

Liu,
You can use the regex
\d+
If you are using the CL_ABAP_REGEX class, then:
report zars.
data: regex   type ref to cl_abap_regex,
      matcher type ref to cl_abap_matcher,
      match   type c length 1.
create object regex exporting pattern = '\d+'
                              ignore_case = ''.
matcher = regex->create_matcher( text = 'Test123tes456' ).
match = matcher->match( ).
write match.
You can find more details regarding REGEX and POSIX examples here
http://www.regular-expressions.info/tutorial.html
a®
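To illustrate what \d+ picks out of the sample string, here is a small sketch in Java (not ABAP) that collects every run of digits. The names DigitRuns and digitRuns are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DigitRuns {
    // Returns each maximal run of digits found in the text, in order.
    static List<String> digitRuns(String text) {
        List<String> runs = new ArrayList<>();
        Matcher m = Pattern.compile("\\d+").matcher(text);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(digitRuns("Alerts 4520 ( 227550 ) ( 98 Available )"));
        // prints [4520, 227550, 98]
    }
}
```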

Regular expression - find if string does NOT contain text....

I have a string that I want to tokenize. The string can contain basically anything. I want to produce tokens for each "word" found, and for each "<=" or "," found. There does not need to be whitespace around a "<=" or a "," to consider it a token. So for example:
joe schmoe<=jack, jane
should become
joe
schmoe
<=
jack
jane
As a constraint, I do not want to use StringTokenizer at all, as "its use is discouraged in new code". http://java.sun.com/j2se/1.4.2/docs/api/java/util/StringTokenizer.html
Here's the code I plan on using for this:
 public String[] getWords(String input) {
 Matcher matcher = WORD_PATTERN.matcher(input);
 ArrayList<String> words = new ArrayList<String>();
 while (matcher.find()) {
 words.add(matcher.group());
 }
 return words.toArray(new String[0]);
 }
The trick, though, is coming up with a working regular expression. The closest I've found yet is:
([^\s]|^(,)|^(<=))+|,|<=
but that produces the following:
joe
schmoe<=jack,
jane
I think what I need is to be able to find if a string does not contain the substring "<=" or "," using a regular expression. Anyone know how to do this, or another way to do this using regular expressions?

Try:
/*
 * Tokenizer.java
 * version 1.0
 * 01/06/2005
 */
package samples;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
 * @author notivago
 */
public class StrangeTokenizer {
 public static void main(String[] args) {
 String text = "joe schmoe<=jack, jane";
 Pattern pattern = Pattern.compile( "((?:<=)|(?:,)|(?:\\w+))");
 Matcher matcher = pattern.matcher(text);
 while( matcher.find() ) {
 System.out.println( "Item: " + matcher.group(1) );
 }
 }
}
May the code be with you.
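The question asked how to say "does not contain <= or ," inside a regex. One way to express that, sketched here as an alternative to \w+, is a negative lookahead applied to each character, which lets a "word" be any run of non-whitespace that stops before a delimiter. The class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadTokenizer {
    // (?:(?!<=|,)\S)+ : one or more non-whitespace characters, none of which
    // starts a "<=" or is a ","; the delimiters themselves are the other
    // two alternatives.
    private static final Pattern TOKEN =
            Pattern.compile("(?:(?!<=|,)\\S)+|<=|,");

    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("joe schmoe<=jack, jane"));
    }
}
```

Unlike \w+, this version also keeps characters such as hyphens or dots inside a word, which may or may not be what is wanted.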

Email Regular Expression with a String.Match()

I'm currently using a RichTextEditor for a user to build HTML
for a site. However, I want the application to scan for emails and
encode them so they are protected from spam bots when they go to
the live site. I've written a regular expression to find an email
and it seems to work, but it only returns one email at a time from
the string. I have had to revert to a while loop to traverse the
string until I'm satisfied. I don't particularly like that method
and would like to just do one String.match() query to retrieve all
of the emails. Can anyone see something here that I'm missing?

Try adding the global flag (g):
var emailPattern:RegExp =
/[a-z][\w.-]+@\w[\w.-]+\.[\w.-]*[a-z][a-z]+/g;
TS

Regular expression with delimited string

Hi,
I'm trying extract all characters in a string (as word or words) which is delimited by ' -- '
Been playing around with regular expression and got as far as this;
with t_vw
as (select 'hello -- world' txt from dual
union all
select 'hello-world' from dual
union all
select 'hello' from dual
union all
select 'hello -- world -- bye' from dual
union all
select 'hello--worldbye' from dual)
select txt, regexp_substr(txt,'[^ -- ]+',1,1) word1,
regexp_substr(txt,'[^ -- ]+',1,2) word2,
regexp_substr(txt,'[^ -- ]+',1,3) word3
from t_vw;
It's returning;
"TXT","WORD1","WORD2","WORD3"
"hello -- world" "hello","world",""
"hello-world"          "hello","world",""
"hello"               "hello","",""
"hello -- world -- bye"     "hello","world","bye"
"hello--worldbye"      "hello","worldbye",""
So it seems to work in all cases apart from when there are no spaces before/after "--".
Any ideas?

Please enclose your code in {noformat} tags to preserve your formatting and to prevent the forum software from mangling your regular expressions.
Also, you've given your input and show the output that you are getting, but I don't know what your issue is. If you could include the desired output and explain how it differs from what you are getting so far that would help.
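One thing worth noting about the attempt is that [^ -- ]+ is a bracket expression, so it matches single characters rather than the sequence ' -- '. The sketch below, in Java rather than Oracle SQL, shows splitting on the delimiter as a literal string instead. It assumes that the delimiter is exactly ' -- ' with the surrounding spaces, which the thread does not confirm; splitOn is a hypothetical helper.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class DelimiterSplit {
    // Pattern.quote makes the delimiter a literal, so none of its
    // characters are interpreted as regex syntax.
    static String[] splitOn(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOn("hello -- world -- bye", " -- ")));
        System.out.println(Arrays.toString(splitOn("hello-world", " -- ")));
    }
}
```

Under that assumption, 'hello-world' and 'hello--worldbye' each stay in one piece, because neither contains the full delimiter.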

Match Regular Expression Function input string format

Hi,
I am new to LabVIEW and am having some difficulties using the Match Regular Expression function.
I am using LabVIEW to communicate with a sensor. I have installed the NI device driver to do so. The output of my sensor is in the format
X20
R40 P20 A123
The numbers in this case are arbitrary. I am trying to use the Match Regular Expression function to display and perform mathematical operations on the numbers. I am having difficulty formatting the input string for the Match Regular Expression function. Could you please give me some tips on how to handle the example I provided?
Thanks

Here is a way to do it if the format is constant (X R P A followed by a positive integer number).
Ben64
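The same idea, a letter from X, R, P or A followed by a positive integer, can be written down as a regex with two capturing groups. The sketch below is in Java rather than LabVIEW and is only meant to show the pattern; SensorParse and parse are invented names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SensorParse {
    // Group 1 is the field letter, group 2 the digits that follow it.
    private static final Pattern FIELD = Pattern.compile("([XRPA])(\\d+)");

    // Returns the fields in the order they appear in the reading.
    static Map<Character, Integer> parse(String reading) {
        Map<Character, Integer> values = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(reading);
        while (m.find()) {
            values.put(m.group(1).charAt(0), Integer.parseInt(m.group(2)));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parse("X20\nR40 P20 A123"));
        // prints {X=20, R=40, P=20, A=123}
    }
}
```

Once the numbers are parsed into integers, arithmetic on them is straightforward.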

Regular Expressions with Unicode Strings - length restriction?

Hi,
I can't quite figure this one out. I am checking a String for the presence of a URL.. more specifically, a jpg or gif URL.
Anyway, the following reg exp will work fine for me. However, when testing with unicode data (chinese text) the expression will only work up to a certain string length. Here's an example:
boolean isURL = text.matches(".*http\\S*(jpg|gif).*");
My thought is that since Unicode data takes up more space, there is a limitation on the String length the regex can handle. Does anyone know what that number is? Or is there another reason the reg exp fails?
thanks,
joe
Example::
This works for any length String I throw at it using standard ASCII text.. But a unicode string of a certain length won't recognize the URL (I doubt I can simply paste my example here and have it turn out correctly..)
DOESN'T WORK: (length is reported via text.length() as 344)
"FWD: test_tancy: FWD: tancy: FWD: supporter:
浅淡色彩造清凉
要让居所看起来清爽凉快，可采用以白色为主调的布置。白色不但能增加空间感，还能营造明快宁静的气氛，让人情绪稳定。另外，有意识地增添一点冷色，也能令人在视觉上觉得畅快。不过，一间房内若全部使用冷色，或全部采用暖色，会使人感到不安。最好是确定主色后，小面积使用些呈鲜明对比的色彩。入夏购置一些色调清凉的饰物摆设，是最省钱有效的一招，如为台灯换个白色灯罩、在洗手间放一套冰蓝色的沐浴用具等。(UU为您提供生活咨讯并祝您生活愉快！如不希望打扰请回复?NO?)http://www.blah.com/servlet/mailbox?item=fc-10Tq9aljw0w9.jpg"
WORKS: (length is reported via text.length() as 296)
"FWD: Joe: 要让居所看起来清爽凉快，可采用以白色为主调的布置。白色不但能增加空间感，还能营造明快宁静的气氛，让人情绪稳定。另外，有意识地增添一点冷色，也能令人在视觉上觉得畅快。不过，一间房内若全部使用冷色，或全部采用暖色，会使人感到不安。最好是确定主色后，小面积使用些呈鲜明对比的色彩。入夏购置一些色调清凉的饰物摆设，是最省钱有效的一招，如为台灯换个白色灯罩、在洗手间放一套冰蓝色的沐浴用具等。(UU为您提供生活咨讯并祝您生活愉快！如不希望打扰请回复?NO?)http://www.blah.com/servlet/mailbox?item=fc-10Tq9aljw0w9.jpg"

Perhaps you should check the version of Java you are using. I am using 1.4.2_04
import java.io.UnsupportedEncodingException;
public class A {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String text = "FWD: test_tancy: FWD: tancy: FWD: supporter:                   " +
                new String(new char[]{(char) 35201, (char) 35753, (char) 23621, (char) 25152, (char) 30475, (char) 36215,
                                      (char) 26469, (char) 28165, (char) 29245, (char) 20937, (char) 24555, (char) 65292,
                                      (char) 21487, (char) 37319, (char) 29992, (char) 20197, (char) 30333, (char) 33394,
                                      (char) 20026, (char) 20027, (char) 35843, (char) 30340, (char) 24067, (char) 32622,
                                      (char) 12290, (char) 30333, (char) 33394, (char) 19981, (char) 20294, (char) 33021,
                                      (char) 22686, (char) 21152, (char) 31354, (char) 38388, (char) 24863, (char) 65292,
                                      (char) 36824, (char) 33021, (char) 33829, (char) 36896, (char) 26126, (char) 24555,
                                      (char) 23425, (char) 38745, (char) 30340, (char) 27668, (char) 27675, (char) 65292,
                                      (char) 35753, (char) 20154, (char) 24773, (char) 32490, (char) 31283, (char) 23450,
                                      (char) 12290, (char) 21478, (char) 22806, (char) 65292, (char) 26377, (char) 24847,
                                      (char) 35782, (char) 22320, (char) 22686, (char) 28155, (char) 19968, (char) 28857,
                                      (char) 20919, (char) 33394, (char) 65292, (char) 20063, (char) 33021, (char) 20196,
                                      (char) 20154, (char) 22312, (char) 35270, (char) 35273, (char) 19978, (char) 35273,
                                      (char) 24471, (char) 30021, (char) 24555, (char) 12290, (char) 19981, (char) 36807,
                                      (char) 65292, (char) 19968, (char) 38388, (char) 25151, (char) 20869, (char) 33509,
                                      (char) 20840, (char) 37096, (char) 20351, (char) 29992, (char) 20919, (char) 33394,
                                      (char) 65292, (char) 25110, (char) 20840, (char) 37096, (char) 37319, (char) 29992,
                                      (char) 26262, (char) 33394, (char) 65292, (char) 20250, (char) 20351, (char) 20154,
                                      (char) 24863, (char) 21040, (char) 19981, (char) 23433, (char) 12290, (char) 26368,
                                      (char) 22909, (char) 26159, (char) 30830, (char) 23450, (char) 20027, (char) 33394,
                                      (char) 21518, (char) 65292, (char) 23567, (char) 38754, (char) 31215, (char) 20351,
                                      (char) 29992, (char) 20123, (char) 21576, (char) 40092, (char) 26126, (char) 23545,
                                      (char) 27604, (char) 30340, (char) 33394, (char) 24425, (char) 12290, (char) 20837,
                                      (char) 22799, (char) 36141, (char) 32622, (char) 19968, (char) 20123, (char) 33394,
                                      (char) 35843, (char) 28165, (char) 20937, (char) 30340, (char) 39280, (char) 29289,
                                      (char) 25670, (char) 35774, (char) 65292, (char) 26159, (char) 26368, (char) 30465,
                                      (char) 38065, (char) 26377, (char) 25928, (char) 30340, (char) 19968, (char) 25307,
                                      (char) 65292, (char) 22914, (char) 20026, (char) 21488, (char) 28783, (char) 25442,
                                      (char) 20010, (char) 30333, (char) 33394, (char) 28783, (char) 32617, (char) 12289,
                                      (char) 22312, (char) 27927, (char) 25163, (char) 38388, (char) 25918, (char) 19968,
                                      (char) 22871, (char) 20912, (char) 34013, (char) 33394, (char) 30340, (char) 27792,
                                      (char) 28020, (char) 29992, (char) 20855, (char) 31561, (char) 12290, (char) 20026,
                                      (char) 24744, (char) 25552, (char) 20379, (char) 29983, (char) 27963, (char) 21672,
                                      (char) 35759, (char) 24182, (char) 31069, (char) 24744, (char) 29983, (char) 27963,
                                      (char) 24841, (char) 24555, (char) 65281, (char) 22914, (char) 19981, (char) 24076,
                                      (char) 26395, (char) 25171, (char) 25200, (char) 35831, (char) 22238, (char) 22797}) +
                "?NO?)http://www.blah.com/servlet/mailbox?item=fc-10Tq9aljw0w9.jpg";
        boolean isURL = text.matches(".*http\\S*(jpg|gif).*");
        System.out.println("isURL="+isURL+", length="+text.length());
    }
}
Prints
isURL=true, length=344
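One possible explanation, which the thread does not confirm, has nothing to do with length or Unicode. In Java, '.' does not match line terminators unless Pattern.DOTALL is set, and matches() has to consume the entire input. The failing sample as posted appears to span several lines, whereas the test program above builds its text without line breaks. The sketch below shows the difference on a short, made-up string; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class DotAllDemo {
    static final String REGEX = ".*http\\S*(jpg|gif).*";

    // Same call as in the question: '.' stops at line terminators.
    static boolean plainMatches(String text) {
        return text.matches(REGEX);
    }

    // DOTALL lets '.' match line terminators as well.
    static boolean dotAllMatches(String text) {
        return Pattern.compile(REGEX, Pattern.DOTALL).matcher(text).matches();
    }

    // find() only needs the URL part to match somewhere in the text.
    static boolean findUrl(String text) {
        return Pattern.compile("http\\S*(jpg|gif)").matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "FWD: supporter:\n\u6d45\u6de1 http://www.blah.com/a.jpg";
        System.out.println(plainMatches(text));  // false
        System.out.println(dotAllMatches(text)); // true
        System.out.println(findUrl(text));       // true
    }
}
```

Using find() with just the URL pattern avoids the leading and trailing .* altogether.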

Regular Expressions, split(), and the caret

I have a string delimited by the ^ character:
ITEM1^ITEM2^ITEM3
I've tried using:
split("^")
split("\^")
split("\x5E")
All to no avail. I either get "Invalid escape sequence" or I get the whole string. I'm looking to avoid the StringTokenizer way, just because this is neater IMHO.
Is this possible?

\ is the escape character for both Java and regex. The first escape character is for Java, so that the second \ will be treated as a literal in Java and passed as-is to the pattern matcher. The second \ is for the regex, so the pattern matcher will treat the ^ as a literal.
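Put concretely, the argument has to be written with two backslashes in Java source. The sketch below shows that form alongside Pattern.quote, which escapes the delimiter for you; the class and method names are made up for the example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class CaretSplit {
    // "\\^" in Java source is the two-character regex \^ : a literal caret.
    static String[] splitEscaped(String s) {
        return s.split("\\^");
    }

    // Pattern.quote builds a regex that matches the given text literally.
    static String[] splitQuoted(String s) {
        return s.split(Pattern.quote("^"));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitEscaped("ITEM1^ITEM2^ITEM3")));
        // prints [ITEM1, ITEM2, ITEM3]
        System.out.println(Arrays.toString(splitQuoted("ITEM1^ITEM2^ITEM3")));
        // prints [ITEM1, ITEM2, ITEM3]
    }
}
```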

String splitting with regular expressions

Hello everyone
I need some help in splitting the string using regular expressions
Suppose my String is : abc def "ghi jkl mno" pqr stu
After splitting, the resulting string array should contain the elements
abc
def
ghi jkl mno
pqr
stu
What should my regular expression be?

Since this is essentially the same as parsing CSV data, you might want to download a CSV parser and adapt it to your need. But if you want to use regexes, split() is not the way to go. This approach should work for your sample data:
Pattern p = Pattern.compile("\"[^\"]*+\"|\\S+");
Matcher m = p.matcher(input);
while (m.find())
{
System.out.println(m.group());
}
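The pattern in the reply returns the quoted token with its quotation marks still attached, while the desired output shows it without them. A small variation, sketched here with hypothetical names, puts each alternative in its own capturing group so the quotes can be left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedTokens {
    // Group 1: the text between a pair of double quotes.
    // Group 2: a run of non-whitespace characters.
    private static final Pattern TOKEN =
            Pattern.compile("\"([^\"]*)\"|(\\S+)");

    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // Exactly one of the two groups takes part in each match.
            result.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("abc def \"ghi jkl mno\" pqr stu"));
        // prints [abc, def, ghi jkl mno, pqr, stu]
    }
}
```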

How to split a string with regular expression

Hi.
I need to split a string with a regular expression.
Example
String = "this is; a test";rune haavik;12345;
And I want the output to be:
"this is; a test"
rune haavik
12345
If I use this code:
private void test1()
{
String str = "\"this is; a test\";rune haavik;12345;";
int i=0;
String[] tmp = str.split(";");
while(i<tmp.length)
{
System.out.println(tmp[i]);
i++;
}
}
Then it also splits inside the "" text.
Regards
Rune haavik

Rune haavik:
The most effective way to achieve the end result is, I believe, to read the characters one by one, using a flag that indicates if we are inside quotation or not.
Well, if we are in a mind game, then the following should do.
String[] tmp = str.split(";(?![^\"]*\";)");
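The character-by-character approach recommended above could look roughly like the following. This is a sketch with invented names, and it assumes that quotes are never escaped or nested inside a field.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplit {
    // Splits on ';' except where the ';' lies between double quotes.
    static List<String> splitFields(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false; // the flag: are we inside a quotation?
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
                current.append(c); // keep the quotes, as in the desired output
            } else if (c == ';' && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // Text after the last ';' (if any) is the final field.
        if (current.length() > 0) {
            fields.add(current.toString());
        }
        return fields;
    }

    public static void main(String[] args) {
        for (String field : splitFields("\"this is; a test\";rune haavik;12345;")) {
            System.out.println(field);
        }
    }
}
```

Compared with the lookahead regex, the loop makes a single pass over the input and is easier to extend, for example to handle escaped quotes.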

Command line style regular expression string parser

Hi people,
I am currently working on a program where I need to parse a file (or any input stream) line by line. I then need to parse every line for arguments. Each line is formatted similar to how arguments are passed to the command line. The regular expression needs to split every line by any encountered whitespace, but needs to be able to retain any whitespace within double quotes (i.e. "some spaced text here"). Arguments can be numbers, booleans and (quoted) strings. Quoted strings must also be able to have escaped quotes in it (as below). The quotes for the quoted string (the outer ones, obviously not the escaped ones) do not necessarily have to be retained.
An example input line:
arg1 arg2 "arg 3" "arg 4" 987 arg6 "arg \"arg \"arg 7"
Desired example output:
arg1
arg2
arg 3
arg 4
987
arg6
arg "arg "arg 7
After the input line has been split up the program will handle any parsing (i.e. numbers, booleans, etc.). The program currently uses a simple for loop to iterate over all characters in the line and splits it up appropriately by checking every character. However, if this can be done automatically by using a regular expression passed to String.split() (or with some use of the regex package), it would remove quite a bit of redundant code and make the program that much more maintainable.
I do not have much experience with regular expressions since I have never really had the need to use them, but if they can work in this case it would be great.
Thanks in advance for any help.

Almost any parsing problem can be solved if you throw a big enough and ugly enough regex at it, or so I'm told.
I think what you are doing is also amenable to java.io.StreamTokenizer:
import java.io.*;
import static java.io.StreamTokenizer.*;
public class StreamTokenizerExample {
    public static void main(String[] args) throws IOException {
        StringReader input = new StringReader("arg1 arg2 \"arg 3\" \"arg 4\" 987 arg6 \"arg \\\"arg\\\" arg 7\"\nnextline");
        StreamTokenizer in = new StreamTokenizer(input);
        in.eolIsSignificant(true);
        for(int ttype; (ttype = in.nextToken()) != TT_EOF; ) {
            switch (ttype) {
                case TT_WORD:
                    System.out.println("String[" + in.sval + "]");
                    break;
                case TT_NUMBER:
                    System.out.println("number[" + in.nval + "]");
                    break;
                case TT_EOL:
                    System.out.println("[EOL]");
                    break;
                case '"':
                    System.out.println("quoted[" + in.sval + "]");
                    break;
                default:
                    System.out.println("unexpected " + ttype);
            }
        }
    }
}

Regular expressions its URGENT !!!

I have a long string of regular expressions separated by "|" and I need to know which regular expression a particular string matched. How can I find that, and can I do it using java.util.regex?
Thanks in advance

Consider using capturing groups. A better solution would be to split this long regular expression into smaller ones, one per alternative, which should considerably reduce backtracking. That also makes it easier to find which regular expression matches the target string.
Regards.
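The capturing-group suggestion can be sketched as follows: wrap each alternative in its own group and check which group took part in the match. The three sub-patterns are placeholders invented for the example, and the sketch assumes that none of the sub-patterns contains capturing groups of its own, since those would shift the group numbers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhichAlternative {
    // Hypothetical sub-patterns; replace with the real alternatives.
    static final String[] PARTS = {"\\d+", "[a-z]+", "[A-Z]+"};

    // Builds (\d+)|([a-z]+)|([A-Z]+): one capturing group per alternative.
    static final Pattern COMBINED =
            Pattern.compile("(" + String.join(")|(", PARTS) + ")");

    // Returns the index into PARTS of the alternative that matched the
    // whole string, or -1 if none did.
    static int matchedIndex(String s) {
        Matcher m = COMBINED.matcher(s);
        if (!m.matches()) {
            return -1;
        }
        for (int g = 1; g <= m.groupCount(); g++) {
            if (m.group(g) != null) {
                return g - 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(matchedIndex("123")); // 0
        System.out.println(matchedIndex("abc")); // 1
        System.out.println(matchedIndex("ABC")); // 2
        System.out.println(matchedIndex("a1"));  // -1
    }
}
```

Compiling each alternative as a separate Pattern and testing them in a loop, as the reply also suggests, gives the same answer without the group-numbering caveat.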

Regular expressions and limiting matched input

Hi everyone :) I am trying to put together a regular expression that matches strings that contain elements of the form;
{<some text>}
However, each piece of text may contain multiple embedded instances of this pattern. I want to ensure that I am always getting the first (or outermost) instance.
So, if I had;
{OneStart}{TwoStart}{TwoEnd}{OneEnd}
I want to make sure that I get 'One' first and 'Two' second. So I have to place a stipulation in my regular expression to match this pattern only if it has not located the pattern previously.
At the moment, I have this -
([^\\{]*?)(\\{TagStart\\})(.*?)(\\{TagEnd\\})(.*)
What I think that I have to do is modify the first capture group '([^\\{]*?)' which at the moment only does not match if a preceding '{' is found to match only if a preceding '{<text>}' sequence is not found.
Anyone got any idea how to do this?
Thanks in advance.
Ben

Doesn't it work anyway, since you're using greedy operators? If not, won't it work if you remove the .* at the end and use find() rather than matches()? And finally, what's the (.*?) supposed to match? Looks to me like that should be .*
