Regular Expressions with Unicode Strings - length restriction?

Hi,
I can't quite figure this one out. I am checking a String for the presence of a URL, more specifically a jpg or gif URL.
The following regular expression works fine for me. However, when testing with Unicode data (Chinese text), the expression only works up to a certain string length. Here's an example:
boolean isURL = text.matches(".*http\\S*(jpg|gif).*");
My thought is that since Unicode data takes up more space, there is a limitation on the length of Strings the regex can handle. Does anyone know what that limit is? Or is there another reason the regular expression fails?
thanks,
joe
Example::
This works for any length of String I throw at it using standard ASCII text, but a Unicode string of a certain length won't recognize the URL (I doubt I can simply paste my example here and have it turn out correctly).
DOESN'T WORK (length is reported via text.length() as 344):
"FWD: test_tancy: FWD: tancy: FWD: supporter:
浅淡色彩造清凉
要让居所看起来清爽凉快，可采用以白色为主调的布置。白色不但能增加空间感，还能营造明快宁静的气氛，让人情绪稳定。另外，有意识地增添一点冷色，也能令人在视觉上觉得畅快。不过，一间房内若全部使用冷色，或全部采用暖色，会使人感到不安。最好是确定主色后，小面积使用些呈鲜明对比的色彩。入夏购置一些色调清凉的饰物摆设，是最省钱有效的一招，如为台灯换个白色灯罩、在洗手间放一套冰蓝色的沐浴用具等。(UU为您提供生活咨讯并祝您生活愉快！如不希望打扰请回复?NO?)http://www.blah.com/servlet/mailbox?item=fc-10Tq9aljw0w9.jpg"
WORKS (length is reported via text.length() as 296):
"FWD: Joe: 要让居所看起来清爽凉快，可采用以白色为主调的布置。白色不但能增加空间感，还能营造明快宁静的气氛，让人情绪稳定。另外，有意识地增添一点冷色，也能令人在视觉上觉得畅快。不过，一间房内若全部使用冷色，或全部采用暖色，会使人感到不安。最好是确定主色后，小面积使用些呈鲜明对比的色彩。入夏购置一些色调清凉的饰物摆设，是最省钱有效的一招，如为台灯换个白色灯罩、在洗手间放一套冰蓝色的沐浴用具等。(UU为您提供生活咨讯并祝您生活愉快！如不希望打扰请回复?NO?)http://www.blah.com/servlet/mailbox?item=fc-10Tq9aljw0w9.jpg"

Perhaps you should check the version of Java you are using. I am using 1.4.2_04
public class A {
    public static void main(String[] args) {
        String text = "FWD: test_tancy: FWD: tancy: FWD: supporter:                   " +
                new String(new char[]{(char) 35201, (char) 35753, (char) 23621, (char) 25152, (char) 30475, (char) 36215,
                                      (char) 26469, (char) 28165, (char) 29245, (char) 20937, (char) 24555, (char) 65292,
                                      (char) 21487, (char) 37319, (char) 29992, (char) 20197, (char) 30333, (char) 33394,
                                      (char) 20026, (char) 20027, (char) 35843, (char) 30340, (char) 24067, (char) 32622,
                                      (char) 12290, (char) 30333, (char) 33394, (char) 19981, (char) 20294, (char) 33021,
                                      (char) 22686, (char) 21152, (char) 31354, (char) 38388, (char) 24863, (char) 65292,
                                      (char) 36824, (char) 33021, (char) 33829, (char) 36896, (char) 26126, (char) 24555,
                                      (char) 23425, (char) 38745, (char) 30340, (char) 27668, (char) 27675, (char) 65292,
                                      (char) 35753, (char) 20154, (char) 24773, (char) 32490, (char) 31283, (char) 23450,
                                      (char) 12290, (char) 21478, (char) 22806, (char) 65292, (char) 26377, (char) 24847,
                                      (char) 35782, (char) 22320, (char) 22686, (char) 28155, (char) 19968, (char) 28857,
                                      (char) 20919, (char) 33394, (char) 65292, (char) 20063, (char) 33021, (char) 20196,
                                      (char) 20154, (char) 22312, (char) 35270, (char) 35273, (char) 19978, (char) 35273,
                                      (char) 24471, (char) 30021, (char) 24555, (char) 12290, (char) 19981, (char) 36807,
                                      (char) 65292, (char) 19968, (char) 38388, (char) 25151, (char) 20869, (char) 33509,
                                      (char) 20840, (char) 37096, (char) 20351, (char) 29992, (char) 20919, (char) 33394,
                                      (char) 65292, (char) 25110, (char) 20840, (char) 37096, (char) 37319, (char) 29992,
                                      (char) 26262, (char) 33394, (char) 65292, (char) 20250, (char) 20351, (char) 20154,
                                      (char) 24863, (char) 21040, (char) 19981, (char) 23433, (char) 12290, (char) 26368,
                                      (char) 22909, (char) 26159, (char) 30830, (char) 23450, (char) 20027, (char) 33394,
                                      (char) 21518, (char) 65292, (char) 23567, (char) 38754, (char) 31215, (char) 20351,
                                      (char) 29992, (char) 20123, (char) 21576, (char) 40092, (char) 26126, (char) 23545,
                                      (char) 27604, (char) 30340, (char) 33394, (char) 24425, (char) 12290, (char) 20837,
                                      (char) 22799, (char) 36141, (char) 32622, (char) 19968, (char) 20123, (char) 33394,
                                      (char) 35843, (char) 28165, (char) 20937, (char) 30340, (char) 39280, (char) 29289,
                                      (char) 25670, (char) 35774, (char) 65292, (char) 26159, (char) 26368, (char) 30465,
                                      (char) 38065, (char) 26377, (char) 25928, (char) 30340, (char) 19968, (char) 25307,
                                      (char) 65292, (char) 22914, (char) 20026, (char) 21488, (char) 28783, (char) 25442,
                                      (char) 20010, (char) 30333, (char) 33394, (char) 28783, (char) 32617, (char) 12289,
                                      (char) 22312, (char) 27927, (char) 25163, (char) 38388, (char) 25918, (char) 19968,
                                      (char) 22871, (char) 20912, (char) 34013, (char) 33394, (char) 30340, (char) 27792,
                                      (char) 28020, (char) 29992, (char) 20855, (char) 31561, (char) 12290, (char) 20026,
                                      (char) 24744, (char) 25552, (char) 20379, (char) 29983, (char) 27963, (char) 21672,
                                      (char) 35759, (char) 24182, (char) 31069, (char) 24744, (char) 29983, (char) 27963,
                                      (char) 24841, (char) 24555, (char) 65281, (char) 22914, (char) 19981, (char) 24076,
                                      (char) 26395, (char) 25171, (char) 25200, (char) 35831, (char) 22238, (char) 22797}) +
                "?NO?)http://www.blah.com/servlet/mailbox?item=fc-10Tq9aljw0w9.jpg";
        boolean isURL = text.matches(".*http\\S*(jpg|gif).*");
        System.out.println("isURL="+isURL+", length="+text.length());
    }
}
Prints
isURL=true, length=344
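
One explanation that does not involve any length limit: in java.util.regex, `.` does not match line terminators unless DOTALL is enabled, and String.matches() has to match the entire input. The failing sample, as pasted in the question, has line breaks after "supporter:" and after the first Chinese line, while the test program above joins those parts with spaces. The sketch below assumes that this is the difference. The class and method names are made up for illustration, and the sample text is shortened.

```java
import java.util.regex.Pattern;

class ImageUrlCheck {
    // Pattern from the question: '.' stops at line terminators.
    static final Pattern ORIGINAL = Pattern.compile(".*http\\S*(jpg|gif).*");
    // (?s) enables DOTALL, so '.' also matches line terminators.
    static final Pattern DOTALL = Pattern.compile("(?s).*http\\S*(jpg|gif).*");
    // With find() the leading and trailing ".*" are not needed at all.
    static final Pattern SEARCH = Pattern.compile("http\\S*(jpg|gif)");

    static boolean matchesOriginal(String text) {
        return ORIGINAL.matcher(text).matches();
    }

    static boolean matchesDotAll(String text) {
        return DOTALL.matcher(text).matches();
    }

    static boolean containsUrl(String text) {
        return SEARCH.matcher(text).find();
    }

    public static void main(String[] args) {
        // \u8981\u8ba9\u5c45\u6240 are the first four Chinese characters of the sample text.
        String text = "FWD: supporter:\n\u8981\u8ba9\u5c45\u6240"
                + "http://www.blah.com/servlet/mailbox?item=fc-10Tq9aljw0w9.jpg";
        System.out.println("original: " + matchesOriginal(text));
        System.out.println("dotall:   " + matchesDotAll(text));
        System.out.println("find:     " + containsUrl(text));
    }
}
```

If the real data contains line breaks, either the `(?s)` variant or the find() variant should behave the same for any string length.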

Similar Messages

Regular expression with delimited string

Hi,
I'm trying to extract all characters in a string (as a word or words) which is delimited by ' -- '.
I've been playing around with regular expressions and got as far as this:
with t_vw
as (select 'hello -- world' txt from dual
union all
select 'hello-world' from dual
union all
select 'hello' from dual
union all
select 'hello -- world -- bye' from dual
union all
select 'hello--worldbye' from dual
)
select txt, regexp_substr(txt,'[^ -- ]+',1,1) word1,
regexp_substr(txt,'[^ -- ]+',1,2) word2,
regexp_substr(txt,'[^ -- ]+',1,3) word3
from t_vw;
It's returning;
"TXT","WORD1","WORD2","WORD3"
"hello -- world" "hello","world",""
"hello-world"          "hello","world",""
"hello"               "hello","",""
"hello -- world -- bye"     "hello","world","bye"
"hello--worldbye"      "hello","worldbye",""
So it seems to work in all cases apart from when there are no spaces before/after "--".
Any ideas?

Please enclose your code in *{noformat}{noformat}* tags to preserve your formatting and to prevent the forum software from mangling your regular expressions.
Also, you've given your input and show the output that you are getting, but I don't know what your issue is. If you could include the desired output and explain how it differs from what you are getting so far that would help.

Email Regular Expression with a String.Match()

I'm currently using a RichTextEditor for a user to build HTML
for a site. However, I want the application to scan for emails and
encode them so they are protected from spam bots when they go to
the live site. I've written a regular expression to find an email
and it seems to work, but it only returns one email at a time from
the string. I have had to revert to a while loop to traverse the
string until I'm satisfied. I don't particularly like that method
and would like to just do one String.match() query to retrieve all
of the emails. Can anyone see something here that I'm missing?

Try adding the global flag (g):
var emailPattern:RegExp =
/[a-z][\w.-]+@\w[\w.-]+\.[\w.-]*[a-z][a-z]+/g;
TS

Remove regular expression from a string

Hello,
I have a string like this
@1test;'"{input+
Please help me to remove special characters from the string.

A: remove regular expression from a string

Hi Krishna,
DATA : str TYPE STRING VALUE '@1test;"{}]input+',
            char,
            length TYPE i,
            index TYPE i.
length = STRLEN( str ).
WHILE length > index.
char = str+index(1).
WRITE char.
if char CA '+-*/!`@#$%^&()_=[]{};'.               " Add/Remove here to include numbers
    REPLACE ALL OCCURRENCES OF char in str WITH ''.
    REPLACE ALL OCCURRENCES OF '"' in str WITH ''. " characters "{}[] are not comparable
    REPLACE ALL OCCURRENCES OF '{' in str WITH ''.
    REPLACE ALL OCCURRENCES OF '}' in str WITH ''.
    REPLACE ALL OCCURRENCES OF '[' in str WITH ''.
    REPLACE ALL OCCURRENCES OF ']' in str WITH ''.
    length = STRLEN( str ).
    ENDIF.
add 1 to index.
ENDWHILE.
WRITE str.
Add or remove special char from '+-*/!`@#$%^&()_=[]{};' in if part as per your requirement.
Hope it meets your requirement.
Do not forget to mark the answer as helpful/correct if it is useful.
Thanks,
Karthik
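
For comparison, the same clean-up can be written as one regular-expression replacement. The sketch below is in Java rather than ABAP and uses an allow-list (keep ASCII letters, digits and spaces, drop everything else) instead of the deny-list above. That choice, and the class name, are assumptions for illustration.

```java
class SpecialCharStripper {
    // Remove every character that is not an ASCII letter, digit or space.
    static String strip(String input) {
        return input.replaceAll("[^A-Za-z0-9 ]", "");
    }

    public static void main(String[] args) {
        // Sample string from the question.
        System.out.println(strip("@1test;'\"{input+"));
    }
}
```

An allow-list avoids having to enumerate (and escape) every special character, but it also removes non-ASCII letters, so the character class would need widening if those must be kept.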

Regular expression - splitting a string

I have a long string that I'm trying to split into a series of substrings. I would like each of the substrings to start with "TTL". I'm fairly certain that I'm missing something very basic here. I've attached my code, which yields NO GROUPS. I didn't see another method for returning the text that the regular expression matched.
String finalLongstring="TTL1,clip1+TTL2+clip3,TTL4,clip4,TTL5,clip5+TTL6+"+
   "clip6+TTL7+clip7,TTL8,clip8,TTL9,clip9,TTL10,clip10,TTL11,clip11,TTL12,clip12,"+
   "TTL13,clip13+TTL14+clip14,TTL15,clip15,TTL16,clip16,TTL17,clip17,"+
   "TTL18,clip18,TTL19,clip19,TTL20,clip20,TTL21,clip21,TTL22,clip22,"+
   "TTL23,clip23,TTL24,clip24,TTL25,clip25,TTL26,clip26,TTL27,clip27,"+
   "TTL28,clip28,TTL29,clip29";
List<String> chapters = new ArrayList<String>();
          chapters.clear();
          Pattern chapter=null;
          chapter=Pattern.compile("(TTL\\d+([+,]|clip\\d+)*)");
          //                      ||    | | | | |    | |
          //                      ||    | | | | |    | Repeat (commas pluses and clips group) 0 or more times
          //                      ||    | | | | |    one or more digits following 'clip'
          //                      ||    | | | | clip
          //                      ||    | | | or..
          //                      ||    | | plus or comma symbols
          //                      ||    | group the +, and clip information together
          //                      ||    one or more digits
          //                      |Match clips starting with TTL
          //                      |
          Matcher cp = chapter.matcher(finalLongstring); //NO MATCHES!!
          String [] temp = chapter.split(finalLongstring); //temp =EMPTY STRING ARRAY
          do{
               String chapterPlus=cp.group(1);
               if(cp.hitEnd()){break;}
               chapters.add(chapterPlus);
          }while(true);
Thanks in advance for the help.
Icesurfer

The main reason your matcher didn't work is that you never told it to do anything. You have to call one of the methods matches(), find() or lookingAt(), and make sure it returns true, before you can use the group() methods. When I did that, your regex worked, but then I modified it to demonstrate a better use of capturing groups, as shown here:
import java.util.regex.*;
public class Test
{
  public static void main(String... args)
  {
    String str="TTL1,clip1+TTL2+clip3,TTL4,clip4,TTL5,clip5+TTL6+clip6+"+
       "TTL7+clip7,TTL8,clip8,TTL9,clip9,TTL10,clip10,TTL11,clip11,TTL12,clip12,"+
       "TTL13,clip13+TTL14+clip14,TTL15,clip15,TTL16,clip16,TTL17,clip17,"+
       "TTL18,clip18,TTL19,clip19,TTL20,clip20,TTL21,clip21,TTL22,clip22,"+
       "TTL23,clip23,TTL24,clip24,TTL25,clip25,TTL26,clip26,TTL27,clip27,"+
       "TTL28,clip28,TTL29,clip29";
    Pattern p = Pattern.compile("(TTL\\d+)[+,](clip\\d+)[+,]");
    Matcher m = p.matcher(str);
    // group() may only be called after find() has returned true
    while (m.find())
    {
      System.out.printf("%6s %s%n", m.group(1), m.group(2));
    }
  }
}
The reason your split() attempt didn't work is that the regex matched all of the text; the split() regex is supposed to match the parts you don't want. In fact, it did split the text, creating a list of empty strings, but then it threw them all away, because split() discards trailing empty fields by default.
Finally, the hitEnd() method is not appropriate in this context. It and the requireEnd() method were added to support the Scanner class in JDK 1.5. If you want to see how they work, look at the source code for Scanner, but for now, just classify them as an advanced topic. When you're iterating through text with the find() method, you stop when find() returns false, plain and simple.
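
The two points above (call find() before group(), and split() dropping trailing empty fields) can be checked on a shortened input. This is a minimal sketch that reuses the regex from the question; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChapterSplitter {
    // Regex from the question: "TTL<n>" followed by any run of separators and clips.
    static final Pattern CHAPTER = Pattern.compile("(TTL\\d+([+,]|clip\\d+)*)");

    static List<String> chapters(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = CHAPTER.matcher(input);
        // Loop until find() returns false; group() is valid only after a successful find().
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "TTL1,clip1+TTL2+clip3";
        System.out.println(chapters(s));
        // The regex consumes the whole input, so only empty fields remain and are discarded.
        System.out.println(CHAPTER.split(s).length);
        // A negative limit keeps the trailing empty fields, which makes them visible.
        System.out.println(Arrays.toString(CHAPTER.split(s, -1)));
    }
}
```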

Regular Expression with comma and encapsulated charaters

Would appreciate some help. I'm looking for a regular expression to replace commas inside encapsulated (quoted) text, as follows.
For example
- Input
1,"This is a string, need to remove the comma",Another text string,10
- Required output
1,"This is a string; need to remove the comma",Another text string,10
Have tried to use the REGEXP_REPLACE but could not grasp the pattern matching.
Thanks John

John Heaton wrote:
> Thanks for the solution, this works great for a single field encapsulated by " and containing a comma. I am parsing several different file definitions, so it would need to cascade through the string an undetermined number of times and replace all occurrences.
Then try (performance-wise) the MODEL solution:
{code}
with t as (
           select '1,"This is a string, need to remove the comma",Another text string,10' txt from dual union all
           select '1,"remove this comma,",Another text string,10,"remove this comma,",xxx,"remove this comma,",11' txt from dual
          )
select  txt_original,
        txt
  from  t
  model
    partition by(row_number() over(order by 1) p)
    dimension by(1 rn)
    measures(txt txt_original,txt,0 quote)
    rules
      iterate(
              1e9
             )
      until(
            iteration_number + 1 = length(txt[1])
           )
      (
       quote[1] = case substr(txt[1],iteration_number + 1,1)
                    when '"' then quote[1] + 1
                    else quote[1]
                  end,
       txt[1]   = case substr(txt[1],iteration_number + 1,1)
                    when ',' then case mod(quote[1],2)
                                    when 1 then substr(txt[1],1,iteration_number) || ';' || substr(txt[1],iteration_number + 2)
                                    else txt[1]
                                  end
                    else txt[1]
                  end
      )
/
TXT_ORIGINAL TXT
1,"This is a string, need to remove the comma",Another text string,10 1,"This is a string; need to remove the comma",Another text string,10
1,"remove this comma,",Another text string,10,"remove this comma,",xxx,"remove this comma,",11 1,"remove this comma;",Another text string,10,"remove this comma;",xxx,"remove this comma;",11
SQL>
{code}
SY.
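
The idea behind the MODEL rules is to walk the string one character at a time, count the double quotes seen so far, and replace a comma only while that count is odd (that is, inside a quoted field). Outside SQL the same logic fits in a short loop. The Java sketch below is only an illustration of that logic with made-up names; like the query above, it assumes quotes are balanced and never escaped.

```java
class QuotedCommaReplacer {
    static String replaceQuotedCommas(String line) {
        StringBuilder out = new StringBuilder(line.length());
        boolean insideQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // Each quote toggles the state: an odd number of quotes means "inside".
                insideQuotes = !insideQuotes;
            }
            out.append(c == ',' && insideQuotes ? ';' : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceQuotedCommas(
                "1,\"This is a string, need to remove the comma\",Another text string,10"));
    }
}
```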

Problem in creating a Regular Expression with gnu

Hi All,
I am trying to create a regular expression using the gnu.regexp package API:
gnu.regexp.RE;
I need to validate the browser's (MSIE) userAgent with my regular expression.
The userAgent looks like this: First one ==> Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
I wrote a regular expression like this:
Mozilla.*(.*)\\s*(.*)compatible;\\s*MSIE(.*)\\s*(.*)([0-9]\\.[0-9])(.*);\\s*(.*)Windows(.*)\\s*NT(.*)\\s*5.0(.*)
Actually this validates my userAgent and returns true. My problem is that it also returns true if the userAgent has more words at the end after Windows NT 5.0, like Second one ==> Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0) Testing
I want the regular expression pattern to validate the first one and return true for it, and to return false for the second one.
my code is:
import gnu.regexp.RE;
import gnu.regexp.REException;
public class TestRegexp
{
    // The RE constructor throws REException if the pattern is invalid
    public static boolean getUserAgentDetails(String userAgent) throws REException
    {
        boolean isvalid = false;
        RE regexp = new RE("Mozilla.*(.*)\\s*(.*)compatible;\\s*MSIE(.*)\\s*(.*)([0-9]\\.[0-9])(.*);\\s*(.*)Windows(.*)\\s*NT(.*)\\s*5.0(.*)");
        isvalid = regexp.isMatch(userAgent);
        return isvalid;
    }
    public static void main(String a[]) throws REException
    {
        String userAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)";
        boolean regoutput = getUserAgentDetails(userAgent);
        System.out.println("***** regoutput is ****** " + regoutput);
    }
}
Please help me in solving this.
Thanks in Advance..
thanx,
krishna

Of course, I can do the comparison with simple string matching.
But the problem is that the userAgent I want to support covers all MSIE versions from 5.0 onwards, so the IE version will differ, e.g. MSIE 6.0 or MSIE 5.5.
Anyway, I will try with StringTokenizer once.
It seems that will do the job.
Thanks,
krishna
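
The trailing `(.*)` in the pattern above is what lets " Testing" through: it matches whatever follows "5.0". A tighter pattern that escapes the literal parentheses and ends at the closing one rejects the second string. The sketch below uses java.util.regex instead of gnu.regexp (a substitution, not what the thread used). The class name and the "5.0 or later" version check are assumptions based on the follow-up post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MsieUserAgentCheck {
    // Literal parentheses and dots are escaped; group 1 captures the MSIE version.
    static final Pattern MSIE_UA = Pattern.compile(
            "Mozilla/\\d\\.\\d \\(compatible; MSIE (\\d+\\.\\d+); Windows NT 5\\.0\\)");

    static boolean isSupported(String userAgent) {
        Matcher m = MSIE_UA.matcher(userAgent);
        // matches() succeeds only if the pattern covers the entire string.
        return m.matches() && Double.parseDouble(m.group(1)) >= 5.0;
    }

    public static void main(String[] args) {
        System.out.println(isSupported("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"));
        System.out.println(isSupported("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0) Testing"));
    }
}
```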

Regular expressions with boolean connectives (AND, OR, NOT) in Java?

I'd like to use regular expression patterns that are made up of simple regex patterns connected via AND, OR, or NOT operators, in order to do some keyword-style pattern matching.
A pattern could look like this:
(.*Is there.*) && (.*library.*) && !((.*badword.*) || (^$))
Is there any Java regex library that allows these operators?
I know that in principle these operators should be available, since Regular languages are closed under union, intersection, and complement.

> AND is implicit,
> xy -- means x AND y
That's not what I need, though, since this is just concatenation of a regex. Thus, /xy/ would not match the string "a y a x", because y precedes x.
> So it has to contain both x and y, but they could be in any order?
> You can't do that easily or generally. "x.*y|y.*x" will work here, but obviously it will get ugly factorially fast as you add more terms.
You got that right: AND means the regex operands can appear in any order.
That's why I'm looking for some regex library that does all this ugly work for me. Again, from a theoretical point of view, it IS possible to express the described semantics of AND with regular expressions, although they will get rather obfuscated.
Unless somebody has done something similar in Java (e.g., for C++, there's Ragel: http://www.cs.queensu.ca/~thurston/ragel/), I will probably use some finite-state-machine libraries and compile the complex regexes into automata (which can be minimized using well-defined operations on FSMs).
> You'd probably just be better off doing multiple calls to matches() or whatever.
Yes, that's another possibility: do the boolean operators in Java itself.
> Of course, if you really are just looking for literals, then you can just use str.contains(a) && !str.contains(b) && (str.contains(c) || str.contains(d)). You don't seem to need regex--at least not from your example.
OK, bad example, I do have "real" regexps in there :)
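
The "do the boolean operators in Java itself" route can be kept fairly tidy on Java 8 or later, where Pattern.asPredicate() turns a compiled regex into a Predicate<String> that can be combined with and(), or() and negate(). The sketch below rebuilds the example pattern from the question that way; the class and helper names are made up.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

class BooleanRegex {
    // asPredicate() tests with find(), so the surrounding ".*" is not needed.
    static Predicate<String> regex(String pattern) {
        return Pattern.compile(pattern).asPredicate();
    }

    // (.*Is there.*) && (.*library.*) && !((.*badword.*) || (^$))
    static final Predicate<String> QUERY =
            regex("Is there")
                    .and(regex("library"))
                    .and(regex("badword").or(regex("^$")).negate());

    public static void main(String[] args) {
        System.out.println(QUERY.test("Is there a regex library for this?"));
        System.out.println(QUERY.test("Is there a badword library?"));
    }
}
```

Each operand is still an ordinary regex, so "real" patterns can be dropped in; the cost is one scan of the input per operand rather than a single combined automaton.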

Regular expressions with multi character separator

I have data like the
where |`| is the separator for distinguishing two fields of data. I am having trouble writing a regular expression to display the data correctly.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> declare
2 l_string varchar2 (200) :='123` 456 |`|789 10 here|`||223|`|5434|`}22|`|yes';
3 v varchar2(40);
4 begin
5 v:=regexp_substr(l_string, '[^(|`|)]+', 1, 1);
6 dbms_output.put_line(v);
7 v:=regexp_substr(l_string, '[^(|`|)]+', 1, 2);
8 dbms_output.put_line(v);
9 v:=regexp_substr(l_string, '[^(|`|)]+', 1, 3);
10 dbms_output.put_line(v);
11 v:=regexp_substr(l_string, '[^(|`|)]+', 1, 4);
12 dbms_output.put_line(v);
13 v:=regexp_substr(l_string, '[^(|`|)]+', 1, 5);
14 dbms_output.put_line(v);
15 end;
16 /
123
456
789 10 here
223
5434
I need it to display
123` 456
789 10 here
|223
5434|`}22
yes
I am not sure how to handle multi-character separators in data using regular expressions.

Hi,
Actually, using non-greedy matching, you can do what you want with regular expressions:
VARIABLE     l_string     VARCHAR2 (100)
EXEC :l_string := '123` 456 |`|789 10 here|`||223|`|5434|`}22|`|yes'
SELECT     LEVEL
,     REPLACE ( REGEXP_SUBSTR ( '|`|' || REPLACE ( :l_string
                                 , '|`|'
                                  , '|`||`|'
                                 ) || '|`|'
                    , '\|`\|.*?\|`\|'
                    , 1
                    , LEVEL
                    )
           , '|`|'
           )     AS ITEM
FROM     dual
CONNECT BY     LEVEL     <= 7
;
Output:
LEVEL ITEM
    1 123` 456
    2 789 10 here
    3 |223
    4 5434|`}22
    5 yes
    6
    7
Here's how it works:
The pattern
~.*?~
is non-greedy; it matches the smallest possible string that begins and ends with a '~'. So
REGEXP_SUBSTR ('~SHALL~I~COMPARE~THEE~', '~.*?~', 1, 1) returns '~SHALL~'. However,
REGEXP_SUBSTR ('~SHALL~I~COMPARE~THEE~', '~.*?~', 1, 2) returns '~COMPARE~'. Why not '~I~'? Because the '~' between 'SHALL' and 'I' was part of the 1st match, so it can't be part of the 2nd match. So the first thing we have to do is double the delimiters; that's what the inner REPLACE does. Then we add delimiters to the beginning and end of the list. Once we've prepared the string like that, we can use the non-greedy REGEXP_SUBSTR to bring back the delimited items, with a delimiter at either end. We don't want those delimiters, so the outer REPLACE removes them.
I'm not sure this is any better than Sri's solution.
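
The original attempt fails because '[^(|`|)]+' is a character class: it matches runs of characters that are none of (, |, ` or ), so every single | or ` acts as a separator. For comparison only, here is how the same split looks in Java, where the three-character separator can be treated as a literal with Pattern.quote(). This is a sketch of the idea and not a replacement for the Oracle query above; the class name is made up.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class MultiCharSplit {
    // Pattern.quote() makes the separator a literal, so '|' is not read as alternation.
    static final String SEPARATOR = Pattern.quote("|`|");

    static String[] fields(String data) {
        // Limit -1 keeps trailing empty fields instead of silently dropping them.
        return data.split(SEPARATOR, -1);
    }

    public static void main(String[] args) {
        String data = "123` 456 |`|789 10 here|`||223|`|5434|`}22|`|yes";
        System.out.println(Arrays.toString(fields(data)));
    }
}
```

Note that the first field keeps the trailing space that precedes the separator in the sample data.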

Regular Expressions with Java Regex

Hi,
I'm playing around with regex and there's something I can't get to work. What I need, is to capture words between 2 other words and the words captured has to be higher than 5 characters, so for example:
Pattern "Just testing on something with regular expressions" and suppose I'll try to match all the words between "testing" and "regular", then only the word "something" should come out because "on" and "with" are not larger than 5 chars.
Now I'm quite new to regexps and I know that ((?<=\btesting\b).*(?=\bregular\b)) will return " on something with "
But I can't seem to come up with an expression that would only output the word "something". I've tried a few expressions like ((?<=\btesting\b)((?:[\s\w{1,3}])*(\b\w{4,}\b)*(?:[\s\w{1,3}])*)*(?=\bregular\b)) which also returns " on something with " The others I tried would either return the whole " on something with " or return "Not Found!"
Does anyone have a tip for me? I'm well aware that it's not too hard to do something like this in Java, but I'm really looking to study regular expressions and would like to accomplish this using a regular expression.
The Java program I use is the following:
C:\Program Files\Java\jdk1.5.0_16\bin>java RegexTest "((?<=\btesting\b).*(?=\bregular\b))" "Just testing on something with regular expressions"
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTest {
     public static void main(String[] args) {
          Pattern RegexCompile = Pattern.compile(args[0]);
          Matcher m = RegexCompile.matcher(args[1]);
          boolean found = m.find(); // Perhaps there's another function to find () that would do the job?
          if (found)
               System.out.println(m.group()); // Perhaps group() is not the right function for this case?
          else
               System.out.println("Not Found!");
     }
}

You're talking about a two-stage operation: find everything between those two words, then filter out anything that's less than five letters long. There's no single regex that will accomplish all that in one step.
By the way, please use {code} tags when you post source code.
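
The two-stage operation can be sketched as follows: one regex captures the text between the two marker words, and a second regex then picks out the words of more than five characters from that capture. The class name is illustrative, and the marker words are hard-coded from the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LongWordsBetween {
    // Stage 1: capture everything between the two marker words.
    static final Pattern BETWEEN = Pattern.compile("\\btesting\\b(.*)\\bregular\\b");
    // Stage 2: whole words of six or more word characters.
    static final Pattern LONG_WORD = Pattern.compile("\\b\\w{6,}\\b");

    static List<String> longWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher between = BETWEEN.matcher(text);
        if (between.find()) {
            Matcher word = LONG_WORD.matcher(between.group(1));
            while (word.find()) {
                words.add(word.group());
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(longWords("Just testing on something with regular expressions"));
    }
}
```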

Regular expressions with dates and multiple matches

I am currently attempting to automate modifying start and end dates within a .config file via PowerShell, but I am having issues identifying the regular expression for the end date section, since both are on the same line in the file. Below is the string that I want to change.
Sometimes the dates are blank and sometimes the dates are filled in.
Dates are always in the same format (yyyy-MM-dd hh:mm).
I also want to note that there are multiple instances of 'StartDate="" EndDate=""' for other applications throughout the same config file so I cannot limit the expression to not include the App name.
I do not want to limit the search to a line number since there are instances where admins will add an extra space in the config file that may throw off the line number.
I want to replace the dates or lack there of in their respective spots on the line below via powershell:
<App name="TestApp" StartDate="2012-03-22 13:30" EndDate="">
I am successfully able to use
$startRegex = '(?<=<App name="TestApp" StartDate=")([^"]*)'
to replace the StartDate but I can't seem to single out the EndDate with regular expression. What expression can I use to have it ignore what is in the quotations after StartDate and only pay attention to the EndDate value?
Below is a snippet:
$path = 'd:\inetpub\website\app.config'
$startRegex = '(?<=<App name="TestApp" StartDate=")([^"]*)'
$starttime = (get-date).ToString("yyyy-MM-dd HH:mm")   # HH is the 24-hour clock, matching values such as 13:30
(gc $path) -replace $startregex, $starttime | set-content $path
I want to accomplish the same for EndDate.
Thanks in advance!

If you do this with XML it will be painless and less prone to error.
[xml]$xml = Get-Content $filename   # load the config file as an XML document
$n=$xml.SelectSingleNode('//App[@name="TestApp"]')
$n.StartDate=$newstartdate
$n.EndDate=$newenddate
$xml.Save($filename)
\_(ツ)_/
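
If the regex route from the question is still wanted, one option is to capture everything up to and including EndDate=" as a prefix, put it back unchanged, and replace only the value that follows. The sketch below shows this in Java, not PowerShell, so treat it as an illustration of the pattern only; the class name, group name and sample date are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EndDateUpdater {
    // The named group "prefix" holds the part of the line that must stay unchanged.
    static final Pattern END_DATE = Pattern.compile(
            "(?<prefix><App name=\"TestApp\" StartDate=\"[^\"]*\" EndDate=\")[^\"]*");

    static String setEndDate(String line, String newEnd) {
        // quoteReplacement() guards against '$' or '\' in the new value.
        return END_DATE.matcher(line)
                .replaceAll("${prefix}" + Matcher.quoteReplacement(newEnd));
    }

    public static void main(String[] args) {
        String line = "<App name=\"TestApp\" StartDate=\"2012-03-22 13:30\" EndDate=\"\">";
        System.out.println(setEndDate(line, "2012-03-23 09:00"));
    }
}
```

Because the pattern includes the App name, lines for other applications are left untouched.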

How to use regular expression to find string

hi,
Does anyone know how to get all the digits from the string "Alerts 4520 ( 227550 ) ( 98 Available )" using a regular expression? Thanks.
br, Andrew

Liu,
You can use the regex
\d+
If you are using the CL_ABAP_REGEX class, then:
report zars.
data: regex   type ref to cl_abap_regex,
      matcher type ref to cl_abap_matcher,
      match   type c length 1.
create object regex exporting pattern = '\d+'
                              ignore_case = ''.
matcher = regex->create_matcher( text = 'Test123tes456' ).
match = matcher->match( ).
write match.
You can find more details regarding REGEX and POSIX examples here
http://www.regular-expressions.info/tutorial.html
a®
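
To make the effect of \d+ concrete, here is the same pattern applied to the string from the question, collecting every run of digits. The sketch is in Java, not ABAP, and the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitExtractor {
    static final Pattern DIGITS = Pattern.compile("\\d+");

    static List<String> digitRuns(String text) {
        List<String> runs = new ArrayList<>();
        Matcher m = DIGITS.matcher(text);
        // Each successful find() yields the next run of consecutive digits.
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(digitRuns("Alerts 4520 ( 227550 ) ( 98 Available )"));
    }
}
```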

Regular expression - find if string does NOT contain text....

I have a string that I want to tokenize. The string can contain basically anything. I want to produce tokens for each "word" found, and for each "<=" or "," found. There does not need to be whitespace around a "<=" or a "," to consider it a token. So for example:
joe schmoe<=jack, jane
should become
joe
schmoe
<=
jack
jane
As a constraint, I do not want to use StringTokenizer at all, as "its use is discouraged in new code". http://java.sun.com/j2se/1.4.2/docs/api/java/util/StringTokenizer.html
Here's the code I plan on using for this:
    public String[] getWords(String input) {
        Matcher matcher = WORD_PATTERN.matcher(input);
        ArrayList<String> words = new ArrayList<String>();
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return (String[]) words.toArray(new String[0]);
    }
The trick, though, is coming up with a working regular expression. The closest I've found yet is:
([^\s]|^(,)|^(<=))+|,|<=
but that produces the following:
joe
schmoe<=jack,
jane
I think what I need is to be able to find if a string does not contain the substring "<=" or "," using a regular expression. Anyone know how to do this, or another way to do this using regular expressions?

Try:
/*
 * Tokenizer.java
 * version 1.0
 * 01/06/2005
 */
package samples;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
 * @author notivago
 */
public class StrangeTokenizer {
    public static void main(String[] args) {
        String text = "joe schmoe<=jack, jane";
        // Try "<=" first, then ",", then a run of word characters
        Pattern pattern = Pattern.compile( "((?:<=)|(?:,)|(?:\\w+))");
        Matcher matcher = pattern.matcher(text);
        while( matcher.find() ) {
            System.out.println( "Item: " + matcher.group(1) );
        }
    }
}
May the code be with you.

Writing Regular Expression with a character ^, too difficult

I want to change "^1Mandrake ^3Style ^4DM" this sentence to "Mandrake Style DM".
(^ with number means color code)
So..I used String.replaceAll() method with regular expression.
But however hard I try, I can't find any solution for this.
In PHP I could use \^ for a literal ^ character, but Java doesn't support \^.
How can I solve this problem?

Use \\^ in your regex (inside a Java string literal you have to escape the backslash, too).
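
A minimal sketch of that advice applied to the sentence from the question, assuming each color code is a caret followed by exactly one digit (the class name is made up):

```java
class ColorCodeStripper {
    // "\\^" in Java source is the regex \^ (a literal caret); "\\d" is one digit.
    static String strip(String text) {
        return text.replaceAll("\\^\\d", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("^1Mandrake ^3Style ^4DM"));
    }
}
```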

Help with unicode String?

Hi there,
I have a file that I need to read in and process. It took a while for me to realise it was Unicode ("text from my file" was printing out as "t e x t f r o m m y f i l e"). Anyway, I got there in the end using:
InputStreamReader isr = new InputStreamReader(new FileInputStream(filename), "UTF-16");
dataSource = new BufferedReader(isr);
My problem now is that I'm splitting the line (which is a comma-separated list of numbers) and converting to ints:
String line = dataSource.readLine();
String[] items = line.split(",");
int[] values = new int[12];
for(int i = 1; i < items.length; i++)
    values[i-1] = Integer.parseInt(items[i]);
Values is what I expect, a list of numbers, but items[] is being set to 0. Is this something to do with Unicode? I must admit, I've never given the character encoding any thought up until now.
Any help would be really appreciated.
Thanks,
Steve

Sorry, found it. It was actually a buffer issue. For the record, just because I'm printing output in the middle of the loop, doesn't mean the value exists to be printed by the time System.out.println gets to it (my code was creating an exception for an unrelated reason a few lines down)
Thanks for your responses.
Steve
