Regular Expressions with Unicode Strings - length restriction?

Hi,
I can't quite figure this one out. I am checking a String for the presence of a URL, more specifically a jpg or gif URL.
The following regular expression works fine for me. However, when testing with Unicode data (Chinese text), the expression only works up to a certain string length. Here's an example:
boolean isURL = text.matches(".*http\\S*(jpg|gif).*");
My thought is that since Unicode data takes up more space, there is a limit when dealing with Strings. Does anyone know what that number is? Or is there another reason the regular expression fails?
thanks,
joe
Example:
This works for any length of String I throw at it using standard ASCII text, but a Unicode string of a certain length won't recognize the URL. (I doubt I can simply paste my example here and have it turn out correctly.)
DOESN'T WORK: (length is reported via text.length() as 344)
"FWD: test_tancy: FWD: tancy: FWD: supporter:
浅淡色彩造清凉
要让居所看起来清爽凉快,可采用以白色为主调的布置。白色不但能增加空间感,还能营造明快宁静的气氛,让人情绪稳定。另外,有意识地增添一点冷色,也能令人在视觉上觉得畅快。不过,一间房内若全部使用冷色,或全部采用暖色,会使人感到不安。最好是确定主色后,小面积使用些呈鲜明对比的色彩。入夏购置一些色调清凉的饰物摆设,是最省钱有效的一招,如为台灯换个白色灯罩、在洗手间放一套冰蓝色的沐浴用具等。(UU为您提供生活咨讯并祝您生活愉快!如不希望打扰请回复?NO?)http://www.blah.com/servlet/mailbox?item=fc-10Tq9aljw0w9.jpg"
WORKS: (length is reported via text.length() as 296)
"FWD: Joe: 要让居所看起来清爽凉快,可采用以白色为主调的布置。白色不但能增加空间感,还能营造明快宁静的气氛,让人情绪稳定。另外,有意识地增添一点冷色,也能令人在视觉上觉得畅快。不过,一间房内若全部使用冷色,或全部采用暖色,会使人感到不安。最好是确定主色后,小面积使用些呈鲜明对比的色彩。入夏购置一些色调清凉的饰物摆设,是最省钱有效的一招,如为台灯换个白色灯罩、在洗手间放一套冰蓝色的沐浴用具等。(UU为您提供生活咨讯并祝您生活愉快!如不希望打扰请回复?NO?)http://www.blah.com/servlet/mailbox?item=fc-10Tq9aljw0w9.jpg"

Perhaps you should check the version of Java you are using. I am using 1.4.2_04
public class A {
    public static void main(String[] args) {
        String text = "FWD: test_tancy: FWD: tancy: FWD: supporter:                   " +
                new String(new char[]{(char) 35201, (char) 35753, (char) 23621, (char) 25152, (char) 30475, (char) 36215,
                                      (char) 26469, (char) 28165, (char) 29245, (char) 20937, (char) 24555, (char) 65292,
                                      (char) 21487, (char) 37319, (char) 29992, (char) 20197, (char) 30333, (char) 33394,
                                      (char) 20026, (char) 20027, (char) 35843, (char) 30340, (char) 24067, (char) 32622,
                                      (char) 12290, (char) 30333, (char) 33394, (char) 19981, (char) 20294, (char) 33021,
                                      (char) 22686, (char) 21152, (char) 31354, (char) 38388, (char) 24863, (char) 65292,
                                      (char) 36824, (char) 33021, (char) 33829, (char) 36896, (char) 26126, (char) 24555,
                                      (char) 23425, (char) 38745, (char) 30340, (char) 27668, (char) 27675, (char) 65292,
                                      (char) 35753, (char) 20154, (char) 24773, (char) 32490, (char) 31283, (char) 23450,
                                      (char) 12290, (char) 21478, (char) 22806, (char) 65292, (char) 26377, (char) 24847,
                                      (char) 35782, (char) 22320, (char) 22686, (char) 28155, (char) 19968, (char) 28857,
                                      (char) 20919, (char) 33394, (char) 65292, (char) 20063, (char) 33021, (char) 20196,
                                      (char) 20154, (char) 22312, (char) 35270, (char) 35273, (char) 19978, (char) 35273,
                                      (char) 24471, (char) 30021, (char) 24555, (char) 12290, (char) 19981, (char) 36807,
                                      (char) 65292, (char) 19968, (char) 38388, (char) 25151, (char) 20869, (char) 33509,
                                      (char) 20840, (char) 37096, (char) 20351, (char) 29992, (char) 20919, (char) 33394,
                                      (char) 65292, (char) 25110, (char) 20840, (char) 37096, (char) 37319, (char) 29992,
                                      (char) 26262, (char) 33394, (char) 65292, (char) 20250, (char) 20351, (char) 20154,
                                      (char) 24863, (char) 21040, (char) 19981, (char) 23433, (char) 12290, (char) 26368,
                                      (char) 22909, (char) 26159, (char) 30830, (char) 23450, (char) 20027, (char) 33394,
                                      (char) 21518, (char) 65292, (char) 23567, (char) 38754, (char) 31215, (char) 20351,
                                      (char) 29992, (char) 20123, (char) 21576, (char) 40092, (char) 26126, (char) 23545,
                                      (char) 27604, (char) 30340, (char) 33394, (char) 24425, (char) 12290, (char) 20837,
                                      (char) 22799, (char) 36141, (char) 32622, (char) 19968, (char) 20123, (char) 33394,
                                      (char) 35843, (char) 28165, (char) 20937, (char) 30340, (char) 39280, (char) 29289,
                                      (char) 25670, (char) 35774, (char) 65292, (char) 26159, (char) 26368, (char) 30465,
                                      (char) 38065, (char) 26377, (char) 25928, (char) 30340, (char) 19968, (char) 25307,
                                      (char) 65292, (char) 22914, (char) 20026, (char) 21488, (char) 28783, (char) 25442,
                                      (char) 20010, (char) 30333, (char) 33394, (char) 28783, (char) 32617, (char) 12289,
                                      (char) 22312, (char) 27927, (char) 25163, (char) 38388, (char) 25918, (char) 19968,
                                      (char) 22871, (char) 20912, (char) 34013, (char) 33394, (char) 30340, (char) 27792,
                                      (char) 28020, (char) 29992, (char) 20855, (char) 31561, (char) 12290, (char) 20026,
                                      (char) 24744, (char) 25552, (char) 20379, (char) 29983, (char) 27963, (char) 21672,
                                      (char) 35759, (char) 24182, (char) 31069, (char) 24744, (char) 29983, (char) 27963,
                                      (char) 24841, (char) 24555, (char) 65281, (char) 22914, (char) 19981, (char) 24076,
                                      (char) 26395, (char) 25171, (char) 25200, (char) 35831, (char) 22238, (char) 22797}) +
                "?NO?)http://www.blah.com/servlet/mailbox?item=fc-10Tq9aljw0w9.jpg";
        boolean isURL = text.matches(".*http\\S*(jpg|gif).*");
        System.out.println("isURL="+isURL+", length="+text.length());
    }
}
Prints:
isURL=true, length=344
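
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the failing message in the question appears to contain line breaks, while the test program above builds its string without any. In java.util.regex, `.` does not match line terminators unless Pattern.DOTALL is set, and String.matches() has to match the entire input, so a line break before the URL would make the original expression fail whatever the length. The class and method names below are made up for illustration, and the sample strings are shortened stand-ins for the posted data.

```java
import java.util.regex.Pattern;

public class DotAllDemo {
    // The expression from the question, unchanged.
    static final String REGEX = ".*http\\S*(jpg|gif).*";

    // Same expression, but '.' may also match line terminators.
    static final Pattern DOTALL = Pattern.compile(REGEX, Pattern.DOTALL);

    // Alternative: look for the URL anywhere instead of matching the whole input.
    static final Pattern URL_ONLY = Pattern.compile("http\\S*(jpg|gif)");

    static boolean matchesWholeInput(String text) {
        return text.matches(REGEX);
    }

    static boolean matchesWithDotAll(String text) {
        return DOTALL.matcher(text).matches();
    }

    static boolean containsUrl(String text) {
        return URL_ONLY.matcher(text).find();
    }

    public static void main(String[] args) {
        // Shortened, hypothetical stand-ins for the two messages in the question.
        String oneLine = "FWD: Joe: \u8981\u8ba9 http://www.blah.com/x.jpg";
        String multiLine = "FWD: supporter:\n\u6d45\u6de1\n\u8981\u8ba9 http://www.blah.com/x.jpg";

        System.out.println("one line, matches():   " + matchesWholeInput(oneLine));
        System.out.println("multi line, matches(): " + matchesWholeInput(multiLine));
        System.out.println("multi line, DOTALL:    " + matchesWithDotAll(multiLine));
        System.out.println("multi line, find():    " + containsUrl(multiLine));
    }
}
```

If the real data does contain line breaks, either the DOTALL flag or the find() variant should behave the same for ASCII and Unicode text.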

Similar Messages

  • Regular expression with delimited string

    Hi,
    I'm trying to extract all characters in a string (as a word or words) which is delimited by ' -- '
    I've been playing around with regular expressions and got as far as this:
    with t_vw
    as (select 'hello -- world' txt from dual
    union all
    select 'hello-world' from dual
    union all
    select 'hello' from dual
    union all
    select 'hello -- world -- bye' from dual
    union all
    select 'hello--worldbye' from dual)
    select txt, regexp_substr(txt,'[^ -- ]+',1,1) word1,
    regexp_substr(txt,'[^ -- ]+',1,2) word2,
    regexp_substr(txt,'[^ -- ]+',1,3) word3
    from t_vw;
    It's returning:
    "TXT","WORD1","WORD2","WORD3"
    "hello -- world" "hello","world",""
    "hello-world"          "hello","world",""
    "hello"               "hello","",""
    "hello -- world -- bye"     "hello","world","bye"
    "hello--worldbye"      "hello","worldbye",""
    So it seems to work in all cases apart from when there are no spaces before/after "--".
    Any ideas?

    Please enclose your code in *{noformat}{noformat}* tags to preserve your formatting and to prevent the forum software from mangling your regular expressions.
    Also, you've given your input and shown the output that you are getting, but I don't know what your issue is. If you could include the desired output and explain how it differs from what you are getting so far, that would help.
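
One thing that may explain the output above: `[^ -- ]+` is a bracket expression, so it matches runs of characters that are neither a space nor a hyphen, and does not treat ' -- ' as a four-character delimiter. The sketch below illustrates the difference in Java rather than Oracle SQL, under the assumption that only the exact sequence space, hyphen, hyphen, space should separate words. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DelimiterSplit {
    // Literal delimiter: space, hyphen, hyphen, space.
    private static final Pattern DELIMITER = Pattern.compile(Pattern.quote(" -- "));

    // Character class: any run of characters that are not a space or a hyphen.
    private static final Pattern NOT_SPACE_OR_HYPHEN = Pattern.compile("[^ -]+");

    // Splits only where the full delimiter occurs.
    static List<String> splitOnDelimiter(String txt) {
        return Arrays.asList(DELIMITER.split(txt));
    }

    // Mimics what the bracket expression in the question does.
    static List<String> charClassTokens(String txt) {
        List<String> tokens = new ArrayList<>();
        Matcher m = NOT_SPACE_OR_HYPHEN.matcher(txt);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] samples = {"hello -- world", "hello-world", "hello -- world -- bye", "hello--worldbye"};
        for (String s : samples) {
            System.out.println(s + " => delimiter: " + splitOnDelimiter(s)
                    + ", char class: " + charClassTokens(s));
        }
    }
}
```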

  • Email Regular Expression with a String.Match()

    I'm currently using a RichTextEditor for a user to build HTML
    for a site. However, I want the application to scan for emails and
    encode them so they are protected from spam bots when they go to
    the live site. I've written a regular expression to find an email
    and it seems to work, but it only returns one email at a time from
    the string. I have had to revert to a while loop to traverse the
    string until I'm satisfied. I don't particularly like that method
    and would like to just do one String.match() query to retrieve all
    of the emails. Can anyone see something here that I'm missing?

    Try adding the global flag (g):
    var emailPattern:RegExp =
    /[a-z][\w.-]+@\w[\w.-]+\.[\w.-]*[a-z][a-z]+/g;
    TS
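
For comparison only: java.util.regex has no global flag, so the usual way to collect every match there is to call find() in a loop. The sketch below reuses the pattern from the reply; the class name, method name and sample text are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailFinder {
    // Same pattern as in the reply, written as a Java string literal.
    private static final Pattern EMAIL =
            Pattern.compile("[a-z][\\w.-]+@\\w[\\w.-]+\\.[\\w.-]*[a-z][a-z]+");

    // Each successful find() resumes after the previous match,
    // so the loop visits every non-overlapping match once.
    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("Write to bob@example.com or alice@test.org today"));
    }
}
```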

  • Remove regular expression from a string

    Hello,
    I have a string like this
    @1test;'"{input+
    Please help me to remove special characters from the string.

A: remove regular expression from a string

Hi Krishna,
DATA : str TYPE STRING VALUE '@1test;"{}]input+',
            char,
            length TYPE i,
            index TYPE i.
length = STRLEN( str ).
WHILE length > index.
  char = str+index(1).
  WRITE char.
  if char CA '+-*/!`@#$%^&()_=[]{};'.               " Add/Remove here to include numbers
    REPLACE ALL OCCURRENCES OF char in str WITH ''.
    REPLACE ALL OCCURRENCES OF '"' in str WITH ''.  " characters "{}[] are not comparable
    REPLACE ALL OCCURRENCES OF '{' in str WITH ''.
    REPLACE ALL OCCURRENCES OF '}' in str WITH ''.
    REPLACE ALL OCCURRENCES OF '[' in str WITH ''.
    REPLACE ALL OCCURRENCES OF ']' in str WITH ''.
    length = STRLEN( str ).
    ENDIF.
  add 1 to index.
ENDWHILE.
WRITE str.
Add or remove special characters in '+-*/!`@#$%^&()_=[]{};' in the IF condition as per your requirement.
Hope it meets your requirement.
Do not forget to mark helpful/correct if my answer is useful.
Thanks,
Karthik
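
The same idea can be written as a single regular-expression replacement. The sketch below is in Java, not ABAP, and assumes that "special characters" means everything except ASCII letters and digits; adjust the character class if other characters should be kept. The names are hypothetical.

```java
public class StripSpecials {
    // Removes every character that is not an ASCII letter or digit.
    static String keepLettersAndDigits(String s) {
        return s.replaceAll("[^A-Za-z0-9]", "");
    }

    public static void main(String[] args) {
        // The sample string from the question.
        System.out.println(keepLettersAndDigits("@1test;'\"{input+"));
    }
}
```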


  • Regular expression - splitting a string

    I have a long string that I'm trying to split into a series of substrings. I would like each of the substrings to start with "TTL". I'm fairly certain that I'm missing something very basic here. I've attached my code, which yields NO GROUPS. I didn't see another method for returning the text that the regular expression matched.
    String finalLongstring="TTL1,clip1+TTL2+clip3,TTL4,clip4,TTL5,clip5+TTL6+"+
       "clip6+TTL7+clip7,TTL8,clip8,TTL9,clip9,TTL10,clip10,TTL11,clip11,TTL12,clip12,"+
       "TTL13,clip13+TTL14+clip14,TTL15,clip15,TTL16,clip16,TTL17,clip17,"+
       "TTL18,clip18,TTL19,clip19,TTL20,clip20,TTL21,clip21,TTL22,clip22,"+
       "TTL23,clip23,TTL24,clip24,TTL25,clip25,TTL26,clip26,TTL27,clip27,"+
       "TTL28,clip28,TTL29,clip29";
    List<String> chapters = new ArrayList<String>();
              chapters.clear();
              Pattern chapter=null;
              chapter=Pattern.compile("(TTL\\d+([+,]|clip\\d+)*)");
              //                      ||    |  |  | |  |    | |
              //                      ||    |  |  | |  |    | Repeat (commas pluses and clips group) 0 or more times
              //                      ||    |  |  | |  |    one or more digits following 'clip'
              //                      ||    |  |  | |  clip
              //                      ||    |  |  | or..
              //                      ||    |  |  plus or comma symbols
              //                      ||    |  group the +, and clip information together
              //                      ||    one or more digits
              //                      |Match clips starting with TTL
              //                      |
              Matcher cp = chapter.matcher(finalLongstring);  //NO MATCHES!!
              String [] temp = chapter.split(finalLongstring);  //temp =EMPTY STRING ARRAY
              do{
                   String chapterPlus=cp.group(1);
                   if(cp.hitEnd()){break;}
                   chapters.add(chapterPlus);
              }while(true);
    Thanks in advance for the help.
    Icesurfer

    The main reason your matcher didn't work is that you never told it to do anything. You have to call one of the methods matches(), find() or lookingAt(), and make sure it returns true, before you can use the group() methods. When I did that, your regex worked, but then I modified it to demonstrate a better use of capturing groups, as shown here:
    import java.util.regex.*;
    public class Test
    {
      public static void main(String... args)
      {
        String str="TTL1,clip1+TTL2+clip3,TTL4,clip4,TTL5,clip5+TTL6+clip6+"+
           "TTL7+clip7,TTL8,clip8,TTL9,clip9,TTL10,clip10,TTL11,clip11,TTL12,clip12,"+
           "TTL13,clip13+TTL14+clip14,TTL15,clip15,TTL16,clip16,TTL17,clip17,"+
           "TTL18,clip18,TTL19,clip19,TTL20,clip20,TTL21,clip21,TTL22,clip22,"+
           "TTL23,clip23,TTL24,clip24,TTL25,clip25,TTL26,clip26,TTL27,clip27,"+
           "TTL28,clip28,TTL29,clip29";
        // No trailing separator is required, so the final pair (TTL29, clip29) is matched too.
        Pattern p = Pattern.compile("(TTL\\d+)[+,](clip\\d+)");
        Matcher m = p.matcher(str);
        // find() advances to the next match; group() is only valid after it returns true.
        while (m.find())
        {
          System.out.printf("%6s %s%n", m.group(1), m.group(2));
        }
      }
    }
    The reason your split() attempt didn't work is that the regex matched all of the text; the split() regex is supposed to match the parts you don't want. In fact, it did split the text, creating a list of empty strings, but then it threw them all away, because split() discards trailing empty fields by default.
    Finally, the hitEnd() method is not appropriate in this context. It and the requireEnd() method were added to support the Scanner class in JDK 1.5. If you want to see how they work, look at the source code for Scanner, but for now, just classify them as an advanced topic. When you're iterating through text with the find() method, you stop when find() returns false, plain and simple.
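
To make the split() behaviour concrete, here is a small sketch. It assumes that a "chapter" is everything from one TTL up to the separator in front of the next TTL; that reading of the question, and the class and method names, are assumptions. The regex matches only the separator to discard, and a lookahead keeps the following "TTL" in the next piece.

```java
import java.util.Arrays;

public class SplitDemo {
    // Split at each '+' or ',' that is directly followed by "TTL".
    // The separator is consumed; the lookahead leaves "TTL" in the next piece.
    static String[] chapters(String s) {
        return s.split("[+,](?=TTL)");
    }

    public static void main(String[] args) {
        String s = "TTL1,clip1+TTL2+clip3,TTL4,clip4";
        System.out.println(Arrays.toString(chapters(s)));

        // Trailing empty strings are dropped by default...
        System.out.println("a,b,,".split(",").length);
        // ...and kept when a negative limit is passed.
        System.out.println("a,b,,".split(",", -1).length);
    }
}
```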

  • Regular Expression with comma and encapsulated charaters

    Would appreciate some help. Looking for a regular expression to remove commas from encapsulated text as follows
    For example
    - Input
    1,"This is a string, need to remove the comma",Another text string,10
    - Required output
    1,"This is a string; need to remove the comma",Another text string,10
    Have tried to use the REGEXP_REPLACE but could not grasp the pattern matching.
    Thanks John

    John Heaton wrote:
    > Thanks for the solution, this works great for a single field encapsulated by " and containing ,. I am parsing several different file definitions so it would need to cascade through the string for an undetermined number of times and replace all occurrences.
    Then try (performance-wise) the MODEL solution:
    {code}
    with t as (
               select '1,"This is a string, need to remove the comma",Another text string,10' txt from dual union all
               select '1,"remove this comma,",Another text string,10,"remove this comma,",xxx,"remove this comma,",11' txt from dual
              )
    select txt_original,
           txt
      from t
      model
        partition by(row_number() over(order by 1) p)
        dimension by(1 rn)
        measures(txt txt_original,txt,0 quote)
        rules
          iterate(
                  1e9
                 )
          until(
                iteration_number + 1 = length(txt[1])
               )
          (
           quote[1] = case substr(txt[1],iteration_number + 1,1)
                        when '"' then quote[1] + 1
                        else quote[1]
                      end,
           txt[1] = case substr(txt[1],iteration_number + 1,1)
                      when ',' then case mod(quote[1],2)
                                      when 1 then substr(txt[1],1,iteration_number) || ';' || substr(txt[1],iteration_number + 2)
                                      else txt[1]
                                    end
                      else txt[1]
                    end
          )
    /
    TXT_ORIGINAL TXT
    1,"This is a string, need to remove the comma",Another text string,10 1,"This is a string; need to remove the comma",Another text string,10
    1,"remove this comma,",Another text string,10,"remove this comma,",xxx,"remove this comma,",11 1,"remove this comma;",Another text string,10,"remove this comma;",xxx,"remove this comma;",11
    SQL>
    {code}
    SY.
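
The rule set above amounts to one pass over the string: count double quotes, and turn a comma into a semicolon whenever the count so far is odd. The same logic is sketched below in Java for readers who find the MODEL clause hard to follow. The names are hypothetical, and escaped or doubled quotes inside a field are deliberately not handled.

```java
public class QuotedCommas {
    // Replaces every comma that lies between a pair of double quotes with ';'.
    static String replaceCommasInQuotes(String line) {
        StringBuilder out = new StringBuilder(line.length());
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // Every quote toggles the state: odd count means "inside a quoted field".
                inQuotes = !inQuotes;
            }
            out.append(c == ',' && inQuotes ? ';' : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceCommasInQuotes(
                "1,\"This is a string, need to remove the comma\",Another text string,10"));
    }
}
```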

  • Problem in creating a Regular Expression with gnu

    Hi All,
    I am trying to create a regular expression using the gnu package API:
    gnu.regexp.RE;
    I need to validate the browser's (MSIE) userAgent through my regular expression.
    The userAgent looks like this: First one ==> Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
    I wrote a regular expression like this:
    Mozilla.*(.*)\\s*(.*)compatible;\\s*MSIE(.*)\\s*(.*)([0-9]\\.[0-9])(.*);\\s*(.*)Windows(.*)\\s*NT(.*)\\s*5.0(.*)
    Actually, this validates my userAgent and returns true. My problem is that it also returns true if the userAgent has more words at the end after Windows NT 5.0, like: Second one ==> Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0) Testing
    I want the regular expression pattern to validate the first one and return true for it, and to return false for the second one.
    my code is:
    import gnu.regexp.*;
    import gnu.regexp.REException;
    public class TestRegexp
    {
    public static boolean getUserAgentDetails(String userAgent) throws REException
    {
         boolean isvalid = false;
         RE regexp = new RE("Mozilla.*(.*)\\s*(.*)compatible;\\s*MSIE(.*)\\s*(.*)([0-9]\\.[0-9])(.*);\\s*(.*)Windows(.*)\\s*NT(.*)\\s*5.0(.*)");
         isvalid = regexp.isMatch(userAgent);
         return isvalid;
    }
    public static void main(String a[]) throws REException
    {
         String userAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)";
         boolean regoutput = getUserAgentDetails(userAgent);
         System.out.println("***** regoutput is ****** " + regoutput);
    }
    }
    Please help me in solving this.
    Thanks in advance,
    krishna

    Of course, I can do the comparison with simple string matching,
    but the problem is that I want to support all MSIE versions from 5.0 onwards, so the IE version in the userAgent will differ, like MSIE 6.0 or MSIE 5.5.
    Anyway, I will try with StringTokenizer once.
    It seems that will do my work.
    Thanks,
    krishna
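
The posted pattern ends in `(.*)`, which accepts any trailing text, so the second userAgent passes as well. A tighter pattern that escapes the literal parentheses and dots, combined with a whole-input match, rejects it. The sketch below uses java.util.regex rather than the gnu.regexp package from the question, and it assumes the userAgent has exactly the layout shown in the post; both choices, and all names, are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UserAgentCheck {
    // Literal parentheses and dots are escaped; the MSIE major and minor
    // version numbers are captured so that 5.0 and later can be accepted.
    private static final Pattern MSIE = Pattern.compile(
            "Mozilla/\\d\\.\\d \\(compatible; MSIE (\\d+)\\.(\\d+); Windows NT 5\\.0\\)");

    static boolean isSupported(String userAgent) {
        Matcher m = MSIE.matcher(userAgent);
        // matches() succeeds only if the whole input fits the pattern,
        // so anything after the closing parenthesis is rejected.
        return m.matches() && Integer.parseInt(m.group(1)) >= 5;
    }

    public static void main(String[] args) {
        System.out.println(isSupported("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"));
        System.out.println(isSupported("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0) Testing"));
    }
}
```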

  • Regular expressions with boolean connectives (AND, OR, NOT) in Java?

    I'd like to use regular expression patterns that are made up of simple regex patterns connected via AND, OR, or NOT operators, in order to do some keyword-style pattern matching.
    A pattern could look like this:
    (.*Is there.*) && (.*library.*) && !((.*badword.*) || (^$))
    Is there any Java regex library that allows these operators?
    I know that in principle these operators should be available, since Regular languages are closed under union, intersection, and complement.

    > AND is implicit,
    > xy -- means x AND y
    That's not what I need, though, since this is just concatenation of a regex.
    Thus, /xy/ would not match the string "a y a x", because y precedes x.
    > So it has to contain both x and y, but they could be in any order?
    > You can't do that easily or generally.
    > "x.*y|y.*x" will work here, but obviously it will get ugly factorially fast as you add more terms.
    You got that right: AND means the regex operands can appear in any order.
    That's why I'm looking for some regex library that does all this ugly work for me. Again, from a theoretical point of view, it IS possible to express the described semantics of AND with regular expressions, although they will get rather obfuscated.
    Unless somebody has done something similar in Java (e.g., for C++, there's Ragel: http://www.cs.queensu.ca/~thurston/ragel/), I will probably use some finite-state-machine libraries and compile the complex regexes into automata (which can be minimized using well-defined operations on FSMs).
    > You'd probably just be better off doing multiple calls to matches() or whatever.
    Yes, that's another possibility: do the boolean operators in Java itself.
    > Of course, if you really are just looking for literals, then you can just use str.contains(a) && !str.contains(b) && (str.contains(c) || str.contains(d)). You don't seem to need regex--at least not from your example.
    OK, bad example, I do have "real" regexps in there :)
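
Both ideas from this exchange can be sketched for the example pattern: combining separate Pattern objects with Java's own boolean operators, and folding the conditions into one expression with lookaheads (a java.util.regex feature, not a construct of classical regular expressions). The sub-patterns here are plain literals taken from the example; real sub-expressions would go in their place. The names are hypothetical.

```java
import java.util.regex.Pattern;

public class BooleanRegex {
    private static final Pattern IS_THERE = Pattern.compile("Is there");
    private static final Pattern LIBRARY = Pattern.compile("library");
    private static final Pattern BADWORD = Pattern.compile("badword");

    // Lookahead form: each (?=...) is an AND term, (?!...) is a NOT term,
    // and the final .+ rejects the empty string. (?s) lets '.' cross line breaks.
    private static final Pattern COMBINED =
            Pattern.compile("(?s)(?=.*Is there)(?=.*library)(?!.*badword).+");

    // (.*Is there.*) && (.*library.*) && !((.*badword.*) || (^$)), evaluated in Java.
    static boolean matchesInJava(String s) {
        return IS_THERE.matcher(s).find()
                && LIBRARY.matcher(s).find()
                && !(BADWORD.matcher(s).find() || s.isEmpty());
    }

    static boolean matchesWithLookaheads(String s) {
        return COMBINED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"Is there a regex library?", "library: Is there one?",
                "Is there a badword library?", ""};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + matchesInJava(s) + " / " + matchesWithLookaheads(s));
        }
    }
}
```

The first form keeps each sub-pattern readable and is easier to extend; the second is convenient when an API accepts only a single regex.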

  • Regular expressions with multi character separator

    I have data like the
    where |`| is the separator for distinguishing two fields of data. I am having trouble writing a regular expression to display the data correctly.
    Connected to:
    Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
    With the Partitioning, OLAP, Data Mining and Real Application Testing options
    SQL> declare
      2  l_string varchar2 (200) :='123` 456 |`|789 10 here|`||223|`|5434|`}22|`|yes';
      3  v varchar2(40);
      4  begin
      5  v:=regexp_substr(l_string, '[^(|`|)]+', 1, 1);
      6  dbms_output.put_line(v);
      7  v:=regexp_substr(l_string, '[^(|`|)]+', 1, 2);
      8  dbms_output.put_line(v);
      9  v:=regexp_substr(l_string, '[^(|`|)]+', 1, 3);
    10  dbms_output.put_line(v);
    11  v:=regexp_substr(l_string, '[^(|`|)]+', 1, 4);
    12  dbms_output.put_line(v);
    13  v:=regexp_substr(l_string, '[^(|`|)]+', 1, 5);
    14  dbms_output.put_line(v);
    15  end;
    16  /
    123
    456
    789 10 here
    223
    5434
    I need it to display
    123` 456
    789 10 here
    |223
    5434|`}22
    yes
    I am not sure how to handle multi-character separators in data using regular expressions.
    Edited by: Clearance 6`- 8`` on Apr 1, 2011 3:35 PM
    Edited by: Clearance 6`- 8`` on Apr 1, 2011 3:37 PM

    Hi,
    Actually, using non-greedy matching, you can do what you want with regular expressions:
    VARIABLE     l_string     VARCHAR2 (100)
    EXEC  :l_string := '123` 456 |`|789 10 here|`||223|`|5434|`}22|`|yes'
    SELECT     LEVEL
    ,     REPLACE ( REGEXP_SUBSTR ( '|`|' || REPLACE ( :l_string
                                     , '|`|'
                                     , '|`||`|'
                                     ) || '|`|'
                        , '\|`\|.*?\|`\|'
                        , 1
                        , LEVEL
                        )
               , '|`|'
               )     AS ITEM
    FROM     dual
    CONNECT BY     LEVEL     <= 7
    ;
    Output:
    LEVEL ITEM
        1 123` 456
        2 789 10 here
        3 |223
        4 5434|`}22
        5 yes
        6
        7
    Here's how it works:
    The pattern
    ~.*?~
    is non-greedy; it matches the smallest possible string that begins and ends with a '~'. So
    REGEXP_SUBSTR ('~SHALL~I~COMPARE~THEE~', '~.*?~', 1, 1)
    returns '~SHALL~'. However,
    REGEXP_SUBSTR ('~SHALL~I~COMPARE~THEE~', '~.*?~', 1, 2)
    returns '~COMPARE~'. Why not '~I~'? Because the '~' between 'SHALL' and 'I' was part of the 1st match, so it can't be part of the 2nd match. So the first thing we have to do is double the delimiters; that's what the inner REPLACE does. Then we add delimiters to the beginning and end of the list. Once we've prepared the string like that, we can use the non-greedy REGEXP_SUBSTR to bring back the delimited items, with a delimiter at either end. We don't want those delimiters, so the outer REPLACE removes them.
    I'm not sure this is any better than Sri's solution.
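
For comparison, in a language whose split function takes a regex, the multi-character separator only needs to be quoted so that `|` is treated literally. This is a sketch in Java, not Oracle SQL, with hypothetical names; it returns the fields untrimmed, so the first one keeps its trailing space.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class MultiCharSplit {
    // Pattern.quote turns the three-character separator |`| into a literal,
    // so '|' is not read as the regex alternation operator.
    private static final Pattern SEPARATOR = Pattern.compile(Pattern.quote("|`|"));

    static String[] fields(String data) {
        return SEPARATOR.split(data);
    }

    public static void main(String[] args) {
        String data = "123` 456 |`|789 10 here|`||223|`|5434|`}22|`|yes";
        System.out.println(Arrays.toString(fields(data)));
    }
}
```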

  • Regular Expressions with Java Regex

    Hi,
    I'm playing around with regex and there's something I can't get to work. What I need is to capture the words between 2 other words, and the captured words have to be longer than 5 characters, so for example:
    Pattern "Just testing on something with regular expressions" and suppose I'll try to match all the words between "testing" and "regular", then only the word "something" should come out because "on" and "with" are not larger than 5 chars.
    Now I'm quite new to regexps and I know that ((?<=\btesting\b).*(?=\bregular\b)) will return " on something with "
    But I can't seem to come up with an expression that would only output the word "something". I've tried a few expressions like ((?<=\btesting\b)((?:[\s\w{1,3}])*(\b\w{4,}\b)*(?:[\s\w{1,3}])*)*(?=\bregular\b)) which also returns " on something with " The others I tried would either return the whole " on something with " or return "Not Found!"
    Does anyone have a tip for me? I'm well aware that it's not too hard to do something like this in Java, but I'm really looking to study regular expressions and would like to accomplish this using a regular expression.
    The Java program I use is the following:
    C:\Program Files\Java\jdk1.5.0_16\bin>java RegexTest "((?<=\btesting\b).*(?=\bregular\b))" "Just testing on something with regular expressions"
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    public class RegexTest {
         public static void main(String[] args) {
              Pattern RegexCompile = Pattern.compile(args[0]);
              Matcher m = RegexCompile.matcher(args[1]);
              boolean found = m.find(); // Perhaps there's another function to find () that would do the job?
              if (found)
                   System.out.println(m.group()); // Perhaps group() is not the right function for this case?
              else
                   System.out.println("Not Found!");
         }
    }
    Edited by: dli2k3 on Sep 19, 2008 11:32 AM
    Edited by: dli2k3 on Sep 19, 2008 11:33 AM

    You're talking about a two-stage operation: find everything between those two words, then filter out anything that's less than five letters long. There's no single regex that will accomplish all that in one step.
    By the way, please use {code} tags when you post source code.
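
The two stages described in the reply could look like the sketch below: first capture the text between the two marker words, then keep only the words longer than the given length. The class, method and parameter names are invented for the example, and "word" is taken to mean a whitespace-separated token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordsBetween {
    static List<String> longWordsBetween(String text, String first, String last, int longerThan) {
        List<String> result = new ArrayList<>();
        // Stage 1: capture everything between the two marker words (non-greedy).
        Pattern span = Pattern.compile(
                "\\b" + Pattern.quote(first) + "\\b(.*?)\\b" + Pattern.quote(last) + "\\b");
        Matcher m = span.matcher(text);
        if (m.find()) {
            // Stage 2: filter the captured words by length.
            for (String word : m.group(1).trim().split("\\s+")) {
                if (word.length() > longerThan) {
                    result.add(word);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(longWordsBetween(
                "Just testing on something with regular expressions", "testing", "regular", 5));
    }
}
```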

  • Regular expressions with dates and multiple matches

    I am currently attempting to automate modifying start and end dates within a .config file via PowerShell, but I am having issues identifying the regular expression for the end date section since both are on the same line in the file. Below is the string that I want to change.
    Sometimes the dates are blank and sometimes the dates are filled in.
    Dates are always in the same format (yyyy-MM-dd hh:mm).
    I also want to note that there are multiple instances of 'StartDate="" EndDate=""' for other applications throughout the same config file so I cannot limit the expression to not include the App name. 
    I do not want to limit the search to a line number since there are instances where admins will add an extra space in the config file that may throw off the line number.
    I want to replace the dates, or lack thereof, in their respective spots on the line below via PowerShell:
     <App name="TestApp" StartDate="2012-03-22 13:30" EndDate="">
    I am successfully able to use 
    $startRegex = '(?<=<App name="TestApp" StartDate=")([^"]*)'
    to replace the StartDate but I can't seem to single out the EndDate with regular expression. What expression can I use to have it ignore what is in the quotations after StartDate and only pay attention to the EndDate value?
    Below is a snippet: 
    $path = 'd:\inetpub\website\app.config'
    $startRegex = '(?<=<App name="TestApp" StartDate=")([^"]*)'
    $starttime = (get-date).ToString("yyyy-MM-dd hh:mm")
    (gc $path) -replace $startregex, $starttime | set-content $path
    I want to accomplish the same for EndDate.
    Thanks in advance!

    If you do this with XML it will be painless and less prone to error.
    $xml=[xml](Get-Content $filename)
    $n=$xml.SelectSingleNode('//App[@name="TestApp"]')
    $n.StartDate=$newstartdate
    $n.EndDate=$newenddate
    $xml.Save($filename)
    \_(ツ)_/
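
    If a regex is still wanted for the EndDate value, one option is to capture everything up to the opening quote of EndDate and put it back in the replacement, so the StartDate value is skipped rather than inspected. The sketch below is in Java only to keep one language for the examples on this page; the class and method names are hypothetical. Java does not allow an unbounded [^"]* inside a lookbehind, which is why a named capturing group is used instead.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EndDateReplace {
    // "prefix" captures the tag up to and including EndDate=" for TestApp only;
    // the trailing [^"]* is the current EndDate value (possibly empty).
    private static final Pattern END_DATE = Pattern.compile(
            "(?<prefix><App name=\"TestApp\" StartDate=\"[^\"]*\" EndDate=\")[^\"]*");

    static String setEndDate(String line, String newEnd) {
        // ${prefix} restores the captured text; quoteReplacement guards against
        // '$' or '\' appearing in the new value.
        return END_DATE.matcher(line)
                .replaceAll("${prefix}" + Matcher.quoteReplacement(newEnd));
    }

    public static void main(String[] args) {
        String line = "<App name=\"TestApp\" StartDate=\"2012-03-22 13:30\" EndDate=\"\">";
        System.out.println(setEndDate(line, "2012-03-23 09:00"));
    }
}
```

    Lines for other applications are left unchanged because the pattern includes the App name.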

  • How to use regular expression to find string

    Hi,
    Does anyone know how to get all the digits from the string "Alerts 4520 ( 227550 )  (  98 Available  )" using a regular expression? Thanks.
    br, Andrew

    Liu,
    You can use the regex
    \d+
    If you are using the CL_ABAP_REGEX class, then:
    report  zars.
    data: regex   type ref to cl_abap_regex,
          matcher type ref to cl_abap_matcher,
          match   type c length 1.
    create object regex exporting pattern = '\d+'
                                  ignore_case = ''.
    matcher = regex->create_matcher( text = 'Test123tes456' ).
    match = matcher->match( ).
    write match.
    You can find more details regarding REGEX and POSIX examples here
    http://www.regular-expressions.info/tutorial.html
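
    For comparison, here is a minimal sketch in Java (not ABAP) of what \d+ extracts from the string in the question. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DigitRuns {
    // \d+ matches each maximal run of consecutive digits.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    static List<String> digitRuns(String text) {
        List<String> runs = new ArrayList<>();
        Matcher m = DIGITS.matcher(text);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        // prints [4520, 227550, 98]
        System.out.println(digitRuns("Alerts 4520 ( 227550 )  (  98 Available  )"));
    }
}
```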

  • Regular expression - find if string does NOT contain text....

    I have a string that I want to tokenize. The string can contain basically anything. I want to produce tokens for each "word" found, and for each "<=" or "," found. There does not need to be whitespace around a "<=" or a "," to consider it a token. So for example:
    joe schmoe<=jack, jane
    should become
    joe
    schmoe
    <=
    jack
    jane
    As a constraint, I do not want to use StringTokenizer at all, as "its use is discouraged in new code". http://java.sun.com/j2se/1.4.2/docs/api/java/util/StringTokenizer.html
    Here's the code I plan on using for this:
        public String[] getWords(String input) {
            Matcher matcher = WORD_PATTERN.matcher(input);
            ArrayList<String> words = new ArrayList<String>();
            while (matcher.find()) {
                words.add(matcher.group());
            }
            return (String[]) words.toArray(new String[0]);
        }
    The trick, though, is coming up with a working regular expression. The closest I've found yet is:
    ([^\s]|^(,)|^(<=))+|,|<=
    but that produces the following:
    joe
    schmoe<=jack,
    jane
    I think what I need is to be able to find if a string does not contain the substring "<=" or "," using a regular expression. Anyone know how to do this, or another way to do this using regular expressions?

    Try:
    /*
     * Tokenizer.java
     * version 1.0
     * 01/06/2005
     */
    package samples;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    /**
     * @author notivago
     */
    public class StrangeTokenizer {
        public static void main(String[] args) {
            String text = "joe schmoe<=jack, jane";
            Pattern pattern = Pattern.compile( "((?:<=)|(?:,)|(?:\\w+))");
            Matcher matcher = pattern.matcher(text);
            while( matcher.find() ) {
                System.out.println( "Item: " + matcher.group(1) );
            }
        }
    }
    May the code be with you.
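
    Since the question says a word "can contain basically anything", \w+ may be too narrow. A negative lookahead is one way to express "a run of characters that does not contain <= or a comma". The sketch below is only an illustration: the class name and the exact pattern are assumptions, not taken from the thread. It emits the comma as its own token, as the stated rule asks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadTokenizer {
    // Alternatives, tried in order: the operator "<=", a comma, or a run of
    // characters that are not whitespace or commas and do not start a "<=".
    private static final Pattern WORD_PATTERN =
            Pattern.compile("<=|,|(?:(?!<=)[^\\s,])+");

    static List<String> getWords(String input) {
        List<String> words = new ArrayList<>();
        Matcher matcher = WORD_PATTERN.matcher(input);
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [joe, schmoe, <=, jack, ,, jane]
        System.out.println(getWords("joe schmoe<=jack, jane"));
    }
}
```

    A lone "<" that is not followed by "=" stays inside the surrounding word, because the lookahead only rejects positions where "<=" begins.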

  • Writing Regular Expression with a character ^, too difficult

    I want to change "^1Mandrake ^3Style ^4DM" this sentence to "Mandrake Style DM".
    (^ with number means color code)
    So..I used String.replaceAll() method with regular expression.
    But however hard I try, I can't find any solution for this.
    In PHP I could use \^ as a ^ character, but Java doesn't support \^.
    How can I solve this problem?

    Use \\^ in your regex (you have to escape the backslash, too).
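
    As a minimal sketch of that advice, assuming each color code is a caret followed by a single digit (the class and method names are made up for illustration):

```java
public class StripColorCodes {
    static String strip(String s) {
        // In Java source, "\\^" is the two-character regex \^ (a literal caret)
        // and "\\d" is the regex \d (one digit).
        return s.replaceAll("\\^\\d", "");
    }

    public static void main(String[] args) {
        // prints Mandrake Style DM
        System.out.println(strip("^1Mandrake ^3Style ^4DM"));
    }
}
```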

  • Help with unicode String?

    Hi there,
    I have a file that I need to read in and process. It took a while for me to realise it was Unicode ("text from my file" was printing out as "t e x t f r o m m y f i l e"). Anyway, I got there in the end using:
    InputStreamReader isr = new InputStreamReader(new FileInputStream(filename), "UTF-16");
    dataSource = new BufferedReader(isr);
    My problem now is that I'm splitting the line (which is a comma-separated list of numbers) and converting to ints:
    String line = dataSource.readLine();
    String[] items = line.split(",");
    int[] values = new int[12];
    for(int i = 1; i < items.length; i++)
       values[i-1] = Integer.parseInt(items[i]);
    Values is what I expect, a list of numbers, but items[] is being set to 0. Is this something to do with Unicode? Must admit, I've never given the character encoding any thought up until now.
    Any help would be really appreciated.
    Thanks,
    Steve

    Sorry, found it. It was actually a buffer issue. For the record, just because I'm printing output in the middle of the loop, doesn't mean the value exists to be printed by the time System.out.println gets to it (my code was creating an exception for an unrelated reason a few lines down)
    Thanks for your responses.
    Steve
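
    For reference, a self-contained sketch of reading a UTF-16 file and parsing a comma-separated line into ints. It writes its own temporary file so it can be run as is. The class and method names are hypothetical, and unlike the loop in the question it parses every item starting at index 0.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class Utf16Numbers {
    static int[] parseLine(String line) {
        String[] items = line.split(",");
        int[] values = new int[items.length];
        for (int i = 0; i < items.length; i++) {
            // trim() tolerates spaces around the commas
            values[i] = Integer.parseInt(items[i].trim());
        }
        return values;
    }

    static int[] readFirstLine(Path file) throws IOException {
        // The UTF-16 decoder consumes a leading byte-order mark if present.
        try (BufferedReader reader =
                     Files.newBufferedReader(file, StandardCharsets.UTF_16)) {
            return parseLine(reader.readLine());
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("numbers", ".txt");
        Files.write(tmp, "10,20,30\n".getBytes(StandardCharsets.UTF_16));
        // prints [10, 20, 30]
        System.out.println(Arrays.toString(readFirstLine(tmp)));
        Files.delete(tmp);
    }
}
```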
