Java Regular Expressions and Pattern

I have a file that i first want to get all the lines that match a given pattern. Then from these lines that match i want to extract two values.
Example line for the pattern to match
INFO | jvm 1 | 2006/11/07 15:14:09 | INFO | Tue Nov 07 15:14:09 CET 2006 | XLDB PPS Data Dumper: MESSAGE:- 406 Processing .. '[ /opt/nexus/horizon/raw_data/network/pp_CE01S4H_sta_20050703T015717_SYDP3001_546.bdf ]'
So all the lines that are like these i want to extract two variables
2006/11/07 15:14:09
and
/opt/nexus/horizon/raw_data/network/pp_CE01S4H_sta_20050703T015717_SYDP3001_546.bdf
so i can store these variables in a database.
Can someone help me with writing the pattern to match and the regular express to extract? Also if anyone else has a better way of doing this i am all ears and i have a lot of log files to go through.

import java.util.regex.*;
class Main
  public static void main(String[] args)
    String txt="INFO | jvm 1 | 2006/11/07 15:14:09 | INFO | Tue Nov 07 15:14:09 CET 2006 | XLDB PPS Data Dumper: MESSAGE:- 406 Processing .. '[ /opt/nexus/horizon/raw_data/network/pp_CE01S4H_sta_20050703T015717_SYDP3001_546.bdf ]'";
    String re1=".*?";     // Non-greedy match on filler
    String re2="((?:2|1)\\d{3}(?:-|\\/)(?:(?:0[1-9])|(?:1[0-2]))(?:-|\\/)(?:(?:0[1-9])|(?:[1-2][0-9])|(?:3[0-1]))(?:T|\\s)(?:(?:[0-1][0-9])|(?:2[0-3])):(?:[0-5][0-9]):(?:[0-5][0-9]))";     // Time Stamp 1
    String re3=".*?";     // Non-greedy match on filler
    String re4="((?:\\/[\\w\\.]+)+)";     // Unix Path 1
    Pattern p = Pattern.compile(re1+re2+re3+re4,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    Matcher m = p.matcher(txt);
    if (m.find())
        String timestamp1=m.group(1);
        String unixpath1=m.group(2);
        System.out.print("("+timestamp1.toString()+")"+"("+unixpath1.toString()+")"+"\n");
}

Similar Messages

  • Regular expression and pattern matching/replacing

    I have a list of key words. It has around 1000 key word now but can grow to 5000 keywords.
    My web application displays lot of texts which are stored in the database. My requirement is to scan each text for the occurance of any of the above keywords. If any keyword is present I have to replace that with some custom values, before showing it to the user.
    I was thinking of using using regular expression for replacing the keyword in the text using matcher.replaceAll method as follows:
    Pattern pattern = Pattern.compile(patternStr);
    Matcher matcher = pattern.matcher(inputStr);
    String output = matcher.replaceAll(replacementStr);
    But My pattern string will have around 5000 keywords with the 'OR' Logical Operator like- keyword1| keyword2 I keyword3 | ..........
    Will such a big pattern string adversly affect the performance? What can I do to speed up the performance? (Since my keyword list is not static i would prefer to do the replacement just before showing the text to the user)
    Any suggestions are most welcome.

    I don't think a pure regex approach would be that slow, but it would be a maintenance nightmare. I think a combined regex/table-lookup approach would be best: use a regex to identify potential keywords, then look them up in the table to confirm. For instance, to find all Java keywords you could use the regex "\\b[a-z]{2,12}+\\b" to filter out anything that can't possibility be a keyword.
    What are you going to replace the keywords with? Will it vary depending on which keyword is found? If so, you'll have to use a table--and you won't be able to use the replaceAll method, because it can't handle dynamically generated replacement values. You would have to use the lower-level appendReplacement and appendTail method instead.

  • Logical AND in Java Regular Expressions

    I'm trying to implement logical AND using Java Regular Expressions.
    I couldn't figure out how to do it after reading Java docs and textbooks. I can do something like "abc.*def", which means that I'm looking for strings which have "abc", then anything, then "def", but it is not "pure" logical AND - I will not find "def.*abc" this way.
    Any ideas, how to do it ?
    Baken

    First off, looks like you're really talking about an "OR", not an "AND" - you want it to match abc.*def OR def.*abc right? If you tried to match abc.*def AND def.*abc nothing would ever match that, as no string can begin with both "abc" and "def", just like no numeric value can be both 2 and 5.
    Anyway, maybe regex isn't the right tool for this job. Can you not simply programmatically match it yourself using String methods? You want it to match if the string "starts with" abc and "ends with" def, or vice-versa. Just write some simple code.

  • SQL Injection and Java Regular Expression: How to match words?

    Dear friends,
    I am handling sql injection attack to our application with java regular expression. I used it to match that if there are malicious characters or key words injected into the parameter value.
    The denied characters and key words can be " ' ", " ; ", "insert", "delete" and so on. The expression I write is String pattern_str="('|;|insert|delete)+".
    I know it is not correct. It could not be used to only match the whole word insert or delete. Each character in the two words can be matched and it is not what I want. Do you have any idea to only match the whole word?
    Thanks,
    Ricky
    Edited by: Ricky Ru on 28/04/2011 02:29

    Avoid dynamic sql, avoid string concatenation and use bind variables and the risk is negligible.

  • "Match Regular Expression" and "Match Pattern" vi's behave differently

    Hi,
    I have a simple string matching need and by experimenting found that the "Match Regular Expression" and "Match Pattern" vi's behave somewhat differently. I'd assume that the regular expression inputs on both would behave the same. A difference I've discovered is that the "|" character (the "vertical bar" character, commonly used as an "or" operator) is recognized as such in the Match Regular Expression vi, but not in the Match Pattern vi (where it is taken literally). Furthermore, I cannot find any documentation in Help (on-line or in LabVIEW) about the "|" character usage in regular expressions. Is this documented anywhere?
    For example, suppose I want to match any of the following 4 words: "The" or "quick" or "brown" or "fox". The regular expression "The|quick|brown|fox" (without the quotes) works for the Match Regular Expression vi but not the Match Pattern vi. Below is a picture of the block diagram and the front panel results:
    The Help says that the Match Regular Expression vi performs somewhat slower than the Match Pattern vi, so I started with the latter. But since it doesn't work for me, I'll use the former. But does anyone have any idea of the speed difference? I'd assume it is negligible in such a simple example.
    Thanks!
    Solved!
    Go to Solution.

    Yep-
    You hit a point that's frustrated me a time or two as well (and incidentally, caused some hair-pulling that I can ill afford)
    The hint is in the help file:
    for Match regular expression "The Match Regular Expression function gives you more options for matching
    strings but performs more slowly than the Match Pattern function....Use regular
    expressions in this function to refine searches....
    Characters to Find
    Regular Expression
    VOLTS
    VOLTS
    A plus sign or a minus sign
    [+-]
    A sequence of one or more digits
    [0-9]+
    Zero or more spaces
    \s* or * (that is, a space followed by an asterisk)
    One or more spaces, tabs, new lines, or carriage returns
    [\t \r \n \s]+
    One or more characters other than digits
    [^0-9]+
    The word Level only if it
    appears at the beginning of the string
    ^Level
    The word Volts only if it
    appears at the end of the string
    Volts$
    The longest string within parentheses
    The first string within parentheses but not containing any
    parentheses within it
    \([^()]*\)
    A left bracket
    A right bracket
    cat, cag, cot, cog, dat, dag, dot, and dag
    [cd][ao][tg]
    cat or dog
    cat|dog
    dog, cat
    dog, cat cat dog,cat
    cat cat dog, and so on
    ((cat )*dog)
    One or more of the letter a
    followed by a space and the same number of the letter a, that is, a a, aa aa, aaa aaa, and so
    on
    (a+) \1
    For Match Pattern "This function is similar to the Search and Replace
    Pattern VI. The Match Pattern function gives you fewer options for matching
    strings but performs more quickly than the Match Regular Expression
    function. For example, the Match Pattern function does not support the
    parenthesis or vertical bar (|) characters.
    Characters to Find
    Regular Expression
    VOLTS
    VOLTS
    All uppercase and lowercase versions of volts, that is, VOLTS, Volts, volts, and so on
    [Vv][Oo][Ll][Tt][Ss]
    A space, a plus sign, or a minus sign
    [+-]
    A sequence of one or more digits
    [0-9]+
    Zero or more spaces
    \s* or * (that is, a space followed by an asterisk)
    One or more spaces, tabs, new lines, or carriage returns
    [\t \r \n \s]+
    One or more characters other than digits
    [~0-9]+
    The word Level only if it begins
    at the offset position in the string
    ^Level
    The word Volts only if it
    appears at the end of the string
    Volts$
    The longest string within parentheses
    The longest string within parentheses but not containing any
    parentheses within it
    ([~()]*)
    A left bracket
    A right bracket
    cat, dog, cot, dot, cog, and so on.
    [cd][ao][tg]
    Frustrating- but still managable.
    Jeff

  • Java – Regular Expressions – Finding any non digit byte in a multiple byte

    Hello,
    I’m new to JAVA and Regular Expressions; I’m trying to write a regular expression that will find any records that contain a non digit byte in a multiple byte field.
    I thought the following was the correct expression but it is only finding records that contain “all” non digit bytes.
    \D{1,}
    \D = Non Digit
    {1,} = at least 1 or more
    Below is my sample data. I would like the regular expression to find all of the records that are not all numeric. However when I use the regular expression \D{1,} it is only finding the 2 records that all bytes are non digits. (i.e. “ “ and “A “)
    “ 111229”
    “2 111229”
    “20091229”
    “200912c9”
    “201#1229”
    “20101229”
    “20110229”
    “20111*29”
    “20111029”
    “20111229”
    “20B11229”
    “A “
    “A0111229”
    Please note I have also tried \D{1,}+ and \D{1,}? And they also do not return my desired results
    Any assistance someone can provide would be greatly appreciated.

    You don't show the code you are using but I surmise you are using String.matches() which requires that the whole target must match the regular expression not just part of it. Instead you should create a Pattern and then a Matcher and use the Matcher.find() method. Check the Javadoc for Pattern and Matcher and look at the Java regex tutorial - http://docs.oracle.com/javase/tutorial/essential/regex/ .
    P.S. You can re-use the Pattern object - you don't have to create it every time you need one.
    P.P.S. Java regular expressions work with characters not bytes and characters are not not not bytes.

  • Regular expressions and back references

    Just wanted to know if anyone else noticed that.
    In the javadoc of java.util.regex.Pattern in the "Back references" section it says that you need to use \n to match capturing group but it does not work. To match a capturing group one need to use a "$" sign which is not standard for this type of operation.
    For example, the following code should work according to the API and most other regular expression engines:
    Pattern.compile("([A-Z])").matcher("ThisIsATestString").replaceAll(" \1");
    But to make this work you need to use:
    Pattern.compile("([A-Z])").matcher("ThisIsATestString").replaceAll(" $1");
    So, is this just a doc bug or am I missing something?
    Someone have any idea why Sun choose to use the "$" sign instead of the regular "\" sign??
    TIA,
    Shaul

    The doc you're referring to is talking about using back-refereneces within the regex, not in the replacement string. For instance, if you wanted to find all instances of things like "foo-foo" or "bar-bar", you would use a Pattern like   Pattern p = Pattern.compile("([a-z]+)-\\1");For the most part, they've made the syntax the same a Perl's regexes, and that's why they use $n instead of \n in the replacement string. The replacement string is described in the Matcher javadoc.

  • Problems with java regular expressions

    Hi everybody,
    Could someone please help me sort out an issue with Java regular expressions? I have been using regular expressions in Python for years and I cannot figure out how to do what I am trying to do in Java.
    For example, I have this code in java:
    import java.util.regex.*;
    String text = "abc";
              Pattern p = Pattern.compile("(a)b(c)");
              Matcher m = p.matcher(text);
    if (m.matches())
                   int count = m.groupCount();
                   System.out.println("Groups found " + String.valueOf(count) );
                   for (int i = 0; i < count; i++)
                        System.out.println("group " + String.valueOf(i) + " " + m.group(i));
    My expectation is that group 0 would capture "abc", group 1 - "a" and group 2 - "c". Yet, I I get this:
    Groups found 2
    group 0 abc
    group 1 a
    I have tried other patterns and input text but the issue remains the same: no matter what, I cannot capture any paranthesized expression found in the pattern except for the first one. I tried the same example with Jakarta Regexp 1.5 and that works without any problems, I get what I expect.
    I am using Java 1.5.0 on Mac OS X 10.4.
    Thank to all who can help.

    paulcw wrote:
    If the group count is X, then there are X plus one groups to go through: 0 for the whole match, then 1 through X for the individual groups.It does seem confusing that the designers chose to exclude the zero-group from group count, but the documentation is clear.
    Matcher.groupCount():
    Group zero denotes the entire pattern by convention. It is not included in this count.

  • Regular expression and XML

    Hello,
    I have an XML file containing regular expressions and i parse the file, extract the pattern from it and search for it using java regex package. The problem is it works fine when patterns are words but when the pattern is something like
    write \\d+ (write followed by a space followed by one or mre digits) it doesn't work.
    I wrote the same code but with the pattern embedded in it,ie. without using XML and it worked. But when extracting with XML it fails.
    Also if the pattern is write[0-9] it only extracts write[0-9 and gives an error of no closing bracket.
    Could anyone please tell me what i am missing out
    Thank you

    thank you for your replies. Well i have still no got over the problem so i am posting my code here and hoping it can get solved
    import org.xml.sax.*;
    import org.xml.sax.helpers.*;
    import java.io.*;
    import java.util.regex.*;
    class textextractor extends DefaultHandler{
         boolean regex=false;
    public void startElement(String namespaceURI,String localName,String qn,Attributes attr)
              if(localName.equals("REGEX"))
               regex=true;
    public void characters(char [] text,int start,int length)throws SAXException {
              String t=new String(text,start,length);
              boolean flag=false;
              if(regex==true)
                Pattern pattern;
                  String w=new String(t);
              pattern = Pattern.compile(w);
              Matcher matcher;
              matcher=pattern.matcher("there is a bat   read  write 13    error at line ");
              while(matcher.find())
               flag=true;
               System.out.println("I found the text \"" + matcher.group() +"\" starting at index "
               + matcher.start() +"and ending at index " + matcher.end() + ".");
             if(!flag)
               System.out.println("not found");
             regex=false;
    public class saxt2 {
         public static void main(String args[]) {
              try {
                    XMLReader parser= XMLReaderFactory.createXMLReader();
                    ContentHandler handler=new textextractor();
                    parser.setContentHandler(handler);
                                    parser.parse("d:\\regex.xml");
                  }catch (Exception e) {
                   System.err.println(e);
    }The xml file is
                      <RegularExpression>
                      <REGEX>write</REGEX>
                      <REGEX>write \\d+</REGEX>
                      <REGEX>read[0-9]</REGEX>
                      </RegularExpression>by running the code you can see that write is found,write \\d+ doesn't match write 13 in the string and read[0-9] gives and error.
    Any help will be greatly appreciated

  • Perl Regular expression to java Regular Expression

    HI all,
    How can i write java Regular expression for the below Perl Code
    where data.html is my original Html file
    and data2.html is output file.
    open(FPR, "data.html") || die("Could not open data file");
    while ($line=<FPR>) {
    $content .= $line;
    close(FPR);
    open(FPR, ">data2.html") || die("Could not open data2 file");
    # clean white spaces
    $content =~ s/[\n\r\0 ]//g;
    # divide data by td
    $rxp='<tr.*?><td.*?>(.*?)<\/.*?td><td.*?>(.*?)<\/.*?td><td.*?>(.*?)<\/.*?td><td.*?>(.*?)<\/.*?td><td.*?>(.*?)<\/.*?td><td.*?>(.*?)<\/.*?td><td.*?>(.*?)<\/.*?td><td.*?>(.*?)<\/.*?td><\/.*?tr>';
    while ($content=~ m/$rxp/g)
    print FPR "\n".$1."\t".$2."\t".$3."\t".$4."\t".$5."\t".$6."\t".$7."\t".$8."\t";
    print FPR "<br>";
    close(FPR);
    can you help in this regard
    Thanks

    I am able to retrive only one row in this format from data.html file
    <trvalign=middlebordercolor=#ffffff><tdwidth='40'CLASS='tdbgpricespagecolorgrey'><fontface='Arial,Helvetica,sans-serif'size='2'>SB</font></td><t
    dwidth="23"Class=tdbgpricespagecolorgrey><fontface='Arial,Helvetica,sansserif'size='2'>USAirways</font></td><tdwidth="34"Class=tdbgpricespagecolorgrey><fontface='Arial,Helvetica,sans-serif'size='2'>MIA</font></td><tdwidth="31"Class=tdbgpri
    cespagecolorgrey><fontface='Arial,Helvetica,sans-erif'size='2'>LGW</font></td><tdwidth="23"Class=tdbgpricespagecolorgrey><fontface='Arial,Helvetica,sans-serif'size='2'>USAirways</font></td><tdwidth="34"Class=tdbgpricespagecolorgrey><fontface='Arial,Helvetica,sans-serif'size='2'>LGW</font></td>
    But i need the output in this format
    <fontface='Arial,Helvetica,sans-serif'size='2'>SB     <fontface='Arial,Helvetica,sans-serif'size='2'>USAirways     <fontface='Arial,Helvetica,sans-serif'size='2'>MIA     <fontface='Arial,Helvetica,sans-serif'size='2'>LGW     <fontface='Arial,Helvetica,sans-serif'size='2'>USAirways     <fontface='Arial,Helvetica,sans-serif'size='2'>LGW     <fontface='Arial,Helvetica,sans-serif'size='2'>MIA          <br>
    <fontface='Arial,Helvetica,sans-serif'size='2'>CS     <fontface='Arial,Helvetica,sans-serif'size='2'>USAirways     <fontface='Arial,Helvetica,sans-serif'size='2'>MIA     <fontface='Arial,Helvetica,sans-serif'size='2'>LON     <fontface='Arial,Helvetica,sans-serif'size='2'>USAirways     <fontface='Arial,Helvetica,sans-serif'size='2'>LON     <fontface='Arial,Helvetica,sans-serif'size='2'>MIA          <br>
    How can i rewrite the code to achive this.
    Here is my java code
    import java.io.*;
    import java.util.*;
    import java.util.regex.*;
    public class parseHTML {
    public static void main(String[] args)
    try
    BufferedReader in = new BufferedReader(new FileReader("C:\\data.html"));
    PrintWriter out = new PrintWriter(new FileWriter("C:\\data1.html"));
    String aLine = null;
    String abc=null;
    String pattern1 ="<tr.+?><td.+?>(.+?)</.+?td><td.+?>(.+?)</.+?td><td.+?>(.+?)</.+?td><td.+?>(.+?)</.+?td><td.+?>(.+?)</.+?td><td.+?>(.+?)</.+?td><td.+?>(.+?)</.+?td><td.+?>(.+?)</.+?td><td.+?>(.+?)</.+?td><td.+?>(.+?)</.+?td><td.+?>(.+?)</.+?td>++";
    Pattern p1 = Pattern.compile(pattern1);
    while((aLine = in.readLine()) != null)
    abc=aLine.replaceAll("(\n|\t|\r)","").replaceAll(" ","");
    Matcher m1 = p1.matcher(abc);
    if(m1.find())
    System.out.println("the value is...."+m1.group());
    out.print(m1.group());
    m1.reset(aLine);
    in.close();
    out.close();
    catch(IOException exception)
    exception.printStackTrace();
    Thanks

  • Improving Java Regular Expression Compile Time

    Hi,
    Just wondering if anyone here knows how can i improve the compile time of Java Regular Expression?
    The following is fragment of my code which I tired to see the running time.
    Calendar rightNow = Calendar.getInstance();
    System.out.println("Compile Pattern");
    startCompileTime = rightNow.getTimeInMillis();
    Pattern p = Pattern.compile(reg, Pattern.CASE_INSENSITIVE);
    rightNow = Calendar.getInstance();
    endCompileTime = rightNow.getTimeInMillis();
    Below is fragment of my regular expression:
    (?:tell|state|say|narrate|recount|spin|recite|order|enjoin|assure|ascertain|demonstrate|evidence|distinguish|separate|differentiate|secern|secernate|severalize|tell apart) me (?:about|abou|asti|approximately|close to|just about|some|roughly|more or less|around|or so|almost|most|all but|nearly|near|nigh|virtually|well-nigh) java
    My regular expression is a very long one and the Pattern.compile just take too long. The worst case that I experience is 2949342 milliseconds.
    Any idea how can I optimise my regular expression so that the compilation time is acceptable.
    Thanks in advance

    My regular expression is a very long one and the
    Pattern.compile just take too long. The worst case
    that I experience is 2949342 milliseconds.Wow, that's pretty pathological. I was going to tell you that you were measuring something wrong, because I had written a test program that could compile a 1 Mb "or" pattern (10,000 words, 100 bytes per) in under 200 ms ... but then I noticed that your patterns have two "or" components, so reran my test, and got over 14 seconds to run with a smaller pattern.
    My guess is that the RE compiler, rather than decomposing the RE into a tree, is taking the naive approach of translating it into a state machine, and replicating the second component for each path through the first component.
    If you can create a simple hand-rolled parser, that may be your best option. However, it appears that your substrings aren't easily tokenized (some include spaces), so your best bet is to break the regexes into pieces at the "or" constructs, and use Pattern.split() to apply each piece sequentially.
    import java.util.Random;
    import java.util.regex.Pattern;
    public class RegexTest
        public static void main(String[] argv) throws Exception
            long initial = System.currentTimeMillis();
            String[] words = generateWords(10000);
    //        String patStr = buildRePortion(words);
    //        String patStr = buildRePortion(words) + " xxx ";
            String patStr = buildRePortion(words) + " xxx " + buildRePortion(words);
            long startCompile = System.currentTimeMillis();
            Pattern pattern = Pattern.compile(patStr, Pattern.CASE_INSENSITIVE);
            long finishCompile = System.currentTimeMillis();
            System.out.println("Number of components = " + words.length);
            System.out.println("ms to create pattern = " + (startCompile - initial));
            System.out.println("ms to compile        = " + (finishCompile - startCompile));
        private final static String[] generateWords(int numWords)
            String[] results = new String[numWords];
            Random rnd = new Random();
            for (int ii = 0 ; ii < numWords ; ii++)
                char[] word = new char[20];
                for (int zz = 0 ; zz < word.length ; zz++)
                    word[zz] = (char)(65 + rnd.nextInt(26));
                results[ii] = new String(word);
            return results;
        private static String buildRePortion(String[] words)
            StringBuffer sb = new StringBuffer("(?:");
            for (int ii = 0 ; ii < words.length ; ii++)
                sb.append(ii > 0 ? "|" : "")
                  .append(words[ii]);
            sb.append(")");
            return sb.toString();
    }

  • Java regular expression for CSV?

    I found several regular expressions in the internet to parse/split csv data lines. Howeverm, they all don't work with the Java regular expression API. Is there a regular expression to tokenize CSV fields for the Java regexp API?

    If the licensing of the above solution is too restrictive for you...I'm sure there are other types of parsers out there that do that type of thing.
    In the meantime, here is some code I cooked up (no GPL...use it freely) that might get you started.
    Don't know that it handles everything, but I never said it would...
    Please READ and let me know what changes could be made. I'm always looking for improvements in my understanding of regular expressions...
    import java.util.regex.*;
    import java.util.*;
    import java.util.List;
    public class Example
       final static Pattern CSV_PATTERN;
       final static Pattern DOUBLE_QUOTE;
       static
          String regex = "(?:  ([^q;]+) | (?: q ((?: (?:[^q]+) | (?:qq) )+ ) q)  );?";
          //                       1          2          a           b       3    4
          // So, pretend your quote character is q
          // (you can change it to \" later when you understand what's going on.)
          // This regex (when applied iteravely) matches a token that:
          // 1) contains NO QUOTE MARKS whatsoever (;'s) (in group 1)
          //                       or
          // 2) starts with a QUOTE, then contains either
          //    a) no quotes at all inside or
          //    b) double quotes (to escape a quote)
          // 3) and ends with a QUOTE.
          // 4) and is followed by a separator (optional for the last value)
          // Note that (a) and (b) are captured in group 2 of the regex.
          CSV_PATTERN = Pattern.compile(regex, Pattern.COMMENTS);
          DOUBLE_QUOTE = Pattern.compile("qq");
        * Attempts to parse Excel CSV stuff...
        * @param text the CSV text.
        * @return a list of tokens.
       public static List parseCsv(String text)
          Matcher csvMatcher = CSV_PATTERN.matcher(text);
          Matcher doubleQuotes = DOUBLE_QUOTE.matcher("");
          List list = new ArrayList();
          while (csvMatcher.find())
             if (csvMatcher.group(1) != null)
                // The first one matched.
                list.add(csvMatcher.group(1));
             else
                doubleQuotes.reset(csvMatcher.group(2));
                list.add(doubleQuotes.replaceAll("q"));
          return list;
    }

  • Java Regular Expressions in J2EE

    Does anybody know when Java Regular Expressions will be available in J2EE. They are currently in the latest release of J2SE in the java.util.regex package.

    They are in the Standard Edition, so it does not make sense that they will also be in Enterprise Edition some day. You need to have the standard JRE installed before you can use the J2EE classes anyway.
    If you want to use the regular expressions, install version 1.4 (beta) of the J2SE and use the current version of J2EE on top of that.
    Jesper

  • Regular Expressions and Double Byte Characters ?

    Is it possible to use Java Regular Expressions to parse
    a file that will contain double byte characters ?
    For example, I want a regular expression to match the following line
    tag="double byte stuff" id="double byte stuff"

    The comments on the bytes/strings were helpful. Thanks.
    But I'm still confused as to what matching pattern could be used.
    For example a pattern like:
    [A-Za-z]
    I assume would not match any double byte characters.
    I also assume the following won't work either:
    [\\p{Alpah}]
    because it is posix - US-ASCII only.
    So how do you say "match the tag, then take any characters,
    double byte, ascii, whatever, then match the text tag - per the
    original example ?

  • Java regular expression for Arabic

    i want to use java regular expression to evaluate some string in Arabic
    can some body tell me how to do a match for arabic characters

    i have this code :
    String poem="���";
          //String m1="\\p?";   
           String m1= "\\p{�}";
            Matcher m =
          Pattern.compile(m1)
            .matcher(poem);
        while(m.find()) {
          for(int j = 0; j <= m.groupCount(); j++)
            System.out.print("[" + m.group(j) + "]");
          System.out.println();
        }i get the error:
    Exception java.util.regex.PatternSyntaxException: Unknown character property name {?} near index 2
    \p?
    if you find that is hard to help with Arabic regex, can someone post a code on how to match Arabic regex or chineese or any thing not latin regex match
    because a need to match a Strings in Arabic if some one can tell me how?

Maybe you are looking for