Java – Regular Expressions – Finding any non digit byte in a multiple byte

Hello,
I’m new to JAVA and Regular Expressions; I’m trying to write a regular expression that will find any records that contain a non digit byte in a multiple byte field.
I thought the following was the correct expression but it is only finding records that contain “all” non digit bytes.
\D{1,}
\D = Non Digit
{1,} = at least 1 or more
Below is my sample data. I would like the regular expression to find all of the records that are not all numeric. However when I use the regular expression \D{1,} it is only finding the 2 records that all bytes are non digits. (i.e. “ “ and “A “)
“ 111229”
“2 111229”
“20091229”
“200912c9”
“201#1229”
“20101229”
“20110229”
“20111*29”
“20111029”
“20111229”
“20B11229”
“A “
“A0111229”
Please note I have also tried \D{1,}+ and \D{1,}? And they also do not return my desired results
Any assistance someone can provide would be greatly appreciated.

You don't show the code you are using but I surmise you are using String.matches() which requires that the whole target must match the regular expression not just part of it. Instead you should create a Pattern and then a Matcher and use the Matcher.find() method. Check the Javadoc for Pattern and Matcher and look at the Java regex tutorial - http://docs.oracle.com/javase/tutorial/essential/regex/ .
P.S. You can re-use the Pattern object - you don't have to create it every time you need one.
P.P.S. Java regular expressions work with characters, not bytes, and characters are not bytes.
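A minimal sketch of the find() approach described above, assuming each record is already available as a Java String. The class and method names (NonDigitFinder, hasNonDigit) are made up for illustration and are not from the original post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonDigitFinder {
    // Compiled once and reused, as suggested in the P.S.
    private static final Pattern NON_DIGIT = Pattern.compile("\\D");

    // True when the record contains at least one non-digit character.
    static boolean hasNonDigit(String record) {
        Matcher m = NON_DIGIT.matcher(record);
        return m.find(); // find() looks for a match anywhere in the input
    }

    public static void main(String[] args) {
        String[] records = { "20091229", "200912c9", "201#1229", " 111229", "A0111229" };
        for (String r : records) {
            if (hasNonDigit(r)) {
                System.out.println("not all numeric: \"" + r + "\"");
            }
        }
    }
}
```

Unlike String.matches(), which succeeds only when the whole input matches the expression, find() reports a match as soon as any part of the record matches.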

Similar Messages

Logical AND in Java Regular Expressions

I'm trying to implement logical AND using Java Regular Expressions.
I couldn't figure out how to do it after reading Java docs and textbooks. I can do something like "abc.*def", which means that I'm looking for strings which have "abc", then anything, then "def", but it is not "pure" logical AND - I will not find "def.*abc" this way.
Any ideas, how to do it ?
Baken

First off, looks like you're really talking about an "OR", not an "AND" - you want it to match abc.*def OR def.*abc right? If you tried to match abc.*def AND def.*abc nothing would ever match that, as no string can begin with both "abc" and "def", just like no numeric value can be both 2 and 5.
Anyway, maybe regex isn't the right tool for this job. Can you not simply programmatically match it yourself using String methods? You want it to match if the string "starts with" abc and "ends with" def, or vice-versa. Just write some simple code.
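If the goal is "the string contains both abc and def, in either order", one possible regex approach is a pair of lookaheads. The sketch below is an illustration under that assumption, not something proposed in the thread; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class ContainsBoth {
    // Each lookahead checks, from the start of the input, that its word occurs
    // somewhere ahead. Lookaheads consume nothing, so the order of the words
    // in the input does not matter. (?s) lets '.' cross line breaks.
    private static final Pattern BOTH = Pattern.compile("(?s)(?=.*abc)(?=.*def)");

    static boolean containsBoth(String s) {
        // lookingAt() anchors the attempt at the start of the input.
        return BOTH.matcher(s).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(containsBoth("xx abc yy def zz"));
        System.out.println(containsBoth("xx def yy abc zz"));
        System.out.println(containsBoth("only abc here"));
    }
}
```

For fixed substrings like these, s.contains("abc") && s.contains("def") gives the same answer without a regex, which is in the spirit of the advice above.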

Java regular expression for Arabic

i want to use java regular expression to evaluate some string in Arabic
can some body tell me how to do a match for arabic characters

i have this code :
String poem="��";
 //String m1="\\p?";
 String m1= "\\p{�}";
 Matcher m =
 Pattern.compile(m1)
 .matcher(poem);
 while(m.find()) {
 for(int j = 0; j <= m.groupCount(); j++)
 System.out.print("[" + m.group(j) + "]");
 System.out.println();
 }
I get the error:
Exception java.util.regex.PatternSyntaxException: Unknown character property name {?} near index 2
\p?
If it is hard to help with Arabic regex specifically, can someone post code showing how to match Arabic, Chinese, or any other non-Latin text with a regex?
I need to match strings in Arabic, if someone can tell me how.
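The error message suggests that the Arabic letter inside \p{...} was lost to an encoding problem before the regex compiler saw it; that is an assumption, since the original source text is not recoverable here. A minimal sketch of one way to match Arabic text, using the Unicode block name that java.util.regex accepts (\p{InArabic}). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ArabicMatch {
    // \p{InArabic} matches any character in the Unicode "Arabic" block.
    private static final Pattern ARABIC_RUN = Pattern.compile("\\p{InArabic}+");

    // Returns every run of consecutive Arabic-block characters in the text.
    static List<String> arabicRuns(String text) {
        List<String> runs = new ArrayList<>();
        Matcher m = ARABIC_RUN.matcher(text);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        // Escaped code points keep this source file ASCII-only;
        // the four letters spell the Arabic word "salam".
        String text = "hello \u0633\u0644\u0627\u0645 world";
        System.out.println("Arabic runs found: " + arabicRuns(text).size());
    }
}
```

Writing the characters as escapes, or making sure the compiler reads the source file with the right encoding, avoids the literal being turned into '?' before it reaches Pattern.compile().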

Improving Java Regular Expression Compile Time

Hi,
Just wondering if anyone here knows how can i improve the compile time of Java Regular Expression?
The following is a fragment of my code, which I tried in order to measure the running time.
Calendar rightNow = Calendar.getInstance();
System.out.println("Compile Pattern");
startCompileTime = rightNow.getTimeInMillis();
Pattern p = Pattern.compile(reg, Pattern.CASE_INSENSITIVE);
rightNow = Calendar.getInstance();
endCompileTime = rightNow.getTimeInMillis();
Below is fragment of my regular expression:
(?:tell|state|say|narrate|recount|spin|recite|order|enjoin|assure|ascertain|demonstrate|evidence|distinguish|separate|differentiate|secern|secernate|severalize|tell apart) me (?:about|abou|asti|approximately|close to|just about|some|roughly|more or less|around|or so|almost|most|all but|nearly|near|nigh|virtually|well-nigh) java
My regular expression is a very long one and the Pattern.compile just take too long. The worst case that I experience is 2949342 milliseconds.
Any idea how can I optimise my regular expression so that the compilation time is acceptable.
Thanks in advance

"My regular expression is a very long one and the Pattern.compile just take too long. The worst case that I experience is 2949342 milliseconds."
Wow, that's pretty pathological. I was going to tell you that you were measuring something wrong, because I had written a test program that could compile a 1 Mb "or" pattern (10,000 words, 100 bytes per) in under 200 ms ... but then I noticed that your patterns have two "or" components, so reran my test, and got over 14 seconds to run with a smaller pattern.
My guess is that the RE compiler, rather than decomposing the RE into a tree, is taking the naive approach of translating it into a state machine, and replicating the second component for each path through the first component.
If you can create a simple hand-rolled parser, that may be your best option. However, it appears that your substrings aren't easily tokenized (some include spaces), so your best bet is to break the regexes into pieces at the "or" constructs, and use Pattern.split() to apply each piece sequentially.
import java.util.Random;
import java.util.regex.Pattern;
public class RegexTest
{
    public static void main(String[] argv) throws Exception
    {
        long initial = System.currentTimeMillis();
        String[] words = generateWords(10000);
//      String patStr = buildRePortion(words);
//      String patStr = buildRePortion(words) + " xxx ";
        String patStr = buildRePortion(words) + " xxx " + buildRePortion(words);
        long startCompile = System.currentTimeMillis();
        Pattern pattern = Pattern.compile(patStr, Pattern.CASE_INSENSITIVE);
        long finishCompile = System.currentTimeMillis();
        System.out.println("Number of components = " + words.length);
        System.out.println("ms to create pattern = " + (startCompile - initial));
        System.out.println("ms to compile = " + (finishCompile - startCompile));
    }
    // Builds numWords random 20-letter uppercase words.
    private final static String[] generateWords(int numWords)
    {
        String[] results = new String[numWords];
        Random rnd = new Random();
        for (int ii = 0 ; ii < numWords ; ii++)
        {
            char[] word = new char[20];
            for (int zz = 0 ; zz < word.length ; zz++)
                word[zz] = (char)(65 + rnd.nextInt(26));
            results[ii] = new String(word);
        }
        return results;
    }
    // Joins the words into one non-capturing alternation: (?:w1|w2|...)
    private static String buildRePortion(String[] words)
    {
        StringBuffer sb = new StringBuffer("(?:");
        for (int ii = 0 ; ii < words.length ; ii++)
            sb.append(ii > 0 ? "|" : "")
              .append(words[ii]);
        sb.append(")");
        return sb.toString();
    }
}

Problems with java regular expressions

Hi everybody,
Could someone please help me sort out an issue with Java regular expressions? I have been using regular expressions in Python for years and I cannot figure out how to do what I am trying to do in Java.
For example, I have this code in java:
import java.util.regex.*;
String text = "abc";
Pattern p = Pattern.compile("(a)b(c)");
Matcher m = p.matcher(text);
if (m.matches()) {
    int count = m.groupCount();
    System.out.println("Groups found " + String.valueOf(count));
    for (int i = 0; i < count; i++) {
        System.out.println("group " + String.valueOf(i) + " " + m.group(i));
    }
}
My expectation is that group 0 would capture "abc", group 1 "a", and group 2 "c". Yet I get this:
Groups found 2
group 0 abc
group 1 a
I have tried other patterns and input text but the issue remains the same: no matter what, I cannot capture any parenthesized expression found in the pattern except for the first one. I tried the same example with Jakarta Regexp 1.5 and that works without any problems; I get what I expect.
I am using Java 1.5.0 on Mac OS X 10.4.
Thanks to all who can help.

paulcw wrote:
"If the group count is X, then there are X plus one groups to go through: 0 for the whole match, then 1 through X for the individual groups."
It does seem confusing that the designers chose to exclude the zero-group from group count, but the documentation is clear.
Matcher.groupCount():
"Group zero denotes the entire pattern by convention. It is not included in this count."
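Applied to the code in the question, this means the loop has to run up to and including groupCount(). A short sketch (the GroupDemo class and allGroups helper are illustrative names, not from the thread):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns group 0 (the whole match) followed by every capturing group.
    static List<String> allGroups(String regex, String text) {
        List<String> groups = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        if (m.matches()) {
            // groupCount() excludes group 0, so use <= rather than <.
            for (int i = 0; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> groups = allGroups("(a)b(c)", "abc");
        for (int i = 0; i < groups.size(); i++) {
            System.out.println("group " + i + " " + groups.get(i));
        }
    }
}
```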

SQL Injection and Java Regular Expression: How to match words?

Dear friends,
I am handling sql injection attack to our application with java regular expression. I used it to match that if there are malicious characters or key words injected into the parameter value.
The denied characters and key words can be " ' ", " ; ", "insert", "delete" and so on. The expression I write is String pattern_str="('|;|insert|delete)+".
I know it is not correct. It could not be used to only match the whole word insert or delete. Each character in the two words can be matched and it is not what I want. Do you have any idea to only match the whole word?
Thanks,
Ricky
Edited by: Ricky Ru on 28/04/2011 02:29

Avoid dynamic sql, avoid string concatenation and use bind variables and the risk is negligible.
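Bind variables remain the actual defence. Purely to answer the regex question as asked, whole words can be matched with the word-boundary anchor \b. The following is a sketch only; the KeywordCheck class and its method are hypothetical, and a keyword filter like this is not a substitute for parameterized queries.

```java
import java.util.regex.Pattern;

class KeywordCheck {
    // A quote or semicolon anywhere, or the whole words "insert" / "delete".
    // \b matches only at a word boundary, so "inserted" is not flagged.
    private static final Pattern DENIED =
            Pattern.compile("[';]|\\b(?:insert|delete)\\b", Pattern.CASE_INSENSITIVE);

    static boolean containsDenied(String value) {
        return DENIED.matcher(value).find();
    }

    public static void main(String[] args) {
        System.out.println(containsDenied("please INSERT a row"));
        System.out.println(containsDenied("record inserted yesterday"));
    }
}
```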

Perl Regular expression to java Regular Expression

HI all,
How can i write java Regular expression for the below Perl Code
where data.html is my original Html file
and data2.html is output file.
open(FPR, "data.html") || die("Could not open data file");
while ($line=<FPR>) {
    $content .= $line;
}
close(FPR);
open(FPR, ">data2.html") || die("Could not open data2 file");
# clean white spaces
$content =~ s/[\n\r\0 ]//g;
# divide data by td
$rxp='<tr.*?><td.*?>(.*?)<\/.*?td><td.*?>(.*?)<\/.*?td><td.*?>(.*?)<\/.*?td><td.*?>(.*?)<\/.*?td><td.*?>(.*?)<\/.*?td><td.*?>(.*?)<\/.*?td><td.*?>(.*?)<\/.*?td><td.*?>(.*?)<\/.*?td><\/.*?tr>';
while ($content=~ m/$rxp/g) {
    print FPR "\n".$1."\t".$2."\t".$3."\t".$4."\t".$5."\t".$6."\t".$7."\t".$8."\t";
    print FPR " ";
}
close(FPR);
can you help in this regard
Thanks

I am able to retrieve only one row in this format from the data.html file
<trvalign=middlebordercolor=#ffffff><tdwidth='40'CLASS='tdbgpricespagecolorgrey'><fontface='Arial,Helvetica,sans-serif'size='2'>SB</td><t
dwidth="23"Class=tdbgpricespagecolorgrey><fontface='Arial,Helvetica,sansserif'size='2'>USAirways</td><tdwidth="34"Class=tdbgpricespagecolorgrey><fontface='Arial,Helvetica,sans-serif'size='2'>MIA</td><tdwidth="31"Class=tdbgpri
cespagecolorgrey><fontface='Arial,Helvetica,sans-erif'size='2'>LGW</td><tdwidth="23"Class=tdbgpricespagecolorgrey><fontface='Arial,Helvetica,sans-serif'size='2'>USAirways</td><tdwidth="34"Class=tdbgpricespagecolorgrey><fontface='Arial,Helvetica,sans-serif'size='2'>LGW</td>
But i need the output in this format
<fontface='Arial,Helvetica,sans-serif'size='2'>SB <fontface='Arial,Helvetica,sans-serif'size='2'>USAirways <fontface='Arial,Helvetica,sans-serif'size='2'>MIA <fontface='Arial,Helvetica,sans-serif'size='2'>LGW <fontface='Arial,Helvetica,sans-serif'size='2'>USAirways <fontface='Arial,Helvetica,sans-serif'size='2'>LGW <fontface='Arial,Helvetica,sans-serif'size='2'>MIA 
<fontface='Arial,Helvetica,sans-serif'size='2'>CS <fontface='Arial,Helvetica,sans-serif'size='2'>USAirways <fontface='Arial,Helvetica,sans-serif'size='2'>MIA <fontface='Arial,Helvetica,sans-serif'size='2'>LON <fontface='Arial,Helvetica,sans-serif'size='2'>USAirways <fontface='Arial,Helvetica,sans-serif'size='2'>LON <fontface='Arial,Helvetica,sans-serif'size='2'>MIA 
How can I rewrite the code to achieve this?
Here is my java code
import java.io.*;
import java.util.*;
import java.util.regex.*;
public class parseHTML {
    public static void main(String[] args) {
        try {
            BufferedReader in = new BufferedReader(new FileReader("C:\\data.html"));
            PrintWriter out = new PrintWriter(new FileWriter("C:\\data1.html"));
            String aLine = null;
            String abc=null;
            String pattern1 ="<tr.+?><td.+?>(.+?)</.+?td><td.+?>(.+?)</.+?td><td.+?>(.+?)</.+?td><td.+?>(.+?)</.+?td><td.+?>(.+?)</.+?td><td.+?>(.+?)</.+?td><td.+?>(.+?)</.+?td><td.+?>(.+?)</.+?td><td.+?>(.+?)</.+?td><td.+?>(.+?)</.+?td><td.+?>(.+?)</.+?td>++";
            Pattern p1 = Pattern.compile(pattern1);
            while((aLine = in.readLine()) != null) {
                abc=aLine.replaceAll("(\n|\t|\r)","").replaceAll(" ","");
                Matcher m1 = p1.matcher(abc);
                if(m1.find()) {
                    System.out.println("the value is...."+m1.group());
                    out.print(m1.group());
                }
                m1.reset(aLine);
            }
            in.close();
            out.close();
        } catch(IOException exception) {
            exception.printStackTrace();
        }
    }
}
Thanks

Java regular expression for CSV?

I found several regular expressions on the internet to parse/split CSV data lines. However, none of them work with the Java regular expression API. Is there a regular expression to tokenize CSV fields for the Java regexp API?

If the licensing of the above solution is too restrictive for you...I'm sure there are other types of parsers out there that do that type of thing.
In the meantime, here is some code I cooked up (no GPL...use it freely) that might get you started.
Don't know that it handles everything, but I never said it would...
Please READ and let me know what changes could be made. I'm always looking for improvements in my understanding of regular expressions...
import java.util.regex.*;
import java.util.*;
import java.util.List;
public class Example
{
   final static Pattern CSV_PATTERN;
   final static Pattern DOUBLE_QUOTE;
   static
   {
      String regex = "(?: ([^q;]+) | (?: q ((?: (?:[^q]+) | (?:qq) )+ ) q) );?";
      //                       1          2          a           b       3    4
      // So, pretend your quote character is q
      // (you can change it to \" later when you understand what's going on.)
      // This regex (when applied iteratively) matches a token that:
      // 1) contains NO QUOTE MARKS whatsoever (;'s) (in group 1)
      //                       or
      // 2) starts with a QUOTE, then contains either
      //    a) no quotes at all inside or
      //    b) double quotes (to escape a quote)
      // 3) and ends with a QUOTE.
      // 4) and is followed by a separator (optional for the last value)
      // Note that (a) and (b) are captured in group 2 of the regex.
      CSV_PATTERN = Pattern.compile(regex, Pattern.COMMENTS);
      DOUBLE_QUOTE = Pattern.compile("qq");
   }
   /**
    * Attempts to parse Excel CSV stuff...
    * @param text the CSV text.
    * @return a list of tokens.
    */
   public static List parseCsv(String text)
   {
      Matcher csvMatcher = CSV_PATTERN.matcher(text);
      Matcher doubleQuotes = DOUBLE_QUOTE.matcher("");
      List list = new ArrayList();
      while (csvMatcher.find())
      {
         if (csvMatcher.group(1) != null)
         {
            // The first one matched.
            list.add(csvMatcher.group(1));
         }
         else
         {
            // Quoted token: turn each doubled quote back into a single quote.
            doubleQuotes.reset(csvMatcher.group(2));
            list.add(doubleQuotes.replaceAll("q"));
         }
      }
      return list;
   }
}

Java Regular Expressions in J2EE

Does anybody know when Java Regular Expressions will be available in J2EE. They are currently in the latest release of J2SE in the java.util.regex package.

They are in the Standard Edition, so it does not make sense that they will also be in Enterprise Edition some day. You need to have the standard JRE installed before you can use the J2EE classes anyway.
If you want to use the regular expressions, install version 1.4 (beta) of the J2SE and use the current version of J2EE on top of that.
Jesper

Regular Expressions find and replace

Hi ,
I have a question on using Regular Expressions in Java(java.util.regex).
Problem Description:
I have a string (say for example strHTML) which contains the whole HTML code of a webpage. I want to be able to search for all the image source tags and check whether they are absolute urls to the image source(for eg. <img src="www.google.com/images/logo.gif" >) or relative(for eg. <img src="../images/logo.gif" >).
If they are relative urls to the image path, then I wish to replace them with their absolute urls throughout the webpage (in this case inside string strHTML).
I have to do it inside a servlet and hence have to use java.
This is the code I tried. It doesn't match and replace, and it goes into an infinite loop, i.e. probably the pattern matches everything.
//Change all images to actual http addresses FOR example change src="../images/logo.gif" to src="http://www.google.com/../images/logo.gif"
 String ddurl="http://www.google.com/";
String strHTML=" < img src=\"../images/logo.gif\" alt=\"Google logo\">";
Pattern p = Pattern.compile ("(?i)src[\\s]*=[\\s]*[\"\']([./]*.*)[\"\']");
Matcher m = p.matcher (strHTML);
while(m.find())
m.replaceAll(ddurl+m.group(1));
what is wrong in this?
Thanks,
Rajiv

Right, here's the full monte (whatever that means):
import java.util.regex.*;
public class Test1
{
  public static void main(String[] args)
  {
    String domain = "http://www.google.com/";
    String strHTML =
      " < img src=\"images/logo.gif\" alt=\"Google logo\">\n" +
      " <img alt=\"Google logo\" src=images/logo.gif >\n" +
      " <IMG SRC=\"/images/logo.gif\" alt=\"Google logo\">\n" +
      " <img alt=\"Google logo\" src=../images/logo.gif>\n" +
      " <img src=http://www.yahoo.com/images/logo.gif alt=\"Yahoo logo\">";
    String regex =
      "(<\\s*img.+?src\\s*=\\s*) # Capture preliminaries in $1. \n" +
      "(?: # First look for URL in quotes. \n" +
      " ([\"\']) # Capture open quote in $2. \n" +
      " (?!http:) # If it isn't absolute... \n" +
      " /?(.+?) # ...capture URL in $3 \n" +
      " \\2 # Match the closing quote \n" +
      " | # Look for non-quoted URL. \n" +
      " (?!http:) # If it isn't absolute... \n" +
      " /?([^\\s>]+) # ...capture URL in $4 \n" +
      ") # Close the alternation group. \n";
    Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.COMMENTS);
    Matcher m = p.matcher(strHTML);
    StringBuffer sbuf = new StringBuffer();
    while (m.find())
    {
      // Whichever alternative matched holds the relative URL.
      String relURL = m.group(3) != null ? m.group(3) : m.group(4);
      m.appendReplacement(sbuf, "$1\"" + domain + relURL + "\"");
    }
    m.appendTail(sbuf);
    System.out.println(sbuf.toString());
  }
}
First off, observe that I'm using free-spacing (or "COMMENTS") mode to make the regex easier to read: all the whitespace and comments will be ignored by the Pattern compiler. I also used the CASE_INSENSITIVE flag instead of an embedded (?i), just to remove some clutter. By the way, your second (?i) was redundant; the first one would remain in effect until "turned off" with a (?-i). Another way to localize a flag's effect is to use it within a non-capturing group, e.g., (?i:img).
As jaylogan said, the best way to filter out absolute URL's is by using a negative lookahead, and that's what I've done here. The problem of optional quotes I addressed by trying to match first with quotes, then without. The all-in-one approach might work with URL's, since they can't (AFAIK) contain whitespace anyway, but the alternation method can be used to match any attribute/value pair. It's also, I feel, easier to understand and maintain. Unfortunately, it also means that you can't use replaceAll(), since you have to determine which alternative matched before doing the replacement, but the long version is still pretty simple (especially when you can just copy it from the javadoc for the appendReplacement() method, as I did).

Java Regular Expressions and Pattern

I have a file that i first want to get all the lines that match a given pattern. Then from these lines that match i want to extract two values.
Example line for the pattern to match
INFO | jvm 1 | 2006/11/07 15:14:09 | INFO | Tue Nov 07 15:14:09 CET 2006 | XLDB PPS Data Dumper: MESSAGE:- 406 Processing .. '[ /opt/nexus/horizon/raw_data/network/pp_CE01S4H_sta_20050703T015717_SYDP3001_546.bdf ]'
So all the lines that are like these i want to extract two variables
2006/11/07 15:14:09
and
/opt/nexus/horizon/raw_data/network/pp_CE01S4H_sta_20050703T015717_SYDP3001_546.bdf
so i can store these variables in a database.
Can someone help me with writing the pattern to match and the regular expression to extract? Also, if anyone else has a better way of doing this I am all ears, as I have a lot of log files to go through.

import java.util.regex.*;
class Main
{
  public static void main(String[] args)
  {
    String txt="INFO | jvm 1 | 2006/11/07 15:14:09 | INFO | Tue Nov 07 15:14:09 CET 2006 | XLDB PPS Data Dumper: MESSAGE:- 406 Processing .. '[ /opt/nexus/horizon/raw_data/network/pp_CE01S4H_sta_20050703T015717_SYDP3001_546.bdf ]'";
    String re1=".*?";     // Non-greedy match on filler
    String re2="((?:2|1)\\d{3}(?:-|\\/)(?:(?:0[1-9])|(?:1[0-2]))(?:-|\\/)(?:(?:0[1-9])|(?:[1-2][0-9])|(?:3[0-1]))(?:T|\\s)(?:(?:[0-1][0-9])|(?:2[0-3])):(?:[0-5][0-9]):(?:[0-5][0-9]))";     // Time Stamp 1
    String re3=".*?";     // Non-greedy match on filler
    String re4="((?:\\/[\\w\\.]+)+)";     // Unix Path 1
    Pattern p = Pattern.compile(re1+re2+re3+re4,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    Matcher m = p.matcher(txt);
    if (m.find())
    {
        String timestamp1=m.group(1);
        String unixpath1=m.group(2);
        System.out.print("("+timestamp1.toString()+")"+"("+unixpath1.toString()+")"+"\n");
    }
  }
}

Help with java regular expressions

Hi all ,
i am going to match a patternstring against an input string and print the result here is my code:
     import java.util.regex.*;
     import java.util.*;
     public class Main {
          private static final String CASE_INSENSITIVE = null;
          public static void main(String[] args) {
               CharSequence inputStr = "i have 5 years FMCG saLEs exp on java/j2ee and i worked on java and j2ee and 2 projects on telecom java j2ee domain with your with saLEs maNAger experience of java j2ee and c# having very good on c++ exposure in JAVA";
               String patternStr = "\"java j2ee\" and \"c#\"";
               StringTokenizer st = new StringTokenizer(patternStr,"\",OR");
               Matcher matcher=null;
               while(st.hasMoreTokens()){
                    String s=st.nextToken();
                    Pattern pattern = Pattern.compile(s,Pattern.CASE_INSENSITIVE);
                    matcher = pattern.matcher(inputStr);
                    while (matcher.find()) {
                         String result = matcher.group();
                         if(!result.equalsIgnoreCase(" "))
                              System.out.println("result:"+result);
                    }
               }
          }
     }
When I run this code I get the expected result, i.e.
result:java j2ee
result:java j2ee
result: and
result: and
result: and
result: and
result: and
result: and
result:c#
but when i replace String patternStr = "\"java j2ee\" and \"c#\""; with
String patternStr = "\"java j2ee\" and \"c++\""; i am just getting c in the result instead of c++ ie i am getting result :
result:java j2ee
result:java j2ee
result: and
result: and
result: and
result: and
result: and
result: and
result:C
result:c
result:c
result:c
result:c
result:c
result:c
In the last lines i should get result:c++ instead of result: c
Any ideas please
Thanks

"In the last lines i should get result:c++ instead of result: c"
The regular expression parser considers the plus sign '+' a special character; it means: one or more times the previous regular expression. So in 'c++' neither plus sign is taken literally: the first means one or more 'c's, and Java reads the second as making that quantifier possessive. Obviously you don't want that; you want a literal '+' plus sign. You can do that by preceding each '+' with a backslash '\'. Unfortunately, the javac compiler considers a backslash a special character in string literals, and therefore you have to 'escape' the backslash also, by adding another backslash. The result looks like this:
"c\\+\\+"
kind regards,
Jos
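When the search terms come from user input, as they do in the question, an alternative to escaping by hand is Pattern.quote(), which makes the regex engine treat the whole string literally. A minimal sketch; the LiteralSearch class and countLiteral helper are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralSearch {
    // Counts case-insensitive occurrences of 'literal' in 'input',
    // treating every character of 'literal' as plain text.
    static int countLiteral(String literal, String input) {
        Pattern p = Pattern.compile(Pattern.quote(literal), Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "very good on c++ exposure, some C++ too, and plain c";
        System.out.println("c++ found " + countLiteral("c++", input) + " times");
    }
}
```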

Java regular expression patteren.

OK, can someone help me here? My friend wrote this program in JavaScript and I'm trying to redo it in Java. In one part he uses a regular expression, so I'm trying to do the same thing, but I can't get it. Here's what I have so far:
import java.util.regex.*;
public class Fred101
{
    public static void main(String[] args) throws Exception
    {
        final String[] lines =
        {
            "2-03-2007 02:59 [113.219.143.111] installed virus OpenRelay-backdoor.vspam on [localhost]",
            "12-03-2007 02:58 [113.219.143.111] uploaded file OpenRelay-backdoor.vspam(0.003 Gb) to [localhost]",
            "12-03-2007 02:56 admin logged in from []",
            "12-03-2007 02:55 admin logged in from [131.37.190.!]",
            "12-03-2007 02:55 [145.45.125.215] downloaded file Antivirus beta program.av(0.014 Gb) from [localhost]",
            "12-03-2007 02:53 [266.54.36.123] uploaded file Marketing.mailer(0.005 Gb) to [localhost]",
        };
        final Pattern IPPattern = Pattern.compile("( /(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))");
        for (String line : lines)
        {
            Matcher m = IPPattern.matcher(line);
            while(m.find())
                System.out.println(m.group());
        }
    }
}
If it worked correctly, this should be the output. Anyone see what's wrong?
113.219.143.111
113.219.143.111
145.45.125.215

It almost did. I used mainly your code, but I also used some of the other guy's.
import java.util.regex.*;
class main
{
  public static void main(String[] args)
  {
      final String[] lines =
      {
          "2-03-2007 02:59 [113.219.143.111] installed virus OpenRelay-backdoor.vspam on [localhost]",
          "12-03-2007 02:58 [113.219.143.111] uploaded file OpenRelay-backdoor.vspam(0.003 Gb) to [localhost]",
          "12-03-2007 02:56 admin logged in from []",
          "12-03-2007 02:55 admin logged in from [131.37.190.!]",
          "12-03-2007 02:55 [145.45.125.215] downloaded file Antivirus beta program.av(0.014 Gb) from [localhost]",
          "12-03-2007 02:53 [266.54.36.123] uploaded file Marketing.mailer(0.005 Gb) to [localhost]",
      };
    String re1=".*?";     // Non-greedy match on filler
    String re2="((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))(?![\\d])";     // IPv4 IP Address 1
    Pattern p = Pattern.compile(re1+re2,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    //Matcher m = p.matcher(txt);
    for (String line : lines){
         Matcher m = p.matcher(line);
         while(m.find()){
              String ipaddress1=m.group(1);
            System.out.print("("+ipaddress1.toString()+")"+"\n");
         }
    }
  }
}
which printed:
(113.219.143.111)
(113.219.143.111)
(145.45.125.215)
(66.54.36.123)
See the problem?
The (66.54.36.123)
shouldn't be there because it wasn't a valid IP. It was originally 266.54.36.123, but since that wasn't valid the regex matched the valid-looking tail of it, which it shouldn't.
Message was edited by:
krrose27
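One possible fix for the 266.54.36.123 case is a negative lookbehind, so that a match cannot begin in the middle of a longer number. This is a sketch of that idea rather than anything posted in the thread; the IpFinder class and findIps method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IpFinder {
    private static final String OCTET = "(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";

    // (?<![\d.]) : the match may not start right after a digit or a dot.
    // (?![\d.])  : the match may not be followed by a digit or a dot.
    private static final Pattern IP = Pattern.compile(
            "(?<![\\d.])((?:" + OCTET + "\\.){3}" + OCTET + ")(?![\\d.])");

    // Returns every well-formed IPv4 address found in the line.
    static List<String> findIps(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = IP.matcher(line);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String[] lines = {
            "12-03-2007 02:55 [145.45.125.215] downloaded file Antivirus beta program.av(0.014 Gb) from [localhost]",
            "12-03-2007 02:53 [266.54.36.123] uploaded file Marketing.mailer(0.005 Gb) to [localhost]",
        };
        for (String line : lines) {
            for (String ip : findIps(line)) {
                System.out.println(ip);
            }
        }
    }
}
```

With the lookbehind in place, an attempt starting at the second or third digit of 266 is rejected because a digit precedes it, so the invalid address is skipped entirely instead of being trimmed to 66.54.36.123.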

Regular Expressions: Greedy vs Non-Greedy

Guys, I just can't explain and find any explanation in the doc for such a behaviour:
SQL> with t as (select 'the 1 january of the year 2007' str from dual)
2 select regexp_substr(str,'.*?[[:digit:]][ ][[:alpha:]]+.*') substr1,
3         regexp_substr(str,'.*?[[:digit:]][ ][[:alpha:]]+.*$') substr2
4 from t
5 /
SUBSTR1        SUBSTR2
the 1 january  the 1 january of the year 2007
SQL>
The first part of the pattern is '.*?' - a non-greedy search for a combination of any symbols.
It is followed by 1 digit, then 1 space, then a greedy run of consecutive alpha characters,
and the last part of the mask is '.*' in the first case and '.*$' in the second.
The only difference is the '$' at the end.
AFAIK '.*' in the first case should stand for a GREEDY search of a combination of any symbols.
So in my opinion if '.*' stands at the end of the mask it should be equivalent to '.*$',
but somehow it becomes NON-GREEDY.
I just can't explain why.
Can anyone help?
Thanks.
PS
SQL> select * from v$version;
BANNER
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE     10.2.0.1.0     Production
TNS for 32-bit Windows: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production
SQL

This doesn't make any sense at all to me either. The only thing I did find that could explain it is at:
http://download.oracle.com/docs/cd/B14117_01/server.101/b10759/functions116.htm#SQLRF06303
match_parameter is a text literal that lets you change the default matching behavior of the function. You can specify one or more of the following values for match_parameter:
'n' allows the period (.), which is the match-any-character character, to match the newline character. If you omit this parameter, the period does not match the newline character.
So maybe since the . doesn't match newline and $ does?

Regular expression - find repeating +++ signs

I'm trying to use regular expressions to remove duplicate +++ signs in a string. When I test my pattern using the expresso test (www.ultrapico.com) it parses the string correctly, in Java 1.5 it doesn't work. .. mp.matches() is always false. Any suggestions would be appreciated.
finalLongstring = "TTL1,clip1+TTL2+++clip3,TTL4,clip4,TTL5,clip5+TTL6+clip6+TTL7+clip7,TTL8,clip8,TTL9,clip9,TTL10,clip10,TTL11,clip11,TTL12,clip12,TTL13,clip13+TTL14+clip14,TTL15,clip15,TTL16,clip16,TTL17,clip17,TTL18,clip18,TTL19,clip19,TTL20,clip20,TTL21,clip21,TTL22,clip22,TTL23,clip23,TTL24,clip24,TTL25,clip25,TTL26,clip26,TTL27,clip27,TTL28,clip28,TTL29,clip29";
Pattern multiplePunctuation=null;
          multiplePunctuation=Pattern.compile("[,+]{2,6}");
          //                                     | |
          //                                     | 2 or more times
          //                                     a comma or plus character
          Matcher mp=multiplePunctuation.matcher(finalLongstring);
          if(mp.matches()){
               finalLongstring=mp.replaceAll("+");
          }

Answered in your other thread.
http://forum.java.sun.com/thread.jspa?threadID=5143654
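The linked thread is not reproduced here, so the following is only a guess at the likely cause: Matcher.matches() succeeds only when the entire input matches the pattern, which a long string with a few repeated signs never does. replaceAll() scans for matches by itself, so the matches() guard can simply be dropped. A minimal sketch with a hypothetical CollapsePunctuation class:

```java
import java.util.regex.Pattern;

class CollapsePunctuation {
    // Same pattern as in the question: 2 to 6 consecutive commas or plus signs.
    private static final Pattern MULTI = Pattern.compile("[,+]{2,6}");

    // Replaces each run of repeated punctuation with a single '+'.
    static String collapse(String s) {
        return MULTI.matcher(s).replaceAll("+");
    }

    public static void main(String[] args) {
        System.out.println(collapse("TTL1,clip1+TTL2+++clip3,TTL4,clip4"));
    }
}
```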
