Parsing xhtml using java.util.regex

I am parsing an XHTML file using the java.util.regex package and I am perplexed as to why the following doesn't work.
The lines I wish to match are either like this:
<span class="someclass"><b>Some String.</b></span></td>
or
Some String.</td>
The code I use to try to achieve this is:
Pattern somePattern = Pattern.compile(".*(<span class=\"someclass\"><b>)?(.*)[.](</b></span>)?</td>.*");
String s = null;
while((s = br.readLine()) != null) {
    // Keep the Matcher so its groups can be read after matches() succeeds
    Matcher eventMatcher = somePattern.matcher(s);
    if(eventMatcher.matches()) {
        System.out.println("0:" + eventMatcher.group(0));
        System.out.println("1:" + eventMatcher.group(1));
        System.out.println("2:" + eventMatcher.group(2));
        System.out.println("3:" + eventMatcher.group(3));
    }
}
I expect to get as output
0:<span class="someclass"><b>Some String.</b></span></td>
1:<span class="someclass"><b>
2:Some String
3:</b></span>
or
0:Some String.</td>
1:null
2:Some String
3:null
depending on which lines provide the match as mentioned above. Instead I get:
0:<span class="someclass"><b>Some String.</b></span></td>
1:null
2:(empty string)
3:</b></span>
or
0:Some String.</td>
1:null
2:(empty string)
3:null
Any ideas? Thanks in advance.

Consider the terms of ".*(<span class=\"someclass\"><b>)?(.*)[.](</b></span>)?</td>.*"
.* - greedily collects characters
(<span class=\"someclass\"><b>)? - optionally collects text that will always have been consumed by the preceding .*, so this group will be empty (null).
(.*) - greedily collects characters that will also have been swallowed by the first .*, so it will be empty
[.] - a single .
(</b></span>)? - optionally collects the closing tags
</td> - must be there
.* - collects the rest of the characters in the line.
Therefore, in general, groups 1 and 2 will be empty because the first .* will have collected the information you wanted to capture! It only backtracks as far as the last '.' that lets the rest of the pattern match.
You could just make the first .* non-greedy by using .*? but this may fail for other reasons.
So, in general terms, what are you trying to extract?
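
One way around the greedy-prefix problem described above is to drop the leading and trailing .* altogether, use find() instead of matches(), and capture the text with a negated character class that cannot run past a tag. This is only a sketch under the assumption that the text never contains a '<' and that the two sample lines from the question are representative; the class name SpanLineDemo and the method extract are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpanLineDemo {
    // [^<]* stops at the next tag, so the optional groups get a chance to match.
    static final Pattern LINE = Pattern.compile(
            "(<span class=\"someclass\"><b>)?([^<]*)[.](</b></span>)?</td>");

    // Returns groups 0..3 of the first match in the line, or null if nothing matches.
    static String[] extract(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] lines = {
            "<span class=\"someclass\"><b>Some String.</b></span></td>",
            "Some String.</td>"
        };
        for (String line : lines) {
            String[] groups = extract(line);
            for (int i = 0; i < groups.length; i++) {
                System.out.println(i + ":" + groups[i]);
            }
        }
    }
}
```

If a line carries other text before the part of interest (for example an opening <td>), group 2 may pick that text up as well, so the pattern would need tightening for real input.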

Similar Messages

  • How to check special characters in java code using Java.util.regex package

    String guid="first_Name;Last_Name";
    Pattern p5 = Pattern.compile("\\p{Punct}");
    Matcher m5 =p5.matcher(guid);
    boolean test=m5.matches();
    I want to find out whether there are any special characters in the String guid using a regex,
    but the above code always returns false. Please suggest a fix.

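    Pattern.compile("[^\\w]");
    The above will match any non [a-zA-Z0-9_] character.
    Or you could do
    Pattern.compile("[^\\s\\w]");
    This should match anything that is not a word character and is not whitespace. Note that matches() only returns true if the entire string matches the pattern; to test whether such a character occurs anywhere in the string, call find() instead.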
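
    A minimal sketch of the difference between matches() and find(), using the \p{Punct} pattern from the question. The class and method names are invented for the example.

```java
import java.util.regex.Pattern;

class PunctCheck {
    private static final Pattern PUNCT = Pattern.compile("\\p{Punct}");

    // find() looks for the pattern anywhere in the input.
    static boolean hasPunctuation(String s) {
        return PUNCT.matcher(s).find();
    }

    // matches() requires the whole input to be one punctuation character.
    static boolean isSinglePunctuation(String s) {
        return PUNCT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String guid = "first_Name;Last_Name";
        System.out.println(hasPunctuation(guid));      // searches the whole string
        System.out.println(isSinglePunctuation(guid)); // whole-string match only
    }
}
```

    Bear in mind that \p{Punct} counts '_' as punctuation, whereas \w treats it as a word character, so the two approaches differ on underscores.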

  • Remove all the special characters using java.util.regex

    Hi,
    How can I remove all the special characters in a String[] using a regex? I have the following:
    public class RegExpTest {
         private static String removeSplCharactersForNumber(String[] numbers) {
              String number = null;
              Matcher m = null;
              Pattern p = Pattern.compile("\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)\\_\\+\\-\\{\\}\\|\\;\\\\\\'////\\,\\.\\?\\<\\>\\[\\]");
              for (int i = 0; i < numbers.length; i++) {
                   m = p.matcher(numbers[i]);
                   if (m.find()) {
                        number = m.replaceAll("");
                   }
              }
              System.out.println("Final Number is:::" + number);
              return number;
         }
         public static void main(String args[]) {
              String[] str = {"raghav!@#$%^&*()_+"};
              RegExpTest regExpTest = new RegExpTest();
              regExpTest.removeSplCharactersForNumber(str);
         }
    }
    This code is not working: m.find() returns false. I want the output to be raghav for the string array above, and more generally it should remove all the special characters from every entry of a String[]. Is there a simple way to do this? More importantly, spaces (treated as special characters) should be removed as well. Please do provide a solution.
    Thanks

    You don't need the find(). Just use replaceAll() on each element of the String[], i.e.
    String[] values = ...
    for (int i = 0; i < values.length; i++) {
        values[i] = p.matcher(values[i]).replaceAll("");
    }
    I can't understand your regex since the forum software has mangled it, but you just need to add a space to the set of chars to remove. When you post code, surround it with CODE tags; then the forum software won't mangle it.
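
    Since the original regex is unreadable, here is a sketch that swaps in a different technique: rather than listing every special character, it removes everything that is not an ASCII letter or digit, which also covers spaces. Whether "special" should mean exactly that is an assumption; the class name StripSpecial is made up.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class StripSpecial {
    // A character class, so each unwanted character matches on its own.
    private static final Pattern NON_ALNUM = Pattern.compile("[^A-Za-z0-9]");

    // Returns a new array with all non-alphanumeric characters removed from each entry.
    static String[] strip(String[] values) {
        String[] cleaned = new String[values.length];
        for (int i = 0; i < values.length; i++) {
            cleaned[i] = NON_ALNUM.matcher(values[i]).replaceAll("");
        }
        return cleaned;
    }

    public static void main(String[] args) {
        String[] input = { "raghav!@#$%^&*()_+", "a b-c" };
        System.out.println(Arrays.toString(strip(input)));
    }
}
```

    The pattern in the question has no square brackets, so it only matches all of those characters in that exact order, which would explain why find() came back false.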

  • Doubt in Regular Expressions : java.util.regex

    I want to identify words starting with capital letters in a sentence and I want to replace the matched word with "#" added in front of it.... For example, if my input sentence is
    "Christopher Williams asked Mike Muller a question"
    my output should be,
    "#Christopher #Williams asked #Mike #Muller a question"
    How do I do that using java.util.regex ?
    In Perl we can use "back references" in the "replacement string". The Perl replacement accepts back references, whereas the Java replacement method accepts only strings....
    Please help me.....

    Your replacement is swallowing the space before the uppercase character, and won't match at the beginning of the line.
    Also, it's unnecessarily verbose. String has a replaceAll method (that calls the same methods of Pattern and Matcher under the covers):
    sentence = s.replaceAll("(^| )([A-Z])", "$1#$2");
    Disclaimer: I'm no prome, sabre or u/a :-) That can probably be simplified.
    db
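
    For what it's worth, Java replacement strings do accept group references of the form $1, $2 and so on. Here is a sketch of one possible simplification that uses a word boundary instead of capturing the preceding space; the class name CapitalTagger is invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CapitalTagger {
    // \b anchors at the start of a word; group 1 is the whole capitalized word.
    private static final Pattern CAPITALIZED = Pattern.compile("\\b([A-Z]\\w*)");

    // Prefixes every word that starts with an ASCII capital letter with '#'.
    static String tag(String sentence) {
        Matcher m = CAPITALIZED.matcher(sentence);
        return m.replaceAll("#$1"); // $1 refers back to capture group 1
    }

    public static void main(String[] args) {
        System.out.println(tag("Christopher Williams asked Mike Muller a question"));
    }
}
```

    This only recognises A-Z as capitals; accented or non-Latin capitals would need something like \p{Lu} instead.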

  • Ignore word (Java.util.regex )

    Hello All,
    Can anyone help me to solve this problem:
    Problem: I have a text file and I want to search for a word or combination of words in it using java.util.regex
    Example: in the sentence "things like the Forestry in the Commission (FC)." I want to search for "Forestry Commission" while ignoring the words "in the". This ignoring criterion is specific, i.e. the search should return true only if the ignored words are "in the", not any other words.
    Also, how do I ignore multiple words in the ignore condition?
    Thanks in advance.

    Try out this line of code:
    System.out.println("Forestry in the Commission".replaceAll("Forestry\\s(.*?\\s)?Commission", "Hello"));
    In EBNF, it looks like this:
    match ::= "Forestry" <whitespace> [{<character>} <whitespace>] "Commission"
    whitespace ::= <space> | \t | \n | \x0B | \f | \r
    character ::= (any one character)
    This, of course, is an ambiguous EBNF definition. The breakdown of the expression, however, reveals why this works. In the string, "\\s" refers to a character of whitespace. "(.*?\\s)?" is where the magic happens: it causes ".*?\\s" to happen either not at all or once. ".*?" will consume as few "." (any character) as possible to make the match, and the following "\\s" is to make sure that strings like "Forestry deCommission" don't match. The EBNF's ambiguity comes from EBNF's lack of ability to describe "reluctant quantifiers": quantifiers that indicate that as little of the given expression should be matched as possible.
    Cheers!
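
    The pattern above skips any text between the two words. If only specific phrases such as "in the" may be skipped, a sketch along the following lines might be closer to what was asked. The class name IgnoreWords, the method build and the extra phrase "of the" are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class IgnoreWords {
    // Builds: first, whitespace, then zero or more ignorable phrases, then second.
    static Pattern build(String first, String second, List<String> ignorable) {
        String alternatives = ignorable.stream()
                .map(Pattern::quote)               // treat each phrase literally
                .collect(Collectors.joining("|"));
        String regex = Pattern.quote(first)
                + "\\s+(?:(?:" + alternatives + ")\\s+)*"
                + Pattern.quote(second);
        return Pattern.compile(regex);
    }

    public static void main(String[] args) {
        Pattern p = build("Forestry", "Commission", Arrays.asList("in the", "of the"));
        System.out.println(p.matcher("things like the Forestry in the Commission (FC).").find());
        System.out.println(p.matcher("the Forestry near the Commission").find());
    }
}
```

    Each phrase is matched literally, including its single space, so "in  the" with two spaces would not be skipped.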

  • Please help on java.util.regex.*

    Hi all,
    My RTF file looks like this:
    Project Num\tab N/A
    \par Project Name\tab Hook-up Installation and Service
    \par
    My intention is to read the file up to the \tab and store Project Num as a string in a
    variable. Similarly, read up to \par and store the value of Project Num in another variable,
    so that I can use those variables further in my program.
    I used the java.util.regex.* package for this purpose. I could successfully split the sentence wherever it sees \tab and \par, but I don't know how to get the text before and after the delimiters and store it in variables.
    The code which i wrote is:
    import java.util.regex.*;
    import java.io.*;
    import java.nio.*;
    import java.nio.charset.*;
    import java.nio.channels.*;
    public class RegexDemo {
        public static void main(String[] args) {
            // Create a pattern to match breaks
            Pattern p = Pattern.compile("\\\\.[a-z][a-z]", Pattern.DOTALL);
            try {
                File file = new File("sample.rtf");
                FileInputStream fis = new FileInputStream(file);
                FileChannel fc = fis.getChannel();
                // Get a CharBuffer from the source file
                ByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, (int) fc.size());
                Charset cs = Charset.forName("8859_1");
                CharsetDecoder cd = cs.newDecoder();
                CharBuffer cb = cd.decode(bb);
                // Run some matches
                Matcher m = p.matcher(cb);
                while (m.find())
                    System.out.println("Found comment: " + m.group());
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    Please somebody help me in this regard. I have spent lot of time searching the forums but couldn't find any solution.
    Thanks in advance
    rnallu

    Just put the target inside parentheses, with the delimiters at the boundaries.
    Example: "(\\w+)\\t(\\d)\\s" will match occurrences of a word followed by a tab character, then a digit followed by a whitespace character. If the target string matches the pattern, then m.group(1) contains the word and m.group(2) contains the digit.
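
    Applied to the RTF sample in the question, the same capture-group idea could look like the sketch below. It assumes every field has the shape "label \tab value", optionally preceded by \par, and that neither part contains a backslash; real RTF is considerably messier. The class name RtfFields and the method parse are invented.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RtfFields {
    // Optional "\par", a label (group 1), "\tab", then the value (group 2).
    // Neither group may contain a backslash or a line break.
    private static final Pattern FIELD = Pattern.compile(
            "(?:\\\\par\\s*)?([^\\\\\\r\\n]+)\\\\tab\\s*([^\\\\\\r\\n]*)");

    // Collects label/value pairs in the order they appear in the text.
    static Map<String, String> parse(CharSequence rtf) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(rtf);
        while (m.find()) {
            fields.put(m.group(1).trim(), m.group(2).trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        String sample = "Project Num\\tab N/A\n"
                + "\\par Project Name\\tab Hook-up Installation and Service\n"
                + "\\par";
        System.out.println(parse(sample));
    }
}
```

    Because parse takes a CharSequence, the CharBuffer from the question's code could be passed in directly.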

  • Who use sql-mapping with java.util.regex?

    Hi everyone:
    I use the IBatis SQL-Mapping and I think it is very good. Now I want to add a search function to my BBS forum. I also want to highlight the matched content, the way Jive does. I mean that if I search for the string "ibatis", then every "ibatis" in the search results will be displayed highlighted.
    So I must use java.util.regex in J2SDK 1.4. But the problem is that what I get back is a List if I use SQL-mapping. For example:
              String resource="conf/XML/sql/lyo-sql-map.xml";
              Reader reader=Resources.getResourceAsReader(resource);
              sqlmap=XmlSqlMapBuilder.buildSqlMap(reader);
              List articlelist=sqlmap.executeQueryForList("selectSiteArticle","%"+icontent+"%");
    The result I get is a List, so I have no opportunity to apply the regex.
    I don't know whether I could do this:
    iterate over the List, apply the regex, and then put all the objects back into the List.
    Is that right?
    How do I use regex with SQL-mapping? Thanks

    Any idea? :(
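
    Iterating over the list and applying the regex to each element, as the question proposes, can be sketched as follows. This assumes the list elements are plain strings and that highlighting means wrapping matches in <b> tags; with real article objects the text would have to be read from and written back to a field. Pattern.quote needs Java 5 or later, so it is not available on 1.4. The class name Highlighter is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class Highlighter {
    // Wraps every case-insensitive occurrence of term in <b>...</b>.
    static String highlight(String text, String term) {
        Pattern p = Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE);
        return p.matcher(text).replaceAll("<b>$0</b>"); // $0 is the matched text itself
    }

    // Builds a new list rather than modifying the query result in place.
    static List<String> highlightAll(List<String> texts, String term) {
        List<String> result = new ArrayList<>();
        for (String text : texts) {
            result.add(highlight(text, term));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> articles = Arrays.asList("I use iBATIS daily", "nothing to see");
        System.out.println(highlightAll(articles, "ibatis"));
    }
}
```

    The SQL mapping layer is not involved at all here; the regex is applied after the query has returned.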

  • RFC used for java.util.regex

    Hi,
    Does anyone know the RFC used for java.util.regex ??
    Thanks & Regards,
    Gurushant Hanchinal

    Can you please give me a link to the specification for java.util.regex? I have tried the link named "Java Language Specification" on this page:
    http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html
    but when I click it I get a page-not-found error.
    Please give me any alternate links to view the regular expression specification.
    Thanks,
    Gurushant Hanchinal

  • Regular expressions with java.util.regex

    Hello Guys,
    I wrote this last time:
    /**
     * Uses split to break up a string of input separated by
     * commas and/or whitespace.
     * See: http://developer.java.sun.com/developer/technicalArticles/releases/1.4regex/
     * Change: I have slightly modified the source
     */
    import java.util.regex.*;
    public class Splitter {
        public static void main(String[] args) throws Exception {
            // Create a pattern to match breaks
            Pattern p = Pattern.compile("[<>\\s]+");
            // Split input with the pattern
            String[] result =
                p.split("<element attributname1 = \"attributwert1\" attributname2 = \"attributwert2\">");
            for (int i=0; i<result.length; i++)
                if (result[i].equals(""))
                    System.out.println("EMPTY");
                else
                    System.out.println(result[i]);
            int res = result.length - 1;
            System.out.println("\nStringlaenge: " + res);
        }
    }
    I wonder why I got an empty element in result[0]. Does anyone have an idea?
    We'll come together next time
    ... Ünhan Inay

    What is wrong with this Pattern?
    Pattern p = Pattern.compile("^<[a-zA-Z0-9_\\"=]+[\\s]*$>");
    This time I used the following split:
    p.split("<element attributname1=\"attributwert1\" attributname2=\"attributwert2\">");
    I've got a compilation error:
    U:\qms_neu\htdocs\inay\Source\myWork\Regex-Samples>javac Splitter.java
    Splitter.java:14: illegal start of expression
    Pattern p = Pattern.compile("^<[a-zA-Z0-9_\\"=]+[\\s]*$>");
    ^
    Splitter.java:14: illegal character: \92
    Pattern p = Pattern.compile("^<[a-zA-Z0-9_\\"=]+[\\s]*$>");
    ^
    Splitter.java:14: illegal character: \92
    Pattern p = Pattern.compile("^<[a-zA-Z0-9_\\"=]+[\\s]*$>");
    ^
    Splitter.java:14: unclosed string literal
    Pattern p = Pattern.compile("^<[a-zA-Z0-9_\\"=]+[\\s]*$>");
    ^
    Splitter.java:17: ')' expected
    p.split("<element attributname1=\"attributwert1\" attributname2
    =\"attributwert2\">");
    ^
    Splitter.java:14: unexpected type
    required: variable
    found : value
    Pattern p = Pattern.compile("^<[a-zA-Z0-9_\\"=]+[\\s]*$>");
    ^
    Splitter.java:18: cannot resolve symbol
    symbol : variable result
    location: class Splitter
    for (int i=0; i<result.length; i++)
    ^
    Splitter.java:19: cannot resolve symbol
    symbol : variable result
    location: class Splitter
    if (result.equals("")){
    ^
    Splitter.java:21: cannot resolve symbol
    symbol : variable result
    location: class Splitter
    System.out.println(result[0]);
    ^
    Splitter.java:24: cannot resolve symbol
    symbol : variable result
    location: class Splitter
    System.out.println(result[i]);
    ^
    Splitter.java:25: cannot resolve symbol
    symbol : variable result
    location: class Splitter
    int res = result.length - 1;
    ^
    11 errors
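
    Two things seem to be going on in this thread. First, split() returns a leading empty string whenever the input begins with a delimiter match, and the sample input begins with '<'. Second, inside a Java string literal \\" is an escaped backslash followed by a quote that ends the literal, which would explain the "unclosed string literal" error; a double quote in the character class is written \". The sketch below shows both points; the class name SplitterDemo and the method tokens are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SplitterDemo {
    private static final Pattern BREAKS = Pattern.compile("[<>\\s]+");

    // A double quote needs only one backslash in Java source, so this compiles.
    static final Pattern NAME_CHARS = Pattern.compile("[a-zA-Z0-9_\"=]+");

    // Splits on angle brackets and whitespace, dropping empty pieces.
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        for (String part : BREAKS.split(input)) {
            if (!part.isEmpty()) {
                out.add(part);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "<element attributname1 = \"attributwert1\" attributname2 = \"attributwert2\">";
        System.out.println(tokens(input));
    }
}
```

    Filtering out empty pieces is one option; stripping the leading '<' before splitting would work as well.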

  • Regular Expressions (java.util.regex)

    I am developing using a product that must
    use java 1.2.2_05a but I want to use regular
    expressions, does anybody know where I can
    get of the package java.util.regex without
    having to download the whole java 1.4 release.
    Or does someone know of an alternative that
    I can use ?

    There is another regex pack for java available from Apache Foundation Project. You can try it.
    Take a look at http://jakarta.apache.org/

  • [bug]Jdev 11g:NullPointerException at java.util.regex.Matcher.getTextLength

    Hi,
    Jdev 11.1.1.0.31.51.56
    If any of you get the following stack trace when running a jspx that uses ViewCriteriaRow.setOperator,
    there is bug 7534359 and Metalink note 747353.1 available.
    java.lang.NullPointerException
    at java.util.regex.Matcher.getTextLength(Matcher.java:1140)
    at java.util.regex.Matcher.reset(Matcher.java:291)
    at java.util.regex.Matcher.<init>(Matcher.java:211)
    at java.util.regex.Pattern.matcher(Pattern.java:888)
    at oracle.adfinternal.view.faces.model.binding.FacesCtrlSearchBinding._loadFilter
    CriteriaValues(FacesCtrlSearchBinding.java:3695)
    Truncated. see log file for complete stacktrace
    Workaround:
    If you use 
            vcr.setAttribute("Job",job);
    or
            vcr.setAttribute("Job","="+job);
    then add the following line of code:
            vcr.setOperator("Job","=");
    Regards,
    Peter

    Hi,
    useful to mention that this happens when setting the equal operator or LIKE operator
    vcr.setAttribute("Job","= '"+job+"'");
    or
    vcr.setOperator("Job","=");
    Frank

  • About the error of java.util.regex in jdk1.4's docs

    In java.util.regex,the class Pattern's document says:
    Greedy quantifiers
    X? X, once or not at all
    X* X, zero or more times
    X X, one or more times
    X{n} X, exactly n times
    X(n,} X, at least n times
    X{n,m} X, at least n but not more than m times
    Why isn't "X+" mentioned?
    I think that should be "X+ X, one or more times",right?

    Agreed. I use Regex in many places (and used
    Oromatcher before 1.4), and I've verified that I
    use the + operator in several places-- it works.

  • Problem in Creating a jar file using java.util.jar and deploying in jboss 4

    Dear Techies,
    I am facing this peculiar problem. I am creating a jar file programmatically using the java.util.jar API. The jar file is created, but JBoss AS is unable to deploy it. I have also checked that my created jar file contains the same files. When I create a jar file from the command line using jar -cvf, JBoss is able to deploy it. I am sending the code; please review it and let me know the problem. I badly need your help, as I am unable to proceed. Please help me.
    package com.rrs.corona.solutionsacceleratorstudio.solutionadapter;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.util.jar.JarEntry;
    import java.util.jar.JarOutputStream;
    import java.util.jar.Manifest;
    import com.rrs.corona.solutionsacceleratorstudio.SASConstants;
    /**
     * @author Piku Mishra
     */
    public class JarCreation
    {
         /** File object */
         File file;
         /** JarOutputStream object to create a jar file */
         JarOutputStream jarOutput ;
         /** File of the generated jar file */
         String jarFileName = "rrs.jar";
         /** To create a Manifest.mf file */
         Manifest manifest = null;
         //Attributes atr = null;
         /**
          * Default Constructor to specify the path and
          * name of the jar file
          * @param destnPath of type String denoting the path of the generated jar file
          */
         public JarCreation(String destnPath)
         {//This constructor initializes the destination path and file name of the jar file
              try
              {
                   manifest = new Manifest();
                   jarOutput = new JarOutputStream(new FileOutputStream(destnPath+"/"+jarFileName),manifest);
              }
              catch(Exception e)
              {
                   e.printStackTrace();
              }
         }
         public JarCreation()
         {
         }
         /**
          * This method is used to obtain the list of files present in a
          * directory
          * @param path of type String specifying the path of directory containing the files
          * @return the list of files from a particular directory
          */
         public File[] getFiles(String path)
         {//This method is used to obtain the list of files in a directory
              try
              {
                   file = new File(path);
              }
              catch(Exception e)
              {
                   e.printStackTrace();
              }
              return file.listFiles();
         }
         /**
          * This method is used to create a jar file from a directory
          * @param path of type String specifying the directory to make jar
          */
         public void createJar(String path)
         {//This method is used to create a jar file from
              // a directory. If the directory contains several nested directory
              //it will work.
              try
              {
                   byte[] buff = new byte[2048];
                   File[] fileList = getFiles(path);
                   for(int i=0;i<fileList.length;i++)
                   {
                        if(fileList[i].isDirectory())
                        {
                             createJar(fileList[i].getAbsolutePath());//Recursive call to get the files
                        }
                        else
                        {
                             FileInputStream fin = new FileInputStream(fileList[i]);
                             String temp = fileList[i].getAbsolutePath();
                             String subTemp = temp.substring(temp.indexOf("bin")+4,temp.length());
    //                         System.out.println( subTemp+":"+fin.getChannel().size());
                             jarOutput.putNextEntry(new JarEntry(subTemp));
                             int len ;
                             while((len=fin.read(buff))>0)
                             {
                                  jarOutput.write(buff,0,len);
                             }
                             fin.close();
                        }
                   }
              }
              catch( Exception e )
              {
                   e.printStackTrace();
              }
         }
         /**
          * Method used to close the object for JarOutputStream
          */
         public void close()
         {//This method is used to close the
              //JarOutputStream
              try
              {
                   jarOutput.flush();
                   jarOutput.close();
              }
              catch(Exception e)
              {
                   e.printStackTrace();
              }
         }
         public static void main( String[] args )
         {
              JarCreation jarCreate = new JarCreation("destnation path where jar file will be created /");
              jarCreate.createJar("put your source directory");
              jarCreate.close();
         }
    }

    Hi,
    I have gone through your code, and the problem is that when you create the jar, each entry takes a complete path (obtained via getAbsolutePath); when you extract the jar you can see the path C:\..\...\..\
    You need to truncate this complete path and keep only the part relative to the directory where your files are stored, and the problem should be solved.
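
    A sketch of how a relative entry name could be derived from the source directory instead of using substring on the absolute path. Jar entry names use '/' as the separator regardless of platform, so backslashes are converted as well. Whether this is what stopped JBoss from deploying the jar is not confirmed; the class name JarEntryNames and the method entryName are invented.

```java
import java.io.File;
import java.nio.file.Path;

class JarEntryNames {
    // Returns the path of file relative to baseDir, using '/' as separator.
    static String entryName(File baseDir, File file) {
        Path relative = baseDir.toPath().relativize(file.toPath());
        return relative.toString().replace(File.separatorChar, '/');
    }

    public static void main(String[] args) {
        File base = new File("build/bin");
        File cls = new File("build/bin/com/example/A.class");
        System.out.println(entryName(base, cls));
    }
}
```

    The result could then be passed to new JarEntry(...) in place of subTemp.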

  • Java.util.regex error

    Hello,
    I checked JavaDoc multiple times but do not see what is wrong with
    myString.replaceAll("D:\\web\\mars","")
    which results in
    Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal/unsupported escape squence near index 7
    D:\web\mars
           ^
         at java.util.regex.Pattern.error(Unknown Source)
         at java.util.regex.Pattern.escape(Unknown Source)
         at java.util.regex.Pattern.atom(Unknown Source)
         at java.util.regex.Pattern.sequence(Unknown Source)
         at java.util.regex.Pattern.expr(Unknown Source)
         at java.util.regex.Pattern.compile(Unknown Source)
         at java.util.regex.Pattern.<init>(Unknown Source)
         at java.util.regex.Pattern.compile(Unknown Source)
         at java.lang.String.replaceAll(Unknown Source)
         at ArticleImageImportProcessor.main(ArticleImageImportProcessor.java:40)
    Please, every suggestion/hint is most appreciated

    You have to escape the backslash twice: first for the Java String literal, and a second time because of the special meaning of '\' in regular expressions.
    It should look like
    myString.replaceAll("D:\\\\web\\\\mars","")
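
    If the text to remove is a fixed string rather than a pattern, the double escaping can be avoided entirely. A sketch of two alternatives, both of which need Java 5 or later: String.replace, which takes its arguments literally, and Pattern.quote, which turns a literal into a regex. The class and method names are invented.

```java
import java.util.regex.Pattern;

class LiteralReplace {
    // String.replace(CharSequence, CharSequence) does no regex interpretation.
    static String removeLiteral(String s, String literal) {
        return s.replace(literal, "");
    }

    // Pattern.quote escapes the literal so replaceAll treats it as plain text.
    static String removeQuoted(String s, String literal) {
        return s.replaceAll(Pattern.quote(literal), "");
    }

    public static void main(String[] args) {
        String path = "D:\\web\\mars\\img\\a.png"; // D:\web\mars\img\a.png
        System.out.println(removeLiteral(path, "D:\\web\\mars"));
        System.out.println(removeQuoted(path, "D:\\web\\mars"));
    }
}
```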

  • How do I estimate time takes to Zip/Unzip using java.util.zip ?

    If I am compressing files or uncompressing zips using java.util.zip is there a way for me to tell the user how long it should take or for perhaps displaying a progress monitor ?

    For unzip, use the ZipInputStream and pass it a CountingInputStream that keeps track of the number of bytes read from it (you write this). This CountingInputStream extends FileInputStream and as such can provide you with information about the total number of bytes available to be read and the number already read. This can give a crude idea of how much work has been done and how much work still needs to be done. It is inaccurate but should be good enough for a progress bar.
    As for zipping, use the ZipOutputStream and pass it blocks of information. Keep track of the number of blocks you have written and the number you still need to write.
    matfud
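
    A sketch of the counting stream idea for the unzip case. Note one swap: this version extends FilterInputStream rather than FileInputStream, so it can wrap any stream, and the total size is taken from File.length(). The names CountingInputStream, UnzipProgress, percent and scan are invented here, and bytes passed over with skip() are not counted.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class CountingInputStream extends FilterInputStream {
    private long count; // compressed bytes handed out so far

    CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            count++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            count += n;
        }
        return n;
    }

    long getCount() {
        return count;
    }
}

class UnzipProgress {
    // Crude percentage of the archive consumed so far.
    static int percent(long done, long total) {
        return total <= 0 ? 0 : (int) Math.min(100, done * 100 / total);
    }

    // Reads every entry and prints progress; the entry data is discarded here.
    static void scan(File zipFile) throws IOException {
        long total = zipFile.length();
        try (CountingInputStream counting = new CountingInputStream(new FileInputStream(zipFile));
             ZipInputStream zip = new ZipInputStream(counting)) {
            byte[] buf = new byte[8192];
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                while (zip.read(buf) != -1) {
                    // a real program would write buf to the extracted file here
                }
                System.out.println(entry.getName() + ": " + percent(counting.getCount(), total) + "%");
            }
        }
    }
}
```

    Because the count is taken on the compressed side and the zip stream reads ahead in blocks, the percentage is only approximate, as the answer says.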
