Parsing xhtml using java.util.regex
I am parsing an XHTML file using the java.util.regex package and I am perplexed at why the following doesn't work.
The lines I wish to match are either like this:
<span class="someclass"><b>Some String.</b></span></td>
or
Some String.</td>
The code I use to try to achieve this is:
Pattern somePattern = Pattern.compile(".*(<span class=\"someclass\"><b>)?(.*)[.](</b></span>)?</td>.*");
String s = null;
while ((s = br.readLine()) != null) {
    // Keep the Matcher so its groups can be read after a successful match
    Matcher eventMatcher = somePattern.matcher(s);
    if (eventMatcher.matches()) {
        System.out.println("0:" + eventMatcher.group(0));
        System.out.println("1:" + eventMatcher.group(1));
        System.out.println("2:" + eventMatcher.group(2));
        System.out.println("3:" + eventMatcher.group(3));
    }
}
I expect to get as output
0:<span class="someclass"><b>Some String.</b></span></td>
1:<span class="someclass"><b>
2:Some String
3:</b></span>
or
0:Some String.</td>
1:null
2:Some String
3:null
depending on which lines provide the match as mentioned above. Instead I get:
0:<span class="someclass"><b>Some String.</b></span></td>
1:null
2:(empty string)
3:</b></span>
or
0:Some String.</td>
1:null
2:(empty string)
3:null
Any ideas? Thanks in advance.
Consider the terms of ".*(<span class=\"someclass\"><b>)?(.*)[.](</b></span>)?</td>.*"
.* - greedily collects characters
(<span class=\"someclass\"><b>)? - optionally collects text that the preceding .* will always have matched already, so this group never takes part in the match (null)
(.*) - greedily collects characters that the first .* will also have swallowed, so it will be empty
[.] - a single .
(</b></span>)? - optionally collects the closing tags
</td> - must be there
.* - collects the rest of the characters in the line.
Therefore in general group 1 will be null and group 2 will be empty, because the first .* will have collected the information you wanted to capture!
You could just make the first .* non-greedy by using .*? but this may fail for other reasons.
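As a minimal sketch of that suggestion, the class below uses the pattern from the question with only the leading .* changed to .*?. The class and method names are made up for illustration. It handles the two sample lines from the question, but, as warned above, it can still fail on other input: if any text precedes the span on the line, group 1 is skipped again.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantPrefixDemo {
    // Pattern from the question; the only change is the reluctant leading ".*?".
    static final Pattern LINE = Pattern.compile(
            ".*?(<span class=\"someclass\"><b>)?(.*)[.](</b></span>)?</td>.*");

    /** Returns groups 1..3 for a matching line, or null if the line does not match. */
    static String[] groups(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] lines = {
            "<span class=\"someclass\"><b>Some String.</b></span></td>",
            "Some String.</td>"
        };
        for (String line : lines) {
            String[] g = groups(line);
            System.out.println("1:" + g[0]);
            System.out.println("2:" + g[1]);
            System.out.println("3:" + g[2]);
        }
    }
}
```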
So, in general terms, what are you trying to extract?
Similar Messages
-
How to check special characters in java code using Java.util.regex package
String guid="first_Name;Last_Name";
Pattern p5 = Pattern.compile("\\p{Punct}");
Matcher m5 =p5.matcher(guid);
boolean test=m5.matches();
I want to find out whether there are any special characters in the String guid using regex,
but the above code always returns false. Please suggest.
Pattern.compile("[^\\w]");
The above will match any non [a-zA-Z0-9_] character.
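Note also that Matcher.matches() only succeeds when the whole input matches the pattern, which is why a one-character pattern returns false for a longer string; Matcher.find() looks for a match anywhere in the input. A minimal sketch of the difference, with class and method names chosen only for illustration:

```java
import java.util.regex.Pattern;

class PunctuationCheck {
    private static final Pattern PUNCT = Pattern.compile("\\p{Punct}");

    /** True if the input contains at least one punctuation character anywhere. */
    static boolean containsPunctuation(String s) {
        return PUNCT.matcher(s).find();
    }

    /** True only if the entire input is exactly one punctuation character. */
    static boolean isSinglePunctuation(String s) {
        return PUNCT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String guid = "first_Name;Last_Name";
        System.out.println(containsPunctuation(guid));   // find(): searches within the string
        System.out.println(isSinglePunctuation(guid));   // matches(): whole string must match
    }
}
```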
Or you could do
Pattern.compile("[^\\s\\w]");
This should match anything that is neither a word character nor whitespace.
-
Remove all the special characters using java.util.regex
Hi,
How can I remove all the special characters in a String[] using regex? I have the following:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegExpTest {
    private static String removeSplCharactersForNumber(String[] numbers) {
        String number = null;
        Matcher m = null;
        Pattern p = Pattern.compile("\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)\\_\\+\\-\\{\\}\\|\\;\\\\\\'////\\,\\.\\?\\<\\>\\[\\]");
        for (int i = 0; i < numbers.length; i++) {
            m = p.matcher(numbers[i]);
            if (m.find()) {
                number = m.replaceAll("");
            }
        }
        System.out.println("Final Number is:::" + number);
        return number;
    }
    public static void main(String args[]) {
        String[] str = {"raghav!@#$%^&*()_+"};
        RegExpTest regExpTest = new RegExpTest();
        regExpTest.removeSplCharactersForNumber(str);
    }
}
This code is not working: m.find() is "false". I want the output to be raghav for the entered string array, and it should remove all the special characters from every element of an entered String[]. Is there a simple way to remove all the special characters for a given String[]? More importantly, spaces (treated as special characters) should be removed as well. Please provide a solution to this.
Thanks
You don't need the find(). Just use replaceAll() on each element of the String[], i.e.
String[] values = ...
for (int i = 0; i < values.length; i++) {
    values[i] = p.matcher(values[i]).replaceAll("");
}
I can't understand your regex since the forum software has mangled it, but you just need to add a space to the set of chars to remove. When you post code, surround it with CODE tags; then the forum software won't mangle it.
-
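For the special-character removal question above: the posted pattern is a sequence of escaped characters, so it only matches when all of them appear in exactly that order, whereas a character class matches any one of them. A minimal sketch using a negated class follows. It assumes "special" means anything other than ASCII letters and digits (so spaces and underscores are removed too); the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class SpecialCharStripper {
    // Assumption: everything except ASCII letters and digits counts as "special".
    private static final Pattern SPECIAL = Pattern.compile("[^A-Za-z0-9]");

    /** Returns a new array with the special characters removed from every element. */
    static String[] strip(String[] values) {
        String[] cleaned = new String[values.length];
        for (int i = 0; i < values.length; i++) {
            cleaned[i] = SPECIAL.matcher(values[i]).replaceAll("");
        }
        return cleaned;
    }

    public static void main(String[] args) {
        String[] cleaned = strip(new String[] { "raghav!@#$%^&*()_+", "a b-c" });
        for (String s : cleaned) {
            System.out.println(s);
        }
    }
}
```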
Doubt in Regular Expressions : java.util.regex
I want to identify words starting with capital letters in a sentence and I want to replace the matched word with "#" added in front of it.... For example, if my input sentence is
"Christopher Williams asked Mike Muller a question"
my output should be,
"#Christopher #Williams asked #Mike #Muller a question"
How do I do that using java.util.regex ?
In perl we can use *"back references"* in the *"replacement string"*. Perl replacement accepts back references whereas the java replacement method accepts only strings....
Please help me.....
Your replacement is swallowing the space before the uppercase character, and won't match at the beginning of the line.
Also, it's unnecessarily verbose. String has a replaceAll method (that calls the same methods of Pattern and Matcher under the covers):
sentence = s.replaceAll("(^| )([A-Z])", "$1#$2");
Disclaimer: I'm no prome, sabre or u/a :-) That can probably be simplified.
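One possible simplification, offered only as a sketch: a word boundary (\b) can stand in for the "start of line or space" alternative, so a single back reference is enough. Unlike the version above, this also tags capitals that follow punctuation such as a hyphen. The class and method names are made up for the example.

```java
class CapitalTagger {
    /** Prefixes every word that starts with an ASCII capital letter with '#'. */
    static String tag(String sentence) {
        // $1 is the back reference to the captured capital letter.
        return sentence.replaceAll("\\b([A-Z])", "#$1");
    }

    public static void main(String[] args) {
        System.out.println(tag("Christopher Williams asked Mike Muller a question"));
        // prints: #Christopher #Williams asked #Mike #Muller a question
    }
}
```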
db
-
Ignore word (Java.util.regex )
Hello All,
Can anyone help me to solve this problem:
Problem: I have a text file and I want to search it for a word or combination of words using java.util.regex.
Example: in the sentence "things like the Forestry in the Commission (FC)." I want to find "Forestry Commission" while ignoring the words "in the". This ignore criterion is specific, i.e. the search returns true only if it ignores "in the", not any other words.
Also, how do I ignore multiple words in the ignore condition?
Thanks in advance.
Try out this line of code:
System.out.println("Forestry in the Commission".replaceAll("Forestry\\s(.*?\\s)?Commission", "Hello"));
In EBNF, it looks like this:
match ::= "Forestry" <whitespace> [{<character>} <whitespace>] "Commission"
whitespace ::= <space> | \t | \n | \x0B | \f | \r
character ::= (any one character)
This, of course, is an ambiguous EBNF definition. The breakdown of the expression, however, reveals why this works. In the string, "\\s" refers to one whitespace character. "(.*?\\s)?" is where the magic happens: it causes ".*?\\s" to match either not at all or once. ".*?" will consume as few "." (any character) as possible to make the match, and the following "\\s" makes sure that strings like "Forestry deCommission" don't match. The EBNF's ambiguity comes from EBNF's inability to describe "reluctant quantifiers": quantifiers that indicate that as little as possible of the given expression should be matched.
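Note that "(.*?\\s)?" skips any intervening text, not only "in the". If only specific phrases may be ignored, as the question asks, one option is to list them in an alternation. The sketch below is an illustration under that assumption; the helper names and the second phrase "of the" are hypothetical.

```java
import java.util.regex.Pattern;

class IgnorePhraseDemo {
    /** Builds: first, whitespace, zero or more of the ignorable phrases, then second. */
    static Pattern build(String first, String second, String... ignorable) {
        StringBuilder alternatives = new StringBuilder();
        for (String phrase : ignorable) {
            if (alternatives.length() > 0) {
                alternatives.append('|');
            }
            alternatives.append(Pattern.quote(phrase)); // treat each phrase literally
        }
        String middle = alternatives.length() == 0
                ? ""
                : "(?:(?:" + alternatives + ")\\s+)*";
        return Pattern.compile(Pattern.quote(first) + "\\s+" + middle + Pattern.quote(second));
    }

    /** True if the pattern occurs anywhere in the text. */
    static boolean found(Pattern p, String text) {
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        Pattern p = build("Forestry", "Commission", "in the", "of the");
        System.out.println(found(p, "things like the Forestry in the Commission (FC)."));
        System.out.println(found(p, "the Forestry near the Commission"));
    }
}
```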
Cheers!
-
Please help on java.util.regex.*
Hi all,
My RTF file looks like this:
Project Num\tab N/A
\par Project Name\tab Hook-up Installation and Service
\par
My intention is to read the file up to the \tab and store "Project Num" as a string in a
variable, and similarly to read up to the \par and store the value of Project Num in another variable,
so that I can use those variables further in my program.
I used the java.util.regex.* package for this purpose. I could successfully split the sentence wherever it sees \tab and \par, but I don't know how to get the text before and after the delimiters and store it in variables.
The code which i wrote is:
import java.util.regex.*;
import java.io.*;
import java.nio.*;
import java.nio.charset.*;
import java.nio.channels.*;
public class RegexDemo {
    public static void main(String[] args) {
        // Create a pattern to match breaks
        Pattern p = Pattern.compile("\\\\.[a-z][a-z]", Pattern.DOTALL);
        try {
            File file = new File("sample.rtf");
            FileInputStream fis = new FileInputStream(file);
            FileChannel fc = fis.getChannel();
            // Get a CharBuffer from the source file
            ByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, (int)fc.size());
            Charset cs = Charset.forName("8859_1");
            CharsetDecoder cd = cs.newDecoder();
            CharBuffer cb = cd.decode(bb);
            // Run some matches
            Matcher m = p.matcher(cb);
            while (m.find())
                System.out.println("Found comment: " + m.group());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Could somebody please help me with this? I have spent a lot of time searching the forums but couldn't find any solution.
Thanks in advance
rnallu
Just put the target inside parentheses, with the delimiters at the boundaries.
Example: "(\\w+)\\t(\\d)\\s" will match occurrences of a word followed by a tab character, then a digit followed by a whitespace character. If the target string matches the pattern, then m.group(1) contains the word and m.group(2) contains the digit.
-
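For the RTF question above, the same capture-group idea might look like the following sketch. It assumes every field sits on its own line as "label\tab value", optionally preceded by \par, as in the posted sample; the class name, the method name and the choice of a Map are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RtfFieldDemo {
    // Per line: optional "\par ", the label (group 1), a literal "\tab", the value (group 2).
    private static final Pattern FIELD =
            Pattern.compile("(?m)^(?:\\\\par\\s+)?(.+?)\\\\tab\\s+(.*)$");

    /** Collects label/value pairs in the order they appear in the text. */
    static Map<String, String> parse(String rtf) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(rtf);
        while (m.find()) {
            fields.put(m.group(1).trim(), m.group(2).trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        String sample = "Project Num\\tab N/A\n"
                + "\\par Project Name\\tab Hook-up Installation and Service\n"
                + "\\par\n";
        for (Map.Entry<String, String> e : parse(sample).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```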
Who use sql-mapping with java.util.regex?
Hi everyone:
I use the iBatis SQL-Mapping and I think it is very good. Now I want to add a search function to my BBS forum. I also want to highlight the matched content the way Jive does. I mean that if I search for the string "ibatis", every "ibatis" in the search results is displayed highlighted.
So I must use java.util.regex from JDK 1.4. But the problem is that what I get back is a List if I use sql-mapping. For example:
String resource="conf/XML/sql/lyo-sql-map.xml";
Reader reader=Resources.getResourceAsReader(resource);
sqlmap=XmlSqlMapBuilder.buildSqlMap(reader);
List articlelist=sqlmap.executeQueryForList("selectSiteArticle","%"+icontent+"%");
The result I get is a List, so I see no point at which I can apply the regex.
I don't know whether I could do this:
iterate over the List, apply the regex, and then put all the objects back into the List.
Is that right?
How do I use regex with sql-mapping? Thanks
Any idea? :(
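A minimal sketch of the approach described in the question (iterate over the list, apply the regex, collect the results) follows. It deliberately uses no iBatis classes: it assumes the article contents have already been extracted as plain strings, and the <b> markup, class name and method name are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class HighlightDemo {
    /** Wraps every case-insensitive occurrence of term in <b>...</b>. */
    static List<String> highlight(List<String> texts, String term) {
        // Pattern.quote keeps characters in the search term from acting as regex syntax.
        Pattern p = Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE);
        List<String> result = new ArrayList<>(texts.size());
        for (String text : texts) {
            // $0 is the matched text itself, so the original capitalisation is kept.
            result.add(p.matcher(text).replaceAll("<b>$0</b>"));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("I like iBatis a lot", "no match here");
        for (String row : highlight(rows, "ibatis")) {
            System.out.println(row);
        }
    }
}
```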
-
RFC used for java.util.regex
Hi,
Does anyone know the RFC used for java.util.regex?
Thanks & Regards,
Gurushant Hanchinal
Can you please give me a link to the specification for java.util.regex? I have tried the link named "Java Language Specification" on the page
http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html
but when I click it I get a "page not found" error.
Please give me any alternate links to the regular expression specification.
Thanks,
Gurushant Hanchinal
-
Regular expressions with java.util.regex
Hello Guys,
I wrote this last time:
/*
 * Uses split to break up a string of input separated by
 * commas and/or whitespace.
 * See: http://developer.java.sun.com/developer/technicalArticles/releases/1.4regex/
 * Change: I have slightly modified the source
 */
import java.util.regex.*;
public class Splitter {
    public static void main(String[] args) throws Exception {
        // Create a pattern to match breaks
        Pattern p = Pattern.compile("[<>\\s]+");
        // Split input with the pattern
        String[] result =
            p.split("<element attributname1 = \"attributwert1\" attributname2 = \"attributwert2\">");
        for (int i=0; i<result.length; i++)
            if (result[i].equals(""))
                System.out.println("EMPTY");
            else
                System.out.println(result[i]);
        int res = result.length - 1;
        System.out.println("\nStringlaenge: " + res);
    }
}
I wonder why I got an empty element in result[0]. Does anyone have an idea?
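The empty first element is expected: the input starts with "<", which the pattern treats as a delimiter, and Pattern.split() reports the (empty) text in front of a leading delimiter, while trailing empty strings are dropped. One way around it, sketched here with hypothetical names, is to strip the leading delimiters before splitting.

```java
import java.util.regex.Pattern;

class SplitLeadingEmptyDemo {
    private static final Pattern BREAKS = Pattern.compile("[<>\\s]+");

    /** Splits on the delimiters without producing an empty first token. */
    static String[] tokens(String input) {
        // Remove a leading run of delimiters so nothing precedes the first token.
        String trimmed = input.replaceFirst("^[<>\\s]+", "");
        return BREAKS.split(trimmed);
    }

    public static void main(String[] args) {
        String input = "<element attributname1 = \"attributwert1\">";
        System.out.println("plain split, first token empty: " + BREAKS.split(input)[0].isEmpty());
        for (String token : tokens(input)) {
            System.out.println(token);
        }
    }
}
```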
We'll come together next time
... �nhan Inay
What is wrong with this Pattern?
Pattern p = Pattern.compile("^<[a-zA-Z0-9_\\"=]+[\\s]*$>");
This time i used following Split:
p.split("<element attributname1=\"attributwert1\" attributname2=\"attributwert2\">");
I've got a compilation error:
U:\qms_neu\htdocs\inay\Source\myWork\Regex-Samples>javac Splitter.java
Splitter.java:14: illegal start of expression
Pattern p = Pattern.compile("^<[a-zA-Z0-9_\\"=]+[\\s]*$>");
^
Splitter.java:14: illegal character: \92
Pattern p = Pattern.compile("^<[a-zA-Z0-9_\\"=]+[\\s]*$>");
^
Splitter.java:14: illegal character: \92
Pattern p = Pattern.compile("^<[a-zA-Z0-9_\\"=]+[\\s]*$>");
^
Splitter.java:14: unclosed string literal
Pattern p = Pattern.compile("^<[a-zA-Z0-9_\\"=]+[\\s]*$>");
^
Splitter.java:17: ')' expected
p.split("<element attributname1=\"attributwert1\" attributname2
=\"attributwert2\">");
^
Splitter.java:14: unexpected type
required: variable
found : value
Pattern p = Pattern.compile("^<[a-zA-Z0-9_\\"=]+[\\s]*$>");
^
Splitter.java:18: cannot resolve symbol
symbol : variable result
location: class Splitter
for (int i=0; i<result.length; i++)
^
Splitter.java:19: cannot resolve symbol
symbol : variable result
location: class Splitter
if (result.equals("")){
^
Splitter.java:21: cannot resolve symbol
symbol : variable result
location: class Splitter
System.out.println(result[0]);
^
Splitter.java:24: cannot resolve symbol
symbol : variable result
location: class Splitter
System.out.println(result[i]);
^
Splitter.java:25: cannot resolve symbol
symbol : variable result
location: class Splitter
int res = result.length - 1;
^
11 errors -
Regular Expressions (java.util.regex)
I am developing with a product that must
use Java 1.2.2_05a, but I want to use regular
expressions. Does anybody know where I can
get the java.util.regex package without
having to download the whole Java 1.4 release?
Or does someone know of an alternative that
I can use?
There is another regex package for Java available from the Apache Foundation. You can try it.
Take a look at http://jakarta.apache.org/
-
Hi,
Jdev 11.1.1.0.31.51.56
If any of you get the following stack trace when running a jspx page that uses ViewCriteriaRow.setOperator,
note that bug 7534359 and Metalink note 747353.1 are available.
java.lang.NullPointerException
at java.util.regex.Matcher.getTextLength(Matcher.java:1140)
at java.util.regex.Matcher.reset(Matcher.java:291)
at java.util.regex.Matcher.<init>(Matcher.java:211)
at java.util.regex.Pattern.matcher(Pattern.java:888)
at oracle.adfinternal.view.faces.model.binding.FacesCtrlSearchBinding._loadFilterCriteriaValues(FacesCtrlSearchBinding.java:3695)
Truncated. see log file for complete stacktrace
Workaround:
If you use
vcr.setAttribute("Job",job);
or
vcr.setAttribute("Job","="+job);
then add the following line of code:
vcr.setOperator("Job","=");
regards
Peter
Hi,
it is useful to mention that this happens when setting the equal operator or the LIKE operator:
vcr.setAttribute("Job","= '"+job+"'");
or
vcr.setOperator("Job","=");
Frank
-
About the error of java.util.regex in jdk1.4's docs
In java.util.regex,the class Pattern's document says:
Greedy quantifiers
X? X, once or not at all
X* X, zero or more times
X X, one or more times
X{n} X, exactly n times
X{n,} X, at least n times
X{n,m} X, at least n but not more than m times
Why doesn't it mention "X+"?
I think that should be "X+ X, one or more times", right?
Agreed. I use regex in many places (and used
Oromatcher before 1.4), and I've verified that I
use the + operator in several places; it works.
-
Problem in Creating a jar file using java.util.jar and deploying in jboss 4
Dear Techies,
I am facing this peculiar problem. I am creating a jar file programmatically using the java.util.jar API. The jar file is created, but JBoss AS is unable to deploy it. I have also checked that the created jar file contains the same files. When I create a jar file from the command line using jar -cvf, JBoss is able to deploy it. I am sending the code; please review it and let me know the problem. I badly need your help, as I am unable to proceed. Please help me.
package com.rrs.corona.solutionsacceleratorstudio.solutionadapter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;
import com.rrs.corona.solutionsacceleratorstudio.SASConstants;
/**
 * @author Piku Mishra
 */
public class JarCreation
{
    /** File object */
    File file;
    /** JarOutputStream object to create a jar file */
    JarOutputStream jarOutput;
    /** File name of the generated jar file */
    String jarFileName = "rrs.jar";
    /** To create a Manifest.mf file */
    Manifest manifest = null;
    //Attributes atr = null;
    /**
     * Constructor to specify the path and
     * name of the jar file
     * @param destnPath of type String denoting the path of the generated jar file
     */
    public JarCreation(String destnPath)
    {//This constructor initializes the destination path and file name of the jar file
        try
        {
            manifest = new Manifest();
            jarOutput = new JarOutputStream(new FileOutputStream(destnPath+"/"+jarFileName),manifest);
        }
        catch(Exception e)
        {
            e.printStackTrace();
        }
    }
    public JarCreation()
    {
    }
    /**
     * This method is used to obtain the list of files present in a
     * directory
     * @param path of type String specifying the path of directory containing the files
     * @return the list of files from a particular directory
     */
    public File[] getFiles(String path)
    {//This method is used to obtain the list of files in a directory
        try
        {
            file = new File(path);
        }
        catch(Exception e)
        {
            e.printStackTrace();
        }
        return file.listFiles();
    }
    /**
     * This method is used to create a jar file from a directory
     * @param path of type String specifying the directory to make jar
     */
    public void createJar(String path)
    {//This method is used to create a jar file from
     // a directory. If the directory contains several nested directory
     //it will work.
        try
        {
            byte[] buff = new byte[2048];
            File[] fileList = getFiles(path);
            for(int i=0;i<fileList.length;i++)
            {
                if(fileList[i].isDirectory())
                {
                    createJar(fileList[i].getAbsolutePath());//Recursive call to get the files
                }
                else
                {
                    FileInputStream fin = new FileInputStream(fileList[i]);
                    String temp = fileList[i].getAbsolutePath();
                    String subTemp = temp.substring(temp.indexOf("bin")+4,temp.length());
                    // System.out.println( subTemp+":"+fin.getChannel().size());
                    jarOutput.putNextEntry(new JarEntry(subTemp));
                    int len ;
                    while((len=fin.read(buff))>0)
                    {
                        jarOutput.write(buff,0,len);
                    }
                    fin.close();
                }
            }
        }
        catch( Exception e )
        {
            e.printStackTrace();
        }
    }
    /**
     * Method used to close the object for JarOutputStream
     */
    public void close()
    {//This method is used to close the
     //JarOutputStream
        try
        {
            jarOutput.flush();
            jarOutput.close();
        }
        catch(Exception e)
        {
            e.printStackTrace();
        }
    }
    public static void main( String[] args )
    {
        JarCreation jarCreate = new JarCreation("destination path where jar file will be created /");
        jarCreate.createJar("put your source directory");
        jarCreate.close();
    }
}
Hi,
I have gone through your code, and the problem is that when you create the jar it uses the complete path (obtained by calling getAbsolutePath) for each entry; when you extract the jar you can see the path (C:\..\...\..\).
You need to truncate this complete path and keep only the part relative to the directory where your files are stored; that should solve the problem.
-
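For the jar-creation question above, a hedged sketch of the suggested fix: entry names are computed relative to the source directory and written with forward slashes, which is the separator jar entries use. The class and method names are hypothetical, and setting Manifest-Version is an extra precaution that the thread itself does not confirm as a cause.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

class RelativeJarDemo {
    /** Entry name = path of file relative to root, always with '/' separators. */
    static String entryName(File root, File file) {
        return root.toPath().relativize(file.toPath()).toString()
                .replace(File.separatorChar, '/');
    }

    /** Recursively adds every regular file below dir, named relative to root. */
    static void addDirectory(File root, File dir, JarOutputStream out) throws IOException {
        File[] children = dir.listFiles();
        if (children == null) {
            return; // not a directory, or not readable
        }
        for (File child : children) {
            if (child.isDirectory()) {
                addDirectory(root, child, out);
            } else {
                out.putNextEntry(new JarEntry(entryName(root, child)));
                Files.copy(child.toPath(), out);
                out.closeEntry();
            }
        }
    }

    /** Writes all files under sourceDir into jarFile. */
    static void createJar(File sourceDir, File jarFile) {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jarFile), manifest)) {
            addDirectory(sourceDir, sourceDir, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call such as RelativeJarDemo.createJar(new File("bin"), new File("rrs.jar")) would then produce entries like com/rrs/.../JarCreation.class rather than absolute paths; the directory and file names here are placeholders.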
Java.util.regex error
Hello,
I checked the JavaDoc multiple times but do not see what is wrong with
myString.replaceAll("D:\\web\\mars","")
which results in
java.util.regex.PatternSyntaxException: Illegal/unsupported escape squence near index 7
D:\web\mars
^
at java.util.regex.Pattern.error(Unknown Source)
at java.util.regex.Pattern.escape(Unknown Source)
at java.util.regex.Pattern.atom(Unknown Source)
at java.util.regex.Pattern.sequence(Unknown Source)
at java.util.regex.Pattern.expr(Unknown Source)
at java.util.regex.Pattern.compile(Unknown Source)
at java.util.regex.Pattern.<init>(Unknown Source)
at java.util.regex.Pattern.compile(Unknown Source)
at java.lang.String.replaceAll(Unknown Source)
at ArticleImageImportProcessor.main(ArticleImageImportProcessor.java:40)
Exception in thread "main"
Please, every suggestion/hint is most appreciated.
You have to "encode" the backslash twice: first for the Java String literal, and a second time because of the special meaning of '\' in regular expressions.
It should look like
myString.replaceAll("D:\\\\web\\\\mars","")
-
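For the backslash question above, two alternatives avoid the second level of escaping. Both need Java 5 or later, which is an assumption given the age of the thread: Pattern.quote() turns a literal into a regex, and String.replace() does not use regex at all. The class and method names below are illustrative.

```java
import java.util.regex.Pattern;

class LiteralReplaceDemo {
    /** The fix from the reply: each backslash escaped for Java and again for regex. */
    static String viaEscapedRegex(String s) {
        return s.replaceAll("D:\\\\web\\\\mars", "");
    }

    /** Pattern.quote() makes the regex engine treat the text literally. */
    static String viaQuote(String s) {
        return s.replaceAll(Pattern.quote("D:\\web\\mars"), "");
    }

    /** String.replace() works on plain character sequences, with no regex involved. */
    static String viaPlainReplace(String s) {
        return s.replace("D:\\web\\mars", "");
    }

    public static void main(String[] args) {
        String path = "D:\\web\\mars\\img\\a.png";
        System.out.println(viaEscapedRegex(path));
        System.out.println(viaQuote(path));
        System.out.println(viaPlainReplace(path));
    }
}
```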
How do I estimate time takes to Zip/Unzip using java.util.zip ?
If I am compressing files or uncompressing zips using java.util.zip, is there a way for me to tell the user how long it should take, or perhaps to display a progress monitor?
For unzip, use the ZipInputStream and pass it a CountingInputStream that keeps track of the number of bytes read from it (you write this). This CountingInputStream extends FileInputStream and as such can provide you with information about the total number of bytes available to be read and the number already read. This gives a crude idea of how much work has been done and how much still needs to be done. It is inaccurate but should be good enough for a progress bar.
As for zipping, use the ZipOutputStream and pass it blocks of information. Keep track of the number of blocks you have written and the number you still need to write.
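A minimal sketch of the unzip side of this idea follows. It differs from the description above in one respect: the counting stream extends FilterInputStream, so it can wrap any source, and the total size is taken separately (here from an in-memory array; for a file it would be File.length()). All class and method names are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Counts the compressed bytes that pass through it. */
class CountingInputStream extends FilterInputStream {
    private long count;

    CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            count++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            count += n;
        }
        return n;
    }

    long getCount() {
        return count;
    }
}

class ZipProgressDemo {
    /** Builds a tiny zip archive in memory so the example needs no files. */
    static byte[] sampleZip() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("hello.txt"));
            zip.write("hello, zip".getBytes("UTF-8"));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads every entry and returns how many compressed bytes were consumed. */
    static long unzipCounting(byte[] archive) {
        CountingInputStream counter = new CountingInputStream(new ByteArrayInputStream(archive));
        byte[] buf = new byte[4096];
        try (ZipInputStream zip = new ZipInputStream(counter)) {
            while (zip.getNextEntry() != null) {
                while (zip.read(buf) != -1) {
                    // entry data would be written out here
                }
                // Crude progress: compressed bytes consumed so far versus archive size.
                System.out.println(100 * counter.getCount() / archive.length + "% read");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counter.getCount();
    }
}
```

Because ZipInputStream reads ahead in chunks, the percentage moves in steps and may not reach exactly 100, which matches the "crude but good enough" caveat above.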
matfud