Regular expression - find repeating +++ signs
I'm trying to use regular expressions to remove duplicate +++ signs in a string. When I test my pattern with Expresso (www.ultrapico.com) it parses the string correctly, but in Java 1.5 it doesn't work: mp.matches() is always false. Any suggestions would be appreciated.
finalLongstring = "TTL1,clip1+TTL2+++clip3,TTL4,clip4,TTL5,clip5+TTL6+clip6+TTL7+clip7,TTL8,clip8,TTL9,clip9,TTL10,clip10,TTL11,clip11,TTL12,clip12,TTL13,clip13+TTL14+clip14,TTL15,clip15,TTL16,clip16,TTL17,clip17,TTL18,clip18,TTL19,clip19,TTL20,clip20,TTL21,clip21,TTL22,clip22,TTL23,clip23,TTL24,clip24,TTL25,clip25,TTL26,clip26,TTL27,clip27,TTL28,clip28,TTL29,clip29";
Pattern multiplePunctuation=null;
multiplePunctuation=Pattern.compile("[,+]{2,6}");
//                                   |   |
//                                   |   2 to 6 times
//                                   a comma or plus character
Matcher mp=multiplePunctuation.matcher(finalLongstring);
if(mp.matches()){
finalLongstring=mp.replaceAll("+");
}
Answered in your other thread.
http://forum.java.sun.com/thread.jspa?threadID=5143654
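The symptom described above is consistent with how Matcher.matches() is documented: it succeeds only when the entire input matches the pattern, whereas find() and replaceAll() look for matches anywhere in the input. A minimal sketch, assuming the goal is to collapse each run of 2 to 6 commas or plus signs into a single "+" (the class name CollapsePunctuation and the shortened sample string are made up for illustration, and this is not necessarily what the linked thread recommends):

```java
import java.util.regex.Pattern;

public class CollapsePunctuation {
    // Same pattern as in the question: 2 to 6 commas or plus signs in a row.
    static final Pattern MULTIPLE_PUNCTUATION = Pattern.compile("[,+]{2,6}");

    // replaceAll() scans the whole input itself, so no matches() guard is needed.
    static String collapse(String input) {
        return MULTIPLE_PUNCTUATION.matcher(input).replaceAll("+");
    }

    public static void main(String[] args) {
        String s = "TTL1,clip1+TTL2+++clip3,TTL4";
        // matches() asks "is the WHOLE string a run of punctuation?"
        System.out.println(MULTIPLE_PUNCTUATION.matcher(s).matches()); // false
        // find() asks "is there such a run anywhere in the string?"
        System.out.println(MULTIPLE_PUNCTUATION.matcher(s).find());    // true
        System.out.println(collapse(s)); // TTL1,clip1+TTL2+clip3,TTL4
    }
}
```

If the guard is wanted anyway, find() would be the call to use in place of matches().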
Similar Messages
-
Java – Regular Expressions – Finding any non digit byte in a multiple byte
Hello,
I’m new to JAVA and Regular Expressions; I’m trying to write a regular expression that will find any records that contain a non digit byte in a multiple byte field.
I thought the following was the correct expression but it is only finding records that contain “all” non digit bytes.
\D{1,}
\D = Non Digit
{1,} = at least 1 or more
Below is my sample data. I would like the regular expression to find all of the records that are not all numeric. However when I use the regular expression \D{1,} it is only finding the 2 records that all bytes are non digits. (i.e. “ “ and “A “)
“ 111229”
“2 111229”
“20091229”
“200912c9”
“201#1229”
“20101229”
“20110229”
“20111*29”
“20111029”
“20111229”
“20B11229”
“A “
“A0111229”
Please note I have also tried \D{1,}+ and \D{1,}? And they also do not return my desired results
Any assistance someone can provide would be greatly appreciated.
You don't show the code you are using but I surmise you are using String.matches() which requires that the whole target must match the regular expression not just part of it. Instead you should create a Pattern and then a Matcher and use the Matcher.find() method. Check the Javadoc for Pattern and Matcher and look at the Java regex tutorial - http://docs.oracle.com/javase/tutorial/essential/regex/ .
P.S. You can re-use the Pattern object - you don't have to create it every time you need one.
P.P.S. Java regular expressions work with characters, not bytes, and characters are not bytes. -
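The advice above (a re-used Pattern plus Matcher.find()) could look roughly like the following. This is only a sketch: the class name NonDigitFinder, the method name and the shortened record list are assumptions made for illustration, since the original code was never posted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class NonDigitFinder {
    // Compiled once and re-used for every record.
    private static final Pattern NON_DIGIT = Pattern.compile("\\D");

    /** Returns the records that contain at least one non-digit character. */
    static List<String> notAllNumeric(List<String> records) {
        List<String> result = new ArrayList<String>();
        for (String record : records) {
            // find() succeeds if ANY part of the record matches the pattern.
            if (NON_DIGIT.matcher(record).find()) {
                result.add(record);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
                " 111229", "20091229", "200912c9", "201#1229", "A0111229");
        System.out.println(notAllNumeric(records));
        // prints [ 111229, 200912c9, 201#1229, A0111229]
    }
}
```

A single \D is enough here; the {1,} quantifier adds nothing once find() is used.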
Regular Expressions find and replace
Hi ,
I have a question on using Regular Expressions in Java(java.util.regex).
Problem Description:
I have a string (say, for example, strHTML) which contains the whole HTML code of a webpage. I want to be able to search for all the image source tags and check whether they are absolute URLs to the image source (e.g. <img src="www.google.com/images/logo.gif" >) or relative (e.g. <img src="../images/logo.gif" >).
If they are relative URLs to the image path, then I wish to replace them with their absolute URLs throughout the webpage (in this case inside the string strHTML).
I have to do it inside a servlet and hence have to use Java.
I tried the code below. It doesn't match and replace; instead it goes into an infinite loop, i.e. the pattern probably matches everything.
//Change all images to actual http addresses FOR example change src="../images/logo.gif" to src="http://www.google.com/../images/logo.gif"
String ddurl="http://www.google.com/";
String strHTML=" < img src=\"../images/logo.gif\" alt=\"Google logo\">";
Pattern p = Pattern.compile ("(?i)src[\\s]*=[\\s]*[\"\']([./]*.*)[\"\']");
Matcher m = p.matcher (strHTML);
while(m.find())
m.replaceAll(ddurl+m.group(1));
what is wrong in this?
Thanks,
Rajiv
Right, here's the full monte (whatever that means):
import java.util.regex.*;

public class Test1
{
  public static void main(String[] args)
  {
    String domain = "http://www.google.com/";
    String strHTML =
      " < img src=\"images/logo.gif\" alt=\"Google logo\">\n" +
      " <img alt=\"Google logo\" src=images/logo.gif >\n" +
      " <IMG SRC=\"/images/logo.gif\" alt=\"Google logo\">\n" +
      " <img alt=\"Google logo\" src=../images/logo.gif>\n" +
      " <img src=http://www.yahoo.com/images/logo.gif alt=\"Yahoo logo\">";
    String regex =
      "(<\\s*img.+?src\\s*=\\s*)  # Capture preliminaries in $1.  \n" +
      "(?:                        # First look for URL in quotes. \n" +
      "   ([\"\'])                # Capture open quote in $2.     \n" +
      "   (?!http:)               # If it isn't absolute...       \n" +
      "   /?(.+?)                 # ...capture URL in $3          \n" +
      "   \\2                     # Match the closing quote       \n" +
      " |                         # Look for non-quoted URL.      \n" +
      "   (?!http:)               # If it isn't absolute...       \n" +
      "   /?([^\\s>]+)            # ...capture URL in $4          \n" +
      ")";
    Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.COMMENTS);
    Matcher m = p.matcher(strHTML);
    StringBuffer sbuf = new StringBuffer();
    while (m.find())
    {
      // Exactly one of the two alternatives matched; pick whichever group is set.
      String relURL = m.group(3) != null ? m.group(3) : m.group(4);
      m.appendReplacement(sbuf, "$1\"" + domain + relURL + "\"");
    }
    m.appendTail(sbuf);
    System.out.println(sbuf.toString());
  }
}
First off, observe that I'm using free-spacing (or "COMMENTS") mode to make the regex easier to read--all the whitespace and comments will be ignored by the Pattern compiler. I also used the CASE_INSENSITIVE flag instead of an embedded (?i), just to remove some clutter. By the way, your second (?i) was redundant; the first one would remain in effect until "turned off" with a (?-i). Another way to localize a flag's effect is to use it within a non-capturing group, e.g., (?i:img).
As jaylogan said, the best way to filter out absolute URL's is by using a negative lookahead, and that's what I've done here. The problem of optional quotes I addressed by trying to match first with quotes, then without. The all-in-one approach might work with URL's, since they can't (AFAIK) contain whitespace anyway, but the alternation method can be used to match any attribute/value pair. It's also, I feel, easier to understand and maintain. Unfortunately, it also means that you can't use replaceAll(), since you have to determine which alternative matched before doing the replacement, but the long version is still pretty simple (especially when you can just copy it from the javadoc for the appendReplacement() method, as I did). -
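To make the remark about localized flags concrete, here is a small sketch (the class name LocalFlagDemo and the sample tags are invented for illustration). Only the part wrapped in (?i:...) is matched case-insensitively; the rest of the pattern stays case-sensitive.

```java
import java.util.regex.Pattern;

public class LocalFlagDemo {
    // "img" may be any case; "src" must still be lower case.
    static final Pattern LOCAL = Pattern.compile("<(?i:img)\\s+src=");

    static boolean found(String html) {
        return LOCAL.matcher(html).find();
    }

    public static void main(String[] args) {
        System.out.println(found("<IMG src=\"a.gif\">")); // true
        System.out.println(found("<IMG SRC=\"a.gif\">")); // false
    }
}
```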
Regular expression - find if string does NOT contain text....
I have a string that I want to tokenize. The string can contain basically anything. I want to produce tokens for each "word" found, and for each "<=" or "," found. There does not need to be whitespace around a "<=" or a "," to consider it a token. So for example:
joe schmoe<=jack, jane
should become
joe
schmoe
<=
jack
jane
As a constraint, I do not want to use StringTokenizer at all, as "its use is discouraged in new code". http://java.sun.com/j2se/1.4.2/docs/api/java/util/StringTokenizer.html
Here's the code I plan on using for this:
public String[] getWords(String input) {
    Matcher matcher = WORD_PATTERN.matcher(input);
    ArrayList<String> words = new ArrayList<String>();
    while (matcher.find()) {
        words.add(matcher.group());
    }
    return words.toArray(new String[0]);
}
The trick, though, is coming up with a working regular expression. The closest I've found yet is:
([^\s]|^(,)|^(<=))+|,|<=
but that produces the following:
joe
schmoe<=jack,
jane
I think what I need is to be able to find if a string does not contain the substring "<=" or "," using a regular expression. Anyone know how to do this, or another way to do this using regular expressions?
Try:
/*
 * Tokenizer.java
 *
 * version 1.0
 * 01/06/2005
 */
package samples;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * @author notivago
 */
public class StrangeTokenizer {
    public static void main(String[] args) {
        String text = "joe schmoe<=jack, jane";
        // Try "<=" and "," first, then fall back to a run of word characters.
        Pattern pattern = Pattern.compile( "((?:<=)|(?:,)|(?:\\w+))");
        Matcher matcher = pattern.matcher(text);
        while( matcher.find() ) {
            System.out.println( "Item: " + matcher.group(1) );
        }
    }
}
May the code be with you. -
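For the "does not contain" part of the question, a negative lookahead is one possible approach. The sketch below plugs a hypothetical WORD_PATTERN into the getWords() method from the question; the pattern is an assumption, not something either poster wrote. A "word" here is any run of characters that are not whitespace or a comma and that do not begin a "<=", so words are not limited to \w characters.

```java
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordTokenizer {
    // "<=" or "," as their own tokens; otherwise a run of characters that are
    // neither whitespace nor a comma, stopping before any "<=".
    private static final Pattern WORD_PATTERN =
            Pattern.compile("<=|,|(?:(?!<=)[^\\s,])+");

    static String[] getWords(String input) {
        Matcher matcher = WORD_PATTERN.matcher(input);
        ArrayList<String> words = new ArrayList<String>();
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) {
        for (String word : getWords("joe schmoe<=jack, jane")) {
            System.out.println(word);
        }
    }
}
```

For the sample input this yields the tokens joe, schmoe, <=, jack, a lone comma, and jane.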
Regular Expression - find double hyphens only
I am wondering if there's a way to write a regular expression to find double hyphens and change them to single hyphens. The catch is that some of the text I'm searching through have multiple hyphens.
Example:
str1 = "Here is my sample text with double -- and I would like to replace this with one hyphen."
str2 = "Here is another sample with multiple hyphens ----- that I do not want to change but leave as is."
Is there a way to change only str1 to a single hyphen and keep str2 as is?
You are correct. I should have been more explicit. Here are some real examples. I hope this helps.
Helps what?
Have you tried to write the regex yourself, at least?
Adam -
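No regex was posted in this thread, so the following is only one possible approach, sketched in Java on the assumption that lookarounds are available in the tool being used: match two hyphens that are neither preceded nor followed by another hyphen. The class name DoubleHyphen and the shortened sample strings are made up for illustration.

```java
import java.util.regex.Pattern;

public class DoubleHyphen {
    // "--" with no hyphen immediately before or after it.
    private static final Pattern EXACTLY_TWO = Pattern.compile("(?<!-)--(?!-)");

    static String fix(String s) {
        return EXACTLY_TWO.matcher(s).replaceAll("-");
    }

    public static void main(String[] args) {
        System.out.println(fix("sample text with double -- and more"));
        // prints: sample text with double - and more
        System.out.println(fix("multiple hyphens ----- left as is"));
        // prints: multiple hyphens ----- left as is
    }
}
```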
Regular Expression Find and Replace with Wildcards
Hi!
For the world of me, I can't figure out the right way to do this.
I basically have a list of last names, first names. I want the last name to have a different css style than the first name.
So this is what I have now:
<b>AAGAARD, TODD, S.</b><br>
<b>AAMOT, KARI,</b> <br>
<b>AARON, MARJORIE, C. </b> <br>
and this is what I need to have:
<span class="LastName">AAGAARD</span> <span class="FirstName">, TODD, S. </span> <br />
<span class="LastName">AAMOT</span> <span class="FirstName">, KARI,</span> <br/>
<span class="LastName">AARON</span> <span class="FirstName">, MARJORIE, C.</span> <br/>
Any ideas?
Thanks!
Make a backup first.
In the Find field use:
<b>(\w+),\s+([^<]+)<\/b>\s*<br>
In the Replace field use:
<span class="LastName">$1</span> <span class="FirstName">$2</span><br />
Select Use regular expression. Light the blue touch paper, and click Replace All. -
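The same capture-group idea can be tried outside the editor. The sketch below is a Java variant, not the editor's own syntax, and it differs from the Find/Replace pair above in one assumed detail: the second group keeps the comma after the last name, so the result lines up with the requested output. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class NameSpans {
    // $1 = last name; $2 = the comma and everything up to </b>,
    // without any trailing spaces before the closing tag.
    private static final Pattern NAME_LINE =
            Pattern.compile("<b>(\\w+)(,\\s+[^<]+?)\\s*</b>\\s*<br>");

    static String toSpans(String html) {
        return NAME_LINE.matcher(html).replaceAll(
                "<span class=\"LastName\">$1</span> "
              + "<span class=\"FirstName\">$2</span> <br />");
    }

    public static void main(String[] args) {
        System.out.println(toSpans("<b>AAGAARD, TODD, S.</b><br>"));
        System.out.println(toSpans("<b>AAMOT, KARI,</b> <br>"));
        System.out.println(toSpans("<b>AARON, MARJORIE, C. </b> <br>"));
    }
}
```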
Regular expressions: find files with exactly 'n' digits in a row
Hi there,
I want to filter files that contain only a fixed number of digits in a row, but not more (at least not directly after those digits).
For example, I have
01.mp3
02.mp3
test10.txt
test000110101010.txt
04.flac
and for n=2 I want to get all files except 'test000110101010.txt'.
The following is not working, and I'm a total newb regarding regular expressions
ls -l | grep '^-' | awk '{print $9}' | grep '([0-9]\{2\})[^0-9]\{2\}'
Thanks for help.
Regards,
drm
Thanks!
I wrote a python script to scan e.g. a music folder for missing files and needed to extract the file numbers from the files to get the "highest" number.
You can get it from here: http://pastebin.com/Sg9yDHiw (Python3, expires in 1 month)
Regards,
drm
Edit: found a bug
Last edited by drm00 (2011-02-04 13:57:43) -
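The reply that solved this is not preserved here. As an illustration of the "exactly n digits in a row" idea only, here is a Java sketch that uses lookarounds so a run of n digits is accepted only when no further digit touches it on either side. The class and method names are invented, and plain grep may need a different formulation since basic and extended grep patterns do not support lookarounds.

```java
import java.util.regex.Pattern;

public class ExactDigitRun {
    /** True if the name contains a run of exactly n digits. */
    static boolean hasRunOf(String name, int n) {
        // No digit before the run, n digits, no digit after the run.
        Pattern p = Pattern.compile("(?<!\\d)\\d{" + n + "}(?!\\d)");
        return p.matcher(name).find();
    }

    public static void main(String[] args) {
        String[] files = { "01.mp3", "02.mp3", "test10.txt",
                           "test000110101010.txt", "04.flac" };
        for (String file : files) {
            if (hasRunOf(file, 2)) {
                System.out.println(file); // every file except test000110101010.txt
            }
        }
    }
}
```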
REGULAR EXPRESSION FIND PLEASE ;(
HI FORM
I have the following documents with the content
doc1 = "ELECTRONICS DIGITAL CAMERA"
doc2 = "ELECTRONICS DIGITAL CAMERA ACCESSIORIES"
doc3 = "ELECTRONICS DIGITAL CAMERA OPTICS "
Using a regular expression, I would like to get only the 2nd document, which has the content
"ELECTRONICS DIGITAL CAMERA ACCESSIORIES"
How can I achieve this?
Karthik
You can try this one: ((digital|camera|accessories)[\\s]*)+
Explanations:
(digital|camera|accessories) - this group matches any of the 3 words
[\\s] matches a whitespace character; [\\s]* matches any number of whitespace characters
((digital|camera|accessories)[\\s]*) - this group matches any of the 3 words, optionally followed by spaces
((digital|camera|accessories)[\\s]*)+ - matches any sequence of the 3 words, separated by spaces
NOTES:
- this regular expression matches "digitalcameraaccessories" because the * operator accepts 0 occurrences.
If you want to avoid this situation, change the * to a +, but you will have to append a space to the searched string
in order to make the pattern match.
- this regular expression will also match "digital digital camera" because there is no uniqueness checking.
Hope this helped,
Regards. -
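The two notes above can be checked with a short sketch. It assumes the pattern is applied with Matcher.matches() to lower-case input (the answer's pattern is lower case while the sample documents are upper case, so a case-insensitive flag would be needed for those). The names LOOSE and STRICT are made up.

```java
import java.util.regex.Pattern;

public class KeywordSequenceDemo {
    // The pattern from the answer: whitespace between words is optional.
    static final Pattern LOOSE =
            Pattern.compile("((digital|camera|accessories)[\\s]*)+");
    // The variant from the first note: at least one whitespace after each word.
    static final Pattern STRICT =
            Pattern.compile("((digital|camera|accessories)[\\s]+)+");

    static boolean loose(String s)  { return LOOSE.matcher(s).matches(); }
    static boolean strict(String s) { return STRICT.matcher(s).matches(); }

    public static void main(String[] args) {
        System.out.println(loose("digitalcameraaccessories"));      // true
        System.out.println(loose("digital digital camera"));        // true
        System.out.println(strict("digitalcameraaccessories"));     // false
        System.out.println(strict("digital camera accessories"));   // false
        System.out.println(strict("digital camera accessories "));  // true
    }
}
```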
Regular Expression + Find and Replace
Hey there- I have a question about regExp and the Find and Replace. Basically I want to search a wildcard between a href tag, how would that look, because the code below does not work.
countryLink = "<a href=\"http://www.whateve.com\" target=\"_parent\">";
[code]
countryLink = "([^"]*)";
[/code]
Thanks! Any help is appreciated!
Also, how do i add code blocks to this forum?
Yes, I meant the <a> tag, but thank you for displaying the href attribute solution as well. This solved my issue. Thanks! Thought I would display what I did with your code in case someone was interested in using this code to convert a javascript string to XML.
query this:
countryLink = "<a href=\"http://www.whateve.com\" target=\"_parent\">";
add this to the Find box:
countryLink = "<a href=\\"([-\w:/.?=&;]+)\\" target=\\"_parent\\">";
add this to the Replace box:
<countryLink>$1</countryLink>
creates an output of this:
<countryLink>http://www.whateve.com</countryLink> -
Regular Expression Capturing Repeated Patterns
I have a string pattern that looks like "PartList ABC1 TO ABC20" and some times it looks like "PartList ABC1 TO ABC20 AND XYZ1 to XYZ15". The string indicates a part type and the range of part numbers in the list.
I am interested in capturing the type and the range of numbers. I created a regex pattern that allows me to handle the list when only one type is listed. My problem is that I don't know how to build the pattern to handle the list when two part types
are listed. Can someone help?
The pattern I have is:
"(?<Min>\d+)(\s)?TO(\s)?(?<PartType>([A-Z]+[-]?))?(?<Max>\d+)"
I went completely wrong about it. I need to make sure that the string is in one of these forms:
1- Part List (or whatever) followed by "ABC1 TO ABC20"
or
2- Part List (or whatever) followed by "ABC1 TO ABC20 AND XYZ1 TO XYZ15"
First I need to validate it, making sure it conforms to either form, then I need to capture the part type and the range of part numbers for either form.
I think that I need to make sure that the string contains "([A-Z]+\d+)(\sTO\s)([A-Z]+\d+)" one or two times.
If I use the pattern "\sTO\s" as a central point I could do this
"(?<Min>(\d+(?=\sTO\s)))" to get the Min
"((?<=\sTO\s)(?<PartType>([A-Z]+)))" to get the part type
"((?<=\sTO\s[A-Z]+[-]?)(?<Max>\d+))" to get the Max
Now I don't know how to combine these patterns to work together to validate the string and capture the groups for either form of the string. Please help. I am going nuts with this. -
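One way to combine validation and capture is a single anchored pattern with an optional second range. The sketch below is written in Java (7 or later, for named groups) and is not the poster's eventual solution; the group names type1, min1 and so on are invented, and it assumes, without confirmation from the thread, that the part type must be the same on both sides of TO and that case does not matter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PartListParser {
    // Any prefix, then "ABC1 TO ABC20", optionally followed by "AND XYZ1 TO XYZ15".
    // \k<type1> re-matches whatever type1 captured (assumption: same type on both sides).
    private static final Pattern PART_LIST = Pattern.compile(
            "^.*?\\b(?<type1>[A-Z]+)(?<min1>\\d+)\\s+TO\\s+\\k<type1>(?<max1>\\d+)"
          + "(?:\\s+AND\\s+(?<type2>[A-Z]+)(?<min2>\\d+)\\s+TO\\s+\\k<type2>(?<max2>\\d+))?$",
            Pattern.CASE_INSENSITIVE);

    /** Returns "TYPE:min-max" for each range, or an empty list if the input is invalid. */
    static List<String> parse(String input) {
        List<String> ranges = new ArrayList<String>();
        Matcher m = PART_LIST.matcher(input);
        if (!m.matches()) {
            return ranges; // does not conform to either form
        }
        ranges.add(m.group("type1") + ":" + m.group("min1") + "-" + m.group("max1"));
        if (m.group("type2") != null) { // the optional second range was present
            ranges.add(m.group("type2") + ":" + m.group("min2") + "-" + m.group("max2"));
        }
        return ranges;
    }

    public static void main(String[] args) {
        System.out.println(parse("PartList ABC1 TO ABC20"));
        // prints [ABC:1-20]
        System.out.println(parse("PartList ABC1 TO ABC20 AND XYZ1 to XYZ15"));
        // prints [ABC:1-20, XYZ:1-15]
    }
}
```

Because the whole string is matched in one go, a successful match doubles as the validation step.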
Delimit the regular expression
How could I delimit the regular expression with the pound sign (#) and then use a single quote within the expression?
Can someone give some examples?
As regular expressions are enclosed within quotes and my string also contains a single quote, how can I specify it by delimiting the regular expression with a # sign?
Thanks
Edited by: LostWorld on Sep 15, 2010 5:40 AM
Hi,
Not sure I understand the issue you're facing, but I think using q quoting might help:
SQL> with t as (
2 select 'abcd ''1234''' str from dual)
3 -- end of sample data
4 select str, regexp_substr(str, q'#'1234'#') str from t;
STR STR
abcd '1234' '1234'
The # symbol now encloses the string in the example above.
You can refer to the docs for more info, in the link below:
http://download.oracle.com/docs/cd/E11882_01/server.112/e17118/sql_elements003.htm -
ReplaceAll string by regular expression not work for this case.
I want to delete all tags and keep only the "pure text", but the output has everything deleted.
String content = "<aaa>pure text<fff>";
content = content.replaceAll("<.*>","");
The output is blank because the regular expression matches from the beginning to the end of the string.
But when I change it to
String content = "<aaa>pure text<fff";
content = content.replaceAll("<.*>","");
the output is ==> pure text<fff
How can I make the regex match each tag in sequence?
Please lead me to a solution.
peterdog1234 wrote:
Thank you very much.
I know '?' is a quantifier.
I do not understand how to use '?'.
Please lead me again.
See the paragraph "Laziness Instead of Greediness" from [http://www.regular-expressions.info/repeat.html]. -
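As a sketch of what the linked paragraph describes: adding '?' after the '*' makes the quantifier lazy, so each match stops at the first '>' instead of the last one. A negated character class gives the same result for this input. The class and method names below are made up for illustration.

```java
public class StripTags {
    // Lazy quantifier: ".*?" takes as few characters as possible.
    static String stripLazy(String s) {
        return s.replaceAll("<.*?>", "");
    }

    // Alternative: anything except '>' between the angle brackets.
    static String stripNegated(String s) {
        return s.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        String content = "<aaa>pure text<fff>";
        System.out.println(content.replaceAll("<.*>", "")); // greedy: prints an empty line
        System.out.println(stripLazy(content));             // pure text
        System.out.println(stripNegated(content));          // pure text
    }
}
```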
NIRG LabVIEW regular expression for covering multiple requirements
The Word document type in NI Requirements Gateway allows for comma separating the requirements in a Reference / coverage statement. I would like to do the same within my LabVIEW code, but the type does not have the same Sub regular expression field available. Is there any way to have a LabVIEW regular expression find coverage statements such as the following:
[Covers: REQ-5, REQ-9, REQ-15]
currently within LabVIEW comments I have to have 3 separate [Covers: REQ-5] type statements
cdweiss,
I'm very interested to know if you have any other feedback on NI Requirements Gateway. I'd also be curious to know what products you're using with it and how extensive your requirements are.
Feel free to email me directly at [email protected]
Cheers,
Eli
Message Edited by Elijah K on 01-19-2010 11:40 PM
Elijah Kerry
Senior Product Manager, LabVIEW
Follow my Software Engineering for LabVIEW Blog -
Find/replace and regular expression problem
Hello, i'm using find and replace with a regular expression
for the first time. I have it checkmarked and it's finding my text
but it's missing (not highlighting) the ')' at the end of the line.
Here's my code:
[($[0-9]+<font size="-2">US</font>)]
it's supposed to find everything inside the square brackets -
but it misses the closing parenthesis after </font>. I need
to find this string and replace with nothing to remove the string
from any/all pages. Is there a reason why it's missing the closing
parenthesis? I was actually able to add a few more parenthesis
(e.g. "))))") before OR after the closing square bracket and it
still found the original text minus the closing bracket and the
extra parenthesis didn't prevent the text from being found.
Any help is appreciated!
James...
WyattEA wrote:
> Hello, i'm using find and replace with a regular expression for the first time. I
> have it checkmarked and it's finding my text but it's missing (not
> highlighting) the ')' at the end of the line. Here's my code:
>
> [($[0-9]+<font size="-2">US</font>)]
That's not how square brackets work.
Try:
\(\$\d+<font size="-2">US</font>\)
A left paren, followed by the dollar sign, followed by at least one digit, followed by <font size="-2">US</font>, followed by a right paren.
Mick
>
> it's supposed to find everything inside the square brackets - but it misses
> the closing parenthesis after </font>. I need to find this string and replace
> with nothing to remove the string from any/all pages. Is there a reason why
> it's missing the closing parenthesis? I was actually able to add a few more
> parenthesis (e.g. "))))") before OR after the closing square bracket and it
> still found the original text minus the closing bracket and the extra
> parenthesis didn't prevent the text from being found.
>
> Any help is appreciated!
>
> James...
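For context, square brackets in a regular expression define a character class, which matches a single character out of a set; parentheses and the dollar sign need a backslash to be treated literally. A minimal Java sketch of the suggested pattern, used to remove the string (the class name and the sample input are assumptions; the editor's own find-and-replace needs only the pattern itself, without Java's doubled backslashes):

```java
import java.util.regex.Pattern;

public class RemovePriceTag {
    // \( and \) are literal parentheses, \$ a literal dollar sign, \d+ one or more digits.
    private static final Pattern PRICE =
            Pattern.compile("\\(\\$\\d+<font size=\"-2\">US</font>\\)");

    static String strip(String html) {
        return PRICE.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("Widget($25<font size=\"-2\">US</font>)!"));
        // prints Widget!
    }
}
```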
> -
Help with regular expression to find a pattern in clob
can someone help me writing a regular expression to query a clob that containts xml type data?
query to find multiple occurrences of a variable string (i.e <EMPID-XX> - XX can be any number). If <EMPID-01> appears twice in the clob i want the result as EMPID-01,2 and if EMPID-02 appears 4 times i want the result as EMPID-02,4.
with
ofx_clob as
(select q'~
<EMPID>1
< UNQID>123456
< TIMESTAMP>...
< ADDRINFO>
< TITLE>^@~*
< FIRST>ABCD
< MI>
< LAST>EFGH
< ADDR1>ADDR1
< ADDR2>^@~*
< CITY>CITY
<EMPID>2
< UNQID>123457
< TIMESTAMP>...
< ADDRINFO>
< TITLE>^@~*
< FIRST>ABCD
< MI>
< LAST>EFGH
< ADDR1>ADDR1
< ADDR2>^@~*
< CITY>CITY
<EMPID>1
< UNQID>123458
< TIMESTAMP>...
< ADDRINFO>
< TITLE>^@~*
< FIRST>ABCD
< MI>
< LAST>EFGH
< ADDR1>ADDR1
< ADDR2>^@~*
< CITY>CITY
~' ofx from dual
)
select '<EMPID>' || to_char(ids) || '(' || to_char(count(*)) || ')' multi_empid
from (select replace(regexp_substr(ofx,'<EMPID>\d*',1,level),'<EMPID>') ids
from ofx_clob
connect by level <= regexp_count(ofx,'<EMPID>')
)
group by ids having count(*) > 1
MULTI_EMPID
<EMPID>1(2)
with
ofx_clob as
(select q'~
<EMPID>1
< UNQID>123456
< TIMESTAMP>...
< ADDRINFO>
< TITLE>^@~*
< FIRST>ABCD
< MI>
< LAST>EFGH
< ADDR1>ADDR1
< ADDR2>^@~*
< CITY>CITY
<EMPID>2
< UNQID>123457
< TIMESTAMP>...
< ADDRINFO>
< TITLE>^@~*
< FIRST>ABCD
< MI>
< LAST>EFGH
< ADDR1>ADDR1
< ADDR2>^@~*
< CITY>CITY
<EMPID>1
< UNQID>123456
< TIMESTAMP>...
< ADDRINFO>
< TITLE>^@~*
< FIRST>ABCD
< MI>
< LAST>EFGH
< ADDR1>ADDR1
< ADDR2>^@~*
< CITY>CITY
<EMPID>2
< UNQID>123456
< TIMESTAMP>...
< ADDRINFO>
< TITLE>^@~*
< FIRST>ABCD
< MI>
< LAST>EFGH
< ADDR1>ADDR1
< ADDR2>^@~*
< CITY>CITY
<EMPID>1
< UNQID>123458
< TIMESTAMP>...
< ADDRINFO>
< TITLE>^@~*
< FIRST>ABCD
< MI>
< LAST>EFGH
< ADDR1>ADDR1
< ADDR2>^@~*
< CITY>CITY
~' ofx from dual
)
select '<EMPID>' || listagg(to_char(ids) || '(' || to_char(count(*)) || ')',',') within group (order by ids) multi_empid
from (select replace(regexp_substr(ofx,'<EMPID>\d*',1,level),'<EMPID>') ids
from ofx_clob
connect by level <= regexp_count(ofx,'<EMPID>')
)
group by ids having count(*) > 1
MULTI_EMPID
<EMPID>1(3),2(2)
Regards
Etbin
Message was edited by: Etbin
used listagg to report more than one multiple <EMPID>
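For readers who want to check the counting logic outside the database, the same idea can be sketched in Java: find every <EMPID>n, count the occurrences per id, and keep only the ids that occur more than once. This is an illustration rather than a replacement for the SQL above; the class name and the shortened input are made up.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmpIdCounter {
    private static final Pattern EMPID = Pattern.compile("<EMPID>(\\d+)");

    /** Maps each id that occurs more than once to its number of occurrences. */
    static Map<String, Integer> duplicates(String text) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        Matcher m = EMPID.matcher(text);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum); // increment the count for this id
        }
        counts.values().removeIf(c -> c < 2);          // like "having count(*) > 1"
        return counts;
    }

    public static void main(String[] args) {
        String ofx = "<EMPID>1 <UNQID>123456 <EMPID>2 <UNQID>123457 <EMPID>1 <UNQID>123458";
        System.out.println(duplicates(ofx)); // {1=2}
    }
}
```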