Regular Expression - replaceAll() - how to replace words?

Hiya,
I have this regex to replace all instances of myWord:
String oldWord = "oldWord";
String newWord = "newWord";
String sentence = "some sentence that contains " + oldWord;
String newSentence = replaceWordsInSentence(sentence, oldWord, newWord);
private String replaceWordsInSentence(String sentence, String oldWord, String newWord) {
return sentence.replaceAll("\b" + oldWord + "\b", newWord);
}
...it works in most instances, but when oldWord is at the end of the sentence it is not replaced. Presumably the problem is that "\b" is not a sufficient word boundary. Can someone help me out with the correct regular expression code?
Thanks,
James

Mel, you did appear to misunderstand as you thought points 2 and 3 were alternatives, but you now recognise that they are additional "shoulds".
Of course, I applied the extra backslash as soon as Joachim advised. Maybe you don't agree with my rationale, but I prefer the complete solution that will work in all instances... so was simply waiting for him to post a code example that included the latter 2 points as (although I understood the point of them perfectly) I was not sure how to implement them.
Have come up with the following, expanded, method...
    private String replaceWordsInSentence(String sentence, String oldWord, String newWord) {
        return sentence.replaceAll("\\b" + Pattern.quote(oldWord) + "\\b", Matcher.quoteReplacement(newWord));
    }
...works fine with the tests I have run. Joachim, can you confirm this is correct?
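For reference, here is a minimal, self-contained sketch of the expanded method with a couple of checks around it. The class name WordReplacer and the sample sentences are only illustrative; the method body is the one posted above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordReplacer {

    // Replaces whole-word occurrences of oldWord with newWord.
    static String replaceWordsInSentence(String sentence, String oldWord, String newWord) {
        // "\\b" in Java source is the regex \b (word boundary); "\b" would be a backspace.
        // Pattern.quote makes oldWord a literal, so "." or "+" in it are not regex operators.
        String regex = "\\b" + Pattern.quote(oldWord) + "\\b";
        // Matcher.quoteReplacement keeps "$" and "\" in newWord from being read as group references.
        return sentence.replaceAll(regex, Matcher.quoteReplacement(newWord));
    }

    public static void main(String[] args) {
        // The word at the end of the sentence is replaced.
        System.out.println(replaceWordsInSentence("some sentence that contains oldWord", "oldWord", "newWord"));
        // "oldWordy" is left alone, and "$5" is inserted literally.
        System.out.println(replaceWordsInSentence("oldWordy is not oldWord", "oldWord", "$5"));
    }
}
```

One caveat: \b only marks a boundary between a word character and a non-word character, so this approach assumes oldWord itself starts and ends with a letter, digit, or underscore.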

Similar Messages

Regular expression and pattern matching/replacing

I have a list of keywords. It has around 1000 keywords now but can grow to 5000.
My web application displays a lot of text stored in the database. My requirement is to scan each text for the occurrence of any of the above keywords. If any keyword is present, I have to replace it with some custom value before showing it to the user.
I was thinking of using a regular expression to replace the keywords in the text with the matcher.replaceAll method, as follows:
Pattern pattern = Pattern.compile(patternStr);
Matcher matcher = pattern.matcher(inputStr);
String output = matcher.replaceAll(replacementStr);
But my pattern string will have around 5000 keywords joined with the 'OR' logical operator, like keyword1|keyword2|keyword3|...
Will such a big pattern string adversely affect performance? What can I do to speed it up? (Since my keyword list is not static, I would prefer to do the replacement just before showing the text to the user.)
Any suggestions are most welcome.

I don't think a pure regex approach would be that slow, but it would be a maintenance nightmare. I think a combined regex/table-lookup approach would be best: use a regex to identify potential keywords, then look them up in the table to confirm. For instance, to find all Java keywords you could use the regex "\\b[a-z]{2,12}+\\b" to filter out anything that can't possibly be a keyword.
What are you going to replace the keywords with? Will it vary depending on which keyword is found? If so, you'll have to use a table--and you won't be able to use the replaceAll method, because it can't handle dynamically generated replacement values. You would have to use the lower-level appendReplacement and appendTail methods instead.
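A rough sketch of that regex-plus-table idea, assuming the keywords are plain words and the table is a Map from keyword to replacement (the class name, the candidate pattern, and the sample data are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordReplacer {

    // Candidate words: runs of letters between word boundaries (an assumption about the keywords).
    private static final Pattern CANDIDATE = Pattern.compile("\\b[A-Za-z]+\\b");

    static String replaceKeywords(String text, Map<String, String> replacements) {
        Matcher m = CANDIDATE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Table lookup decides whether this candidate is really a keyword.
            String replacement = replacements.get(m.group());
            if (replacement == null) {
                replacement = m.group(); // not a keyword: copy the word through unchanged
            }
            // quoteReplacement protects "$" and "\" in the replacement value.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> table = new HashMap<>();
        table.put("cat", "feline");
        table.put("dog", "canine");
        System.out.println(replaceKeywords("The cat chased the dog, not the catalog.", table));
    }
}
```

With this shape the pattern stays small no matter how many keywords there are; growing the list only grows the map, and the map can be reloaded without recompiling anything.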

Regular Expression back-reference Find/Replace in SQL Developer 4.0

I cannot seem to get the Search/Replace functionality to use back-references with Regular Expressions in 4.0. This worked fine in 3.0.
Text in my editor:
abc
Example:
Find: (a)(bc)
Replace:
\1\1\1\2
Should result in:
aaabc
Instead I get:
\1\1\1\2
This still isn't fixed in 4.0.2 - I'll try to find time to put in MOS as a bug...

You know what, I think I'm logging in as sys on the command line on the sqlplus side of the house, AHA!
Wow, I must be tired....LOL...thanks

Using regular expressions to find and replace code.

Hi! Semi-newbie coder here.
I'm trying to strip out code from multiple pages. I've tried regular expressions but I'm struggling to understand them. I also need to do it across a LOT of pages, so I need an automated way of doing it.
The best way I can explain is with an analogy:
I want to delete any string of characters that starts with c, ends with t and includes anything in between, so it would pick up "cat, cut, chat, coconut, can do it", whatever appears in the middle of those.
Except, instead of c and t, I want it to find strings of code starting with <div class="advert" and ending with Vote<br> while picking up everything in between (including spaces, code, comments, etc.), and then delete that whole string including the start and end. Is there a regular expression I could use in Dreamweaver that could do this? Is there a way to do this at all?

Let me begin by saying I'm a complete idiot with DW's Reg Ex. I use Search Specific Tag whenever possible. See screenshot below.
Try this on your Current Document to see if it works. Then make a back-up copy of site before attempting it on Entire Local Site as you cannot "Undo" this process.
Good luck,
Nancy O.
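For what the question describes, the usual regex shape is a literal start, a lazy "anything, including line breaks", and a literal end. Here is a sketch in Java rather than Dreamweaver (the class name and sample markup are invented), just to show the pattern at work:

```java
import java.util.regex.Pattern;

class AdvertStripper {

    // (?s) lets "." match line breaks too; ".*?" is lazy, so each match
    // stops at the first "Vote<br>" after the opening div.
    private static final Pattern ADVERT =
            Pattern.compile("(?s)<div class=\"advert\".*?Vote<br>");

    static String stripAdverts(String html) {
        return ADVERT.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String page = "<p>keep</p><div class=\"advert\" id=\"a\">\n<!-- ad -->\nVote<br><p>also keep</p>";
        System.out.println(stripAdverts(page));
    }
}
```

In editors whose regex dialect has no dot-matches-newline flag, [\s\S]*? is a common substitute for the (?s).*? part. As Nancy says, try it on one document and keep a backup before running it site-wide.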

Regular expression to search if a word occurs with z and /z tags

Hi ,
I am trying to create a regex to search for the occurrence of two words within <z> and </z> tags (they must occur between a <z> and the next immediate </z> tag).
This is my regex:
<z>\s[\w\W]+?(?!</z>)word1[\w\W]+?(?!</z>)word2[\w\W]+?</z>|<z>\s[\w\W]+?(?!</z>)word2[\w\W]+?(?!</z>)word1[\w\W]+?</z>
I am specifying (?!</z>) in order to insist that the regex engine matches word1 and word2 only within a <z> and its next immediate </z> tag. The words can appear in either order: word1 followed by word2 or vice versa.
The above regex does not work correctly.
It maps fine for the following sentence
<z> This is test for pattern for a Regex </z> <z> Also we would like to conclude what is happening </z> <z> Another test for paragraph is happening </z>
The regex is as follows
<z>\s[\w\W]+?(?!</z>)pattern[\w\W]+?(?!</z>)Regex[\w\W]+?</z>|<z>\s[\w\W]+?(?!</z>)Regex[\w\W]+?(?!</z>)pattern[\w\W]+?</z>
But when I include a </z> in between pattern and Regex, it should not match, yet it does.
<z> This is test for pattern </z> for a Regex</z> <z> Also we would like to conclude what is happening </z> <z> Another test for paragraph is happening </z>
Please let me know how I can accomplish the same.
Thanks.
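One likely cause: a (?!</z>) placed after [\w\W]+? is only tested at the single position where it sits, so the lazy run before it is still free to cross a </z>. A common way to forbid that is to test the lookahead before every character consumed. The following is a sketch of that idea in Java, under the assumption that both words must appear, in either order, before the next </z>; the class and method names are made up:

```java
import java.util.regex.Pattern;

class ZTagSearch {

    // One character, provided it does not start a closing </z> tag.
    private static final String NOT_CLOSE = "(?:(?!</z>).)";

    static Pattern bothWordsInOneBlock(String word1, String word2) {
        String regex = "(?s)<z>"
                // Each lookahead scans forward from just after <z> without crossing </z>.
                + "(?=" + NOT_CLOSE + "*?\\b" + Pattern.quote(word1) + "\\b)"
                + "(?=" + NOT_CLOSE + "*?\\b" + Pattern.quote(word2) + "\\b)"
                // Then consume the block up to its own closing tag.
                + NOT_CLOSE + "*</z>";
        return Pattern.compile(regex);
    }

    static boolean containsBoth(String text, String word1, String word2) {
        return bothWordsInOneBlock(word1, word2).matcher(text).find();
    }

    public static void main(String[] args) {
        String ok = "<z> This is test for pattern for a Regex </z> <z> Also we would like to conclude what is happening </z>";
        String broken = "<z> This is test for pattern </z> for a Regex</z> <z> Also we would like to conclude what is happening </z>";
        System.out.println(containsBoth(ok, "pattern", "Regex"));
        System.out.println(containsBoth(broken, "pattern", "Regex"));
    }
}
```

Using two lookaheads also removes the need to spell out both word orders as separate alternatives.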

oops.. sorry ..this is aligned better ...
Hi, I am using this regex:
(?=(?:(?!</z>).)+?\bpattern\b.+</z>)(?!(?:(?!</z>).)+?\bscientific\b.+</z>)((\bpattern\b(?:(?!</z>).)+)\bpattern\b|\bpattern\b)
I have written the above regex to match "pattern" between <z> and </z>, provided the word "scientific" does not occur within those <z> tags.
I have been trying to adapt the regex to do the same within a sentence. In my case I have a paragraph with multiple sentences, and the delimiter that ends a sentence is a dot (.). So I am trying to specify the same condition between two sentence delimiters, representing the dot as (\. )
The following is my regex:
(?=(?:(?!\.).)+?\bpattern\b.+\.)(?!(?:(?!\.).)+?\bscientific\b.+\.)((\bpattern\b(?:(?!\.).)+)\bpattern\b|\bpattern\b)
But this does not work.
Can you please tell me how to go about this?
Thanks.

Regular Expression & LEVEL - how to split attribut value

Hi Folks;
I have to transform the value of an attribute Attr_A (of table A) into multiple attribute values in another table B, like this:
Table A
Attr_A = '[one letter from A to Z][5 digits from 0 to 9][space][Operator][space][one letter from A to Z][5 digits from 0 to 9][space][Operator][space][one letter from A to Z][5 digits from 0 to 9][space][Operator][space]etc...
with Operator = 'AND' or '(+),' or '(-),'
Example: Attr_A='L12345 AND T23456 (+), U12345 (-)'
In the result table B, I would have:
- first column equal to 'L12345'
- second column equal to '1' (position of the first column value in Attr_A)
- third column equal to 'AND' (the operator that follows the first column value in Attr_A)
Next record:
- first column equal to 'T23456'
- second to '2'
- third to '(+),'
etc. 'U12345' '3' '(-),'
Thanks for your help ^^
Edited by: Moostiq on 27 avr. 2011 10:54

Hi,
Whenever you post code or data on this site, please format it and type these 6 characters:
{code} (small letters only, inside curly brackets) before and after each section of formatted text. This will keep strings such as (+) from being mangled by the forum software.
You need to divide attr_a into parts, where each part consists of a 6-character word followed by a space, then followed by an operator. Since the operators are at least 3 characters long, that means a string of n characters will have (at most) n/10 parts. There may be other text in attr_a (in your example, there was a ',') that will be ignored.
The first sub-query below, cntr, generates the numbers 1, 2, 3, ... up to the greatest possible number of parts in any attr_a.
The second sub-query, got_part, extracts each part from attr_a.
The main query parses each part into the columns you want.
WITH cntr     AS
(
     SELECT     LEVEL     AS n
     FROM     (
               SELECT MAX (LENGTH (attr_a))
                    / 10     AS max_parts
               FROM     a
          )
     CONNECT BY     LEVEL     <= max_parts
)
,     got_part     AS
(
     SELECT     REGEXP_SUBSTR ( a.attr_a
               , '[A-Z][0-9]{5} +(AND|\(\+\)|\(-\))'
               , 1
               , c.n
               )          AS part
     ,     c.n          -- position of this part within attr_a
     FROM     a
     JOIN     cntr     c ON c.n <= LENGTH (a.attr_a) / 10
)
SELECT     REGEXP_SUBSTR (part, '[A-Z][0-9]{5}')     AS column_1
,     n                         AS column_2     -- the position, not a digit taken from the code
,     REGEXP_SUBSTR (part, '[^ ]+$')     AS column_3
FROM     got_part
WHERE     part     IS NOT NULL
;

Grouping & Back-references with regular expressions on Replace Text window

I really appreciate the inclusion of the Regular Expressions in the search & replace feature. One thing I am missing is back-references in the replacement expression. For instance, in the unix tools vi or sed, I might do something like this:
s/\(firstPart\) \(secondPart\) \(oldThirdPart\)/\2 \1 newThirdPart/g
which would allow me to switch the places of firstPart and secondPart, and totally replace thirdPart. If grouping and back-references are already present in the Replace Text window, how does one correctly invoke them?

duplicate of Grouping & Back-references with regular expressions on Replace Text window
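For comparison only: many regex implementations write back-references in the replacement as $1, $2 rather than the \1, \2 of vi and sed. Whether the Replace Text window accepts either form is not something this thread settles. The sketch below shows the $n form in Java, with invented class and method names; the second method reproduces the (a)(bc) example from the SQL Developer thread above.

```java
class BackReferenceDemo {

    // Swaps the first two words and replaces the third outright,
    // the Java equivalent of s/\(w1\) \(w2\) \(w3\)/\2 \1 newThirdPart/g.
    static String swapAndReplace(String input) {
        return input.replaceAll("(\\w+) (\\w+) (\\w+)", "$2 $1 newThirdPart");
    }

    // Find (a)(bc), replace with group 1 three times followed by group 2.
    static String tripleFirstGroup(String input) {
        return input.replaceAll("(a)(bc)", "$1$1$1$2");
    }

    public static void main(String[] args) {
        System.out.println(swapAndReplace("firstPart secondPart oldThirdPart"));
        System.out.println(tripleFirstGroup("abc"));
    }
}
```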

Regular Expression for non-words

hello all!
can you help me construct a regular expression that will match non-word strings say "��". I will be needing this to filter words from a Microsoft Word Document.
Thanx!

I don't think this is a problem that should be solved with regex. You would have to convert your Word document to a String and use replaceAll() with "\\W" as the regex.
Correct me if I am wrong but I thought that Word files were binary so your first problem will be to convert the file(s) to a String.

String Regular Expression for uncommon characters

Hi,
I am trying to get text out of HTML file for which I am using EditorKit and Document classes. After I obtain the text, the text (String) contains some characters like �. This character looks like a with French style acute accent . My problem is how to use regular expression to find and replace (replaceAll method) these unwanted characters.
Is there a regular expression pattern for such characters?
Thanks!
Rahul.

Hmm, I would recommend looking at the specific patterns.
A simplified site would be http://www.p3m.org/wiki?regex
as a reference. If you don't know regular expressions, use
http://www.perl.com/doc/manual/html/pod/perlre.html
The only way I can think of constructing the regex is to use \s and add the characters you want to that regex. You could also look into regex lookahead and lookbehind.
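If the goal is simply to drop everything outside plain ASCII, a negated character class is one way to do it with replaceAll. This is only a sketch with an invented class name, and it assumes that discarding those characters is acceptable; if they are genuine accented letters, reading the HTML with the correct character encoding is probably the better fix.

```java
class NonAsciiCleaner {

    // \p{ASCII} is Java's POSIX-style class for U+0000..U+007F;
    // the negated class matches every character outside that range.
    static String stripNonAscii(String text) {
        return text.replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // \u00e1 is "a" with an acute accent.
        System.out.println(stripNonAscii("caf\u00e1 au lait"));
    }
}
```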

Using regular expressions to get a customized output

Hi,
I have a string/varchar variable with the data ',a,b,c,' in it.
I want the display as follows:
a
b
c
I would like to get the similar output using regular expressions.
How do I get this output using REGEXP_REPLACE or REGEXP_SUBSTR?
Please do the needful.
Thanks & Regards,
Rakshit

I remember that, however if we look closer, that one has a little flaw: the 2nd row should be null, because ",," indicates an empty field. The MODEL clause solution works just fine in this case:
with t as (select 'aaaa,,bbbb,cccc,dddd,eeee,ffff' col1 from dual)
-- end of sample data
SELECT col_new
FROM t
MODEL
   PARTITION BY (ROWNUM rn)
   DIMENSION BY (0 dim)
   MEASURES(col1, col1 col_new)
   RULES ITERATE(99) UNTIL (ITERATION_NUMBER = LENGTH(REGEXP_REPLACE(col1[0], '[^,]')))
                (col_new[ITERATION_NUMBER] = REPLACE(REGEXP_SUBSTR(col1[0], '(^|,)[^,]*', 1, ITERATION_NUMBER+1), ','))
COL_NEW
aaaa
bbbb
cccc
dddd
eeee
ffff
7 rows selected.
Update: I had this nagging feeling that I missed something, and there it was. If you want to see what the problem with my solution is, change the example to
with t as (select ',aaaa,,bbbb,cccc,dddd,eeee,ffff' col1 from dual)
So I went back and tried to fix BlueShadow's approach. Here it is:
with t as (select 'aaaa,,bbbb,cccc,dddd,eeee,ffff' txt from dual)
-- end of sample data
SELECT REPLACE(REGEXP_SUBSTR(',' || txt, ',[^,]*', 1, level), ',') col_new
FROM t
CONNECT BY level <= length(regexp_replace(txt,'[^,]*'))+1
;
C.

Regular Expression Help Please?

Hi
I'm trying to get my head round regular expressions in find and replace, it's a slow process for me!
I have this -
<a href="http://www.forms.mydomainname.com/cgi-bin/urltracker/tracker.pl?site=http://www.website-address.com"
and I'm trying to change it to this -
<a href="http://www.forms.mydomainname.com/cgi-bin/urltracker/tracker.pl?site=http://www.website-address.com&email="
I was trying first of all with a *.*, but couldn't work out how to tell it where the code ends?
There are hundreds of pages like this, all with different website addresses.
After I have changed all the pages to the new code, I will then need to copy and paste a different email address to the end of each line, on each page.
Unless anyone knows a way of automating that?
Hope someone can point me in the right direction?
Many thanks, Craig.

Hi David
Many thanks for all that and the detailed descriptions.
I will be working through it all again tomorrow, so will put your info to the test! lol
As for partially building the email addresses, I think that would be too much, as the emails are all over the place: some have their own domain, others use Hotmail, Yahoo etc.
Some even have their own domain for their website and a free one for the email address.
They are all hotels, B&Bs, cottages etc.
Hopefully all your hard work will help me a step closer to understanding it all.
Many thanks again,
Craig.
"David Stiller" <[email protected]> wrote in message news:[email protected]...
> Craig,
>
>> You do have that correct David, thanks.
>
> Okay. Regex is as much an "exact science" as it is an "art form" -- which isn't to say I'm a regex artist; I just love the technology -- but I mention this because I made the following assumption in order to keep the pattern relatively simple: your href values are all quoted in either single or double quotes. Such as, for example, the following sample HTML ...
>
> <body>
> <a href="http://www.forms.mydomainname.com/cgi-bin/urltracker/tracker.pl?site=www.sample.com">asfd</a>
> <a href='http://www.forms.mydomainname.com/cgi-bin/urltracker/tracker.pl?site=www.example.net'></a>
> <a href="http://www.forms.mydomainname.com/cgi-bin/urltracker/tracker.pl?site=www.company.com"></a>
> </body>
>
> In the Find field, enter this pattern ...
>
> (tracker\.pl\?site=[^"']*?)(["'])
>
> ... and in the Replace field, enter this pattern ...
>
> $1&email=ADDRESS$2
>
> Then carefully use your Find Next and Replace buttons to step through your code. The above will add &email=ADDRESS to your HTML in all the right places. I chose that because ADDRESS is easy to select by double clicking, which should facilitate your replacing it.
>
>
> Let's step through the patterns.
>
> (tracker\.pl\?site=[^"']*?)(["'])
>
> This looks for the phrase "tracker.pl?site=" (without quotes) followed immediately by a "non-greedy" match of any character that isn't a single or double quotation mark, followed immediately by either a single or double quotation mark. I took that phrase to be a safe, short "hook" into the string we need. I split this pattern into two sections, grouped by parentheses. This allows us to refer to the first part of the match (everything but the closing quotation mark) as group 1, and the second part (the closing quotation mark) as group 2. This is like storing values with your calculator's M (memory) button.
>
> $1&email=ADDRESS$2
>
> Here, we refer to group 1 and follow it with the phrase "&email=ADDRESS" (without quotes), followed again by group 2.
>
> Now, in theory, we could use the domain name of each unique site to at least partially build the email address. That would get you even closer to your goal. To do so, I'd need even more detail from you, such as the kinds of domains you have (how many sub domains are probable, etc.).
>
>
> David
> stiller (at) quip (dot) net
> Dev essays: http://www.quip.net/blog/
> "Luck is the residue of good design."
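The same find and replace pair can be tried outside the editor. Below is a small Java sketch of David's pattern, where the class name and sample links are illustrative; Java's replacement syntax happens to use the same $1 and $2 group references.

```java
import java.util.regex.Pattern;

class TrackerLinkEditor {

    // Group 1: "tracker.pl?site=" plus everything up to the closing quote.
    // Group 2: the closing quote itself, single or double.
    private static final Pattern TRACKER =
            Pattern.compile("(tracker\\.pl\\?site=[^\"']*?)([\"'])");

    static String addEmailPlaceholder(String html) {
        // Re-emit group 1, append the placeholder, then restore the original quote.
        return TRACKER.matcher(html).replaceAll("$1&email=ADDRESS$2");
    }

    public static void main(String[] args) {
        String link = "<a href=\"http://www.forms.mydomainname.com/cgi-bin/urltracker/tracker.pl?site=www.sample.com\">asfd</a>";
        System.out.println(addEmailPlaceholder(link));
    }
}
```

Capturing the closing quote as group 2 is what lets one pattern handle both quoting styles without changing them.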

Allow specific characters - Regular Expression

Hello everyone
I am new to regular expression and I have a very simple question. I use the "read from text file" function to load a Tab delimited file with 3 columns into my VI. Next, the string is converted in array and I use the values.
Nevertheless, I want to develop a "filter" allowing only digits (0-9), colon, comma and point into strings.
Using the "match regular expression" function, I was trying a regular expression like that:
[^0-9]|[^\].[|^:]|[^,]
But it is not working.
Could someone help me with this issue?
Thanks
Dan07

Hello
Actually I don't need to modify the string that has "invalid" characters, I just need to identify them instead. Find below a VI testing both methods: Match Regular Expression and Search and Replace String.
Using Match Regular Expression method, I got correct results since all the "valid" values must be identified as "-1" and all the "invalid" values must be identified as positive numbers (offset).
Nevertheless, using Search and Replace String method I got wrong results, since all the strings were classified as "valid" (-1), but "bg" and "03/12/2010" are not "valid".
I will go ahead with Match Regular Expression method because it is working great, but I was just wondering how to fix Search and Replace String method to achieve equivalent results.
Thanks
Dan07
Attachments:
Regular Expression_example.vi ‏18 KB
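For readers outside LabVIEW, the filter described in this thread (only digits, colon, comma and point allowed) comes down to a single negated character class. The sketch below is in Java and does not model LabVIEW's Match Regular Expression function; the class and method names are invented, and the method returns a zero-based index, or -1 when nothing invalid is found.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllowedCharsFilter {

    // Any single character that is NOT a digit, colon, comma, or point.
    // Inside a character class the "." is literal and needs no escape.
    private static final Pattern INVALID = Pattern.compile("[^0-9:,.]");

    // Index of the first invalid character, or -1 if the string is valid.
    static int firstInvalidOffset(String s) {
        Matcher m = INVALID.matcher(s);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(firstInvalidOffset("12:30,5.0"));
        System.out.println(firstInvalidOffset("bg"));
        System.out.println(firstInvalidOffset("03/12/2010"));
    }
}
```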

Help: Verify or Suggest a Simple Regular Expression

I'm trying to do a mapping from a title to a file name portion of a URL. Thus the result needs to follow the rules as specified here:
http://labs.apache.org/webarch/uri/rfc/rfc3986.html#unreserved
I identified the following characters as legal: a-z / 0-9 / "-" / "." / "_" / "~"
Everything else has to be converted to an underscore.
I came up with the following expression:
someString.replaceAll("[^a-zA-Z0-9-._~]", "_")
Is that correct? I spent a lot of time trying to figure out regular expressions, but it seems like everyone (i.e. PHP, TextPad and now Java) has a slightly different version and, to top it off, there are not very good tutorials or explanations. I dread regular expressions!!!
Can anyone help please?

HoganWang wrote:
> Your regular expression is right. The regular expression has simple and complex versions. If the replaceAll is frequently called, it is recommended to use Pattern to compile the regular expressions first.
How does the expression know that the hyphen isn't part of a range? I guess a hyphen only forms a range when it sits between letters or numbers.
In terms of efficiency: this is called once per page request, i.e.
somedomain.com/somecategory/title_title_title
Well, I need to be able to translate title&title$title to title_title_title. It doesn't seem like pre-compiling the regular expression will speed it up since, between page requests, it won't remember fields, or am I wrong?
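On the pre-compiling question, one common arrangement is to hold the compiled Pattern in a static final field, so that it is built once when the class loads and reused by every later call in the same JVM. Here is a sketch with an invented class name. The hyphen is moved to the end of the character class, where it cannot be read as a range operator in any dialect.

```java
import java.util.regex.Pattern;

class TitleToSlug {

    // Compiled once at class load; a static field outlives any single call.
    // Hyphen placed last so it is unambiguously a literal.
    private static final Pattern NOT_UNRESERVED = Pattern.compile("[^a-zA-Z0-9._~-]");

    static String toSlug(String title) {
        return NOT_UNRESERVED.matcher(title).replaceAll("_");
    }

    public static void main(String[] args) {
        System.out.println(toSlug("title&title$title"));
        System.out.println(toSlug("My Title: v1.0~beta"));
    }
}
```

Whether the saving is noticeable for one short string per page request is a separate question that only measurement would answer.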

Regular expressions in Workshop 8.1

Hello,
I'm posting this question here because I don't see a "jdk" subcategory in this
newsgroup and it might be problem peculiar to Workshop.
I'm trying to use the Pattern and Matcher classes in java.util.regex (JDK 1.4.2)
in BEA Workshop 8.1, but I'm getting "ERROR: Unknown escape code" (red squiggly
line appears under the regex and this message is the screen tip) whenever I try
to use the backslash to escape a special character in the Pattern.compile() and
the Pattern.matches() methods.
For example, it doesn't allow "\d" to mean "any digit". For this particular one,
I can get around the problem by specifying "[0-9]", but in the case of the period
character, I'm stuck. I cannot use "\." However, the JDK API doc (http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html)
says the backslash is to be used for this purpose, if I'm reading it correctly.
Is this a problem with Workshop, and is there a workaround? I need to specify
that I require one and exactly one period.
Any help would be most appreciated!
Thanks.

Yes, I had read the Java doc, but I guess I hadn't fully understood it. Now I
do! Thanks!!
David
Josh Eckels <[email protected]> wrote:
This isn't particular to Workshop, but you'll need to use two
backslashes in your source code. Inside a string, backslash is used to
escape the next character so that you can enter special characters like
newlines ('\n'), tabs ('\t'), etc.
So, in order to enter a backslash character into your string, you need
to escape it, like '\\'.
There's a small section on this in the java.util.regex.Pattern JavaDoc,
under the "Backslashes, escapes, and quoting" header:
Backslashes within string literals in Java source code are interpreted
as required by the Java Language Specification as either Unicode escapes
or other character escapes. It is therefore necessary to double
backslashes in string literals that represent regular expressions to
protect them from interpretation by the Java bytecode compiler. The
string literal "\b", for example, matches a single backspace character
when interpreted as a regular expression, while "\\b" matches a word
boundary. The string literal "\(hello\)" is illegal and leads to a
compile-time error; in order to match the string (hello) the string
literal "\\(hello\\)" must be used.
Josh
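To make the doubling concrete, here is a short sketch (class and method names invented) that uses both \d and \. from Java source to require digits, exactly one period, then more digits. That is only one possible reading of "one and exactly one period".

```java
import java.util.regex.Pattern;

class EscapeDemo {

    // Java source "\\d+\\.\\d+" becomes the regex \d+\.\d+ :
    // one or more digits, a literal period, one or more digits.
    static boolean hasExactlyOnePeriod(String s) {
        return Pattern.matches("\\d+\\.\\d+", s);
    }

    public static void main(String[] args) {
        System.out.println(hasExactlyOnePeriod("3.14"));
        System.out.println(hasExactlyOnePeriod("3.1.4"));
        System.out.println(hasExactlyOnePeriod("314"));
    }
}
```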

Regular Expressions, please help.

Hello everyone.
Can I get a Java Regular Expression to match with a word of the following language...
Start --> Expression;
Expression --> [0-9]+;
Expression --> Expression * Expression;
So the regexp should match with words like:
4;
4664;
4 * 763;
5 * 4534 * 23534;
04 * 002 * 1 * 10 * ...
I would be very happy, if anyone could help.

I don't think that I need to learn anything more.
I am sure that what I want is not possible.
I want to build a compiler.
I just finished the abstract syntax of my language. Now I need a way to compile the concrete syntax of my language to the abstract one.
But I think it is not possible with regular expressions,
because I need to match a syntax of Chomsky type 2.
I think regular expressions only match Chomsky type 3 languages.
But the backtracking mechanism of Java regexes might be able to do this.
I am not sure about this.
If you have any ideas, please post.
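A note on the three rules as posted: although the grammar is written in context-free form, the strings it generates are just numbers separated by * and ended by a semicolon, which is a regular language. Here is a sketch of a matching pattern, assuming whitespace around * is optional (the class and method names are invented):

```java
import java.util.regex.Pattern;

class ProductExpression {

    // A number, then zero or more "* number" repetitions, then the semicolon.
    private static final Pattern EXPRESSION =
            Pattern.compile("[0-9]+(?:\\s*\\*\\s*[0-9]+)*;");

    static boolean isExpression(String s) {
        // matches() requires the whole string to fit the pattern.
        return EXPRESSION.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isExpression("4;"));
        System.out.println(isExpression("5 * 4534 * 23534;"));
        System.out.println(isExpression("4 * ;"));
    }
}
```

This only recognises the strings; it does not build a syntax tree. Once the language gains nesting, such as parenthesised sub-expressions, it stops being regular, and a hand-written or generated parser is the usual tool.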
