Regular Expression - replaceAll() - how to replace words?

Hiya,
I have this regex to replace all instances of myWord:
String oldWord = "oldWord";
String newWord = "newWord";
String sentence = "some sentence that contains " + oldWord;
String newSentence = replaceWordsInSentence(sentence, oldWord, newWord);
private String replaceWordsInSentence(String sentence, String oldWord, String newWord) {
    return sentence.replaceAll("\b" + oldWord + "\b", newWord);
}
...it works in most instances, but when oldWord is at the end of the sentence it is not replaced. Presumably the problem is that "\b" is not a sufficient word boundary. Can someone help me out with the correct regular expression code?
Thanks,
James

Mel, you did appear to misunderstand as you thought points 2 and 3 were alternatives, but you now recognise that they are additional "shoulds".
Of course, I applied the extra backslash as soon as Joachim advised. Maybe you don't agree with my rationale, but I prefer the complete solution that will work in all instances... so was simply waiting for him to post a code example that included the latter 2 points as (although I understood the point of them perfectly) I was not sure how to implement them.
Have come up with the following, expanded, method...
    private String replaceWordsInSentence(String sentence, String oldWord, String newWord) {
        return sentence.replaceAll("\\b" + Pattern.quote(oldWord) + "\\b", Matcher.quoteReplacement(newWord));
    }
...works fine with the tests I have run. Joachim, can you confirm this is correct?
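
For anyone who wants to try the expanded method in isolation, here is a minimal, self-contained sketch. The class name WordReplacer and the sample strings in main are illustrative assumptions, not part of the original post. Pattern.quote makes oldWord match literally, and Matcher.quoteReplacement stops a $ or \ in newWord from being read as a group reference. Note that \b only helps when oldWord itself begins and ends with a word character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordReplacer {
    // Replaces whole-word occurrences of oldWord with newWord.
    static String replaceWordsInSentence(String sentence, String oldWord, String newWord) {
        // "\\b" in Java source is the two-character regex \b (word boundary);
        // a single-backslash "\b" literal would be a backspace character instead.
        String regex = "\\b" + Pattern.quote(oldWord) + "\\b";
        return sentence.replaceAll(regex, Matcher.quoteReplacement(newWord));
    }

    public static void main(String[] args) {
        // The word at the very end of the sentence is replaced as well.
        System.out.println(replaceWordsInSentence(
                "some sentence that contains oldWord", "oldWord", "newWord"));
        // "oldWordy" is left alone; the replacement "$1" is inserted literally.
        System.out.println(replaceWordsInSentence("oldWordy oldWord", "oldWord", "$1"));
    }
}
```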

Similar Messages

  • Regular expression and pattern matching/replacing

    I have a list of keywords. It has around 1000 keywords now but can grow to 5000.
    My web application displays a lot of text that is stored in the database. My requirement is to scan each text for occurrences of any of the above keywords. If a keyword is present, I have to replace it with a custom value before showing the text to the user.
    I was thinking of using a regular expression to replace the keywords in the text using the matcher.replaceAll method as follows:
    Pattern pattern = Pattern.compile(patternStr);
    Matcher matcher = pattern.matcher(inputStr);
    String output = matcher.replaceAll(replacementStr);
    But my pattern string will have around 5000 keywords joined with the 'OR' logical operator, like keyword1 | keyword2 | keyword3 | ..........
    Will such a big pattern string adversely affect performance? What can I do to speed it up? (Since my keyword list is not static, I would prefer to do the replacement just before showing the text to the user.)
    Any suggestions are most welcome.

    I don't think a pure regex approach would be that slow, but it would be a maintenance nightmare. I think a combined regex/table-lookup approach would be best: use a regex to identify potential keywords, then look them up in the table to confirm. For instance, to find all Java keywords you could use the regex "\\b[a-z]{2,12}+\\b" to filter out anything that can't possibly be a keyword.
    What are you going to replace the keywords with? Will it vary depending on which keyword is found? If so, you'll have to use a table, and you won't be able to use the replaceAll method, because it can't handle dynamically generated replacement values. You would have to use the lower-level appendReplacement and appendTail methods instead.
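
    The regex-plus-table idea described above could be sketched roughly as follows. The class name KeywordReplacer, the method name and the sample table are assumptions made for illustration; the candidate regex is the one from the reply, minus the possessive "+". StringBuffer is used because the appendReplacement overload that takes it is available on all Java versions.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordReplacer {
    // Cheap filter: any run of 2 to 12 lowercase letters is a *candidate* keyword.
    private static final Pattern CANDIDATE = Pattern.compile("\\b[a-z]{2,12}\\b");

    // table maps each keyword to its replacement (hypothetical data source).
    static String replaceKeywords(String text, Map<String, String> table) {
        Matcher m = CANDIDATE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = table.get(m.group());
            if (replacement == null) {
                replacement = m.group(); // not a keyword: keep the word unchanged
            }
            // quoteReplacement keeps '$' and '\' in the value from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }
}
```

    With this shape the pattern stays small no matter how many keywords there are; only the table grows.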

  • Regular Expression back-reference Find/Replace in SQL Developer 4.0

    I cannot seem to get the Search/Replace functionality to use back-references with Regular Expressions in 4.0.  This worked fine in 3.0.
    Text in my editor:
    abc
    Example:
    Find: (a)(bc)
    Replace:
    \1\1\1\2
    Should result in:
    aaabc
    Instead I get:
    \1\1\1\2
    This still isn't fixed in 4.0.2 - I'll try to find time to put in MOS as a bug...

    You know what, I think I'm logging in as sys on the command line on the sqlplus side of the house, AHA!
    Wow, I must be tired....LOL...thanks

  • Using regular expressions to find and replace code.

    Hi! Semi-newbie coder here.
    I'm trying to strip out code from multiple pages. I've tried regular expressions but I'm struggling to understand them. I also need to do it across a LOT of pages, so I need an automated way of doing it.
    The best way I can explain is with an analogy:
    I want to delete any string of characters that starts with c, ends with t and includes anything in between, so it would pick up "cat, cut, chat, coconut, can do it", whatever appears in the middle of those.
    Except, instead of c and t, I want it to find strings of code starting with <div class="advert" and ending with Vote<br> while picking up everything in between (including spaces, code, comments, etc.), and then delete that whole string, including the start and end. Is there a regular expression I could use in Dreamweaver that could do this? Is there a way to do this at all?

    Let me begin by saying I'm a complete idiot with DW's Reg Ex.   I use Search Specific Tag whenever possible.  See screenshot below.
    Try this on your Current Document to see if it works. Then make a back-up copy of site before attempting it on Entire Local Site as you cannot "Undo" this process.
    Good luck,
    Nancy O.
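
    For reference, the kind of pattern the question describes can be sketched in Java as below. This is only an illustration under assumptions: the class name AdvertStripper is made up, and Dreamweaver's own regex flavour may behave differently, although the [\s\S]*? idiom (lazily match anything, newlines included) is widely supported.

```java
import java.util.regex.Pattern;

class AdvertStripper {
    // Starts at <div class="advert", then lazily consumes any characters
    // (including line breaks) up to and including the first "Vote<br>".
    private static final Pattern ADVERT =
            Pattern.compile("<div class=\"advert\"[\\s\\S]*?Vote<br>");

    // Returns the page source with every advert block removed.
    static String strip(String html) {
        return ADVERT.matcher(html).replaceAll("");
    }
}
```

    The lazy quantifier matters: a greedy [\s\S]* would run to the last Vote<br> on the page and delete everything between the first advert and the final one.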

  • Regular expression to search if a word occurs with z and /z tags

    Hi ,
    I am trying to create a regex to search for the occurrence of two words within <z> and </z> tags (they must occur between a <z> and the next immediate </z> tag).
    This is my regex
    <z>\s[\w\W]+?(?!</z>)word1[\w\W]+?(?!</z>)word2[\w\W]+?</z>|<z>\s[\w\W]+?(?!</z>)word2[\w\W]+?(?!</z>)word1[\w\W]+?</z>
    I am trying to specify (?!</z>) in order to insist that my regex engine matches word1 and word2 only within a <z> and its next immediate </z> tag. The words can appear in either order: word1 followed by word2, or vice versa.
    The above regex does not work correctly.
    It maps fine for the following sentence
    <z> This is test for pattern for a Regex </z> <z> Also we would like to conclude what is happening </z> <z> Another test for paragraph is happening </z>
    The regex is as follows
    <z>\s[\w\W]+?(?!</z>)pattern[\w\W]+?(?!</z>)Regex[\w\W]+?</z>|<z>\s[\w\W]+?(?!</z>)Regex[\w\W]+?(?!</z>)pattern[\w\W]+?</z>
    But when I include a </z> in between pattern and Regex, it should not match, and yet it still does.
    <z> This is test for pattern </z> for a Regex</z> <z> Also we would like to conclude what is happening </z> <z> Another test for paragraph is happening </z>
    Please let me know how I can accomplish the same.
    Thanks.
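
    One common way to keep a match from running past the closing tag is to test for </z> before every single character that is consumed, rather than once after a lazy run. The following Java sketch shows that idea under stated assumptions: the class and method names are invented for illustration, and the tags are assumed not to nest.

```java
import java.util.regex.Pattern;

class ZTagMatcher {
    // True if some <z>...</z> block contains both words, in either order.
    static boolean bothWordsInSameZ(String text, String word1, String word2) {
        // Any characters, but never stepping onto the start of a closing </z>.
        String inside = "(?:(?!</z>).)*?";
        String regex = "<z>"
                + "(?=" + inside + "\\b" + Pattern.quote(word1) + "\\b)" // word1 before </z>
                + "(?=" + inside + "\\b" + Pattern.quote(word2) + "\\b)" // word2 before </z>
                + "(?:(?!</z>).)*</z>";                                   // rest of the block
        return Pattern.compile(regex, Pattern.DOTALL).matcher(text).find();
    }
}
```

    Because both conditions are lookaheads anchored right after <z>, the order of the two words does not matter and no alternation is needed.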

    oops.. sorry ..this is aligned better ...
    Hi, I am using this regex
    (?=(?:(?!</z>).)+?\bpattern\b.+</z>)(?!(?:(?!</z>).)+?\bscientific\b.+</z>)((\bpattern\b(?:(?!</z>).)+)\bpattern\b|\bpattern\b)
    I have written the above regex to match pattern between <z> and </z> provided the word "scientific" does not appear within those "z" tags.
    I have been trying to adapt the regex to do the same between two sentences; in my case I have a paragraph with multiple sentences. The delimiter that ends a sentence is a dot (.), so I am trying to specify the same condition between two sentences, using (\. ) to represent the dot.
    The following is my regex
    (?=(?:(?!\.).)+?\bpattern\b.+\.)(?!(?:(?!\.).)+?\bscientific\b.+\.)((\bpattern\b(?:(?!\.).)+)\bpattern\b|\bpattern\b)
    But this does not work.
    Can you please tell me how to go about this?
    Thanks.

  • Regular Expression & LEVEL - how to split attribut value

    Hi Folks;
    I have to transform the value of an attribute Attr_A (of table A) into multiple attribute values in another table B, like this:
    Table A
    Attr_A = '[only one letter from A to Z][only 5 numerics from 0 to 9][space][Operator][space][only one letter from A to Z][only 5 numerics from 0 to 9][space][Operator][space][only one letter from A to Z][only 5 numerics from 0 to 9][space][Operator][space]etc...
    with Operator = 'AND' or '(+),' or '(-),'
    Example: Attr_A='L12345 AND T23456 (+), U12345 (-)'
    In the result table B, I would have:
    - first column equal to 'L12345'
    - second column equal to '1' (position of the first column value in Attr_A)
    - third column equal to 'AND' (the operator that follows the first column value)
    Next record:
    - first column equal to 'T23456'
    - second to '2'
    - third to '(+),'
    etc. 'U12345' '3' '(-),'
    Thanks for your help ^^
    Edited by: Moostiq on 27 Apr. 2011 10:54

    Hi,
    Whenever you post code or data on this site, please format it and type these 6 characters:
    {code}
    (small letters only, inside curly brackets) before and after each section of formatted text.  This will keep strings such as (+)
    from looking like
    (+)
    You need to divide attr_a into parts, where each part consists of a 6-character word followed by a space, then followed by an operator.  Since the operators are at least 3 characters long, that means a string of n characters will have (at most) n/10 parts.  There may be other text in attr_a (in your example, there was a ',') that will be ignored.
    The first sub-query below, cntr, generates the numbers 1, 2, 3, ... up to the greatest possible number of parts in any attr_a.
    The second sub-query, got_part, extracts each part from attr_a.
    The main query parses each part into the columns you want.
    WITH cntr     AS
    (
         SELECT     LEVEL     AS n
         FROM     (
                   SELECT MAX (LENGTH (attr_a))
                        / 10     AS max_parts
                   FROM     a
              )
         CONNECT BY     LEVEL     <= max_parts
    )
    ,     got_part     AS
    (
         SELECT     c.n     AS part_num     -- position of this part within attr_a
         ,     REGEXP_SUBSTR ( a.attr_a
                   , '[A-Z][0-9]{5} +(AND|\(\+\)|\(-\))'
                   , 1
                   , c.n
                   )          AS part
         FROM     a
         JOIN     cntr     c ON c.n <= LENGTH (a.attr_a) / 10
    )
    SELECT     REGEXP_SUBSTR (part, '[A-Z][0-9]{5}')     AS column_1
    ,     part_num                         AS column_2     -- 1, 2, 3, ...
    ,     REGEXP_SUBSTR (part, '[^ ]+$')     AS column_3
    FROM     got_part
    WHERE     part     IS NOT NULL
    ;

  • Grouping & Back-references with regular expressions on Replace Text window

    I really appreciate the inclusion of the Regular Expressions in the search & replace feature. One thing I am missing is back-references in the replacement expression. For instance, in the unix tools vi or sed, I might do something like this:
    s/\(firstPart\) \(secondPart\) \(oldThirdPart\)/\2 \1 newThirdPart/g
    which would allow me to switch the places of firstPart and secondPart, and totally replace thirdPart. If grouping and back-references are already present in the Replace Text window, how does one correctly invoke them?

    duplicate of Grouping & Back-references with regular expressions on Replace Text window
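
    For comparison, this is how the same swap looks with java.util.regex, where groups are written with plain parentheses and referenced as $1, $2 in the replacement. Whether the Replace Text window accepts this syntax is not confirmed anywhere in this thread, so treat it only as a sketch of the Java API; the class name is an illustrative assumption.

```java
class BackReferenceSwap {
    // Java equivalent of: s/\(firstPart\) \(secondPart\) \(oldThirdPart\)/\2 \1 newThirdPart/g
    static String swap(String input) {
        return input.replaceAll(
                "(firstPart) (secondPart) (oldThirdPart)", // three capturing groups
                "$2 $1 newThirdPart");                     // group 2, then group 1
    }
}
```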

  • Regular Expression for non-words

    hello all!
    can you help me construct a regular expression that will match non-word strings say "������". I will be needing this to filter words from a Microsoft Word Document.
    Thanx!

    > hello all!
    > can you help me construct a regular expression that
    > will match non-word strings say "������". I will
    > be needing this to filter words from a Microsoft Word
    > Document.
    I don't think this is a problem that should be solved with regex. You would have to convert your Word document to a String and use replaceAll() with "\\W" as the regex.
    Correct me if I am wrong but I thought that Word files were binary so your first problem will be to convert the file(s) to a String.
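
    Assuming the document text has already been extracted into a String, the replaceAll() suggestion might look like the sketch below. The class and method names are made up for illustration. Keep in mind that, by default, Java's \W treats everything outside [a-zA-Z_0-9] as a non-word character, so accented letters are stripped too; the (?U) flag (Java 7 and later) switches to Unicode-aware classes.

```java
class NonWordFilter {
    // Collapses every run of non-word characters (ASCII definition) into one space.
    static String stripNonWord(String text) {
        return text.replaceAll("\\W+", " ").trim();
    }

    // Same idea, but (?U) makes \W Unicode-aware so letters such as 'é' survive.
    static String stripNonWordUnicode(String text) {
        return text.replaceAll("(?U)\\W+", " ").trim();
    }
}
```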

  • String Regular Expression for uncommon characters

    Hi,
    I am trying to get text out of an HTML file, for which I am using the EditorKit and Document classes. After I obtain the text, the text (String) contains some characters like �. This character looks like an "a" with a French-style acute accent. My problem is how to use a regular expression to find and replace (replaceAll method) these unwanted characters.
    Is there a regular expression pattern for such characters?
    Thanks!
    Rahul.

    Hmm, I would recommend looking at the specific patterns;
    a simplified site would be here http://www.p3m.org/wiki?regex
    as a reference. If you don't know regular expressions, use
    http://www.perl.com/doc/manual/html/pod/perlre.html
    The only way I can think of to construct the regex is to use \s and add the characters you want to that regex :s You could also look into regex lookahead and lookbehind...
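
    If the goal is simply to drop everything outside plain ASCII, a negated POSIX class is one option. This is a blunt sketch rather than a recommendation: it assumes no legitimate accented text needs to survive, and the class name is invented for the example. Stray characters like these often point to a charset mismatch when the HTML was read, which is worth checking first.

```java
class NonAsciiCleaner {
    // \p{ASCII} covers U+0000 to U+007F; the negated class matches everything else.
    static String removeNonAscii(String text) {
        return text.replaceAll("[^\\p{ASCII}]", "");
    }
}
```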

  • Using regular expressions to get a customized output

    Hi,
    I have a string/varchar variable with the data ',a,b,c,' in it.
    I want the display as follows:
    a
    b
    c
    I would like to get the similar output using regular expressions.
    How do I get this output using REGEXP_REPLACE or REGEXP_SUBSTR?
    Please do the needful.
    Thanks & Regards,
    Rakshit

    I remember that; however, if we look closer, that one has a little flaw: the 2nd row should be null, because ",," indicates an empty field. The MODEL clause solution works just fine in this case:
    with t as (select 'aaaa,,bbbb,cccc,dddd,eeee,ffff' col1 from dual)
    -- end of sample data
    SELECT col_new
      FROM t
    MODEL
       PARTITION BY (ROWNUM rn)
       DIMENSION BY (0 dim)
       MEASURES(col1, col1 col_new)
       RULES ITERATE(99) UNTIL (ITERATION_NUMBER = LENGTH(REGEXP_REPLACE(col1[0], '[^,]')))
                    (col_new[ITERATION_NUMBER] = REPLACE(REGEXP_SUBSTR(col1[0], '(^|,)[^,]*', 1, ITERATION_NUMBER+1), ','))
    COL_NEW
    aaaa

    bbbb
    cccc
    dddd
    eeee
    ffff
    7 rows selected.
    Update: I had this nagging feeling that I missed something, and there it was. If you want to see what the problem with my solution is, change the example to
    with t as (select ',aaaa,,bbbb,cccc,dddd,eeee,ffff' col1 from dual)
    So I went back and tried to fix BlueShadow's approach. Here it is:
    with t as (select 'aaaa,,bbbb,cccc,dddd,eeee,ffff' txt from dual)
    -- end of sample data
    SELECT REPLACE(REGEXP_SUBSTR(',' || txt, ',[^,]*', 1, level), ',') col_new
      FROM t
      CONNECT BY level <= length(regexp_replace(txt,'[^,]*'))+1
    ;
    C.

  • Regular Expression Help Please?

    Hi
    I'm trying to get my head round regular expressions in find and replace; it's a slow process for me!
    I have this -
    <a href="http://www.forms.mydomainname.com/cgi-bin/urltracker/tracker.pl?site=http://www.website-address.com"
    and I'm trying to change it to this -
    <a href="http://www.forms.mydomainname.com/cgi-bin/urltracker/tracker.pl?site=http://www.website-address.com&email="
    I was trying first of all with a *.*, but couldn't work out how to tell it where the code ends.
    There are hundreds of pages like this, all with different website addresses.
    After I have changed all the pages to the new code, I will then need to copy and paste a different email address onto the end of each line, on each page.
    Unless anyone knows a way of automating that?
    Hope someone can point me in the right direction.
    Many thanks, Craig.

    Hi David
    Many thanks for all that and the detailed descriptions.
    I will be working through it all again tomorrow, so will put your info to the test! lol
    As for partially building the email addresses, I think that would be too much, as the emails are all over the place: some have their own domain, others use hotmail, Yahoo etc.
    Some even have their own domain for their website and a free one for the email address.
    They are all Hotels, B&Bs & Cottages etc.
    Hopefully all your hard work will help me a step closer to understanding it all.
    Many thanks again,
    Craig.
    "David Stiller" wrote:
    > Craig,
    >
    >> You do have that correct David, thanks.
    >
    > Okay. Regex is as much an "exact science" as it is an "art form" -- which isn't to say I'm a regex artist; I just love the technology -- but I mention this because I made the following assumption in order to keep the pattern relatively simple: your href values are all quoted in either single or double quotes. Such as, for example, the following sample HTML ...
    >
    > <body>
    > <a href="http://www.forms.mydomainname.com/cgi-bin/urltracker/tracker.pl?site=www.sample.com">asfd</a>
    > <a href='http://www.forms.mydomainname.com/cgi-bin/urltracker/tracker.pl?site=www.example.net'></a>
    > <a href="http://www.forms.mydomainname.com/cgi-bin/urltracker/tracker.pl?site=www.company.com"></a>
    > </body>
    >
    > In the Find field, enter this pattern ...
    >
    > (tracker\.pl\?site=.*?)(["'])
    >
    > ... and in the Replace field, enter this pattern ...
    >
    > $1&email=ADDRESS$2
    >
    > Then carefully use your Find Next and Replace buttons to step through your code. The above will add &email=ADDRESS to your HTML in all the right places. I chose that because ADDRESS is easy to select by double clicking, which should facilitate your replacing it.
    >
    > Let's step through the patterns.
    >
    > (tracker\.pl\?site=[^"']*?)(["'])
    >
    > This looks for the phrase "tracker.pl?site=" (without quotes) followed immediately by a "non-greedy" match of any character that isn't a single or double quotation mark, followed immediately by either a single or double quotation mark. I used "tracker.pl?site=", which I took to be a safe, short "hook" into the string we need. I split this pattern into two sections, grouped by parentheses. This allows us to refer to the first part of the match (everything but the closing quotation mark) as group 1, and the second part (the closing quotation mark) as group 2. This is like storing values with your calculator's M (memory) button.
    >
    > $1&email=ADDRESS$2
    >
    > Here, we refer to group 1 and follow it with the phrase "&email=ADDRESS" (without quotes), followed again by group 2.
    >
    > Now, in theory, we could use the domain name of each unique site to at least partially build the email address. That would get you even closer to your goal. To do so, I'd need even more detail from you, such as the kinds of domains you have (how many sub domains are probable, etc.).
    >
    > David
    > stiller (at) quip (dot) net
    > Dev essays: http://www.quip.net/blog/
    > "Luck is the residue of good design."
    >

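    The find/replace pair quoted above carries over to Java almost unchanged, since Java also uses $1 and $2 for groups in the replacement string. The sketch below is an illustration only; the class and method names are assumptions, and ADDRESS is the same placeholder used in the quoted message.

```java
class TrackerLinkRewriter {
    // Group 1: "tracker.pl?site=" plus everything up to the closing quote.
    // Group 2: the closing quote itself (single or double).
    static String addEmailParam(String html) {
        return html.replaceAll(
                "(tracker\\.pl\\?site=[^\"']*?)([\"'])",
                "$1&email=ADDRESS$2");
    }
}
```

    Capturing the closing quote as its own group is what lets the replacement put &email=ADDRESS in front of it without caring which kind of quote the page used.
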
  • Allow specific characters - Regular Expression

    Hello everyone
    I am new to regular expression and I have a very simple question. I use the "read from text file" function to load a Tab delimited file with 3 columns into my VI. Next, the string is converted in array and I use the values.
    Nevertheless, I want to develop a "filter" allowing only digits (0-9), colon, comma and point into strings.
    Using the "match regular expression" function, I was trying a regular expression like that:
    [^0-9]|[^\].[|^:]|[^,]
    But it is not working.
    Could someone help me with this issue?
    Thanks
    Dan07

    Hello
    Actually I don't need to modify the string that has "invalid" characters, I just need to identify them instead. Find below a VI testing both methods: Match Regular Expression and Search and Replace String.
    Using Match Regular Expression method, I got correct results since all the "valid" values must be identified as "-1" and all the "invalid" values must be identified as positive numbers (offset).
    Nevertheless, using Search and Replace String method I got wrong results, since all the strings were classified as "valid" (-1), but "bg" and "03/12/2010" are not "valid".
    I will go ahead with Match Regular Expression method because it is working great, but I was just wondering how to fix Search and Replace String method to achieve equivalent results.
    Thanks
    Dan07
    Attachments:
    Regular Expression_example.vi (18 KB)
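
    The "offset of the first invalid character, or -1 if the string is valid" behaviour described above can be reproduced with a single negated character class. The sketch is in Java rather than LabVIEW, and the class and method names are assumptions; the class [^0-9:,.] itself should carry over to other regex flavours, but that has not been checked against the attached VI.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InvalidCharFinder {
    // Matches any single character that is NOT a digit, colon, comma or point.
    private static final Pattern INVALID = Pattern.compile("[^0-9:,.]");

    // Returns the offset of the first invalid character, or -1 if there is none.
    static int firstInvalidOffset(String s) {
        Matcher m = INVALID.matcher(s);
        return m.find() ? m.start() : -1;
    }
}
```

    Inside a character class the point and the comma are literal, so no escaping or alternation is needed.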

  • Help: Verify or Suggest a Simple Regular Expression

    I'm trying to do a mapping from a title to a file name portion of a URL. Thus the result needs to follow the rules as specified here:
    http://labs.apache.org/webarch/uri/rfc/rfc3986.html#unreserved
    I identified the following characters as legal: a-z / 0-9 / "-" / "." / "_" / "~"
    Everything else has to be converted to an underscore.
    I came up with the following expression:
    someString.replaceAll("[^a-zA-Z0-9-._~]", "_")
    Is that correct? I spent a lot of time trying to figure out regular expressions, but it seems like everyone (i.e. PHP, TextPad and now Java) has a slightly different version and, to top it off, there are not very good tutorials or explanations. I dread regular expressions!!!
    Can anyone help please?

    HoganWang wrote:
    > Your regular expression is right. Regular expressions have simple and complex versions. If replaceAll is called frequently, it is recommended to use Pattern to compile the regular expression first.
    How does the expression know that the hyphen isn't part of a range? I guess the only way is that it is between alphabetical letters or numbers.
    In terms of efficiency: this is called once per page request, i.e.
    somedomain.com/somecategory/title_title_title
    Well, I need to be able to translate title&title$title to title_title_title. It doesn't seem like pre-compiling the regular expression will speed it up since, between page requests, it won't remember fields, or am I wrong?
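
    On both points, a small sketch may help. Putting the hyphen last in the character class removes any doubt about whether it forms a range, and a static final Pattern is compiled once when the class is loaded, so it is reused by later calls in the same JVM rather than being rebuilt per request. The class name TitleSlug is an assumption, and whether a given web container keeps the class loaded between requests depends on the deployment.

```java
import java.util.regex.Pattern;

class TitleSlug {
    // Compiled once per class load. The trailing '-' is a literal hyphen, not a range.
    private static final Pattern ILLEGAL = Pattern.compile("[^a-zA-Z0-9._~-]");

    // Replaces every character outside the allowed set with an underscore.
    static String toFileName(String title) {
        return ILLEGAL.matcher(title).replaceAll("_");
    }
}
```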

  • Regular expressions in Workshop 8.1

    Hello,
    I'm posting this question here because I don't see a "jdk" subcategory in this
    newsgroup and it might be problem peculiar to Workshop.
    I'm trying to use the Pattern and Matcher classes in java.util.regex (JDK 1.4.2)
    in BEA Workshop 8.1, but I'm getting "ERROR: Unknown escape code" (red squiggly
    line appears under the regex and this message is the screen tip) whenever I try
    to use the backslash to escape a special character in the Pattern.compile() and
    the Pattern.matches() methods.
    For example, it doesn't allow "\d" to mean "any digit". For this particular one,
    I can get around the problem by specifying "[0-9]", but in the case of the period
    character, I'm stuck. I cannot use "\." However, the JDK API doc (http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html)
    says the backslash is to be used for this purpose, if I'm reading it correctly.
    Is this a problem with Workshop, and is there a workaround? I need to specify
    that I require one and exactly one period.
    Any help would be most appreciated!
    Thanks.

    Yes, I had read the Java doc, but I guess I hadn't fully understood it. Now I
    do! Thanks!!
    David
    Josh Eckels wrote:
    This isn't particular to Workshop, but you'll need to use two
    backslashes in your source code. Inside a string, backslash is used to
    escape the next character so that you can enter special characters like
    newlines ('\n'), tabs ('\t'), etc.
    So, in order to enter a backslash character into your string, you need
    to escape it, like '\\'.
    There's a small section on this in the java.util.regex.Pattern JavaDoc,
    under the "Backslashes, escapes, and quoting" header:
    Backslashes within string literals in Java source code are interpreted
    as required by the Java Language Specification as either Unicode escapes
    or other character escapes. It is therefore necessary to double
    backslashes in string literals that represent regular expressions to
    protect them from interpretation by the Java bytecode compiler. The
    string literal "\b", for example, matches a single backspace character
    when interpreted as a regular expression, while "\\b" matches a word
    boundary. The string literal "\(hello\)" is illegal and leads to a
    compile-time error; in order to match the string (hello) the string
    literal "\\(hello\\)" must be used.
    Josh
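
    To make the doubled-backslash rule concrete, here is a short sketch covering the two cases from the question, "any digit" and "one and exactly one period". The class and method names are assumptions chosen for the example.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // "\\d" in source becomes the regex \d (any digit).
    static boolean allDigits(String s) {
        return Pattern.matches("\\d+", s);
    }

    // "\\." in source becomes the regex \. (a literal period);
    // [^.]* on either side allows any number of non-period characters.
    static boolean exactlyOnePeriod(String s) {
        return Pattern.matches("[^.]*\\.[^.]*", s);
    }
}
```

    Inside a character class such as [^.] the period is already literal, so it needs no backslash there.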

  • Regular Expressions, please help.

    Hello everyone.
    Can I get a Java Regular Expression to match with a word of the following language...
    Start --> Expression;
    Expression --> [0-9]+;
    Expression --> Expression * Expression;
    So the regexp should match with words like:
    4;
    4664;
    4 * 763;
    5 * 4534 * 23534;
    04 * 002 * 1 * 10 * ...
    I would be very happy, if anyone could help.

    I don't think that I need to learn anything more.
    I am sure that what I want is not possible.
    I want to build a compiler.
    I just finished the abstract syntax of my language. Now I need a way to compile the concrete syntax of my language to the abstract one.
    But I think it is not possible with regular expressions,
    because I need to match a syntax of Chomsky type 2 (context-free).
    I think regular expressions only match Chomsky type 3 (regular) languages.
    But the backtracking mechanism of Java RegExp might be able to do this.
    I am not sure about this.
    If you have any ideas please post.
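
    For the three productions quoted in the question, a regex does suffice: although the grammar is written recursively, the strings it generates are just one or more numbers separated by "*" and ended by ";", which is a regular language. The sketch below assumes optional whitespace around "*" (as in the examples) and uses an invented class name. A fuller language, for example one with nested parentheses, would need a real parser instead.

```java
import java.util.regex.Pattern;

class ExpressionMatcher {
    // number, then zero or more "* number" groups, then the terminating ';'
    private static final Pattern EXPRESSION =
            Pattern.compile("[0-9]+(\\s*\\*\\s*[0-9]+)*;");

    // True if the whole input is a word of the language.
    static boolean isExpression(String s) {
        return EXPRESSION.matcher(s).matches();
    }
}
```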
