Regular expression - Replace a part of an expression

Dear All,
This is not a business requirement, just trying to practice regexp.
Suppose we want to replace a part of a regular expression from a string. As an example, in the string 'THIS Number 124356 Is to Change.This Number 5 Also', I am trying to replace the last digit of each number with $ sign.
Output will be
'THIS Number 12435$ Is to Change.This Number $ Also'.
Will this be possible using a single regexp_replce?
Thanks in advance.

MichaelS wrote:
I am trying to achieve this using a SINGLE regexp_replace.Here we go:
SQL> with t as (
select 'ABC124556def568gh236klJ258' str from dual union all
select 'THIS Number 124356 Is to Change.This Number 5 Also' str from dual
select str, regexp_replace(str, '(\d{0,})\d{1}', '\1$') str2
from t
STR                                                     STR2                                                  
ABC124556def568gh236klJ258                              ABC12455$def56$gh23$klJ25$                            
THIS Number 124356 Is to Change.This Number 5 Also      THIS Number 12435$ Is to Change.This Number $ Also    
2 rows selected.
Nice..
Learning for me ..
Never thought of usage like {0,} ... kepping the end value OPEN.
I was trying with \d+?\d, which was not working..

Similar Messages

  • Introduction to regular expressions ... last part.

    Continued from Introduction to regular expressions ... continued., here's the third and final part of my introduction to regular expressions. As always, if you find mistakes or have examples that you think could be solved through regular expressions, please post them.
    Having fun with regular expressions - Part 3
    In some cases, I may have to search for different values in the same column. If the searched values are fixed, I can use the logical OR operator or the IN clause, like in this example (using my brute force data generator from part 2):
    SELECT data
      FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE data IN ('abc', 'xyz', '012');There are of course some workarounds as presented in this asktom thread but for a quick solution, there's of course an alternative approach available. Remember the "|" pipe symbol as OR operator inside regular expressions? Take a look at this:
    SELECT data
      FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE REGEXP_LIKE(data, '^(abc|xyz|012)$')
    ;I can even use strings composed of values like 'abc, xyz ,  012' by simply using another regular expression to replace "," and spaces with the "|" pipe symbol. After reading part 1 and 2 that shouldn't be too hard, right? Here's my "thinking in regular expression": Replace every "," and 0 or more leading/trailing spaces.
    Ready to try your own solution?
    Does it look like this?
    SELECT data
      FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE REGEXP_LIKE(data, '^(' || REGEXP_REPLACE('abc, xyz ,  012', ' *, *', '|') || ')$')
    ;If I wouldn't use the "^" and "$" metacharacter, this SELECT would search for any occurence inside the data column, which could be useful if I wanted to combine LIKE and IN clause. Take a look at this example where I'm looking for 'abc%', 'xyz%' or '012%' and adding a case insensitive match parameter to it:
    SELECT data
      FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE REGEXP_LIKE(data, '^(abc|xyz|012)', 'i')
    ; An equivalent non regular expression solution would have to look like this, not mentioning other options with adding an extra "," and using the INSTR function:
    SELECT data
      FROM (SELECT data, LOWER(DATA) search
              FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE search LIKE 'abc%'
        OR search LIKE 'xyz%'
        OR search LIKE '012%'
    SELECT data
      FROM (SELECT data, SUBSTR(LOWER(DATA), 1, 3) search
              FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE search IN ('abc', 'xyz', '012')
    ;  I'll leave it to your imagination how a complete non regular example with 'abc, xyz ,  012' as search condition would look like.
    As mentioned in the first part, regular expressions are not very good at formatting, except for some selected examples, such as phone numbers, which in my demonstration, have different formats. Using regular expressions, I can change them to a uniform representation:
    WITH t AS (SELECT '123-4567' phone
                 FROM dual
                UNION
               SELECT '01 345678'
                 FROM dual
                UNION
               SELECT '7 87 8787'
                 FROM dual
    SELECT t.phone, REGEXP_REPLACE(REGEXP_REPLACE(phone, '[^0-9]'), '(.{3})(.*)', '(\1)-\2')
      FROM t
    ;First, all non digit characters are beeing filtered, afterwards the remaining string is put into a "(xxx)-xxxx" format, but not cutting off any phone numbers that have more than 7 digits. Using such a conversion could also be used to check the validity of entered data, and updating the value with a uniform format afterwards.
    Thinking about it, why not use regular expressions to check other values about their formats? How about an IP4 address? I'll do this step by step, using 127.0.0.1 as the final test case.
    First I want to make sure, that each of the 4 parts of an IP address remains in the range between 0-255. Regular expressions are good at string matching but they don't allow any numeric comparisons. What valid strings do I have to take into consideration?
    Single digit values: 0-9
    Double digit values: 00-99
    Triple digit values: 000-199, 200-255 (this one will be the trickiest part)
    So far, I will have to use the "|" pipe operator to match all of the allowed combinations. I'll use my brute force generator to check if my solution works for a single value:
    SELECT data
      FROM TABLE(regex_utils.gen_data('0123456789', 3))
    WHERE REGEXP_LIKE(data, '^(25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})$') 
    ; More than 255 records? Leading zeros are allowed, but checking on all the records, there's no value above 255. First step accomplished. The second part is to make sure, that there are 4 such values, delimited by a "." dot. So I have to check for 0-255 plus a dot 3 times and then check for another 0-255 value. Doesn't sound to complicated, does it?
    Using first my brute force generator, I'll check if I've missed any possible combination:
    SELECT data
      FROM TABLE(regex_utils.gen_data('03.', 15))
    WHERE REGEXP_LIKE(data,
                       '^((25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})$'
    ;  Looks good to me. Let's check on some sample data:
    WITH t AS (SELECT '127.0.0.1' ip
                 FROM dual
                UNION 
               SELECT '256.128.64.32'
                 FROM dual            
    SELECT t.ip
      FROM t WHERE REGEXP_LIKE(t.ip,
                       '^((25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})$'
    ;  No surprises here. I can take this example a bit further and try to format valid addresses to a uniform representation, as shown in the phone number example. My goal is to display every ip address in the "xxx.xxx.xxx.xxx" format, using leading zeros for 2 and 1 digit values.
    Regular expressions don't have any format models like for example the TO_CHAR function, so how could this be achieved? Thinking in regular expressions, I first have to find a way to make sure, that each single number is at least three digits wide. Using my example, this could look like this:
    WITH t AS (SELECT '127.0.0.1' ip
                 FROM dual
    SELECT t.ip, REGEXP_REPLACE(t.ip, '([0-9]+)(\.?)', '00\1\2')
      FROM t
    ;  Look at this: leading zeros. However, that first value "00127" doesn't look to good, does it? If you thought about using a second regular expression function to remove any excess zeros, you're absolutely right. Just take the past examples and think in regular expressions. Did you come up with something like this?
    WITH t AS (SELECT '127.0.0.1' ip
                 FROM dual
    SELECT t.ip, REGEXP_REPLACE(REGEXP_REPLACE(t.ip, '([0-9]+)(\.?)', '00\1\2'),
                                '[0-9]*([0-9]{3})(\.?)', '\1\2'
      FROM t
    ;  Think about the possibilities: Now you can sort a table with unformatted IP addresses, if that is a requirement in your application or you find other values where you can use that "trick".
    Since I'm on checking INET (internet) type of values, let's do some more, for example an e-mail address. I'll keep it simple and will only check on the
    "[email protected]", "[email protected]" and "[email protected]" format, where x represents an alphanumeric character. If you want, you can look up the corresponding RFC definition and try to build your own regular expression for that one.
    Now back to this one: At least one alphanumeric character followed by an "@" at sign which is followed by at least one alphanumeric character followed by a "." dot and exactly 3 more alphanumeric characters or 2 more characters followed by a "." dot and another 2 characters. This should be an easy one, right? Use some sample e-mail addresses and my brute force generator, you should be able to verify your solution.
    Here's mine:
    SELECT data
      FROM TABLE(regex_utils.gen_data('a1@.', 9))
    WHERE REGEXP_LIKE(data, '^[[:alnum:]]+@[[:alnum:]]+(\.[[:alnum:]]{3,4}|(\.[[:alnum:]]{2}){2})$', 'i'); Checking on valid domains, in my opinion, should be done in a second function, to keep the checks by itself simple, but that's probably a discussion about readability and taste.
    How about checking a valid URL? I can reuse some parts of the e-mail example and only have to decide what type of URLs I want, for example "http://", "https://" and "ftp://", any subdomain and a "/" after the domain. Using the case insensitive match parameter, this shouldn't take too long, and I can use this thread's URL as a test value. But take a minute to figure that one out for yourself.
    Does it look like this?
    WITH t AS (SELECT 'Introduction to regular expressions ... last part. URL
                 FROM dual
                UNION
               SELECT 'http://x/'
                 FROM dual
    SELECT t.URL
      FROM t
    WHERE REGEXP_LIKE(t.URL, '^(https*|ftp)://(.+\.)*[[:alnum:]]+(\.[[:alnum:]]{3,4}|(\.[[:alnum:]]{2}){2})/', 'i')
    Update: Improvements in 10g2
    All of you, who are using 10g2 or XE (which includes some of 10g2 features) may want to take a look at several improvements in this version. First of all, there are new, perl influenced meta characters.
    Rewriting my example from the first lesson, the WHERE clause would look like this:
    WHERE NOT REGEXP_LIKE(t.col1, '^\d+$')Or my example with searching decimal numbers:
    '^(\.\d+|\d+(\.\d*)?)$'Saves some space, doesn't it? However, this will only work in 10g2 and future releases.
    Some of those meta characters even include non matching lists, for example "\S" is equivalent to "[^ ]", so my example in the second part could be changed to:
    SELECT NVL(LENGTH(REGEXP_REPLACE('Having fun with regular expressions', '\S')), 0)
      FROM dual
      ;Other meta characters support search patterns in strings with newline characters. Just take a look at the link I've included.
    Another interesting meta character is "?" non-greedy. In 10g2, "?" not only means 0 or 1 occurrence, it means also the first occurrence. Let me illustrate with a simple example:
    SELECT REGEXP_SUBSTR('Having fun with regular expressions', '^.* +')
      FROM dual
      ;This is old style, "greedy" search pattern, returning everything until the last space.
    SELECT REGEXP_SUBSTR('Having fun with regular expressions', '^.* +?')
      FROM dual
      ;In 10g2, you'd get only "Having " because of the non-greedy search operation. Simulating that behavior in 10g1, I'd have to change the pattern to this:
    SELECT REGEXP_SUBSTR('Having fun with regular expressions', '^[^ ]+ +')
      FROM dual
      ;Another new option is the "x" match parameter. It's purpose is to ignore whitespaces in the searched string. This would prove useful in ignoring trailing/leading spaces for example. Checking on unsigned integers with leading/trailing spaces would look like this:
    SELECT REGEXP_SUBSTR(' 123 ', '^[0-9]+$', 1, 1, 'x')
      FROM dual
      ;However, I've to be careful. "x" would also allow " 1 2 3 " to qualify as valid string.
    I hope you enjoyed reading this introduction and hope you'll have some fun with using regular expressions.
    C.
    Fixed some typos ...
    Message was edited by:
    cd
    Included 10g2 features
    Message was edited by:
    cd

    Can I write this condition with only one reg expr in Oracle (regexp_substr in my example)?I meant to use only regexp_substr in select clause and without regexp_like in where clause.
    but for better understanding what I'd like to get
    next example:
    a have strings of two blocks separated by space.
    in the first block 5 symbols of [01] in the second block 3 symbols of [01].
    In the first block it is optional to meet one (!), in the second block it is optional to meet one (>).
    The idea is to find such strings with only one reg expr using regexp_substr in the select clause, so if the string does not satisfy requirments should be passed out null in the result set.
    with t as (select '10(!)010 10(>)1' num from dual union all
    select '1112(!)0 111' from dual union all --incorrect because of '2'
    select '(!)10010 011' from dual union all
    select '10010(!) 101' from dual union all
    select '10010 100(>)' from dual union all
    select '13001 110' from dual union all -- incorrect because of '3'
    select '100!01 100' from dual union all --incorrect because of ! without (!)
    select '100(!)1(!)1 101' from dual union all -- incorrect because of two occurencies of (!)
    select '1001(!)10 101' from dual union all --incorrect because of length of block1=6
    select '1001(!)10 1011' from dual union all) --incorrect because of length of block2=4
    select '10110 1(>)11(>)0' from dual union all)--incorrect because of two occurencies of (>)
    select '1001(>)1 11(!)0' from dual)--incorrect because (!) and (>) are met not in their blocks
    --end of test data

  • "Multiple Line" Regular expressions in find/replace

    Hi there,
    I'm wondering if it is possible to find and replace (in batch
    mode) the following obstacle:
    say is have this piece of code in every html file i'd like to
    edit:
    <html>
    <blah blah (Same text in all files)>
    <blah blah (NOT same text in all files)>
    <blah blah (NOT same text in all files)>
    <code that is the same again in all files>
    and would like to replace it by nothing, effectively deleting
    the part completely... (by leaving the replace by field empty).
    The tricky part is that the lines which are NOT the same in
    all files, are sometimes 2 lines, but sometimes 3 or more lines.
    Somehow I need to get the regular expression to include
    <line>* (or something similar), in order to make the regular
    expression work.
    Any ideas on how to solve this? I must say untill now, the
    Dreamweaver search and replace function has been the most effective
    one, compared to many alternatives out there.
    Rp

    I've tried \1 and $1Just these in the text items of the find & replace dialog box...?????
    Can you write down exactly what have you tried....(text to be replaced by which....)????
    Greetings....
    Sim

  • Regular expression - Replace not like

    Hi All,
    Is there a regexp pattern to replace anything other than 'oracle' from the below string.
    "oracle sdsd oracle xyd fgh oracle idmdh asasas trtrt"
    The result will be "oracleoracleoracle"
    If I want to write like regexp_replace('oracle sdsd oracle xyd fgh oracle idmdh asasas trtrt',<pattern>), what should be the pattern?
    I know how to get the result by nesting regexp and other functions.
    But is there any single pattern for this?
    Thanks in advance.
    Note; This is not a business requirement, trying to learn regexp..

    884476 wrote:
    Could you please explain what does that pattern mean? We can look at your string as a sequence of substrings where each substring is set of characters (part A) followed by word oracle or, in last substring, by end-of-line (part B). Each of such substrings be want to replace with part B thus removing part A.
    Dot (.) means any character. Asterisk (*) means repeated 0 or more times. This is our part A (I'll get back to ? later). Pipe (|) means OR. Dollar sigh ($) means end-of-line. Parenthesis mean grouping. So ((oracle)|$) means string oracle or end-of line. This is our part B. Now back to question mark. By default Oracle regular expressions are "greedy". We say any character repeated any number of times followed by word oracle. Since word oracle itself matched definition of any character repeated any number of times (6 in this case) regexp will match it that way that it will be from the beginning of the string to last occurrence of word oracle - that's why it is called "greedy". Question mark (?) tells regexp not to use greedy matching, therefore '.*?oracle' will stop at first occurrence of word oracle. Now replacement string. Notation \1 is grouping backreference. Group 1 is ((oracle)|$) and, as I already noted, means string oracle or end-of line (whichever was found).
    SY.

  • Replace All with Regular Expression

    Hi all,
    I need a help to replace the String:
    to
    "<a href=\"1\"">Java Programming</a>"
    I was trying to replace with
    .replaceAll("\\[\\[*([^\\]]*?)*\\]\\]", "<a href=\"\"></a>")
    but I don't know how to separate the parameters values.
    Best regards                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               

    Hi prometheuzz,
    you are the one!!
    But how I could do to a Srting like this:
    Ah, more requirements...
    Things are getting a bit messy now:
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    class Main { 
        public static void main(String[] args) {
            String line = "any text any text any text any text any text. "+
                          "Click - [[Code: 6 Title: Java Programming]]. "+
                          "any text any text any text any text any text. "+
                          "- [[C�digo: 2 T�tulo: Sun]] text text "+
                          "any text any text any text any text any text.";
            String newLine = line;
            String[] wikiTags = getWikiTags(newLine);
            for(String tag : wikiTags) {
                String newTag = "<a href=\""+get("(Code:|C�digo:)", " ", tag)+
                                "\">"+get("(Title:|T�tulo:)", "$", tag)+"</a>";
                newLine = newLine.replaceFirst("\\[\\["+tag+"\\]\\]", newTag);
            System.out.println(line);
            System.out.println(newLine);
        public static String[] getWikiTags(String text) {
            java.util.List<String> list = new java.util.ArrayList<String>();
            Pattern pattern = Pattern.compile("(?<=\\[\\[)(.*?)(?=\\]\\])");
            Matcher matcher = pattern.matcher(text);
            while(matcher.find()) {
                list.add(matcher.group());
            return list.toArray(new String[list.size()]);
        public static String get(String start, String end, String text) {
            Pattern pattern = Pattern.compile("(?<="+start+"\\s)(.*?)(?="+end+")");
            Matcher matcher = pattern.matcher(text);
            return matcher.find() ? matcher.group() : "#ERROR#";
    }Details about regex:
    http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html
    http://www.regular-expressions.info/java.html
    http://java.sun.com/docs/books/tutorial/essential/regex/
    >
    Thanks a lot for you helpLike I said, it's a bit of a messy solution. See if you can use (a part of) it.
    Good luck.

  • Replace string with regular expression

    I am new to the regular expression. I am just trying to remove double quotes from a string.
    My sample string is sql statement(SELECT "SCHEMA"."TABLE1"."COLUMN1", "SCHEMA"."TABLE1"."COLUMN2" FROM "SCHEMA"."TABLE1", "SCHEMA"."TABLE2" WHERE "SCHEMA"."TABLE1"."COLUMN1" = "SCHEMA"."TABLE2"."COLUMN1" );
    This is what I came up with.
    String s = query.replaceAll("\\.\"", ".").replaceAll("\"\\.", ".").replaceAll("\\s\"", " ").replaceAll("\\s\"", " ").replaceAll("\",", ",").replaceAll("\"\\s", " ");
    But I want to optimize this line.
    Could anybody tell me how to call replaceAll once with using regular expression instead of calling replaceAll multiple times? Some table or column names can have double quotes, so I want to remove the double quotes without replacing the double quotes which is part of the table or column name.
    I'd appreciated it.

    caesarkim1 wrote:
    I am new to the regular expression. I am just trying to remove double quotes from a string.
    My sample string is sql statement(SELECT "SCHEMA"."TABLE1"."COLUMN1", "SCHEMA"."TABLE1"."COLUMN2" FROM "SCHEMA"."TABLE1", "SCHEMA"."TABLE2" WHERE "SCHEMA"."TABLE1"."COLUMN1" = "SCHEMA"."TABLE2"."COLUMN1" );
    This is what I came up with.
    String s = query.replaceAll("\\.\"", ".").replaceAll("\"\\.", ".").replaceAll("\\s\"", " ").replaceAll("\\s\"", " ").replaceAll("\",", ",").replaceAll("\"\\s", " ");
    ...Note that the following two lines are equivalent:
    s = s.replaceAll("a","").replaceAll("b","");
    s = s.replaceAll("a|b","");

  • Extended Replace with Regular Expressions

    I am trying to figure out how to user regular expressions in
    HomeSite 5.5 extended search and replace.
    I am trying to find all matches to: <span
    class="someClass">FooText</span>
    And replace with: <sometag>FooText</sometag>
    I have been able to get the find part working with reg ex:
    <span class=\"someClass\">*[A-Za-z0-9_
    ><.="']*</span>
    However, I have not been able to figure out what to put in
    the replace box. I have tried different combinations of syntax with
    "\1" for the first match of the reg ex in the search field, but
    nothing has worked.
    Can someone help me out with the syntax for what I should
    enter in the replace field?

    SQL>with t as(
      2   select 1 id,'a   aa ---- ss-ee -' str from dual union
      3   select 2, 'a        aa -------- ss-ee ---- - - -' str from dual union
      4   select 3, '-' str from dual union
      5   select 4, '----- -- -a -' str from dual union
      6   select 5, '-a-a-a-a----a-a-a-' str from dual
      7  )
      8  select id,
      9  str,
    10  trim(trim('-' from regexp_replace(str, '( |-){2,}', '\1'))) new_str,
    11  length(str) len,
    12  length(trim(trim('-' from regexp_replace(str, '( |-){2,}', '\1')))) new_len
    13  from t;
    ID STR                                 NEW_STR                      LEN          NEW_LEN
      1 a   aa ---- ss-ee -                 a aa ss-ee                    19               10
      2 a     aa -------- ss-ee ---- - - -  a aa ss-ee                    34               10
      3 -                                                                  1
      4 ----- -- -a -                       a                             13                1
      5 -a-a-a-a----a-a-a-                  a-a-a-a-a-a-a                 18               13

  • Regular expressions-how to replace [ and ] characters from a string

    Hi,
    my input String is "sdf938 [98033]". Now from this given string, i would like to replace the characters occurring within square brackets to empty string, including the square brackets too.
    my output String needs to be "sdf938" in this case.. How should I do it using regular expressions? I tried several possible combinations but didn't get the expected results.

    "\\s*\\[[^\\]]+\\]"

  • How can I remove all content between two tags using Find/Replace regular expressions?

    This one is driving me bonkers...  I'm relatively new to regular expressions, but I'm trying to get Dreamweaver to remove all content between two tags in an XML document.  For example, let's say I have the following XML:
    <custom>
    <![CDATA[<p>Some text</p>
    <p>Some more text</p>]]>
    </custom>
    I'd like to do a Find/Replace that produces:
    <custom>
    </custom>
    In essence, I'd like to strip all of the content between two tags.  Ideally, I'd like to know how to strip the CDATA content as well, to return the following:
    <custom>
    <![CDATA[]]>
    </custom>
    I'd much appreciate any suggestions on accomplishing this.
    Many thanks!

    Thanks much for your response.  I found David's article to be a little thin with respect to examples using quantifiers in coordination with the wildcard metacharacters; however, I was able to cobble together a working expression through trial and error using the information he presented.  For posterity, here’s the solution:
    Find:
    <custom>[\d\D]*?</custom>
    Replace:
    <custom>
    <![CDATA[]]>
    </custom>
    I believe this literally translates to:
    [] = find anything in this range/character class
    \d = find any digit character (i.e. any number)
    \D = find any non-digit character (i.e. anything except numbers)
    *? = match zero or more times, but as few times as possible (i.e. match multiple characters per instance, but only match one instance at a time, or none at all)
    I’m still not sure how to effectively utilize the . wildcard.  For example, the following expression will not find content that ends with a number:
    <custom>.*?[\D]*?</ custom >
    I'm presuming this is because numbers aren't included in the \D metacharacter; however, shouldn't numbers be picked up by the .*? expression?

  • Regular expression and pattern matching/replacing

    I have a list of key words. It has around 1000 key word now but can grow to 5000 keywords.
    My web application displays lot of texts which are stored in the database. My requirement is to scan each text for the occurance of any of the above keywords. If any keyword is present I have to replace that with some custom values, before showing it to the user.
    I was thinking of using using regular expression for replacing the keyword in the text using matcher.replaceAll method as follows:
    Pattern pattern = Pattern.compile(patternStr);
    Matcher matcher = pattern.matcher(inputStr);
    String output = matcher.replaceAll(replacementStr);
    But My pattern string will have around 5000 keywords with the 'OR' Logical Operator like- keyword1| keyword2 I keyword3 | ..........
    Will such a big pattern string adversly affect the performance? What can I do to speed up the performance? (Since my keyword list is not static i would prefer to do the replacement just before showing the text to the user)
    Any suggestions are most welcome.

    I don't think a pure regex approach would be that slow, but it would be a maintenance nightmare. I think a combined regex/table-lookup approach would be best: use a regex to identify potential keywords, then look them up in the table to confirm. For instance, to find all Java keywords you could use the regex "\\b[a-z]{2,12}+\\b" to filter out anything that can't possibility be a keyword.
    What are you going to replace the keywords with? Will it vary depending on which keyword is found? If so, you'll have to use a table--and you won't be able to use the replaceAll method, because it can't handle dynamically generated replacement values. You would have to use the lower-level appendReplacement and appendTail method instead.

  • Regular Expression replacement not working

    I am trying to use a regular expression to replace non-ascii characters on a file, and I'm afraid I've reached the end of my regex knowledge. 
    Here is the specific code
    'Set the Regular Expression paramaters
    Set RegEx = CreateObject("VBScript.Regexp")
    RegEx.Global = True
    RegEx.Pattern = "[^\u0000-\u007F]"
    RegEx.IgnoreCase = True
    'Replace the UTF-8 characters
    ReplacedText = RegEx.Replace(FileText, "\u0020")
    If I understand regular expressions correctly the pattern of "[^\u0000-\u007F]" should replace any character that is not an ascii character, and then replace it with a space (which I understand is "\u0020").  What am I doing wrong?

    Simply use
    ReplacedText = RegEx.Replace(FileText, " ")
    Regards, Hans Vogelaar (http://www.eileenslounge.com)

  • Using Regular Expressions to replace Quotes in Strings

    I am writing a program that generates Java files and there are Strings that are used that contain Quotes. I want to use regular expressions to replace " with \" when it is written to the file. The code I was trying to use was:
    String temp = "\"Hello\" i am a \"variable\"";
    temp = temp.replaceAll("\"","\\\\\"");
    however, this does not work and when i print out the code to the file the resulting code appears as:
    String someVar = ""Hello" i am a "variable"";
    and not as:
    String someVar = "\"Hello\" i am a \"variable\"";
    I am assumming my regular expression is wrong. If it is, could someone explain to me how to fix it so that it will work?
    Thanks in advance.

    Thanks, appearently I'm just doing something weird that I just need to look at a little bit harder.

  • Regular Expressions find and replace

    Hi ,
    I have a question on using Regular Expressions in Java(java.util.regex).
    Problem Description:
    I have a string (say for example strHTML) which contains the whole HTML code of a webpage. I want to be able to search for all the image source tags and check whether they are absolute urls to the image source(for eg. <img src="www.google.com/images/logo.gif" >) or relative(for eg. <img src="../images/logo.gif" >).
    If they are realtive urls to the image path, then I wish to replace them with their absolute urls throughout the webpage(in this case inside string strHTML).
    I have to do it inside a servlet and hence have to use java.
    I tried . This is the code. It doesn't match and replace and goes inside an infinite loop i.e probably the pattern matches everything.
    //Change all images to actual http addresses FOR example change src="../images/logo.gif" to src="http://www.google.com/../images/logo.gif"
              String ddurl="http://www.google.com/";
    String strHTML=" < img src=\"../images/logo.gif\" alt=\"Google logo\">";
    Pattern p = Pattern.compile ("(?i)src[\\s]*=[\\s]*[\"\']([./]*.*)[\"\']");
    Matcher m = p.matcher (strHTML);
    while(m.find())
    m.replaceAll(ddurl+m.group(1));
    what is wrong in this?
    Thanks,
    Rajiv

    Right, here's the full monte (whatever that means):import java.util.regex.*;
    public class Test1
      public static void main(String[] args)
        String domain = "http://www.google.com/";
        String strHTML =
          " < img src=\"images/logo.gif\" alt=\"Google logo\">\n" +
          " <img alt=\"Google logo\" src=images/logo.gif >\n" +
          " <IMG SRC=\"/images/logo.gif\" alt=\"Google logo\">\n" +
          " <img alt=\"Google logo\" src=../images/logo.gif>\n" +
          " <img src=http://www.yahoo.com/images/logo.gif alt=\"Yahoo logo\">";
        String regex =
          "(<\\s*img.+?src\\s*=\\s*)   # Capture preliminaries in $1.  \n" +
          "(?:                         # First look for URL in quotes. \n" +
          "   ([\"\'])                 #   Capture open quote in $2.   \n" +
          "   (?!http:)                #   If it isn't absolute...     \n" +
          "   /?(.+?)                  #    ...capture URL in $3       \n" +
          "   \\2                      #   Match the closing quote     \n" +
          " |                          # Look for non-quoted URL.      \n" +
          "   (?!http:)                #   If it isn't absolute...     \n" +
          "   /?([^\\s>]+)             #    ...capture URL in $4       \n" +
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.COMMENTS);
        Matcher m = p.matcher(strHTML);
        StringBuffer sbuf = new StringBuffer();
        while (m.find())
          String relURL = m.group(3) != null ? m.group(3) : m.group(4);
          m.appendReplacement(sbuf, "$1\"" + domain + relURL + "\"");
        m.appendTail(sbuf);
        System.out.println(sbuf.toString());
    }First off, observe that I'm using free-spacing (or "COMMENTS") mode to make the regex easier to read--all the whitespace and comments will be ignored by the Pattern compiler. I also used the CASE_INSENSITIVE flag instead of an embedded (?i), just to remove some clutter. By the way, your second (?i) was redundant; the first one would remain in effect until "turned off" with a (?-i). Another way to localize a flag's effect by using it within a non-capturing group, e.g., (?i:img).
    As jaylogan said, the best way to filter out absolute URL's is by using a negative lookahead, and that's what I've done here. The problem of optional quotes I addressed by trying to match first with quotes, then without. The all-in-one approach might work with URL's, since they can't (AFAIK) contain whitespace anyway, but the alternation method can be used to match any attribute/value pair. It's also, I feel, easier to understand and maintain. Unfortunately, it also means that you can't use replaceAll(), since you have to determine which alternative matched before doing the replacement, but the long version is still pretty simple (especially when you can just copy it from the javadoc for the appendReplacement() method, as I did).

  • [CC] Search&Replace: Bug only with Regular Expressions

    Can you confirm the following behaviour/bug?
    Steps to reproduce:
    1 Create a new document in DW CC
    2 Enter that text in the source code view:
    <p>&euro;</p>
    3 Open Search&Replace
    Field Search in: Current document
    Field Search: Text
    Field Search (3rd Field): €
    Start Search
    Result: As expected the "€" is found.
    4 [x] Regular Expressions
    Start Search
    Result: "€" isn't found
    Can you reproduce that?
    Do you regard that as a bug like me?
    If not, why please?
    Do you think it is technically impossible for the developers to solve the task?
    Do you think the developers forgot to gray out "text" in the choice box and allow regular expressions only in source code?
    Thanks.

    Nice that you like the nick
    Sure, I know that I can file a feature wish.
    But my main interest in this forum is to know, what other users think about certain features, why they think so, which they miss, which they regard as a bug, ...
    When I remember it right, you told me some time ago, that you don't use RegEx.
    Than I understand, that everything around that feature is not important for you.
    What I like to understand is, why you think the behaviour is logically.
    For me it is the opposite.
    You get a result with "Scope: Text", "[ ]RegEx".
    And you get no result with "Scope: Text", "[x]RegEx".
    Incredible strange.
    I would appreciate, if you could explain more detailed your view.

  • Back reference for regular expressions on "Search & Replace".

    Can't I reference a result from my Regular Expression used on "Search Clause" into my "Replace Clause"?
    What I'm doing is:
    Using JDev 10.1.3 DP,
    Search Menu -> Replace in Files...
    Select "Regular Expressions"
    Text: (A[0-9]*?_)(.)
    Replace with: $2 (reference to the second Parenthesis in the "Text" Field).
    For example, in Java, I would so something like this:
    "Dummy String".replaceAll("(D.*? )(S)", "$2");
    Are there any other ways that I can just Search something and Replace for nothing? I need a simple "Search and Delete", you know?
    If I don't do that and try to leave it blank, a warning comes up saying that the "Replace with" field cannot be blank.
    Any sugestions?

    BOUNCE!
    Please, anyone?
    Will I really have to delete 1200 occurrences by hand???
    :(

Maybe you are looking for