RegEx replacement

I'm trying to make a small parsing engine that does the following:
1. reads an HTML document
2. searches for special tags of the form [key]
3. looks up the key in a hashmap or something
4. replaces [key] with value from the hashmap
I'm storing the HTML document line by line as strings in an ArrayList (any better way to do this?). I iterate through each line, replacing each tag with its corresponding value. The thing is, however, that I'd like to use RegEx as much as possible.
The java.lang.String.matches(String regex) method tells me, through a boolean return value, whether the current line actually contains any tag at all, while the java.lang.String.replaceFirst(String regex, String replacement) method solves the replacement part. What the String class methods don't provide is a method for returning the actual substring that was found during the java.lang.String.matches(String regex) call. Is there any such method out there? Right now I'm doing this manually, using the String class's indexOf(...) and lastIndexOf(...) methods.
In short, what I'm looking for is a method something like String find(String text, String regex) that would return "salary" when called as find("<h1>[salary]</h1>", ".*\\[.*\\].*")
Thanks for your help!

You must use Patterns and Matchers (java.util.regex.*).
> In short, what I'm looking for is a method something like String find(String text, String regex) that would return "salary" when called as find("<h1>[salary]</h1>", ".*\\[.*\\].*")
Let String html be your input.
Pattern p = Pattern.compile("(?im)<h1>(.*?)</h1>");
Matcher m = p.matcher(html);
if( m.find() ){
// the whole matching substring is in m.group();
// the string within the h1's is in m.group(1)
}
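The same Pattern/Matcher approach can be pointed at the [key] tags from the question. The following is only a sketch: the class name TagReplacer, the method names, and the sample map contents are assumptions made for illustration, and unknown keys are simply left in place.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagReplacer {
    // Matches a tag of the form [key]; group(1) captures the key without the brackets.
    private static final Pattern TAG = Pattern.compile("\\[([^\\]]+)\\]");

    /** Returns the first key found in text, or null if the text contains no tag. */
    static String find(String text) {
        Matcher m = TAG.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    /** Replaces every [key] whose key is in values; unknown tags are left untouched. */
    static String replaceTags(String text, Map<String, String> values) {
        Matcher m = TAG.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps '$' and '\' in the value from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("salary", "$50,000"); // hypothetical sample data
        System.out.println(find("<h1>[salary]</h1>"));
        System.out.println(replaceTags("<h1>[salary]</h1> [unknown]", values));
    }
}
```

Because the matcher walks the input once, this also works on the whole document as a single String, so splitting the HTML into lines is optional.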

Similar Messages

  • Variable in regex replace pattern

    Hi,
    I need to use a variable in a regex replace pattern - how can I do it? Specifically, I need to pass arguments to a shell script that then uses that argument in a replace pattern:
    #!/bin/bash
    #$1 now holds the argument
    sed 's/searchpattern/replace_pattern_with_variable$1/g' file1 > file2
    when I run this, the replace pattern uses $1 as a literal string "$1" and not the variable value. How can I change this?
    Thanks!
    Ingo

    Hi Ingo,
       As Vid points out, the issue is that single quotes protect strings from shell interpretation. You need to have the dollar sign, '$', visible to the shell or it won't read what follows as a variable name. Using double quotes works because the shell "reads through" those.
       However, complex regular expressions can contain lots of characters that the shell interprets. These can be quoted individually by backslashes but the use of backslashes in regular expressions is complex enough without the addition of shell interpretation. I find it easiest to keep the single quotes and only expose the part of the string that the shell needs to interpret.
       The shell doesn't have a special string concatenation character. All you have to do is to put the strings beside each other with nothing in between and the shell will concatenate them. Therefore it's possible to write your example as:
    sed 's/searchpattern/replace_pattern_with_variable'${1}'/g' file1 > file2
    That is, one closes the single quote right before the variable and then resumes it immediately afterward. The shell will put these quoted strings together with the contents of the variable just as it would with double quotes but you still enjoy the protection of single quotes around the rest of the string!
    Gary

  • Efficient regex replace function? Help!

    Here's what I'm trying to do:
    Scan through a string, and replace any occurrences of http://...... (until the next space) with an HTML link. Does anyone know what would be the most efficient regex replace function to do this?
    Thank you very much in advance.

    class StringReplaceDemo {
         public static void main (String args[]) {
              String mystr = "This is hyperlink to Google http:// . From there you can search for any thing in the world";
              String google = "http://www.google.com";
              String hyperlink = "http://";
              // locate the "http://" placeholder and splice the full URL in its place
              int index = mystr.indexOf(hyperlink);
              mystr = mystr.substring(0, index) + google + mystr.substring(index + hyperlink.length());
              System.out.println(mystr);
         }
    }
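The reply above splices one fixed URL with indexOf. For what the question actually asks (every http://... run up to the next space becomes a link), a regex replacement is shorter. A minimal sketch, where the class name Linkifier and the exact link markup are assumptions:

```java
class Linkifier {
    /** Wraps every "http://..." run (up to the next whitespace) in an anchor tag. */
    static String linkify(String text) {
        // (http://\S+) captures the URL; $1 in the replacement reinserts the captured text
        return text.replaceAll("(http://\\S+)", "<a href=\"$1\">$1</a>");
    }

    public static void main(String[] args) {
        System.out.println(linkify("Search at http://www.google.com for anything"));
    }
}
```

Note that \S+ also swallows punctuation that directly follows a URL, such as a trailing full stop, so real input may need a stricter character class.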

  • Applescript Regex Replace Usage

    I am using Applescript to do regex replace for a pattern of type APP[0-9][0-9][a-z][a-z] and display it as a hyperlink.
    Eg: APP23cc to APP23cc
           APP36ij to APP36ij
    I am getting the body of the email as a string.
    How could I do this?

    I'm not exactly clear on what you're trying to do (display the hyperlink where and how? what are you trying to link to?), but I can say that AppleScript does not do regexp natively. You have two choices:
    download the Satimage osax so that you can do regexp
    use a simpler search paradigm
    A simpler search would require a repeat loop, like so:
    repeat with w in words of body
              considering case
                        if w begins with "APP" then
      -- do whatever you're trying to do here
                        end if
              end considering
    end repeat
    you can make that IF as detailed as you need it to be to specify the things you want to work with.

  • Understanding Regex replace method call involving delegate

    Hello,
    I am trying to understand the $regex.replace static method call below (I came across this code snippet in the cookbook).
    $replacer = {
    param($match)
    $chars = $match.Groups[0].Value.ToCharArray()
    [Array]::Reverse($chars)
    $chars -join ''
    }
    $regex = [Regex] "\w+"
    $regex.Replace("Hello World wide", $replacer)
    What I do not understand is that the overloaded definitions for the Replace method below do not seem to match the Replace call above. So how exactly is this working? The call above passes 2 parameters, whereas none of the overloads below takes fewer than 3 parameters.
    PS C:\WINDOWS> [regex]::replace
    OverloadDefinitions
    static string Replace(string input, string pattern, string replacement)
    static string Replace(string input, string pattern, string replacement, System.Text.RegularExpressions.RegexOptions options)
    static string Replace(string input, string pattern, string replacement, System.Text.RegularExpressions.RegexOptions options, timespan matchTimeout)
    static string Replace(string input, string pattern, System.Text.RegularExpressions.MatchEvaluator evaluator)
    static string Replace(string input, string pattern, System.Text.RegularExpressions.MatchEvaluator evaluator, System.Text.RegularExpressions.RegexOptions options)
    static string Replace(string input, string pattern, System.Text.RegularExpressions.MatchEvaluator evaluator, System.Text.RegularExpressions.RegexOptions options, timespan matchTimeout)

    What you are looking at are the static methods ([regex]::) and their parameters, which in this case need a minimum of 3 parameters to perform the Replace using the input, pattern and replacement value. If you were to use the constructor of [regex] to create a pattern like this:
    $Regex = [regex]'\w'
    You will see that the Replace method here allows for only 2 parameters because you have already satisfied the pattern when you created the Regex object.
    $Regex.Replace
    OverloadDefinitions
    string Replace(string input, string replacement)
    string Replace(string input, string replacement, int count)
    string Replace(string input, string replacement, int count, int startat)
    string Replace(string input, System.Text.RegularExpressions.MatchEvaluator evaluator)
    string Replace(string input, System.Text.RegularExpressions.MatchEvaluator evaluator, int count)
    string Replace(string input, System.Text.RegularExpressions.MatchEvaluator evaluator, int count, int startat)
    Boe Prox

  • Using a local variable in regex portion of replaceAll(regex, replacement)

    While this works..
    output = output.replaceAll("(HED>|AUT>)(.*)(</\\1)", "$1<![CDATA[$2]]>$3");
    I'd like the list of alternation values to be contained in a variable, for example:
    String nodeList = "HED>|AUT>";
    output = output.replaceAll("(nodeList)(.*)(</\\1)", "$1<![CDATA[$2]]>$3");
    The extension of this would be so I can store this stuff in a db as a list and avoid compilation on change, but please don't let this muddy the waters... :)
    Any pointers are much appreciated. Links to specific reading material, etc. I've scoured Friedl's Mastering Regular Expressions to no avail. This approach is supported by some other regex engines I've used (perl, php, ORO?) but I'm new to Java.
    TIA,
    Mark

    > I've scoured Friedl's Mastering Regular Expressions to no avail.
    Did you look on page 209? In the book, that code sample is labelled "Building Up a Regex Through Variables in Java". That should have been a clue. ^_^
    But seriously, you're probably thinking of the interpolated strings you find in scripting languages like Perl, PHP, Ruby, etc. But that's a feature of the language itself, not the regex engine, and Java doesn't work that way. (The $1, $2, etc., in the replacement string are processed by the Matcher class, in a very limited imitation of Perl's variable interpolation.)
    However, you can fake it pretty well with String's format() method:
      String regex = String.format("(%s)(.*)(</\\1)", theAlternation);
      output = output.replaceAll(regex, "$1<![CDATA[$2]]>$3");
    That way, you can easily escape the dynamic part, in case it might contain regex metacharacters:
      String regex = String.format("(%s)(.*)(</\\1)", Pattern.quote(theAlternation));
    Note that Pattern.quote() also quotes any | in the string, so this form only suits a single literal value; for a list of alternatives, quote each one separately and join them with |.
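If the node names come from a list (for example, rows read from a database, as the question hints), each name can be quoted on its own before the alternation is assembled. This is a sketch only; the class name CdataWrapper and its method names are assumptions, and the greedy (.*) is kept from the original pattern.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class CdataWrapper {
    /** Builds "(name1|name2|...)(.*)(</\1)" with every name quoted as a literal. */
    static String buildRegex(List<String> nodeNames) {
        String alternation = nodeNames.stream()
                .map(Pattern::quote)               // escape metacharacters per name
                .collect(Collectors.joining("|")); // the | itself stays a regex operator
        return "(" + alternation + ")(.*)(</\\1)";
    }

    /** Wraps the content of each listed element in a CDATA section. */
    static String wrap(String input, List<String> nodeNames) {
        return input.replaceAll(buildRegex(nodeNames), "$1<![CDATA[$2]]>$3");
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("HED>", "AUT>");
        System.out.println(wrap("<HED>Title</HED>", names));
    }
}
```

Since (.*) is greedy and does not cross line breaks by default, this behaves best when each element sits on its own line, as in the original replaceAll call.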

  • Regex replace ^

    Hi, I'm trying to find a way of removing some text from a string where there is a ^ followed by either a number or a character, so I want to remove ^6 or ^h or whatever. I'm trying the following:
      var colorPattern:RegExp = /^[A-Za-z0-9]/g;
      match.replace(colorPattern,"");
    I have tested the regex in a tester, where it seemed to work, but this just returns match with the original text. This is my first time using regex, so any help would be appreciated.
    Cheers,
    Peter

    OK, figured it out. Here it is if anyone else needs to know
    <h1>(.+)</h1>
    <div id="background">
    replace with
    <div id="background">
    <h1>$1</h1>

  • Regex replacing multiple characters in string.

    I have been working through the Java regex tutorial and tried to modify one of the programs for my own use. Basically, I want to take a string and convert the characters A to T, T to A, C to G and G to C.
    I produced the rather crude program below, but of course it doesn't work. A could be converted to T and back again before the program terminates.
    I know that the code to do this correctly is probably quite complex, so could anyone point me in the direction of a tutorial which will help me to do this?
    This aside, I take it that if I am looking for multiple matches of characters which won't give the problem already indicated above, my code is too bloated anyway. Say, for example, instead of wanting to replace A to T, T to A, C to G and G-C, I wanted to replace dog-cat, horse-donkey - lion, tiger , cat-mouse. My code will work for this, but I am sure that it could be compressed a lot. Surely I would not need all the lines of code to do this?
    Thanks for any help,
    Tim
    import java.util.regex.Pattern;
    import java.util.regex.Matcher;
    import java.io.*;  // needed for BufferedReader, InputStreamReader, etc.
        /** A Java program that demonstrates console based input and output. */
         class dna {
            // Create a single shared BufferedReader for keyboard input
            private static BufferedReader stdin =
                new BufferedReader( new InputStreamReader( System.in ) );
            // Program execution starts here
            public static void main ( String [] args ) throws IOException
            {
                // Prompt the user
                System.out.print( "Type your DNA sequence: " );
                // Read a line of text from the user.
                String DNA = stdin.readLine();
                DNA = DNA.toUpperCase();
                String DNA2 = DNA;
                //calculate reverse complement
                Pattern A = Pattern.compile("A");
                Pattern T = Pattern.compile("T");
                Pattern C = Pattern.compile("C");
                Pattern G = Pattern.compile("G");
                Matcher AA = A.matcher(DNA);
                DNA = AA.replaceAll("T");
                Matcher TT = T.matcher(DNA);
                DNA = TT.replaceAll("A");
                Matcher CC = C.matcher(DNA);
                DNA = CC.replaceAll("G");
                Matcher GG = G.matcher(DNA);
                DNA = GG.replaceAll("C");
                // Display the input back to the user.
                System.out.println( "DNA input             : " + DNA2);
                System.out.println ("Complementary sequence: " + DNA);
            }
        }

    TimM wrote:
    > Thanks a lot!!! Can't believe you managed all that with so few lines of code.
    You're welcome.
    > Must be great to know what you are doing :-)
    > Thanks again,
    > Tim
    After being a bit more familiarised with the methods of String, you'll be able to do this in no time, I'm sure!
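The code that Tim is thanking the reply for is not preserved in this thread. One common way to avoid the "A becomes T and then A again" problem is to visit each character exactly once, so no base is ever converted twice. A sketch under that assumption, with the class name DnaComplement made up for illustration:

```java
class DnaComplement {
    /** Returns the complement of a DNA string in a single pass over its characters. */
    static String complement(String dna) {
        StringBuilder out = new StringBuilder(dna.length());
        for (char base : dna.toUpperCase().toCharArray()) {
            switch (base) {
                case 'A': out.append('T'); break;
                case 'T': out.append('A'); break;
                case 'C': out.append('G'); break;
                case 'G': out.append('C'); break;
                default:  out.append(base); // leave unexpected characters unchanged
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(complement("AATTCG"));
    }
}
```

For the reverse complement mentioned in the original comment, calling reverse() on the StringBuilder before toString() would be one option.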

  • Need help with Regex replace in DW

    I'm converting a document to HTML that came with mixed case image filenames.  Our standards say that all filenames must be lowercase, so I used a CF script to convert them, but the src links in the HTML doc tag still shows the uppercase name.
    I planned to do something like this:
    Find:
    <img src="(.+?)" alt="(.+?)" />
    Replace:
    <img src="$1" alt="$2" />
    but I can't seem to get it to lowercase the $1 argument.  I've tried \L and all the normal things, but I just get the literal replacement.  Any tips?
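Case-conversion escapes such as \L are not available in every regex flavour. Outside the editor, a small program can apply the lowercasing while it builds each replacement. The following Java sketch assumes the attribute layout shown in the Find pattern above; the class name SrcLowercaser is hypothetical.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SrcLowercaser {
    // group(1) = opening of the tag up to the quote, group(2) = the src value, group(3) = closing quote
    private static final Pattern IMG_SRC = Pattern.compile("(<img src=\")(.+?)(\")");

    /** Lowercases the src value of each img tag and leaves everything else as it is. */
    static String lowercaseSrc(String html) {
        Matcher m = IMG_SRC.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(1) + m.group(2).toLowerCase(Locale.ROOT) + m.group(3);
            // quoteReplacement protects any '$' or '\' in the file name
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(lowercaseSrc("<img src=\"Images/LOGO.GIF\" alt=\"Logo\" />"));
    }
}
```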

    Use Regular Expressions
    "Big_Slick" wrote in message:
    > I need to know how to Find/Replace for more than one tag at a time
    > (i.e. search for <table>, <tr>, and <td> or and and remove the code). As it is right
    > now, it will only search for one tag at a time, but I would assume DW can handle
    > this kind of thing.
    >
    > Thanks in advance.

  • Regex replacement syntax in editor

    Hello all,
    for the life of me I can't remember (nor find in the documentation) the syntax for doing string replacements using regular expressions in the CVI editor.
    Something like:
    Search: ({[~,]*},
    Replace: (\1, A10,
    Would change the string "(aei1," to "(aei1, A10,"
    But what are the delimiters to use ? {} like in my example ? And what's the variable ? \1 ? $1 ?
    I just can't find it.
    In sed syntax:
    sed -e "s/(\([^,]*\),/(\1, A10,/"

    No, it's not in there. Here's the relevant doc from sed for what I want:
           s/regexp/replacement/
                  Attempt to match regexp against the pattern space.  If successful, replace that portion  matched  with  replacement.   The replacement  may contain the special character & to refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
    I'm sure I've used it with the CVI editor before.

  • Replacing multiple regexes at once

    Hi everybody,
    I'm trying to set up a regex replace that functions like this:
    String1 replaces String2
    String3 replaces String4
    and I'm curious, is there a way to set up a single call, like this:
    replaceAll( "[String1|String3]", "[String2|String4]" )
    where the one call can know which one to replace?
    I know it's possible to do:
    replaceAll( string1, string2 )
    replaceAll( string3, string4 )
    but I'm pretty sure I've seen this somewhere. Can anyone give me a hand with the syntax, if it's viable?
    Thanks,
    Jezzica85

    jezzica85 wrote:
    > Thanks guys,
    > I guess that means I'll have to try the old standby of one at a time. Hmm, I wonder where I thought I saw that? Maybe I dreamed it... :)
    > Jezzica85
    You could do something like this:
    public static String multipleReplacements(String text,
            String[] patterns, String[] replacements) {
        // assume patterns.length == replacements.length
        for(int i = 0; i < patterns.length; i++) {
            // apply the i-th pattern together with its matching replacement
            text = text.replaceAll(patterns[i], replacements[i]);
        }
        return text;
    }

  • Search and replace all spaces between quotes with uderscore

    Hello,
    I'm new on Powershell and I'm trying to make the script that:
    searches over file and replaces all the spaces which have been found between quotes;
    removes all quotes (except these which has not value eg "").
    For example:
    Source file:
    string3=string4 string="string1 string2 string23" string8="" string5="string7 string8"
    Destination file:
    string3=string4 string=string1_string2_string23 string8="" string5=string7_string8
    I have created a script that finds the data correctly:
    $file="c:\scripts\mk.txt"
    $data=Get-Content $file
    $regex = [regex]@'
    (?x) # ignore pattern whitespace option
    (?<test>(["'])(?:(?=(\\?))\2.)*?\1)
    '@
    $data |% {
    if ($_ -match $regex){
    new-object psobject -property @{
    test = $matches['test']
    }#when adding "|select-object test" I'm getting the correct data
    }
    }
    I have stopped here on search / replace operation (from space to underscore, and by removing the quotes leaving the empty quotes non touched). Can you help me to finish the script?
    Thanks!

    Try this.  It uses the [Regex] Replace static method, with a script block delegate:
    $file="c:\scripts\mk.txt"
    $data=Get-Content $file
    $regex = '(\S+=[^"\s]+)|(\S+="[^"]+")'
    $delegate = { $args[0].value.replace(' ','_') -replace '"(.+)"','$1' }
    $data |% { [regex]::Replace($_,$regex,$delegate) }

  • Search and replace, with pattern matching using a table

    I need to inspect a data stream and standardise a set of codes. I need to
    1. Match any patterns with a dash character and remove the dash and any following characters, eg BN-S -> BN, BN-SH -> BN, ARG-22 -> ARG, etc.
    2. Make a few specific word for word replacements, eg, PAEDSH -> PAED
    This is easy to hard code but can it be done using a table of regex substitutions? Can anyone give a pointer or link to some example code? The couple of regex replacement examples I've found use regex to locate but hard code substitutions.
    Thanks

    You could store all your patterns in a Map. Then iterate over the map inserting the patterns into a regex.
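A sketch of that table-driven idea: each rule pairs a compiled pattern with its replacement, and the rules are applied in insertion order. The two rules mirror the examples in the question; the class name CodeNormalizer and the hard-coded table are assumptions, and in practice the table could be loaded from a file or database.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class CodeNormalizer {
    // LinkedHashMap keeps the rules in the order they were added
    private static final Map<Pattern, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put(Pattern.compile("-.*$"), "");         // drop a dash and everything after it
        RULES.put(Pattern.compile("^PAEDSH$"), "PAED"); // word-for-word replacement
    }

    /** Applies every rule in turn to a single code. */
    static String normalize(String code) {
        String result = code;
        for (Map.Entry<Pattern, String> rule : RULES.entrySet()) {
            result = rule.getKey().matcher(result).replaceAll(rule.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String code : new String[] {"BN-S", "BN-SH", "ARG-22", "PAEDSH"}) {
            System.out.println(code + " -> " + normalize(code));
        }
    }
}
```

Because the rules run one after another, their order matters whenever the output of one rule could match a later pattern.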

  • Regex with xml for italicize or node creation

    Okay
    Guess it's a complex situation to explain.
    I am working on the text content of xml documents again. made quite a lot of progress with some of my other regex requirements.
    I am looking for a specific set of words to italicize say for example 'In Vitro'
    String Regex = "In Vitro";
    // here I get the text of a particular xml Node which is a text node
    String paragraph = nl.item(i).getNodeValue();
    //Value of paragraph before replace is "and lipids and In Vitro poorlysoluble(in water"
    String replace = "<Italic>In Vitro<Italic/>";
    Matcher m = Pattern.compile(Regex).matcher(paragraph);
    String paragRepl = m.replaceFirst(replace);
    //Value of paragRepl after regex replace is "and lipids,?;:!and <Italic>In Vitro<Italic/> poorlysoluble(in water"
    //then I update the content of the node again
    nl.item(i).setNodeValue(paragRepl);
    // save the xml document
    The italic tag is interpreted by our custom stylesheet to display "In Vitro" in italics. The reason it cannot do that is that the character entities for < and > have been put in the text content of the node, i.e. &lt; and &gt;. On closer examination of the text of the node after the document was saved, it appeared this way: " &lt;Italic>In Vitro&lt;Italic/> ". For some reason the greater-than sign came out okay, but that is no help: it didn't actually create a new node. I am not sure how you can automatically put tags around specific text you find in XML documents using regex, or if I have to create a new node at that point.
    It's XML, so these entities come into the picture.
    Any help is greatly appreciated. In short, I just need to add a set of tags around a particular regex match I find in an XML document.
    Thanks in advance
    Jeevan

    okay i am getting closer to the solution as there is an api call from another proprietary language that would do this
    but as I loop through the xml document, it keep selecting the text "In Vitro" even after it has been italicized.
    So I guess my next challenge is getting a regex which looks for "In Vitro" but not italicized
    For regex so far I have seen case insensitive handling, I have seen for italics
    basically if I can get my hands on a regex, for example
    String regex = "In Vitro && Not Italic"
    any help is appreciated
    Jeevan
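A regex has no notion of "italic", but when the text is handled as a plain string, a negative lookbehind can skip occurrences that already follow an opening tag. This sketch only covers that selection step and assumes the markup is literally <Italic>...</Italic>; the class name Italicizer is hypothetical. As the first post observed, markup written through setNodeValue is escaped, so real elements would still have to be created through the DOM or the proprietary API mentioned above.

```java
class Italicizer {
    // (?<!<Italic>) is a negative lookbehind: the match fails when "In Vitro"
    // directly follows an opening <Italic> tag
    private static final String NOT_YET_ITALIC = "(?<!<Italic>)In Vitro";

    /** Wraps occurrences of "In Vitro" that are not already wrapped. */
    static String italicize(String text) {
        return text.replaceAll(NOT_YET_ITALIC, "<Italic>In Vitro</Italic>");
    }

    public static void main(String[] args) {
        String once = italicize("and lipids and In Vitro poorlysoluble(in water");
        String twice = italicize(once); // a second pass finds nothing left to wrap
        System.out.println(once);
        System.out.println(once.equals(twice));
    }
}
```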

  • Replace specific string in a file while reading it

    Hi,
    I am kind of an in between learner & intermediate programmer. I am trying to replace a specific string pattern in a file by reading it using BufferedReader & BufferedWriter. I tried it for 2 days. Can anyone give me a piece of code which reads a file line by line and replaces the specific string format with other? I tried replaceAll(regex,replacement). Nothing is working.

    Hi Torajirou,
    Thanks for the code. But it looks like it is giving me a static method problem. It says a static method should be accessed in a static way (f1.replace in main). Here is the code. Can you help me with this?
    import java.io.*;

    public class FileReadWrite1 {
        public static void replace(File file, String regex, String replacement)
                throws FileNotFoundException, IOException {
            if (file == null) {
                throw new IllegalArgumentException("File should not be null.");
            }
            if (!file.exists()) {
                throw new FileNotFoundException("File does not exist: " + file);
            }
            if (!file.isFile()) {
                throw new IllegalArgumentException("Should not be a directory: " + file);
            }
            if (!file.canWrite()) {
                throw new IllegalArgumentException("File cannot be written: " + file);
            }
            File tempFile = File.createTempFile("temp", "temp");
            BufferedReader reader = new BufferedReader(new FileReader(file));
            BufferedWriter writer = new BufferedWriter(new FileWriter(tempFile));
            try {
                while (true) {
                    String line = reader.readLine();
                    if (line == null) {
                        break;
                    }
                    // apply the regex replacement to each line before writing it out
                    line = line.replaceAll(regex, replacement);
                    writer.write(line);
                    writer.newLine();
                }
                writer.close();
                reader.close();
                file.delete();
                tempFile.renameTo(file);
            } catch (Exception e) {
                System.out.println("Exception occurred: " + e);
            }
        }

        public static void main(String[] args) throws IOException {
            FileReadWrite1 f1 = new FileReadWrite1();
            File file = new File("C:\\Temp\\src1.txt");
            String regex = "us.mi.state";
            String replacement = "us.tx.state";
            // this call triggers the warning; FileReadWrite1.replace(...) is the static way
            f1.replace(file, regex, replacement);
        }
    }

    when I wrote it, I was sober and bored
    now I'm drunk and amused
    god
    did you try and remove the magic "static" keyword ?
    I wrote it static because I didn't refer to any class member when I wrote it and that's what I felt compelled to do, though I feel nowadays writing anything static suggests some design mistake somewhere
    blahblahblahblahblahs/blahblahblahblahblah/blahblahblahblahblahblah
    (forgot a "blah")
