Regular expression (regex) help!

I am trying to write a correct regular expression but am having difficulties.
I have a webpage saved as a string and want to extract all the links (urls) from the webpage string.
The trouble I am having is that some websites surround links using double quotes " " and some use single quotes ' ' around links in html:
Double quotes around url:
<a href="www.example.com"></a>
And single quotes:
<a href='www.example.com'></a>
So far I have a regex which extracts links if they are surrounded by double quotes (see below); however, if a page uses single quotes it screws up ;)
Pattern.compile("<a\\s+href\\s*=\\s*\"?(.*?)[\"|>]",  Pattern.CASE_INSENSITIVE);
So is there a way to say look for double quotes OR single quotes?
Many thanks

There's no need to escape the single quote (or apostrophe) in a regex. The only reason it was necessary to escape the double quote (or quotation mark) is that the regex was written in the form of a String literal. Neither the single quote nor the double quote has any special meaning in regexes.
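A minimal sketch of one way to accept either quote style, assuming the links always look like the two examples in the question. The class and method names (HrefExtractor, extractLinks) are made up for illustration. The opening quote is captured in group 1 and the backreference \1 requires the same character to close the value; unquoted attribute values are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefExtractor {
    // Group 1 captures the opening quote (" or '); \1 demands the same quote at the end.
    // Only the double quote is escaped, and only because this is a Java String literal.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s+href\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    // Returns the quoted href values in the order they appear in the page.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2)); // group 2 is the URL between the quotes
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"www.example.com\"></a> <a href='www.example.org'></a>";
        System.out.println(extractLinks(html)); // prints [www.example.com, www.example.org]
    }
}
```

For anything beyond simple pages, an HTML parser is likely to be more robust than a regex.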

Similar Messages

  • Quick regular expression question/help

    Can someone help me with two regular expressions I need? I could spend a while trying to figure it out myself, but time is short and I would really like a foolproof, optimal solution (my attempt would be buggy).
    Sample sentence
    The population, is projected to reach 200,000, or more (by 2020).[7] This is {dummy} text.
    The first regular expression
    I need all brackets and every thing between them to be removed from a sentence.
    Brackets such as: ( ), [ ] and { } .
    I.e. Given the above sentence the following would be returned:
    The population, is projected to reach 200,000, or more. This is text.
    The second regular expression
    If a word has a trailing comma character I need to add a whitespace between the word and the comma.
    I.e. Given the sentence returned from the first regular expression, this regex would return:
    The population *,* is projected to reach 200,000 *,* or more. This is text.
    Many thanks to anyone who can help me with this!
    Edited by: Myles on Jan 18, 2008 8:12 AM

    http://java.sun.com/docs/books/tutorial/extra/regex/index.html
    http://www.regular-expressions.info
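A rough Java sketch of the two replacements asked for above, under the assumption that brackets are never nested and that a "trailing comma" means a comma followed by whitespace or the end of the string. The class and method names are invented for illustration.

```java
class SentenceCleaner {
    // Removes (...), [...] and {...} groups, plus any whitespace directly before them.
    // Nested brackets are NOT handled by this pattern.
    static String removeBrackets(String sentence) {
        return sentence.replaceAll("\\s*(?:\\([^)]*\\)|\\[[^\\]]*\\]|\\{[^}]*\\})", "");
    }

    // Inserts a space before a comma that ends a word. The lookahead (?=\s|$) skips
    // commas inside numbers such as 200,000 because a digit follows them.
    static String spaceTrailingCommas(String sentence) {
        return sentence.replaceAll("(?<=\\S),(?=\\s|$)", " ,");
    }

    public static void main(String[] args) {
        String s = "The population, is projected to reach 200,000, or more (by 2020).[7] This is {dummy} text.";
        String cleaned = removeBrackets(s);
        System.out.println(cleaned);
        System.out.println(spaceTrailingCommas(cleaned));
    }
}
```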

  • Question about Regular Expressions, please help!

    I have created an app which reads files and extracts certain data using regular expressions in JDK 1.4, using the Pattern and Matcher classes.
    However, it needs to run on JDK 1.2.2 (don't ask). The regular expression classes (Pattern and Matcher) are not available in 1.2.2, so I am looking for something similar that I can use.
    I need something that loops through all the matches found in the file like how Matcher works i.e.
    while (matcher.find())
    // do this
    Help!

    http://jakarta.apache.org/regexp/

  • Regular Expressions, please help.

    Hello everyone.
    Can I get a Java Regular Expression to match with a word of the following language...
    Start --> Expression;
    Expression --> [0-9]+;
    Expression --> Expression * Expression;
    So the regexp should match with words like:
    4;
    4664;
    4 * 763;
    5 * 4534 * 23534;
    04 * 002 * 1 * 10 * ...
    I would be very happy, if anyone could help.

    I don't think that I need to learn anything more.
    I am fairly sure that what I want is not possible.
    I want to build a compiler.
    I have just finished the abstract syntax of my language. Now I need a way to translate the concrete syntax of my language into the abstract one.
    But I don't think that is possible with regular expressions,
    because I need to be able to match a Chomsky type 2 (context-free) syntax.
    I think regular expressions only match Chomsky type 3 (regular) languages.
    But perhaps the backtracking mechanism of Java regexps could do this.
    I am not sure about this.
    If you have any ideas, please post.
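For what it's worth, the three productions as posted (with no brackets) only generate digit runs separated by `*` and ending in `;`, and that particular set of words is regular. A small sketch, with an invented class name and with optional whitespace between tokens as an assumption taken from the examples:

```java
import java.util.regex.Pattern;

class ExpressionWords {
    // digits, then zero or more "* digits" repetitions, then the terminating semicolon
    private static final Pattern WORD = Pattern.compile(
            "\\s*\\d+(?:\\s*\\*\\s*\\d+)*\\s*;");

    static boolean isWord(String input) {
        return WORD.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isWord("5 * 4534 * 23534;")); // true
        System.out.println(isWord("4 * ;"));             // false
    }
}
```

Once the grammar gains arbitrarily nested constructs such as parentheses, a regex like this no longer fits and a real parser is the usual tool.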

  • Stripping HTML thru regular expression(pls help)

    Hi all..
    I've been trying to use the OROMatcher-1.1 regular expression package downloaded from apache.org.
    It works well with my program, but I'm having problems building a correct regular expression to strip out HTML tags.
    Can any of you help me build an expression that strips out ALL HTML tags, including those with funny spaces such as:
    <a href = "www.here.com">click me</a>
    Do help, please. I've tried for ages and it's driving me mad.

    Hi,
    I won't go into much detail, but the simplest way to do that would be to use XML technology. Try using SAX or DOM, whichever you feel comfortable with. I think SAX would be the better choice. For details visit
    http://java.sun.com/xml/?frontpage-spotlight
    /khurram

  • Regular Expression query help.

    Hi, your help will be appreciated,
    I need to replace a pattern in a string with some special characters.
                            Input String := 'mytext*% align="quot;leftquot;><font face="quot;Arialquot;"> *% align="quot;leftquot;"><this is text><p this to replace >'
                            Output String := 'mytext@ align="quot;leftquot;$<font face="quot;Arialquot;"> @ align="quot;leftquot;"$<this is text><p this to replace >'
    Replacing Rules:
    1) '*%' should be replaced by '@'
    2) '>' should be replaced by '$' (only the first occurrence after each '@')
    I tried with REGEXP but it looks like I need your help!
    Thx
    DJ.

    Hi, DJ,
    DeeJay wrote:
    Perfect Frank. Thanks for your help.
    Could you please explain how it is working? You know, these regexps are always a hurdle for me to understand.
    Not just you; regular expressions can be very cryptic.
    We're saying "replace '*%x>' with '@x$'", where x is 0 or more characters from the set of all characters except '>'.
    {code}
    SELECT  REGEXP_REPLACE ( 'mytext*% align="quot;leftquot;> *% align="quot;leftquot;"><this is text>'
                           , '\*' ||  -- asterisk (special character, must be escaped)
                             '%'  ||  -- percent sign
                             '('  ||  -- begin \1 definition
                             '['  ||  -- begin set definition
                             '^'  ||  -- "The set consisting of all characters EXCEPT ...
                             '>'  ||  --     ... the greater-than sign"
                             ']'  ||  -- end set definition
                             '*'  ||  -- 0 or more characters from the preceding set
                             ')'  ||  -- end \1 definition
                             '>'      -- greater-than sign
                           , '@\1$'
                           )  AS txt
    FROM    dual;
    {code}
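The same pattern carries over to other regex flavours. As a sketch only (the thread itself is about Oracle SQL, and the class and method names here are invented), a Java version could look like this. Note that `$` has to be escaped in a Java replacement string, where `$1` refers to the first group.

```java
class MarkerReplacer {
    // \*%      literal "*%"
    // ([^>]*)  group 1: everything up to, but not including, the next '>'
    // >        the first '>' after the marker
    static String replaceMarkers(String input) {
        // In the replacement, $1 is group 1 and \$ is a literal dollar sign.
        return input.replaceAll("\\*%([^>]*)>", "@$1\\$");
    }

    public static void main(String[] args) {
        System.out.println(replaceMarkers("mytext*% align=left><font> *% x>y>"));
    }
}
```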

  • Java Regular Expression Need Help

    I want a regular expression that accepts all numbers, but skips any numbers that appear inside {}.

    No this is not working
    Then you need to be MUCH clearer as to exactly what you are trying to achieve...
    We aren't mind readers... try posting the string you are parsing and the exact result that you want to get

  • Keystroke regular expression (regex) problem

    I have a text field that should only permit the following values: "V", "X" or "NA" (all uppercase).
    I'm fairly OK with JavaScript in PDF and I have a script that validates these when the field loses focus.
    What would be much better is to have a keystroke script that only permits these values to be entered. I've found elsewhere in the forums a useful starting point here:
    var re = /^[A-Za-z0-9 :\\_]$/;
    event.change = event.change.toUpperCase();
    if (event.change.length > 0) {
        if (event.willCommit == false) {
            if (!re.test(event.change)) {
                event.rc = false;
            }
        }
    }
    but I've been unable to come up with anything for the regexp pattern that works reliably (these are definitely not my strong point).
    Grateful thanks for any help.

    You can apply the regex to the keystroke and restrict the keystroke values that can be entered, or you can use the validation tab to enter a script that restricts the input to specific values.
    The script you are using only checks the entered character so it will not test the string "NA" which is 2 characters.
    Another approach for such a limited number of input options would be a combo box  or drop down box where the options are space, "V", "X", and "NA". The value of the field would then be limited to a space (no selection made), a V, X, or NA.
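To make the single-character limitation concrete: a check that runs while the user is still typing has to accept every prefix of a valid value ("N" on the way to "NA"), while the check at commit time can demand an exact value. The sketch below shows just those two patterns in Java, purely as an illustration; it is not Acrobat script, and the class and method names are invented.

```java
import java.util.regex.Pattern;

class FieldValuePatterns {
    // Accepts "", "V", "X", "N" and "NA": every prefix of an allowed value.
    private static final Pattern WHILE_TYPING = Pattern.compile("(?:V|X|NA?)?");
    // Accepts only the finished values.
    private static final Pattern ON_COMMIT = Pattern.compile("V|X|NA");

    static boolean okWhileTyping(String textSoFar) {
        return WHILE_TYPING.matcher(textSoFar).matches();
    }

    static boolean okOnCommit(String finalText) {
        return ON_COMMIT.matcher(finalText).matches();
    }

    public static void main(String[] args) {
        System.out.println(okWhileTyping("N")); // true: could still become "NA"
        System.out.println(okOnCommit("N"));    // false: not a complete value
    }
}
```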

  • Regular expression to substring

    Hi Folks;
    I need to dynamically extract substrings from an attribute A.
    The varchar2 attribute A is defined like this: "LXXXXX/111111(+),LXXXXX/111111(-),LXXXXXX/111111,etc..." Always the same series.
    I need to store all the "111111(+)", "111111(-)" and "111111" values of the same record in a new attribute named B.
    I feel that regular expressions could help me, but I'm not very good with them...
    Thanks for your help. ^^

    Try this,
    SELECT LTRIM (REGEXP_SUBSTR (attrA,
                                 '/[^,]+',
                                 1,
                                 LEVEL),'/')
      FROM T
    CONNECT BY LEVEL <= LENGTH (REGEXP_REPLACE ( attrA, '[^/]'))
    Example
    SQL> WITH T AS (SELECT 'LXXXXX/111111(+),LXXXXX/111111(-),LXXXXXX/111111,' attrA FROM DUAL)
      2  SELECT LTRIM (REGEXP_SUBSTR (attrA,
      3                               '/[^,]+',
      4                               1,
      5                               LEVEL),'/') exprssn
      6    FROM T
      7  CONNECT BY LEVEL <= LENGTH (REGEXP_REPLACE ( attrA, '[^/]'));
    EXPRSSN
    111111(+)
    111111(-)
    111111
    SQL> G.
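For comparison only, the same extraction (everything after a '/' up to the next comma) can be sketched outside the database. This Java version is an illustration with invented names, not part of the Oracle answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SlashFieldExtractor {
    // Group 1 is the run of non-comma characters that follows each '/'.
    private static final Pattern FIELD = Pattern.compile("/([^,]+)");

    static List<String> extract(String attrA) {
        List<String> values = new ArrayList<>();
        Matcher m = FIELD.matcher(attrA);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        // prints [111111(+), 111111(-), 111111]
        System.out.println(extract("LXXXXX/111111(+),LXXXXX/111111(-),LXXXXXX/111111,"));
    }
}
```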

  • Regular expression usage question

    Hi there.
    I have a 200-byte EBCDIC variable record which I need to break down into fields. The fields are positional and are either text, binary numbers, packed decimal, or 64-byte-long numbers.
    My question is: can regular expressions handle this complex data?
    I want to convert each field into its corresponding format: EBCDIC into ASCII text, binary into a Java Integer, and so on.
    The reason for using regular expressions is that the record format could change, and a regular expression would be easier to modify without having to change the code.
    Your words of advice are highly appreciated.
    Please advise.
    Regards,
    Ulises

    Regular expressions? I don't think so.
    If you have a situation where positions 1-3 might be a binary number like client number, and the format might change so it moves to positions 12-14, then you could certainly write a record-format class to encapsulate that sort of information. In fact that would be a very good idea. But I can't imagine how a regular expression would help in getting a number out of three bytes, for example.
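A minimal sketch of what such a record-format class might look like, assuming big-endian unsigned binary fields only. Text and packed-decimal fields are left out, and every name here (RecordFormat, binaryField, "clientNumber") is hypothetical. The point is that moving a field from positions 1-3 to positions 12-14 changes one line of layout description and none of the parsing logic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RecordFormat {
    // One positional field: a name, a zero-based byte offset and a length in bytes.
    private static final class Field {
        final String name;
        final int offset;
        final int length;

        Field(String name, int offset, int length) {
            this.name = name;
            this.offset = offset;
            this.length = length;
        }
    }

    private final List<Field> fields = new ArrayList<>();

    // Declares a big-endian unsigned binary field (assumed to be at most 7 bytes long).
    RecordFormat binaryField(String name, int offset, int length) {
        fields.add(new Field(name, offset, length));
        return this;
    }

    // Reads every declared field out of the raw record bytes.
    Map<String, Long> parse(byte[] record) {
        Map<String, Long> values = new LinkedHashMap<>();
        for (Field f : fields) {
            long value = 0;
            for (int i = 0; i < f.length; i++) {
                value = (value << 8) | (record[f.offset + i] & 0xFF);
            }
            values.put(f.name, value);
        }
        return values;
    }
}
```

A layout would then be declared with something like `new RecordFormat().binaryField("clientNumber", 0, 3)`, and a changed layout with offset 11 instead of 0.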

  • Need help with regular expression

    I'm trying to use the java.util.regex package to extract URLs from html files.
    The URLs that I am interested in extracting from the HTML look like the following:
    <font color="#008000">http://forum.java.sun.com -
    So, the URL is always preceded by:
    <font color="#008000">
    and then followed by a space character and then a hyphen character. I want to be able to put all these URLs in a Vector object. This doesn't seem like it should be too difficult but for some reason I can't get anywhere with it. Any help would be greatly appreciated. Thanks!

    Hi gupta, I'm not sure of the Java syntax, but I can tell you about the regular expression. Try this:
    <font color="#008000">(http:\/\/[a-zA-Z0-9.]+) [-]
    I don't know the Java methods to call, just the regexp.
    Sanjay Acharya
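On the Java side, a sketch along these lines should work with java.util.regex, collecting into a List rather than a Vector. The class and method names are invented, and the character class is taken from the reply as is, so URLs containing a path, hyphens or a port would need a wider class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreenUrlExtractor {
    // The URL (group 1) sits between the green font tag and the " -" that follows it.
    // '/' needs no escaping in a Java regex.
    private static final Pattern GREEN_URL = Pattern.compile(
            "<font color=\"#008000\">(http://[a-zA-Z0-9.]+) -");

    static List<String> extract(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = GREEN_URL.matcher(html);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<font color=\"#008000\">http://forum.java.sun.com - 12k";
        System.out.println(extract(html)); // prints [http://forum.java.sun.com]
    }
}
```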

  • Help regarding regular expression

    HI All ,
    Please see the following string
    String s = "IF ((NOT NUM4 IS ALPHABETIC ) AND NUM3 IS ALPHABETIC-UPPER AND (NUM5 IS GREATER OR EQUAL TO 3) AND (NUM5 IS NOT GREATER THAN 3) AND (NUM3 GREATER THAN 46) AND (NUM5 GREATER THAN NUM3) OR NUM3 LESS THAN 78) .";
    My problem is: I want to capture the parts of this line that contain ALPHABETIC or ALPHABETIC-UPPER, for example "NOT NUM4 IS ALPHABETIC" and "NUM3 IS ALPHABETIC-UPPER". From those phrases only, wherever they occur in the whole string, I have to capture the variable names (here NUM4 and NUM3). Can anyone help me out by suggesting something?

    I suspect you're right, Sabre, but I can't resist...
    import java.util.regex.*;

    /**
     * A rewriter does a global substitution in the strings passed to its
     * 'rewrite' method. It uses the pattern supplied to its constructor, and is
     * like 'String.replaceAll' except for the fact that its replacement strings
     * are generated by invoking a method you write, rather than from another
     * string. This class is supposed to be equivalent to Ruby's 'gsub' when given
     * a block. This is the nicest syntax I've managed to come up with in Java so
     * far. It's not too bad, and might actually be preferable if you want to do
     * the same rewriting to a number of strings in the same method or class. See
     * the example 'main' for a sample of how to use this class.
     *
     * @author Elliott Hughes
     */
    public abstract class Rewriter {
      private Pattern pattern;
      private Matcher matcher;

      /**
       * Constructs a rewriter using the given regular expression; the syntax is
       * the same as for 'Pattern.compile'.
       */
      public Rewriter(String regularExpression) {
        this.pattern = Pattern.compile(regularExpression);
      }

      /**
       * Returns the input subsequence captured by the given group during the
       * previous match operation.
       */
      public String group(int i) {
        return matcher.group(i);
      }

      /**
       * Overridden to compute a replacement for each match. Use the method
       * 'group' to access the captured groups.
       */
      public abstract String replacement();

      /**
       * Returns the result of rewriting 'original' by invoking the method
       * 'replacement' for each match of the regular expression supplied to the
       * constructor.
       */
      public String rewrite(CharSequence original) {
        this.matcher = pattern.matcher(original);
        StringBuffer result = new StringBuffer(original.length());
        while (matcher.find()) {
          // Copy the text before the match, then append the computed replacement
          // directly so that '$' and '\' in it are not interpreted.
          matcher.appendReplacement(result, "");
          result.append(replacement());
        }
        matcher.appendTail(result);
        return result.toString();
      }

      public static void main(String[] args) {
        String s = "IF ((NOT NUM4 IS ALPHABETIC ) " +
                    "AND NUM3 IS ALPHABETIC-UPPER " +
                    "AND (NUM5 IS GREATER  OR EQUAL TO 3) " +
                    "AND (NUM5 IS NOT GREATER THAN 3) " +
                    "AND (NUM3 GREATER THAN 46) " +
                    "AND NUM645 IS ALPHABETIC " +
                    "AND (NUM5 GREATER THAN NUM3) " +
                    "OR NUM3 LESS THAN 78 " +
                    "AND NUM34 IS ALPHABETIC-UPPER " +
                    "AND NUM92 IS ALPHABETIC-LOWER " +
                    "AND NUM0987 IS ALPHABETIC-LOWER) .";
        String result =
          new Rewriter("(NUM\\d+) +IS +(ALPHABETIC(?:-(?:UPPER|LOWER))?)") {
            public String replacement() {
              String type = group(2);
              if (type.endsWith("UPPER"))
                return "Character.isUpper(" + group(1) + ")";
              else if (type.endsWith("LOWER"))
                return "Character.isLower(" + group(1) + ")";
              else
                return "Character.isLetter(" + group(1) + ")";
            }
          }.rewrite(s);
        System.out.println(result);
      }
    }

  • Help!!!!! Regular Expressions!!

    I am trying to use regular expressions for parsing. The package required for that is
    java.util.regex.*;
    I am also using the import statement in a sample code. But compiling it, gives an error,
    ERRORS:
    Replacement.java:6: package java.util.regex does not exist
    import java.util.regex.*;
    ^
    I have also set the path to C:\jdk1.4\bin
    I have also set the classpath to C:\jdk1.4\lib
    I don't know, Why it doesn't recognise the java.util.regex package
    please help!!
    gaurav_k1

    Have you checked if the regex package is part of the
    JDK1.4? I can't find it. What classes does it
    implement?
    Yeah, since 1.4
    http://java.sun.com/j2se/1.4/docs/api/java/util/regex/package-summary.html
    I'm not sure what the original problem could be, possibly using a previously installed JRE? If you had one previously installed, check the classpaths and uninstall any old JRE (some forget that, thinking they only need to remove the JDK). Could you give us any more hints?

  • Need help in unix regular expressions

    Hi All,
    I'm new to shell scripting. Please help me achieve this.
    I am trying to find a regular expression that picks up a file whose name begins with the format below; the mask variable is set in an XML file.
    currently the script accepts:
    mask="CLIENT_ID+'_ADHSUITE_IN_'+date2str(now,'MMddyy','US/Eastern')+'.txt'"
    But it should accept in the below format
    2595_ADHSUITE_IN_ANNWEL_030309_2009-02-10_15-12-46-000_648.TXT715.outpgp_out
    where CLIENT_ID=2595. How do I place the wildcard character '*' in the mask below so that it accepts files in the above format? Here are the changes I made:
    mask="CLIENT_ID+'_ADHSUITE_IN_'*+date2str(now,'MMddyy','US/Eastern')*+'.TXT'*+'.outpgp_out'"
    Please help.
    Thanks

    I believe your statement is being passed over twice:
    First pass (this is done by something like JavaScript):
    CLIENT_ID+'_ADHSUITE_IN_'+'.*'+date2str(now,'MMddyy','US/Eastern')+'.*'+'.TXT'+'.*'+'.outpgp_out'
    In this pass the variables and functions between the quoted literals are evaluated:
    (1) CLIENT_ID is replaced by 2595, or whatever its current value is.
    (2) date2str(now,'MMddyy','US/Eastern') is replaced by 040609 (if the current date is 6 April 2009).
    So at the end of this first pass we have the string:
    2595_ADHSUITE_IN_.*040609.*.TXT.*.outpgp_out
    This string at the end of the first pass is a POSIX basic regular expression (ref: [http://en.wikipedia.org/wiki/Regular_expression], accessed at time of post).
    This is the string I put in the Regular Expression text box on [http://www.fileformat.info/tool/regex.htm]
    and it matches "2595_ADHSUITE_IN_ANNWEL_040609_2009-01-27_17-02-28-000_631.TXT715.outpgp_out" for me (though I prefer my egrep test).
    I hope this is somewhat clearer. Remember that I have very little information about your system/application and I am making big guesses.
    NB: I should have thanked Frits earlier for pointing out my sloppiness in mixing up wildcards (e.g. Unix shell filename expansion) and regular expressions.
    For the second pass this used to compared other strings to see
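The first-pass string above can also be checked quickly in Java. This is only a sketch with invented names, and it assumes the application's regex flavour treats `.` and `*` the way java.util.regex does. The dots before TXT and outpgp_out are left unescaped as in the post, so they match any character.

```java
import java.util.regex.Pattern;

class MaskCheck {
    // The string produced by the first pass, used unchanged as a regex.
    static final String MASK = "2595_ADHSUITE_IN_.*040609.*.TXT.*.outpgp_out";

    // True when the whole file name matches the mask.
    static boolean accepts(String fileName) {
        return Pattern.matches(MASK, fileName);
    }

    public static void main(String[] args) {
        System.out.println(accepts(
                "2595_ADHSUITE_IN_ANNWEL_040609_2009-01-27_17-02-28-000_631.TXT715.outpgp_out")); // true
    }
}
```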

  • Help needed regarding regular expressions

    hello
    I need to write a program that receives a mathematical expression and evaluates
    it... in other words, a calculator :)
    I know I need to use regular expressions in order to determine whether the input is legal or not, but I'm really having trouble setting up the pattern.
    The expression can be in the form: Axxze2223+log(5)+(2*3)*(5+4)
    where Axxze2223 is a variable (i.e. a combination of letters and numbers),
    whereas: l o g (5) or log() or Axxx33aaaa or () are illegal.
    I tried to set up the pattern, but I got exceptions or it just didn't work the way I wanted it to.
    Here's what I tried to do, at least for the variable form:
    "\\s*(*([a-zA-Z]+\\d)+)*\\s*";
    I'm really new to this... and I can't seem to set up the pattern using regular expressions. How can I combine all the rules into one string?
    Any help or references would be appreciated.
    thanks

    so i'll explain
    let's say i got token "abc22c"(let's call it "token")
    i want to check if it's legal
    i define:
    String varPattern = "\\s*[a-zA-Z]+\\d+\\s*";
    If you want to check for one or more ASCII letters followed by one or more digits, the whole possibly surrounded by whitespace -- yes.
    >
    now i want to check if it's o.k
    so i check:
    token.matches(varPattern);
    am i correct?
    Quite. It's better to compile the Pattern (Pattern.compile(String)), create a java.util.regex.Matcher (Pattern#matcher(CharSequence)), and test the Matcher with Matcher#matches().
    (Class.method -> static method, Class#method -> instance method)
    >
    now i'm having problem defining pattern for log()
    sin() cos()
    that brackets are mandatory ,and there must be an
    expression inside
    how do i do that?
    First, I'd check the overall function syntax (a valid name, brackets), then whether what's inside the brackets is a valid expression (maybe empty), then whether that expression is valid for that function (presumably always?).
    I might add I'm no expert on parsing, so that's more a supposition than a guide.
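The compile-once approach suggested above might be sketched like this, with invented class and method names. Note that with this pattern the sample token "abc22c" is rejected, because letters may not follow the digits, which is in line with the original post listing Axxx33aaaa as illegal.

```java
import java.util.regex.Pattern;

class VariableCheck {
    // Compiled once and reused: one or more letters, then one or more digits,
    // optionally surrounded by whitespace.
    private static final Pattern VARIABLE = Pattern.compile("\\s*[a-zA-Z]+\\d+\\s*");

    static boolean isVariable(String token) {
        // matches() requires the whole token to fit the pattern
        return VARIABLE.matcher(token).matches();
    }

    public static void main(String[] args) {
        System.out.println(isVariable("Axxze2223")); // true
        System.out.println(isVariable("abc22c"));    // false
    }
}
```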
