Quick regular expression question/help

Can someone help me with two regular expressions I need? I could spend a while trying to figure them out myself, but time is short and I would really like a foolproof, optimal solution (my own attempt would be buggy).
Sample sentence
The population, is projected to reach 200,000, or more (by 2020).[7] This is {dummy} text.
The first regular expression
I need all brackets and everything between them to be removed from a sentence.
Brackets such as: ( ), [ ] and { } .
I.e. Given the above sentence the following would be returned:
The population, is projected to reach 200,000, or more. This is text.
The second regular expression
If a word has a trailing comma character I need to add a whitespace between the word and the comma.
I.e. Given the sentence returned from the first regular expression, this regex would return:
The population *,* is projected to reach 200,000 *,* or more. This is text.
Many thanks to anyone who can help me with this!
Edited by: Myles on Jan 18, 2008 8:12 AM

http://java.sun.com/docs/books/tutorial/extra/regex/index.html
http://www.regular-expressions.info
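
A minimal sketch of both replacements in Java, for anyone who wants something to start from. The class and method names are made up for illustration. It assumes that brackets are never nested, that whitespace directly in front of a bracketed run should be removed with it, and that a "trailing comma" is a comma followed by whitespace or the end of the string (so the comma inside 200,000 is left alone).

```java
class BracketCleaner {
    // Removes (...), [...] and {...} plus any whitespace directly before them.
    // Assumption: brackets are not nested.
    static String removeBrackets(String s) {
        return s.replaceAll("\\s*(?:\\([^)]*\\)|\\[[^\\]]*\\]|\\{[^}]*\\})", "");
    }

    // Puts a space in front of a comma that ends a word, i.e. a comma that
    // follows a non-space character and precedes whitespace or end of input.
    static String spaceBeforeTrailingComma(String s) {
        return s.replaceAll("(?<=\\S),(?=\\s|$)", " ,");
    }

    public static void main(String[] args) {
        String sample = "The population, is projected to reach 200,000, "
                + "or more (by 2020).[7] This is {dummy} text.";
        String cleaned = removeBrackets(sample);
        System.out.println(cleaned);
        System.out.println(spaceBeforeTrailingComma(cleaned));
    }
}
```

If nested brackets can occur, a single regular expression of this kind is not enough and a small hand-written scanner is probably the safer route.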

Similar Messages

  • Quick regular expression question!

    If I have a String such as:
    "This is a sentence.[1] This is another."
    OR
    "This is a sentence.(1) This is another."
    I.e. A String has a full stop within it and a non alphanumeric character immediately after it (without whitespace between the full stop and the character)
    How can I insert a whitespace character between the full stop and the non alphanumeric character (such as a bracket in the above examples)?
    So the above Strings would be transformed into:
    "This is a sentence. [1] This is another."
    "This is a sentence. (1) This is another."
    Thanks

    If I understand what you're asking...
    str = str.replaceAll("\\.([^\\p{Alnum}\\s])", ". $1");
    "This is sentence.[1] This is another.(1) This is a third. [1] This is a fourth.& This is a fifth. This is the last."
    "This is sentence. [1] This is another. (1) This is a third. [1] This is a fourth. & This is a fifth. This is the last."
    For more info:
    http://java.sun.com/docs/books/tutorial/extra/regex/index.html
    http://www.regular-expressions.info/

  • Help: Regular Expression question??

    Hello,
    How can I extract the following content using Java Regular expression?
    <tr bgcolor="#333333">
         <td class="title" colspan="4" height="18"> <b>SUPER_1</b> - SUPER_2</td>
    </tr>
    <tr bgcolor="#333333">
         <td class="match-light" width="45" height="18"> </td>
         <td class="match-light" colspan="3" width="286" align="right">March 19 </td>
    </tr>
    <tr>
         <td colspan="4" height="1"></td>
    </tr>
    <tr bgcolor="#cfcfcf">
         <td width="45" height="18"> FT</td>
         <td width="118" align="right">SUPER_3</td>
         <td width="50" align="center"><a class="scorelink" target="details" onclick="showDetails();">999 - 888</a></td>
         <td width="118">SUPER_4</td>
    </tr>
    From the above content, how can I define a regular expression to extract "SUPER_1", "SUPER_2", "March 19", "SUPER_3", "999", "888" and "SUPER_4"?
    Please help.
    Best regards,
    Eric

    Kayaman wrote:
    Why not use a better way than regex, like an actual HTML parser (or XML if you have it well-formed)? People seem to love parsing (or rather, asking for help on how to parse) HTML with regex for some unknown reason.
    Indeed.
    Read this (hilarious):
    http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

  • Question about Regular Expressions, please help!

    I have created an app which reads files and extracts certain data using regular expressions in JDK1.4 using Pattern and Matcher classes.
    However, it needs to run on JDK1.2.2 (don't ask). The regular expression classes (Pattern and Matcher) are not available in 1.2.2, so I am looking for something similar that I can use.
    I need something that loops through all the matches found in the file like how Matcher works i.e.
    while (matcher.find())
    // do this
    Help!

    http://jakarta.apache.org/regexp/

  • Regular Expressions - Questions to SME

    Ralph Benzinger presented an online meetup on the topic of Regular Expressions.  The presentation (slides only) can be found here: http://www.sdn.sap.com/irj/servlet/prt/portal/prtroot/docs/library/uuid/866072ca-0b01-0010-54b1-9c02a45ba8aa
    Unfortunately the recording is not going to be available, but Ralph has been generous enough to agree to answer questions posted to this sticky thread.
    cheers,
    Marilyn

    Hello Peter,
    You're welcome!
    Alas, I was unable to locate the regex documentation on help.sap.com either.  In fact, I'm not even sure it has already been updated for 2004s.  I recommend that you use the online documentation within the system, e.g., from transactions SE38 or SE80.  Do an index search for "regex", and you'll be directed to REGEX, FIND and REGEX, REPLACE, both of which have extensive subsections on regexes.
    The class cx_sy_regex is an exception class that is thrown by FIND, REPLACE and cl_abap_regex in case of an invalid regex, such as ".\1" (there is no capture group that back reference \1 can refer to).  If the pattern is known statically, the syntax check will report this error, but for statements like "FIND REGEX pat IN text.", the actual pattern is only known at runtime.
    The cx_sy_matcher class (and its subclasses) similarly indicate some invalid states, for example trying to call "cl_abap_matcher->replace_found( )" when the matcher has no current match to replace (e.g., replace_found( ) called twice in a row).
    Please let me know if I can provide some additional information.
    Regards
    Ralph

  • Regular Expression Question

    Hi all,
    I am struggling with Java regular expressions, and I hope you can help me out. I want to use String.matches to detect any string of the form "xxxx.xxxx", where each xxxx consists only of English letters (upper or lower case). I will use this kind of expression to represent cross-join conditions in SQL statements in my Java class, such as "tableA.name = tableB.name", where the names are English letters only. I tried MyString.matches("^[A-Z] + \\. + ^[A-Z]") in my Java program, but it doesn't seem to work. Can you figure out the right expression for me? Many thanks.
    Transistor

    Thanks for your prompt response. I tried your code, but it doesn't work.
    I used your code as follows:
    if ( searchCriteria.getStringPair().getValue().trim().matches("[A-Za-z]+\\.[A-Za-z]+") ) {...some action }
    It seems the Java program never reaches this expression.
    Please note that I want to match anything like "xxxxx.xxxxx", where xxxxx can be a word.
    Many thanks
    Transistor
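
One possible explanation, offered as a guess since the thread does not show the actual input: String.matches only succeeds when the entire string fits the pattern. The suggested pattern therefore accepts "tableA.name" but rejects a full condition such as "tableA.name = tableB.name". The sketch below shows the difference; the class name, method names and the JOIN pattern are hypothetical and not taken from the thread.

```java
import java.util.regex.Pattern;

class QualifiedNameCheck {
    // One "letters.letters" reference such as tableA.name
    static final String NAME = "[A-Za-z]+\\.[A-Za-z]+";

    // Hypothetical pattern for a whole join condition: name = name,
    // with optional whitespace around the equals sign.
    static final Pattern JOIN = Pattern.compile(NAME + "\\s*=\\s*" + NAME);

    // True only if the ENTIRE string is a single qualified name.
    static boolean isQualifiedName(String s) {
        return s.matches(NAME);
    }

    // True only if the ENTIRE string is "name = name".
    static boolean isJoinCondition(String s) {
        return JOIN.matcher(s).matches();
    }

    // True if a qualified name occurs anywhere inside the string.
    static boolean containsQualifiedName(String s) {
        return Pattern.compile(NAME).matcher(s).find();
    }

    public static void main(String[] args) {
        String condition = "tableA.name = tableB.name";
        System.out.println(isQualifiedName("tableA.name"));
        System.out.println(isQualifiedName(condition));
        System.out.println(isJoinCondition(condition));
        System.out.println(containsQualifiedName(condition));
    }
}
```

Names containing digits or underscores (table1.name, for example) are rejected by all three methods, because the character class allows letters only.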

  • Regular Expressions, please help.

    Hello everyone.
    Can I get a Java Regular Expression to match with a word of the following language...
    Start --> Expression;
    Expression --> [0-9]+;
    Expression --> Expression * Expression;
    So the regexp should match with words like:
    4;
    4664;
    4 * 763;
    5 * 4534 * 23534;
    04 * 002 * 1 * 10 * ...
    I would be very happy, if anyone could help.

    I don't think I need to learn anything more.
    I am fairly sure that what I want is not possible.
    I want to build a compiler.
    I have just finished the abstract syntax of my language. Now I need a way to translate the concrete syntax of my language into the abstract one.
    But I think that is not possible with regular expressions,
    because I need to match a syntax of Chomsky type 2 (context-free),
    and I think regular expressions only match Chomsky type 3 (regular) languages.
    Perhaps the backtracking mechanism of Java regexes could do this,
    but I am not sure about that.
    If you have any ideas, please post.
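
For what it's worth, the three rules quoted above describe a regular language even though they are written as a context-free grammar: every word is one number, followed by zero or more "* number" pairs, followed by a semicolon. A minimal sketch, assuming whitespace around the asterisk is optional (the class and method names are made up):

```java
import java.util.regex.Pattern;

class ProductExpression {
    // digits, then zero or more "* digits" pairs, then the closing semicolon.
    // Assumption: whitespace around '*' is optional.
    static final Pattern WORD =
            Pattern.compile("[0-9]+(?:\\s*\\*\\s*[0-9]+)*;");

    // True if the whole input is a word of the language.
    static boolean isWord(String s) {
        return WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "4;", "4664;", "4 * 763;", "5 * 4534 * 23534;", "4 *;" };
        for (String sample : samples) {
            System.out.println(sample + " -> " + isWord(sample));
        }
    }
}
```

This only recognises the words; it does not build a syntax tree. Once the language gains arbitrarily nested constructs such as parentheses, it stops being regular, and a tokenizer plus a recursive-descent parser (or a parser generator) is the usual tool.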

  • Stripping HTML thru regular expression(pls help)

    Hi all,
    I've been trying to use the OROMatcher-1.1 regular expression package downloaded from apache.org.
    It works well with my program, but I'm having problems building a correct regular expression to strip off HTML tags.
    Can any of you help me build an expression that strips off ALL HTML tags, including those with funny spaces such as:
    <a href = "www.here.com">click me</a>
    Please help. I've tried for ages and it's driving me mad.

    Hi,
    I won't go into much detail, but the simplest way to do that would be to use XML technology. Try SAX or DOM, whichever you feel comfortable with. I think SAX would be the better choice. For details visit
    http://java.sun.com/xml/?frontpage-spotlight
    /khurram
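
If a regular expression is still wanted, the usual naive pattern is "a less-than sign, anything that is not a greater-than sign, a greater-than sign". The sketch below uses java.util.regex rather than OROMatcher, and the class name is made up. It assumes that no '>' appears inside attribute values, comments or scripts; real-world HTML breaks that assumption often enough that a parser remains the safer choice.

```java
class TagStripper {
    // Deletes everything that looks like <...>.
    // Assumption: '>' never occurs inside an attribute value, comment or script.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<a href = \"www.here.com\">click me</a>"));
    }
}
```

Extra spaces inside the tag do not matter here, because the pattern never looks at the tag's contents.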

  • Regular Expression query help.

    Hi, your help will be appreciated,
    I need to replace certain patterns in a string with special characters.
                            Input String := 'mytext*% align="quot;leftquot;><font face="quot;Arialquot;"> *% align="quot;leftquot;"><this is text><p this to replace >'
                            Output String := 'mytext@ align="quot;leftquot;$<font face="quot;Arialquot;"> @ align="quot;leftquot;"$<this is text><p this to replace >'
    Replacing Rules:
    1)              '*%'             should be replaced by '@'
    2)              '>'            should be replaced by '$' (only the first occurrence after each '@')
    I tried with REGEXP, but it looks like I need your help!
    Thx
    DJ.

    Hi, DJ,
    DeeJay wrote:
    Perfect Frank. Thanks for your help.
    Could you please explain how it is working? You know, these regexps are always a hurdle for me to understand.
    Not just you; regular expressions can be very cryptic.
    We're saying "replace '*%x>' with '@x$'", where x is 0 or more characters from the set of all characters except '>'.
    SELECT     REGEXP_REPLACE ( 'mytext*% align="quot;leftquot;> *% align="quot;leftquot;"><this is text>'
              , '\*'     || -- asterisk (special character, must be escaped)
              '%'     || -- percent sign
                   '('     || -- begin \1 definition
                   '['     || -- begin set definition
                   '^' || -- "The set consisting of all characters EXCEPT ...
                   '>' || --     ... the greater-than sign"
                   ']'     || -- end set definition
                   '*'     || -- 0 or more characters from the preceding set
                   ')'     || -- end \1 definition
                   '>'     -- greater-than sign
              , '@\1$'
              )     AS txt
    FROM     dual;

  • Regular Expression Question, Repetition Operators

    These are my success entries for a field;
    123456,
    123456,123456,
    123456,123456,123456,
    123456,123456,123456,123456,
    Groups of 6 digits, each followed by ",", can be repeated an unlimited number of times.
    I found this in the documentation: "Repetition Operators; {m,}     Match at least m times", and for my needs I tried this regular expression: "^[[[:digit:]]{6},]{1,}$", but it didn't work :(
    Any comments?
    Thank you very much :)
    Tonguc

    repeating exactly 6
    {6}
    repeating at least 1
    +
    repeating at least 6
    {6,}
    OK, your problem is that you used [ instead of (

  • Regular Expression question I think

    My application is receiving HTML as a string and I'm trying to simplify it before displaying. The string could contain one or more substrings similar to this:
    <span style="cursor:pointer" onmouseout="hideTooltip()" onmouseover="createTooltip( this,'The quartile that your firm\'s value falls into.  Each quartile contains 25% of the values in the Peer Group. The 1st Quartile is always the best.  The 4th Quartile is always the worst.', ( findPosX( this ) - 150))">Quartile</span>
    The text will not be consistent, and it may be in the string as many as 4 times. What I'd like to do is use replaceAll so that I end up with
    <span>Quartile</span>
    Is there a way to do this with a regular expression? I've tried replaceAll("<span .*>","<span>"), but that takes out everything to the end of the String. I want it to stop at the first >.
    Any way?

    replaceAll("<span .*?>","<span>")
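
The only change is the question mark: .* is greedy and runs to the last > it can find, while .*? is reluctant and stops at the first one. A small sketch comparing the two, plus a negated-class variant that behaves the same way here; the class name, method names and sample string are made up for illustration.

```java
class SpanSimplifier {
    // Greedy: .* grabs as much as possible, so the match ends at the LAST '>'.
    static String greedy(String html) {
        return html.replaceAll("<span .*>", "<span>");
    }

    // Reluctant: .*? grabs as little as possible, so the match ends at the FIRST '>'.
    static String reluctant(String html) {
        return html.replaceAll("<span .*?>", "<span>");
    }

    // Alternative: [^>]* cannot run past a '>' at all.
    static String negatedClass(String html) {
        return html.replaceAll("<span [^>]*>", "<span>");
    }

    public static void main(String[] args) {
        String html = "<span style=\"cursor:pointer\" onmouseout=\"hideTooltip()\">Quartile</span>"
                + " <span class=\"x\">Rank</span>";
        System.out.println(greedy(html));
        System.out.println(reluctant(html));
        System.out.println(negatedClass(html));
    }
}
```

All three variants assume the attribute values themselves contain no '>' character, which holds for the tooltip example in the question.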

  • Regular expression question (should be an easy one...)

    I'm using Java to build a parser. I'm getting an expression, which I split on whitespace.
    How can I build a regular expression that will let me split only on unquoted spaces? Example:
    for the expression:
    (X=33 AND Y=44) OR (Z="hello world" AND T=2)
    I will get the following values split:
    (X=33
    AND
    Y=44)
    OR
    (Z="hello world"
    AND
    T=2)
    and not:
    (Z="
    hello
    world"
    thank you very much!

    Instead of splitting on whitespace to get a list of tokens, use Matcher.find() to match the tokens themselves:
    import java.util.*;
    import java.util.regex.*;
    public class Test
    {
      public static void main(String[] args) throws Exception
      {
        String str = "(X=33 AND Y=44) OR (Z=\"hello world\" AND T=2)";
        List<String> tokens = new ArrayList<String>();
        // A token is a run of non-space, non-quote characters,
        // optionally followed by one quoted run (which may contain spaces).
        Matcher m = Pattern.compile("[^\\s\"]+(?:\".*?\")?").matcher(str);
        while (m.find())
          tokens.add(m.group());
        System.out.println(tokens);
      }
    }
    The regex I used is based on the assumptions that there will be at most one run of quoted text per token, that it will always appear on the right-hand side of an expression, and that the closing quote will always mark the end of the token.  If the rules are more complicated (as sabre150 suggested), a more complicated regex will be needed.  You might be better off doing the parsing the old-fashioned way, without regexes.

  • Regular Expressions Question

    I am using regular expressions to parse a text file generated by a set of sensors attached to a SeaBird CTD profiler. The problem is that some lines use quotes as a special marker and some don't, but I need to treat both as the same data. Here is an example:
    SeaBird Datafile (just some part):
    "# name 0 = depS: depth, salt water [m]"
    "# name 1 = t068: temperature, IPTS-68 [deg C]"
    # name 2 = sal00: salinity, PSS-78 [PSU]
    "# name 3 = sigma-t00: density, sigma-t [kg/m^3]"
    "# name 4 = flS: fluorometer, sea tech"
    And I am using this regular expression to read these lines and save those names:
    private static final String ColumnName = "(^\"#|^# ) name (\\d)+ = ((.+)\"$|(.+)$)";
    There is a possibility that the string either starts with "# and ends with ", or that it only starts with # and has no special marker at the end. Can anyone enlighten me on the correct regex? The one I posted above only works when the line is surrounded by quotes. Thanks in advance!
    Christian A. Sueiras

    Try this:
    private static final String ColumnName = "^(\"?)# name (\\d+) = (.+)\\1$";
    You match an optional quotation mark at the beginning, and capture it in group #1.  At the end, you match whatever was captured in group #1: either a quotation mark, or nothing.
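
A short sketch of how that pattern could be applied with Pattern and Matcher. The class name, the parse method and its return shape are made up for illustration; only the regex itself comes from the reply above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ColumnNameParser {
    // Group 1: optional opening quote, group 2: column index,
    // group 3: description, \1: the same quote again (or nothing).
    static final Pattern COLUMN_NAME =
            Pattern.compile("^(\"?)# name (\\d+) = (.+)\\1$");

    // Returns {index, description}, or null if the line is not a name line.
    static String[] parse(String line) {
        Matcher m = COLUMN_NAME.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] lines = {
            "\"# name 0 = depS: depth, salt water [m]\"",
            "# name 2 = sal00: salinity, PSS-78 [PSU]"
        };
        for (String line : lines) {
            String[] parts = parse(line);
            System.out.println(parts[0] + " | " + parts[1]);
        }
    }
}
```

Because the closing quote is consumed by the backreference, group 3 holds the description without the surrounding quotes in both cases.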

  • A regular expression question

    I have a large body of text which I am breaking into individual words (as part of an experimental indexing project.)
    I can break the text into a list by making a paragraph break for every word space (a simple find and replace).
    But I want proper names to remain unbroken.
    So I am trying to write a regular expression script which will find every occurrence of two contiguous words that each begin with a capital letter, and then replace the space between the two words with an underscore.
    So Sigmund Freud becomes Sigmund_Freud.
    Does anyone know how I would write this script?
    Thanks!!!

    You don't need a script, you can do it in the interface:
    Find: (\u[-\w]+)\x{20}(?=\u[-\w]+)
    Change: $1_
    \u[-\w]+ stands for "upper-case letter followed by one or more of hyphen/word character"; here the first name.
    \x{20} stands for the space.
    (?=\u[-\w]+) is another \u[-\w]+, the last name. This one is in a lookahead, so the whole expression paraphrases as "find a word that starts with an upper-case letter followed by a space, if it's followed by another word starting with an upper-case letter".
    Peter
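
For comparison, here is roughly the same Find/Change pair written for java.util.regex. This is a translation under assumptions, not part of the original answer: \p{Lu} stands in for the upper-case class written \u above, a literal space replaces \x{20}, and the class and method names are made up.

```java
class ProperNameJoiner {
    // Capitalised word + space, but only when another capitalised word follows.
    // The lookahead leaves the second word unconsumed, so chains of three or
    // more names are joined as well.
    static String joinNames(String text) {
        return text.replaceAll("(\\p{Lu}[-\\w]+) (?=\\p{Lu}[-\\w]+)", "$1_");
    }

    public static void main(String[] args) {
        System.out.println(joinNames("He read Sigmund Freud daily."));
        System.out.println(joinNames("John Stuart Mill wrote it."));
    }
}
```

Either version also joins a capitalised sentence opener to a name that directly follows it ("When Sigmund" becomes "When_Sigmund"), so the result may need a manual check.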

  • Validation on text item as Regular Expression Question(Repetition Operator)

    These are my success entries for a field;
    123456,
    123456,123456,
    123456,123456,123456,
    123456,123456,123456,123456,
    Groups of 6 digits, each followed by ",", can be repeated an unlimited number of times.
    I found this in the documentation: "Repetition Operators; {m,}     Match at least m times", and for my needs I tried this regular expression: "^[[[:digit:]]{6},]{1,}$", but it didn't work :(
    Any comments?
    Thank you very much :)
    Tonguc

    Your expression is a little incorrect because you can't use repetition operators within a bracket expression. How about something like this
    "^([[:digit:]]{6},)+$"
    Regards,
    Peter
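
The same check sketched in Java, in case it helps to try the idea outside the form. This is a translation, not the original poster's environment: java.util.regex writes the POSIX digit class as \p{Digit} or \d rather than [[:digit:]], and the class and method names are made up.

```java
import java.util.regex.Pattern;

class SixDigitList {
    // One or more groups of exactly six digits, each followed by a comma.
    // \d{6} stands in for the POSIX-style [[:digit:]]{6}.
    static final Pattern LIST = Pattern.compile("^(\\d{6},)+$");

    static boolean isValid(String s) {
        return LIST.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "123456,", "123456,123456,", "123456", "12345," };
        for (String sample : samples) {
            System.out.println(sample + " -> " + isValid(sample));
        }
    }
}
```

The parentheses make "six digits plus comma" a single unit, and the + then repeats that whole unit, which is what the square brackets in the original attempt could not do.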
