Question on regex

Hi,
I am currently trying to compile a regular expression that can ignore everything within < >.
From:-
<p>123</p><a>456
To:-
123456
The objective is to remove all the HTML tags. Can anybody shed some light on a regular expression that would handle this?
Thanks.
Joseph

Note: my previous (naive) method only works if there are numbers between the tags. Check this page for details on regex patterns:
http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html
If you're interested in parsing real HTML files, I suggest using an HTML parser:
http://java-source.net/open-source/html-parsers
Good luck.
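
For the simple case in the question, a character class that stops at the next > is usually enough. Below is a minimal sketch, assuming the input never has a > inside an attribute value, comment, or script. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class TagStripper {
    // "<", then any run of characters that are not ">", then ">"
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<p>123</p><a>456")); // prints 123456
    }
}
```

For anything beyond snippets like this one, the parser suggestion above is the safer route.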

Similar Messages

  • Ternary Operator Question (with regex).

    Hi, I was learning regex in java and wrote a program which tests a string to see if it is an email address.
    import javax.swing.*;

    public class IsEmailAddress {
        public static void main(String[] args) {
            String address = JOptionPane.showInputDialog("Enter an email address");
            if (address.matches("[a-zA-Z0-9\\.]+@\\w+\\.{1}\\w+"))
                JOptionPane.showMessageDialog(null, "It is an email address");
            else
                JOptionPane.showMessageDialog(null, "It is not an email address");
        }
    }
    The above program works correctly. But then I decided to try and make the program only one line inside the main:
    import javax.swing.*;

    public class IsEmailAddress {
        public static void main(String[] args) {
            JOptionPane.showInputDialog("Enter an email address").matches("[a-zA-Z0-9\\.]+@\\w+\\.{1}\\w+") ? JOptionPane.showMessageDialog(null, "It is an email address") : JOptionPane.showMessageDialog(null, "It is not an email address");
        }
    }
    I get the compilation error "not a statement". What am I doing wrong? Is this even possible?
    Thanks for any help

    I would never use a piece of code like that as part of a bigger program, I wanted to see if it was possible and maybe to learn something about the ternary operator.
    Keeping in mind what you said about the ternary operator having to return something, I was able to make it work:
    import javax.swing.*;

    public class IsEmailAddress {
        public static void main(String[] args) {
            int a = JOptionPane.showInputDialog("Enter an email address").matches("[a-zA-Z0-9\\.]+@\\w+\\.{1}\\w+") ? JOptionPane.showOptionDialog(null, "It is an email address","Is it an email Address?",JOptionPane.CANCEL_OPTION,JOptionPane.PLAIN_MESSAGE,null,null,null) : JOptionPane.showOptionDialog(null, "It is not an email address","Is it an email Address?",JOptionPane.CANCEL_OPTION,JOptionPane.PLAIN_MESSAGE,null,null,null);
        }
    }
    I chose showOptionDialog because it returns an int.
    Thank you for your help!!!
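
The underlying rule is that ?: is an expression, not a statement: both branches have to produce a value, and showMessageDialog returns void. A common way around this is to let the ternary pick the message and then make a single call with the result. A minimal sketch follows; the helper name describe is made up for illustration, and the regex is the one from the question.

```java
class EmailCheck {
    // The ternary chooses a String; it does not try to "run" one of two void calls.
    static String describe(String address) {
        return address.matches("[a-zA-Z0-9\\.]+@\\w+\\.{1}\\w+")
                ? "It is an email address"
                : "It is not an email address";
    }

    public static void main(String[] args) {
        System.out.println(describe("someone@example.com"));
        System.out.println(describe("not an address"));
    }
}
```

In the Swing version, the same expression can be passed as the message argument of a single JOptionPane.showMessageDialog(null, ...) call.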

  • Question on regex Matcher (group number)

    Hi, everybody
    I am writing a program on replacement like the one below.
    String regex = "(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)";
    String original = "ABCDEFGHIJKL";
    String replacement = "$12";
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(original);
    String result = m.replaceFirst(replacement);
    What I actually want is to take out the first group, in this case an "A", and append a character "2" after it.
    The result I am expecting is "A2", but the result I get is "L", because the regex engine reads $12 as a reference to the 12th group.
    What should I do to remove the ambiguity?
    Thanks.

    In that case, use $1\\2.
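
A small sketch of why the escaped form works, plus one alternative. In a replacement string a backslash makes the next character literal, so the group reference ends after the 1. The named-group variant (Java 7 or later) is a swapped-in technique, not something from the thread, and the class and method names are invented for the example.

```java
import java.util.regex.Pattern;

class GroupThenDigit {
    static final String REGEX = "(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)";

    // "$1\\2" in Java source is the replacement text $1\2: group 1, then an escaped literal 2.
    static String withEscape(String input) {
        return Pattern.compile(REGEX).matcher(input).replaceFirst("$1\\2");
    }

    // Alternative: name the first group, so the reference ends at the closing brace.
    static String withNamedGroup(String input) {
        String named = "(?<first>.)" + REGEX.substring(3);
        return Pattern.compile(named).matcher(input).replaceFirst("${first}2");
    }

    public static void main(String[] args) {
        System.out.println(withEscape("ABCDEFGHIJKL"));     // prints A2
        System.out.println(withNamedGroup("ABCDEFGHIJKL")); // prints A2
    }
}
```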

  • Help: Regular Expression question??

    Hello,
    How can I extract the following content using Java Regular expression?
    <tr bgcolor="#333333">
         <td class="title" colspan="4" height="18"> <b>SUPER_1</b> - SUPER_2</td>
    </tr>
    <tr bgcolor="#333333">
         <td class="match-light" width="45" height="18"> </td>
         <td class="match-light" colspan="3" width="286" align="right">March 19 </td>
    </tr>
    <tr>
         <td colspan="4" height="1"></td>
    </tr>
    <tr bgcolor="#cfcfcf">
         <td width="45" height="18"> FT</td>
         <td width="118" align="right">SUPER_3</td>
         <td width="50" align="center"><a class="scorelink" target="details" onclick="showDetails();">999 - 888</a></td>
         <td width="118">SUPER_4</td>
    </tr>
    From the above content, how can I define a regular expression to extract "*SUPER_1*", "*SUPER_2*", "*March 19*", "*SUPER_3*", "*999*", "*888*" and "*SUPER_4*"?
    Please help.
    Best regards,
    Eric

    Kayaman wrote:
    Why not use a better way than regex, like an actual HTML parser (or XML if you have it well-formed)? People seem to love parsing (or rather, asking help how to parse) HTML with regex for some unknown reason.
    Indeed.
    Read this (hilarious):
    http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

  • Powershell Regex and multiple matches

    Hi all, 
    I have been messing with powershell for a while now, but this regex challenge has got me stumped.
    I have a block of text, which is the output of a previous command; the variable is called $data.
    The contents look like:
    Host : 10.0.0.1
    Output : Listening on eth1
    # Host name (port/service if enabled) last 2s last 10s last 40s cumulative
    1 LOCALPC.internaldomain.net => 79.1Kb 79.1Kb 79.1Kb 19.8KB
    www.awebsite.com.au <= 3.99Mb 3.99Mb 3.99Mb 1.00MB
    Total send rate: 83.3Kb 83.3Kb 83.3Kb
    Total receive rate: 3.99Mb 3.99Mb 3.99Mb
    Total send and receive rate: 4.08Mb 4.08Mb 4.08Mb
    Peak rate (sent/received/total): 83.3Kb 3.99Mb 4.08Mb
    Cumulative (sent/received/total): 20.8KB 1.00MB 1.02MB
    ============================================================================================
    ExitStatus : 0
    I am trying to extract the two host names from that block of text, one will always be a hostname ending in internaldomain.net the other one can either be a host name or IP address. They may appear in alternate orders as well.
    I can get a single match without an issue:
    if ($data -match "(?<=\s).*?(?=.internaldomain)") {
        $pcname = $matches[0].Trim()
    }
    or
    if ($data -match "\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b") {
        $externalip = $matches[0].Trim()
    }
    if ($data -match "([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)*?") {
        $externalhost = $matches[0].Trim()
    }
    But I can't for the life of me extract the second match.
    Anyone got some pointers in how I can extract both matches into strings?
    Cheers,
    Pazu

    Thank you very much mjolinor, greatly appreciated, and it is all working now. One last question: the regex seems to be matching the number 1 before the first proper match, and I'm not good enough with regex yet to understand why.
    An example of the data where it matches the 1 is:
    Host : 10.0.0.1
    Output : Listening on eth1
    # Host name (port/service if enabled) last 2s last 10s last 40s cumulative
    1 Work-iPhone.internal.net => 3.57Kb 3.57Kb 3.57Kb 915B
    11.111.239.45 <= 2.81Kb 2.81Kb 2.81Kb 719B
    Total send rate: 3.57Kb 3.57Kb 3.57Kb
    Total receive rate: 2.96Kb 2.96Kb 2.96Kb
    Total send and receive rate: 6.54Kb 6.54Kb 6.54Kb
    Peak rate (sent/received/total): 3.57Kb 2.96Kb 6.54Kb
    Cumulative (sent/received/total): 915B 759B 1.63KB
    ============================================================================================
    ExitStatus : 0
    In this sample data, the second line ending in cloudfront doesn't get matched, but it matches 'Total'?
    Host : 10.0.0.1
    Output : Listening on eth1
    # Host name (port/service if enabled) last 2s last 10s last 40s cumulative
    1 TARDIS.mashdinternal.net => 2.50Kb 2.50Kb 2.50Kb 640B
    server-54-240-177-103.syd1.r.cloudfront <= 169Kb 169Kb 169Kb 42.2KB
    Total send rate: 2.66Kb 2.66Kb 2.66Kb
    Total receive rate: 169Kb 169Kb 169Kb
    Total send and receive rate: 172Kb 172Kb 172Kb
    Peak rate (sent/received/total): 2.66Kb 169Kb 172Kb
    Cumulative (sent/received/total): 681B 42.2KB 42.9KB
    ============================================================================================
    ExitStatus : 0
    Any ideas?
    Cheers,
    Pazu
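
The reply from mjolinor is not shown here, so what follows is only a sketch of the general idea, written in Java like most of the threads on this page rather than in PowerShell. It assumes that every host line, and only a host line, contains a => or <= arrow after the host token. It anchors on that arrow, skips the optional row number, and collects every match with repeated find() calls. The class name, method name, and pattern are illustrative guesses, not the accepted answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HostLines {
    // Optional leading row number, then the host token, then the => or <= arrow.
    private static final Pattern HOST =
            Pattern.compile("(?m)^[ \\t]*(?:\\d+[ \\t]+)?(\\S+)[ \\t]+(?:=>|<=)");

    static List<String> hosts(String data) {
        List<String> found = new ArrayList<>();
        Matcher m = HOST.matcher(data);
        while (m.find()) {          // each find() resumes after the previous match
            found.add(m.group(1));
        }
        return found;
    }
}
```

Because the pattern requires the arrow, lines such as "Total send rate: ..." and the leading "1" are not reported as hosts.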

  • Reg Exp always returning false value

    Hi,
    Below is my code. I want to restrict the values to letters and numbers only, with no special characters.
    But the condition below fails even for valid values, e.g. ABCD, abcd, 1234. These should be acceptable.
    Is the reg exp wrong?
    private var special_char:RegExp = /^[A-Za-z0-9]*$/;
    private function validateSpecialChar(inputValue:String):Boolean {
        if (special_char.test(inputValue))
            valid = true;
        else
            valid = false;
        return valid;
    }
    Thanks,
    Imran

    Try: http://stackoverflow.com/questions/9012387/regex-expression-only-numbers-and-characters

  • Regex question

    Hi,
    I have a question regarding the regular expressions in java.
    Let's say I have the following regex: "(one)|(two)|(three)" and the following string: "two". The string obviously matches the regex, because of the "\2" group. Is there any way to determine the group number that matched the string, without having to use something like:
    for (int i = 1; i <= matcher.groupCount(); i++) {
    }

    It's not top secret, the time difference is the problem.
    It's for a school project. We have to make Pascal Compiler and the first step is the Lexical Analyzer. This means that I have some regular expressions for identifiers, numeric constants, string constants and so on...
    For example the regex for the identifiers (variable name) looks like: "[a-zA-Z_][a-zA-Z0-9_]*", but the one for the key words is basically an array, like the one in my first post.
    The regular expressions work fine, but for the next part of the project I need to know the index of the key words, within the key word array (which in my case is a regular expression). So this is why I was wondering if there is any way to get the group number, without having to iterate through the whole regex.
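
As far as I know, java.util.regex has no call that directly reports which alternative matched, so checking the groups in a loop is the usual approach. For a fixed keyword list there is also a simpler route that skips the regex entirely. Here is a sketch of both, with the keyword list taken from the question and the class and method names invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordIndex {
    static final List<String> KEYWORDS = Arrays.asList("one", "two", "three");

    // Builds "(one)|(two)|(three)", so group i corresponds to KEYWORDS.get(i - 1).
    static final Pattern ANY_KEYWORD =
            Pattern.compile("(" + String.join(")|(", KEYWORDS) + ")");

    // Option 1: ask the matcher which group took part in the match (1-based, -1 if none).
    static int groupThatMatched(String token) {
        Matcher m = ANY_KEYWORD.matcher(token);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                if (m.start(i) != -1) return i;
            }
        }
        return -1;
    }

    // Option 2: look the token up directly (0-based index, -1 if absent).
    static int keywordIndex(String token) {
        return KEYWORDS.indexOf(token);
    }
}
```

In a lexer, option 2 would typically run after the identifier regex has matched: if the token is in the keyword list it is a keyword, otherwise it is an identifier.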

  • Regex pattern question

    Hi,
    I'm trying to get my feet wet with Java and regular expressions. I've done a lot of it in Perl, but need some help with Java.
    I have an XML file (I'm also working through the SAX tutorial, but this question is related to regex) that has multiple elements, and each element has a title tag:
    <element lev1>10<title>element title</title>
    <element lev2>20<title>another element title</title>
    </element lev2>
    </element lev1>
    If I have the following pattern:
    Pattern Title = Pattern.compile("(?i)<title>([^<]*)</title>");
    that picks up the titles, but I can't distinguish which title belongs to which element. Basically what I want to have is:
    Pattern coreTitle = Pattern.compile("(?i)<element lev1>(**any thing that isn't an </title> tag**)</title>");
    Looked through the tutorials, will keep looking, I'm sure it's in there somewhere, but if someone could point me in the right direction, that would be great.
    thanks,
    bp

    Just guessing, but maybe...
    Pattern.compile("(?i)<element lev1>*<title>([^<]*)</title>");
    But it seems that things like parsing with SAX (or loading to a DOM) or XPath would be much better suited to parsing out XML than regexp.
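
One caveat on the guess above: >* only repeats the > character, so it cannot skip over the "10" between the element tag and the title. A sketch that skips any text up to the next tag instead, assuming the title always comes directly after the element's opening tag as in the sample (the class and method names are made up):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ElementTitle {
    // Returns the title that directly follows the opening tag, or null if there is none.
    static String titleOf(String xml, String elementTag) {
        Pattern p = Pattern.compile(
                "(?i)<" + Pattern.quote(elementTag) + ">[^<]*<title>([^<]*)</title>");
        Matcher m = p.matcher(xml);
        return m.find() ? m.group(1) : null;
    }
}
```

Calling titleOf(xml, "element lev1") on the sample from the question gives "element title", and "element lev2" gives "another element title". For real XML, the SAX or XPath route is still the more robust choice.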

  • Simple Java regex question

    I have a file with set of Name:Value pairs
    e.g
    Action1:fail
    Action2:pass
    Action3:fred
    Using the regex package, I want to get the value of the name "Action1".
    I have tried different things, but I cannot figure out how to do it. I can find whether Action1: is present or not, but I don't know how to get the value associated with it.
    I have tried:
    Pattern pattern = Pattern.compile("Action1");
    CharSequence charSequence = CharSequenceFromFile(fileName); // method returning a CharSequence from a file
    Matcher matcher = pattern.matcher(charSequence);
    if (matcher.find()) {
        int start = matcher.end(0);
        System.out.println("matcher.group(0)" + matcher.group(0));
    }
    How can I get the value associated with a specific tag?
    thanks
    anmol

    Read the data from the text file line by line and you can do:
    String line; // get this somehow
    String[] keyPair = line.split(":");
    System.out.println(keyPair[0]); // your name
    System.out.println(keyPair[1]); // your value
    or, if you've got the text file in one big string:
    String pattern = "(?m)^(\\w+):(\\w+)$"; // {word}:{word} on each line
    // then
    // do some things with match objects
    // look in the API at java.util.regex
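
To make the "one big string" variant concrete, here is a minimal sketch that looks up a single name. It assumes one Name:Value pair per line, as in the question; the class and method names are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameValue {
    // Returns the value for the given name, or null when the name is absent.
    static String valueOf(CharSequence text, String name) {
        // (?m) lets ^ and $ match at every line; Pattern.quote guards names containing regex characters.
        Pattern p = Pattern.compile("(?m)^" + Pattern.quote(name) + ":(.*)$");
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String text = "Action1:fail\nAction2:pass\nAction3:fred\n";
        System.out.println(valueOf(text, "Action1")); // prints fail
    }
}
```

The part in parentheses is capture group 1, which is what matcher.group(1) returns; group(0) is the whole match including the name.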

  • How to create a regex for the question mark as a literal?

    I get:
    Exception in thread "main" java.util.regex.PatternSyntaxException: Dangling meta character '?' near index 3
    (?)?
    For
    (\p{Punct})?
    and (\\?)?
    and (\\'?')?

    simpatico_gabriele wrote:
    I get:
    Exception in thread "main" java.util.regex.PatternSyntaxException: Dangling meta character '?' near index 3
    (?)?
    For
    (\p{Punct})?
    and (\\?)?
    and (\\'?')?
    Sorry but, as Darryl says, you need to explain your problem a bit better because the patterns
            Pattern p0 = Pattern.compile("(\\p{Punct})?");
            Pattern p1 = Pattern.compile("(\\?)?");
            Pattern p2 = Pattern.compile("(\\'?')?");
    all compile and run without any exception.
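
For reference, a short sketch of three common ways to match a literal question mark in Java, next to a bare quantifier that fails to compile. The class and helper names are made up for illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class LiteralQuestionMark {
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false; // e.g. "Dangling meta character '?'"
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("?"));                   // a bare quantifier has nothing to repeat
        System.out.println("?".matches("\\?"));              // backslash escape
        System.out.println("?".matches("[?]"));              // character class
        System.out.println("?".matches(Pattern.quote("?"))); // quoted literal
    }
}
```

Note that the regex needs a single backslash, which is written as two backslashes inside a Java string literal.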

  • Question on FIND IN TABLE... REGEX

    Hello,
    I am trying to do a FIND IN TABLE itab, where itab is a table of strings. One of the strings is like the following:
    <bi:item name="CHART_ITEM_1" designheight="380" designwidth="730" type="CHART_ITEM">
    I'm trying to search for it using a REGEX pattern. So here's how I did it:
    CREATE OBJECT text_regex EXPORTING pattern = 'item name*type="CHART_ITEM"'.
    So apparently, the above statement does not work, as the results returned is 0.
    What gives? Any help would be much appreciated. Thanks in advance!

    Hi,
    I think your regular expression is wrong. * does not stand for zero or more arbitrary characters; it is an operator that means zero or more occurrences of the element before it. You need to put . in front of it, which matches any single character. So your expression should look like item name.*type="CHART_ITEM". There is a program, DEMO_REGEX_TOY, which you can use for testing regular expressions.
    Cheers
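
The difference between * and .* works the same way in most regex dialects. It is sketched here in Java rather than ABAP, purely as an illustration, with made-up class and method names.

```java
import java.util.regex.Pattern;

class StarVsDotStar {
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String line = "<bi:item name=\"CHART_ITEM_1\" designheight=\"380\" designwidth=\"730\" type=\"CHART_ITEM\">";
        // "name*" means "nam" followed by zero or more "e", so "type" would have to follow immediately.
        System.out.println(found("item name*type=\"CHART_ITEM\"", line));
        // ".*" means zero or more of any character, which bridges the attributes in between.
        System.out.println(found("item name.*type=\"CHART_ITEM\"", line));
    }
}
```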

  • Java Regex Question (HTML Tokenizing)

    Hello
    I would like to tokenize a HTML Page into its html tags and could not find any working expression. I tried it with:
    <[.]*>
    and for all input fields:
    <(INPUT.*)>
    But it either finds nothing or it finds everything.
    Can somebody help me?
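
Two notes on the attempts above: inside brackets the dot is a literal dot, so <[.]*> only matches things like <...>, and the greedy .* in <(INPUT.*)> runs on to the last > on the line. A minimal sketch that collects each tag separately, assuming no > ever appears inside an attribute value (the class and method names are made up):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagTokenizer {
    // "<", an optional "/", then one or more characters that are not ">", then ">"
    private static final Pattern TAG = Pattern.compile("</?[^>]+>");

    static List<String> tags(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }
}
```

To collect only input fields, a pattern along the lines of (?i)<input[^>]*> can be used in the same loop.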

    </?\S+?[\s\S+]*?>
    "/?" means: "/" can be there but doesn't have to be.
    "\S" means: any character that isn't whitespace.
    "+" means: the previous element must occur at least once.
    The "?" after the "+" means: match as few of the previous element as needed to satisfy the regex.
    That's why <adf>sdf> isn't matched as a whole: <adf> is the shortest string that satisfies the regex.
    "[]" means: a character class, which matches any single character listed inside the brackets.
    "\s" means: a whitespace character. Inside the brackets, \s and \S together cover every character, and the "+" there is just a literal plus sign.
    "*" means: the previous element (here the character class) can occur any number of times, even zero.
    "*?" is the lazy version of "*", just as "+?" is the lazy version of "+".

  • Question related to REGEX  functions

    Hi,
    I am working on Oracle 10gR2.
    I am working on a column which stores username. Let's say that one of the values in this column is "Ankur". I want to fetch all records where username is a concatenated string of "Ankur" followed by some numerical digits, like "Ankur1", "Ankur2", "Ankur345" and so on. I do not want to get records with values such as "Ankurab1" - that is anything which is concatenation of some characters to my input string.
    I tried to use REGEX functions to achieve the desired result, but am not able to.
    Can anyone help me here?
    Best,
    Ankur

    Here is one way.
    SQL> with sample_data as (
           select 'Ankur1' str from dual union all
           select 'Ankur2' from dual union all
           select 'Ankur345' from dual union all
           select 'Ankurab1' from dual)
         select str from sample_data
         where regexp_replace(str, '[0-9]+$') = 'Ankur';
    STR
    Ankur1
    Ankur2
    Ankur345
    John

  • Question related to regex and whitespaces  \s

    Hello, I have a problem related to regex.
    I have a text area where someone types text. I noticed that when I press the Enter key (so the text contains a new line) the string is no longer recognised.
    String regex = "[A-Za-z0123456789_./-]*";
    I tried adding \s, but \s also includes other whitespace characters.
    I would like to include the \n character (the Enter key), or the \s characters in general, in my regex.
    How am I supposed to do that?
    Thanks in advance!

    g_p_java wrote:
    prometheuzz wrote:
    >
    Note that on Windows, a line break is "\r\n".
    Also, A-Za-z0123456789_ can be written as \w:
    String regex = "[\r\n\\w./-]*";
    If we are using Linux or Unix, is that different?
    The OS line break is just \n. I'm not sure what Swing puts into a GUI element, whether it's OS dependent or not. It won't hurt you to leave the \r in there though. If there's no \r in the string, it won't stop your regex from working, just like it won't stop it from working when you have A-Z and they don't happen to enter a Z.
    The only way it would cause a problem to leave the \r in the regex is if \r were somehow part of the input and you didn't want it treated as end-of-line. I don't see that happening though.
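
A small sketch of the suggested character class in use, written with regex escapes for the line-break characters (which behaves the same as the raw characters inside a class). The class and method names are invented for the example.

```java
class TextAreaInput {
    // Letters, digits and underscore (\w), plus . / - and the two line-break characters.
    static final String REGEX = "[\\r\\n\\w./-]*";

    static boolean isValid(String text) {
        return text.matches(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(isValid("abc\ndef"));   // line break is allowed
        System.out.println(isValid("abc def"));    // space is not in the class
    }
}
```

Listing \r and \n explicitly, instead of using \s, keeps spaces and tabs out of the accepted input.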

  • OT: Regex Question

    I'm doing a series of search and replace operations with Dreamweaver and wondered if anyone can suggest a regular expression for a particular situation.
    The following URL is fine as it is:
    <td><a href="http://www.geoworld.org/Brazil" title="Brazil">Brazil</a></td>
    However, I need to replace the spaces in this URL with underscores...
    <td><a href="http://www.geoworld.org/Central African Republic" title="Central African Republic">Central African Republic</a></td>
    The finished URL should look like this:
    <td><a href="http://www.geoworld.org/Central_African_Republic" title="Central African Republic">Central African Republic</a></td>
    In other words, I want to replace ALL spaces in the URL proper with underscores, but I want to leave the spaces in the title attributes and visible text alone. Does anyone know a regular expression that will do this?
    Thanks.

    Find:
    (href="[^"]+)\s([^"]+")
    Replace:
    $1_$2
    This will replace one space with an underscore in each href attribute per pass. Run the same find and replace several times until no more instances are found.
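
If the files can be processed outside Dreamweaver, the repeated passes can be avoided by matching each href value once and replacing the spaces inside it in code. This is a sketch in Java, assuming href values are always double-quoted; the class and method names are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefSpaces {
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]*)\"");

    static String underscoreHrefs(String html) {
        Matcher m = HREF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String fixed = "href=\"" + m.group(1).replace(' ', '_') + "\"";
            // quoteReplacement keeps any $ or \ in the URL from being read as a group reference
            m.appendReplacement(out, Matcher.quoteReplacement(fixed));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Only the text inside href="..." is touched, so the title attribute and the visible link text keep their spaces.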
