Question on regex

Hi,
I am trying to write a regular expression that ignores everything within < >.
From:-
<p>123</p><a>456
To:-
123456
The objective is to remove all the HTML tags. Can anybody shed some light on a regular expression that could handle this?
Thanks.
Joseph

Note: my previous (naive) method only works if there are numbers between the tags. Check this page for details on regex patterns:
http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html
If you're interested in parsing real HTML files, I suggest using an HTML parser:
http://java-source.net/open-source/html-parsers
Good luck.
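
For the simple input in the question, a minimal sketch (not a general HTML solution) is to delete every run that starts with < and ends at the next >. The class and method names below are made up for illustration, and the pattern assumes that no > appears inside an attribute value:

```java
class StripTags {
    // Removes anything that looks like a tag: '<', any run of characters
    // other than '>', then '>'. Text between the tags is left untouched.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<p>123</p><a>456"));
    }
}
```

Because the character class excludes >, each match stops at the first closing bracket, so letters between the tags survive just as digits do.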

Similar Messages

Ternary Operator Question (with regex).

Hi, I was learning regex in java and wrote a program which tests a string to see if it is an email address.
import javax.swing.*;
public class IsEmailAddress {
     public static void main(String[] args) {
          String address = JOptionPane.showInputDialog("Enter an email address");
          if (address.matches("[a-zA-Z0-9\\.]+@\\w+\\.{1}\\w+"))
               JOptionPane.showMessageDialog(null, "It is an email address");
          else
               JOptionPane.showMessageDialog(null, "It is not an email address");
     }
}
The above program works correctly. But then I decided to try and make the program only one line inside the main:
import javax.swing.*;
public class IsEmailAddress {
     public static void main(String[] args) {
          JOptionPane.showInputDialog("Enter an email address").matches("[a-zA-Z0-9\\.]+@\\w+\\.{1}\\w+") ? JOptionPane.showMessageDialog(null, "It is an email address") : JOptionPane.showMessageDialog(null, "It is not an email address");
     }
}
I get the compilation error "not a statement". What am I doing wrong? Is this even possible?
Thanks for any help

I would never use a piece of code like that as part of a bigger program, I wanted to see if it was possible and maybe to learn something about the ternary operator.
Keeping in mind what you said about the ternary operator having to return something, I was able to make it work:
import javax.swing.*;
public class IsEmailAddress {
     public static void main(String[] args) {
          int a = JOptionPane.showInputDialog("Enter an email address").matches("[a-zA-Z0-9\\.]+@\\w+\\.{1}\\w+") ? JOptionPane.showOptionDialog(null, "It is an email address","Is it an email Address?",JOptionPane.CANCEL_OPTION,JOptionPane.PLAIN_MESSAGE,null,null,null) : JOptionPane.showOptionDialog(null, "It is not an email address","Is it an email Address?",JOptionPane.CANCEL_OPTION,JOptionPane.PLAIN_MESSAGE,null,null,null);
     }
}
I chose showOptionDialog because it is a static method that returns an int.
Thank you for your help!!!
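
A note on why the one-liner fails: the conditional operator is an expression that produces a value, so its second and third operands cannot be calls to void methods, and a bare conditional expression is not allowed as a statement. A sketch of the usual workaround is to let the operator choose the message and pass the result to a single call. The class and method names are illustrative only, and the dialog is replaced by System.out so the example runs without a GUI:

```java
class TernaryDemo {
    // Same pattern as in the posts above; it is a loose check, not a full
    // email validator.
    static final String EMAIL_REGEX = "[a-zA-Z0-9\\.]+@\\w+\\.{1}\\w+";

    // The conditional operator picks a value; the caller decides what to do
    // with it (print it, show it in a dialog, and so on).
    static String describe(String address) {
        return address.matches(EMAIL_REGEX)
                ? "It is an email address"
                : "It is not an email address";
    }

    public static void main(String[] args) {
        System.out.println(describe("joe@example.com"));
        System.out.println(describe("not an address"));
    }
}
```

In the Swing version the same idea would read JOptionPane.showMessageDialog(null, describe(address)), which keeps the body of main to one statement.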

Question on regex Matcher (group number)

HI, everybody
I am writing a program on replacement like the one below.
String regex = "(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)";
String original = "ABCDEFGHIJKL";
String replacement = "$12";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(original);
String result = m.replaceFirst(replacement);
What I actually want is to take out the first group, in this case an "A", and append a character "2" after it.
The result I am expecting is "A2", but the result I get is "L", because the regex engine reads $12 as the 12th group.
What should I do to remove the ambiguity?
Thanks.

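In that case, use $1\\2. The backslash escapes the 2, so it is treated as a literal character rather than as part of the group number.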
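
A small sketch comparing the two replacement strings on the data from the question (the class and method names are made up for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupReferenceDemo {
    // Twelve single-character groups, as in the question.
    static final Pattern TWELVE =
            Pattern.compile("(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)");

    static String replace(String replacement) {
        Matcher m = TWELVE.matcher("ABCDEFGHIJKL");
        return m.replaceFirst(replacement);
    }

    public static void main(String[] args) {
        // "$12" is read as group 12 because the pattern has twelve groups.
        System.out.println(replace("$12"));
        // In "$1\\2" the backslash ends the group number and makes the 2 literal.
        System.out.println(replace("$1\\2"));
    }
}
```

The first call yields "L" and the second "A2", since the whole twelve-character match is replaced in both cases.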

Help: Regular Expression question??

Hello,
How can I extract the following content using a Java regular expression?
<tr bgcolor="#333333">
     <td class="title" colspan="4" height="18"> <b>SUPER_1</b> - SUPER_2</td>
</tr>
<tr bgcolor="#333333">
     <td class="match-light" width="45" height="18"> </td>
     <td class="match-light" colspan="3" width="286" align="right">March 19 </td>
</tr>
<tr>
     <td colspan="4" height="1"></td>
</tr>
<tr bgcolor="#cfcfcf">
     <td width="45" height="18"> FT</td>
     <td width="118" align="right">SUPER_3</td>
     <td width="50" align="center"><a class="scorelink" target="details" onclick="showDetails();">999 - 888</a></td>
     <td width="118">SUPER_4</td>
</tr>
From the above content, how can I define a regular expression to extract "SUPER_1", "SUPER_2", "March 19", "SUPER_3", "999", "888" and "SUPER_4"?
Please help.
Best regards,
Eric

Kayaman wrote:
Why not use a better way than regex, like an actual HTML parser (or XML if you have it well-formed)? People seem to love parsing (or rather, asking help how to parse) HTML with regex for some unknown reason.
Indeed.
Read this (hilarious):
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
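
If a parser really is not an option, a rough sketch for markup as regular as the sample is to grab each td cell and then strip whatever tags remain inside it. This is only an illustration under that assumption (simple, non-nested cells); the class and method names are invented, and splitting values such as "999 - 888" is left to the caller:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CellTextDemo {
    // One <td ...>...</td> cell; DOTALL lets the cell body span several lines.
    private static final Pattern CELL =
            Pattern.compile("<td[^>]*>(.*?)</td>", Pattern.DOTALL);

    // Returns the visible text of every non-empty cell, in document order.
    static List<String> cellTexts(String html) {
        List<String> texts = new ArrayList<>();
        Matcher m = CELL.matcher(html);
        while (m.find()) {
            // Drop nested tags such as <b> or <a ...>, then trim whitespace.
            String text = m.group(1).replaceAll("<[^>]+>", "").trim();
            if (!text.isEmpty()) {
                texts.add(text);
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        String html = "<tr><td class=\"title\"> <b>SUPER_1</b> - SUPER_2</td></tr>"
                + "<tr><td width=\"50\"><a class=\"scorelink\">999 - 888</a></td><td></td></tr>";
        System.out.println(cellTexts(html));
    }
}
```

Anything less regular than this (nested tables, comments, unclosed cells) is where the regex approach breaks down and the parser advice above applies.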

Powershell Regex and multiple matches

Hi all,
I have been messing with powershell for a while now, but this regex challenge has got me stumped.
I have a block of text, which is the output from a previous command; the variable is called $data.
The contents look like:
Host : 10.0.0.1
Output : Listening on eth1
# Host name (port/service if enabled) last 2s last 10s last 40s cumulative
1 LOCALPC.internaldomain.net => 79.1Kb 79.1Kb 79.1Kb 19.8KB
www.awebsite.com.au <= 3.99Mb 3.99Mb 3.99Mb 1.00MB
Total send rate: 83.3Kb 83.3Kb 83.3Kb
Total receive rate: 3.99Mb 3.99Mb 3.99Mb
Total send and receive rate: 4.08Mb 4.08Mb 4.08Mb
Peak rate (sent/received/total): 83.3Kb 3.99Mb 4.08Mb
Cumulative (sent/received/total): 20.8KB 1.00MB 1.02MB
============================================================================================
ExitStatus : 0
I am trying to extract the two host names from that block of text, one will always be a hostname ending in internaldomain.net the other one can either be a host name or IP address. They may appear in alternate orders as well.
I can get a single match without an issue:
if ($data -match "(?<=\s).*?(?=.internaldomain)") {
    $pcname = $matches[0].Trim()
}
or
if ($data -match "\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b") {
    $externalip = $matches[0].Trim()
}
if ($data -match "([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)*?") {
    $externalhost = $matches[0].Trim()
}
But I can't for the life of me extract the second match.
Anyone got some pointers in how I can extract both matches into strings?
Cheers,
Pazu

Thank you very much, mjolinor. This is greatly appreciated and it is all working now. One last question: the regex seems to be matching the number 1 before the first proper match, and I'm not good enough with regex yet to understand why.
An example of data where it matches the 1:
Host : 10.0.0.1
Output : Listening on eth1
# Host name (port/service if enabled) last 2s last 10s last 40s cumulative
1 Work-iPhone.internal.net => 3.57Kb 3.57Kb 3.57Kb 915B
11.111.239.45 <= 2.81Kb 2.81Kb 2.81Kb 719B
Total send rate: 3.57Kb 3.57Kb 3.57Kb
Total receive rate: 2.96Kb 2.96Kb 2.96Kb
Total send and receive rate: 6.54Kb 6.54Kb 6.54Kb
Peak rate (sent/received/total): 3.57Kb 2.96Kb 6.54Kb
Cumulative (sent/received/total): 915B 759B 1.63KB
============================================================================================
ExitStatus : 0
In this sample data, the second line ending in cloudfront doesn't get matched, but it matches 'Total'?
Host : 10.0.0.1
Output : Listening on eth1
# Host name (port/service if enabled) last 2s last 10s last 40s cumulative
1 TARDIS.mashdinternal.net => 2.50Kb 2.50Kb 2.50Kb 640B
server-54-240-177-103.syd1.r.cloudfront <= 169Kb 169Kb 169Kb 42.2KB
Total send rate: 2.66Kb 2.66Kb 2.66Kb
Total receive rate: 169Kb 169Kb 169Kb
Total send and receive rate: 172Kb 172Kb 172Kb
Peak rate (sent/received/total): 2.66Kb 169Kb 172Kb
Cumulative (sent/received/total): 681B 42.2KB 42.9KB
============================================================================================
ExitStatus : 0
Any ideas?
Cheers,
Pazu

Reg Exp always returning false value

Hi,
Below is my code. I want to restrict the values to letters and digits only, with no special characters.
But the condition below fails even for valid values such as ABCD, abcd and 1234, which should be acceptable.
Is the regular expression wrong?
private var special_char:RegExp = /^[A-Za-z0-9]*$/;
private function validateSpecialChar(inputValue:String):Boolean {
     if (special_char.test(inputValue))
          valid = true;
     else
          valid = false;
     return valid;
}
Thanks,
Imran

Try: http://stackoverflow.com/questions/9012387/regex-expression-only-numbers-and-characters

Regex question

Hi,
I have a question regarding the regular expressions in java.
Let's say I have the following regex: "(one)|(two)|(three)" and the following string: "two". The string obviously matches the regex, through group 2. Is there any way to determine the number of the group that matched the string, without having to use something like:
for (int i = 1; i <= matcher.groupCount(); i++) {
}

It's not top secret, the time difference is the problem.
It's for a school project. We have to write a Pascal compiler and the first step is the lexical analyzer. This means that I have some regular expressions for identifiers, numeric constants, string constants and so on...
For example the regex for the identifiers (variable name) looks like: "[a-zA-Z_][a-zA-Z0-9_]*", but the one for the key words is basically an array, like the one in my first post.
The regular expressions work fine, but for the next part of the project I need to know the index of the key words, within the key word array (which in my case is a regular expression). So this is why I was wondering if there is any way to get the group number, without having to iterate through the whole regex.
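
As far as I know, java.util.regex offers no direct call that reports which alternative matched, so the loop over the groups is the regex route. For a keyword table, a plain lookup may be simpler. The sketch below shows both; the keyword list is the placeholder from the first post, and the class and method names are invented for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordIndexDemo {
    // Placeholder keyword table; a real lexer would list the Pascal keywords.
    static final List<String> KEYWORDS = Arrays.asList("one", "two", "three");
    static final Pattern KEYWORD_PATTERN = Pattern.compile("(one)|(two)|(three)");

    // Regex route: 1-based number of the first group that took part in the
    // match, or -1 if the input does not match at all.
    static int matchedGroup(String input) {
        Matcher m = KEYWORD_PATTERN.matcher(input);
        if (!m.matches()) {
            return -1;
        }
        for (int i = 1; i <= m.groupCount(); i++) {
            if (m.start(i) != -1) { // start(i) is -1 for a group that did not match
                return i;
            }
        }
        return -1;
    }

    // Lookup route: 0-based index of the token in the table, or -1.
    static int keywordIndex(String token) {
        return KEYWORDS.indexOf(token);
    }

    public static void main(String[] args) {
        System.out.println(matchedGroup("two"));
        System.out.println(keywordIndex("two"));
    }
}
```

With many keywords, a HashMap from keyword to index would avoid the linear scan that indexOf performs; the token itself can still be recognised with the identifier regex.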

Regex pattern question

Hi,
I'm trying to get my feet wet with Java and regular expressions. I've done a lot of it in Perl, but need some help with Java.
I have an XML file (I'm also working through the SAX tutorial, but this question is about regex) that has multiple elements, and each element has a title tag:
<element lev1>10<title>element title</title>
<element lev2>20<title>another element title</title>
</element lev2>
</element lev1>
If I have the following pattern:
Pattern Title = Pattern.compile("(?i)<title>([^<]*)</title>");
that picks up the titles, but I can't distinguish which title belongs to which element. Basically what I want to have is:
Pattern coreTitle = Pattern.compile("(?i)<element lev1>(**any thing that isn't an </title> tag**)</title>");
I've looked through the tutorials and will keep looking. I'm sure it's in there somewhere, but if someone could point me in the right direction, that would be great.
thanks,
bp

Just guessing, but maybe...
Pattern.compile("(?i)<element lev1>[^<]*<title>([^<]*)</title>");
But it seems that things like parsing with SAX (or loading to a DOM) or XPath would be much better suited to parsing out XML than regex.
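
A sketch of tying a title to its enclosing element with a regex, using [^<]* to skip the text between the element tag and its title. It assumes the title follows the opening tag with no other tag in between, as in the sample; the class and method names are invented for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ElementTitleDemo {
    // Returns the title that directly follows the given opening tag,
    // or null if the tag (or its title) is not found.
    static String titleOf(String xml, String elementTag) {
        Pattern p = Pattern.compile(
                "(?i)<" + Pattern.quote(elementTag) + ">[^<]*<title>([^<]*)</title>");
        Matcher m = p.matcher(xml);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String xml = "<element lev1>10<title>element title</title>\n"
                + "<element lev2>20<title>another element title</title>\n"
                + "</element lev2>\n"
                + "</element lev1>";
        System.out.println(titleOf(xml, "element lev1"));
        System.out.println(titleOf(xml, "element lev2"));
    }
}
```

Pattern.quote keeps any special characters in the tag name from being read as regex syntax.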

Simple Java regex question

I have a file with set of Name:Value pairs
e.g
Action1:fail
Action2:pass
Action3:fred
Using the regex package, I want to get the value of the name "Action1".
I have tried different things but I cannot figure out how to do it. I can find whether "Action1:" is present, but I don't know how to get the value associated with it.
I have tried:
Pattern pattern = Pattern.compile("Action1");
CharSequence charSequence = CharSequenceFromFile(fileName); // method returning a CharSequence from a file
Matcher matcher = pattern.matcher(charSequence);
if (matcher.find()) {
     int start = matcher.end(0);
     System.out.println("matcher.group(0)" + matcher.group(0));
}
How can I get the value associated with a specific tag?
thanks
anmol

Read the data from the text file line by line and you can do:
String line; // get this somehow
String[] keyPair = line.split(":");
System.out.println(keyPair[0]); // your name
System.out.println(keyPair[1]); // your value
or, if you've got the text file in one big string:
String pattern = "(?m)^(\\w+):(\\w+)$"; // name:value, one pair per line
//then
//do some things with match objects
//look in the API at java.util.regex
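
A sketch of the "one big string" variant, assuming one Name:Value pair per line as in the question. The class and method names are invented for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameValueDemo {
    // Returns the value stored under the given name, or null if the name
    // does not start any line of the text.
    static String valueOf(String text, String name) {
        // (?m) makes ^ and $ match at line boundaries; the group captures
        // everything after the colon up to the end of that line.
        Pattern p = Pattern.compile("(?m)^" + Pattern.quote(name) + ":(.*)$");
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String text = "Action1:fail\nAction2:pass\nAction3:fred";
        System.out.println(valueOf(text, "Action1"));
    }
}
```

The key point is the capturing group: matcher.group(0) is the whole match, while matcher.group(1) is only the part inside the parentheses, which here is the value.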

How to create a regex for the question mark as a literal?

I get:
Exception in thread "main" java.util.regex.PatternSyntaxException: Dangling meta character '?' near index 3
(?)?
For
(\p{Punct})?
and (\\?)?
and (\\'?')?

simpatico_gabriele wrote:
I get:
Exception in thread "main" java.util.regex.PatternSyntaxException: Dangling meta character '?' near index 3
(?)?
For
(\p{Punct})?
and (\\?)?
and (\\'?')?
Sorry but, as Darryl says, you need to explain your problem a bit better because the patterns
        Pattern p0 = Pattern.compile("(\\p{Punct})?");
        Pattern p1 = Pattern.compile("(\\?)?");
        Pattern p2 = Pattern.compile("(\\'?')?");
all compile and run without any exception.
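
For reference, a short sketch of three common ways to match a question mark literally in Java: escaping it, putting it in a character class, or quoting it with Pattern.quote. The class and method names are invented for illustration:

```java
import java.util.regex.Pattern;

class LiteralQuestionMark {
    // In a Java string literal the regex \? is written as "\\?".
    static final Pattern ESCAPED = Pattern.compile("\\?");
    // Inside a character class, ? has no special meaning.
    static final Pattern IN_CLASS = Pattern.compile("[?]");
    // Pattern.quote turns arbitrary text into a literal pattern.
    static final Pattern QUOTED = Pattern.compile(Pattern.quote("?"));

    static boolean containsQuestionMark(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(containsQuestionMark(ESCAPED, "why?"));
        System.out.println(containsQuestionMark(IN_CLASS, "why"));
    }
}
```

An optional literal question mark is then (\\?)? in source code: the inner \\? is the literal and the outer ? is the quantifier.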

Question on FIND IN TABLE... REGEX

Hello,
I am trying to do a FIND IN TABLE itab, where itab is a table of strings. One of the strings is like the following:
<bi:item name="CHART_ITEM_1" designheight="380" designwidth="730" type="CHART_ITEM">
I'm trying to search for it using a REGEX pattern. So here's how I did it:
CREATE OBJECT text_regex EXPORTING pattern = 'item name*type="CHART_ITEM"'.
Apparently the above statement does not work, as the number of results returned is 0.
What gives? Any help would be much appreciated. Thanks in advance!

Hi,
I think your regular expression is wrong. * does not stand for zero or more characters. It is a quantifier that means zero or more occurrences of the preceding item. You need the . metacharacter, which matches any character, so your expression should look like item name.*type="CHART_ITEM". There is a program, DEMO_REGEX_TOY, which you can use for testing regular expressions.
Cheers
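
The difference between * and .* can be checked quickly outside ABAP as well. The sketch below uses Java rather than ABAP, on the assumption (consistent with the reply above) that the quantifier behaves the same way in both; the class and method names are invented for illustration:

```java
import java.util.regex.Pattern;

class StarVersusDotStar {
    // True if the regex matches anywhere in the line.
    static boolean found(String regex, String line) {
        return Pattern.compile(regex).matcher(line).find();
    }

    public static void main(String[] args) {
        String line = "<bi:item name=\"CHART_ITEM_1\" designheight=\"380\" "
                + "designwidth=\"730\" type=\"CHART_ITEM\">";
        // "name*" means "nam" followed by zero or more "e", so "type" would
        // have to come straight after "name".
        System.out.println(found("item name*type=\"CHART_ITEM\"", line));
        // ".*" means zero or more of any character, which bridges the gap.
        System.out.println(found("item name.*type=\"CHART_ITEM\"", line));
    }
}
```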

Java Regex Question (HTML Tokenizing)

Hello
I would like to tokenize a HTML Page into its html tags and could not find any working expression. I tried it with:
<[.]*>
and for all input fields:
<(INPUT.*)>
But it either finds nothing or matches everything.
Can somebody help me?

</?\S+?[\s\S+]*?>
"/?" means: "/" can be there but doesn't have to be.
"\S" means: any character that isn't whitespace.
"+" means: the previous item must occur at least once.
The "?" after the "+" means: match as few of the previous item as needed to fulfil the regex.
That's why <adf>sdf> isn't matched as a whole: <adf> is the shortest string that fulfils the regex.
"[]" means: a character class, which matches any single character listed inside the brackets. Inside a class, "+" is a literal plus sign, so [\s\S+] matches any character at all.
"\s" means: a whitespace character.
"*" means: the previous item (here the bracketed class) can occur any number of times, even zero.
"*?" is like "+?": it matches as few repetitions as possible.

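A sketch of using the pattern from the reply above to collect the tags of a page in Java. The class and method names are invented for illustration, and the approach assumes no > appears inside attribute values. As a side note, the original attempt <[.]*> fails because a dot inside brackets is a literal dot, not "any character":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagTokenizer {
    // The pattern from the reply: '<', optional '/', a tag name, then as few
    // further characters as possible up to the next '>'.
    private static final Pattern TAG = Pattern.compile("</?\\S+?[\\s\\S+]*?>");

    // Returns every tag found in the input, in order of appearance.
    static List<String> tags(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tags("<p>123</p><a href=\"x\">456"));
    }
}
```
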
Question related to REGEX functions

Hi,
I am working on Oracle 10gR2.
I am working on a column which stores username. Let's say that one of the values in this column is "Ankur". I want to fetch all records where username is a concatenated string of "Ankur" followed by some numerical digits, like "Ankur1", "Ankur2", "Ankur345" and so on. I do not want to get records with values such as "Ankurab1" - that is anything which is concatenation of some characters to my input string.
I tried to use REGEX functions to achieve the desired result, but am not able to.
Can anyone help me here?
Best,
Ankur

Here is one way.
with sample_data as (
  select 'Ankur1' str from dual union all
  select 'Ankur2' from dual union all
  select 'Ankur345' from dual union all
  select 'Ankurab1' from dual)
select str from sample_data
where regexp_replace(str, '[0-9]+$') = 'Ankur';
STR
Ankur1
Ankur2
Ankur345
John

Question related to regex and whitespaces \s

Hello, I have a problem related to regex.
I have a text area where someone types text. I noticed that when I press the Enter key (so the text contains a new line), the string is no longer matched.
String regex = "[A-Za-z0123456789_./-]*";
I tried adding \s, but \s also includes other whitespace characters.
I would like to include in my regex the \n character (the Enter key), or the \s characters in general.
How am I supposed to do that?
Thanks, in advance!

g_p_java wrote:
prometheuzz wrote:
Note that on Windows, a line break is "\r\n".
Also, A-Za-z0123456789_ can be written as \w:
String regex = "[\r\n\\w./-]*";
If we are using Linux or Unix, is that different?
The OS line break there is just \n. I'm not sure what Swing puts into a GUI element, whether it's OS dependent or not. It won't hurt you to leave the \r in there though. If there's no \r in the string, it won't stop your regex from working, just like it won't stop it from working when you have A-Z and they don't happen to enter a Z.
The only way it would cause a problem to leave the \r in the regex is if \r were somehow part of the input and you didn't want it treated as end-of-line. I don't see that happening though.
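
A quick sketch to confirm how the suggested character class behaves with both kinds of line break (the class and method names are invented for illustration):

```java
class LineBreakRegexDemo {
    // \r and \n are written as real characters in the string literal, while
    // \\w reaches the regex engine as \w (letters, digits and underscore).
    static final String REGEX = "[\r\n\\w./-]*";

    static boolean accepts(String input) {
        return input.matches(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(accepts("line1\r\nline2")); // Windows-style break
        System.out.println(accepts("line1\nline2"));   // Unix-style break
        System.out.println(accepts("two words"));      // a space is still rejected
    }
}
```

Listing \r and \n explicitly, instead of using \s, is what keeps spaces and tabs out of the accepted input.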

OT: Regex Question

I'm doing a series of search and replace operations with Dreamweaver and wondered if anyone can suggest a regular expression for a particular situation.
The following URL is fine as it is:
<td><a href="http://www.geoworld.org/Brazil" title="Brazil">Brazil</a></td>
However, I need to replace the spaces in this URL with underscores...
<td><a href="http://www.geoworld.org/Central African Republic" title="Central African Republic">Central African Republic</a></td>
The finished URL should look like this:
<td><a href="http://www.geoworld.org/Central_African_Republic" title="Central African Republic">Central African Republic</a></td>
In other words, I want to replace ALL spaces in the URL proper with underscores, but I want to leave the spaces in the title attributes and visible text alone. Does anyone know a regular expression that will do this?
Thanks.

Find:
(href="[^"]+)\s([^"]+")
Replace:
$1_$2
This will replace one space with an underscore in each href attribute per pass. Run the same find and replace several times until no more instances are found.
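
Outside Dreamweaver, the "run it until nothing changes" step can be automated. Below is a sketch in Java using the same find and replace expressions; the class and method names are invented for illustration:

```java
import java.util.regex.Pattern;

class HrefSpaceFixer {
    // Same expression as above: one whitespace character somewhere inside
    // the quoted href value, with the text on either side captured.
    private static final Pattern SPACE_IN_HREF =
            Pattern.compile("(href=\"[^\"]+)\\s([^\"]+\")");

    // Repeats the replacement until a pass changes nothing, because each
    // pass fixes only one space per href attribute.
    static String fix(String html) {
        String previous;
        String current = html;
        do {
            previous = current;
            current = SPACE_IN_HREF.matcher(previous).replaceAll("$1_$2");
        } while (!current.equals(previous));
        return current;
    }

    public static void main(String[] args) {
        System.out.println(fix("<td><a href=\"http://www.geoworld.org/Central African Republic\" "
                + "title=\"Central African Republic\">Central African Republic</a></td>"));
    }
}
```

The title attribute and the link text are left alone because the pattern only matches text that follows href=" and precedes the next double quote.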
