Substring regex question

Hi, I am trying to come up with a regular expression that will find the 4 digit year in the following example strings and get the character position. (The string lengths may vary...):
/foo/boo/goo/2008/01/20/foo.htm
/foo/2/goo/2008/01/11/foo.htm
Ultimately, I would like to know the character position of the first digit of the year.
Thanks in advance for any help!

String s = "foo/foo/2008/01/02/foo";
Pattern p = Pattern.compile("\\d{4}/\\d{2}/\\d{2}");
Matcher m = p.matcher(s);
if (m.find()) {
    System.out.println(m.start()); // index of the first digit of the year
}
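
A slightly stricter variant, as a sketch: it anchors the date on the surrounding slashes and reads the position from a capturing group. The class and method names here are made up for illustration, and it assumes the year is always followed by /MM/DD/ as in the sample strings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class YearPosition {
    // Four-digit year followed by /MM/DD/, delimited by slashes so that a
    // short numeric segment such as "/2/" is not mistaken for the year.
    private static final Pattern DATE_PATH =
            Pattern.compile("/(\\d{4})/\\d{2}/\\d{2}/");

    /** Returns the index of the first digit of the year, or -1 if no date is found. */
    static int yearIndex(String path) {
        Matcher m = DATE_PATH.matcher(path);
        return m.find() ? m.start(1) : -1;
    }

    public static void main(String[] args) {
        System.out.println(yearIndex("/foo/boo/goo/2008/01/20/foo.htm")); // 13
        System.out.println(yearIndex("/foo/2/goo/2008/01/11/foo.htm"));   // 11
    }
}
```

start(1) is used instead of start() because the overall match now begins at the leading slash, one character before the year.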

Similar Messages

Java Regex Question extract Substring

Hello
I've read the regex course at http://www.regenechsen.de/regex_de/regex_1_de.html, but the regex rules described in the course and their behavior in the "real world" don't make sense to me. For example, given the whole string: <INPUT TYPE="Text" name="Input_Vorname">
the matcher should extract only the field name, i.e. "Input_Vorname". I tried a lot of patterns, such as:
"name="(.*?)\"";
"<.*name=\"(.*)\".?>";
"<.*?name=\"(w+)\".*>";
"name=\".*\"";
and so on. But nothing (NOTHING) works. Either it finds everything or nothing. What's wrong?
Can somebody explain what I did wrong and where my train of thought went astray?
Roland

When you use the matches() method, the regex has to match the entire input, but if you use find(), the Matcher will search for a substring that matches the regex. Here's how you would use it:
String nameRegex = "name=\"(.*?)\"";
Pattern namePattern = Pattern.compile(nameRegex, Pattern.CASE_INSENSITIVE);
Matcher nameMatcher = namePattern.matcher(token);
if (nameMatcher.find()) {
    String fieldName = nameMatcher.group(1);
}
But the main issue is that you're using the wrong method(s) to retrieve the name. The start() and end() methods return the start and end positions of the entire match, but you're only interested in whatever was matched by the subexpression inside the parentheses (round brackets). That's called a capturing group, and groups are numbered according to the order in which they appear, so you should be using start(1) and end(1) instead of start() and end(). Or you can just use group(1), as I've done here, which returns the same thing as your substring() call.
Knowing that, you could go ahead and use matches(), with an appropriate regex:
String nameRegex = "<.*?name=\"(\\w+)\".*?>";
Pattern namePattern = Pattern.compile(nameRegex, Pattern.CASE_INSENSITIVE);
Matcher nameMatcher = namePattern.matcher(token);
if (nameMatcher.matches()) {
    String fieldName = nameMatcher.group(1);
}
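
To make the difference between group(1) and start(1)/end(1) concrete, here is a small self-contained sketch. The class and method names are invented for the example; the input is the tag from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldNameDemo {
    private static final Pattern NAME_ATTR =
            Pattern.compile("name=\"(.*?)\"", Pattern.CASE_INSENSITIVE);

    /** Extracts the name attribute via group(1), or returns null if absent. */
    static String viaGroup(String tag) {
        Matcher m = NAME_ATTR.matcher(tag);
        return m.find() ? m.group(1) : null;
    }

    /** Same result, but built from the positions of capturing group 1. */
    static String viaPositions(String tag) {
        Matcher m = NAME_ATTR.matcher(tag);
        return m.find() ? tag.substring(m.start(1), m.end(1)) : null;
    }

    public static void main(String[] args) {
        String tag = "<INPUT TYPE=\"Text\" name=\"Input_Vorname\">";
        System.out.println(viaGroup(tag));      // Input_Vorname
        System.out.println(viaPositions(tag));  // Input_Vorname
        // matches() needs the regex to cover the whole input, so this is false:
        System.out.println(NAME_ATTR.matcher(tag).matches());
    }
}
```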

Regex question

Hi,
I have a question regarding the regular expressions in java.
Let's say I have the following regex: "(one)|(two)|(three)" and the following string: "two". The string obviously matches the regex, because of the "\2" group. Is there any way to determine the group number that matched the string, without having to use something like:
for (int i = 1; i <= matcher.groupCount(); i++)
}

It's not top secret, the time difference is the problem.
It's for a school project. We have to make Pascal Compiler and the first step is the Lexical Analyzer. This means that I have some regular expressions for identifiers, numeric constants, string constants and so on...
For example the regex for the identifiers (variable name) looks like: "[a-zA-Z_][a-zA-Z0-9_]*", but the one for the key words is basically an array, like the one in my first post.
The regular expressions work fine, but for the next part of the project I need to know the index of the key words, within the key word array (which in my case is a regular expression). So this is why I was wondering if there is any way to get the group number, without having to iterate through the whole regex.
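
As far as I know, Matcher has no accessor that reports directly which alternative matched, so checking the groups in turn is the usual fallback. A minimal sketch of that check follows (the class and method names are hypothetical); testing start(i) against -1 avoids creating a String for every group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternativeIndex {
    private static final Pattern KEYWORDS = Pattern.compile("(one)|(two)|(three)");

    /** Returns the 1-based number of the group that matched, or 0 if the input does not match. */
    static int matchedGroup(String input) {
        Matcher m = KEYWORDS.matcher(input);
        if (!m.matches()) {
            return 0;
        }
        for (int i = 1; i <= m.groupCount(); i++) {
            // start(i) is -1 for a group that did not take part in the match
            if (m.start(i) != -1) {
                return i;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(matchedGroup("two"));  // 2
        System.out.println(matchedGroup("four")); // 0
    }
}
```

The loop only walks the group list once per match, so for a keyword table of typical size the cost is small compared to the matching itself.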

Simple Java regex question

I have a file with set of Name:Value pairs
e.g
Action1:fail
Action2:pass
Action3:fred
Using the regex package, I want to get the value of the name "Action1".
I have tried different things, but I cannot figure out how to do it. I can find out whether Action1: is present, but I don't know how to get the value associated with it.
I have tried:
Pattern pattern = Pattern.compile("Action1");
CharSequence charSequence = CharSequenceFromFile(fileName); // method retuning charsq from a file
Matcher matcher = pattern.matcher(charSequence);
if(matcher.find()){
int start = matcher.end(0);
System.out.println("matcher.group(0)"+ matcher.group(0));
How can I get the value associated with a specific tag?
thanks
anmol

Read the data from the text file line by line, and you can do:
String line; // obtain this somehow, e.g. from BufferedReader.readLine()
String[] keyPair = line.split(":");
System.out.println(keyPair[0]); // your name
System.out.println(keyPair[1]); // your value
Or, if you've got the text file in one big string:
String pattern = "(?m)^(\\w+):(\\w+)$"; // {name}:{value} on one line; (?m) makes ^ and $ match per line
// then
// do some things with match objects
// look in the API at java.util.regex
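
For the "one big string" case, a sketch of looking up a single name could look like this. The class and method names are made up, and it assumes one Name:Value pair per line as in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameValueLookup {
    /** Returns the value stored under the given name, or null if the name is absent. */
    static String valueOf(CharSequence text, String name) {
        // MULTILINE makes ^ and $ match at line boundaries; Pattern.quote
        // protects against regex metacharacters in the name.
        Pattern p = Pattern.compile("^" + Pattern.quote(name) + ":(.*)$", Pattern.MULTILINE);
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "Action1:fail\nAction2:pass\nAction3:fred\n";
        System.out.println(valueOf(text, "Action1")); // fail
    }
}
```

Because the method accepts a CharSequence, it should also work with the CharSequence the question already builds from the file.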

Java Regex Question (HTML Tokenizing)

Hello
I would like to tokenize an HTML page into its HTML tags and could not find any working expression. I tried it with:
<[.]*>
and for all input fields:
<(INPUT.*)>
But it either finds nothing or finds everything.
Can somebody help me?

</?\S+?[\s\S+]*?>
"/?" means: "/" may be present but doesn't have to be
"\S" means: any character that isn't whitespace
"+" means: the previous item must occur at least once
the "?" after the "+" means: match as few of the previous item as needed to satisfy the regex (lazy matching).
That's why <adf>sdf> isn't matched as a whole: <adf> is the shortest string that satisfies the regex.
"[]" means: a character class, which matches any single one of the characters listed inside it (here whitespace, non-whitespace, or a literal "+", so effectively any character)
"\s" means: a whitespace character
"*" means: the previous item (here the character class) may occur any number of times, including zero
"*?" is the lazy form of "*", just as "+?" is the lazy form of "+"

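To try the tag pattern from this thread in Java, something like the following sketch should do. The class and method names are invented, and the pattern is only a rough tokenizer: it will be confused by ">" inside attribute values or comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagTokenizer {
    // The pattern from the answer, with backslashes doubled for a Java string literal.
    private static final Pattern TAG = Pattern.compile("</?\\S+?[\\s\\S+]*?>");

    /** Returns every tag-like substring of the input, in order of appearance. */
    static List<String> tags(String html) {
        List<String> result = new ArrayList<String>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tags("<p>Hello <b>world</b></p>")); // [<p>, <b>, </b>, </p>]
        System.out.println(tags("<adf>sdf>"));                 // [<adf>]
    }
}
```
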
OT: Regex Question

I'm doing a series of search and replace operations with Dreamweaver and wondered if anyone can suggest a regular expression for a particular situation.
The following URL is fine as it is:
<td><a href="http://www.geoworld.org/Brazil" title="Brazil">Brazil</a></td>
However, I need to replace the spaces in this URL with underscores...
<td><a href="http://www.geoworld.org/Central African Republic" title="Central African Republic">Central African Republic</a></td>
The finished URL should look like this:
<td><a href="http://www.geoworld.org/Central_African_Republic" title="Central African Republic">Central African Republic</a></td>
In other words, I want to replace ALL spaces in the URL proper with underscores, but I want to leave the spaces in the title attributes and visible text alone. Does anyone know a regular expression that will do this?
Thanks.

Find:
(href="[^"]+)\s([^"]+")
Replace:
$1_$2
This will replace one space with an underscore in each href attribute. Run the same regex several times until no more instances are found.
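
The "run it until nothing changes" step can be automated. Here is a sketch in Java rather than Dreamweaver (the class and method names are hypothetical) that repeats the same find/replace pair until the text stops changing.

```java
import java.util.regex.Pattern;

class HrefSpaces {
    // Same expression as the Find field above, escaped for a Java string literal.
    private static final Pattern ONE_SPACE =
            Pattern.compile("(href=\"[^\"]+)\\s([^\"]+\")");

    /** Replaces every whitespace character inside href values with an underscore. */
    static String underscoreHrefs(String html) {
        String previous;
        String current = html;
        do {
            previous = current;
            // Each pass fixes one space per href attribute.
            current = ONE_SPACE.matcher(previous).replaceAll("$1_$2");
        } while (!current.equals(previous));
        return current;
    }

    public static void main(String[] args) {
        System.out.println(underscoreHrefs(
                "<a href=\"http://www.geoworld.org/Central African Republic\" "
                + "title=\"Central African Republic\">Central African Republic</a>"));
    }
}
```

The title attribute and the link text are left alone because the pattern only starts matching at the literal href=" and cannot cross a double quote.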

Java Regex Question

I wanted to do some regex to see if a string has a subdomain.
I want to pass string then check if there is a xxx.example.com or if it's just example.com. Anyone have a clue?
Thanks,
Brian

I just went around and used the split method to check. I'm posting my code in case someone else has this problem and is limited to the 1.4 JDK.
String[] split = domain.split("[.]");
if (split.length > 2)
    domain = split[split.length - 2] + "." + split[split.length - 1];
Basically, what I wanted to do was see if it was a subdomain and then strip the preceding part to get to the actual domain.
Thanks for the replies.

Regex question (does not contain)

Can anyone tell me what regular expression I could use with Dreamweaver to search for files that do NOT contain the word "physiology"? Ideally, I'd like to find pages that don't contain any variation - physiology, Physiology or PHYSIOLOGY. However, if you have time to show me a couple regex's, including one that's case-sensitive, that would be great.
I've tried the following two "negative lookaround" regex's without success:
^(?:(?!Physiology).)*$
^(?!.*Physiology).*
I think they're both designed to work with strings, not with entire files.
Thanks.

Not sure how to do this in DW but I suggest try using Windows FindStr function as explained here:
http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/findstr.mspx?mfr=true
Adapt the method to suit your needs.
Good luck.
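
On the suspicion that the two lookahead patterns only work on single lines: in most regex flavors "." does not match a line break by default, which would explain that. The sketch below shows the idea in Java, not Dreamweaver, so treat it as an illustration only; the class and method names are made up. The (?s) flag lets "." cross line breaks and (?i) makes the test case-insensitive.

```java
import java.util.regex.Pattern;

class DoesNotContain {
    // \A anchors at the very start of the input; the negative lookahead
    // rejects the whole input if "physiology" occurs anywhere in it.
    private static final Pattern NO_PHYSIOLOGY =
            Pattern.compile("(?is)\\A(?!.*physiology).*");

    /** Returns true if the content does not contain the word in any letter case. */
    static boolean lacksWord(String content) {
        return NO_PHYSIOLOGY.matcher(content).matches();
    }

    public static void main(String[] args) {
        System.out.println(lacksWord("Intro to anatomy\nChapter 1"));    // true
        System.out.println(lacksWord("Intro to PHYSIOLOGY\nChapter 1")); // false
    }
}
```

Dropping the "i" from the flags gives the case-sensitive variant.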

Regex question: How do I insert commas between meta data?

Current search engine is being replaced with Google Search Appliance (GSA). It requires meta data to be separated by a comma + space, whereas the previous search engine required only a space. For example:
<meta name="C_NAME" content="Screen1 Screen2">
must become
<meta name="C_NAME" content="Screen1, Screen2">
There are 17 unique screen names and each of 2500 html files may have one or more screen names identified in that meta tag field.
I am hoping for some regular express magic to help me with that global search/replace effort. Suggestions are greatly appreciated.
Thanks,
Rick
================================
Nevermind... figured it out. Just needed to study regex syntax a bit. Here's the answer:
Find: <meta name="C_NAME" content="(\w+)\s(\w+)\s
Replace: <meta name="C_NAME" content="$1, $2,
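
The find/replace pair above handles a fixed number of names per pass. If the files can be processed programmatically, a sketch like the following handles any number of screen names at once. It is written in Java with hypothetical class and method names, and assumes the attribute order and quoting shown in the example tag.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetaCommas {
    private static final Pattern C_NAME =
            Pattern.compile("(<meta name=\"C_NAME\" content=\")([^\"]*)(\")");

    /** Rewrites the C_NAME content value so that names are separated by ", ". */
    static String addCommas(String html) {
        Matcher m = C_NAME.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Splitting on commas as well as whitespace keeps the rewrite
            // safe to run on files that were already converted.
            String[] names = m.group(2).trim().split("[,\\s]+");
            String joined = String.join(", ", names);
            m.appendReplacement(out,
                    Matcher.quoteReplacement(m.group(1) + joined + m.group(3)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(addCommas("<meta name=\"C_NAME\" content=\"Screen1 Screen2 Screen3\">"));
    }
}
```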


Regex question: replace

Hi,
I'm getting into java.util.regex lately. Having used Perl for regex I'm trying to get familiar with Java's regex "spirit".
Concerning replacement we can use replaceAll or replaceFirst however:
- what if I want to replace only the third or fourth element?
- what if I want to replace second to fourth element?
In Perl we use "regex_expression_here for 2..4;", for instance.
If you have some interesting websites/tutorials related to Java regex, that would be great.
Thanks for your help.
Rgds,
SR

Yep,
here is a sample of replacement in Perl
$Line =~ s/\]/|/ for 2..4; #Replace 2nd 'til 4th delimiter (]) with pipe (|)
....Based on the reference I gave earlier
import java.util.regex.*;

/**
 * A rewriter does a global substitution in the strings passed to its
 * 'rewrite' method. It uses the pattern supplied to its constructor,
 * and is like 'String.replaceAll' except for the fact that its
 * replacement strings are generated by invoking a method you write,
 * rather than from another string.
 * This class is supposed to be equivalent to Ruby's 'gsub' when given
 * a block. This is the nicest syntax I've managed to come up with in
 * Java so far. It's not too bad, and might actually be preferable if
 * you want to do the same rewriting to a number of strings in the same
 * method or class.
 * See the example 'main' for a sample of how to use this class.
 * @author Elliott Hughes
 */
public abstract class Rewriter_1 {
    private Pattern pattern;
    private Matcher matcher;

    /**
     * Constructs a rewriter using the given regular expression;
     * the syntax is the same as for 'Pattern.compile'.
     */
    public Rewriter_1(String regularExpression) {
        this.pattern = Pattern.compile(regularExpression);
    }

    /**
     * Returns the input subsequence captured by the given group
     * during the previous match operation.
     */
    public String group(int i) {
        return matcher.group(i);
    }

    /**
     * Overridden to compute a replacement for each match. Use
     * the method 'group' to access the captured groups.
     */
    public abstract String replacement(int index);

    /**
     * Returns the result of rewriting 'original' by invoking
     * the method 'replacement' for each match of the regular
     * expression supplied to the constructor.
     */
    public String rewrite(CharSequence original) {
        this.matcher = pattern.matcher(original);
        StringBuffer result = new StringBuffer(original.length());
        int index = 0;
        while (matcher.find()) {
            matcher.appendReplacement(result, replacement(++index));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] arguments) {
        String result = new Rewriter_1("\\|") {
            public String replacement(int index) {
                if ((index >= 3) && (index <= 5))
                    return "y";
                else
                    return group(0);
            }
        }.rewrite("| | | | | |");
        System.out.println(result);
    }
}

Regex question. Please help!

I'm trying to capture instances like
&_l_t_;something&_g_t_;
&_l_t_;blahblah&_g_t_;
I tried the following regular expressions:
"&_l_t_;.+?&_g_t_;"
"\\&_l_t_;.+?\\&_g_t_;"
but neither worked.
In the code above, the underscore characters should not be there; I was not able to post my message correctly unless I used underscores to connect the characters '&', 'l', 't', ';'.
Please help!

import java.util.regex.*;

public class TagCheck {
  public static void main(String[] args) {
    String[] codes = {
      "&_lt;html&_gt;", "a&_lt;b", "abc", "&_lt;head&_gt;", "c&_gt;d"
    };
    String code = "^(&_lt;).*(&_gt;)$";
    Pattern codePattern = Pattern.compile(code);
    Matcher match;
    for (int j = 0; j < codes.length; j++) {
      match = codePattern.matcher(codes[j]);
      System.out.println("codes[" + j + "] = " + codes[j]);
      if (match.find()) {
        // group 0 is the whole match; groups 1 and 2 are the two delimiters
        for (int k = 0; k <= match.groupCount(); k++) {
          System.out.println("\t\t\tgroup " + k + " = " + match.group(k));
        }
      }
    }
  }
}

Regex question. How do I capture this pattern?

Hi
How do I capture pattern of strings like this:
<TagA attr <InnerTagB attr> attr>
Thanks!

static private String regex = ".*?(<[^<>]*(?:<.*?>)*.*?>).*";

Regex question; $1, $2, etc

Hi,
If I have the following regex Pattern set up:
Pattern title = Pattern.compile("<title>([^<]+)</title>");
and I want what's within the parentheses to be stored as a variable, the way it would be in Perl:
my $title = $1;
or whatever, how do I do that in Java? I couldn't find it in any of the regex tutorials I was looking at.
thanks,
bp

The JDK regex package doesn't store captured groups in local variables like Perl does. Instead, you have to retrieve them from the Matcher using the group(int) methods. However, you can use $1, $2, etc. in the replacement string when you do a replaceAll or replaceFirst, and the Matcher will replace them with the appropriate captured groups.
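
A short sketch of both uses, with the pattern from the question. The class name, method names and the h1 replacement are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TitleCapture {
    private static final Pattern TITLE = Pattern.compile("<title>([^<]+)</title>");

    /** The Java counterpart of Perl's "my $title = $1;": returns null if there is no title. */
    static String title(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    /** $1 is understood inside the replacement string of replaceAll. */
    static String titleToHeading(String html) {
        return TITLE.matcher(html).replaceAll("<h1>$1</h1>");
    }

    public static void main(String[] args) {
        String html = "<head><title>Regex FAQ</title></head>";
        System.out.println(title(html));          // Regex FAQ
        System.out.println(titleToHeading(html)); // <head><h1>Regex FAQ</h1></head>
    }
}
```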

A regex question

Hi all,
I'm trying to write a regular expression in Java that replaces commas with '#' only inside parentheses in a given string.
eg:
original string:"aaa,bbb,to_char(p.sss,'999,999,999.9999') sss,ddd,to_char(eee,'999,999'),fff"
desired output:"aaa,bbb,to_char(p.sss#'999#999#999.9999') sss,ddd,to_char(eee#'999#999'),fff"
After some research, I got this: "(?<=$[^$]{1,50}),(?=[^$]{1,100}$)". This one works fine in regex tools such as RegexBuddy, etc.
However, the Java program (JDK 1.5.0) does not seem to work correctly:
     public static void main(String[] args) {
          String str="aaa,bbb,to_char(p.sss,'999,999,999.9999') sss,ddd,to_char(eee,'999,999'),fff";
          System.out.println(str);
          System.out.println("-------------");
          str=str.replaceAll("(?<=\$[^\$]{1,100}),(?=[^\$]{1,100}\$)", "#");
          System.out.println(str);
     }
The output is still "aaa,bbb,to_char(p.sss,'999,999,999.9999') sss,ddd,to_char(eee,'999,999'),fff"
It seems there is something wrong with the "positive lookahead", but as far as I know, Java supports this kind of regex: (?<=\$[^\$]{1,100}?)
Any ideas?
Thanks!

Right: we consume some of the text with one part of the regex to prevent the other part from seeing it. Here's the breakdown I promised:
With the lookaround approach, we were essentially locating a comma first, then looking backward and forward to figure out whether we should replace it. Since the lookbehind turned out to be unreliable, we need to start matching at some point before the comma, in such a way that all of the ineligible commas either get ignored, or get matched within a capturing group so we can plug them back into the replacement string. The first thing we need to do is match everything up to the first open-parenthesis, because we know we can ignore any commas before that point. As a standalone regex, that part would look like this:
"[^(]+\\("
Once we're inside the parens, we can go ahead and match everything up to the next comma. In case we find a set of parentheses with no commas in it, we also add the close-paren to the negated character class:
"[^),]+,"
That works fine for the first match, but it will break down after that because the first part will match everything up to the next open-paren, including the rest of the contents of the first set of parens. That part was meant to fail within parens; that's why it's optional. Since it's required to match an open-paren, and the only way it can reach the next one of those is to match the intervening close-paren, we can fix it by adding the close-paren to the open-paren in the character class:
"[^()]+\\("
And that's all we really need. Once the last comma inside the parens is matched, the first part of the regex takes us up to the next open-paren, where the second part takes over again. The lookbehind turns out not to be necessary once the rest of the regex is properly tuned; it was left over from my earlier attempts to create a working regex. The open-paren in the second character class isn't really needed either, but it doesn't hurt anything and it helps express our intentions. And, as I said earlier, the possessive quantifiers just make the regex a little more efficient.
str = str.replaceAll("((?:[^()]++\\()?+[^(),]++),", "$1#");
Although we developed this regex as a replacement for a lookaround-based one, I would encourage everyone to look for non-lookaround solutions first. Despite all the enhancements that have been made to regexes over the years, they still work best when used in a forward-looking, positive-matching style like what we ended up with here.
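
For anyone who wants to check the final regex against the string from the question, here is a small harness. The class and method names are made up, and like the regex itself it assumes the parentheses are not nested.

```java
class ParenCommas {
    /** Replaces commas with '#' inside (non-nested) parentheses only. */
    static String hashCommasInParens(String s) {
        // Optional first part: skip ahead to the next open-paren.
        // Second part: consume up to the next comma without crossing a paren.
        return s.replaceAll("((?:[^()]++\\()?+[^(),]++),", "$1#");
    }

    public static void main(String[] args) {
        String str = "aaa,bbb,to_char(p.sss,'999,999,999.9999') sss,ddd,to_char(eee,'999,999'),fff";
        System.out.println(hashCommasInParens(str));
        // aaa,bbb,to_char(p.sss#'999#999#999.9999') sss,ddd,to_char(eee#'999#999'),fff
    }
}
```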

REGEX: question about finding Overlapping matches using regular expressions

I have the following problem.
Say for my pattern I use:
Pattern pattern = Pattern.compile("AAA");
Matcher matcher = pattern.matcher("AAAAAA");
when I run a loop
while (matcher.find())
    System.out.println("Match Found: "+matcher.start()+" "+matcher.end());
I get 2 hits, shown in the following output:
Match Found: 0 3
Match Found: 3 6
therefore the regex is seeing the first AAA then the second AAA.
I want it to find the other AAA's in there that are overlapping the other two finds i.e. I want the output to find
AAA from 0 to 3
AAA from 1 to 4
AAA from 2 to 5 and finally
AAA from 3 to 6
thereby including the overlapping finds.
How can I do this using regex? what am I missing that prevents the overlapping matches to be found? Do I need a quantifier?
Thanks for the help!

While the solutions above work fine with the given input, they don't really find all overlapping matches. They just find the longest possible match at each start position. Here's a more thorough approach:
import java.util.*;
import java.util.regex.*;

public class Test {
  public static List<String> matchAllWays(String rgx, String str) {
    Pattern p = Pattern.compile(rgx);
    Matcher m = p.matcher(str);
    List<String> result = new ArrayList<String>();
    int len = str.length();
    int start = 0;
    int end = len;
    while (start < len && m.region(start, len).find()) {
      start = m.start();
      do {
        result.add(m.group());
        // shrink the region from the right to look for shorter matches
        end = m.end() - 1;
      } while (end > start && m.region(start, end).find());
      start++;
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> matches = matchAllWays("a.*a", "abracadabra");
    System.out.println(matches);
  }
}
This approach requires JDK 1.5 or later; that's when the regions API was added to Matcher.
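
For the simpler case in the original question (one match per start position), a commonly used trick is to wrap the pattern in a lookahead with a capturing group. The sketch below uses made-up class and method names. Since the lookahead consumes nothing, each find() resumes one character after the previous start.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingMatches {
    /** Returns "start end" for one match of the regex at every start position. */
    static List<String> spans(String regex, String text) {
        Matcher m = Pattern.compile("(?=(" + regex + "))").matcher(text);
        List<String> found = new ArrayList<String>();
        while (m.find()) {
            // The overall match is empty; group 1 holds what the lookahead saw.
            found.add(m.start(1) + " " + m.end(1));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(spans("AAA", "AAAAAA")); // [0 3, 1 4, 2 5, 3 6]
    }
}
```

Like the earlier solutions mentioned above, this yields only one match per start position, so it does not replace the region-based approach for patterns that can match several lengths.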
