Regex - matching literal characters

I'm trying to match the following pattern using regex:
The string begins with a literal '\', is followed by any number of letters and/or numbers, and ends with '&0]'
e.g. '\07761739009B&0]'
I'm trying to devise my pattern, but I'm not exactly sure how to match literal characters. I was led to believe that a '//' would mark the character as literal, but this doesn't work:
Pattern Serial = Pattern.compile("(\\/.*+\\&0])");
Thanks in advance for any advice

\ is an escape character both in Java string literals and in regex.
"\\" produces a String containing a single \ character. But for a literal \, regex needs \\. So "\\\\" produces a single string containing \\ which in regex becomes a single literal \.
Also, I don't think you need to escape &. And you might need to escape ] but I'm not sure--it might be okay bare if there was no preceding [.
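To make the double escaping concrete, here is a minimal sketch of a pattern for the string described in the question. The class and method names are made up for illustration, and the character class assumes that "letters and/or numbers" means ASCII letters and digits. The ] is escaped here only to be safe; the & is left bare.

```java
import java.util.regex.Pattern;

class SerialPatternDemo {
    // Regex as the engine sees it:  \\[A-Za-z0-9]*&0\]
    // Each regex backslash is doubled once more inside the Java string literal.
    static final Pattern SERIAL = Pattern.compile("\\\\[A-Za-z0-9]*&0\\]");

    // True only if the whole string is: backslash, letters/digits, then "&0]".
    static boolean isSerial(String s) {
        return SERIAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        // "\\0776..." in Java source is the text \0776... at run time.
        System.out.println(isSerial("\\07761739009B&0]"));
        System.out.println(isSerial("07761739009B&0]")); // no leading backslash
    }
}
```

matches() requires the entire input to fit the pattern; use find() instead if the serial may be embedded in a longer line.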

Similar Messages

Regex Matching Involving Unicode

Hi,
I'm trying to do a regex match using boost::regex and followed the instructions on http://niemannross.com/developer/wiki/index.php?title=Using_boost_regular_expressions_(regexp)_in_InDesign_CS/CS2/CS3_plug-in_code
My regex needs to match a line that ends with punctuation characters and return the string excluding the ending punctuation characters.
ex. home -> home
ex regex: (.*?)[ \\x{201C}\\x{201D}]+$
However, it does not match the line.
I tried using boost::u32regex, but I'm getting a boost::icu_regex_traits::translate_nocase symbols not found error on linking.
How can I get around this problem?
Thanks in advance!
-- Jeff

Escaping the backslash in '\x' is necessary for your programming language, otherwise it is interpreted as a 'real' hex character. So as it is, this feeds '\x{201C}' into your program, rather than the literal 0x201C code. (It'd be a syntax error for C, but you get the point.)
However: because this is an expression IN GREP inside your running program, I think you have to escape it again, so it might need double double backslashes. Scripts suffer the same problem.

Regex matching bug?

It seems like j2sdk1.4.2b has a serious regex matching bug with strings that contain Unicode characters. In my case, the string contained some Turkish characters.
The regex is simply <[^>]*>, which matches runs of text enclosed in <>
(ex. <field>)
Although the match succeeds with j2sdk1.4.1_02, it doesn't match text containing Unicode characters with 1.4.2b.
What do you think? Is this a bug or could I be missing something?

ahmeti, did you submit a bug report on this? I finally figured out that it definitely is a bug in the Pattern class. They added a new node type to make matching ASCII characters in character classes more efficient, but they screwed up the match condition: it always returns false if the character it's looking at is not ASCII, even if the class has been negated. I'll go ahead and submit a report myself unless I hear from you.

Regex replacing multiple characters in string.

I have been working through the Java regex tutorial and tried to modify one of the programs for my own use. Basically, I want to take a string and convert the characters A to T, T to A, C to G and G to C.
I produced the rather crude program below, but of course it doesn't work. A could be converted to T and back again before the program terminates.
I know that the code to do this correctly is probably quite complex, so could anyone point me in the direction of a tutorial which will help me to do this?
This aside, I take it that even for multiple replacements that don't suffer from the problem above, my code is too bloated anyway. Say, for example, instead of replacing A to T, T to A, C to G and G to C, I wanted to replace dog-cat, horse-donkey, lion-tiger, cat-mouse. My code will work for this, but I am sure that it could be compressed a lot. Surely I would not need all these lines of code to do this?
Thanks for any help,
Tim
import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.io.*; // needed for BufferedReader, InputStreamReader, etc.

/** A Java program that demonstrates console based input and output. */
class dna {
    // Create a single shared BufferedReader for keyboard input
    private static BufferedReader stdin =
        new BufferedReader( new InputStreamReader( System.in ) );

    // Program execution starts here
    public static void main ( String [] args ) throws IOException {
        // Prompt the user
        System.out.print( "Type your DNA sequence: " );
        // Read a line of text from the user.
        String DNA = stdin.readLine();
        DNA = DNA.toUpperCase();
        String DNA2 = DNA;
        // calculate the complement
        Pattern A = Pattern.compile("A");
        Pattern T = Pattern.compile("T");
        Pattern C = Pattern.compile("C");
        Pattern G = Pattern.compile("G");
        // The replacements run one after another, so every A that became T
        // is turned back into A by the next step (the problem described above).
        Matcher AA = A.matcher(DNA);
        DNA = AA.replaceAll("T");
        Matcher TT = T.matcher(DNA);
        DNA = TT.replaceAll("A");
        Matcher CC = C.matcher(DNA);
        DNA = CC.replaceAll("G");
        Matcher GG = G.matcher(DNA);
        DNA = GG.replaceAll("C");
        // Display the input back to the user.
        System.out.println( "DNA input             : " + DNA2);
        System.out.println ("Complementary sequence: " + DNA);
    }
}

TimM wrote:
> Thanks a lot!!! Can't believe you managed all that with so few lines of code.
You're welcome.
> Must be great to know what you are doing :-)
> Thanks again,
> Tim
After being a bit more familiarised with the methods of String, you'll be able to do this in no time, I'm sure!
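The short solution that Tim is thanking the replier for is not preserved in this thread. As a minimal sketch of one way to avoid the "A becomes T and then A again" problem, the following walks the string once and maps each base independently, so no character is ever replaced twice. The class and method names are made up for illustration and are not the code from the missing reply.

```java
class DnaComplement {
    // Maps each base exactly once, so A->T and T->A cannot undo each other.
    static String complement(String dna) {
        StringBuilder out = new StringBuilder(dna.length());
        for (char base : dna.toUpperCase().toCharArray()) {
            switch (base) {
                case 'A': out.append('T'); break;
                case 'T': out.append('A'); break;
                case 'C': out.append('G'); break;
                case 'G': out.append('C'); break;
                default:  out.append(base); // leave anything else untouched
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(complement("AATCGG")); // prints TTAGCC
    }
}
```

For word-level replacements such as dog-cat and cat-mouse, the same single-pass idea applies: look each match up in a Map inside a Matcher loop rather than chaining replaceAll calls.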

Question on regex Matcher (group number)

HI, everybody
I am writing a program on replacement like the one below.
String regex = "(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)";
String original = "ABCDEFGHIJKL";
String replacement = "$12";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(original);
String result = m.replaceFirst(replacement);
What I actually want is to take the first group, in this case an "A", and append the character "2" after it.
The result I am expecting is "A2", but the result I get is "L", because the regex engine reads "$12" as a reference to the 12th group.
How can I remove the ambiguity?
Thanks.

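In that case, use "$1\\2" as the replacement string. The backslash escapes the 2, so it is treated as a literal character rather than as part of the group number.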
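A small sketch comparing the two replacement strings on the data from the question. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupThenDigitDemo {
    // Applies the given replacement string to the 12-group pattern from the question.
    static String replace(String replacement) {
        Pattern p = Pattern.compile("(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)");
        Matcher m = p.matcher("ABCDEFGHIJKL");
        return m.replaceFirst(replacement);
    }

    public static void main(String[] args) {
        // "$12" is read as group 12 because the pattern has twelve groups.
        System.out.println(replace("$12"));    // prints L
        // "$1\\2" is group 1 followed by an escaped, literal 2.
        System.out.println(replace("$1\\2"));  // prints A2
    }
}
```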

Capture regex match as a variable?

Hello!
I have this program and I basically want to match a part of a string and grab the match as a variable. In this case, the string I need to parse is 'foo'.
Here is what I have:
public class Test
{
     public static void main(String[] args)
     {
          // link <link> format
          String foo = "http://www.foo.com <http://www.foo.com>";
          String the_regex = "\\<(http://[^\\>]*)\\>";
          String the_replacement = "<a href=\"$1\">$1</a>";
          System.out.println(foo.replaceAll(the_regex,the_replacement));
     }
}
$1 (sorta like PERL) should be the captured text from the_regex
Any ideas?
Thanks in advance.

Dubwai - I think I got it, thanks for the guidance. Here is what I used, and it seems to work. Thanks!
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test
{
     public static void main(String[] args)
     {
          String foo = "http://www.foo.com <http://www.boo.com>";
          String regex="\\<(http://[^\\>]*)\\>";
          Pattern p = Pattern.compile(regex);
          Matcher m1 = p.matcher(foo);
          // group(1) is the text captured by the first parenthesised group
          while (m1.find())
          {
               System.out.println("The site = " + m1.group(1));
          }
     }
}

[bug]Jdev 11g:NullPointerException at java.util.regex.Matcher.getTextLength

Hi,
Jdev 11.1.1.0.31.51.56
If any of you get the following stack trace when running a jspx page that uses ViewCriteriaRow.setOperator, there is bug 7534359 and Metalink note 747353.1 available.
java.lang.NullPointerException
at java.util.regex.Matcher.getTextLength(Matcher.java:1140)
at java.util.regex.Matcher.reset(Matcher.java:291)
at java.util.regex.Matcher.<init>(Matcher.java:211)
at java.util.regex.Pattern.matcher(Pattern.java:888)
at oracle.adfinternal.view.faces.model.binding.FacesCtrlSearchBinding._loadFilterCriteriaValues(FacesCtrlSearchBinding.java:3695)
Truncated. see log file for complete stacktrace
Workaround:
If you use
        vcr.setAttribute("Job",job);
or
        vcr.setAttribute("Job","="+job);
then add the following line of code:
        vcr.setOperator("Job","=");
regards
Peter

Hi,
useful to mention that this happens when setting the equal operator or LIKE operator
vcr.setAttribute("Job","= '"+job+"'");
or
vcr.setOperator("Job","=");
Frank

How to replace regex match into a char value (in the middle of a string)

Hi uncle_alice and other great regex gurus
One of my friends has a peculiar problem and I can't give him a solution.
Using String#replaceAll(), i.e. NOT a Matcher loop, how could we convert matched digit string such as "65" into a char of its numeric value. That is, "65" should be converted into letter 'A'.
Here's the failing code:
public class GetChar{
public static void main(String[] args){
    String orig = "this is an LF<#10#> and this is an 'A'<#65#>";
    String regx = "(<#)(\\d+)#>";
    //expected result : "this is an LF\n and this is an 'A'A"
    String result = orig.replaceAll(regx, "\\u00$2");
    // String result = orig.replaceAll(regx, "\\\\u00$2"); //this also doesn't work
    System.out.println(result);
}
}

> I don't know that we have lost anything substantial.
> i think its just that the kind of task this is
> especially useful for is kind of a blind-spot in the
> range of things java is a good-fit for (?)
> for certain tasks (eg process output munging) an
> experienced perl programmer could knock up (in perl)
> using built-in language features a couple of lines
> which in java could takes pages to do. If the cost is
> readability/maintainability/expandability etc.. then
> this might be a problem, but for a number of
> day-to-day tasks it isn't
> i'm trying to learn perl at the moment for this exact
> reason :)
Yes. And when a Java source-code processor (a.k.a. compiler) sees code like:
line = line.replaceAll(regexp, new String(new char[] {(char)(Integer.parseInt("$1"))}));
or,
line = line.replaceAll(regexp, doMyProcessOn("$1")); //doMyProcess returns a String
common sense should have told him that "$1" isn't a literal string "$1" in this regular expression context.
By the way, I abhor Perl code because of its incomprehensibility. It can't be read with average common sense. Java code can be, sort of ...
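For reference, the conversion itself is straightforward once the restriction to String#replaceAll() is dropped. The following is a minimal sketch using a Matcher loop with appendReplacement, which lets ordinary Java code compute each replacement. The class and method names are made up for illustration, and the sketch assumes the digits always form a valid char code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharCodeDemo {
    // Replaces each <#NN#> marker with the character whose numeric value is NN.
    static String decode(String input) {
        Matcher m = Pattern.compile("<#(\\d+)#>").matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1));
            // quoteReplacement keeps '$' and '\' in the result from being
            // interpreted as group references or escapes.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("this is an LF<#10#> and this is an 'A'<#65#>"));
    }
}
```

A replacement string passed to replaceAll can only copy captured text, which is why no combination of escapes in the second argument produces the computed character.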

A problem with regex and special characters

Hello,
I am using regex in my application, but I have a problem with special characters. Here is what I am doing:
I have a piece of text that I want to parse, replacing every occurrence of a given word with a tag that contains the word found.
For example, given the text:
go Going Go to gOschool by bus and to learn and to play GO Go
replacing the word "go" (case insensitive and only at word boundaries) should give:
<start>go<end> Going <start>Go<end> to gOschool by bus and to learn and to play <start>GO<end> <start>Go<end>
Consider the following code, and call the method with the parameter "go?"
The Matcher finds a weird match in the word "G?oing", consisting of only the letter G!
It also ignores the "?" in the pattern completely.
If you have any clue about what is happening, I would be very grateful.
private static String replaceMatches(String strToFind)
{
        String resultArticle="";
        String article = " "+"go? G?oing Go? to gOschool by bus and to learn and to play GO? Go?*"+" ";
        // "?" is a regex quantifier here, so "go?" means "g followed by an optional o"
        strToFind = "\\b"+ strToFind +"\\b";
        String linkPart1= "<start>";
        String linkPart2 = "<end>";
        Pattern p = null;
        try{
            p=Pattern.compile(strToFind, Pattern.CASE_INSENSITIVE);
            Matcher m = p.matcher(article);
            String[] res = p.split(article);
            int i=0;
            //System.out.println("result of split: "+res.length );
            while(m.find())
            {
                resultArticle+=(res[i]+" ");
                resultArticle+=linkPart1;
                resultArticle+=m.group().trim();
                resultArticle+=(linkPart2+" ");
                i++;
            }
            if(i<res.length)
                resultArticle+=res[i];
            //System.out.println("result of match: " + i);
            System.out.println(article);
            //System.out.println(resultArticle.trim()+scripts);
        }
        catch(PatternSyntaxException ex){}
        return resultArticle.trim();
}
Thanks

tarek.mamdouh wrote:
> because split will not work when trying to replace the first word if i don't append a space at the beginning.
Split doesn't work anyway. And my question wasn't why you add spaces (which you really don't need to do), but why you add them with " " + "go" rather than just " go".
> replaceAll will replace all the occurrences in the text with only one word. without taking into consideration the case of the word i need to replace.
No.
> If i use replaceAll(article, strToFind) the output will be:
> <start>go?<end> G?oing <start>go?<end> to gOschool by bus and to learn and to play <start>go?<end> <start>go?<end>
No. I showed you the actual output of an actual replaceAll.
> which is not what i want as i need to keep the case of the words i am replacing
The replaceAll I showed you does that.
Please study the examples given and read the docs carefully rather than making claims based on inaccurate guesses.
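The replaceAll call referred to above is not preserved in this thread. As a sketch of what such a call could look like, the version below uses $0 (the whole match) in the replacement so that the original case is kept, and Pattern.quote so that characters such as "?" in the search word are treated literally. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class TagWordDemo {
    // Wraps every whole-word, case-insensitive occurrence of word in <start>...<end>.
    static String tag(String text, String word) {
        // Pattern.quote turns the word into a literal, so "?" or "." lose
        // their special regex meaning.
        String regex = "(?i)\\b" + Pattern.quote(word) + "\\b";
        // $0 is the text that actually matched, so its case is preserved.
        return text.replaceAll(regex, "<start>$0<end>");
    }

    public static void main(String[] args) {
        String article = "go Going Go to gOschool by bus and to learn and to play GO Go";
        System.out.println(tag(article, "go"));
    }
}
```

One caveat: \b only matches between a word character and a non-word character, so a search word that itself ends in punctuation (such as "go?") followed by a space will not match with a trailing \b. A lookahead for whitespace or end of input would be needed in that case.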

Regex matches function

Hi
I am trying to come up with a regex that I can use with the matches function to validate the user IDs I accept. A user ID can contain letters, numbers and 3 special characters: ".", "-" and "#".
The regex I came up with was: user_id.matches("[a-zA-Z\\d\\.\\-#]"). The string I am trying to match is 'user-1', but this fails to match.
I am not confident about the regex I am using to match my string. Please let me know what I am doing wrong.
Thanks

I would use '+' rather than '*'; I doubt that an empty string would be considered a valid user ID. ^_^
String regex = "[a-zA-Z0-9#.-]+";
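The underlying issue is that a character class without a quantifier matches exactly one character, while String.matches() must match the whole string. A short sketch contrasting the two patterns follows; the class and method names are made up for illustration.

```java
class UserIdDemo {
    // One or more of: letters, digits, '#', '.', '-'.
    // The '-' is last in the class, so it needs no escape.
    static boolean isValid(String id) {
        return id.matches("[a-zA-Z0-9#.-]+");
    }

    public static void main(String[] args) {
        // Original pattern: no quantifier, so only one-character IDs can match.
        System.out.println("user-1".matches("[a-zA-Z\\d\\.\\-#]")); // prints false
        System.out.println(isValid("user-1"));                      // prints true
        System.out.println(isValid(""));                            // prints false
    }
}
```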

Regex matcher class

Hi
I have a simple problem in regex.
Whenever I try to run this piece of code I get an IllegalStateException:
Matcher m = p.matcher(" absdsdfksj ");
while (m.find()) {
     System.out.println("At loc : " + m.start());
     System.out.println("Found : " + m.group());
}
But if I combine these two console print lines into one line, then I don't get any exception and it runs fine:
while (m.find()) {
     System.out.println("At loc : " + m.start() + " " + m.group());
}
Please clarify the difference.
Thanks in advance
Gaurav

There must be more to the problem, because I can run the following without problems:
        Pattern p = Pattern.compile("s");
        Matcher m = p.matcher(" absdsdfksj ");
        while (m.find()) {
            System.out.println("At loc : " + m.start());
            System.out.println("Found : " + m.group());
        }
        m = p.matcher(" absdsdfksj ");   // fresh matcher for the second loop
        while (m.find()) {
            System.out.println("At loc : " + m.start() + " " + m.group());
        }
What pattern are you using on what data? Please give a sample of both.
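The thread never shows the asker's real pattern or loop, so the cause is not established. One common way to get an IllegalStateException from a Matcher, offered here only as a guess, is to call group() when the most recent find() did not succeed, for example when the second print statement ends up outside the loop body. The sketch below demonstrates that behaviour; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindBeforeGroupDemo {
    // Collects the start index of every match; start() is only called after
    // find() has returned true.
    static List<Integer> positions(String regex, String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    // Returns true if group() throws once find() has run out of matches.
    static boolean groupAfterFailedFindThrows(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            // consume all matches
        }
        try {
            m.group();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(positions("s", " absdsdfksj "));                  // prints [3, 5, 9]
        System.out.println(groupAfterFailedFindThrows("s", " absdsdfksj ")); // prints true
    }
}
```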

Regex: match any & not part of & a m p ;

First, as an aside, any advice about best practices for taint checking with JSP/Java would be appreciated.
Second, I'm trying to match any "&" that does not appear as part of "& a m p ;". (The spaces are not supposed to be there, but you wouldn't see anything otherwise.)
I haven't gotten anything that works satisfactorily. I've tried variations on:
Pattern.compile("&^(amp;)");
but to no avail, so I'm obviously missing something.

> Yep, I understand that, but suppose, just suppose I want to match
> &bfoo such that anything following that ampersand should not be
> equal to 'amp;' followed by 'foo'. Obviously 'b' isn't equal to 'amp;'
> but there is no 'capturing-complement' notation. I know a horrible
> RE can deal with that ;-)
Not that horrible, really:
&(?!amp;).*?foo
But you would probably want to be more specific and require the intervening characters to be non-whitespace:
&(?!amp;)\S*?foo
...or word characters:
&(?!amp;)\w*?foo
(What it lacks in 'horrible' quality I'm trying to make up for in quantity.)
Anyway, instead of "capture", say that lookaheads don't consume any characters. It may still not make sense to your listeners, but at least it gives them a fighting chance. :-/
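Applied to the original question, the negative lookahead can be used directly with replaceAll. The following is a minimal sketch; the class and method names are made up for illustration, and it assumes the goal is to turn every bare ampersand into an escaped one.

```java
class AmpersandDemo {
    // Matches "&" only when it is NOT immediately followed by "amp;".
    // The lookahead checks the following text without consuming it.
    static String escapeBareAmpersands(String s) {
        return s.replaceAll("&(?!amp;)", "&amp;");
    }

    public static void main(String[] args) {
        System.out.println(escapeBareAmpersands("a & b &amp; c &foo"));
        // prints a &amp; b &amp; c &amp;foo
    }
}
```

Note that this sketch also rewrites the ampersand in other entities such as &lt;, so the lookahead would need to be widened if those should be left alone.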

Pattern regex matching advice needed

Hi All,
Many thanks for any/all advice :)
Here's my problem. I'm trying to scan a text file for...
\foo(parm1|parm2)
...in which I want the sub-string "parm1|parm2"
So... [\\]foo matches the first section. No problem...
It's when I try adding the '(' or ')' that I'm getting errors.
java.util.regex.PatternSyntaxException: Unclosed character class near index
[\]foo(.*)
Basically, I'm trying to create a pattern, which can recognize \foo(parms), and extract the parms sections.
Any ideas?

Yes, you can do this. It is not allowed in basic Java, but there are always ways around the syntax rules. What you can do is use the AspectJ plugin for Eclipse, define a cutpoint, and make it extend from two classes. It parses the byte code and inserts the code directly into the byte code. It's pretty neat.
A simpler approach would be to have two classes A and B. Have A extend BASE and then have B extend A, so that B "isa" A and a BASE.
Hope this helps.
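Returning to the regex question in this thread: the string "[\\]foo" reaches the regex engine as [\]foo, where \] is an escaped bracket, so the character class is never closed. That is what the PatternSyntaxException reports. A minimal sketch of a pattern that matches \foo(...) and captures the parameters follows, with the literal backslash and parentheses escaped. The class and method names are made up for illustration, and the sketch assumes the parameters never contain a closing parenthesis.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FooParmsDemo {
    // Regex as the engine sees it:  \\foo\(([^)]*)\)
    //   \\       a literal backslash
    //   \( \)    literal parentheses
    //   ([^)]*)  group 1: everything up to the closing parenthesis
    static final Pattern FOO = Pattern.compile("\\\\foo\\(([^)]*)\\)");

    // Returns the text between the parentheses, or null if there is no \foo(...).
    static String extract(String line) {
        Matcher m = FOO.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("some text \\foo(parm1|parm2) more text"));
        // prints parm1|parm2
    }
}
```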

Designating literal characters in "do shell script"

(Also posted in UNIX forum)
If, in Terminal, I do this:
echo 1/2/3 | sed 's/\/2\// and /g'
I get the expected result, which is:
1 and 3
But if, in AS, I do this:
do shell script "echo 1/2/3 | sed 's/\/2\// and /g'"
I get this error message:
Expected “"” but found unknown token.
If I change the AS command to:
do shell script "echo 1/2/3 | sed 's//2/ and /g'"
I get this error message:
sed: 1: "s//2// and /g": bad flag in substitute command: '/'
(Of course, if I change the AS command to:
do shell script "echo 1/2/3 | sed 's/2/ and /g'"
I get the expected result, which is:
1/ and /3
but this is no help in getting rid of the forward slashes, which is what I want.)
I think this also happens with some other "special" characters surrounding "2". A Google search would seem to indicate this as a well-known dilemma, but I haven't stumbled across a clear solution -- is there one?

osimp,
I also replied in the Unix forum. But here's a little more explanation.
Your command string gets "evaluated" twice... first by AppleScript and then by the shell. Since backslash characters are escape characters in both AppleScript and the shell, you want literal backslashes passed from AppleScript to the shell so when the shell gets the command string it will process them as escape characters in front of the forward slashes.
So you need to escape the backslashes (double-backslash) in AppleScript so they'll be passed to the shell as literal backslashes (single-backslash). Then when the shell evaluates the string the single-backslashes will be treated as escape characters.
do shell script "echo 1/2/3 | sed 's/\\/2\\// and /g'"
Steve

Regex Matching on Capture groups

I have this regular expression:
(throw|give)(?: ([1-3][A-C]))+
given this input:
throw 1A 2C 1B 3C
How would I capture each of the items 1A 2C 1B and 3C?
In the above expression I have 3 capture groups
group0: whole expression
group1: (throw|give)
group2: ((?:1|2|3)(?:A|B|C))
The problem is that when I execute the find() on my matcher it tries matching the whole expression at once! That means for the group 2 I always get the last match only:
group0: throw 1A 2C 1B 3C
group1: throw
group2: 3C
How do I get the matcher to only match on ONE capture group at a time?! Is it possible? I thought that was the purpose of the find() method. The documentation says find() matches on a "subsequence", yet I can only get it to match on the whole expression. Plus, I don't see where "subsequence" is defined in the documentation. What am I missing here?

"Attempts to find the next subsequence of the input sequence that matches the pattern." (my emphasis)
find() matches the whole regex, not components thereof. What you need to do is use one regex to match the whole expression and return a capture group with the digit-letter pairs, and then use another regex on that capture group to extract the pairs one at a time.
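The two-step approach described in the answer can be sketched as follows. The class and method names are made up for illustration; the outer pattern is a variation of the one in the question in which the repeated part is wrapped in a single capturing group, so that group 2 holds all of the pairs rather than only the last one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThrowItemsDemo {
    // Step 1: validate the whole command and capture the run of pairs as one group.
    static final Pattern COMMAND = Pattern.compile("(throw|give)((?: [1-3][A-C])+)");
    // Step 2: pull the individual pairs out of that captured run.
    static final Pattern ITEM = Pattern.compile("[1-3][A-C]");

    static List<String> items(String input) {
        List<String> result = new ArrayList<>();
        Matcher whole = COMMAND.matcher(input);
        if (whole.matches()) {
            Matcher item = ITEM.matcher(whole.group(2));
            while (item.find()) {
                result.add(item.group());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(items("throw 1A 2C 1B 3C")); // prints [1A, 2C, 1B, 3C]
    }
}
```

A repeated capturing group only remembers its last iteration, which is why the original expression reports 3C for group 2.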
