Regular expression-- How to match whole word

I use java.util.regex in my case,
my problem is following:
here is a sentence: "oRs As ordoRs"
i use this pattern: (?!\w)oRs(?<!\w)
the first word oRs should match, but why the third word also match?
Maybe my problem also could be scribled as following:
How to match whole word?

here is a sentence: "oRs As ordoRs"
i use this pattern: (?!\w)oRs(?<!\w)
the first word oRs should match, but why the third
word also match?What's even more interesting, there should be no matches at all! Just because the pattern is inherently inconsistent:
the (?!\w) requires the next char to be a non-letter, but
at the same time this char is reqired to be a letter 'o'.
The same thing in the end: the last char should be 's' but not a letter. So the pattern must match nothing.
Maybe my problem also could be scribled as following:
How to match whole word?It's simple, use word boundaries - either directed ( "\<" and
"\>") or undirected ("\b").
The word pattern will look like "\b\w+\b" or "\<\w+\>"
Test it here
http://jregex.sourceforge.net/demoapp.html

Similar Messages

  • SQL Injection and Java Regular Expression: How to match words?

    Dear friends,
    I am handling sql injection attack to our application with java regular expression. I used it to match that if there are malicious characters or key words injected into the parameter value.
    The denied characters and key words can be " ' ", " ; ", "insert", "delete" and so on. The expression I write is String pattern_str="('|;|insert|delete)+".
    I know it is not correct. It could not be used to only match the whole word insert or delete. Each character in the two words can be matched and it is not what I want. Do you have any idea to only match the whole word?
    Thanks,
    Ricky
    Edited by: Ricky Ru on 28/04/2011 02:29

    Avoid dynamic sql, avoid string concatenation and use bind variables and the risk is negligible.

  • Regular expression: how to match "[somestuff]"?

    I have a problem with the following code.
    I meant to catch "[fm1,-]". But I got "[fm1,-] funder [fm2,-] of our country. [sn8,s-]" instead.
    import java.util.*;
    import java.util.regex.*;
    public class regPractice {
    public static void main(String[] args) {
    String s="<TITLE Getting to Know> I hope suitabe [fm1,-] funder [fm2,-] of our country. [sn8,s-]";
    Pattern p=Pattern.compile("\\[(.*)\\]");
    Matcher m=p.matcher(s);
    if (m.find() ){
    System.out.println(m.group(0) ) ;
    }else{
    System.out.println("Nothing");
    }

    Regular expressions are greedy - that is (.*) will grab as much as it
    possibly can before a ]. Hence you see what you see.
    What you want is a reluctant quantifier, in this case (.*?)
    These grab as little as they possibly can. The parentheses are also
    not needed in your example, but you may want group(1) for some
    other reason.
    So we end up with:import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    public class regPractice {
        public static void main(String[] args) {
            String s = "<TITLE Getting to Know> I hope suitabe [fm1,-] funder [fm2,-] of our country. [sn8,s-]";
            Pattern p = Pattern.compile("\\[(.*?)\\]");
            Matcher m = p.matcher(s);
            if (m.find()) {
                System.out.println(m.group(0));
            } else {
                System.out.println("Nothing");
    }which gives the desired output.
    The different types of quantifier are described here:
    http://java.sun.com/docs/books/tutorial/extra/regex/quant.html

  • Regular expressions and string matching

    Hi everyone,
    Here is my problem, I have a partially decrypted piece string which would appear something like.
    Partially deycrpted:     the?anage??esideshe?e
    Plain text:          themanagerresideshere
    So you can see that there are a few letter missing from the decryped text. What I am trying to do it insert spaces into the string so that I get:
                        The ?anage? ?esides he?e
    I have a method which splits up the string in substrings of varying lengths and then compares the substring with a word from a dictionary (implemented as an arraylist) and then inserts a space.
    The problem is that my function does not find the words in the dictionary because my string is only partially decryped.
         Eg:     ?anage? is not stored in the dictionary, but the word �manager� is.
    So my question is, is there a way to build a regular expression which would match the partially decrypted text with a word from a dictionary (ie - ?anage? is recognised and �manager� from the dictionary).

    I wrote the following method in order to test the matching using . in my regular expression.
    public void getWords(int y)
    int x = 0;
    for(y=y; y < buff.length(); y++){
    String strToCompare = buff.substring(x,y); //where buff holds the partially decrypted text
    x++;
    Pattern p = Pattern.compile(strToCompare);
    for(int z = 0; z < dict.size(); z++){
    String str = (String) dict.get(z); //where dict hold all the words in the dictionary
    Matcher m = p.matcher(str);
    if(m.matches()){
    System.out.println(str);
    System.out.println(strToCompare);
    // System.out.println(buff);
    If I run the method where my parameter = 12, I am given the following output.
    aestheticism
    aestheti.is.
    demographics
    de.o.ra.....
    Which suggests that the method is working correctly.
    However, after running for a short time, the method cuts and gives me the error:
    PatternSyntaxException:
    Null(in java.util.regex.Pattern).
    Does anyone know why this would occur?

  • Regular expressions and limiting matched input

    Hi everyone :) I am trying to put together a regular expression that matches strings that contain elements of the form;
    {<some text>}
    However, each piece of text may contain multiple embedded instances of this pattern. I want to ensure that I am always getting the first (or outermost) instance.
    So, if I had;
    {OneStart}{TwoStart}{TwoEnd}{OneEnd}
    I want to make sure that I get 'One' first and 'Two' second. So I have to place a stipulation in my regular expression to match this pattern only if it has not located the patten previously.
    At the moment, I have this -
    ([^\\{]*?)(\\{TagStart\\})(.*?)(\\{TagEnd\\})(.*)
    What I think that I have to do is modify the first capture group '([^\\{]*?)' which at the moment only does not match if a preceding '{' is found to match only if a preceding '{<text>}' sequence is not found.
    Anyone got any idea how to do this?
    Thanks in advance.
    Ben

    Doesn't it work anyway, since you're using greedy operators? If not, won't it work if you remove the .* at the end and use find() rather than matches()? And finally, what's the (.*?) supposed to match? Looks to me like that should be .*

  • Regular expression and pattern matching/replacing

    I have a list of key words. It has around 1000 key word now but can grow to 5000 keywords.
    My web application displays lot of texts which are stored in the database. My requirement is to scan each text for the occurance of any of the above keywords. If any keyword is present I have to replace that with some custom values, before showing it to the user.
    I was thinking of using using regular expression for replacing the keyword in the text using matcher.replaceAll method as follows:
    Pattern pattern = Pattern.compile(patternStr);
    Matcher matcher = pattern.matcher(inputStr);
    String output = matcher.replaceAll(replacementStr);
    But My pattern string will have around 5000 keywords with the 'OR' Logical Operator like- keyword1| keyword2 I keyword3 | ..........
    Will such a big pattern string adversly affect the performance? What can I do to speed up the performance? (Since my keyword list is not static i would prefer to do the replacement just before showing the text to the user)
    Any suggestions are most welcome.

    I don't think a pure regex approach would be that slow, but it would be a maintenance nightmare. I think a combined regex/table-lookup approach would be best: use a regex to identify potential keywords, then look them up in the table to confirm. For instance, to find all Java keywords you could use the regex "\\b[a-z]{2,12}+\\b" to filter out anything that can't possibility be a keyword.
    What are you going to replace the keywords with? Will it vary depending on which keyword is found? If so, you'll have to use a table--and you won't be able to use the replaceAll method, because it can't handle dynamically generated replacement values. You would have to use the lower-level appendReplacement and appendTail method instead.

  • Regular expression in exact match

    Hi,
    Please let me know how do i find an exact match within a string using Regular expression,
    var str1:String = "s1 + s2 + sf_s1 + s1 - s2";
    var replceableStr:String = "s1";
    var reggular:RegExp = /\s+(s1)\s+/gi;
    var str2:String;
    str2 = str1.replace(reggular,"format");
    i want only s1 to be replaced with format but sf_s1 should not be replaced, please let me know which Reg exp to use
    Expected output: format + s2 +sf_s1 + format -s2

    Odd, though. I thought regex wouldn't pick upthe
    \n unless you explicitly told it to with DOTALL or
    MULTILINE or (?s) or (?n).
    I thought "String.matches" matched the wholestring.
    So, since the string had a newline, it didn'tmatch.
    It does.Ah, that would explain it then.

  • Match Regular Expression does not match what Match Pattern does

    I have read through a lot of posts about how Match Pattern does not match what Match Regular Expression will due to not processing some characters.
    However, I found a problem with the other way. A simple Reg-Ex that works in Match Pattern but not Match Regular Expression.
    What I have here is just an example. I want to use Match Regular Expression so I can specify some sub-matches.
    The reg-ex is for: one or more non-numeric characters, a space, one or more numeric characters. At the start of the string.
    How can I get this working in Match Regular Expression? I am working in LabVIEW 2010f2 32 bit. Here is the code snippet and the results:
    Rob
    Solved!
    Go to Solution.

    Robert Cole wrote:
    I think I prefer the ~ for negation since ^ is also used for beginning of the string. But we work with what we have.
    Let me offer you a tip and perhaps defend the honor of the regex a little bit.  One of my favorite features of regexes is the ability to specify character classes (and their negation).  One of the reasons I have to think about the ~ versus ^ is that I rarely use ^ in a regex alternative. 
    Some examples:
    [0-9] = \d (digit)
    [^0-9] = \D (not a digit)
    The equivalent regex for your case is: \D+ \d+

  • Regular expressions-how to replace [ and ] characters from a string

    Hi,
    my input String is "sdf938 [98033]". Now from this given string, i would like to replace the characters occurring within square brackets to empty string, including the square brackets too.
    my output String needs to be "sdf938" in this case.. How should I do it using regular expressions? I tried several possible combinations but didn't get the expected results.

    "\\s*\\[[^\\]]+\\]"

  • Regular expression doesn't match

    Hello, I'm trying to match a String with this content (without the quotes):
    "83.107.84.188"
    whith the next regular expression (without quotes again):
    "[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}"
    I think that all is OK, but the call to String.matches(regularExpression) returns false. What I'm doing wrong?
    Thanks in advance.
    Jorge.

    Odd, though. I thought regex wouldn't pick upthe
    \n unless you explicitly told it to with DOTALL or
    MULTILINE or (?s) or (?n).
    I thought "String.matches" matched the wholestring.
    So, since the string had a newline, it didn'tmatch.
    It does.Ah, that would explain it then.

  • Regular Expressions - How can I read the char of a text using the index?

    Assuming that, this is a findWord function which searches in a text and returns the indexes that a given word is located. How to read the characters that are located some pos before each index?
    Kind regards,
    Vag.
    public static void findWord(String type) {
            int count = 0;
            String regex = "<a word>";
            String text = readFile("data/raw/" + type + "/" + type + ".page.all.txt");
            Pattern pattern = Pattern.compile(regex);
            Matcher matcher = pattern.matcher(text);
            System.out.print("Word <a word> was found at positions");
            while (matcher.find()) {
                count++;
                System.out.print(", " + matcher.start());
            System.out.println(" at a total of " + count + " times!");
        }

    Read the API docs for java.util.regex.Matcher. The matcher object exposes this data via start() and end() methods.

  • An additional question about regular expressions with String.matches

    does the String.matches() method match expressions when some substring of the String matches, or does it have to match the entire String? So, if i have the String "123ABC", and i ask to match "1 or more letters" will it fail because there are non-letters in the String, but then pass if i add "1 or more letters AND 1 or more digits"? so, in the latter every character in the String is accounted for in the search, as opposed to the first. Is that correct, or are there ways to JUST match some substring in the String instead of the whole thing? i WILL make some examples too... but does that make sense?

    It has to match the whole String. Use Matcher.find() to match on just a sub-string()

  • Regular expression to search if a word occurs with z and /z tags

    Hi ,
    I am trying to create a regex to search for the occurence of two words within <z> and </z> tags.(they must occur between <z> and and next immideate </z> tags)
    This is my regex
    <z>\s[\w\W]+?(?!</z>)word1[\w\W]+?(?!</z>)word2[\w\W]+?</z>|<z>\s[\w\W]+?(?!</z>)word2[\w\W]+?(?!</z>)word1[\w\W]+?</z>
    I am trying to specify (?!</z>) in order that i insist that my regex engine does map for word1 and word2 within <z> and its next immideate </z> tags. The words can appear either ways word1 followed by word2 or vice versa.
    The above regex does not work fine.
    It maps fine for the following sentence
    <z> This is test for pattern for a Regex </z> <z> Also we would like to conclude what is happening </z> <z> Another test for paragraph is happening </z>
    The regex is as follows
    <z>\s[\w\W]+?(?!</z>)pattern[\w\W]+?(?!</z>)Regex[\w\W]+?</z>|<z>\s[\w\W]+?(?!</z>)Regex[\w\W]+?(?!</z>)pattern[\w\W]+?</z>
    But, when i include the </z> in between pattern and Regex , it should not match, but that is not what is happening.
    <z> This is test for pattern </z> for a Regex</z> <z> Also we would like to conclude what is happening </z> <z> Another test for paragraph is happening </z>
    Please let me know how I can accomplish the same.
    Thanks.

    oops.. sorry ..this is aligned better ...
    Hi , I am using this regex
    (?=(?:(?!</z>).)+?\bpattern\b.+</z>)(?!(?:(?!</z>).)+?\bscientific\b.+</z>)((\bpattern\b(?:(?!</z>).)+)\bpattern\b|\bpattern\b)I have written the above regex to match pattern between <z> and </z> provided there is no word "scientific" within that "z" tags.
    I have been trying to replace the regex to do the same between two sentences, here in my case I have a paragraph with multiple sentences. The delimiter to determine a sentence is dot (.). So i am trying to specify the condition as above to match between two sentences - using representation for dot as (\. )
    The following is my regex
    (?=(?:(?!\.).)+?\bpattern\b.+\.)(?!(?:(?!\.).)+?\bscientific\b.+\.)((\bpattern\b(?:(?!\.).)+)\bpattern\b|\bpattern\b) But this does not work..
    Can you please tell me how to go about this?
    Thanks.

  • [Flex 4.5.1] Regular Expressions - how to ignore case for cyrillic text /ignore flag doesn't work/ ?

    The only solution to this that I've found is to convert the text to lowercase before matching...
    Is there a more convenient way ?
    Thanks

    The only solution to this that I've found is to convert the text to lowercase before matching...
    Is there a more convenient way ?
    Thanks

  • Regular expression: multiple possible matches, find all?

    Hi, I'm looking over something I'm afraid, the solution to my problem will probably be simple :(
    Consider the input string "something aname bname something"
    I have a list of other strings that might occur in the string:
    "aname"
    "aname bname"
    I want to know if any of the strings in the list occurs in the input string, possibly both.
    For this, I created the pattern "(?i)(aname)|(aname bname)(?i)"
    With a Matcher, it will only match the "aname", while I want to find both the "aname" and the "aname bname".
    Reversing the pattern, "(?i)(aname bname)|(aname)(?i)", it will only find "aname bname", and not the single "aname".
    I cannot really create a very sofisticated pattern such as "(aname( bname)?)", since the list of strings comes from a database. Creating a very nice pattern by looking at the possible matcher strings before compiling the pattern is not really a clean solution.
    Again: what is my at-this-point-non-functional-brain missing? Thanx!

    Thanks for the link. Of course I have read the manual, I guess there is something wrong with my eyes because I don't immediatly see the answer. Last weekend I already ordered new glasses, but they take three to four weeks to deliver my new glasses. Could you in the mean time point me to the exact place I should look?
    In all seriousness, I already tried the Matcher. It's the pattern itself that causes it to match only the first alternative in the list of groups in that pattern. I did read the Pattern manual here.
    My test code:          String input = "something aname bname";
              String match1 = "aname";
              String match2 = "aname bname";
              String quotedPattern = "(?i)(" + Pattern.quote(match1) + ")|(" + Pattern.quote(match2) + ")";
              System.out.println(quotedPattern);
              Pattern p = Pattern.compile(quotedPattern);
              Matcher m = p.matcher(input);
              while (m.find()) {
                   for (int i = 0; i < m.groupCount() + 1; i++) {
                        System.out.println("Group " + i + ":\t" + m.group(i));
              }Outputs:
    (?i)(\Qaname\E)|(\Qaname bname\E)
    Group 0:     aname
    Group 1:     aname
    Group 2:     null
    The last matching group should also match in my opinion. If I turn the pattern around, i.e. first "aname bname" and then "aname", it will still only find "aname bname" and not "aname", which would also match.
    Thanks again

Maybe you are looking for