Regex character classes

You can match ']' in a regex character class by specifying it as the first literal:
"[]]"
How do you match '[' in a regex character class?
Thanks in advance, Mel

sabre150 wrote:
Escape it as in
"[\\[]"
Check the Javadoc for Pattern.

I swear I initially tried that... thanks sabre

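As a minimal sketch of the answer above (the class and method names are illustrative and not from the thread), escaping both brackets inside a character class compiles and matches with java.util.regex:

```java
import java.util.regex.Pattern;

public class BracketClass {
    // The Java string "[\\[\\]]" is the regex [\[\]]:
    // a character class containing the two literal bracket characters.
    private static final Pattern BRACKET = Pattern.compile("[\\[\\]]");

    // True if the whole input is a single '[' or ']'.
    static boolean isBracket(String s) {
        return BRACKET.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isBracket("["));
        System.out.println(isBracket("]"));
        System.out.println(isBracket("a"));
    }
}
```

Escaping both brackets is slightly more verbose than relying on position inside the class, but it reads the same regardless of where the bracket appears.
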
Similar Messages

  • Regular Expression back-reference in character class

    I am trying to capture quoted text (excluding the quotes) with the following pattern:
    "(['\"])([^\\1]+)\\1"
    The input string might look like:
    "That's strange"
    or:
    'valid "regex" pattern'
    I get an exception when trying to compile the pattern, because of the back-reference within character-class brackets: [^\1]
    This pattern worked with the org.apache.oro package. Is this a bug in the 1.4 Pattern class? Can you offer an alternative solution?
    Thanks,
    Tony

    Thanks for the reply. Apparently, my pattern wasn't working as I thought with ORO. The pattern you suggested, however, wouldn't do what I need either, because it would match anything between the first and last quote character. I don't want to match the string you included in your reply, because it includes the same quote character that encloses the entire string.

    If you think about my pattern again, you'll notice that the first capture group should match either a single or double quote character. Then the back-reference should be the exact character (single character) that was matched and captured. So the second capture group should be any number of characters, as long as they don't match the first capture group character, followed by the exact string matched in the first capture group (the single character).

    You probably are aware that when one or more characters are included in square brackets [], it specifies a match of a single character that is found anywhere within that character class. So if the first capture group happened to match more than one character (impossible due to the pattern in that grouping), the following character class [^\1] should match any character EXCEPT any of the ones matched in the first capture group...right? There must be a bug in ORO that doesn't match correctly per my description, but it seems there is a bug in JDK 1.4 Pattern that will not even compile it. Just to reiterate...if \1 matched "abc", [^\1] should match any single character EXCEPT a or b or c.

    I can get what I want, though, if I do it in 2 matches:
    Pattern pat = Pattern.compile("(['\"])(.*)");
    Matcher mat = pat.matcher(sText);
    mat.matches();
    String sQuote = mat.group(1);
    String sRem = mat.group(2);
    pat = Pattern.compile("([^" + sQuote + "]*)" + sQuote);
    mat = pat.matcher(sRem);
    mat.matches();
    String sWhatIWanted = mat.group(1);
    Thanks,
    Tony
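One possible single-pass alternative, not taken from the thread: a back-reference cannot be used inside square brackets, but a negative lookahead such as (?!\1) can stand in for "any character except the one captured in group 1". The sketch below assumes the quote is always a single character; the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedText {
    // (['"])          group 1: the opening quote character
    // ((?:(?!\1).)+)  group 2: one or more characters, none of which is that same quote
    // \1              the closing quote, identical to the opening one
    private static final Pattern QUOTED =
        Pattern.compile("(['\"])((?:(?!\\1).)+)\\1");

    // Returns the first quoted substring without its quotes, or null if none is found.
    static String unquote(String text) {
        Matcher m = QUOTED.matcher(text);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(unquote("\"That's strange\""));
        System.out.println(unquote("'valid \"regex\" pattern'"));
    }
}
```

The lookahead is checked before every character of the second group, so the other kind of quote may appear freely inside the quoted text.
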

  • PatternSyntaxException: Unclosed character class near index

    Hi,
    I want to replace an expression like "^1:2" in a string with "^(1/2)".
    For example, "V/Hz^1:2" would be converted to "V/Hz^(1/2)".
    I tried the following code:
    String oldExponent = "[^](\\d+):(\\d+)"; // e.g. ^1:2
    String newExponent = "^($1/$2)";         // e.g. ^(1/2)
    myString = myString.replaceAll(oldExponent, newExponent);
    but the following exception is thrown in the third line:
    java.util.regex.PatternSyntaxException: Unclosed character class near index 13
    [^](\d+):(\d+)
                 ^
    Any idea?
    Thanks in advance.

    Thank you, jverd, you are quite right.
    Now I tried with
    "\\^(\\d+):(\\d+)"
    and it works.
    I tried to avoid ^ being interpreted as the beginning of the line, and didn't realize that it is interpreted differently inside the brackets.
    Escaping this character works well.
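For reference, a small sketch of the working version (the class and method names are illustrative). Inside brackets a leading ^ negates the class, so "[^]..." is read as a negated class that never closes; escaping the caret outside any brackets avoids that.

```java
public class ExponentRewriter {
    // "\\^" matches a literal caret; the two groups capture numerator and denominator.
    // In the replacement string, $1 and $2 refer to those groups; the caret,
    // parentheses and slash are plain text there.
    static String rewrite(String s) {
        return s.replaceAll("\\^(\\d+):(\\d+)", "^($1/$2)");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("V/Hz^1:2"));
    }
}
```
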

  • Unclosed character class near index 0

    Hi folks,
    I am trying to remove a few characters from a string with the help of the replaceAll(String, String) method.
    But at the very first replacement it gives the error "Unclosed character class near index 0".
    What does this error mean? I don't have any ' [ ' character in that string. It used to be there, but now I am removing all occurrences of the ' [ ' char right at the creation of the string.
    Thanks in advance.

    Hi,
    Here is the code.
    [ CODE ]
    public static String extract(String para)
    {
        String definition = para.substring(para.indexOf("<ol>"), para.indexOf("</ol>"));
        StringBuffer meanings;
        StringReader reader = new StringReader(definition);
        StringWriter writer = new StringWriter();
        boolean append = false;
        int ch;
        try
        {
            while ((ch = reader.read()) != -1)
            {
                // Skip everything inside tags (<...>) and entities (&...;)
                if (ch == '<' || ch == '&')
                {
                    append = false;
                    continue;
                }
                else if (ch == '>' || ch == ';')
                {
                    append = true;
                    continue;
                }
                if (append && ch != '[' && ch != ']')
                {
                    writer.write(ch);
                }
            }
        }
        catch (IOException e)
        {
            System.out.println("IOException:" + e.getMessage());
        }
        catch (Exception e)
        {
            System.out.println("Exception:" + e.getMessage());
        }
        meanings = writer.getBuffer();
        String temp = meanings.toString();
        System.out.println("temp:" + temp);
        // replaceAll() treats its first argument as a regex, so the bare "[" on the next line is what throws
        temp = temp.replaceAll("[", " ");
        System.out.println("temp:" + temp);
        temp = temp.replaceAll(" ", " ");
        System.out.println("temp:" + temp);
        return temp;
    }
    [ /CODE ]
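The error refers to the pattern argument, not to the string being processed: String.replaceAll() compiles its first argument as a regular expression, and a lone "[" opens a character class that is never closed. A minimal sketch of three ways around it (the class and method names are illustrative):

```java
import java.util.regex.Pattern;

public class BracketStripper {
    // String.replace() takes literal text, so no escaping is needed.
    static String stripLiteral(String s) {
        return s.replace("[", " ");
    }

    // With replaceAll(), the bracket has to be escaped in the regex.
    static String stripEscaped(String s) {
        return s.replaceAll("\\[", " ");
    }

    // Pattern.quote() turns arbitrary text into a regex that matches it literally.
    static String stripQuoted(String s) {
        return s.replaceAll(Pattern.quote("["), " ");
    }

    public static void main(String[] args) {
        System.out.println(stripLiteral("a[b"));
        System.out.println(stripEscaped("a[b"));
        System.out.println(stripQuoted("a[b"));
    }
}
```
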

  • Changing from upper to lower case without character class

    Hi all, I desperately need help.
    I need to create an algorithm as follows,
    then implement it in a program without using the Character class.
    I have no idea how to go about this.
    1. Create an algorithm for a program that continually reads (loops) a single character from the user and displays the ordinal (ASCII) value of the character. The program should then change the case of the character, so if it is an 'a' change to an 'A' and vice versa, and then display the new ordinal value. The program should quit when the user enters the '#' character. The program should display an error message if an invalid character is entered.
    2. Implement the program from question 1 as a Java program. You are not allowed to use any of the methods provided in the Character class to implement your solution.
    regards Paul

    How time flies... they are already assigning homework for the fall term.

    Yeah, the nice thing about the summer is that we don't get as many homework problems on the forums. Oh well...

    I love how they don't even take the time to reword the hw and instead just post the prof's exact text.
    "I need help with a problem, its Ch. 9, Q 23a, can anyone help me?"

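The thread gives no solution. As a hedged sketch of the core step only, assuming "valid character" means an ASCII letter, the case can be flipped with plain char arithmetic and no Character methods. The class and method names are hypothetical, and the input loop and '#' handling are left out.

```java
public class CaseToggle {
    // True for 'a'..'z' and 'A'..'Z' only (an assumption about what counts as valid input).
    static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Flips the case of an ASCII letter; any other character is returned unchanged.
    static char toggleCase(char c) {
        if (c >= 'a' && c <= 'z') {
            return (char) (c - 'a' + 'A');
        }
        if (c >= 'A' && c <= 'Z') {
            return (char) (c - 'A' + 'a');
        }
        return c;
    }

    public static void main(String[] args) {
        char in = 'a';
        char out = toggleCase(in);
        // Casting a char to int yields its ordinal value.
        System.out.println(in + " = " + (int) in + ", " + out + " = " + (int) out);
    }
}
```
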
  • Regular Expressions Character Class shortcuts

    I have been learning to use regular expressions to modify some of my text files. I noticed that on my ARCH box the Character Class shortcuts do not work e.g. [[:digit:]] in an expression works but \d does not. Is this normal or is my installation broken in some way?

    Bebo wrote:
    There are several regexp "dialects". It's quite painful, actually. For instance, as far as I know, \d works in perl, but not in sed or grep.
    So, yes, this is normal.

    Yeah -- Henry Spencer's regexp stuff is generally considered the portable form for sed and awk, since they're all based on it.  Newer versions of grep do allow a -P flag for perl-regexps, but this is non-portable, obviously.
    -- Thomas Adam
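To illustrate the point about dialects with one more data point: java.util.regex is yet another dialect, one that accepts both the Perl-style shorthand \d and a POSIX-style named class written as \p{Digit}. The sketch below only demonstrates the Java side and says nothing about sed or grep; the class and method names are illustrative.

```java
import java.util.regex.Pattern;

public class DigitShorthand {
    // Perl-style shorthand for a digit.
    static boolean allDigitsShorthand(String s) {
        return Pattern.matches("\\d+", s);
    }

    // POSIX-style named class as spelled in java.util.regex.
    static boolean allDigitsPosix(String s) {
        return Pattern.matches("\\p{Digit}+", s);
    }

    public static void main(String[] args) {
        System.out.println(allDigitsShorthand("2024"));
        System.out.println(allDigitsPosix("2024"));
        System.out.println(allDigitsPosix("20x4"));
    }
}
```
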

  • IsDigit / digit in Character class - ouput ?

    Can somebody help me in knowing why such an output is coming while I use forDigit / digit from Character class?
    The output that I get is:
    The for Digit is
    The digit is -1
    Note: there is a blank space after the "is" in the first line.
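    class chard
    {
         public static void main(String[] args)
         {
              int a = 66;
              char ch1 = 66; // same as 'B'
              // forDigit() returns the null character when the digit is not valid in the radix
              System.out.println("The for Digit is" + Character.forDigit(a, 2));
              // digit() returns -1 when the character is not a valid digit in the radix
              System.out.println("The digit is " + Character.digit(ch1, 2));
         }
    }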

    Why aren't you reading the documentation????
    http://java.sun.com/j2se/1.4.2/docs/api/java/lang/Character.html
    digit(char ch, int radix)
    "Returns the numeric value of the character ch in the specified radix.
    If the radix is not in the range MIN_RADIX <= radix <= MAX_RADIX or if the value of ch is not a valid digit in the specified radix, -1 is returned. A character is a valid digit if at least one of the following is true: "
    How do you expect value 66 which is the same as 'B' to be a valid value in base 2?
    /Kaj
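A short sketch contrasting valid and invalid arguments may make the documented behaviour easier to see; the class and helper names are illustrative.

```java
public class RadixDemo {
    // A character is a valid digit in a radix exactly when Character.digit() does not return -1.
    static boolean isValidDigit(char ch, int radix) {
        return Character.digit(ch, radix) != -1;
    }

    public static void main(String[] args) {
        // '1' is a valid binary digit, 'B' is not.
        System.out.println(Character.digit('1', 2));
        System.out.println(Character.digit('B', 2));
        // 'B' is a valid hexadecimal digit with value 11.
        System.out.println(Character.digit('B', 16));
        // forDigit() goes the other way: from a numeric value to a character.
        System.out.println(Character.forDigit(11, 16));
        // 66 is not a digit value in base 2, so the null character is returned.
        System.out.println(Character.forDigit(66, 2) == '\0');
    }
}
```
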

  • Match string against regex or character class in Applescript

    Hello,
    In my script I get a string from the user and need to ensure that it contains only alphanumeric characters. Here is a sample:
    #!/bin/bash
    REGEX="^[[:alnum:]]*$"
    osascript <<EOF
    tell application "SystemUIServer"
    repeat
    set username to text returned of (display dialog "Enter your name" with icon caution default answer ""  buttons{"Continue"})
    if text returned of (do shell script "if ! [[ " & quoted form of username & " =~ $REGEX ]]; then echo \"notok\"; fi") as text is equal to "notok" then
    display alert "WRONG CHARACTER"
    else
    exit repeat
    end if
    end repeat
    end tell
    EOF
    The error:
    execution error: Can't make text returned of "notok" into type text. (-1700)
    How can one fix this? Is it possible to do this in pure AS without invoking shell?

    Follow Tony's advice.
    Yes, you can code it entirely in AppleScript, but it is a verbose beast, not least due to the lack of an AppleScript range operator that would permit range('A'..'z') and avoid arduous lists.
    Code:
    set goodStr to "cAt134"
    set badStr to "cAt_134"
    if not isalnum(badStr) then
      display dialog "Not ok!"
    end if
    on isalnum(username)
              set uCase to {"A", "B", "C", "D", "E", "F", "G", "H", "I", "J", ¬
                        "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", ¬
                        "U", "V", "W", "X", "Y", "Z"}
              set lCase to {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", ¬
                        "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", ¬
                        "u", "v", "w", "x", "y", "z"}
              set nbr to {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9"}
              set alnum to uCase & lCase & nbr
              set status to true as boolean
              repeat with charptr from 1 to count of username
                        set current_char to item charptr of username
      -- log current_char
                        if current_char is not in alnum then
                                  set status to false
      -- display dialog current_char
                                  exit repeat
                        end if
              end repeat
              return status
    end isalnum

  • PR Release Strategy- transport of charact/class/ values using ALE

    Does anyone have the information relative to ALE set up for release strategy-
    I am using transactions BD91 for characteristics; BD 92 for Class and BD 93 for the actual values into release strategy.
    I need the information as to what needs to be set up in Development client and receiving clients ( Q and Prod) for these transactions i.e., RFC destinations, define ports, Identify Message types & setup partner profile configurations etc
    This will be a great help.
    Thanks
    Raj

    Hi Raj,
    Please check the notes below for more information:
      86900     Transport of Release strategies (OMGQ,OMGS)
      799345    Transport of release strategy disabled
      10745     Copying classification data to another client
      45951     Transporting class data: system / client (has details of what you are looking for)
    Hope this helps.
    Regards,
    Ashwini.

  • Regex matcher class

    Hi
    I have a simple regex problem.
    Whenever I write this piece of code I get an IllegalStateException:
    Matcher m = p.matcher(" absdsdfksj ");
    while (m.find()) {
         System.out.println("At loc : " + m.start());
         System.out.println("Found : " + m.group());
    }
    But if I combine these two print statements into one line, I don't get any exception and it runs fine:
    while (m.find()) {
         System.out.println("At loc : " + m.start() + " " + m.group());
    }
    Please clarify the difference.
    Thanks in advance,
    Gaurav

    There must be more to the problem, because I can run this without problems:
            Pattern p = Pattern.compile("s");
            Matcher m = p.matcher(" absdsdfksj ");
            while (m.find())
            {
                System.out.println("At loc : " + m.start());
                System.out.println("Found : " + m.group());
            }
            m = p.matcher(" absdsdfksj ");
            while (m.find())
            {
                System.out.println("At loc : " + m.start() + " " + m.group());
            }
    What pattern are you using on what data? Please give a sample of both.
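The thread does not establish the cause. One plausible explanation, offered only as an assumption, is that the failing version ran m.group() outside the loop body (for example because the braces were missing), so it executed after find() had already returned false. The sketch below shows that a Matcher throws IllegalStateException in exactly that situation; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherState {
    // Collects the start index of every match; start() is only called after a successful find().
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    // Calls group() after find() has returned false and reports whether that throws.
    static boolean groupAfterFailedFindThrows(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            // consume all matches
        }
        try {
            m.group();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(matchStarts("s", " absdsdfksj "));
        System.out.println(groupAfterFailedFindThrows("s", " absdsdfksj "));
    }
}
```
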

  • TOLERANCE SETTINGS FOR BATCH/CHARACTER/CLASS

    Hi gurus
    I have a material with an alternate unit conversion between kg and pair; suppose 1 pair equals 40 kg as standard. The material is batch-managed.
    Here I have to set a tolerance limit of + or - 8% with reference to 40 kg.
    Where do I set the tolerance? Please help me out.
    Chris

    Hi,
    All material data is updated in the base unit of measure. The tolerances can be had on the Base unit of measure.
    So, choose this unit carefully since an exact quantity can be expressed in an alternative unit of measure only if its value can be shown with decimal places. It is therefore important to observe the following two principles when defining the base unit of measure:
    - The base unit of measure is the unit that provides the maximum precision necessary.
    - Conversion from an alternative unit of measure to the base unit of measure should result in a simple decimal, not a recurring (or repeating) decimal.
    Regards,
    Narayana.

  • Identify Non English Character in a String

    All,
    We have a requirement to identify non-English characters in user-keyed data and return an error message saying that only valid English, numeric and some special characters are allowed.
    For example, if the user enters data like "This is a Test data" then the return value should be true. If he enters something like "My Native Language is inglés" then it should return false. Similarly, any Chinese, Russian or Japanese character entries should also return false.
    How can we achieve this?
    Thanks,
    Nagarajan.

    Hi Nagarajan,
    You could use Unicode character blocks or simply craft a regular expression that contains all the characters you need. The latter is easy to understand and gives you full control over which characters you want to allow. E.g. I assume you might want something like this:
    if(!"This is a proper input string".matches("[\\s\\w\\p{Punct}]+")) {
      // Issue error message and re-get input string
    }
    The String method matches() takes a regular expression as input parameter. If you haven't dealt with regular expressions before, check out the Java API help for class java.util.regex.Pattern. Here's a short breakdown of the pattern I used:
    1. The square brackets [] enclose a list of allowed characters; here you can explicitly list all allowed characters.
    2. You can specify ranges like a-z as a character class, list individual characters like ;:| or use predefined character classes (\s for any whitespace character, \w for the letters a-z and A-Z, underscore and 0-9, and the POSIX class \p{Punct} for punctuation symbols). For a complete list, check the Java API help on java.util.regex.Pattern.
    3. The + at the end indicates that the characters listed can occur once or more.
    There's other ways to achieve what you want, but I think this might be an easy way to start with.
    Cheers, harald
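The suggested pattern can be wrapped in a small helper and tried against the examples from the question. This is a sketch using the default (ASCII-only) meaning of \w in java.util.regex; the class and method names are illustrative, and the non-ASCII test characters are written as Unicode escapes.

```java
public class EnglishInput {
    // Whitespace, ASCII word characters and ASCII punctuation, one or more times.
    private static final String ALLOWED = "[\\s\\w\\p{Punct}]+";

    // True if every character of the input is in the allowed set.
    static boolean isAllowed(String input) {
        return input.matches(ALLOWED);
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("This is a Test data"));
        // \u00e9 is the accented e in the question's example word.
        System.out.println(isAllowed("My Native Language is ingl\u00e9s"));
    }
}
```

An empty string is rejected as well, because + requires at least one character.
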

  • Tell me how much my regex sucks, and help me make it better

    uncle alice,
    can you look at this and see if you see anything wrong with it, or better yet, do you know of a better solution using regex?
    following regex is used to extract all links from an html page (href, img src) both absolute and relative:
    (?im)(?:(?:(?:href)|(?:src))[ ]*?=[ ]*?[\"'])(((?:http|https)(?::\\/{2}[\\w]+)(?:[\\/|\\.]?)(?:[^\\s\"]*))|((?:\\/{0,1}[\\w\\.]+)+))[\"']
    String absolute = m.group(2);
    String relative = m.group(3);

    There's a lot of good material in that regex, pedagogically speaking. :D Although you've solved your problem another way, I'd like to comment on some common errors I see.
    (?im) : For anyone who doesn't know, these are inline flags, whose effect is the same as the compiler flags CASE_INSENSITIVE & MULTILINE. But you don't need the multiline flag. People often assume they have to use that flag when they're searching for strings that may span line breaks, but all it does is change the meaning of the ^ and $ metacharacters. (By default, they match the beginning and end, respectively, of the target string; with the multiline flag set, they also match the beginning and end of logical lines within the target string.) You aren't using those anchors, so that flag is irrelevant.
    (?:(?:href)|(?:src)) : The outer set of parentheses is needed to contain the effect of the pipe, but the inner sets are just noise. In fact, most of the parens in your regex are unnecessary. Excessive grouping can significantly affect the performance of the regex if you really get carried away with it, although it takes quite a bit more than you've got here. The real problem is the visual complexity they add; regexes don't need any help in that department! :-/
    [ ]*? : You don't need to put the space character in square brackets to match it, although doing so can make the regex a little easier to read. More importantly, an HTML tag can contain any whitespace character at that point, not just spaces, so you should use \\s instead. Also, you shouldn't use a reluctant quantifier unless the thing it's quantifying can match something you don't want it to. Since they're inherently slower than normal (greedy) quantifiers, you should take care not to use them where they aren't needed, which is the case here.
    (?:http|https) : Whenever you have two alternatives, of which one is a prefix of the other, you should list the longer one first. The alternatives are tried in the order they're listed, so listing them in the wrong order can reduce the efficiency of your regex in much the same way that using reluctant quantifiers inappropriately can. It isn't a problem here, since the next thing the regex has to match is so definite (i.e., "://"), but you should get in the habit of following that rule. In this case, you can just make the final letter optional: https?
    \\/{2} and \\/{0,1} : You don't need to escape the forward-slash in Java regexes; that's only necessary in languages like Perl and JavaScript that have language-level support for regexes. They use the forward-slash by default as the quoting character for regex literals, so they have to escape it for the same reason we have to escape the double-quote (but some languages also let you choose different quoting characters each time). Also, I agree with paulcw that the {2} just adds unnecessary complexity in this case. As for the '{0,1}', its meaning is exactly the same as '?', so why not use that instead?
    [\\/|\\.] : First, you don't need any of those backslashes. The forward-slash is never special, and the period loses its special meaning inside the square brackets. The pipe is just a pipe, too, so your character class matches a slash, a pipe, or a period, which is probably not what you meant. You need to understand that character classes are like a language within a language. A regex is effectively a set of linear instructions: match this AND then this AND then this, etc. If you want to create an OR branch, you have to do so explicitly, using a pipe or a quantifier. But the semantics change drastically when you go inside the square brackets. Since a character class only matches one character at a time, AND is irrelevant and OR is implicit: match this character OR this one OR one from this range, etc. The only metacharacters that are needed in character classes are those that are used for set operations: the caret for NOT, hyphen for ranges, etc.; everything else is just a character.
    If you'd like me to keep going, I'll need to know more about your exact intentions. Do you want the protocol (e.g., "http://") to be optional? What about the quotes around the URL? Finally, I don't understand what the final pipe in your regex is supposed to do, but I'm pretty sure it isn't working. :D
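To show how several of these points combine, here is a hedged sketch of a simplified extractor. It is not the poster's regex and not a recommendation from the thread: it assumes attribute values are always quoted and contain no whitespace, and the class and method names are invented for the example. It uses a single (?i) flag, one non-capturing group for href|src, \s* for optional whitespace, and https? for the protocol check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractor {
    // href or src, optional whitespace around '=', then a quoted value captured in group 1.
    private static final Pattern LINK =
        Pattern.compile("(?i)(?:href|src)\\s*=\\s*[\"']([^\"'\\s]+)[\"']");

    // Returns every href/src value found in the HTML, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // A link is treated as absolute if it starts with http:// or https://.
    static boolean isAbsolute(String link) {
        return link.matches("(?i)https?://.*");
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://example.com/a.html\">x</a><img SRC = '/img/logo.png'>";
        for (String link : extractLinks(html)) {
            System.out.println(link + " absolute=" + isAbsolute(link));
        }
    }
}
```

Separating "find the attribute value" from "classify it" keeps each regex short, at the cost of a second pass over each link.
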

  • Regex and implementing FilenameFilter problem

    Hello,
    So what I'm trying to do is to create a program that takes a certain set of files, pulls the first line of each file and uses it to name the file. Right now, I'm at the point of getting a listing of files based on a pattern. So when I run the program on the command line (of a Windows machine), it should spit out the files that I'm looking for. Something like:
    java FileRenamer *.txt
    Above should produce a listing of only files that have .txt on them (I want to have the capability to choose *.txt or whatever other combination of pattern match).
    To do the above, I want to use a FileNameFilter interface to figure out what files match. The problem that I'm running into is that when I run a unit test against the getFilesListBasedOnPattern method, I get:
    java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0
    *.txt
    The problem is that the *.txt has a regex character (the *) and I'm not sure how to make it behave like the wildcard on the DOS command line, where *.txt means everything that has .txt at the end.
    The code listing is below. Does anyone have any suggestions on how to best approach this?
    mapsmaps
    =======> Code below <=======
    // unit test snippet that causes blow out:
    FileRenamer fr = new FileRenamer();
    String [] strArrFilesBasePattern = fr.getFilesListBasedOnPattern(dirTestFiles,"*.txt");
    ====
    //main program
    package com.foo.filerenamer;

    import java.io.File;
    import java.io.FilenameFilter;
    import java.util.Vector;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    /**
     * TODO Use regexp to filter out input to *.txt type of thing or nothing else
     */
    public class FileRenamer
    {
        // Valid file patterns are *.*, ?
        public static final String strVALIDINPUTCHARS = "[_.a-zA-Z0-9\\s\\*\\?-]+";
        private static Pattern regexPattern = Pattern.compile(strVALIDINPUTCHARS);
        private static Matcher regexMatcher;

        /**
         * @param args
         * @throws InterruptedException
         */
        public static void main(String[] args) throws InterruptedException
        {
            // TODO Auto-generated method stub
            int intMillis = 0;
            if (args.length > 0)
            {
                try
                {
                    intMillis = Integer.parseInt(args[0]);
                    System.out.println("Sleep set to " + intMillis + " milliseconds");
                }
                catch (NumberFormatException e)
                {
                    intMillis = 5000;
                    System.out.println("Sleep set to default of " + intMillis + " since first parameter was non-int");
                }
                for (int i = 0; i < args.length; i++)
                {
                    System.out.println("hello there - args[" + i + "] = " + args[i]);
                }
            }
            Thread.sleep(intMillis);
        }

        public boolean checkArgs(String[] p_strAr)
        {
            if (p_strAr.length != 1)
            {
                return false;
            }
            regexMatcher = regexPattern.matcher(p_strAr[0]);
            return regexMatcher.matches();
        }

        public String[] getFilesListBasedOnPattern(File p_dirFilesLoc, String p_strValidPattern)
        {
            return p_dirFilesLoc.list(new RegExpFileFilter(p_strValidPattern));
        }
    }

    class RegExpFileFilter implements FilenameFilter
    {
        private String m_strPattern = null;
        private Pattern m_regexPattern;

        public RegExpFileFilter(String p_strPattern)
        {
            m_strPattern = p_strPattern;
            // The argument is compiled as a regex, so a wildcard such as "*.txt" fails here
            m_regexPattern = Pattern.compile(m_strPattern);
        }

        public boolean accept(File m_directory, String m_filename)
        {
            return m_regexPattern.matcher(m_filename).matches();
        }
    }

    I am doing something similar but have a problem with Java automatically converting wildcards in path arguments to the first match (!).
    It seems the JVM is applying some intelligence here: it checks whether a path is passed to main() and, if so, automatically resolves wildcards (quotes are also escaped/resolved), which is pretty annoying and not what I want, since I never see the original parameters this way :(
    Is there a way to get the original parameters without the JVM intervening / "helping"?
    Any help would be appreciated, as I want my utility to act just like any other shell-program...
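Neither post in this thread received an answer here. For the original PatternSyntaxException, one common approach is to translate the wildcard into a regex before compiling it, so that * becomes .* and ? becomes a single-character match while everything else is matched literally. The sketch below is one such translation under those assumptions; GlobFilter and globToRegex are hypothetical names, not part of the poster's code or of the JDK.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.regex.Pattern;

public class GlobFilter implements FilenameFilter {
    private final Pattern pattern;

    public GlobFilter(String glob) {
        this.pattern = Pattern.compile(globToRegex(glob));
    }

    // Translates a DOS-style wildcard into a regex string:
    // '*' -> any run of characters, '?' -> exactly one character,
    // every other character is quoted so it matches itself.
    static String globToRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return regex.toString();
    }

    @Override
    public boolean accept(File dir, String name) {
        // matches() requires the whole file name to fit the pattern.
        return pattern.matcher(name).matches();
    }
}
```

A filter built this way could be passed to File.list() in place of the regex-based one, for example `dir.list(new GlobFilter("*.txt"))`.
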

  • Find and print illegal character in a string using regexp

    I have the following simple regexp that checks for illegal characters in a string:
    String regexp = ".*([\\[\\];]+|@).*"; // illegal: [ ] ; @
    String input = "Testing [ 123";
    System.out.println(Pattern.matches(regexp, input));
    How can I find and print the character that is invalid?
    I've tried using the Matcher class together with Pattern but can't get it to work. :(
    Like this:
    Pattern pattern = Pattern.compile(regexp);
    Matcher matcher = pattern.matcher(input);
    matcher.lookingAt();
    int matchedChar = matcher.end();
    if (matchedChar < input.length()) {
        String illegalCharFound = String.valueOf(input.charAt(matcher.end()));
        System.out.println(illegalCharFound);
    }
    What am I doing wrong?

    1. You call lookingAt(), but you don't check its return value, so you don't know if the regex actually matched.
    2. matcher.end() returns the index of the character following whatever was matched, assuming there was a match (if there wasn't, it will throw an exception). So either it will point to a legal character, or you'll get a StringIndexOutOfBoundsException because an illegal character was found at the end of the input. The start() method would be a better choice, but...
    3. Your regex can match multiple consecutive square brackets or semicolons (and why not put the at-sign in the character class, too?), but you're acting like it can only match one character. Even if there is only one character, group(1) is an easier way to extract it. Also, if there are more than one (non-consecutive) illegal characters, your regex will only find the last one. That's because the first dot-star initially gobbles up the whole input, then backtracks only as far as it has to to satisfy the rest of the regex. If your goal is to provide feedback to whoever supplied the input, it's going to be pretty confusing feedback. You should probably use the find() method in a loop to pick out all the illegal characters so you can report them properly.
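The find()-in-a-loop suggestion can be sketched as follows. The pattern puts all four illegal characters into one character class and drops the surrounding dot-stars, so each call to find() lands on the next offending character. The class and method names, and the wording of the report, are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IllegalCharFinder {
    // One character class for all illegal characters: [ ] ; @
    private static final Pattern ILLEGAL = Pattern.compile("[\\[\\];@]");

    // Returns one entry per illegal character, with its position in the input.
    static List<String> findIllegal(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = ILLEGAL.matcher(input);
        while (m.find()) {
            // group() is the matched character, start() its index.
            found.add(m.group() + " at index " + m.start());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findIllegal("Testing [ 123"));
    }
}
```
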
