Regex character classes

You can match ']' in a regex character class by specifying it as the first literal, as in
"[]]". How do you match '[' in a regex character class?
Thanks in advance, Mel

sabre150 wrote:
Escape it, as in
"[\\[]"
Check the Javadoc for Pattern.

I swear I initially tried that... thanks, sabre.
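Here is a small sketch that checks both answers against Java's Pattern. The class name BracketClasses is illustrative. Java treats a ']' that comes first in a class as a literal. An unescaped '[' inside a class starts a nested class, so it has to be escaped.

```java
public class BracketClasses {
    public static void main(String[] args) {
        // ']' as the very first character of a class is taken literally.
        System.out.println("]".matches("[]]"));        // true
        // '[' must be escaped: unescaped, it would open a nested class.
        System.out.println("[".matches("[\\[]"));      // true
        // Escaping both brackets works at any position and is the clearer habit.
        System.out.println("[".matches("[\\[\\]]") && "]".matches("[\\[\\]]")); // true
    }
}
```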

Similar Messages

Regular Expression back-reference in character class

I am trying to capture quoted text (excluding the quotes) with the following pattern:
"(['\"])([^\\1]+)\\1"
The input string might look like:
"That's strange"
or:
'valid "regex" pattern'
I get an exception when trying to compile the pattern, because of the back-reference within character-class brackets: [^\1]
This pattern worked with the org.apache.oro package. Is this a bug in the 1.4 Pattern class? Can you offer an alternative solution?
Thanks,
Tony

Thanks for the reply. Apparently, my pattern wasn't working with ORO the way I thought. The pattern you suggested, however, wouldn't do what I need either, because it would match anything between the first and last quote character. I don't want to match the string you included in your reply, because it contains the same quote character that encloses the entire string.

If you look at my pattern again, you'll notice that the first capture group should match either a single or a double quote character. The back-reference should then be the exact (single) character that was matched and captured. So the second capture group should be any number of characters, as long as none of them matches the character captured by the first group, followed by that same character.

As you probably know, one or more characters in square brackets [] specify a match of a single character found anywhere within that character class. So if the first capture group happened to match more than one character (impossible, given the pattern in that group), the character class [^\1] should match any character EXCEPT the ones matched in the first capture group... right? There must be a bug in ORO that keeps it from matching as I described, but it seems there is a bug in the JDK 1.4 Pattern class that will not even compile it. Just to reiterate: if \1 matched "abc", [^\1] should match any single character EXCEPT a, b or c. I can get what I want, though, if I do it in two matches:
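Pattern pat = Pattern.compile("(['\"])(.*)");   // group 1: opening quote; group 2: everything after it
Matcher mat = pat.matcher(sText);
if (mat.matches()) {                            // group() throws IllegalStateException if there was no match
    String sQuote = mat.group(1);
    String sRem = mat.group(2);
    // Second pass: a class that excludes the captured quote, followed by that same quote
    pat = Pattern.compile("([^" + sQuote + "]*)" + sQuote);
    mat = pat.matcher(sRem);
    if (mat.matches()) {
        String sWhatIWanted = mat.group(1);
    }
}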
Thanks,
Tony
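For comparison, here is a single-pass sketch that does not come from the thread. A negative lookahead (?!\1) rejects the captured quote one character at a time, which is what [^\1] was meant to do. Java allows back-references inside lookaheads, but not inside character classes. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedText {
    // Group 1 captures the opening quote. Group 2 takes one character at a time,
    // but only if it is not the captured quote ((?!\1) checks this). The final \1
    // then requires the same quote character to close the string.
    private static final Pattern QUOTED = Pattern.compile("(['\"])((?:(?!\\1).)*)\\1");

    static String unquote(String text) {
        Matcher m = QUOTED.matcher(text);
        return m.matches() ? m.group(2) : null;   // null when the text is not properly quoted
    }

    public static void main(String[] args) {
        System.out.println(unquote("\"That's strange\""));        // That's strange
        System.out.println(unquote("'valid \"regex\" pattern'"));  // valid "regex" pattern
    }
}
```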

PatternSyntaxException: Unclosed character class near index

Hi,
I want to replace an expression like "^1:2" in a string with "^(1/2)".
For example, "V/Hz^1:2" should be converted to "V/Hz^(1/2)".
I tried the following code:
String oldExponent = "[^](\\d+):(\\d+)"; // e.g. ^1:2
String newExponent = "^($1/$2)"; // e.g. ^(1/2)
myString = myString.replaceAll(oldExponent, newExponent);
but the following exception is thrown in the third line:
java.util.regex.PatternSyntaxException: Unclosed character class near index 13
[^](\d+):(\d+)
             ^
Any idea?
Thanks in advance.

Thank you, jverd, you are quite right.
Now I tried
"\\^(\\d+):(\\d+)" and it works.
I was trying to avoid ^ being interpreted as the beginning of the line, and didn't realize how it is interpreted inside brackets: a leading ^ negates the class, and the ] right after it is then taken as a literal, so the class is never closed.
Escaping this character works well.
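Here is a minimal sketch of the working replacement, with the original pattern alongside for contrast. The class and method names are illustrative.

```java
import java.util.regex.PatternSyntaxException;

public class ExponentFormat {
    // Escaped caret: a literal '^' followed by "digits:digits".
    // In the replacement string only '$' and '\' are special, so "^(" and "/" are literal.
    static String toFraction(String s) {
        return s.replaceAll("\\^(\\d+):(\\d+)", "^($1/$2)");
    }

    public static void main(String[] args) {
        System.out.println(toFraction("V/Hz^1:2")); // V/Hz^(1/2)

        // The original pattern: [^ starts a negated class, the following ']' is a
        // literal, and the class runs to the end of the pattern without closing.
        try {
            "V/Hz^1:2".replaceAll("[^](\\d+):(\\d+)", "^($1/$2)");
        } catch (PatternSyntaxException e) {
            System.out.println(e.getDescription()); // Unclosed character class
        }
    }
}
```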

Unclosed character class near index 0

Hi folks,
I am trying to remove a few characters from a string with the replaceAll(String, String) method.
But the very first replacement gives the error "Unclosed character class near index 0".
What does this error mean? I don't have any '[' character in that string. It used to be there, but now I remove every occurrence of the '[' char as soon as the string is created.
Thanks in advance.

Hi,
Here is the code.
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;

public static String extract(String para) {
    String definition = para.substring(para.indexOf("<ol>"), para.indexOf("</ol>"));
    StringBuffer meanings = new StringBuffer();
    StringReader reader = new StringReader(definition);
    StringWriter writer = new StringWriter();
    boolean append = false;
    int ch;
    try {
        while ((ch = reader.read()) != -1) {
            if (ch == '<' || ch == '&') {
                append = false;   // inside a tag or entity: stop copying
                continue;
            } else if (ch == '>' || ch == ';') {
                append = true;    // end of tag or entity: resume copying
                continue;
            }
            if (append && ch != '[' && ch != ']')
                writer.write(ch);
        }
    } catch (IOException e) {
        System.out.println("IOException:" + e.getMessage());
    } catch (Exception e) {
        System.out.println("Exception:" + e.getMessage());
    }
    meanings = writer.getBuffer();
    String temp = meanings.toString();
    System.out.println("temp:" + temp);
    temp = temp.replaceAll("[", " ");   // throws here: replaceAll compiles "[" as a regex
    System.out.println("temp:" + temp);
    temp = temp.replaceAll(" ", " ");
    System.out.println("temp:" + temp);
    return temp;
}
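The error message refers to the pattern, not to the string being searched. replaceAll() compiles its first argument as a regex. "Near index 0" points at the lone "[" that opens a character class and never closes it. This sketch shows three ways to replace a literal '['; the sample string is made up.

```java
import java.util.regex.Pattern;

public class LiteralReplace {
    public static void main(String[] args) {
        String temp = "a[b[c";
        // Escape the bracket so the regex engine sees a literal '['.
        System.out.println(temp.replaceAll("\\[", " "));               // a b c
        // Or let Pattern.quote build the escaped regex for you.
        System.out.println(temp.replaceAll(Pattern.quote("["), " "));  // a b c
        // String.replace(CharSequence, CharSequence) does no regex processing at all.
        System.out.println(temp.replace("[", " "));                    // a b c
    }
}
```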

Changing from upper to lower case without character class

Hi all, I desperately need help.
I need to create an algorithm as follows,
then implement it in a program without using the Character class.
I have no idea how to go about this.
Create an algorithm for a program that continually reads (loops) a single character from the user and displays the ordinal (ASCII) value of the character. The program should then change the case of the character, so if it is an 'a' change it to an 'A' and vice versa, and then display the new ordinal value. The program should quit when the user enters the '#' character. The program should display an error message if an invalid character is entered.
Implement the program from question 1 as a Java program. You are not allowed to use any of the methods provided in the Character class to implement your solution.
regards Paul
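The case-changing step can be done with plain arithmetic on character codes. In ASCII, each lowercase letter sits exactly 32 code points above its uppercase counterpart. This is a minimal sketch that assumes an "invalid character" means anything other than an ASCII letter. The method name flipCase is hypothetical, and the read loop and '#' handling are left out.

```java
public class CaseFlip {
    // ASCII: 'a' (97) - 'A' (65) = 32, and the same offset holds for every letter.
    private static final int CASE_OFFSET = 'a' - 'A';

    // Returns the code of the letter with its case flipped, or -1 if c is not an
    // ASCII letter (assumed here to be the "invalid character" case).
    static int flipCase(char c) {
        if (c >= 'a' && c <= 'z') {
            return c - CASE_OFFSET;
        }
        if (c >= 'A' && c <= 'Z') {
            return c + CASE_OFFSET;
        }
        return -1;
    }

    public static void main(String[] args) {
        char c = 'a';
        int flipped = flipCase(c);
        System.out.println((int) c + " -> " + flipped + " (" + (char) flipped + ")"); // 97 -> 65 (A)
    }
}
```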

How time flies... they are already assigning homework for the fall term.

Yeah, the nice thing about the summer is that we don't get as many homework problems on the forums. Oh well...

I love how they don't even take the time to reword the homework and instead just post the prof's exact text.
"I need help with a problem, it's Ch. 9, Q 23a, can anyone help me?"

Regular Expressions Character Class shortcuts

I have been learning to use regular expressions to modify some of my text files. I noticed that on my Arch box the character class shortcuts do not work: for example, [[:digit:]] in an expression works, but \d does not. Is this normal, or is my installation broken in some way?

Bebo wrote:
There are several regexp "dialects". It's quite painful, actually. For instance, as far as I know, \d works in Perl, but not in sed or grep.
So, yes, this is normal.

Yeah, Henry Spencer's regexp library is generally considered the portable form for sed and awk, since they're all based on it. Newer versions of grep do allow a -P flag for Perl regexps, but this is non-portable, obviously.
-- Thomas Adam
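Java's java.util.regex has a related dialect difference. It supports \d, but it spells the POSIX classes as \p{Digit}, \p{Alnum} and so on. It does not reject a bracket expression like [[:digit:]]. Instead it reads it as a nested class containing the characters ':', 'd', 'i', 'g' and 't', so the expression quietly matches the wrong thing. A small sketch:

```java
public class PosixClasses {
    public static void main(String[] args) {
        System.out.println("7".matches("\\d"));          // true
        System.out.println("7".matches("\\p{Digit}"));   // true: Java's spelling of POSIX [:digit:]
        // Not a POSIX class in Java: a nested class of the characters : d i g t
        System.out.println("7".matches("[[:digit:]]"));  // false
        System.out.println(":".matches("[[:digit:]]"));  // true, for the same reason
    }
}
```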

IsDigit / digit in Character class - output?

Can somebody help me understand why I get this output when I use forDigit / digit from the Character class?
The output that I get is:
The for Digit is
The digit is -1
Note: there appears to be a blank space after "is" in the first line.
class chard
{
     public static void main(String[] args)
     {
          int a = 66;
          char ch1 = 66;
          System.out.println("The for Digit is" + Character.forDigit(a, 2));
          System.out.println("The digit is " + Character.digit(ch1, 2));
     }
}

Why aren't you reading the documentation????
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/Character.html
digit(char ch, int radix)
"Returns the numeric value of the character ch in the specified radix.
If the radix is not in the range MIN_RADIX <= radix <= MAX_RADIX or if the value of ch is not a valid digit in the specified radix, -1 is returned. A character is a valid digit if at least one of the following is true: "
How do you expect the value 66, which is the same as 'B', to be a valid digit in base 2?
/Kaj
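This sketch calls both methods with valid and invalid arguments, following the documented behaviour. forDigit(digit, radix) returns the null character '\u0000' when digit is not in the range 0 to radix - 1, which is why the first line seems to end in a blank. digit(ch, radix) returns -1 when ch is not a digit character in that radix.

```java
public class DigitDemo {
    public static void main(String[] args) {
        // forDigit: value -> digit character; '\u0000' if the value is out of range for the radix.
        System.out.println((int) Character.forDigit(66, 2)); // 0 (the null character)
        System.out.println(Character.forDigit(1, 2));        // 1
        System.out.println(Character.forDigit(11, 16));      // b (letters come back lowercase)
        // digit: digit character -> value; -1 if ch is not a digit in the radix.
        System.out.println(Character.digit('B', 2));         // -1 ('B' is char 66)
        System.out.println(Character.digit('B', 16));        // 11
        System.out.println(Character.digit('1', 2));         // 1
    }
}
```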

Match string against regex or character class in Applescript

Hello,
In my script I get a string from the user and need to ensure that it contains only alphanumeric characters. Here is a sample:
#!/bin/bash
REGEX="^[[:alnum:]]*$"
osascript <<EOF
tell application "SystemUIServer"
repeat
set username to text returned of (display dialog "Enter your name" with icon caution default answer "" buttons{"Continue"})
if text returned of (do shell script "if ! [[ " & quoted form of username & " =~ $REGEX ]]; then echo \"notok\"; fi") as text is equal to "notok" then
display alert "WRONG CHARACTER"
else
exit repeat
end if
end repeat
end tell
EOF
The error:
execution error: Can't make text returned of "notok" into type text. (-1700)
How can one fix this? Is it possible to do this in pure AppleScript without invoking the shell?

Follow Tony's advice.
Yes, you can code it entirely in AppleScript, but it is a verbose beast, not least because AppleScript lacks a range operator that would permit range('A'..'z') and avoid arduous lists.
Code:
set goodStr to "cAt134"
set badStr to "cAt_134"
if not isalnum(badStr) then
display dialog "Not ok!"
end if
on isalnum(username)
          set uCase to {"A", "B", "C", "D", "E", "F", "G", "H", "I", "J", ¬
                    "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", ¬
                    "U", "V", "W", "X", "Y", "Z"}
          set lCase to {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", ¬
                    "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", ¬
                    "u", "v", "w", "x", "y", "z"}
          set nbr to {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9"}
          set alnum to uCase & lCase & nbr
          set status to true as boolean
          repeat with charptr from 1 to count of username
                    set current_char to item charptr of username
-- log current_char
                    if current_char is not in alnum then
                              set status to false
-- display dialog current_char
                              exit repeat
                    end if
          end repeat
          return status
end isalnum

PR Release Strategy- transport of charact/class/ values using ALE

Does anyone have information on the ALE setup for release strategy?
I am using transaction BD91 for characteristics, BD92 for classes and BD93 for the actual values in the release strategy.
I need information on what must be set up in the development client and the receiving clients (Q and Prod) for these transactions, e.g. RFC destinations, port definitions, message types, partner profile configuration, etc.
This will be a great help.
Thanks
Raj

Hi Raj,
Please check the notes below for more information:
86900 - Transport of release strategies (OMGQ, OMGS)
799345 - Transport of release strategy disabled
10745 - Copying classification data to another client
45951 - Transporting class data: system / client (has details of what you are looking for)
Hope this helps.
Regards,
Ashwini.

Regex matcher class

Hi,
I have a simple problem with regex.
Whenever I run this piece of code, I get an IllegalStateException:
Matcher m = p.matcher(" absdsdfksj ");
while (m.find()) {
     System.out.println("At loc : " + m.start());
     System.out.println("Found : " + m.group());
}
But if I combine these two console print lines into one, I don't get any exception and it runs fine:
while (m.find()) {
     System.out.println("At loc : " + m.start() + " " + m.group());
}
Please clarify the difference.
Thanks in advance,
Gaurav

There must be more to the problem, because I can run this without problems:
        Pattern p = Pattern.compile("s");
        Matcher m = p.matcher(" absdsdfksj ");
        while (m.find()) {
            System.out.println("At loc : " + m.start());
            System.out.println("Found : " + m.group());
        }
        m = p.matcher(" absdsdfksj ");
        while (m.find()) {
            System.out.println("At loc : " + m.start() + " " + m.group());
        }
What pattern are you using on what data? Please give a sample of both.
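One common way to get this exception is a while loop without braces. This is an assumption about what happened here, since the poster's full code isn't shown. Without braces, only the first println is inside the loop. The second runs after find() has returned false, so group() has no current match and throws IllegalStateException. The sketch below contrasts the two cases; the names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindLoop {
    // Braces keep start() and group() inside the loop, where the last find() succeeded.
    static List<String> findAll(Pattern p, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            hits.add(m.start() + ":" + m.group());
        }
        return hits;
    }

    // Mimics the brace-less version: group() is reached only after find() returned false.
    static boolean groupAfterLoopThrows(Pattern p, String input) {
        Matcher m = p.matcher(input);
        while (m.find()) {
            // loop body intentionally empty
        }
        try {
            m.group();
            return false;
        } catch (IllegalStateException e) {
            return true; // the last match attempt failed, so there is no group to return
        }
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("s");
        System.out.println(findAll(p, " absdsdfksj "));              // [3:s, 5:s, 9:s]
        System.out.println(groupAfterLoopThrows(p, " absdsdfksj ")); // true
    }
}
```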

TOLERANCE SETTINGS FOR BATCH/CHARACTER/CLASS

Hi gurus,
We have a material with an alternative unit of measure: as standard, 1 pair equals 40 kg. The material is batch-managed.
I have to set a tolerance limit of +/- 8% relative to 40 kg.
Where do I set this tolerance? Please help me out.
Chris
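For reference, the tolerance band itself is simple arithmetic, assuming the +/- 8% applies to the 40 kg standard weight of one pair:

$$40\,\text{kg} \times (1 \pm 0.08) = 40 \pm 3.2\,\text{kg}, \quad\text{i.e. } 36.8\,\text{kg} \text{ to } 43.2\,\text{kg per pair}$$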

Hi,
All material data is updated in the base unit of measure. The tolerances can be had on the Base unit of measure.
So, choose this unit carefully since an exact quantity can be expressed in an alternative unit of measure only if its value can be shown with decimal places. It is therefore important to observe the following two principles when defining the base unit of measure:
- The base unit of measure is the unit that provides the maximum precision necessary.
- Conversion from an alternative unit of measure to the base unit of measure should result in a simple decimal, not a recurring (or repeating) decimal.
Regards,
Narayana.

Identify Non English Character in a String

All,
We have a requirement to identify non-English characters in user-entered data and return an error message saying that only valid English, numeric and some special characters are allowed.
For example, if the user enters data like "This is a Test data", the return value should be true; if he enters something like "My Native Language is inglés", it should return false. Similarly, any Chinese, Russian or Japanese characters should also return false.
How can we achieve this?
Thanks,
Nagarajan.

Hi Nagarajan,
You could use Unicode character blocks or simply craft a regular expression that contains all the characters you need. The latter is easy to understand and gives you full control over which characters you want to allow. For example, I assume you might want something like this:
if (!"This is a proper input string".matches("[\\s\\w\\p{Punct}]+")) {
    // Issue error message and re-get input string
}
The String method matches() takes a regular expression as its input parameter. If you haven't dealt with regular expressions before, check out the Java API help for class java.util.regex.Pattern. Here's a short breakdown of the pattern I used:
1. The square brackets [] enclose a list of allowed characters; here you can explicitly list all allowed characters.
2. You can specify ranges like a-z as a character class, list individual characters like ;:| or use predefined character classes (\s for any whitespace character, \w for the letters a-z and A-Z, the underscore and 0-9, and the POSIX class \p{Punct} for punctuation symbols). For a complete list, check the Java API help on java.util.regex.Pattern.
3. The + at the end indicates that the listed characters can occur one or more times.
There are other ways to achieve what you want, but I think this might be an easy way to start.
Cheers, harald
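This sketch wraps harald's pattern in a reusable check; the class and method names are hypothetical. By default \w, \s and \p{Punct} are ASCII-only in Java, and that is what makes 'é' and non-Latin scripts fail. Compiling with Pattern.UNICODE_CHARACTER_CLASS would widen these classes and defeat the purpose here.

```java
import java.util.regex.Pattern;

public class EnglishInput {
    // Whitespace, ASCII word characters [a-zA-Z_0-9] and ASCII punctuation, one or more times.
    private static final Pattern ALLOWED = Pattern.compile("[\\s\\w\\p{Punct}]+");

    static boolean isEnglishOnly(String input) {
        return ALLOWED.matcher(input).matches();   // note: an empty string fails because of +
    }

    public static void main(String[] args) {
        System.out.println(isEnglishOnly("This is a Test data"));               // true
        System.out.println(isEnglishOnly("My Native Language is ingl\u00e9s"));  // false
        System.out.println(isEnglishOnly("\u65e5\u672c\u8a9e"));                 // false (Japanese)
    }
}
```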

Tell me how much my regex sucks, and help me make it better

Uncle Alice,
can you look at this and see if anything is wrong with it, or better yet, do you know of a better solution using regex?
The following regex is used to extract all links (href, img src), both absolute and relative, from an HTML page:
(?im)(?:(?:(?:href)|(?:src))[ ]*?=[ ]*?[\"'])(((?:http|https)(?::\\/{2}[\\w]+)(?:[\\/|\\.]?)(?:[^\\s\"]*))|((?:\\/{0,1}[\\w\\.]+)+))[\"']
String absolute = m.group(2);
String relative = m.group(3);

There's a lot of good material in that regex, pedagogically speaking. :D Although you've solved your problem another way, I'd like to comment on some common errors I see.
(?im) : For anyone who doesn't know, these are inline flags, whose effect is the same as the compile flags CASE_INSENSITIVE and MULTILINE. But you don't need the multiline flag. People often assume they have to use that flag when they're searching for strings that may span line breaks, but all it does is change the meaning of the ^ and $ metacharacters. (By default, they match the beginning and end, respectively, of the target string; with the multiline flag set, they also match the beginning and end of logical lines within the target string.) You aren't using those anchors, so that flag is irrelevant.
(?:(?:href)|(?:src)) : The outer set of parentheses is needed to contain the effect of the pipe, but the inner sets are just noise. In fact, most of the parens in your regex are unnecessary. Excessive grouping can significantly affect the performance of the regex if you really get carried away with it, although it takes quite a bit more than you've got here. The real problem is the visual complexity they add; regexes don't need any help in that department! :-/
[ ]*? : You don't need to put the space character in square brackets to match it, although doing so can make the regex a little easier to read. More importantly, an HTML tag can contain any whitespace character at that point, not just spaces, so you should use \\s instead. Also, you shouldn't use a reluctant quantifier unless the thing it's quantifying can match something you don't want it to. Since reluctant quantifiers are inherently slower than normal (greedy) ones, you should take care not to use them where they aren't needed, which is the case here.
(?:http|https) : Whenever you have two alternatives, of which one is a prefix of the other, you should list the longer one first. The alternatives are tried in the order they're listed, so listing them in the wrong order can reduce the efficiency of your regex in much the same way that using reluctant quantifiers inappropriately can. It isn't a problem here, since the next thing the regex has to match is so definite (i.e., "://"), but you should get in the habit of following that rule. In this case, you can just make the final letter optional: https?
\\/{2} and \\/{0,1} : You don't need to escape the forward slash in Java regexes; that's only necessary in languages like Perl and JavaScript that have language-level support for regexes. They use the forward slash by default as the quoting character for regex literals, so they have to escape it for the same reason we have to escape the double quote (some languages also let you choose a different quoting character each time). Also, I agree with paulcw that the {2} just adds unnecessary complexity in this case. As for the {0,1}, its meaning is exactly the same as ?, so why not use that instead?
[\\/|\\.] : First, you don't need any of those backslashes. The forward slash is never special, and the period loses its special meaning inside square brackets. The pipe is just a pipe, too, so your character class matches a slash, a pipe, or a period, which is probably not what you meant. You need to understand that character classes are like a language within a language. A regex is effectively a set of linear instructions: match this AND then this AND then this, etc. If you want to create an OR branch, you have to do so explicitly, using a pipe or a quantifier. But the semantics change drastically when you go inside the square brackets. Since a character class only matches one character at a time, AND is irrelevant and OR is implicit: match this character OR this one OR one from this range, etc. The only metacharacters that are needed in character classes are those that are used for set operations: the caret for NOT, the hyphen for ranges, etc.; everything else is just a character.
If you'd like me to keep going, I'll need to know more about your exact intentions. Do you want the protocol (e.g., "http://") to be optional? What about the quotes around the URL? Finally, I don't understand what the final pipe in your regex is supposed to do, but I'm pretty sure it isn't working. :D
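This sketch applies only the points above to the regex. It is not the thread's final answer, because the open questions about the optional protocol and the quotes were never settled. One change is my own assumption: the relative branch (?:\/{0,1}[\w\.]+)+ is rewritten as /?[\w.]+(?:/[\w.]+)*. That accepts the same strings, but it cannot split one run of text into segments in many different ways when a match fails. The group numbers are kept, so group(2) is still the absolute URL and group(3) the relative one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractor {
    // (?i) without m; no redundant groups; \s* instead of [ ]*?; https? instead of http|https;
    // unescaped slashes; [/.]? instead of [\/|\.]?; /? instead of \/{0,1}.
    private static final Pattern LINK = Pattern.compile(
        "(?i)(?:href|src)\\s*=\\s*[\"']((https?://\\w+[/.]?[^\\s\"]*)|(/?[\\w.]+(?:/[\\w.]+)*))[\"']");

    static List<String> extract(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            String absolute = m.group(2);   // null when the relative branch matched
            String relative = m.group(3);   // null when the absolute branch matched
            links.add(absolute != null ? absolute : relative);
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a HREF=\"http://example.com/page.html\">x</a> <img src='images/logo.png'>";
        System.out.println(extract(html)); // [http://example.com/page.html, images/logo.png]
    }
}
```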

Regex and implementing FilenameFilter problem

Hello,
What I'm trying to do is create a program that takes a certain set of files, pulls the first line of each file and uses it to name the file. Right now, I'm at the point of getting a list of files based on a pattern. When I run the program on the command line (of a Windows machine), it should spit out the files that I'm looking for. Something like:
java FileRenamer *.txt
The above should produce a list of only the files that end in .txt (I want to be able to choose *.txt or any other pattern).
To do this, I want to use the FilenameFilter interface to figure out which files match. The problem I'm running into is that when I run a unit test against the getFilesListBasedOnPattern method, I get:
java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0
*.txt
The problem is that *.txt contains a regex metacharacter (the *), and I'm not sure how to make it behave like the wildcard on the DOS command line, where *.txt means every file that ends in .txt.
The code listing is below. Does anyone have any suggestions on how to best approach this?
mapsmaps
=======> Code below <=======
// unit test snippet that causes the exception:
FileRenamer fr = new FileRenamer();
String[] strArrFilesBasePattern = fr.getFilesListBasedOnPattern(dirTestFiles, "*.txt");
====
// main program
package com.foo.filerenamer;

import java.io.File;
import java.io.FilenameFilter;
import java.util.Vector;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * TODO Use regexp to filter out input to *.txt type of thing or nothing else
 */
public class FileRenamer {
    // Valid file patterns are *.*, ?
    public static final String strVALIDINPUTCHARS = "[_.a-zA-Z0-9\\s\\*\\?-]+";
    private static Pattern regexPattern = Pattern.compile(strVALIDINPUTCHARS);
    private static Matcher regexMatcher;

    /**
     * @param args
     * @throws InterruptedException
     */
    public static void main(String[] args) throws InterruptedException {
        int intMillis = 0;
        if (args.length > 0) {
            try {
                intMillis = Integer.parseInt(args[0]);
                System.out.println("Sleep set to " + intMillis + " milliseconds");
            } catch (NumberFormatException e) {
                intMillis = 5000;
                System.out.println("Sleep set to default of " + intMillis + " since first parameter was non-int");
            }
            for (int i = 0; i < args.length; i++) {
                System.out.println("hello there - args[" + i + "] = " + args[i]);
            }
        }
        Thread.sleep(intMillis);
        // TODO Auto-generated method stub
    }

    public boolean checkArgs(String[] p_strAr) {
        boolean bRet = false;
        if (p_strAr.length != 1) {
            return false;
        } else {
            regexMatcher = regexPattern.matcher(p_strAr[0]);
            bRet = regexMatcher.matches();
        }
        return bRet;
    }

    public String[] getFilesListBasedOnPattern(File p_dirFilesLoc, String p_strValidPattern) {
        String[] strArrFilteredFileNames = p_dirFilesLoc.list(new RegExpFileFilter(p_strValidPattern));
        return strArrFilteredFileNames;
    }
}

class RegExpFileFilter implements FilenameFilter {
    private String m_strPattern = null;
    private Pattern m_regexPattern;

    public RegExpFileFilter(String p_strPattern) {
        m_strPattern = p_strPattern;
        m_regexPattern = Pattern.compile(m_strPattern);   // "*.txt" is not a valid regex: throws here
    }

    public boolean accept(File m_directory, String m_filename) {
        return m_regexPattern.matcher(m_filename).matches();
    }
}
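The exception comes from passing the shell-style wildcard straight to Pattern.compile(). One way around it is to translate the wildcard into a regex first. This is an illustration, not the thread's answer. The translation turns '*' into .* and '?' into a single dot, and quotes everything else so that characters like '.' stay literal. The class GlobFileFilter is hypothetical.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.regex.Pattern;

public class GlobFileFilter implements FilenameFilter {
    private final Pattern pattern;

    public GlobFileFilter(String glob) {
        this.pattern = globToPattern(glob);
    }

    // '*' -> ".*", '?' -> ".", and each run of other characters is wrapped in
    // Pattern.quote(...) so regex metacharacters in file names have no special meaning.
    static Pattern globToPattern(String glob) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString());
    }

    @Override
    public boolean accept(File dir, String name) {
        return pattern.matcher(name).matches();
    }
}
```

Usage would be along the lines of dirTestFiles.list(new GlobFileFilter("*.txt")). Unlike the Windows shell, this match is case-sensitive. Compiling with Pattern.CASE_INSENSITIVE would be closer to DOS behaviour. On Java 7 and later, FileSystems.getDefault().getPathMatcher("glob:*.txt") provides glob matching out of the box.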

I am doing something similar, but I have a problem with Java automatically converting wildcards in path arguments to the first match (!).
It seems the JVM applies some intelligence here: it checks whether a path is passed to main() and, if so, automatically resolves wildcards (quotes are also escaped/resolved). This is pretty annoying and not what I want, since I never get to see the original parameters this way. :(
Is there a way to get the original parameters without the JVM intervening / "helping"?
Any help would be appreciated, as I want my utility to act just like any other shell program...

Find and print illegal character in a string using regexp

I have the following simple regexp that checks for illegal characters in a string:
String regexp = ".*([\\[\\];]+|@).*"; // illegal: [ ] ; @
String input = "Testing [ 123";
System.out.println(Pattern.matches(regexp, input));
How can I find and print the character that is invalid?
I've tried using the Matcher class together with Pattern, but I can't get it to work. :(
Like this:
Pattern pattern = Pattern.compile(regexp);
Matcher matcher = pattern.matcher(input);
matcher.lookingAt();
int matchedChar = matcher.end();
if (matchedChar < input.length()) {
    String illegalCharFound = String.valueOf(input.charAt(matcher.end()));
    System.out.println(illegalCharFound);
}
What am I doing wrong?

1. You call lookingAt(), but you don't check its return value, so you don't know if the regex actually matched.
2. matcher.end() returns the index of the character following whatever was matched, assuming there was a match (if there wasn't, it will throw an exception). So either it will point to a legal character, or you'll get a StringIndexOutOfBoundsException because an illegal character was found at the end of the input. The start() method would be a better choice, but...
3. Your regex can match multiple consecutive square brackets or semicolons (and why not put the at-sign in the character class, too?), but you're acting like it can only match one character. Even if there is only one character, group(1) is an easier way to extract it. Also, if there is more than one (non-consecutive) illegal character, your regex will only find the last one. That's because the first dot-star initially gobbles up the whole input, then backtracks only as far as it has to in order to satisfy the rest of the regex. If your goal is to provide feedback to whoever supplied the input, it's going to be pretty confusing feedback. You should probably use the find() method in a loop to pick out all the illegal characters so you can report them properly.
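Here is a sketch of the find() loop suggested in point 3. All four illegal characters go into one class, so each match is exactly one character. The names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IllegalChars {
    // One illegal character per match: '[', ']', ';' or '@'.
    private static final Pattern ILLEGAL = Pattern.compile("[\\[\\];@]");

    static List<String> findIllegal(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = ILLEGAL.matcher(input);
        while (m.find()) {   // each call resumes after the previous match
            found.add("'" + m.group() + "' at index " + m.start());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findIllegal("Testing [ 123")); // ['[' at index 8]
    }
}
```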
