Evaluate Regular expression complexity

Hi all
I have a problem with regular expression usage in my application.
I'm using regular expressions to identify which objects to fetch and serve, depending on an input string that has to be matched.
Each object has a property holding a regular expression; the input string must match it for the object to be a candidate for fetching.
My program receives an external input string, then loops over the whole object collection to find the objects whose regular expression matches the input string.
For example (here * is used as a wildcard for any sequence of characters):
obj1) key = "J*SDK"
obj2) key = "Ja*6*"
obj3) key = "JEE*"
If the input string is "Java 6.0 SDK", obj1 and obj2 are candidates, while obj3 is discarded.
Up to this point everything is fine. Now here is my question:
I want only one object as output, and I want the one that best matches my input string.
This means that:
-> obj1 matches 4 characters and has only one wildcard
-> obj2 matches 3 characters and has two wildcards
so obj2 is discarded, since its regular expression is more complex than obj1's.
My problem is HOW to correctly evaluate this complexity for each candidate object, so that I can choose the best one.
Is there some formal rule or API for this?
I'd like to account for all the wildcards in the regex, but doing this "by hand" would surely introduce bugs from some missed case, so a third-party API or a grammar rule would be useful.
Hoping for your help.
regards
Michele Sacchetti
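For keys that only use * as a wildcard, as in the example, the ranking rule described above (more literal characters first, then fewer wildcards) can be sketched directly. This is a minimal sketch, assuming * is the only special character and means "any run of characters"; it does not handle full regex syntax, which is exactly the "by hand" limitation the poster wants to avoid. The class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class WildcardRanking {
    // Assumption: '*' is the only wildcard. Literal parts are quoted, '*' becomes ".*".
    static Pattern toRegex(String key) {
        String regex = Arrays.stream(key.split("\\*", -1)) // -1 keeps trailing empty parts
                .map(Pattern::quote)
                .collect(Collectors.joining(".*"));
        return Pattern.compile(regex);
    }

    static long wildcards(String key) {
        return key.chars().filter(c -> c == '*').count();
    }

    static long literals(String key) {
        return key.length() - wildcards(key);
    }

    // Keeps the keys that match the whole input, then prefers more literal characters
    // and, on a tie, fewer wildcards.
    static Optional<String> best(List<String> keys, String input) {
        return keys.stream()
                .filter(k -> toRegex(k).matcher(input).matches())
                .max(Comparator.comparingLong((String k) -> literals(k))
                        .thenComparingLong(k -> -wildcards(k)));
    }

    public static void main(String[] args) {
        List<String> keys = List.of("J*SDK", "Ja*6*", "JEE*");
        System.out.println(best(keys, "Java 6.0 SDK").orElse("none")); // J*SDK
    }
}
```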

OK, after days and days of research, I came up with this solution:
1) I used the automaton package (http://www.brics.dk/automaton) for regular expressions, which lets me access the internal state-automaton data
2) I use the getShortestExample() method to retrieve the shortest string matching the given regexp
3) I compute the Levenshtein distance between that shortest string and the input string to be matched
PROs:
1) the regexp logic is fully handled by the same state machine that takes care of pattern matching in the first phase
2) the library gives me a plain (non-regexp) string to use in comparisons (e.g. Levenshtein distance evaluation)
CONs:
1) the getShortestExample method is unaware of the string to be matched, so if I use "aab|aaa" to match "aab", the method returns the alphabetically first of the shortest strings, that is "aaa", so I get an LD of 1 even though it should be 0. Still, it's quite a good deal for my application.
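Step 3 of this solution can be sketched in plain Java. This is the textbook dynamic-programming edit distance, not code from the automaton package; the shortest example string is assumed to come from step 2, which is not reproduced here. The class name is illustrative.

```java
class Levenshtein {
    // Edit distance where insertions, deletions and substitutions each cost 1.
    // Only two rows of the DP table are kept.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j; // distance from "" to b[0..j)
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;                                    // distance from a[0..i) to ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;      // last computed row is now in prev
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // "aaa" is the shortest example reported for "aab|aaa"; the input is "aab".
        System.out.println(distance("aaa", "aab"));        // 1
        System.out.println(distance("kitten", "sitting")); // 3
    }
}
```

This reproduces the drawback listed under CONs: the distance is computed against one representative string, not against the closest string the automaton accepts.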
@endasil: your solution relies on grouping and on manual pre-parsing, so it basically goes back to the "manual" parsing I wanted to avoid.
Another approach I wanted to try, but had to give up on, was to use ANTLR (www.antlr.org) to create a parser for regular expressions and then evaluate the size of the resulting parse tree; however, I wasn't able to find a formal description of the regexp grammar on the net.
Do you have any suggestions or comments on my solution (or other approaches worth trying)?


Replace with regular expression - complex logic

What is the best way to transform an input string according to the following rules?
1. All occurrences of two or more consecutive hyphens ("-") must be replaced with one hyphen. So if the input string was "---", the output should be "-". If the input was "a-a-a", the output stays as it is.
2. All occurrences of two or more consecutive spaces (" ") must be replaced with one space. So if the input string was "   " (several spaces), the output should be " ". If the input was "a a a", the output stays as it is.
3. After rules 1-2 are applied, the following rule applies: every space immediately followed by a hyphen, or vice versa (a hyphen followed by a space), must be replaced with a hyphen. So if the input string was " -" or "- ", the output should be "-". If the input was "a a-a", the output stays as it is.
4. After rules 1-3 are applied, the following rule must be applied: the string may not start or end with a space or hyphen. So if the input string was " a", "-a", "a " or "a-", the output should be "a".
All of rules 1-4 must be applied to the input string.
Example of the replacement logic:
input: 'a   aa ---- ss-ee -'
output: 'a aa-ss-ee'
I think I should use the regexp_replace function somehow.
The solution below doesn't work, because it outputs two consecutive hyphens "--":
with t as (
select 'a   aa ---- ss-ee -' str from dual
)
select regexp_replace(str, '--','-') a from t;
/* a   aa -- ss-ee - */
Can you suggest one neat regular expression for that?

with t as (
  select 1 id, 'a   aa ---- ss-ee -' str from dual union
  select 2, 'a        aa -------- ss-ee ---- - - -' str from dual union
  select 3, '-' str from dual union
  select 4, '----- -- -a -' str from dual union
  select 5, '-a-a-a-a----a-a-a-' str from dual
)
select id,
       str,
       trim(trim('-' from regexp_replace(str, '( |-){2,}', '\1'))) new_str,
       length(str) len,
       length(trim(trim('-' from regexp_replace(str, '( |-){2,}', '\1')))) new_len
from t;
ID STR                                  NEW_STR        LEN NEW_LEN
 1 a   aa ---- ss-ee -                  a aa ss-ee      19      10
 2 a     aa -------- ss-ee ---- - - -   a aa ss-ee      34      10
 3 -                                                     1
 4 ----- -- -a -                        a               13       1
 5 -a-a-a-a----a-a-a-                   a-a-a-a-a-a-a   18      13
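Note that the NEW_STR shown for row 1 is "a aa ss-ee", not the requested "a aa-ss-ee": the mixed run " ---- " is reduced to a space, apparently because \1 keeps whichever character the group captured last, so rule 3 is not fully met. The sketch below is in Java rather than Oracle SQL and uses a slightly different technique: any run of spaces and hyphens that contains at least one hyphen becomes a single hyphen (covering rules 1 and 3 together), remaining space runs collapse to one space, and leading or trailing separators are stripped. The same three regexes could likely be nested in regexp_replace calls, but that has not been verified here; the class name is illustrative.

```java
class CollapseSeparators {
    static String normalize(String s) {
        // Rules 1 + 3: a run of spaces/hyphens containing at least one hyphen becomes "-".
        String r = s.replaceAll(" *-[ -]*", "-");
        // Rule 2: remaining runs of two or more spaces become one space.
        r = r.replaceAll(" {2,}", " ");
        // Rule 4: no leading or trailing space or hyphen.
        return r.replaceAll("^[ -]+|[ -]+$", "");
    }

    public static void main(String[] args) {
        String[] rows = {"a   aa ---- ss-ee -", "----- -- -a -", "-a-a-a-a----a-a-a-", "a a-a"};
        for (String row : rows) {
            System.out.println("'" + row + "' -> '" + normalize(row) + "'");
        }
    }
}
```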

Match beginning of line with Regular Expression

I'm confused about Dreamweaver's treatment of the characters
^ and $ (beginning of line, end of line) in regex searches. It
seems that these characters match the beginning of the file, not
the beginning of the various lines in the file. I would expect it
to work the other way around. A search like:
(^.)
should match every line in the file, so that a find/replace
could be performed at the beginning of each line, like this:
HELLO$1
which would add 'HELLO' at the start of each line in the
file.
Instead, this action only matches the first character of the
file, sticks 'HELLO' in front of it, and then quits (or moves on to
the next file). The endline character $ behaves in a similar
fashion, matching only the end of the file, not the end of each
line.
I've searched, and all the literature about regular
expressions in Dreamweaver seems to indicate that I'm expecting the
correct behavior:
www.adobe.com/devnet/dreamweaver/articles/regular_expressions_03.html
quote:
^ Beginning of input or line ^T matches "T" in "This good
earth" but not in "Uncle Tom's Cabin"
$ End of input or line h$ matches "h" in "teach" but not in
"teacher"
Thanks for any insight, folks.
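Dreamweaver's search engine is not Java's, so the sketch below only illustrates the mechanism that usually explains this symptom: in many regex engines (Java, JavaScript, Perl) ^ and $ match only at the start and end of the whole input unless a multiline mode is switched on. In Java that is the Pattern.MULTILINE flag (or the inline flag (?m)). Whether Dreamweaver exposes an equivalent switch is not covered here; the class name is illustrative.

```java
import java.util.regex.Pattern;

class MultilineAnchors {
    // Inserts HELLO before the first character of the input, or of every line in MULTILINE mode.
    static String prefixLines(String text, boolean multiline) {
        Pattern p = multiline
                ? Pattern.compile("^(.)", Pattern.MULTILINE) // ^ also matches after each line break
                : Pattern.compile("^(.)");                   // ^ matches only at the start of input
        return p.matcher(text).replaceAll("HELLO$1");        // $1 = the captured first character
    }

    public static void main(String[] args) {
        String text = "one\ntwo\nthree";
        System.out.println(prefixLines(text, false));
        System.out.println(prefixLines(text, true));
    }
}
```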

Need advice on negating a whole string line with regular expression

Hi All,
I am not able to exclude / get rid of the following line, even though my Java 6 (Windows XP) pattern was not written to accept it:
*% Cleared: 61%*
Below is the existing Java pattern in my simple program:
Pattern pattern = Pattern.compile("(^.*[A-Z][a-z]*){1,2} \\d{0,4}/?\\d{0,4} ([A-Z][a-z]*){1,2} St|Rd|Av|Sq|Cl|Pl|Cr|Gr|Dr|Hwy|Pde|Wy|La \\d br [h|u|t] \\$\\d+,\\d+|\\$\\d*\\,\\d+,\\d+ ([A-Z][a-z]*){1,}.*$");
This pattern works for valid strings.
The following version adds "^(?!.*\.\.).*$" to the existing one, but still with no luck:
Pattern pattern = Pattern.compile("^(?!.*\\.\\.).*$|((^.*[A-Z][a-z]*){1,2} \\d{0,4}/?\\d{0,4} ([A-Z][a-z]*){1,2} St|Rd|Av|Sq|Cl|Pl|Cr|Gr|Dr|Hwy|Pde|Wy|La \\d br [h|u|t] \\$\\d+,\\d+|\\$\\d*\\,\\d+,\\d+ ([A-Z][a-z]*){1,}.*$)");
This picked up other rubbish, including "*% Cleared: 61%*".
I am looking for a single regular expression that applies to the whole line.
I am quite new to regular expressions. I have read through Regular Expressions Cookbook (O'Reilly, 2009), but I am still not familiar with advanced features such as lookahead / lookbehind...
Your assistance would be appreciated.
Thanks,
Jack
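One likely cause, if the pattern in the source really looks as posted (and this is not a formatting loss in the post): the street-type alternation St|Rd|...|La and the price alternation are not enclosed in groups. Since | has the lowest precedence in a regex, the whole pattern splits into independent top-level alternatives, and with Matcher.find() a line such as "% Cleared: 61%" can then match on the bare "Cl" alternative alone. Also, [h|u|t] is a character class that includes a literal '|'. The sketch below groups the alternations and uses matches(), so the whole line must fit. The field layout (one or two capitalised words for suburb and street name, and so on) is inferred from the sample line in this thread and is an assumption; the class name is illustrative.

```java
import java.util.regex.Pattern;

class ListingFilter {
    static final Pattern LISTING = Pattern.compile(
            "[A-Z][a-z]*( [A-Z][a-z]*)?"                     // suburb: one or two capitalised words
            + " \\d{0,4}/?\\d{0,4}-?\\d{0,4}"                // street number, possibly unit or range
            + " [A-Z][a-z]*( [A-Z][a-z]*)?"                  // street name
            + " (St|Rd|Av|Sq|Cl|Pl|Cr|Gr|Dr|Hwy|Pde|Wy|La)"  // street type, now grouped
            + " \\d br [hut]"                                // bedrooms and property type, no '|' in [ ]
            + " \\$\\d{1,3}(,\\d{3})+"                       // price such as $250,000 or $1,250,000
            + " .+");                                        // agent / remaining text

    static boolean isListing(String line) {
        // matches() requires the entire line to fit, unlike find().
        return LISTING.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(isListing("Mervan Bay 40 Boyde St 7 br t $250,000 X West Park AE"));
        System.out.println(isListing("% Cleared: 61%"));
    }
}
```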

Hi Winston,
I am still digesting the material from the regular expression book, and it will take some time to become proficient with it.
It seems that using groupCount() to eliminate the unwanted text does not work in this case, since all the lines returned the same value, i.e. the 3 posted earlier. This may be because the patterns are complex and only a few parts were grouped together. Otherwise, could you provide an example using the string I posted, as opposed to a hypothetical one? In the meantime, I have found at least one solution: define an additional special pattern, "\\A[^%].*\\Z", and require lines to match both it and the existing pattern, to get the best of both worlds. Another approach that should also work is to check the size of the String.split() result and only accept lines with a minimum number of tokens.
Anyhow, in the meantime I have come across another minor stumbling block with the following line, where some hidden characters prevent the existing pattern from matching it:
o;?Mervan Bay 40 Boyde St 7 br t $250,000 X West Park AE
Below is the existing regular expression, which works for other lines with the same layout but not for lines that start with hidden characters such as "o;?":
\\A([A-Z][a-z]*){1,2} [0-9]{0,4}/?[0-9]{0,4}-?[0-9]{0,4} ([A-Z][a-z]*){1,2} St|Rd|Av|Sq|Cl|Pl|Cr|Gr|Dr|Hwy|Pde|Wy|La [0-9] br [h|u|t] \\$\\d+,\\d+|\\$\\d*\\,\\d+,\\d+ ([A-Z][a-z]*){1,}\\Z
Is it possible to come up with a regular expression that ignores them, so that this line can be picked up? I would also like to know whether I could combine the special pattern "\\A[^%].*\\Z" with the existing one, as opposed to using two separate patterns altogether.
Many thanks,
Jack
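Two observations, with a sketch. First, Matcher.groupCount() reports how many capturing groups the pattern has, not what a particular line matched, so it returns the same number for every line. Second, "o;?" is most likely a UTF-8 byte order mark (bytes EF BB BF): decoded as UTF-8 it arrives as the single character U+FEFF, decoded as Windows-1252 it shows up as three characters. Both questions can be handled in one pattern: a negative lookahead (?!%) folds in the separate "\\A[^%].*\\Z" check without consuming anything, and [^A-Za-z%]* skips leading junk. Here the lookahead is redundant, because the pattern already demands a capital letter first, but it shows how the two checks combine. The listing part is a simplified, hypothetical stand-in for the real pattern.

```java
import java.util.regex.Pattern;

class BomTolerantListing {
    static final Pattern LISTING = Pattern.compile(
            "\\A(?!%)"                   // zero-width: reject lines that start with '%'
            + "[^A-Za-z%]*"              // skip leading junk such as a byte order mark
            + "(?:[A-Z][a-z]* )+\\d+ "   // simplified stand-in: suburb words, street number
            + "(?:[A-Z][a-z]* )+(?:St|Rd|Av|Sq|Cl|Pl|Cr|Gr|Dr|Hwy|Pde|Wy|La)"
            + " \\d br [hut] \\$[\\d,]+ .*\\Z");

    static boolean accept(String line) {
        return LISTING.matcher(line).matches();
    }

    public static void main(String[] args) {
        String line = "Mervan Bay 40 Boyde St 7 br t $250,000 X West Park AE";
        System.out.println(accept("\uFEFF" + line));             // BOM decoded as UTF-8
        System.out.println(accept("\u00EF\u00BB\u00BF" + line)); // same bytes decoded as Windows-1252
        System.out.println(accept("% Cleared: 61%"));
    }
}
```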

Regular Expressions in num-exp

Hello All,
I had a problem on my SRST gateway with num-exp inserting a repeating pattern into my 7-digit dialing when in fallback mode.
For a brief example, the 7-digit internal dialing is 21621.. or 21622..
The num-exp statement 'num-exp 2... 2162...' was not allowing me to 7-digit dial directly from one IP phone to another while in fallback mode.
When I dialed 2162154, the 2162 would hit the num-exp and be expanded to 2162162.
I have a workaround that uses a voice translation-rule, applied to the call-manager-fallback config, that translates a 7-digit dialed string to the 4-digit dialed string, which then hits the 4-digit to 7-digit num-exp, and it is working fine.
However, I was wondering if there is a way to use regular expressions in num-exp so that perhaps I can skip the intermediate step of using the translation-rule. Based on my existing translation-rules that are working properly, I figured something like this might work for num-exp:
'num-exp /^2\([12]..\)$/ /2162\1/'
But when I try to issue a num-exp with a regular expression, I get the following message:
Incorrect format for Number macro pattern
regular expression must be of the form ^((\+)?([0-9#*A-F.]|(\\\*))+(\$)?)$
I have tried a number of different combinations with no success. I always get the same message. The regular expression that I tried first was:
'num-exp ^2... 2162...'
This is when I first saw the "Incorrect format..." message and figured that it must be possible. Is this just a generic warning, similar to when you try to use complex regular expressions with the 'translation-rule' command vs. the 'voice translation-rule' command, and in reality you cannot use regular expressions in the num-exp command?
Thank you,
Leo

Hi Chris,
Thank you for taking the time to answer my question. It looks like the answer is no, num-exp does not support regular expressions.
I don't insist on using num-exp for this; I was just hoping to kill two birds with one stone and possibly skip the intermediate step of translating the 7-digit dialed string to 4 digits with a translation-rule, only to expand it from 4 digits back to 7 again. This is only an issue while in SRST, if a user tries to dial using 7 digits. We have a 7-digit internal dialing scheme, and normally my num-exp just expands the 4 digits sent from the telco to our 7-digit internal numbers. The problem is that both our prefix and part of our DID range start with 21, so while in SRST, if a user tried to dial a 7-digit DN, say 2162154, then after the 4th digit (2162) that pattern would hit the num-exp and get expanded to 2162162. I was hoping to create a num-exp using a regular expression that would only expand a four-digit string that begins with a 2 to seven digits, rather than any string that begins with a 2. This would 1) expand the four digits sent from the telco and 2) not match a seven-digit string that begins with a 2, such as 2162154, which may be dialed by a user.
Again, this is only an issue while in SRST and I have a pretty good work around so I'm fine with not being able to use a regular expression as part of my num-exp config. I just thought it would be a cool application of a regular expression if it was possible.
Thanks again for answering my question.
Leo

Regular expressions with boolean connectives (AND, OR, NOT) in Java?

I'd like to use regular expression patterns that are made up of simple regex patterns connected via AND, OR, or NOT operators, in order to do some keyword-style pattern matching.
A pattern could look like this:
(.*Is there.*) && (.*library.*) && !((.*badword.*) || (^$))
Is there any Java regex library that allows these operators?
I know that in principle these operators should be available, since Regular languages are closed under union, intersection, and complement.

> AND is implicit,
> xy -- means x AND y
That's not what I need, though, since this is just concatenation of regexes. Thus, /xy/ would not match the string "a y a x", because y precedes x.
> So it has to contain both x and y, but they could be in any order?
> You can't do that easily or generally.
> "x.*y|y.*x" would work here, but obviously it will get ugly factorially fast as you add more terms.
You got that right: AND means the regex operands can appear in any order.
That's why I'm looking for a regex library that does all this ugly work for me. Again, from a theoretical point of view, it IS possible to express the described semantics of AND with regular expressions, although they will get rather obfuscated.
Unless somebody has done something similar in Java (e.g., for C++, there's Ragel: http://www.cs.queensu.ca/~thurston/ragel/), I will probably use a finite-state-machine library and compile the complex regexes into automata (which can be minimized using well-defined operations on FSMs).
> You'd probably just be better off doing multiple calls to matches() or whatever.
Yes, that's another possibility: do the boolean operators in Java itself.
> Of course, if you really are just looking for literals, then you can just use str.contains(a) && !str.contains(b) && (str.contains(c) || str.contains(d)). You don't seem to need regex--at least not from your example.
OK, bad example, I do have "real" regexps in there :)
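The option of doing the boolean operators in Java itself can be sketched with java.util.function.Predicate, which already provides and, or and negate. Each operand is a separately compiled regex tested with find(), so the operands may occur in any order. This is a sketch of that route only; automaton libraries such as dk.brics.automaton (from the first thread in this page) offer intersection and complement operators in their own regex syntax, which is not shown here. The class and method names are illustrative.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

class BooleanRegex {
    // One regex as a predicate: true if the pattern occurs anywhere in the text.
    static Predicate<String> re(String regex) {
        Pattern p = Pattern.compile(regex);
        return s -> p.matcher(s).find();
    }

    // (.*Is there.*) && (.*library.*) && !((.*badword.*) || (^$))
    static final Predicate<String> QUERY =
            re("Is there")
                    .and(re("library"))
                    .and(re("badword").or(re("^$")).negate());

    public static void main(String[] args) {
        System.out.println(QUERY.test("Is there a regex library for this?"));
        System.out.println(QUERY.test("Is there a library? badword"));
        System.out.println(QUERY.test("library first, then Is there"));
    }
}
```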

Help needed regarding regular expressions

Hello,
I need to write a program that receives a mathematical expression and evaluates
it... in other words, a calculator :)
I know I need to use regular expressions in order to determine whether the input is legal or not, but I'm really having trouble setting the pattern.
The expression can be in the form: Axxze2223+log(5)+(2*3)*(5+4)
where Axxze2223 is a variable (i.e. a combination of letters and numbers),
whereas "l o g (5)", "log()", "Axxx33aaaa" and "()" are illegal.
I tried to set the pattern, but I got exceptions or it just didn't work the way I wanted it to.
Here's what I tried, at least for the variable form:
"\\s*(*([a-zA-Z]+\\d)+)*\\s*";
I'm really new to this... and I can't seem to set the pattern using regular expressions. How can I combine all the rules into one string?
Any help or references would be appreciated.
Thanks

> so I'll explain:
> let's say I got the token "abc22c" (let's call it "token").
> I want to check if it's legal.
> I define:
> String varPattern = "\\s*[a-zA-Z]+\\d+\\s*";
If you want to check for one or more ASCII letters followed by one or more digits, the whole possibly surrounded by whitespace -- yes. Note that "abc22c" itself would not match, because the pattern requires the token to end with digits.
> now I want to check if it's OK,
> so I check:
> token.matches(varPattern);
> am I correct?
Quite. It's better to compile the Pattern (Pattern.compile(String)), create a java.util.regex.Matcher (Pattern#matcher(CharSequence)), and test the Matcher with Matcher#matches().
(Class.method -> static method, Class#method -> instance method)
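The compile-once advice looks like this in practice. String.matches(regex) is documented as equivalent to Pattern.matches(regex, str), which compiles the pattern on every call; holding a compiled Pattern avoids that. Incidentally, the pattern from the question, "\\s*(*([a-zA-Z]+\\d)+)*\\s*", throws a PatternSyntaxException because the * right after "(" has nothing to repeat. The class and method names are illustrative.

```java
import java.util.regex.Pattern;

class VarCheck {
    // Compiled once and reused for every token.
    private static final Pattern VAR = Pattern.compile("\\s*[a-zA-Z]+\\d+\\s*");

    static boolean isVariable(String token) {
        // matches() requires the whole token to fit the pattern.
        return VAR.matcher(token).matches();
    }

    public static void main(String[] args) {
        System.out.println(isVariable("abc22"));  // true
        System.out.println(isVariable("abc22c")); // false: must end in digits
    }
}
```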
> now I'm having a problem defining a pattern for log(),
> sin() and cos():
> the brackets are mandatory, and there must be an
> expression inside.
> How do I do that?
First, I'd check the overall function syntax (a valid name, brackets), then whether what's inside the brackets is a valid expression (maybe empty), then whether that expression is valid for that function (presumably always?).
I might add I'm no expert on parsing, so that's more a supposition than a guide.
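Checking the function syntax first and then validating what is inside the brackets amounts to a small recursive-descent parser: regexes recognise the individual tokens, while nesting such as log((2*3)) is handled by recursion, which a single java.util.regex pattern cannot express for arbitrary depth. This is a sketch under stated assumptions: the only functions are log, sin and cos, a variable is letters followed by digits (as in the varPattern above), whitespace is not allowed, and the grammar is the one written in the comments. The class and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExprValidator {
    private static final Pattern FUNC = Pattern.compile("(log|sin|cos)\\("); // name plus mandatory '('
    private static final Pattern VAR = Pattern.compile("[a-zA-Z]+\\d+");     // e.g. Axxze2223
    private static final Pattern NUM = Pattern.compile("\\d+(\\.\\d+)?");

    private final String s;
    private int pos = 0;

    private ExprValidator(String s) { this.s = s; }

    static boolean isValid(String input) {
        ExprValidator v = new ExprValidator(input);
        return v.expr() && v.pos == v.s.length(); // the whole input must be consumed
    }

    // expr := term (('+' | '-') term)*
    private boolean expr() {
        if (!term()) return false;
        while (eat('+') || eat('-')) {
            if (!term()) return false;
        }
        return true;
    }

    // term := factor (('*' | '/') factor)*
    private boolean term() {
        if (!factor()) return false;
        while (eat('*') || eat('/')) {
            if (!factor()) return false;
        }
        return true;
    }

    // factor := func '(' expr ')' | '(' expr ')' | variable | number
    private boolean factor() {
        if (lookingAt(FUNC) || eat('(')) {
            return expr() && eat(')'); // brackets must enclose a non-empty expression
        }
        return lookingAt(VAR) || lookingAt(NUM);
    }

    private boolean eat(char c) {
        if (pos < s.length() && s.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    // Tries pattern p at the current position only; on success moves pos past the match.
    private boolean lookingAt(Pattern p) {
        Matcher m = p.matcher(s).region(pos, s.length());
        if (!m.lookingAt()) return false;
        pos = m.end();
        return true;
    }

    public static void main(String[] args) {
        for (String in : new String[] {"Axxze2223+log(5)+(2*3)*(5+4)", "l o g (5)", "log()", "Axxx33aaaa", "()"}) {
            System.out.println(in + " -> " + isValid(in));
        }
    }
}
```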

Pllllllease help!!!! Regular expressions

Hi... I've been trying for almost 40 hours to write a regular expression and I don't succeed...
I need a regex that matches a polynomial.
Each coefficient of the polynomial is a complex number enclosed in brackets.
The environment I'm using is Java in Eclipse with the REGEX library.
Example of a correct input:
avi=(25.0+5.0i)x^2+(15.3+2.85i)x^1
This is the regex I wrote:
^[\w]+=[\$]{1}[-+]?[\d]+[\\.]{1}[\d]++[+-]{1}[\d]+[\\.]{1}[0-9]+i]*[\$]{1}[xX]{1}[\\^]{1}[\d]+$
The string has to start with a name, then "=", then an opening bracket, then optionally "+" or "-", then a decimal number, then "+" or "-", then a decimal number followed by "i", then a closing bracket, then "x" or "X", then "^", then at least one digit; after that I want more terms like (15.3+2.85i)x^1.
The regex currently supports only this case: avi=(25.0+5.0i)x^2
but I want it to support this: avi=(25.0+5.0i)x^2+(15.3+2.85i)x^1
and the sign between the two terms must be "+" and not "-".
How do I make the pattern repeat once or more? By "the pattern" I mean this one: (25.0+5.0i)x^2
In conclusion: how do I fix it?
Please, please help me, I'm going nuts. Java's API and I became close buddies this weekend, and still I don't succeed...
Thanks a lot in advance...
(:

Arg! Why do you double-post???
I've just taken considerable time in answering your other post ( http://forum.java.sun.com/thread.jspa?messageID=10018850 ) and then found out that you posted here as well with additional information.
It's considered rude to make people duplicate their effort by posting the same question twice. Keep to one thread.
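For the repetition question, one approach is to write the term once as a string and reuse it inside a non-capturing group with *, so that "name=TERM" may be followed by any number of "+TERM". This sketch assumes each term looks exactly like the examples: a bracketed complex number with decimal real and imaginary parts, then x or X, "^" and an integer exponent. The class and constant names are illustrative.

```java
import java.util.regex.Pattern;

class PolynomialCheck {
    // One term, e.g. (25.0+5.0i)x^2
    private static final String TERM =
            "\\([+-]?\\d+\\.\\d+[+-]\\d+\\.\\d+i\\)[xX]\\^\\d+";

    // name=TERM, optionally followed by more terms joined only by '+'
    private static final Pattern POLY =
            Pattern.compile("\\w+=" + TERM + "(?:\\+" + TERM + ")*");

    static boolean isPolynomial(String s) {
        return POLY.matcher(s).matches(); // whole string must match
    }

    public static void main(String[] args) {
        System.out.println(isPolynomial("avi=(25.0+5.0i)x^2"));
        System.out.println(isPolynomial("avi=(25.0+5.0i)x^2+(15.3+2.85i)x^1"));
        System.out.println(isPolynomial("avi=(25.0+5.0i)x^2-(15.3+2.85i)x^1"));
    }
}
```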

Regular Expressions in ABAP

Hi, all!
Are there any possibilities to make use of regular expressions in 4.6C (FMs, classes)?
Regards,
Maxim.

Hi Maxim, and anyone else who may read this,
try the following code, but please leave my (c) where it is.
You may also want to look at the particularities of JavaScript RegExp, since the matching itself is done by the JavaScript engine.
Yours,
Johannes
* an example call:
DATA return_value TYPE string.
DATA: match TYPE ztmatch,
      lastindex TYPE i,
      leftcontext TYPE string,
      rightcontext TYPE string,
      index TYPE i,
      searchstring TYPE string,
      modifier TYPE string,
      regex TYPE string,
      found TYPE boolean,
      error_message TYPE string.
regex = 'b+(a)*(b+)'.
searchstring = 'abbbbabbaa'.
modifier = ''.
CALL METHOD ztr_bw_tools=>regex
  IMPORTING
    LASTINDEX     = lastindex
    LEFTCONTEXT   = leftcontext
    RIGHTCONTEXT  = rightcontext
    INDEX         = index
    FOUND         = found
    MATCH         = match
    RETURN_VALUE  = return_value
    ERROR_MESSAGE = error_message
  CHANGING
    SEARCHSTRING  = searchstring
    MODIFIER      = modifier
    REGEX         = regex.
* Signature of method ZTR_BW_TOOLS=>REGEX:
* Changing  SEARCHSTRING  TYPE STRING DEFAULT ''  "string the regex is applied to
* Changing  MODIFIER      TYPE STRING DEFAULT ''  "/gims/
* Changing  REGEX         TYPE STRING DEFAULT ''  "regular expression
* Exporting LASTINDEX     TYPE I
* Exporting LEFTCONTEXT   TYPE STRING
* Exporting RIGHTCONTEXT  TYPE STRING
* Exporting INDEX         TYPE I
* Exporting FOUND         TYPE BOOLEAN  "boolean variable (X=true, -=false, space=unknown)
* Exporting MATCH         TYPE ZTMATCH  "For use with regular expressions
* Exporting RETURN_VALUE  TYPE STRING
* Exporting ERROR_MESSAGE TYPE STRING
method REGEX .
* (c) by Johannes Rumpf - 2006 -
* Matching-Table of part matches of brackets
*DATA: BEGIN OF ztmatch,
*        comp TYPE string,
*      END OF ztmatch.
DATA source TYPE string.
DATA js_processor TYPE REF TO cl_java_script.
js_processor = cl_java_script=>create( ).
* JavaScript --> ABAP variable mapping
js_processor->bind( EXPORTING name_obj = ' '
                              name_prop = 'regex'
                     CHANGING data      = regex ).
js_processor->bind( EXPORTING name_obj = ' '
                              name_prop = 'searchstring'
                     CHANGING data      = searchstring ).
js_processor->bind( EXPORTING name_obj = ' '
                              name_prop = 'modifier'
                     CHANGING data      = modifier ).
js_processor->bind( EXPORTING name_obj = ' '
                              name_prop = 'index'
                    CHANGING data      = index ).
js_processor->bind( EXPORTING name_obj = 'abap'
                              name_prop = 'match'
                    CHANGING data      = match ).
js_processor->bind( EXPORTING name_obj = ' '
                              name_prop = 'lastindex'
                    CHANGING data      = lastindex ).
js_processor->bind( EXPORTING name_obj = ' '
                              name_prop = 'leftcontext'
                    CHANGING data      = leftcontext ).
js_processor->bind( EXPORTING name_obj = ' '
                              name_prop = 'rightcontext'
                    CHANGING data      = rightcontext ).
js_processor->bind( EXPORTING name_obj = ' '
                              name_prop = 'found'
                     CHANGING data      = found ).
* add an empty line
DATA: wa like line of match.
wa-comp = ' '.
append wa to match.
* JavaScript Code *REGEX*
* A match array has no lastIndex/leftContext/rightContext properties,
* so these are derived from the match position and the matched text.
CONCATENATE
'var re = new RegExp(regex, modifier);'
'var m = re.exec(searchstring);'
'if (m == null) {'
'  found = false;'
'} else {'
'  found = true;'
'  index = m.index;'
'  lastindex = m.index + m[0].length;'
'  leftcontext = searchstring.substring(0, m.index);'
'  rightcontext = searchstring.substring(m.index + m[0].length);'
'  var len = abap.match.length;'
'  for (var i = 0; i < m.length; i++) {'
'    abap.match[len-1].comp = m[i];'
'    abap.match.appendLine();'
'    len++;'
'  }'
'}'
INTO source SEPARATED BY cl_abap_char_utilities=>cr_lf.
return_value = js_processor->evaluate( source ).
error_message = js_processor->LAST_ERROR_MESSAGE.
endmethod.

Regular Expressions on an online database

Hi everyone, I'm kinda new, so if what I'm about to ask seems crazy, bear with me.
What kind of issues would be involved in using a regular expression to search an online database?
The site says that Java 1.4 now supports a great deal of regex functionality, but I'm wondering: would it be possible to type a regex into some control and have it evaluated against what's in the database?
Am I making sense?
Thanks everyone,
Robin Spiteri.

Normally you access a database via JDBC & SQL, so the question is whether SQL supports regular expressions; this has nothing to do with Java (any version). As far as I know, regexps are not part of standard SQL, although some SQL databases support them --> in general, I think you cannot use regexps directly to query a database.
If the database you're using supports regexps, and you are really sure that this will be "THE" database and the system won't change, you can of course include regexps in the SQL, but that will make it difficult to move to another database.
What you definitely can do is loop through the resultset and kick out the records you don't want, e.g.
while (rs.next()) {
    if (!matchMyRegExp(rs.getString(1))) {
        // skip rows whose value does not match the regexp
        continue;
    }
    // do something with the matching row
}
This should work on any database, but at the cost of transferring more data than needed from the database to the application.
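The matchMyRegExp helper in the loop above is left undefined in the reply. One possible shape, assuming the user types a Java regex into some input control: compile it once, before the loop, report a malformed regex instead of failing mid-loop, and treat SQL NULL (which rs.getString() returns as null) as a non-match. The class name RowFilter and the method fromUserInput are illustrative, not part of any library.

```java
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RowFilter {
    private final Pattern pattern;

    private RowFilter(Pattern pattern) {
        this.pattern = pattern;
    }

    // Compiles the user's regex once; a malformed regex yields an empty Optional.
    static Optional<RowFilter> fromUserInput(String regex) {
        try {
            return Optional.of(new RowFilter(Pattern.compile(regex)));
        } catch (PatternSyntaxException e) {
            return Optional.empty();
        }
    }

    // Candidate body for matchMyRegExp(); null (SQL NULL) counts as "no match".
    boolean matchMyRegExp(String value) {
        return value != null && pattern.matcher(value).find();
    }

    public static void main(String[] args) {
        RowFilter filter = fromUserInput("^Rob").get();
        System.out.println(filter.matchMyRegExp("Robin Spiteri"));  // true
        System.out.println(filter.matchMyRegExp(null));             // false
        System.out.println(fromUserInput("[unclosed").isPresent()); // false
    }
}
```

Filtering this way still transfers every row to the client, as the reply notes; it only moves the regex test into Java.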

Regular expressions or "wild cards" and Mail rules

I'm working on building several rules to deal with the various email types I get during an average day at work. Some of them are generated automatically by our servers, others come from customers, etc. Some of the messages generated by our servers have a blank subject line. Mail won't let me build a rule that matches a blank subject line (or anything blank, for that matter)... is there a way around this?
Also, does Mail allow the use of regular expressions? I really don't know anything about regular expressions, but was hoping to learn enough to build some more complex rules if that's possible. If not, I won't bother trying to tackle such an obscure language.
Thanks in advance!

No, there is no provision for this in Mail's rules.
However, Mail's Junk Filter is based on a much more sophisticated method (Latent Semantic Analysis) that typically is more effective at detecting this sort of thing, in part because methods based on Regular Expressions usually have an unacceptably high false positive rate if they are made general enough to catch all the possible misspellings.

Regular Expressions from C++ to Java

Does anyone know if it is possible to use the same regular expressions used in C++ within Java? Here is a regular expression used in a C++ program:
^[[:digit:]]{4}$
I did a test, but it does not work: it always evaluates to false.
Thanks,
Sid

> Does anyone know if it is possible to use the same
> regular expressions used in C++ within Java?
Standard C++ had no regexes before <regex> arrived in C++11, so presumably you are using a library (and noting which one might help).
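The always-false result is consistent with how java.util.regex treats POSIX bracket expressions: it does not support [[:digit:]], and instead reads the inner [:digit:] as a nested character class (a union), so the pattern only matches strings made of ':', 'd', 'i', 'g' and 't'. Java spells the POSIX classes as \p{Digit}, \p{Alpha} and so on, or you can use \d. A small demonstration; the class and method names are illustrative.

```java
import java.util.regex.Pattern;

class PosixClassDemo {
    // C++/POSIX syntax: in Java this is the set { ':', 'd', 'i', 'g', 't' }, not digits.
    static boolean posixStyle(String s) {
        return Pattern.matches("^[[:digit:]]{4}$", s);
    }

    // Java's POSIX-style class for ASCII digits.
    static boolean javaPosixClass(String s) {
        return Pattern.matches("^\\p{Digit}{4}$", s);
    }

    // The usual shorthand.
    static boolean javaShorthand(String s) {
        return Pattern.matches("^\\d{4}$", s);
    }

    public static void main(String[] args) {
        System.out.println(posixStyle("2024"));     // false
        System.out.println(javaPosixClass("2024")); // true
        System.out.println(javaShorthand("2024"));  // true
    }
}
```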

Does Applescript include and support Regular Expressions?

I'm starting to study regular expressions, and I just discovered AppleScript. So I wondered if AppleScript - or some other Mac utility - offers features that help with regular expressions.
I'm trying to figure out a variety of search and replace operations. For example, I'd like to copy a table, then replace every row in Table A that includes the word "billion," replacing every row with the word "million" in Table B.
I just wondered if AppleScript offers any shortcuts for figuring out complex regex operations like this.
Thanks.

While AppleScript has the usual kinds of comparisons, I don't know if I would consider them regular expressions. The Terminal gives you access to various utilities that do use regular expressions - see the bash and re_format manual pages.

Sed Request Regular Expression Format

A quick question....
There are lots of different syntaxes for regular expressions, and lots for sed. With the sed-request and sed-response filters I have tried different syntaxes for marking word boundaries, but don't know which to use. The \b syntax is supported but doesn't seem to do anything, and the \< and \> syntax throws up errors when I start up the web server. I tried the more complex (?<!\w)(?=\w) and (?<=\w)(?!\w), but \w isn't supported. I am wondering if I just can't do this... I am trying to stop SQL injection attacks using a syntax such as
s/\bselect\b.{1,100}?\bfrom\b.{1,100}?\bwhere\b//g
Are word boundaries not supported?

Actually, the entries should be written as \\< and \\>. That looks double-escaped to me, but with it the entries are correct:
Input fn="insert-filter"
method="(GET|HEAD|POST)"
filter="sed-request"
sed="s/</\\</g"
sed="s/%3c/\\</g"
sed="s/%3C/\\</g"
sed="s/>/\\>/g"
sed="s/%3e/\\>/g"
sed="s/%3E/\\>/g"
sed="s/\x3C ?iframe//g"
sed="s/\\<src\\>[^a-zA-Z_0-9]*?\\<javascript://g"
sed="s/\\<src\\>[^a-zA-Z_0-9]*?\\<vbscript://g"
sed="s/\\<href\\>[^a-zA-Z_0-9]*?\\<javascript://g"
sed="s/\\<alert\\>[^a-zA-Z_0-9]*?\x28//g"
sed="s/\\<src\\>[^a-zA-Z_0-9]*?\\<http://g"
sed="s/\\<type\\>[^a-zA-Z_0-9]*?\\<text\\>[^a-zA-Z_0-9]*?\\<vbscript\\>//g"
sed="s/\\<href\\>[^a-zA-Z_0-9]*?\\<vbscript://g"
sed="s/\\<url\\>[^a-zA-Z_0-9]*?\\<javascript://g"
sed="s/\x3C ?script\\>//g"
sed="s/\\<type\\>[^a-zA-Z_0-9]*?\\<text\\>[^a-zA-Z_0-9]*?\\<javascript\\>//g"
sed="s/\\<url\\>[^a-zA-Z_0-9]*?\\<vbscript://g"
sed="s/(asfunction|javascript|vbscript|data|mocha|livescript)://g"
sed="s/(?i:<object[ /+\t].*?((type)|(codetype)|(classid)|(code)|(data))[ /+\t]*=)//g"
sed="s/(?i:[ /+\t\"\'`]datasrc[ +\t]*?=.)//g"
sed="s/(?i:<link[ /+\t].*?href[ /+\t]*=)//g"
sed="s/(?i:<meta[ /+\t].*?http-equiv[ /+\t]*=)//g"
sed="s/(?i:<embed[ /+\t].*?SRC.*?=)//g"
sed="s/(?i:[ /+\t\"\'`]on\x63\x63\x63+?[ +\t]*?=.)//g"
sed="s/(?i:<?frame.*?[ /+\t]*?src[ /+\t]*=)//g"
sed="s/(?i:<isindex[ /+\t>])//g"
sed="s/(?i:<form.*?>)//g"
sed="s/(?i:<script.*?[ /+\t]*?src[ /+\t]*=)//g"
sed="s/(?i:<script.*?>)//g"
sed="s/\\<select\\>.{0,40}\\<user\\>//g"
sed="s/\\<select\\>.{0,40}\\<substring\\>//g"
sed="s/\\<select\\>.{0,40}\\<ascii\\>//g"
sed="s/\\<user_tables\\>//g"
sed="s/\\<user_tab_columns\\>//g"
sed="s/\\<all_objects\\>//g"
sed="s/\\<drop\\>//g"
sed="s/\\<substr\\>//g"
sed="s/\\<sysdba\\>//g"
sed="s/\\<user_password\\>//g"
sed="s/\\<user_users\\>//g"
sed="s/\\<user_constraints\\>//g"
sed="s/\\<column_name\\>//g"
sed="s/\\<substring\\>//g"
sed="s/\\<object_type\\>//g"
sed="s/\\<object_id\\>//g"
sed="s/\\<user_ind_columns\\>//g"
sed="s/\\<column_id\\>//g"
sed="s/\\<table_name\\>//g"
sed="s/\\<object_name\\>//g"
sed="s/\\<rownum\\>//g"
sed="s/\\<user_group\\>//g"
sed="s/\\<utl_http\\>//g"
sed="s/\\<select\\>.*?\\<to_number\\>//g"
sed="s/\\<group\\>.*\\<by\\>.{1,100}?\\<having\\>//g"
sed="s/\\<select\\>.*?\\<data_type\\>//g"
sed="s/\\<isnull\\>[^a-zA-Z_0-9]*?\x28//g"
sed="s/\\<union\\>.{1,100}?\\<select\\>//g"
sed="s/\\<insert\\>[^a-zA-Z_0-9]*?\\<into\\>//g"
sed="s/\\<select\\>.{1,100}?\\<count\\>.{1,100}?\\<from\\>//g"
sed="s/\x3B[^a-zA-Z_0-9]*?\\<drop\\>//g"
sed="s/\\<select\\>.*?\\<to_char\\>//g"
sed="s/\\<dbms_java\\>//g"
sed="s/\\<nvarchar\\>//g"
sed="s/\\<utl_file\\>//g"
sed="s/\\<inner\\>[^a-zA-Z_0-9]*?\\<join\\>//g"
sed="s/\\<select\\>.{1,100}?\\<from\\>.{1,100}?\\<where\\>//g"
sed="s/\\<into\\>[^a-zA-Z_0-9]*?\\<dumpfile\\>//g"
sed="s/\\<delete\\>[^a-zA-Z_0-9]*?\\<from\\>//g"
sed="s/\x3B[^a-zA-Z_0-9]*?\\<shutdown\\>//g"
sed="s/\\<dba_users\\>//g"
sed="s/\\<select\\>.{1,100}?\\<top\\>.{1,100}?\\<from\\>//g"

Search application that can handle regular expressions

For my PhD, I am desperately seeking an OS X 10.6 application that can search through all my data. The application needs to have *efficient search algorithms* for complex pattern searching.
For example, I want to find all my documents that contain the word cancer.
Not only files whose name contains cancer, BUT a search inside the documents of all types (pdf, rtf, doc, etc.).
On Windows I use +Filelocator Pro+.
Is there a Mac OS X application like +Filelocator Pro+ for searching?
An application with regular expression support and any of the following options:
+Export results to Text, command line options, Network drive searching, Boolean searches (e.g. AND, OR, and NOT), Perl compatible regexp option, Built in file viewer, Word, Excel and PDF searching, Open Office, Word Perfect option using IFilters, Unicode support, support for: ZIP, RAR, CAB, 7-Zip, ARJ, Bzip, CHM, CPIO, DEB, DMG, GZIP, HFS, ISO, LZH, MSI, NSIS, RPM, TAR, UDF, WIM, XAR, Z formats, Active Scripting support, Export as Text, CSV, XML, HTML, or XSLT custom format, File attribute searching , Relative date/time searches, Repositionable contents pane, Search within search, Exclude folders list,+ etc

You might try the freeware EasyFind. It allows boolean and wildcard searches. How many of the other features in your "wish list" it offers I haven't checked. If EasyFind doesn't offer sufficient power, take a look at FoxTrot Personal Search or FoxTrot Professional Search.
And of course there's always grep, which can be incredibly powerful once you learn all its ins and outs.
Regards.
