Regular Expressions - Dictionary List

Good day all -
I am trying to create a signature (or a few signatures) to provide a minimalistic "content management" scenario. We have a list of about 150 words that we need to flag if they are seen in user data. I know how to create the regex string for a single word ... and can use the | pipe to separate the words, allowing me to combine multiple words into a single signature ... but just how large is the STRING field? 255? 128? Unlimited?
The idea hopefully is to use only 10 - 20 signatures to cover the whole list. Certainly hope to avoid having to write a new signature for each word!
Looking for suggestions and/or experiences of anyone else having attempted to do something like this.
Maybe someone found that you could insert unlimited words in the list but by doing so they overtaxed the sensor ... or that it appeared that using more than 10 words in a list was an iffy proposition.
All your inputs will be appreciated - whether I like what I hear or not! Thanks everyone.
Hank Schupp

It all depends on how many states the regular expression will create in the engine. The maximum is 64K bytes, which is a pretty long string. You will have to experiment to find the maximum number of words you can pipe into a single signature. I would recommend dividing the 150 words into different categories and writing one signature for each category. In general, writing one signature for 20 words will make it easy to manage.
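If you want to measure candidate signatures before loading anything onto the sensor, it is easy to generate the alternation strings offline and check their lengths. A minimal sketch in Java (the word list, the batch size of 20, and the plain "word1|word2|..." form are illustrative assumptions, not sensor syntax):

import java.util.*;
import java.util.regex.Pattern;

public class SignatureBuilder {
    public static void main(String[] args) {
        // Hypothetical word list; in practice, load the ~150 flagged words.
        List<String> words = Arrays.asList("alpha", "bravo", "charlie", "delta", "echo");
        int perSignature = 20; // one signature per category/batch, as suggested above

        for (int i = 0; i < words.size(); i += perSignature) {
            List<String> batch = words.subList(i, Math.min(i + perSignature, words.size()));
            StringBuilder sb = new StringBuilder();
            for (String w : batch) {
                if (sb.length() > 0) sb.append('|');
                sb.append(Pattern.quote(w)); // escape any regex metacharacters in the word
            }
            String signature = sb.toString();
            System.out.println(signature.length() + " chars: " + signature);
        }
    }
}

Printing the length of each generated string gives a quick sanity check against whatever field limit the sensor actually enforces.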

Similar Messages

  • Mining for regular expressions from text

    Hi all
    I have a situation where I need to develop a set of regular expressions from lists.
    I have lists of files uploaded and downloaded, and when the event happened. I need to analyze these and boil them down to a set of regular expressions.
    The regular expressions will be used in the future to check the logs for activities. For example:
foo*.zip is downloaded every night. Has it been downloaded tonight?
    I am looking for help in analyzing the lists of files and deriving regular expressions to describe them.
    Any ideas?
    Thanks all

    check the url http://java.sun.com/developer/technicalArticles/releases/1.4regex/index.html
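Once a pattern has been derived, checking the nightly log is the easy part. A small sketch in Java (the pattern foo.*\.zip and the log entries are made-up examples):

import java.util.*;
import java.util.regex.Pattern;

public class LogCheck {
    public static void main(String[] args) {
        // Hypothetical pattern derived from the observed download names.
        Pattern nightly = Pattern.compile("foo.*\\.zip");
        List<String> tonight = Arrays.asList("foo-2011-08-07.zip", "bar.txt");

        // Has foo*.zip been downloaded tonight?
        boolean downloaded = false;
        for (String file : tonight) {
            if (nightly.matcher(file).matches()) {
                downloaded = true;
            }
        }
        System.out.println("foo*.zip downloaded tonight: " + downloaded);
    }
}

The harder problem, deriving the patterns from the lists in the first place, usually comes down to grouping names by common prefixes and suffixes and generalizing the variable middle part.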

  • Can somebody help me in getting some good material for Regular Expressions and IP Community list

    can somebody help me in getting some good material for Regular Expressions and IP Community list

    I'm not sure what you mean by "IP Community list", but here are 3 reference sites for Regular Expressions:
    Regular Expression Tutorial - Learn How to Use Regular Expressions
    http://www.regular-expressions.info/tutorial.html
    Regular Expressions Cheat Sheet by DaveChild
    http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
    Regular Expressions Quick Reference
    http://www.autohotkey.com/docs/misc/RegEx-QuickRef.htm

  • Regular expression to remove SIDs from list

Hey everyone, so I have a script in which I try to find the current user as well as the last user. Currently I'm using a regular expression to throw the system account SIDs and other things like that out of the list. However, this doesn't seem to be filtering
SQL SIDs out of my list, i.e. ReportServer$LOCAL with the SID S-1-5-80-4264962431-3932693095-1576469926-235475122-2208986020.
    What's the best way to get only user SIDs?
    Here's what I have so far:
    $Win32User = Get-WmiObject -Class Win32_UserProfile -ComputerName $Computer
    $Win32User = $Win32User | Where-Object {($_.SID -notmatch "^S-1-5-\d[18|19|20]$")}
    $Win32User = $Win32User | Sort-Object -Property LastUseTime -Descending
    $LastUser = $Win32User | Select-Object -First 1
When I run this it breaks, since there is no actual user tied to this SID:
    $UserSID = New-Object System.Security.Principal.SecurityIdentifier($LastUser.SID)
    $User = $UserSID.Translate([System.Security.Principal.NTAccount])
    Thanks for any help!!

    Start cmd.exe as your domain user and run whoami /user to get your own SID. You will get something like this:
    USER INFORMATION
    User Name SID
    ========================= =============================================
    DEMOSYSTEM\CustomAccount1 S-1-5-21-3419697060-3810377854-678604692-1000
    The last part of the SID, in this case 1000, is called RID. When you create a new user or computer object in your domain, only the RID will be different from your own SID. The RID starts on 1000 and increments as you create new objects.
    If you are only interested in user accounts from the same domain as your user, you can use a regex like this, only based on your own SID:
    $_.SID -match '^S-1-5-21-3419697060-3810377854-678604692-[\d]{4,10}$'

  • Regular expression for BBcode list to html list

    Hi,
    we are migrating BBforum to Jive forum.
BBforum data contains BBcode strings. I found the following code after googling:
// requires: import java.util.HashMap; import java.util.Map;
public static String bbcode(String text) {
    String html = text;
    Map<String, String> bbMap = new HashMap<String, String>();
    bbMap.put("(\r\n|\r|\n|\n\r)", "<br/>");
    bbMap.put("\\[b\\](.+?)\\[/b\\]", "<strong>$1</strong>"); // closing tag needs the slash
    for (Map.Entry<String, String> entry : bbMap.entrySet()) {
        html = html.replaceAll(entry.getKey(), entry.getValue());
    }
    return html;
}
I have BBcode with a format like
[list] [*]blue[*]red[*] green[list]
and I have to replace this with <ul><li>blue</li><li>red</li>
Can anyone suggest a Java regular expression which replaces as above?
    Edited by: 875452 on Jul 31, 2011 8:03 AM

    Moderator advice: Please read the announcement(s) at the top of the forum listings and the FAQ linked from every page. They are there for a purpose.
    Then edit your post and format the code correctly.
    Moderator action: Moved from Development Tools » General Questions
    db
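A rough sketch of the replacement chain being asked for (assuming the list block is actually closed with [/list]; nested lists are not handled):

public class BBList {
    static String convert(String bb) {
        return bb
            .replaceAll("\\[list\\]", "<ul>")
            .replaceAll("\\[/list\\]", "</ul>")
            // capture each item's text up to the next tag
            .replaceAll("\\[\\*\\]\\s*([^\\[<]*)", "<li>$1</li>");
    }

    public static void main(String[] args) {
        System.out.println(convert("[list] [*]blue[*]red[*] green[/list]"));
        // prints: <ul> <li>blue</li><li>red</li><li>green</li></ul>
    }
}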

  • How to create a list of string if a regular expression is given ?

    Hi folks,
I have a regular expression, say abcd[a-z]\\\.[0-9] (please ignore one '\').
For this pattern I know that
the following strings match successfully:
    1. abca.0
    2. abcb.1
3. abcz.9 ... etc.; n number of combinations are possible.
Is there any algorithm which will create some random strings from a regular expression?
Input to the algorithm: some string pattern
Output of the algorithm: some matching strings (can be a single string or an array of matching strings)
    Thanks in advance..
    Sethu
    Edited by: Sethumadhavan on Apr 16, 2008 6:32 AM

Can you please give a little more explanation...
If I get some values I can exit with the values ... and from the values I got I can ignore the duplicates ...
But I am not getting the basic algorithm to get the list of strings ... (DFA? or NFA?)
    thanks
    sethu
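The DFA/NFA hunch is the standard approach: compile the pattern to an automaton and take random walks that end in an accepting state. For a fixed, simple pattern you can skip the automaton and sample each character class directly; a toy Java sketch, hard-wired to the shape abcd[a-z]\.[0-9] rather than a general algorithm:

import java.util.Random;

public class RegexSampler {
    public static void main(String[] args) {
        Random rnd = new Random();
        for (int i = 0; i < 3; i++) {
            char letter = (char) ('a' + rnd.nextInt(26)); // one char from [a-z]
            char digit  = (char) ('0' + rnd.nextInt(10)); // one char from [0-9]
            String s = "abcd" + letter + "." + digit;
            // sanity check: the generated string really matches the pattern
            System.out.println(s + " -> " + s.matches("abcd[a-z]\\.[0-9]"));
        }
    }
}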

  • Using regular expressions in content dictionaries

    I need to create a content dictionary containing regular expressions. I also need to use the "\" to escape some characters that would otherwise be regex meta-characters. When using a regex in a message filter, the "\" must be doubled because of parsing issues. This is clearly documented in the manual. What isn't documented is whether this must be done when the regex is within a content dictionary.
    Here's an example:
    if (mail-from == "@bad-domain\\.com$") { drop(); }
    I want to change this filter to:
    if (mail-from-dictionary-match("bad-domains")) { drop(); }
    So what do I put in the content dictionary, "@bad-domain\.com$" or "@bad-domain\\.com$"?
    Thanks,

    You should use this:
    "@bad-domain\.com$"
The above tells the system to treat the "." (which would otherwise mean "any character") as a literal period.
    If you used this,
    @bad-domain\\.com$
What the system would match is "@bad-domain\.com", because the first backslash would escape the second backslash, making it a literal backslash. So the double backslash is the wrong format.
    The only reason you see it in the final results when you've committed changes is that the system adds the backslash for you so that there's no error when it gets compiled.
    Also, you could have left the single backslash out completely too and it would probably work.
    "@bad-domain.com$"
If you used that as your pattern in the dictionary, it would match against these:
    @bad-domain.com
    @bad-domainncom
    @bad-domain1com
    @bad-domain&com
Basically, the "." means any character. But to be precise, you should add exactly one backslash in front of special characters. Here is a list of special characters:
    | ( ) [ { ^ $ * + ? .
    For a detailed explanation about special characters and how to use them, please see the Advanced User Guide.
    [https://supportportal.ironport.com/irppcnctr/srvcd?u=http://secure-support.soma.ironport.com/subproducts/x-c_series&sid=900001]
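The same escaping behavior is easy to demonstrate outside the appliance; here is the idea in Java's regex flavor (a side illustration, not IronPort syntax; note that in Java source each regex backslash is itself written as a doubled backslash):

public class EscapeDemo {
    public static void main(String[] args) {
        // "\\." in the source is the regex \. -- a literal period,
        // analogous to writing \. in the content dictionary.
        System.out.println("a@bad-domain.com".matches(".*@bad-domain\\.com$")); // true
        System.out.println("a@bad-domainXcom".matches(".*@bad-domain\\.com$")); // false
        // Left unescaped, "." matches any character, so the X slips through:
        System.out.println("a@bad-domainXcom".matches(".*@bad-domain.com$"));   // true
    }
}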

Re: [iPlanet-JATO] Re: Use Of models in utility classes - Please don't forget about the regular expression potential

    Namburi,
    When you said you used the Reg Exp tool, did you use it only as
    preconfigured by the iMT migrate application wizard?
    Because the default configuration of the regular expression tool will only
    target the files in your ND project directories. If you wish to target
    classes outside of the normal directory scope, you have to either modify the
    "Source Directory" property OR create another instance of the regular
    expression tool. See the "Tool" menu in the iMT to create additional tool
    instances which can each be configured to target different sets of files
    using different sets of rules.
    Usually, I utilize 3 different sets of rules files on a given migration:
    spider2jato.xml
    these are the generic conversion rules (but includes the optimized rules for
    ViewBean and Model based code, i.e. these rules do not utilize the
    RequestManager since it is not needed for code running inside the ViewBean
    or Model classes)
    I run these rules against all files.
    See the file download section of this forum for periodic updates to these
    rules.
    nonProjectFileRules.xml
    these include rules that add the necessary
    RequestManager.getRequestContext(). etc prefixes to many of the common
    calls.
I run these rules against user module and any other classes that are not
ModuleServlet, ContainerView, or Model classes.
    appXRules.xml
    these rules include application specific changes that I discover while
    working on the project. A common thing here is changing import statements
    (since the migration tool moves ND project code into different jato
packaging structure, you sometimes need to adjust imports in non-project
    classes that previously imported ND project specific packages)
    So you see, you are not limited to one set of rules at all. Just be careful
    to keep track of your backups (the regexp tool provides several options in
    its Expert Properties related to back up strategies).
    ----- Original Message -----
    From: <vnamboori@y...>
    Sent: Wednesday, August 08, 2001 6:08 AM
Subject: [iPlanet-JATO] Re: Use Of models in utility classes - Please don't
    forget about the regular expression potential
    Thanks Matt, Mike, Todd
    This is a great input for our migration. Though we used the existing
    Regular Expression Mapping tool, we did not change this to meet our
    own needs as mentioned by Mike.
    We would certainly incorporate this to ease our migration.
    Namburi
    --- In iPlanet-JATO@y..., "Todd Fast" <toddwork@c...> wrote:
    All--
Great response. By the way, the Regular Expression Tool uses the Perl5 RE
syntax as implemented by Apache OROMatcher. If you're doing lots of these
sorts of migration changes manually, you should definitely buy the O'Reilly
book "Mastering Regular Expressions" and generate some rules to automate the
conversion. Although they are definitely confusing at first, regular
expressions are fairly easy to understand with some documentation, and are
superbly effective at tackling this kind of migration task.
    Todd
    ----- Original Message -----
    From: "Mike Frisino" <Michael.Frisino@S...>
    Sent: Tuesday, August 07, 2001 5:20 PM
Subject: Re: [iPlanet-JATO] Use Of models in utility classes - Please don't
    forget about the regular expression potential
Also, (and Matt's document may mention this)
Please bear in mind that this statement is not totally correct:
"Since the migration tool does not do much of conversion for these
utilities we have to do manually."
Remember, the iMT is a SUITE of tools. There is the extraction tool, and
the translation tool, and the regular expression tool, and several other
smaller tools (like the jar and compilation tools). It is correct to state
that the extraction and translation tools only significantly convert the
primary ND project objects (the pages, the data objects, and the project
classes). The extraction and translation tools do minimum translation of the
User Module objects (i.e. they repackage the user module classes in the new
jato module packages). It is correct that for all other utility classes
which are not formally part of the ND project, the extraction and
translation tools do not perform any migration.
However, the regular expression tool can "migrate" any arbitrary file
(utility classes etc.) to the degree that the regular expression rules
correlate to the code present in the arbitrary file. So first and foremost,
if you have a lot of spider code in your non-project classes you should
consider using the regular expression tool and, if warranted, adding
additional rules to reduce the amount of manual adjustments that need to be
made. I can't stress this enough. We can even help you write the regular
expression rules if you simply identify the code pattern you wish to
convert. Just because there is not already a regular expression rule to
match your need does not mean it can't be written. We have not nearly
exhausted the possibilities.
    For example if you say, we need to convert
    CSpider.getDataObject("X");
    To
    RequestManager.getRequestContext().getModelManager().getModel(XModel.class);
Maybe we or somebody else in the list can help write that regular
expression if it has not already been written. For instance, in the last
updated spider2jato.xml file there is already a CSpider.getCommonPage("X")
rule:
    <!--getPage to getViewBean-->
    <mapping-rule>
    <mapping-rule-primarymatch>
    <![CDATA[CSpider[.\s]*getPage[\s]*\(\"([^"]*)\"]]>
    </mapping-rule-primarymatch>
    <mapping-rule-replacement>
    <mapping-rule-match>
    <![CDATA[CSpider[.\s]*getPage[\s]*\(\"([^"]*)\"]]>
    </mapping-rule-match>
    <mapping-rule-substitute>
    <![CDATA[getViewBean($1ViewBean.class]]>
    </mapping-rule-substitute>
    </mapping-rule-replacement>
    </mapping-rule>
    Following this example a getDataObject to getModel would look
    like this:
    <mapping-rule>
    <mapping-rule-primarymatch>
    <![CDATA[CSpider[.\s]*getDataObject[\s]*\(\"([^"]*)\"]]>
    </mapping-rule-primarymatch>
    <mapping-rule-replacement>
    <mapping-rule-match>
    <![CDATA[CSpider[.\s]*getDataObject[\s]*\(\"([^"]*)\"]]>
    </mapping-rule-match>
    <mapping-rule-substitute>
    <![CDATA[getModel($1Model.class]]>
    </mapping-rule-substitute>
    </mapping-rule-replacement>
    </mapping-rule>
In fact, one migration developer already wrote that rule and submitted it
for inclusion in the basic set. I will post another upgrade to the basic
regular expression rule set; look for a "file uploaded" posting. Also,
please consider contributing any additional generic rules that you have
written for inclusion in the basic set.
Please note that in some cases (utility classes in particular) the rule
application may be more effective as TWO sequential rules rather than one
monolithic rule. Again using the example above, it will convert
CSpider.getDataObject("Foo");
To
getModel(FooModel.class);
Now that is the most effective conversion for that code if that code is in
a page or data object class file. But if that code is in a utility class you
really want:
RequestManager.getRequestContext().getModelManager().getModel(FooModel.class);
So to go from
getModel(FooModel.class);
To
RequestManager.getRequestContext().getModelManager().getModel(FooModel.class);
You would apply a second rule AND you would ONLY run this rule against
your utility classes so that you would not otherwise affect your ViewBean
and Model classes which are completely fine with the simple getModel call.
    <mapping-rule>
    <mapping-rule-primarymatch>
    <![CDATA[getModel\(]]>
    </mapping-rule-primarymatch>
    <mapping-rule-replacement>
    <mapping-rule-match>
    <![CDATA[getModel\(]]>
    </mapping-rule-match>
    <mapping-rule-substitute>
    <![CDATA[RequestManager.getRequestContext().getModelManager().getModel(]]>
    </mapping-rule-substitute>
    </mapping-rule-replacement>
    </mapping-rule>
A similar rule can be applied to getSession and other CSpider API calls.
For instance, here is the rule for converting getSession calls to leverage
the RequestManager.
    <mapping-rule>
    <mapping-rule-primarymatch>
    <![CDATA[getSession\(\)\.]]>
    </mapping-rule-primarymatch>
    <mapping-rule-replacement>
    <mapping-rule-match>
    <![CDATA[getSession\(\)\.]]>
    </mapping-rule-match>
    <mapping-rule-substitute>
    <![CDATA[RequestManager.getSession().]]>
    </mapping-rule-substitute>
    </mapping-rule-replacement>
    </mapping-rule>
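As an aside, for anyone who wants to see what one of these rules does
without running the iMT: the same match/substitute pair can be reproduced
with java.util.regex (a rough sketch; the iMT itself applies the rules with
the Perl5/ORO engine mentioned above):

public class RuleDemo {
    public static void main(String[] args) {
        String src = "CSpider.getPage(\"Login\");";
        // the primary-match pattern and substitute from the getPage rule above
        String out = src.replaceAll(
            "CSpider[.\\s]*getPage[\\s]*\\(\"([^\"]*)\"",
            "getViewBean($1ViewBean.class");
        System.out.println(out); // prints: getViewBean(LoginViewBean.class);
    }
}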
    ----- Original Message -----
    From: "Matthew Stevens" <matthew.stevens@e...>
    Sent: Tuesday, August 07, 2001 12:56 PM
    Subject: RE: [iPlanet-JATO] Use Of models in utility classes
Namburi,
I will post a document to the group site this evening which has the details
on various tactics of migrating these types of utilities. Essentially, you
either need to convert these utilities to Models themselves, or keep the
utilities as is and simply use
RequestManager.getRequestContext().getModelManager().getModel()
to statically access Models.
For CSpSelect.executeImmediate() I have an example of a custom helper method
as a replacement which uses JDBC results instead of CSpDBResult.
matt
    -----Original Message-----
From: vnamboori@y...
    Sent: Tuesday, August 07, 2001 3:24 PM
    Subject: [iPlanet-JATO] Use Of models in utility classes
Hi All,
In the present ND project we have lots of utility classes. These classes
are in different directories, not part of ND pages.
In these classes we access the data objects and do the manipulations.
So we access data objects directly, like
CSpider.getDataObject("do....");
and then execute it.
Since the migration tool does not do much of the conversion for these
utilities, we have to do it manually.
My question is: can we access the models post-migration the same way, or
do we need requestContext?
We have lots of utility classes which are DataObject-intensive. Can
someone suggest a better way to migrate this kind of code?
Thanks
Namburi

  • [SOLVED]ZSH and regular expressions

    Hi
I am getting into regular expressions and I have noticed a problem with my .zshrc file. In bash this expression works:
\^\[^#]
but not in zsh. I have also noted that the regular expression works fine with other zshrc configurations found in the ArchWiki (like grml), but I want to have my own configuration, and I really can't find which command makes the difference.
My .zshrc file is pulled from this site: https://github.com/slashbeast/things/bl … s/DOTzshrc.
    # .zshrc
    # Author: Piotr Karbowski <[email protected]>
    # License: beerware.
    # Basic zsh config.
    umask 077
    ZDOTDIR=${ZDOTDIR:-${HOME}}
    ZSHDDIR="${HOME}/.config/zsh.d"
    HISTFILE="${ZDOTDIR}/.zsh_history"
    HISTSIZE='10000'
    SAVEHIST="${HISTSIZE}"
    export EDITOR="/usr/bin/vim"
    export TMP="$HOME/tmp"
    export TEMP="$TMP"
    export TMPDIR="$TMP"
    export TMPPREFIX="${TMPDIR}/zsh"
    if [ ! -d "${TMP}" ]; then mkdir "${TMP}"; fi
    if ! [[ "${PATH}" =~ "^${HOME}/bin" ]]; then
    export PATH="${HOME}/bin:${PATH}"
    fi
    # Not all servers have terminfo for rxvt-256color. :<
    if [ "${TERM}" = 'rxvt-256color' ] && ! [ -f '/usr/share/terminfo/r/rxvt-256color' ] && ! [ -f '/lib/terminfo/r/rxvt-256color' ] && ! [ -f "${HOME}/.terminfo/r/rxvt-256color" ]; then
    export TERM='rxvt-unicode'
    fi
    # Colors.
    red='\e[0;31m'
    RED='\e[1;31m'
    green='\e[0;32m'
    GREEN='\e[1;32m'
    yellow='\e[0;33m'
    YELLOW='\e[1;33m'
    blue='\e[0;34m'
    BLUE='\e[1;34m'
    purple='\e[0;35m'
    PURPLE='\e[1;35m'
    cyan='\e[0;36m'
    CYAN='\e[1;36m'
    NC='\e[0m'
    # Functions
    if [ -f '/etc/profile.d/prll.sh' ]; then
    . "/etc/profile.d/prll.sh"
    fi
    run_under_tmux() {
    # Run $1 under session or attach if such session already exist.
    # $2 is optional path, if no specified, will use $1 from $PATH.
    # If you need to pass extra variables, use $2 for it as in example below..
    # Example usage:
    # torrent() { run_under_tmux 'rtorrent' '/usr/local/rtorrent-git/bin/rtorrent'; }
    # mutt() { run_under_tmux 'mutt'; }
    # irc() { run_under_tmux 'irssi' "TERM='screen' command irssi"; }
    # There is a bug in linux's libevent...
    # export EVENT_NOEPOLL=1
    command -v tmux >/dev/null 2>&1 || return 1
    if [ -z "$1" ]; then return 1; fi
    local name="$1"
    if [ -n "$2" ]; then
    local file_path="$2"
    else
    local file_path="command ${name}"
    fi
    if tmux has-session -t "${name}" 2>/dev/null; then
    tmux attach -d -t "${name}"
    else
    tmux new-session -s "${name}" "${file_path}" \; set-option status \; set set-titles-string "${name} (tmux@${HOST})"
fi
}
    t() { run_under_tmux rtorrent; }
    irc() { run_under_tmux irssi "TERM='screen' command irssi"; }
    over_ssh() {
    if [ -n "${SSH_CLIENT}" ]; then
    return 0
    else
    return 1
fi
}
    reload () {
exec "${SHELL}" "$@"
}
    confirm() {
    local answer
    echo -ne "zsh: sure you want to run '${YELLOW}$@${NC}' [yN]? "
    read -q answer
    echo
    if [[ "${answer}" =~ ^[Yy]$ ]]; then
    command "${=1}" "${=@:2}"
    else
    return 1
fi
}
    confirm_wrapper() {
    if [ "$1" = '--root' ]; then
    local as_root='true'
    shift
    fi
    local runcommand="$1"; shift
    if [ "${as_root}" = 'true' ] && [ "${USER}" != 'root' ]; then
    runcommand="sudo ${runcommand}"
    fi
confirm "${runcommand}" "$@"
}
    poweroff() { confirm_wrapper --root $0 "$@"; }
    reboot() { confirm_wrapper --root $0 "$@"; }
    hibernate() { confirm_wrapper --root $0 "$@"; }
    detox() {
    if [ "$#" -ge 1 ]; then
    confirm detox "$@"
    else
    command detox "$@"
fi
}
    has() {
    local string="${1}"
    shift
    local element=''
    for element in "$@"; do
    if [ "${string}" = "${element}" ]; then
    return 0
    fi
    done
return 1
}
    begin_with() {
    local string="${1}"
    shift
    local element=''
    for element in "$@"; do
    if [[ "${string}" =~ "^${element}" ]]; then
    return 0
    fi
    done
return 1
}
    termtitle() {
    case "$TERM" in
    rxvt*|xterm|nxterm|gnome|screen|screen-*)
    local prompt_host="${(%):-%m}"
    local prompt_user="${(%):-%n}"
    local prompt_char="${(%):-%~}"
case "$1" in
precmd)
printf '\e]0;%s@%s: %s\a' "${prompt_user}" "${prompt_host}" "${prompt_char}"
;;
preexec)
printf '\e]0;%s [%s@%s: %s]\a' "$2" "${prompt_user}" "${prompt_host}" "${prompt_char}"
;;
esac
;;
esac
}
    git_check_if_worktree() {
    # This function intend to be only executed in chpwd().
    # Check if the current path is in git repo.
    # We would want stop this function, on some big git repos it can take some time to cd into.
    if [ -n "${skip_zsh_git}" ]; then
    git_pwd_is_worktree='false'
    return 1
    fi
    # The : separated list of paths where we will run check for git repo.
    # If not set, then we will do it only for /root and /home.
    if [ "${UID}" = '0' ]; then
    # running 'git' in repo changes owner of git's index files to root, skip prompt git magic if CWD=/home/*
    git_check_if_workdir_path="${git_check_if_workdir_path:-/root:/etc}"
    else
    git_check_if_workdir_path="${git_check_if_workdir_path:-/home}"
    git_check_if_workdir_path_exclude="${git_check_if_workdir_path_exclude:-${HOME}/_sshfs}"
    fi
    if begin_with "${PWD}" ${=git_check_if_workdir_path//:/ }; then
    if ! begin_with "${PWD}" ${=git_check_if_workdir_path_exclude//:/ }; then
    local git_pwd_is_worktree_match='true'
    else
    local git_pwd_is_worktree_match='false'
    fi
    fi
    if ! [ "${git_pwd_is_worktree_match}" = 'true' ]; then
    git_pwd_is_worktree='false'
    return 1
    fi
    # todo: Prevent checking for /.git or /home/.git, if PWD=/home or PWD=/ maybe...
    # damn annoying RBAC messages about Access denied there.
    if [ -d '.git' ] || [ "$(git rev-parse --is-inside-work-tree 2> /dev/null)" = 'true' ]; then
    git_pwd_is_worktree='true'
    git_worktree_is_bare="$(git config core.bare)"
    else
    unset git_branch git_worktree_is_bare
    git_pwd_is_worktree='false'
fi
}
    git_branch() {
    git_branch="$(git symbolic-ref HEAD 2>/dev/null)"
    git_branch="${git_branch##*/}"
git_branch="${git_branch:-no branch}"
}
    git_dirty() {
    if [ "${git_worktree_is_bare}" = 'false' ] && [ -n "$(git status --untracked-files='no' --porcelain)" ]; then
    git_dirty='%F{green}*'
    else
    unset git_dirty
fi
}
    precmd() {
    # Set terminal title.
    termtitle precmd
    if [ "${git_pwd_is_worktree}" = 'true' ]; then
    git_branch
    git_dirty
    git_prompt=" %F{blue}[%F{253}${git_branch}${git_dirty}%F{blue}]"
    else
    unset git_prompt
fi
}
    preexec() {
    # Set terminal title along with current executed command pass as second argument
termtitle preexec "${(V)1}"
}
    chpwd() {
git_check_if_worktree
}
    man() {
    if command -v vimmanpager >/dev/null 2>&1; then
    PAGER="vimmanpager" command man "$@"
    else
    command man "$@"
fi
}
    # Are we running under grsecurity's RBAC?
    rbac_auth() {
    local auth_to_role='admin'
    if [ "${USER}" = 'root' ]; then
    if ! grep -qE '^RBAC:' "/proc/self/status" && command -v gradm > /dev/null 2>&1; then
    echo -e "\n${BLUE}*${NC} ${GREEN}RBAC${NC} Authorize to '${auth_to_role}' RBAC role."
    gradm -a "${auth_to_role}"
    fi
fi
}
    #rbac_auth
    # Check if we started zsh in git worktree, useful with tmux when your new zsh may spawn in source dir.
    git_check_if_worktree
    if [ "${git_pwd_is_worktree}" = 'true' ]; then
    git_branch
    git_dirty
    git_prompt=" %F{blue}[%F{253}${git_branch}${git_dirty}%F{blue}]"
    else
    unset git_prompt
    fi
    # Le features!
    # extended globbing, awesome!
    setopt extendedGlob
    # zmv - a command for renaming files by means of shell patterns.
    autoload -U zmv
    # zargs, as an alternative to find -exec and xargs.
    autoload -U zargs
    # Turn on command substitution in the prompt (and parameter expansion and arithmetic expansion).
    setopt promptsubst
    # Control-x-e to open current line in $EDITOR, awesome when writting functions or editing multiline commands.
    autoload -U edit-command-line
    zle -N edit-command-line
    bindkey '^x^e' edit-command-line
    # Include user-specified configs.
    if [ ! -d "${ZSHDDIR}" ]; then
    mkdir -p "${ZSHDDIR}" && echo "# Put your user-specified config here." > "${ZSHDDIR}/example.zsh"
    fi
    for zshd in $(ls -A ${HOME}/.config/zsh.d/^*.(z)sh$); do
    . "${zshd}"
    done
    # Completion.
    autoload -Uz compinit
    compinit
    zstyle ':completion:*' matcher-list 'm:{a-z}={A-Z}'
    zstyle ':completion:*' completer _expand _complete _ignored _approximate
    zstyle ':completion:*' menu select=2
    zstyle ':completion:*' select-prompt '%SScrolling active: current selection at %p%s'
    zstyle ':completion::complete:*' use-cache 1
    zstyle ':completion:*:descriptions' format '%U%F{cyan}%d%f%u'
    # If running as root and nice >0, renice to 0.
    if [ "$USER" = 'root' ] && [ "$(cut -d ' ' -f 19 /proc/$$/stat)" -gt 0 ]; then
    renice -n 0 -p "$$" && echo "# Adjusted nice level for current shell to 0."
    fi
    # Fancy prompt.
    if over_ssh && [ -z "${TMUX}" ]; then
    prompt_is_ssh='%F{blue}[%F{red}SSH%F{blue}] '
    elif over_ssh; then
    prompt_is_ssh='%F{blue}[%F{253}SSH%F{blue}] '
    else
    unset prompt_is_ssh
    fi
    case $USER in
    root)
PROMPT='%B%F{cyan}%m%k %(?..%F{blue}[%F{253}%?%F{blue}] )${prompt_is_ssh}%B%F{blue}%1~${git_prompt}%F{blue} %# %b%f%k'
;;
*)
PROMPT='%B%F{blue}%n@%m%k %(?..%F{blue}[%F{253}%?%F{blue}] )${prompt_is_ssh}%B%F{cyan}%1~${git_prompt}%F{cyan} %# %b%f%k'
;;
    esac
    # Ignore lines prefixed with '#'.
    setopt interactivecomments
    # Ignore duplicate in history.
    setopt hist_ignore_dups
    # Prevent record in history entry if preceding them with at least one space
    setopt hist_ignore_space
    # Nobody need flow control anymore. Troublesome feature.
    #stty -ixon
    setopt noflowcontrol
    # Fix for tmux on linux.
    case "$(uname -o)" in
    'GNU/Linux')
    export EVENT_NOEPOLL=1
    esac
    # Aliases
    alias cp='cp -iv'
    alias rcp='rsync -v --progress'
    alias rmv='rsync -v --progress --remove-source-files'
    alias mv='mv -iv'
    alias rm='rm -iv'
    alias rmdir='rmdir -v'
    alias ln='ln -v'
    alias chmod="chmod -c"
    alias chown="chown -c"
    if command -v colordiff > /dev/null 2>&1; then
    alias diff="colordiff -Nuar"
    else
    alias diff="diff -Nuar"
    fi
    alias grep='grep --colour=auto'
    alias egrep='egrep --colour=auto'
    alias ls='ls --color=auto --human-readable --group-directories-first --classify'
    # Keys.
    case $TERM in
    rxvt*|xterm*)
    bindkey "^[[7~" beginning-of-line #Home key
    bindkey "^[[8~" end-of-line #End key
    bindkey "^[[3~" delete-char #Del key
    bindkey "^[[A" history-beginning-search-backward #Up Arrow
    bindkey "^[[B" history-beginning-search-forward #Down Arrow
    bindkey "^[Oc" forward-word # control + right arrow
    bindkey "^[Od" backward-word # control + left arrow
    bindkey "^H" backward-kill-word # control + backspace
bindkey "^[[3^" kill-word # control + delete
;;
    linux)
    bindkey "^[[1~" beginning-of-line #Home key
    bindkey "^[[4~" end-of-line #End key
    bindkey "^[[3~" delete-char #Del key
    bindkey "^[[A" history-beginning-search-backward
bindkey "^[[B" history-beginning-search-forward
;;
    screen|screen-*)
    bindkey "^[[1~" beginning-of-line #Home key
    bindkey "^[[4~" end-of-line #End key
    bindkey "^[[3~" delete-char #Del key
    bindkey "^[[A" history-beginning-search-backward #Up Arrow
    bindkey "^[[B" history-beginning-search-forward #Down Arrow
    bindkey "^[Oc" forward-word # control + right arrow
    bindkey "^[Od" backward-word # control + left arrow
    bindkey "^H" backward-kill-word # control + backspace
    bindkey "^[[3^" kill-word # control + delete
    esac
    bindkey "^R" history-incremental-pattern-search-backward
    bindkey "^S" history-incremental-pattern-search-forward
    if [ -f ~/.alert ]; then cat ~/.alert; fi
    Thanks for all the help.
    Last edited by Shark (2013-05-11 22:32:24)

    Raynman wrote:
    "This expression doesn't work", "It doesn't work" ...
    Could you try being a bit more specific?
Firstly, I am sorry I didn't post the output. I should have known better.
Secondly, chill out.
I have used the above regex with the grep command. The output from the terminal is:
zsh: bad pattern: ^[^#]
In bash it works perfectly.
If I issue "setopt re_match_pcre" I have the same output as above.
EDIT: If I issue "unsetopt no_match" it actually works, but I have to change the regex from "\^\[^#]" to "\^[^#]", otherwise I get the same output as above. In bash both options work. (The likely culprit is zsh's extendedglob option, set in the .zshrc above, which makes an unquoted ^ a globbing operator, so the shell tries to expand the pattern before grep ever sees it; quoting the argument, as in grep '^[^#]', sidesteps this.)
    Last edited by Shark (2013-05-11 22:07:21)

  • Regular expression question (should be an easy one...)

I'm using Java to build a parser. I'm getting an expression, which I split on whitespace.
How can I build a regular expression that will enable me to split only on unquoted spaces? Example:
    for the expression:
    (X=33 AND Y=44) OR (Z="hello world" AND T=2)
    I will get the following values split:
    (X=33
    AND
Y=44)
    OR
    (Z="hello world"
    AND
    T=2)
    and not:
    (Z="
    hello
    world"
    thank you very much!

Instead of splitting on whitespace to get a list of tokens, use Matcher.find() to match the tokens themselves:
import java.util.*;
import java.util.regex.*;
public class Test {
    public static void main(String[] args) throws Exception {
        String str = "(X=33 AND Y=44) OR (Z=\"hello world\" AND T=2)";
        List<String> tokens = new ArrayList<String>();
        Matcher m = Pattern.compile("[^\\s\"]+(?:\".*?\")?").matcher(str);
        while (m.find())
            tokens.add(m.group());
        System.out.println(tokens);
    }
}
The regex I used is based on the assumptions that there will be at most one run of quoted text per token, that it will always appear in the right-hand side of an expression, and that the closing quote will always mark the end of the token. If the rules are more complicated (as sabre150 suggested), a more complicated regex will be needed. You might be better off doing the parsing the old-fashioned way, without regexes.

  • Introduction to regular expressions ...

I'm well aware that there are already some articles on that topic, but some people asked me to share some of my knowledge on it. Please take a look at this first part and let me know if you find it useful. If yes, I'm going to continue writing more parts using more and more complicated expressions - if you have questions or problems that you think could be solved through regular expressions, please post them.
    Introduction
Oracle has always provided some character/string functions in its PL/SQL command set, such as SUBSTR, REPLACE or TRANSLATE. With 10g, Oracle finally gave us, the users, the developers and of course the DBAs, regular expressions. However, regular expressions, due to their sometimes cryptic rules, seem to be overlooked quite often, despite the existence of some very interesting use cases. Being one of the advocates of regular expressions, I thought I'd give the interested audience an introduction to these new functions in several installments.
    Having fun with regular expressions - Part 1
    Oracle offers the use of regular expression through several functions: REGEXP_INSTR, REGEXP_SUBSTR, REGEXP_REPLACE and REGEXP_LIKE. The second part of each function already gives away its purpose: INSTR for finding a position inside a string, SUBSTR for extracting a part of a string, REPLACE for replacing parts of a string. REGEXP_LIKE is a special case since it could be compared to the LIKE operator and is therefore usually used in comparisons like IF statements or WHERE clauses.
Regular expressions excel, in my opinion, at search and extraction of strings, using that for finding or replacing certain strings or checking for certain formatting criteria. They're not very good at formatting strings themselves, except for some special cases I'm going to demonstrate.
If you're not familiar with regular expressions, you should take a look at the definition in Oracle's user guide, Using Regular Expressions With Oracle Database, and please note that there have been some changes and advancements in 10g Release 2. I'll provide examples that should work on both versions.
    Some of you probably already encountered this problem: checking a number inside a string, because, for whatever reason, a column was defined as VARCHAR2 and not as NUMBER as one would have expected.
    Let's check for all rows where column col1 does NOT include an unsigned integer. I'll use this SELECT for demonstrating different values and search patterns:
    WITH t AS (SELECT '456' col1
                 FROM dual
                UNION
               SELECT '123x'
                 FROM dual
                UNION  
               SELECT 'x123'
                 FROM dual
                UNION 
               SELECT 'y'
                 FROM dual
                UNION 
               SELECT '+789'
                 FROM dual
                UNION 
               SELECT '-789'
                 FROM dual
                UNION 
               SELECT '159-'
                 FROM dual
                UNION 
               SELECT '-1-'
FROM dual)
SELECT t.col1
      FROM t
WHERE NOT REGEXP_LIKE(t.col1, '^[0-9]+$');
Let's take a look at the 2nd argument of this REGEXP function: '^[0-9]+$'. Translated it would mean: start at the beginning of the string, check if there's one or more characters in the range between '0' and '9' (also called a matching character list) until the end of this string. "^", "[", "]", "+", "$" are all metacharacters.
To understand regular expressions, you have to "think" in regular expressions. Each regular expression tries to "fit" an available string into its pattern and returns a result, successful or not, depending on the function. The "art" of using regular expressions is to construct the right search pattern for a certain task. Using functions like TRANSLATE or REPLACE has already taught you to use search patterns; regular expressions are just an extension of this paradigm. Another side note: most of the search patterns are placeholders for single characters, not strings.
I'll take this example a bit further. What would happen if we removed the "$" in our example? "$" means: (until the) end of a string. Without this, the expression would only match digits from the beginning until it encounters either another character or the end of the string. So this time, '123x' would be removed from the selection since it does fit into the pattern.
    Another change: we will keep the "$" but remove the "^". This character has several meanings, but in this case it declares: (start from the) beginning of a string. Without it, the function will search for a part of a string that has only digits until the end of the searched string. 'x123' would now be removed from our selection.
    Now there's a question: what happens if I remove both, "^" and "$"? Well, just think about it. We now ask to find any string that contains at least one or more digits, so both '123x' and 'x123' will not show up in the result.
So what if I want to look for signed integers, since "+" is also used in search expressions as a metacharacter? Escaping is the name of the game. We'll just use '^\+[0-9]+$'. Did you notice the "\" before the first "+"? This is now a search pattern for the plus sign.
    Should signed integers include negative numbers as well? Of course they should, and I'll once again use a matching character list. In this list, I don't need to do escaping, although it is possible. The result string would now look like this: '^[+-]?[0-9]+$'. Did you notice the "?"? This is another metacharacter that changes the placeholder for plus and minus to an optional placeholder, which means: if there's a "+" or "-", that's ok, if there's none, that's also ok. Only if there's a different character, then again the search pattern will fail.
Addendum: From this point on, I found a mistake in my examples. If you had tested my old examples with test data that included strings with multiple signs, like "--", "-+", "++", they would have been filtered by the SELECT statement. I mistakenly used the "*" instead of the "?" operator. The reason why this is a bad idea can also be found in the user guide: the "*" metacharacter is defined as 0 to multiple occurrences.
Looking at the values, one could ask the question: what about the integers with a trailing sign? Quite simple, right? Let's just add another '[+-]?' and the search pattern would look like this: '^[+-]?[0-9]+[+-]?$'.
    Wait a minute, what happened to the row with the column value "-1-"?
You probably already guessed it: the new pattern also qualifies this one as a valid string. I could now split this pattern into several conditions combined through a logical OR, but there's something even better: a logical OR inside the regular expression. Its symbol is "|", the pipe sign.
Changing the search pattern again to something like this: '^[+-]?[0-9]+$|^[0-9]+[+-]?$' would now return the "-1-" value. Do I have to duplicate the same elements like "^" and "$", and what about more complicated, repeating elements in future examples? That's where subexpressions/grouping come into play. If I want only certain parts of the search pattern to use an OR operator, we can put those inside round brackets. '^([+-]?[0-9]+|[0-9]+[+-]?)$' serves the same purpose and allows for further checks without duplicating the whole pattern.
    Now looking for integers is nice, but what about decimal numbers? Those may be a bit more complicated, but all I have to do is again to think in (meta) characters. I'll just use an example where the decimal point is represented by ".", which again needs escaping, since it's also the place holder in regular expressions for "any character".
    Valid decimals in my example would be ".0", "0.0", "0.", "0" (integer of course) but not ".". If you want, you can test it with the TO_NUMBER function. Finding such an unsigned decimal number could then be formulated like this: from the beginning of a string we will either allow a decimal point plus any number of digits OR at least one digits plus an optional decimal point followed by optional any number of digits. Think about it for a minute, how would you formulate such a search pattern?
    Compare your solution to this one:
    '^(\.[0-9]+|[0-9]+(\.[0-9]*)?)$'
Addendum: Here I have to use both "?" and "*" to make sure that I can have 0 to many digits after the decimal point, but only 0 to 1 occurrences of this substring. Otherwise, strings like "1.9.9.9" would be possible if I wrote it like this:
'^(\.[0-9]+|[0-9]+(\.[0-9]*)*)$'
Some of you now might say: Hey, what about signed decimal numbers? You could of course combine all the ideas so far and you will end up with a very long and almost unreadable search pattern, or you start combining several regular expression functions. Think about it: why put all the search patterns into one function? Why not split those into several steps like "check for a valid decimal" and "check for sign"?
    I'll just use another SELECT to show what I want to do:
    WITH t AS (SELECT '0' col1
                 FROM dual
                UNION
               SELECT '0.' 
                 FROM dual
                UNION
               SELECT '.0' 
                 FROM dual
                UNION
               SELECT '0.0' 
                 FROM dual
                UNION
               SELECT '-1.0' 
                 FROM dual
                UNION
               SELECT '.1-' 
                 FROM dual
                UNION
               SELECT '.' 
                 FROM dual
                UNION
SELECT '-1.1-' 
             FROM dual)
SELECT t.*
  FROM t;
From this select, the only rows I need to find are those with the column values "." and "-1.1-". I'll start this with a check for valid signs. Since I want to combine this with the check for valid decimals, I'll first try to extract a substring with valid signs through the REGEXP_SUBSTR function:
NVL(REGEXP_SUBSTR(t.col1, '^([+-]?[^+-]+|[^+-]+[+-]?)$'), ' ')
Remember the OR operator and the matching character collections? But several "^"? Some of the meta characters inside a search pattern can have different meanings, depending on their positions and combination with other meta characters. In this case, the pattern translates into: from the beginning of the string search for "+" or "-" followed by at least another character that is not "+" or "-". The second pattern after the "|" OR operator does the same for a sign at the end of the string.
This only checks for a sign, but not whether the string otherwise contains only digits and a decimal point. If the search fails, for example when we have more than one sign as in "-1.1-", the function returns NULL. NULL and LIKE don't go together very well, so we'll just add NVL with a default value that tells the LIKE to ignore this string, in this case a space.
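To see why the NVL default matters, here's a minimal sketch (standard NULL behavior, nothing specific to regular expressions): a NULL value never satisfies a LIKE or REGEXP_LIKE condition, so without the substitute value the invalid rows could never be flagged by the NOT REGEXP_LIKE check later on.
SELECT 'matched'
  FROM dual
 WHERE NULL LIKE '%'
;
This query returns no rows, because the comparison against NULL evaluates to UNKNOWN.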
    All we have to do now is to combine the check for the sign and the check for a valid decimal number, but don't forget an option for the signs at the beginning or end of the string, otherwise your second check will fail on the signed decimals. Are you ready?
    Does your solution look a bit like this?
WHERE NOT REGEXP_LIKE(NVL(REGEXP_SUBSTR(t.col1,
                                        '^([+-]?[^+-]+|[^+-]+[+-]?)$'), ' '),
                      '^[+-]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)[+-]?$'
                     )
Now the optional sign checks in the REGEXP_LIKE pattern can be added to both ends, since the REGEXP_SUBSTR won't let any string with signs on both ends through. Thinking in regular expressions again.
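Putting the test data and the combined checks together (just a consolidation of the pieces above, nothing new), the complete query flags exactly the two invalid values:
WITH t AS (SELECT '0' col1 FROM dual UNION
           SELECT '0.' FROM dual UNION
           SELECT '.0' FROM dual UNION
           SELECT '0.0' FROM dual UNION
           SELECT '-1.0' FROM dual UNION
           SELECT '.1-' FROM dual UNION
           SELECT '.' FROM dual UNION
           SELECT '-1.1-' FROM dual
          )
SELECT t.col1
  FROM t
 WHERE NOT REGEXP_LIKE(NVL(REGEXP_SUBSTR(t.col1,
                                         '^([+-]?[^+-]+|[^+-]+[+-]?)$'), ' '),
                       '^[+-]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)[+-]?$'
                      )
;
Only "." and "-1.1-" come back, which is exactly what we wanted.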
    Continued in Introduction to regular expressions ... continued.
    C.

    Excellent write up CD. Very nice indeed. Hopefully you'll be completing parts 2 and 3 some time soon. And with any luck, your article will encourage others to do the same....I know there's a few I'd like to see and a few I'd like to have a go at writing too :-)

  • Regular Expressions in Discoverer 10g

    I have figured out how to use LIKE to do some basic sorting, but I would like to be able to use the regular expressions that are supposed to be built into Oracle 10g. However, I keep getting the following error message when I try to use REGEXP_INSTR, REGEXP_LIKE, REGEXP_REPLACE, and REGEXP_SUBSTR:
    Function REGEXP_LIKE has not been registered with the EUL.
What does this mean? Do I have to get permission from the Discoverer admin to use regular expressions in the end user layer?
    Thanks,
    Rachel

    Rachel,
    This means that these functions are not available in Discoverer.
All available functions are listed in the Edit Calculation window of the Calculation wizard: if you click the Functions radio button and then the '+' next to All Functions, you can see all database functions available to Discoverer, and the regular expression functions are not listed there.
However, you should be able to create your own database functions using regular expressions and then register them in Discoverer. All you would do is create a function (a wrapper) around the actual regular expression function you want to use, compile the function in the database, and then register it in Discoverer Admin.
    e.g.,
create or replace function my_own_reg_replace (parameter1 varchar2,
                                               parameter2 varchar2,
                                               parameter3 varchar2 default null)
return varchar2
is
  -- the parameters mirror the first three arguments of REGEXP_REPLACE:
  -- source string, search pattern, replacement string
  my_result varchar2(2000);
begin
  my_result := regexp_replace(parameter1, parameter2, parameter3);
  return my_result;
end;
/
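Once compiled and registered, the wrapper behaves like any other database function; a quick sanity check from SQL (the literals here are just illustrative):
SELECT my_own_reg_replace('abc123def', '[0-9]+', '#') AS result
  FROM dual
;
-- returns 'abc#def'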
Hope this helps.
    Raman

  • I need help renaming a file using regular expressions in Bridge.

    Hi,
    I work at a university, and we are working through files for our Thesis and Dissertations. We have been renaming them to make them more consistent. I am just wondering if there is a regular expression that could help with this process?
Here are some examples of current file names:
    THESIS 1981 H343G
    Thesis 1981 g996e
    THESIS-1981-A543G
I don't need to change the actual names of the files, just how they are formatted:
Proper case on "Thesis".
Hyphens (-) in place of all white space.
First letter uppercase, last letter lowercase on the call number (H343g).
So the list above should look like:
    Thesis-1981-H343g
    Thesis-1981-G996e
    Thesis-1981-A543g
    I have seen people do some pretty cool things with regular expressions! Any help would be greatly appreciated. Thanks!

You would be better off using a script to do this, as I don't think it would be possible with the Bridge batch rename.
Using the ExtendScript Toolkit or a plain text editor, copy the code below and save it as Filename.jsx.
It needs to be saved into the correct folder: to find it, go to the Bridge preferences, select Startup Scripts, and this will open the folder where the script is to be saved.
Once this is done, close and re-start Bridge.
To use: go to the Tools menu and select Rename PDFs.
Make sure you test the code on a few copied files in a separate folder first, to make sure it does what you want.
The script will process all PDF files in the selected folder.
#target bridge 
if( BridgeTalk.appName == "bridge" ) { 
    renamePDFs = MenuElement.create("command", "Rename PDFs", "at the end of Tools");
    renamePDFs.onSelect = function () {
        app.document.deselectAll();
        var thumbs = app.document.getSelection("pdf");
        for( var z in thumbs ){
            var Name = decodeURI(thumbs[z].spec.name);
            // lower-case everything, turn whitespace into hyphens, then split
            // into "thesis" / "-" / year / "-" / call number / ".pdf"
            var parts = Name.toLowerCase().replace(/\s/g,'-').match(/(.*)(-)(.*)(-)(.*)(\.pdf)/);
            // capitalise the first letter of "thesis"
            var NewName = parts[1].replace(/^[a-z]/, function(s){ return s.toUpperCase() });
            // upper-case the call number, then lower-case its last letter
            NewName += parts[2]+parts[3]+parts[4]+parts[5].toUpperCase().replace(/[A-Z]$/, function(s){ return s.toLowerCase() });
            NewName += parts[6];
            thumbs[z].spec.rename(NewName);
        }
    };
}

  • Introduction to regular expressions ... last part.

    Continued from Introduction to regular expressions ... continued., here's the third and final part of my introduction to regular expressions. As always, if you find mistakes or have examples that you think could be solved through regular expressions, please post them.
    Having fun with regular expressions - Part 3
    In some cases, I may have to search for different values in the same column. If the searched values are fixed, I can use the logical OR operator or the IN clause, like in this example (using my brute force data generator from part 2):
    SELECT data
      FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
WHERE data IN ('abc', 'xyz', '012')
;
There are of course some workarounds, as presented in this asktom thread, but for a quick solution there's an alternative approach available. Remember the "|" pipe symbol as OR operator inside regular expressions? Take a look at this:
    SELECT data
      FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE REGEXP_LIKE(data, '^(abc|xyz|012)$')
;
I can even use strings composed of values like 'abc, xyz ,  012' by simply using another regular expression to replace "," and spaces with the "|" pipe symbol. After reading parts 1 and 2, that shouldn't be too hard, right? Here's my "thinking in regular expressions": replace every "," together with 0 or more leading/trailing spaces.
    Ready to try your own solution?
    Does it look like this?
    SELECT data
      FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE REGEXP_LIKE(data, '^(' || REGEXP_REPLACE('abc, xyz ,  012', ' *, *', '|') || ')$')
;
If I didn't use the "^" and "$" metacharacters, this SELECT would search for any occurrence inside the data column, which could be useful if I wanted to combine the LIKE and IN clauses. Take a look at this example, where I'm looking for 'abc%', 'xyz%' or '012%' and adding a case insensitive match parameter:
    SELECT data
      FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE REGEXP_LIKE(data, '^(abc|xyz|012)', 'i')
;
An equivalent non regular expression solution would have to look like this (not mentioning other options, like adding an extra "," and using the INSTR function):
SELECT data
  FROM (SELECT data, LOWER(data) search
          FROM TABLE(regex_utils.gen_data('abcxyz012', 4)))
 WHERE search LIKE 'abc%'
    OR search LIKE 'xyz%'
    OR search LIKE '012%'
;
SELECT data
  FROM (SELECT data, SUBSTR(LOWER(data), 1, 3) search
          FROM TABLE(regex_utils.gen_data('abcxyz012', 4)))
 WHERE search IN ('abc', 'xyz', '012')
;
I'll leave it to your imagination how a complete non regular expression example with 'abc, xyz ,  012' as search condition would look like.
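Just as a hedged sketch of what that could look like (one possibility among several; note it only works here because every list entry is exactly 3 characters wide):
SELECT data
  FROM (SELECT data, SUBSTR(LOWER(data), 1, 3) search
          FROM TABLE(regex_utils.gen_data('abcxyz012', 4)))
 WHERE INSTR(',' || REPLACE('abc, xyz ,  012', ' ') || ',',
             ',' || search || ',') > 0
;
The REPLACE strips the spaces, the extra delimiters avoid partial matches, and INSTR does the lookup: workable, but hardly as readable as the regular expression version.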
As mentioned in the first part, regular expressions are not very good at formatting, except for some selected examples such as phone numbers, which in my demonstration have different formats. Using regular expressions, I can change them to a uniform representation:
    WITH t AS (SELECT '123-4567' phone
                 FROM dual
                UNION
               SELECT '01 345678'
                 FROM dual
                UNION
               SELECT '7 87 8787'
             FROM dual
          )
SELECT t.phone, REGEXP_REPLACE(REGEXP_REPLACE(phone, '[^0-9]'), '(.{3})(.*)', '(\1)-\2')
  FROM t
;
First, all non-digit characters are filtered out; afterwards, the remaining string is put into a "(xxx)-xxxx" format, without cutting off any phone numbers that have more than 7 digits. Such a conversion could also be used to check the validity of entered data, updating the value to a uniform format afterwards.
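As a minimal sketch of that idea (phone_book and its phone column are hypothetical names, not from the original example), the normalization could be applied in place, touching only rows that aren't already in the target format:
UPDATE phone_book
   SET phone = REGEXP_REPLACE(REGEXP_REPLACE(phone, '[^0-9]'),
                              '(.{3})(.*)', '(\1)-\2')
 WHERE NOT REGEXP_LIKE(phone, '^\([0-9]{3}\)-[0-9]+$')
;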
    Thinking about it, why not use regular expressions to check other values about their formats? How about an IP4 address? I'll do this step by step, using 127.0.0.1 as the final test case.
    First I want to make sure, that each of the 4 parts of an IP address remains in the range between 0-255. Regular expressions are good at string matching but they don't allow any numeric comparisons. What valid strings do I have to take into consideration?
    Single digit values: 0-9
    Double digit values: 00-99
    Triple digit values: 000-199, 200-255 (this one will be the trickiest part)
    So far, I will have to use the "|" pipe operator to match all of the allowed combinations. I'll use my brute force generator to check if my solution works for a single value:
    SELECT data
      FROM TABLE(regex_utils.gen_data('0123456789', 3))
    WHERE REGEXP_LIKE(data, '^(25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})$') 
;
More than 255 records? Leading zeros are allowed, but checking all the records, there's no value above 255. First step accomplished. The second part is to make sure that there are 4 such values, delimited by a "." dot. So I have to check for 0-255 plus a dot 3 times and then check for another 0-255 value. Doesn't sound too complicated, does it?
First, using my brute force generator, I'll check whether I've missed any possible combination:
    SELECT data
      FROM TABLE(regex_utils.gen_data('03.', 15))
    WHERE REGEXP_LIKE(data,
                       '^((25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})$'
;
Looks good to me. Let's check on some sample data:
    WITH t AS (SELECT '127.0.0.1' ip
                 FROM dual
                UNION 
               SELECT '256.128.64.32'
             FROM dual
          )
SELECT t.ip
  FROM t
 WHERE REGEXP_LIKE(t.ip,
                   '^((25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})$'
                  )
;
No surprises here. I can take this example a bit further and try to format valid addresses into a uniform representation, as shown in the phone number example. My goal is to display every IP address in the "xxx.xxx.xxx.xxx" format, using leading zeros for 2 and 1 digit values.
Regular expressions don't have any format models like, for example, the TO_CHAR function, so how could this be achieved? Thinking in regular expressions, I first have to find a way to make sure that each single number is at least three digits wide. Using my example, this could look like this:
    WITH t AS (SELECT '127.0.0.1' ip
             FROM dual
          )
SELECT t.ip, REGEXP_REPLACE(t.ip, '([0-9]+)(\.?)', '00\1\2')
  FROM t
;
Look at this: leading zeros. However, that first value "00127" doesn't look too good, does it? If you thought about using a second regular expression function to remove any excess zeros, you're absolutely right. Just take the past examples and think in regular expressions. Did you come up with something like this?
    WITH t AS (SELECT '127.0.0.1' ip
             FROM dual
          )
SELECT t.ip, REGEXP_REPLACE(REGEXP_REPLACE(t.ip, '([0-9]+)(\.?)', '00\1\2'),
                            '[0-9]*([0-9]{3})(\.?)', '\1\2')
  FROM t
;
Think about the possibilities: now you can sort a table with unformatted IP addresses, if that is a requirement in your application, or you may find other values where you can use that "trick".
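A minimal sketch of the sorting idea (hosts and its ip column are hypothetical names): padding every octet to three digits makes the string order identical to the numeric order.
SELECT ip
  FROM hosts
 ORDER BY REGEXP_REPLACE(REGEXP_REPLACE(ip, '([0-9]+)(\.?)', '00\1\2'),
                         '[0-9]*([0-9]{3})(\.?)', '\1\2')
;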
Since I'm on checking INET (internet) types of values, let's do some more, for example an e-mail address. I'll keep it simple and will only check on the
"x@x.xxx", "x@x.xxxx" and "x@x.xx.xx" formats, where x represents one or more alphanumeric characters. If you want, you can look up the corresponding RFC definition and try to build your own regular expression for that one.
Now back to this one: at least one alphanumeric character, followed by an "@" at sign, followed by at least one alphanumeric character, followed by a "." dot and 3 or 4 more alphanumeric characters, or by two groups of a "." dot plus 2 more characters. This should be an easy one, right? Use some sample e-mail addresses and my brute force generator, and you should be able to verify your solution.
    Here's mine:
    SELECT data
      FROM TABLE(regex_utils.gen_data('a1@.', 9))
WHERE REGEXP_LIKE(data, '^[[:alnum:]]+@[[:alnum:]]+(\.[[:alnum:]]{3,4}|(\.[[:alnum:]]{2}){2})$', 'i')
;
Checking for valid domains should, in my opinion, be done in a second function, to keep the checks themselves simple, but that's probably a discussion about readability and taste.
    How about checking a valid URL? I can reuse some parts of the e-mail example and only have to decide what type of URLs I want, for example "http://", "https://" and "ftp://", any subdomain and a "/" after the domain. Using the case insensitive match parameter, this shouldn't take too long, and I can use this thread's URL as a test value. But take a minute to figure that one out for yourself.
    Does it look like this?
WITH t AS (SELECT '<this thread''s URL>' URL
                 FROM dual
                UNION
               SELECT 'http://x/'
             FROM dual
          )
SELECT t.URL
  FROM t
 WHERE REGEXP_LIKE(t.URL, '^(https*|ftp)://(.+\.)*[[:alnum:]]+(\.[[:alnum:]]{3,4}|(\.[[:alnum:]]{2}){2})/', 'i')
;
    Update: Improvements in 10g2
All of you who are using 10g2 or XE (which includes some of the 10g2 features) may want to take a look at several improvements in this version. First of all, there are new, Perl-influenced metacharacters.
    Rewriting my example from the first lesson, the WHERE clause would look like this:
WHERE NOT REGEXP_LIKE(t.col1, '^\d+$')
Or my example of searching for decimal numbers:
'^(\.\d+|\d+(\.\d*)?)$'
Saves some space, doesn't it? However, this will only work in 10g2 and future releases.
Some of those metacharacters even cover non-matching lists; for example, "\S" matches any non-whitespace character (roughly "[^[:space:]]"), so my example from the second part could be changed to:
    SELECT NVL(LENGTH(REGEXP_REPLACE('Having fun with regular expressions', '\S')), 0)
      FROM dual
;
Other metacharacters support search patterns in strings with newline characters. Just take a look at the link I've included.
Another interesting metacharacter is the non-greedy "?". In 10g2, "?" not only means 0 or 1 occurrence; appended to another quantifier, it makes the match non-greedy, i.e. it stops at the first possible occurrence. Let me illustrate with a simple example:
    SELECT REGEXP_SUBSTR('Having fun with regular expressions', '^.* +')
      FROM dual
;
This is the old style, "greedy" search pattern, returning everything up to the last space.
    SELECT REGEXP_SUBSTR('Having fun with regular expressions', '^.* +?')
      FROM dual
;
In 10g2, you'd get only "Having " because of the non-greedy search operation. To simulate that behavior in 10g1, I'd have to change the pattern to this:
    SELECT REGEXP_SUBSTR('Having fun with regular expressions', '^[^ ]+ +')
      FROM dual
;
Another new option is the "x" match parameter. Its purpose is to ignore whitespace, which would prove useful for ignoring trailing/leading spaces, for example. Checking for unsigned integers with leading/trailing spaces would look like this:
    SELECT REGEXP_SUBSTR(' 123 ', '^[0-9]+$', 1, 1, 'x')
      FROM dual
;
However, I have to be careful: "x" would also allow " 1 2 3 " to qualify as a valid string.
    I hope you enjoyed reading this introduction and hope you'll have some fun with using regular expressions.
    C.

Can I write this condition with only one regular expression in Oracle (REGEXP_SUBSTR in my example)? I mean using only REGEXP_SUBSTR in the SELECT clause, without REGEXP_LIKE in the WHERE clause.
For a better understanding of what I'd like to get, here is an example:
I have strings of two blocks separated by a space.
The first block consists of 5 symbols from [01]; the second block of 3 symbols from [01].
In the first block, a single (!) may optionally appear; in the second block, a single (>) may optionally appear.
The idea is to find such strings with only one regular expression, using REGEXP_SUBSTR in the SELECT clause, so that if a string does not satisfy the requirements, NULL is returned in the result set.
    with t as (select '10(!)010 10(>)1' num from dual union all
    select '1112(!)0 111' from dual union all --incorrect because of '2'
    select '(!)10010 011' from dual union all
    select '10010(!) 101' from dual union all
    select '10010 100(>)' from dual union all
    select '13001 110' from dual union all -- incorrect because of '3'
select '100!01 100' from dual union all --incorrect because of a bare ! without parentheses
select '100(!)1(!)1 101' from dual union all --incorrect because of two occurrences of (!)
select '1001(!)10 101' from dual union all --incorrect because of length of block1=6
select '1001(!)10 1011' from dual union all --incorrect because of length of block2=4
select '10110 1(>)11(>)0' from dual union all --incorrect because of two occurrences of (>)
select '1001(>)1 11(!)0' from dual) --incorrect because (!) and (>) are not in their own blocks
    --end of test data

  • Regular expression and pattern matching/replacing

I have a list of keywords. It has around 1000 keywords now but can grow to 5000.
My web application displays a lot of text which is stored in the database. My requirement is to scan each text for the occurrence of any of the above keywords. If any keyword is present, I have to replace it with some custom value before showing the text to the user.
I was thinking of using a regular expression to replace the keywords in the text with the matcher.replaceAll method, as follows:
    Pattern pattern = Pattern.compile(patternStr);
    Matcher matcher = pattern.matcher(inputStr);
    String output = matcher.replaceAll(replacementStr);
But my pattern string will have around 5000 keywords joined with the 'OR' logical operator, like keyword1|keyword2|keyword3|...
Will such a big pattern string adversely affect the performance? What can I do to speed it up? (Since my keyword list is not static, I would prefer to do the replacement just before showing the text to the user.)
    Any suggestions are most welcome.

I don't think a pure regex approach would be that slow, but it would be a maintenance nightmare. I think a combined regex/table-lookup approach would be best: use a regex to identify potential keywords, then look them up in the table to confirm. For instance, to find all Java keywords you could use the regex "\\b[a-z]{2,12}+\\b" to filter out anything that can't possibly be a keyword.
What are you going to replace the keywords with? Will it vary depending on which keyword is found? If so, you'll have to use a table, and you won't be able to use the replaceAll method, because it can't handle dynamically generated replacement values. You would have to use the lower-level appendReplacement and appendTail methods instead.
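A minimal sketch of that combined approach (the keyword map and the sample text are illustrative, not from the original post):
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordReplacer {
    public static void main(String[] args) {
        // Hypothetical lookup table; in practice loaded from the database.
        Map<String, String> replacements = new HashMap<String, String>();
        replacements.put("foo", "FOO_REPLACED");
        replacements.put("bar", "BAR_REPLACED");

        String input = "foo and bar and baz";

        // Coarse filter: any lowercase word of plausible keyword length.
        Pattern pattern = Pattern.compile("\\b[a-z]{2,12}\\b");
        Matcher matcher = pattern.matcher(input);

        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            String candidate = matcher.group();
            String replacement = replacements.get(candidate);
            // Confirm against the table; non-keywords are kept unchanged.
            // quoteReplacement guards against '$' and '\' in the value.
            matcher.appendReplacement(sb, Matcher.quoteReplacement(
                replacement != null ? replacement : candidate));
        }
        matcher.appendTail(sb);

        System.out.println(sb); // FOO_REPLACED and BAR_REPLACED and baz
    }
}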
