Regular expressions (easy money?)
This may be easy money for someone who knows a little bit of Perl...
Here is the code I want to search :
text 1
<!-- comment a test
Text for comA
Text2 for ComA Text3 forComA
-->
text 2
<!-- comment b
Text for comB
Text2 for ComB
-->
text 3
<!-- comment c
Text for comC
-->
Text 4
I use the demo applet for the ORO project from Apache to test my RegExpr : http://jakarta.apache.org/oro/demo.html
Here is my goal :
=============
Obtain three matches for the following pattern :
<!-- comment a [any combination of words/spaces/linespaces] -->
I am convinced that this is an easy one but I'm a newbie at Perl and RegExpr, so a little help would be greatly appreciated.
Thanks in advance!
Vince
Presuming you really do want Perl, the following works.
use strict;
use warnings;
my $s = join("", <>);   # Slurp in the entire file
my $i = 0;
# Each pass removes one matched comment from $s, so the loop ends when none are left
while ($s =~ s/(<!-- comment (a|b|c)[^>]+-->)//) {
    my $result = $1;
    $i++;
    print "=====$i\n";
    print "$result\n\n";
}
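Since the question mentions testing with the ORO applet, here is a rough Java equivalent of the Perl answer above. It is only a sketch: it uses the standard java.util.regex package rather than ORO, and the class and method names (CommentFinder, findComments) are made up for illustration. DOTALL lets the dot cross line breaks and the reluctant .*? stops at the first closing "-->".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentFinder {
    // "comment" followed by a, b or c as a whole word, then anything up to the first "-->".
    private static final Pattern COMMENT =
        Pattern.compile("<!-- comment [abc]\\b.*?-->", Pattern.DOTALL);

    // Returns every matching comment block, in document order.
    static List<String> findComments(String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = COMMENT.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "text 1\n<!-- comment a test\nText for comA\n-->\n"
                    + "text 2\n<!-- comment b\nText for comB\n-->\n"
                    + "text 3\n<!-- comment c\nText for comC\n-->\nText 4\n";
        for (String comment : findComments(text)) {
            System.out.println("=====");
            System.out.println(comment);
        }
    }
}
```

Unlike the [^>]+ version, this pattern also copes with a ">" inside the comment body, at the cost of needing the DOTALL flag.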
Similar Messages
-
Regular expression question (should be an easy one...)
I'm using Java to build a parser. I'm getting an expression, which I split on white space.
How can I build a regular expression that will let me split only on unquoted spaces? Example:
for the expression:
(X=33 AND Y=44) OR (Z="hello world" AND T=2)
I will get the following values split:
(X=33
AND
Y=44)
OR
(Z="hello world"
AND
T=2)
and not:
(Z="
hello
world"
thank you very much!
Instead of splitting on whitespace to get a list of tokens, use Matcher.find() to match the tokens themselves:
import java.util.*;
import java.util.regex.*;
public class Test
{
  public static void main(String[] args) throws Exception
  {
    String str = "(X=33 AND Y=44) OR (Z=\"hello world\" AND T=2)";
    List<String> tokens = new ArrayList<String>();
    // A run of non-space, non-quote characters, optionally followed by one quoted string
    Matcher m = Pattern.compile("[^\\s\"]+(?:\".*?\")?").matcher(str);
    while (m.find())
    {
      tokens.add(m.group());
    }
    System.out.println(tokens);
  }
}
The regex I used is based on the assumptions that there will be at most one run of quoted text per token, that it will always appear on the right-hand side of an expression, and that the closing quote will always mark the end of the token. If the rules are more complicated (as sabre150 suggested), a more complicated regex will be needed. You might be better off doing the parsing the old-fashioned way, without regexes.
-
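For the quoted-space question above, the "old-fashioned way" without regexes could look roughly like the sketch below. It is not from the thread: the class name QuoteAwareSplitter is hypothetical, and it assumes quotes are never escaped or nested.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplitter {
    // Splits on whitespace, except for whitespace that sits inside double quotes.
    static List<String> split(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : input.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;        // every quote toggles the state (no escape handling)
                current.append(c);
            } else if (Character.isWhitespace(c) && !inQuotes) {
                if (current.length() > 0) {  // close the current token, skip repeated spaces
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("(X=33 AND Y=44) OR (Z=\"hello world\" AND T=2)"));
    }
}
```

A character loop like this drops the "one quoted run per token" assumption, because the quote state is tracked independently of token boundaries.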
Help in regular expression matching
I have three expressions like
1) [(y2009)(y2011)]
2) [(y2008M5)(y2011M3)] or [(y2009M5)(y2010M12)]
3) [(y2009M1d20)(y2011M12d31)]
I want a regular expression pattern for each of the above three expressions.
I am using:
REGEXP_LIKE(timedomainexpression, '???[:digit:]{4}*[:digit:]{1,2}???[:digit:]{4}*[:digit:]{1,2}??', 'i');
but it returns rows for all of the above expressions, while I want a different pattern for each.
I have used * after [:digit:]{4}; when I use ? or . instead, it returns no results. Please help in this situation ASAP.
Thanks
I don't get your question. Can you post your desired output and also give some sample data?
Please consider the following when you post a question.
1. New features keep coming in every Oracle version, so please provide your Oracle DB version to get the best possible answer.
You can use the following query and copy and paste the output.
select * from v$version
2. This forum has a very good search feature. Please use it before posting your question, because for most of the questions
that are asked the answer is already there.
3. We don't know your DB structure or what your data looks like, so you need to let us know. The best way is to give some sample data like this.
I have the following table called sales
with sales
as
(
select 1 sales_id, 1 prod_id, 1001 inv_num, 120 qty from dual
union all
select 2 sales_id, 1 prod_id, 1002 inv_num, 25 qty from dual
)
select *
from sales
4. Rather than describing what you want in words, it is easier when you give your expected output.
For example, in the above sales table, I want to know the total quantity and number of invoices for each product.
The output should look like this
Prod_id sum_qty count_inv
1       145     2
5. Whenever you get an error message, post the entire error message, with the error number, the message and the line number.
6. The next thing is very important to remember. Please post only well formatted code. Unformatted code is very hard to read.
Your code format gets lost when you post it in the Oracle Forum. So in order to preserve it you need to
use the {noformat}{noformat} tags.
The usage of the tag is like this.
<place your code here>
7. If you are posting a *Performance Related Question*, please read
{thread:id=501834} and {thread:id=863295}.
Following those guides will be very helpful.
8. Please keep in mind that this is a public forum. Here no question is URGENT.
So use of words like *URGENT* or *ASAP* (As Soon As Possible) is considered to be rude.
-
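For the three time-domain expressions asked about above, one way to get a separate pattern per shape is sketched below in Java. This is an illustration only, not the poster's Oracle code: the class name and the labels "year", "month" and "day" are invented here. If it is translated back to Oracle, note that a POSIX class only works inside a bracket expression, so the digit class would be written [[:digit:]] rather than [:digit:].

```java
import java.util.regex.Pattern;

public class TimeDomainClassifier {
    // Each pattern describes one bracketed pair of periods and must match the whole string.
    private static final Pattern YEAR =
        Pattern.compile("\\[\\(y\\d{4}\\)\\(y\\d{4}\\)\\]");
    private static final Pattern MONTH =
        Pattern.compile("\\[\\(y\\d{4}M\\d{1,2}\\)\\(y\\d{4}M\\d{1,2}\\)\\]");
    private static final Pattern DAY =
        Pattern.compile("\\[\\(y\\d{4}M\\d{1,2}d\\d{1,2}\\)\\(y\\d{4}M\\d{1,2}d\\d{1,2}\\)\\]");

    // Returns which of the three shapes the expression has, or "none".
    static String classify(String expr) {
        if (YEAR.matcher(expr).matches()) return "year";
        if (MONTH.matcher(expr).matches()) return "month";
        if (DAY.matcher(expr).matches()) return "day";
        return "none";
    }

    public static void main(String[] args) {
        System.out.println(classify("[(y2009)(y2011)]"));
        System.out.println(classify("[(y2008M5)(y2011M3)]"));
        System.out.println(classify("[(y2009M1d20)(y2011M12d31)]"));
    }
}
```

Because each pattern has to match the complete string, a month-level expression can no longer satisfy the year-level pattern, which is the overlap the original single pattern suffered from.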
Namburi,
When you said you used the Reg Exp tool, did you use it only as
preconfigured by the iMT migrate application wizard?
Because the default configuration of the regular expression tool will only
target the files in your ND project directories. If you wish to target
classes outside of the normal directory scope, you have to either modify the
"Source Directory" property OR create another instance of the regular
expression tool. See the "Tool" menu in the iMT to create additional tool
instances which can each be configured to target different sets of files
using different sets of rules.
Usually, I utilize 3 different sets of rules files on a given migration:
spider2jato.xml
these are the generic conversion rules (but includes the optimized rules for
ViewBean and Model based code, i.e. these rules do not utilize the
RequestManager since it is not needed for code running inside the ViewBean
or Model classes)
I run these rules against all files.
See the file download section of this forum for periodic updates to these
rules.
nonProjectFileRules.xml
these include rules that add the necessary
RequestManager.getRequestContext(). etc prefixes to many of the common
calls.
I run these rules against user module and any other classes that are
not ModuleServlet, ContainerView, or Model classes.
appXRules.xml
these rules include application specific changes that I discover while
working on the project. A common thing here is changing import statements
(since the migration tool moves ND project code into a different jato
packaging structure, you sometimes need to adjust imports in non-project
classes that previously imported ND project specific packages)
So you see, you are not limited to one set of rules at all. Just be careful
to keep track of your backups (the regexp tool provides several options in
its Expert Properties related to back up strategies).
----- Original Message -----
From: <vnamboori@y...>
Sent: Wednesday, August 08, 2001 6:08 AM
Subject: [iPlanet-JATO] Re: Use Of models in utility classes - Pease don't
forget about the regular expression potential
Thanks Matt, Mike, Todd
This is a great input for our migration. Though we used the existing
Regular Expression Mapping tool, we did not change this to meet our
own needs as mentioned by Mike.
We would certainly incorporate this to ease our migration.
Namburi
--- In iPlanet-JATO@y..., "Todd Fast" <toddwork@c...> wrote:
All--
Great response. By the way, the Regular Expression Tool uses the Perl5 RE
syntax as implemented by Apache OROMatcher. If you're doing lots of these
sorts of migration changes manually, you should definitely buy the O'Reilly
book "Mastering Regular Expressions" and generate some rules to automate the
conversion. Although they are definitely confusing at first, regular
expressions are fairly easy to understand with some documentation, and are
superbly effective at tackling this kind of migration task.
Todd
----- Original Message -----
From: "Mike Frisino" <Michael.Frisino@S...>
Sent: Tuesday, August 07, 2001 5:20 PM
Subject: Re: [iPlanet-JATO] Use Of models in utility classes -Pease don't
forget about the regular expression potential
Also, (and Matt's document may mention this)
Please bear in mind that this statement is not totally correct:
Since the migration tool does not do much of conversion for these
utilities we have to do manually.
Remember, the iMT is a SUITE of tools. There is the extraction tool, and
the translation tool, and the regular expression tool, and several other
smaller tools (like the jar and compilation tools). It is correct to state
that the extraction and translation tools only significantly convert the
primary ND project objects (the pages, the data objects, and the project
classes). The extraction and translation tools do minimum translation of the
User Module objects (i.e. they repackage the user module classes in the new
jato module packages). It is correct that for all other utility classes
which are not formally part of the ND project, the extraction and
translation tools do not perform any migration.
However, the regular expression tool can "migrate" any arbitrary file
(utility classes etc) to the degree that the regular expression rules
correlate to the code present in the arbitrary file. So first and foremost,
if you have a lot of spider code in your non-project classes you should
consider using the regular expression tool and, if warranted, adding
additional rules to reduce the amount of manual adjustments that need to be
made. I can't stress this enough. We can even help you write the regular
expression rules if you simply identify the code pattern you wish to
convert. Just because there is not already a regular expression rule to
match your need does not mean it can't be written. We have not nearly
exhausted the possibilities.
For example if you say, we need to convert
CSpider.getDataObject("X");
To
RequestManager.getRequestContext().getModelManager().getModel(XModel.class);
Maybe we or somebody else in the list can help write that regular
expression if it has not already been written. For instance, in the last
updated spider2jato.xml file there is already a CSpider.getCommonPage("X")
rule:
<!--getPage to getViewBean-->
<mapping-rule>
<mapping-rule-primarymatch>
<![CDATA[CSpider[.\s]*getPage[\s]*\(\"([^"]*)\"]]>
</mapping-rule-primarymatch>
<mapping-rule-replacement>
<mapping-rule-match>
<![CDATA[CSpider[.\s]*getPage[\s]*\(\"([^"]*)\"]]>
</mapping-rule-match>
<mapping-rule-substitute>
<![CDATA[getViewBean($1ViewBean.class]]>
</mapping-rule-substitute>
</mapping-rule-replacement>
</mapping-rule>
Following this example a getDataObject to getModel would look
like this:
<mapping-rule>
<mapping-rule-primarymatch>
<![CDATA[CSpider[.\s]*getDataObject[\s]*\(\"([^"]*)\"]]>
</mapping-rule-primarymatch>
<mapping-rule-replacement>
<mapping-rule-match>
<![CDATA[CSpider[.\s]*getDataObject[\s]*\(\"([^"]*)\"]]>
</mapping-rule-match>
<mapping-rule-substitute>
<![CDATA[getModel($1Model.class]]>
</mapping-rule-substitute>
</mapping-rule-replacement>
</mapping-rule>
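To see what the getDataObject rule above does to a line of source, here is a small Java sketch that applies the same match pattern and substitute text. It uses java.util.regex instead of the ORO matcher the tool is said to use, so treat any difference in behavior as an open assumption; the class name GetDataObjectRule is invented for the illustration.

```java
public class GetDataObjectRule {
    // The rule's match pattern, written as a Java string literal.
    static final String MATCH = "CSpider[.\\s]*getDataObject[\\s]*\\(\"([^\"]*)\"";
    // $1 is the captured data object name, as in the rule's substitute section.
    static final String SUBSTITUTE = "getModel($1Model.class";

    // Rewrites every CSpider.getDataObject("Name" call prefix in the source text.
    static String apply(String source) {
        return source.replaceAll(MATCH, SUBSTITUTE);
    }

    public static void main(String[] args) {
        System.out.println(apply("CSpider.getDataObject(\"Foo\");"));
    }
}
```

The pattern stops at the closing quote, so the trailing ");" of the original call is left in place and ends up closing the new getModel call.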
In fact, one migration developer already wrote that rule and submitted it
for inclusion in the basic set. I will post another upgrade to the basic
regular expression rule set; look for a "file uploaded" posting. Also,
please consider contributing any additional generic rules that you have
written for inclusion in the basic set.
Please note that in some cases (utility classes in particular) the rule
application may be more effective as TWO sequential rules rather than one
monolithic rule. Again using the example above, it will convert
CSpider.getDataObject("Foo");
To
getModel(FooModel.class);
Now that is the most effective conversion for that code if that code is in
a page or data object class file. But if that code is in a utility class you
really want:
RequestManager.getRequestContext().getModelManager().getModel(FooModel.class);
So to go from
getModel(FooModel.class);
To
RequestManager.getRequestContext().getModelManager().getModel(FooModel.class);
You would apply a second rule AND you would ONLY run this rule against
your utility classes so that you would not otherwise affect your ViewBean
and Model classes, which are completely fine with the simple getModel call.
<mapping-rule>
<mapping-rule-primarymatch>
<![CDATA[getModel\(]]>
</mapping-rule-primarymatch>
<mapping-rule-replacement>
<mapping-rule-match>
<![CDATA[getModel\(]]>
</mapping-rule-match>
<mapping-rule-substitute>
<![CDATA[RequestManager.getRequestContext().getModelManager().getModel(]]>
</mapping-rule-substitute>
</mapping-rule-replacement>
</mapping-rule>
A similar rule can be applied to getSession and other CSpider API calls.
For instance, here is the rule for converting getSession calls to leverage
the RequestManager.
<mapping-rule>
<mapping-rule-primarymatch>
<![CDATA[getSession\(\)\.]]>
</mapping-rule-primarymatch>
<mapping-rule-replacement>
<mapping-rule-match>
<![CDATA[getSession\(\)\.]]>
</mapping-rule-match>
<mapping-rule-substitute>
<![CDATA[RequestManager.getSession().]]>
</mapping-rule-substitute>
</mapping-rule-replacement>
</mapping-rule>
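The two-stage idea described above (generic rules for every file, then RequestManager prefixes for utility classes only) can be tried out with a short Java sketch. Again this relies on java.util.regex rather than the tool's ORO matcher, and the class and method names are hypothetical; only the patterns and substitute strings come from the rules quoted in this message.

```java
public class UtilityClassRules {
    // Stage 1: generic rule, safe to run against all files.
    static String applyProjectRules(String src) {
        return src.replaceAll(
            "CSpider[.\\s]*getDataObject[\\s]*\\(\"([^\"]*)\"",
            "getModel($1Model.class");
    }

    // Stage 2: prefix rules, meant to run ONLY against utility classes.
    static String applyUtilityRules(String src) {
        return src
            .replaceAll("getModel\\(",
                        "RequestManager.getRequestContext().getModelManager().getModel(")
            .replaceAll("getSession\\(\\)\\.",
                        "RequestManager.getSession().");
    }

    public static void main(String[] args) {
        String original = "CSpider.getDataObject(\"Foo\");";
        String stage1 = applyProjectRules(original);
        String stage2 = applyUtilityRules(stage1);
        System.out.println(stage1);
        System.out.println(stage2);
    }
}
```

One thing the sketch makes visible: the stage 2 output still contains "getModel(" and "getSession().", so running those rules a second time over the same file would add the prefix again. That is one more reason to keep track of backups and of which rule sets have already been applied to which files.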
----- Original Message -----
From: "Matthew Stevens" <matthew.stevens@e...>
Sent: Tuesday, August 07, 2001 12:56 PM
Subject: RE: [iPlanet-JATO] Use Of models in utility classes
Namburi,
I will post a document to the group site this evening which has the details
on various tactics of migrating these types of utilities. Essentially, you
either need to convert these utilities to Models themselves or keep the
utilities as is and simply use
RequestManager.getRequestContext().getModelManager().getModel()
to statically access Models.
For CSpSelect.executeImmediate() I have an example of a custom helper method
as a replacement which uses JDBC results instead of CSpDBResult.
matt
-----Original Message-----
From: vnamboori@y...
Sent: Tuesday, August 07, 2001 3:24 PM
Subject: [iPlanet-JATO] Use Of models in utility classes
Hi All,
In the present ND project we have lots of utility classes. These
classes are in a different directory and are not part of the ND pages.
In these classes we access the dataobjects and do the manipulations.
So we access dataobjects directly like
CSpider.getDataObject("do....");
and then execute it.
Since the migration tool does not do much of conversion for these
utilities we have to do manually.
My question is: can we access the models post-migration in the
same way, or do we need requestContext?
We have lots of utility classes which are DataObject intensive. Can
someone suggest a better way to migrate this kind of code?
Thanks
Namburi
-
How do I have to define a regular expression to filter out data from file?
Hi all,
I need to extract parts of lines of an ASCII file and couldn't get it done with my limited knowledge of regular expressions.
The file contains hundreds of lines and I am interested in only a few of them; within those lines I need just part of the data.
One original line looks like this:
TP3| |TP_SMD|Nicht in Stueckliste|~TP TP_SMD TESTPUNKT|-|0|87.770|157.950|0|top|c| |other|TP_SMD|TP_SMD_60RF-TP
Only the bold and underlined information is of interest, I don't need the rest.
I can open the file and read in each line, but then I am struggling to pick out only the lines of interest (starting with TP), take that TP with its number and the coordinates that follow later on, and then write these shortened lines to a new text file. So the new line should look like this:
TP3; 87.770;157.950;0 (It doesn't matter if the separator will be ; or |)
I thought of using regular expressions - is that the right way or is there a better approach?
Thanks & regards,
gedi, using LabVIEW 8.5
Hi max,
for finding a specific part of a string you can use the "Match Pattern" VI, it is located in the Strings Palette.
Maybe the Extract Numbers.vi example in the examples browser library can help you.
What I did to filter out my data of interest is first to sort out only the columns which I want to have -
then there are still a lot of lines remaining I don't need (this is the thing described above).
The rest I am going to filter out with a (then easy) regular expression with the "Match Pattern" VI.
Regards,
gedi
-
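For the TP line question above, the regular expression itself might look like the one in this Java sketch. Java is used here only to have something runnable; the thread is about LabVIEW, and nothing below is LabVIEW code. The field positions (first field, then the 8th to 10th fields) are inferred from the sample line and the desired output, and the class name TestPointExtractor is invented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestPointExtractor {
    // Group 1: the TP name. Then six pipe-delimited fields are skipped,
    // and groups 2 to 4 capture the next three fields.
    private static final Pattern TP_LINE = Pattern.compile(
        "^(TP\\d+)\\|(?:[^|]*\\|){6}([^|]*)\\|([^|]*)\\|([^|]*)\\|");

    // Returns the shortened line for a TP line, or null for any other line.
    static String shorten(String line) {
        Matcher m = TP_LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return m.group(1) + ";" + m.group(2) + ";" + m.group(3) + ";" + m.group(4);
    }

    public static void main(String[] args) {
        String line = "TP3| |TP_SMD|Nicht in Stueckliste|~TP TP_SMD TESTPUNKT|-|0|"
                    + "87.770|157.950|0|top|c| |other|TP_SMD|TP_SMD_60RF-TP";
        System.out.println(shorten(line));
    }
}
```

Since the file is pipe-delimited, splitting each line on "|" and picking fields by index would work just as well; the single pattern has the advantage that it filters the TP lines and extracts the fields in one step.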
Regular Expression to remove space in HTML Tag
Hello All,
My HTML string is like below.
select '<CityName>RICHMOND</CityName>
<StateCd>ABCD CDE
<StateCd/>
<CtryCd>CAN</CtryCd>
<CtrySubDivCd>BC</CtrySubDivCd>' Str from dual
Desired Output is
<CityName>RICHMOND</CityName><StateCd>ABCD CDE
<StateCd/><CtryCd>CAN</CtryCd><CtrySubDivCd>BC</CtrySubDivCd>
i.e. I want to remove the whitespace between tags where the area between them contains only whitespace, and otherwise leave it as it is. Please help to implement this using a regular expression.
Hi,
It's unclear what you want. This site seems to be formatting your message in some odd way.
Post a statement like
SELECT '...' FROM dual;
without any formatting, to show your input, and post the exact output you want from that, with as little formatting as possible. It might help if you use some character like ~ instead of spaces (just for posting; we'll find a solution that works for spaces).
To remove the text that consists of spaces and nothing else between the tags, you can say
REGEXP_REPLACE ( str
, '> +<'
, '><'
)
How is this string being generated? Maybe there's some easier, more efficient way to keep the bad sub-strings out of the string in the first place.
-
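The sample string in the tag-whitespace question above has line breaks between the tags, not only spaces, so a pattern limited to the space character would leave them in place. The Java sketch below shows the same replacement with a general whitespace class. It is an illustration outside Oracle, with an invented class name; in Oracle the analogous change would presumably be to use [[:space:]] in place of the literal space.

```java
public class TagWhitespaceStripper {
    // Removes runs of whitespace (spaces, tabs, line breaks) that are the ONLY
    // content between a closing '>' and the next '<'. Text values are untouched.
    static String strip(String markup) {
        return markup.replaceAll(">\\s+<", "><");
    }

    public static void main(String[] args) {
        String input = "<CityName>RICHMOND</CityName>\n"
                     + "<StateCd>ABCD CDE\n"
                     + "<StateCd/>\n"
                     + "<CtryCd>CAN</CtryCd>\n"
                     + "<CtrySubDivCd>BC</CtrySubDivCd>";
        System.out.println(strip(input));
    }
}
```

The value "ABCD CDE" followed by a line break is kept as it is, because the text between those two tags is not whitespace only.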
Regular expressions in Format Definition add-on
Hello experts,
I have a question about regular expressions. I am a newbie in regular expressions and I could use some help on this one. I tried for some 6 hours, but I can't solve it myself.
Summary of my problem:
In SAP Business One (patch level 42) it is possible to use bank statement processing. A file (full of regular expressions) is to be selected, so it can match certain criteria to the bank statement file. The bank statement file consists of a certain pattern (look at the attached code snippet).
:61:071222D208,00N026
:86:P 12345678BELASTINGDIENST F8R03782497 $GH
$0000009 BETALINGSKENM. 123456789123456
0 1234567891234560
:61:071225C758,70N078
:86:0116664495 REGULA B.V. HELPMESTRAAT 243 B 5371 AM HARDCITY HARD
CITY 48772-54314
:61:071225C425,05N078
:86:0329883585 J. MANSSHOT PATTRIOTISLAND 38 1996 PT HELMEN BIJBETA
LING VOOR RELOOP RMP1 SET ORDERNR* 69866 / SPOEDIG LEVEREN
:61:071225C850,00N078
:86:0105327212 POSE TELEFOONSTRAAT 43 6448 SL S-ROTTERDAM MIJN OR
DERNR. 53846 REF. MAIL 21-02
- I am looking for the type of regular expression that is used by the Format Definition add-on (JavaScript, .NET, Perl, Java, Python, etc.)
Besides that I need the regular expressions below, so that the Format Definition will match the right lines from my bank file.
- a regular expression that selects lines starting with :61: and the :86: line including the following lines (if available), so in fact it has to select everything from :86: until the next :61:.
- a regular expression that selects the bank account number (positions 5-14) from lines starting with :86:
- a regular expression that selects all other info from lines starting with :86: (and following lines, if any), so all positions that follow the bank account number
I am looking forward to the right solutions; I can give more info if you need any.
Hello Hendri,
Q1: I am in search of the right type of regular expression that is used by the Format Definition add-on (javascript, .NET, perl, JAVA, python etc.)
Answer: Format Definition uses .NET regular expressions.
You may refer to the following examples. If necessary, I can send you a guide about how to use regular expressions in Format Definition. Thanks.
Example 6
Description:
To match a field with an optional field in front. For example, “:61:0711211121C216,08N051NONREF” or “:61:071121C216,08N051NONREF”, which consists of a record identification “:61:”, a date in the form YYMMDD, another optional date MMDD, one or two characters to signify the direction of money flow, a numeric amount value and some other information. The target to be matched is the numeric amount value.
Regular expression:
(?<=:61:\d{6}(\d{4})?[a-zA-Z]{1,2})((\d+(,\d*)?)|(,\d+))
Text:
:61:0711211121C216,08N051NONREF
Matches:
1
Tips:
1. All the fields in front of the target field are described in the look behind assertion enclosed by (?<= and ). In particular, the optional field is enclosed in parentheses followed by a “?” (question mark). The sub expression for amount is copied from example 1. You can compose your own regular expression for such cases in the form of (?<=REGEX_FOR_FIELDS_IN_FRONT)(REGEX_FOR_TARGET_FIELD), in which REGEX_FOR_FIELDS_IN_FRONT and REGEX_FOR_TARGET_FIELD are respectively the regular expression for the fields in front and the target field. Keep the parentheses therein.
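Outside Format Definition, the same amount field from Example 6 can be pulled out with a capture group instead of a look behind. The Java sketch below shows that alternative. It is not how the add-on works (the answer above says it uses .NET regular expressions), the class name is invented, and the digit counts are taken from the description: six digits for YYMMDD and four for the optional MMDD.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AmountExtractor {
    // :61:, date YYMMDD, optional date MMDD, one or two letters, then the amount (group 1).
    private static final Pattern LINE_61 = Pattern.compile(
        "^:61:\\d{6}(?:\\d{4})?[A-Za-z]{1,2}(\\d+(?:,\\d*)?|,\\d+)");

    // Returns the amount text as written in the line, or null if the line does not fit.
    static String amount(String line) {
        Matcher m = LINE_61.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(amount(":61:0711211121C216,08N051NONREF"));
        System.out.println(amount(":61:071121C216,08N051NONREF"));
    }
}
```

The look behind form is needed when the tool takes the whole match as the field value; when the calling code can read a group, the capture group version is usually easier to follow.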
Example 7
Description:
Find all numbers in the free text description, which are possibly document identifications, e.g. for invoices
Regular expression:
(?<=\b)(?<!\.)\d+(?=\b)(?!\.)
Text:
:86:GIRO 6890316
ENERGETICA NATURA BENELU
AFRIKAWEG 14
HULST
3187-A1176
TRANSACTIEDATUM* 03-07-2007
Matches:
6
Tips:
1. The regular expression given finds all digits between word boundaries except those with a prior dot or following dot; “.” (dot) is escaped as \.
2. It may find some inaccurate matches, like the date in the text. If you want to exclude “-” (hyphen) as the prior or following character, as in the case of “.” (dot), the regular expression becomes (?<=\b)(?<!\.)(?<!-)\d+(?=\b)(?!\.)(?!-). The matches will be:
:86:GIRO 6890316
ENERGETICA NATURA BENELU
AFRIKAWEG 14
HULST
3187-A1176
TRANSACTIEDATUM* 03-07-2007
You may lose some real values like “3187” before the “-”.
Example 8
Description:
Find BP account number in 9 digits with a prior “P” or “0” in the first position of free text description
Regular expression:
(?<=^(P|0))\d{9}
Text:
0000006681 FORTIS ASR BETALINGSCENTRUM BV
Matches:
1
Tips:
1. Use positive look behind assertion (?<=PRIOR_KEYWORD) to express the prior keyword.
2. “^” means that the match starts from the beginning of the text. If the text includes the record identification, you may include it also in the look behind assertion. For example,
:86:0000006681 FORTIS ASR BETALINGSCENTRUM BV
The regular expression becomes
(?<=:86:(P|0))\d
Example 9
Description:
Following example 8, to find the possible BP name after BP account number, which is composed of letter, dot or space.
Regular expression:
(?<=^(P|0)\d)[a-zA-Z. ]*
Text:
0000006681 FORTIS ASR BETALINGSCENTRUM BV
Matches:
1
Tips:
1. In this case, put BP account number regular expression into the look behind assertion.
Example 10
Description:
Find the possible document identifications in a sub-record of :86: record. Sub-record is like “?00”, “?10” etc. A possible document identification sub-record is made up of the following parts:
• keyword “RE”, “RG”, “R”, “INV”, “NR”, “NO”, “RECHN” or “RECHNUNG”, and
• an optional group made up of following:
  a separator of either a dot, hyphen or slash, and
  an optional space, and
  an optional string starting with keyword “NR” or “NO” followed by a separator of either a dot, hyphen or slash, and
  an optional space
• and finally document identification in digits
Regular expression:
(?<=\?\d(RE|RG|R|INV|NR|NO|RECHN|RECHNUNG)((\.|-|/)\s?((NR|NO)(\.|-|/))?\s?)?)\d+
Kind Regards
-Yatsea -
Introduction to regular expressions ... continued.
After some very positive feedback from Introduction to regular expressions ... I'm now continuing on this topic for the interested audience. As always, if you have questions or problems that you think could be solved through regular expression, please post them.
Having fun with regular expressions - Part 2
Finishing my example with decimal numbers, I thought about a method to test regular expressions. A question from another user who was looking for a way to show all possible combinations inspired me in writing a small package.
CREATE OR REPLACE PACKAGE regex_utils AS
-- Regular Expression Utilities
-- Version 0.1
TYPE t_outrec IS RECORD(
data VARCHAR2(255)
);
TYPE t_outtab IS TABLE OF t_outrec;
FUNCTION gen_data(
p_charset IN VARCHAR2 -- character set that is used for generation
, p_length IN NUMBER -- length of the generated
) RETURN t_outtab PIPELINED;
END regex_utils;
/
CREATE OR REPLACE PACKAGE BODY regex_utils AS
-- FUNCTION gen_data returns a collection of generated varchar2 elements
FUNCTION gen_data(
p_charset IN VARCHAR2 -- character set that is used for generation
, p_length IN NUMBER -- length of the generated
) RETURN t_outtab PIPELINED
IS
TYPE t_counter IS TABLE OF PLS_INTEGER INDEX BY PLS_INTEGER;
v_counter t_counter;
v_exit BOOLEAN;
v_string VARCHAR2(255);
v_outrec t_outrec;
BEGIN
FOR max_length IN 1..p_length
LOOP
-- init counter loop
FOR i IN 1..max_length
LOOP
v_counter(i) := 1;
END LOOP;
-- start data generation loop
v_exit := FALSE;
WHILE NOT v_exit
LOOP
-- start generation
v_string := '';
FOR i IN 1..max_length
LOOP
v_string := v_string || SUBSTR(p_charset, v_counter(i), 1);
END LOOP;
-- set outgoing record
v_outrec.data := v_string;
-- now pipe the result
PIPE ROW(v_outrec);
-- increment loop
<<inc_loop>>
FOR i IN REVERSE 1..max_length
LOOP
v_counter(i) := v_counter(i) + 1;
IF v_counter(i) > LENGTH(p_charset) THEN
IF i > 1 THEN
v_counter(i) := 1;
ELSE
v_exit := TRUE;
END IF;
ELSE
-- no further processing required
EXIT inc_loop;
END IF;
END LOOP;
END LOOP;
END LOOP;
-- a pipelined function ends with a RETURN that carries no value
RETURN;
END gen_data;
END regex_utils;
/
This package is a brute-force string generator that produces all possible combinations of the characters in a string up to a maximum length. Together with the regular expressions, I can now show which combinations my solution would let pass. But see for yourself:
SELECT *
FROM (SELECT data col1
FROM TABLE(regex_utils.gen_data('+-.0', 5))
) t
WHERE REGEXP_LIKE(NVL(REGEXP_SUBSTR(t.col1,
'^([+-]?[^+-]+|[^+-]+[+-]?)$'
), ' '), -- NVL fallback value assumed; the original argument was lost
'^[+-]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)[+-]?$'
)
;
You will see some results which are perfectly valid for my definition of decimal numbers but haven't been mentioned, like '000' or '+.00'. From now on I will also use this package to verify the solutions I'll present to you and hopefully reduce my share of typos.
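For readers without an Oracle instance, the brute-force idea can be sketched in Python. This is not the package above, only an approximation: `gen_data` mirrors its name, while `looks_like_decimal` is a hypothetical helper that applies the two patterns from the query one after the other.

```python
import re
from itertools import product

def gen_data(charset, max_length):
    """Yield every string over `charset` with length 1..max_length."""
    for length in range(1, max_length + 1):
        # product() varies the last position fastest, like the counter loop.
        for combo in product(charset, repeat=length):
            yield "".join(combo)

# The two patterns from the query above, copied verbatim.
ONE_SIGN = re.compile(r"^([+-]?[^+-]+|[^+-]+[+-]?)$")
DECIMAL = re.compile(r"^[+-]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)[+-]?$")

def looks_like_decimal(text):
    # A sign on at most one side, and a decimal number in between.
    return bool(ONE_SIGN.match(text)) and bool(DECIMAL.match(text))

accepted = [s for s in gen_data("+-.0", 5) if looks_like_decimal(s)]
print(len(accepted), accepted[:10])
```

Listing `accepted` is a quick way to spot strings such as '000' or '+.00' that a pattern lets through.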
Counting and finding certain characters or words in a string can be a tedious task. I'll show you how it's done with regular expressions. I'll start with an easy example, count all spaces in the string "Having fun with regular expressions.":
SELECT NVL(LENGTH(REGEXP_REPLACE('Having fun with regular expressions', '[^ ]')), 0)
FROM dual
;
No surprise there. I'm replacing all characters except spaces with a null string. Since REGEXP_REPLACE assumes a NULL string as replacement argument, I can save on adding a third argument, which would look like this:
REGEXP_REPLACE('Having fun with regular expressions', '[^ ]', '')
So REGEXP_REPLACE will return all the spaces, which we can count with the LENGTH function. If there aren't any, I will get a NULL string, which is checked by the NVL function. If you want, you can play around by changing the space character to something else.
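The same counting trick can be tried outside the database. A minimal sketch in Python (`count_spaces` is a hypothetical name); Python returns an empty string rather than NULL, so no NVL equivalent is needed:

```python
import re

def count_spaces(text):
    # Remove everything that is not a space, then count what is left.
    return len(re.sub(r"[^ ]", "", text))

print(count_spaces("Having fun with regular expressions"))
```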
A variation of this theme could be counting the number of words. Counting spaces and adding 1 to this result could be misleading if there are duplicate spaces. Thanks to regular expressions, I can of course eliminate duplicates.
Using the old method on the string "Having fun with regular expressions" would return anything but the right number. This is where backreferences come into play. REGEXP_REPLACE uses them in the replacement argument as a backslash plus a single digit, like this: '\1'. To reference a string in a search pattern, I have to use subexpressions (remember the round brackets?).
SELECT NVL(LENGTH(REGEXP_REPLACE('Having fun with regular expressions', '( )\1*|.', '\1')), 0)
FROM dual
;
You may have noticed that I changed from using the "^" as a NOT operator to using the "|" OR operator and the "." any character placeholder. This neat little trick allows me to filter out all other characters except the one we're looking for in the first place. "\1" as backreference is outside of our subexpression since I don't want to count the trailing spaces and is used both in the search pattern and the replacement argument.
Still I'm not satisfied with this: What about leading/trailing blanks, what if there are any special characters, numbers, etc.? Finally, it's time to only count words. For the purpose of this demonstration, I define a word as one or more consecutive letters. If by now you're already thinking in regular expressions, the solution is not far away. One hint: you may want to check on the "i" match parameter which allows for case insensitive search. Another one: You won't need a back reference in the search pattern this time.
Let's compare our solutions then, shall we?
SELECT NVL(LENGTH(REGEXP_REPLACE('Having fun with regular expressions. !',
'([a-z])+|.', '\1', 1, 0, 'i')), 0)
FROM dual;
This time I don't use a backreference; the "+" operator (remember? 1 or more) will suffice. And since I want to count the occurrences, not the letters, I moved the "+" meta character outside of the subexpression. The "|." trick again proved to be useful.
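The word-counting pattern can be checked in Python as well. This is a sketch, not the Oracle call: `count_words` is a hypothetical name, and a replacement function stands in for '\1' so that the "." alternative, which captures nothing, contributes an empty string.

```python
import re

def count_words(text):
    # Each run of letters collapses to its last letter (group 1);
    # any other single character is replaced by nothing.
    collapsed = re.sub(
        r"([a-z])+|.",
        lambda m: m.group(1) or "",
        text,
        flags=re.IGNORECASE | re.DOTALL,
    )
    return len(collapsed)

print(count_words("Having fun with regular expressions. !"))
```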
Case insensitive search does have its merits. It will only search but not transform any found substring. If I want, for example, to extract any occurrence of the word fun, I'll just use the "i" match parameter and get this substring, whether it's written as "Fun", "FUN" or "fun". This can be very useful if you're looking, for example, for names of customers, streets, etc.
Enough about counting, how about finding? What if I want to know the last occurrence of a certain character or string, for example the position of the last space in the string "Where is the last space?"?
Addendum: Thanks to another forum member, I should mention that the INSTR function can do a reverse search by itself.
WITH t AS (SELECT 'Where is the last space?' col1
FROM dual)
SELECT INSTR(col1, ' ', -1)
FROM t;
Now regular expressions are powerful, but there is no parameter that allows us to reverse the search direction. However, remembering that we have the "$" meta character that means (until the) end of string, all I have to do is use a search pattern that looks for a combination of space and non-space characters including the end of a string. Now compare the REGEXP_INSTR function to the previous solution:
SELECT REGEXP_INSTR(t.col1, ' [^ ]*$')
FROM t;
So in this case, it remains a matter of taste which one you want to use. If the search pattern has to look for the last occurrence of another regular expression, this is the way to solve such a requirement.
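To compare the two approaches side by side, here is a small Python sketch. Both function names are hypothetical, and positions are converted to 1-based to match INSTR and REGEXP_INSTR.

```python
import re

def last_space_regex(text):
    # A space followed only by non-spaces up to the end of the string.
    match = re.search(r" [^ ]*$", text)
    return match.start() + 1 if match else 0

def last_space_reverse(text):
    # Plain reverse search, the counterpart of INSTR(col1, ' ', -1).
    return text.rfind(" ") + 1

sample = "Where is the last space?"
print(last_space_regex(sample), last_space_reverse(sample))
```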
One more thing about backreferences. They can be used for a sort of primitive "string swapping". If for example you have to transform column values like swapping first and last name, the backreference is your friend. Here's an example:
SELECT REGEXP_REPLACE('John Doe', '^(.*) (.*)$', '\2, \1')
FROM dual
;
What about middle names, for example 'John J. Doe'? Look for yourself, it still works.
You can even use that for strings with delimiters, for example reversing delimited "fields" like in this string '10~20~30~40~50' into '50~40~30~20~10'. Using REVERSE, I would get '05~04~03~02~01', so there has to be another way. Using backreferences however is limited to 9 subexpressions, which limits the following solution a bit, if you need to process strings with more than 9 fields. If you want, you can think this example through and see if your solution matches mine.
SELECT REGEXP_REPLACE('10~20~30~40~50',
'^(.*)~(.*)~(.*)~(.*)~(.*)$',
'\5~\4~\3~\2~\1'
)
FROM dual;
After what you've learned so far, that wasn't too hard, was it? Enough for now ...
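The swapping and reversing examples carry over to Python almost unchanged. The sketch below uses hypothetical function names; `reverse_fields` is a swapped-in alternative (split, reverse, join) for strings with more than 9 fields, not a regular expression solution.

```python
import re

def swap_names(full_name):
    # Greedy first group: everything up to the last space is the first name part.
    return re.sub(r"^(.*) (.*)$", r"\2, \1", full_name)

def reverse_five(text):
    # Backreference version, fixed to exactly five "~"-delimited fields.
    return re.sub(r"^(.*)~(.*)~(.*)~(.*)~(.*)$", r"\5~\4~\3~\2~\1", text)

def reverse_fields(text, sep="~"):
    # Works for any number of fields.
    return sep.join(reversed(text.split(sep)))

print(swap_names("John J. Doe"))
print(reverse_five("10~20~30~40~50"))
print(reverse_fields("1~2~3~4~5~6~7~8~9~10~11~12"))
```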
Continued in Introduction to regular expressions ... last part..
C.
Fixed some typos and a flawed example ...
cd
Thank you very much C. Awaiting other parts.... keep going.
One German typo :-)
I'm replacing all characters except spaces mit anull string.
I received a functional spec from my Dutch analyst in which it is written
tnsnames voor EDWH:
PCESCRD1 = (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)
(HOST=blah.blah.blah.com)
(PORT=5227)))
(CONNECT_DATA=(SID=pcescrd1)))
db user: BW_I2_VIEWER / BW_I2_VIEWER_SCRD1
Had to look for translators.
Cheers
Sarma. -
Introduction to regular expressions ... last part.
Continued from Introduction to regular expressions ... continued., here's the third and final part of my introduction to regular expressions. As always, if you find mistakes or have examples that you think could be solved through regular expressions, please post them.
Having fun with regular expressions - Part 3
In some cases, I may have to search for different values in the same column. If the searched values are fixed, I can use the logical OR operator or the IN clause, like in this example (using my brute force data generator from part 2):
SELECT data
FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
WHERE data IN ('abc', 'xyz', '012');
There are of course some workarounds as presented in this asktom thread, but for a quick solution there's an alternative approach available. Remember the "|" pipe symbol as OR operator inside regular expressions? Take a look at this:
SELECT data
FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
WHERE REGEXP_LIKE(data, '^(abc|xyz|012)$')
;
I can even use strings composed of values like 'abc, xyz , 012' by simply using another regular expression to replace "," and spaces with the "|" pipe symbol. After reading part 1 and 2 that shouldn't be too hard, right? Here's my "thinking in regular expression": Replace every "," and 0 or more leading/trailing spaces.
Ready to try your own solution?
Does it look like this?
SELECT data
FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
WHERE REGEXP_LIKE(data, '^(' || REGEXP_REPLACE('abc, xyz , 012', ' *, *', '|') || ')$')
;
If I didn't use the "^" and "$" metacharacters, this SELECT would search for any occurrence inside the data column, which could be useful if I wanted to combine LIKE and IN clause. Take a look at this example where I'm looking for 'abc%', 'xyz%' or '012%' and adding a case insensitive match parameter to it:
SELECT data
FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
WHERE REGEXP_LIKE(data, '^(abc|xyz|012)', 'i')
;
An equivalent non-regular-expression solution would have to look like one of these, not to mention other options such as adding an extra "," and using the INSTR function:
SELECT data
FROM (SELECT data, LOWER(data) search
FROM TABLE(regex_utils.gen_data('abcxyz012', 4)))
WHERE search LIKE 'abc%'
OR search LIKE 'xyz%'
OR search LIKE '012%'
;
SELECT data
FROM (SELECT data, SUBSTR(LOWER(data), 1, 3) search
FROM TABLE(regex_utils.gen_data('abcxyz012', 4)))
WHERE search IN ('abc', 'xyz', '012')
;
I'll leave it to your imagination what a complete non-regular-expression example with 'abc, xyz , 012' as the search condition would look like.
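The step of turning a comma-separated list into an alternation can be sketched in Python. `build_in_list_pattern` is a hypothetical helper, and the sketch assumes the values contain no regex metacharacters; otherwise each value would need escaping first.

```python
import re

def build_in_list_pattern(csv_values):
    # Replace each comma plus its surrounding spaces with the "|" operator.
    alternatives = re.sub(r" *, *", "|", csv_values.strip())
    return re.compile("^(" + alternatives + ")$")

pattern = build_in_list_pattern("abc, xyz , 012")
print(pattern.pattern)
print([s for s in ["abc", "xyz0", "012", "ab"] if pattern.match(s)])
```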
As mentioned in the first part, regular expressions are not very good at formatting, except for some selected examples, such as phone numbers, which in my demonstration, have different formats. Using regular expressions, I can change them to a uniform representation:
WITH t AS (SELECT '123-4567' phone
FROM dual
UNION
SELECT '01 345678'
FROM dual
UNION
SELECT '7 87 8787'
FROM dual
)
SELECT t.phone, REGEXP_REPLACE(REGEXP_REPLACE(phone, '[^0-9]'), '(.{3})(.*)', '(\1)-\2')
FROM t
;
First, all non-digit characters are filtered out; afterwards the remaining string is put into a "(xxx)-xxxx" format, without cutting off any phone numbers that have more than 7 digits. Such a conversion could also be used to check the validity of entered data and to update the value with a uniform format afterwards.
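The two-step normalization can be reproduced in Python to see what each sample becomes. `normalize_phone` is a hypothetical name; the patterns are the ones from the query.

```python
import re

def normalize_phone(raw):
    # Step 1: keep digits only. Step 2: first three digits in parentheses.
    digits = re.sub(r"[^0-9]", "", raw)
    return re.sub(r"(.{3})(.*)", r"(\1)-\2", digits)

for phone in ["123-4567", "01 345678", "7 87 8787"]:
    print(phone, "->", normalize_phone(phone))
```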
Thinking about it, why not use regular expressions to check the format of other values? How about an IPv4 address? I'll do this step by step, using 127.0.0.1 as the final test case.
First I want to make sure, that each of the 4 parts of an IP address remains in the range between 0-255. Regular expressions are good at string matching but they don't allow any numeric comparisons. What valid strings do I have to take into consideration?
Single digit values: 0-9
Double digit values: 00-99
Triple digit values: 000-199, 200-255 (this one will be the trickiest part)
So far, I will have to use the "|" pipe operator to match all of the allowed combinations. I'll use my brute force generator to check if my solution works for a single value:
SELECT data
FROM TABLE(regex_utils.gen_data('0123456789', 3))
WHERE REGEXP_LIKE(data, '^(25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})$')
;
More than 255 records? Leading zeros are allowed, but checking all the records, there's no value above 255. First step accomplished. The second part is to make sure that there are 4 such values, delimited by a "." dot. So I have to check for 0-255 plus a dot 3 times and then check for another 0-255 value. Doesn't sound too complicated, does it?
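The brute-force check of the single-value pattern can be repeated in Python. This is a sketch using `itertools.product` in place of the generator package; the variable names are hypothetical.

```python
import re
from itertools import product

OCTET = re.compile(r"^(25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})$")

# Every digit string of length 1 to 3, leading zeros included.
candidates = ["".join(p) for n in (1, 2, 3) for p in product("0123456789", repeat=n)]
accepted = [c for c in candidates if OCTET.match(c)]

print(len(accepted), max(int(c) for c in accepted))
```

More than 255 strings are accepted because '7', '07' and '007' all count separately, yet none of them exceeds 255.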
Using first my brute force generator, I'll check if I've missed any possible combination:
SELECT data
FROM TABLE(regex_utils.gen_data('03.', 15))
WHERE REGEXP_LIKE(data,
'^((25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})$'
)
;
Looks good to me. Let's check on some sample data:
WITH t AS (SELECT '127.0.0.1' ip
FROM dual
UNION
SELECT '256.128.64.32'
FROM dual
)
SELECT t.ip
FROM t WHERE REGEXP_LIKE(t.ip,
'^((25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})$'
)
;
No surprises here. I can take this example a bit further and try to format valid addresses to a uniform representation, as shown in the phone number example. My goal is to display every IP address in the "xxx.xxx.xxx.xxx" format, using leading zeros for 2 and 1 digit values.
Regular expressions don't have any format models like for example the TO_CHAR function, so how could this be achieved? Thinking in regular expressions, I first have to find a way to make sure, that each single number is at least three digits wide. Using my example, this could look like this:
WITH t AS (SELECT '127.0.0.1' ip
FROM dual
)
SELECT t.ip, REGEXP_REPLACE(t.ip, '([0-9]+)(\.?)', '00\1\2')
FROM t
;
Look at this: leading zeros. However, that first value "00127" doesn't look too good, does it? If you thought about using a second regular expression function to remove any excess zeros, you're absolutely right. Just take the past examples and think in regular expressions. Did you come up with something like this?
WITH t AS (SELECT '127.0.0.1' ip
FROM dual
)
SELECT t.ip, REGEXP_REPLACE(REGEXP_REPLACE(t.ip, '([0-9]+)(\.?)', '00\1\2'),
'[0-9]*([0-9]{3})(\.?)', '\1\2'
)
FROM t
;
Think about the possibilities: Now you can sort a table with unformatted IP addresses, if that is a requirement in your application, or you may find other values where you can use that "trick".
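The pad-then-trim idea and its use as a sort key can be sketched in Python. `pad_ip` is a hypothetical name, and the sample addresses other than 127.0.0.1 are made up for illustration.

```python
import re

def pad_ip(ip):
    # Step 1: prefix every number with two zeros.
    padded = re.sub(r"([0-9]+)(\.?)", r"00\1\2", ip)
    # Step 2: keep only the last three digits of every number.
    return re.sub(r"[0-9]*([0-9]{3})(\.?)", r"\1\2", padded)

print(pad_ip("127.0.0.1"))
# Illustrative addresses: sorting on the padded form orders them numerically.
print(sorted(["10.0.0.2", "9.0.0.1", "127.0.0.1"], key=pad_ip))
```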
Since I'm on checking INET (internet) type of values, let's do some more, for example an e-mail address. I'll keep it simple and will only check on the
"[email protected]", "[email protected]" and "[email protected]" format, where x represents an alphanumeric character. If you want, you can look up the corresponding RFC definition and try to build your own regular expression for that one.
Now back to this one: At least one alphanumeric character followed by an "@" at sign which is followed by at least one alphanumeric character followed by a "." dot and exactly 3 more alphanumeric characters or 2 more characters followed by a "." dot and another 2 characters. This should be an easy one, right? Use some sample e-mail addresses and my brute force generator, you should be able to verify your solution.
Here's mine:
SELECT data
FROM TABLE(regex_utils.gen_data('a1@.', 9))
WHERE REGEXP_LIKE(data, '^[[:alnum:]]+@[[:alnum:]]+(\.[[:alnum:]]{3,4}|(\.[[:alnum:]]{2}){2})$', 'i');
Checking on valid domains, in my opinion, should be done in a second function, to keep the checks themselves simple, but that's probably a discussion about readability and taste.
How about checking a valid URL? I can reuse some parts of the e-mail example and only have to decide what type of URLs I want, for example "http://", "https://" and "ftp://", any subdomain and a "/" after the domain. Using the case insensitive match parameter, this shouldn't take too long, and I can use this thread's URL as a test value. But take a minute to figure that one out for yourself.
Does it look like this?
WITH t AS (SELECT 'Introduction to regular expressions ... last part. URL
FROM dual
UNION
SELECT 'http://x/'
FROM dual
SELECT t.URL
FROM t
WHERE REGEXP_LIKE(t.URL, '^(https*|ftp)://(.+\.)*[[:alnum:]]+(\.[[:alnum:]]{3,4}|(\.[[:alnum:]]{2}){2})/', 'i')
Update: Improvements in 10g2
All of you, who are using 10g2 or XE (which includes some of 10g2 features) may want to take a look at several improvements in this version. First of all, there are new, perl influenced meta characters.
Rewriting my example from the first lesson, the WHERE clause would look like this:
WHERE NOT REGEXP_LIKE(t.col1, '^\d+$')
Or my example with searching decimal numbers:
'^(\.\d+|\d+(\.\d*)?)$'
Saves some space, doesn't it? However, this will only work in 10g2 and future releases.
Some of those meta characters even include non matching lists, for example "\S" is equivalent to "[^ ]", so my example in the second part could be changed to:
SELECT NVL(LENGTH(REGEXP_REPLACE('Having fun with regular expressions', '\S')), 0)
FROM dual
;
Other meta characters support search patterns in strings with newline characters. Just take a look at the link I've included.
Another interesting meta character is "?" non-greedy. In 10g2, "?" not only means 0 or 1 occurrence, it means also the first occurrence. Let me illustrate with a simple example:
SELECT REGEXP_SUBSTR('Having fun with regular expressions', '^.* +')
FROM dual
;
This is the old style, "greedy" search pattern, returning everything until the last space.
SELECT REGEXP_SUBSTR('Having fun with regular expressions', '^.* +?')
FROM dual
;
In 10g2, you'd get only "Having " because of the non-greedy search operation. Simulating that behavior in 10g1, I'd have to change the pattern to this:
SELECT REGEXP_SUBSTR('Having fun with regular expressions', '^[^ ]+ +')
FROM dual
;
Another new option is the "x" match parameter. Its purpose is to ignore whitespaces in the searched string. This would prove useful in ignoring trailing/leading spaces for example. Checking on unsigned integers with leading/trailing spaces would look like this:
SELECT REGEXP_SUBSTR(' 123 ', '^[0-9]+$', 1, 1, 'x')
FROM dual
;
However, I have to be careful. "x" would also allow " 1 2 3 " to qualify as valid string.
I hope you enjoyed reading this introduction and hope you'll have some fun with using regular expressions.
C.
Fixed some typos ...
Message was edited by:
cd
Included 10g2 features
Message was edited by:
cd
Can I write this condition with only one reg expr in Oracle (regexp_substr in my example)?
I meant to use only regexp_substr in the select clause and without regexp_like in the where clause.
but for better understanding what I'd like to get
next example:
I have strings of two blocks separated by a space.
in the first block 5 symbols of [01], in the second block 3 symbols of [01].
In the first block one (!) may optionally appear, in the second block one (>) may optionally appear.
The idea is to find such strings with only one reg expr using regexp_substr in the select clause, so if a string does not satisfy the requirements, null should be returned in the result set.
with t as (select '10(!)010 10(>)1' num from dual union all
select '1112(!)0 111' from dual union all --incorrect because of '2'
select '(!)10010 011' from dual union all
select '10010(!) 101' from dual union all
select '10010 100(>)' from dual union all
select '13001 110' from dual union all -- incorrect because of '3'
select '100!01 100' from dual union all --incorrect because of ! without (!)
select '100(!)1(!)1 101' from dual union all -- incorrect because of two occurrences of (!)
select '1001(!)10 101' from dual union all --incorrect because of length of block1=6
select '1001(!)10 1011' from dual union all --incorrect because of length of block2=4
select '10110 1(>)11(>)0' from dual union all --incorrect because of two occurrences of (>)
select '1001(>)1 11(!)0' from dual)--incorrect because (!) and (>) are met not in their blocks
--end of test data -
Regular Expressions and String variables
Hi,
I am attempting to implement a system for searching text files for regular expression matches (similar to something like TextPad, etc.).
Looking at the regular expression API, it appears that you can only match using string variables. I just wanted to make sure this is true. Some of these files might be large and I feel uneasy about loading them into ginormous Strings. Is this the only way to do it? Can I make a String as big as I want?
Thanks,
-Mike
Newlines are only a problem if you're reading the
text line-by-line and applying the regexp to each
line. It wouldn't catch expressions that span
lines.
@sabre150: your note re: CharSequence -- so what
you're suggesting is to implement a CharSequence that
wraps the file contents, and then use the regexps on
the whole thing? I like the idea but it seems like
it would only be easy to implement if the file uses a
fixed-width character set. Or am I missing
something...?
You are correct for the most basic implementation. It is very easy to create a char sequence for fixed width character sets using RandomAccessFile. Once you go to character sets such as UTF-8 then more effort is required.
As long as the regex is moving forward through the CharSequence one char at a time there is no problem, because one can wrap a Reader; but once it backtracks, one needs random access and will need a buffer. I have used a ring buffer for this, which seems to work OK, but of course this will not allow the regex to move to an arbitrary point in the CharSequence.
'uncle_alice' is the regex king round here so listen to him.
:-( I should read further ahead next time!
Message was edited by:
sabre150
Message was edited by:
sabre150 -
Hi Experts,
After going through some documentation on regular expressions in Oracle I have tried to draw some conclusions about the same. As I wasn’t much confident on how the patterns are built, I have tried to interpret them by looking at the output. It’s basically a reverse engineering I have tried to do.
Please let me know if my interpretations are correct. Any additions /suggestions/corrections are most welcome.
Some of the examples may lack conclusions, please ignore those.
select regexp_substr('1PSN/231_3253/ABc','^([[:alnum:]]*)') from dual;
Output: 1PSN
Interpreted as:
^ From the start of the source string
([[:alnum:]]*) zero or more occurrences of alphanumeric characters
select regexp_substr('@@/231_3253/ABc','@*([[:alnum:]]+)') from dual;
Output: 231
Interpreted as:
@* Search for zero or more occurrences of @
([[:alnum:]]+) followed by one or more occurrences of alphanumeric characters
Note: In the above example oracle looks for @(zero times or more) immediately followed by alphanumeric characters.
Since a '/' comes between @ and 231 the o/p is 0 occurences of @ + one or more occurrences of alphanumerics.
select regexp_substr('1@/231_3253/ABc','@+([[:alnum:]]*)') from dual;
Output: @
Interpreted as:
@+ one or more ocurrences of @
([[:alnum:]]*) followed by 0 or more occurrences of alphanumerics
select regexp_substr('1@/231_3253/ABc','@+([[:alnum:]]+)') from dual;
Output: Null
Interpreted as:
@+ one or more occurences of @
([[:alnum:]]+) followed by one or more occurences of aplhanumerics
select regexp_substr('@1PSN/231_3253/ABc125','([[:digit:]]+)$') from dual;
Output: 125
Interpreted as:
([[:digit:]]+) one or more occurences of digits only
$ at the end of the string
select regexp_substr('@1PSN/231_3253/ABc','([^[:digit:]]+)$') from dual;
output: /ABc
Interpreted as:
([^[:digit:]]+)$ one or more occurrences of non-digit literals at the end of the string
'^' inside square brackets marks the negation of the class
Look for http:// followed by a substring of one or more alphanumeric characters and optionally, a period (.)
SELECT REGEXP_SUBSTR('Go to http://www.oracle.com/products and click on database','http://([[:alnum:]]+\.?){3,4}/?') RESULT
FROM dual;
Output: http://www.oracle.com/
Interpreted as:
[[:alnum:]]+ one or more occurences of alplanumeric characters
\.? dot optionally (backslash represents escape sequence,? represents optionally)
{3,4} 3 or 4 times
/? followed by forward slash optionally
If you have www.oracle.co.uk; {3,4} extracts it for you as well
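The {3,4} behaviour can be tried out in Python. This is a sketch under assumptions: Python has no [[:alnum:]] class, so [A-Za-z0-9] stands in for it, `first_url` is a hypothetical helper, and the second sample sentence is made up. Under Python's semantics the optional trailing slash is part of the match.

```python
import re

# [A-Za-z0-9] stands in for the POSIX class [[:alnum:]].
URL = re.compile(r"http://([A-Za-z0-9]+\.?){3,4}/?")

def first_url(text):
    match = URL.search(text)
    return match.group(0) if match else None

print(first_url("Go to http://www.oracle.com/products and click on database"))
print(first_url("Go to http://www.oracle.co.uk/products"))
```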
Validate email:
select case when
REGEXP_LIKE('[email protected]',
'^([[:alnum:]]+(\_?|\.))([[:alnum:]]*)@([[:alnum:]]+)(.([[:alnum:]]+)){1,2}$') then 'Match Found'
else 'No Match Found'
end
as output from dual;
Interpreted as:
([[:alnum:]]+(\_?|\.)) one or more occurrences of alpha numerics optionally followed by an underscore or dot
([[:alnum:]]*) followed by 0 or more occurrences of alplhanumerics
@ followed by @
([[:alnum:]]+) followed by one or more occurrences of alplhanumerics
(.([[:alnum:]]+)){1,2} followed by a dot followed by alphanumerics from once till max of twice (Ex- .com or .co.uk)
Output: Match Found
Input: [email protected]
Output: Match Found
Input: [email protected]
Output: No Match Found
Truncate the part, ending with digits
select regexp_substr('Yahoo11245@US','^.*[[:digit:]]',1) from dual;
Output: Yahoo11245
select regexp_substr('*Yahoo*11245@US','^.*[[:digit:]]',1) from dual;
Output: *Yahoo*11245
Interpreted as:
.* zero or more occurrences of any characters (dot signifies any character)
Replace 2 to 8 spaces with single space
select regexp_replace('Hello you OPs there','[[:space:]]{2,8}',' ')
from dual;
Search for control characters
select case when
regexp_like('Super' || chr(13) || 'Star' ,'[[:cntrl:]]')
then 'Match Found'
else 'No Match Found'
end
as output from dual;
Output: Match Found
Search for lower case letters only with a string length varying from a min of 3 to max of 12
select case when
regexp_like('terminator' ,'^[[:lower:]]{3,12}$')
then 'Match Found'
else 'No Match Found'
end
as output from dual;
4th character must be a special character
select case when
regexp_like('ter*minator' ,'^...[^[:alnum:]]')
then 'Match Found'
else 'No Match Found'
end
as output from dual;
Output: Match Found
Case Sensitive Search
select case when
regexp_like('Republic Of Africa' ,'of','c')
then 'Match Found'
else 'No Match Found'
end
as output from dual;
Output: No match found
c stands for case sensitive
select case when
regexp_like('Republic Of africa' ,'of','i')
then 'Match Found'
else 'No Match Found'
end
as output from dual;
Output: Match Found
i stands for case insensitive
Two consecutive occurences of characters from a to z
select regexp_substr('Republicc Of Africaa' ,'([a-z])\1', 1,1,'i') from dual;
Output: cc
Interpreted as:
([a-z]) character set a-z
\1 consecutive occurence of any character
1 starting from 1st character in the string
1 First occurence
i case insensitive
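The backreference example translates directly to Python. A minimal sketch; `first_doubled_letter` is a hypothetical name:

```python
import re

def first_doubled_letter(text):
    # ([a-z]) captures one letter; \1 requires the same letter again right after it.
    match = re.search(r"([a-z])\1", text, flags=re.IGNORECASE)
    return match.group(0) if match else None

print(first_doubled_letter("Republicc Of Africaa"))
```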
Three consecutive occurrences of characters from 7 to 9
select case when
regexp_like('Patch 10888 applied' ,'([7-9])\1\1')
then 'Match Found'
else 'No Match Found'
end
as output from dual;
Output: Match Found
Phone validator:
select case when
regexp_like('123-44-5555' ,'^[0-9]{3}-[0-9]{2}-[0-9]{4}$')
then 'Match Found'
else 'No Match Found'
end
as output from dual;
Output: Match Found
Input: 111-222-3333
Output: No match found
Interpreted as:
^ start of the string
[0-9]{3} three ocurrences of digits from 0-9
- followed by hyphen
[0-9]{2} two ocurrences of digits from 0-9
- followed by hyphen
[0-9]{4} four ocurrences of digits from 0-9
$ end of the string
************************************************************************
Source Links:
http://www.psoug.org/reference/regexp.html
http://www.oracle.com/technology/obe/obe10gdb/develop/regexp/regexp.htm
Edited by: Preta on Feb 25, 2010 4:38 PM
Corrected the example for www.oracle.com
Edited by: Preta
Incorporated Logan's comments
Hi,
It looks like you have a good understanding of how regular expressions work.
You can put comments like the ones in your message directly in the code. For example, your validate e-mail code could be re-written
select case
when REGEXP_LIKE ( '[email protected]'
, '^' || -- Starting from the beginning of the string
'(' || -- Begin \1
'[[:alnum:]]+'|| -- 1 or more alphanumerics
'(\_?|\.)' || -- optional underscore or dot
')' || -- End \1
'([[:alnum:]]*)'|| -- 0 or more alphanumerics
'@' || -- @ sign
'([[:alnum:]]+)'|| -- 1 or more alphanumerics
'(' || -- Begin \5
'\.' || -- dot
'([[:alnum:]]+)'
|| -- 1 or more alphanumerics
')' || -- End \5
'{1,2}' || -- \5 can occur 1 or 2 times
'$' -- End of string
)
then 'Match Found'
else 'No Match Found'
end as output
from dual;
I find this easier to debug and maintain.
There's no denying, it does make the code very long. You be the judge of when to do this.
You use parentheses and \ unnecessarily sometimes. That's not really an error; if you find they make the code easier to develop and maintain, use them as much as you like.
For example, about the 4th line of the regular expression as I formatted it above:
'(\_?|\.)' || -- optional underscore or dot
Underscore has no special meaning in regular expressions (only in LIKE), so you don't have to escape it.
I might write that line:
'(_|\.)?' || -- optional underscore or dot
just because I think it's clearer.
I think you forgot a \ about 7 lines later:
'\.' || -- dot
Be very careful about testing patterns that include literal dots; always make sure that a random character, like ~ , fails in a place where a dot is expected. -
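The advice about testing literal dots can be made concrete with a small Python sketch. Assumptions: [A-Za-z0-9] stands in for [[:alnum:]], the addresses are made-up test values, and the two pattern names are hypothetical. The only difference between the patterns is the backslash in front of the dot.

```python
import re

ALNUM = "[A-Za-z0-9]"  # stand-in for the POSIX class [[:alnum:]]
PREFIX = rf"^({ALNUM}+(_|\.)?)({ALNUM}*)@({ALNUM}+)"

# Unescaped dot: any character is accepted where the dot is expected.
UNESCAPED = re.compile(PREFIX + rf"(.({ALNUM}+)){{1,2}}$")
# Escaped dot: only a literal "." is accepted.
ESCAPED = re.compile(PREFIX + rf"(\.({ALNUM}+)){{1,2}}$")

for address in ["john@example.com", "john@example~com"]:
    print(address, bool(UNESCAPED.match(address)), bool(ESCAPED.match(address)))
```

A test value with ~ in place of the dot is what exposes the difference; a well-formed address passes both patterns and hides the bug.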
Using regular expressions for validating time fields
Similar to my problem with converting a big chunk of validation into smaller chunks of functions I am trying to use Regular Expressions to handle the validation of many, many time fields in a flexible working time sheet.
I have a set of FormCalc scripts to calculate the various values for days, hours and the gain/loss of hours over a four week period. For these scripts to work the time format must be in HH:MM.
Accessibility guidelines nix any use of message box pop ups so I wanted to get around this by having a hidden/visible field with warning text but can't get it to work.
So far I have:
var r = new RegExp(); // Create a new Regular Expression Object
r.compile ("^[00-99]:\\] + [00-59]");
var result = r.test(this.rawValue);
if (result == true){
true;
form1.flow.page.parent.part2.part2body.errorMessage.presence = "visible";
else (result == false){
false;
form1.flow.page.parent.part2.part2body.errorMessage.presence = "hidden";
Any help would be appreciated!
Date and time fields are tricky because you have to consider the formattedValue versus the rawValue. If I am going to use regular expressions to do validation I find it easier to make them text fields and ignore the time patterns (formattedValue). Something like this works (as far as my very brief testing goes) for 24 hour time where time format is HH:MM.
// form1.page1.subform1.time_::exit - (JavaScript, client)
var error = false;
form1.page1.subform1.errorMsg.rawValue = "";
if (!(this.isNull)) {
    var time_ = this.rawValue;
    // Require exactly five characters (HH:MM) before applying the pattern
    if (time_.length != 5) {
        error = true;
    }
    else {
        var regExp = /^([01]?[0-9]|2[0-3]):[0-5][0-9]$/;
        if (!(regExp.test(time_))) {
            error = true;
        }
    }
}
if (error == true) {
    form1.page1.subform1.errorMsg.rawValue = "The time must be in the format HH:MM where HH is 00-23 and MM is 00-59.";
    form1.page1.subform1.errorMsg.presence = "visible";
}
Steve -
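For comparison, the same HH:MM rule can be sketched in Java. This is only an illustration under stated assumptions: the class and method names are made up, and the pattern drops the optional hour digit so that the separate length check from the script above is not needed.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating 24-hour HH:MM validation.
class TimeFieldCheck {
    // Two-digit hour 00-23, a colon, then two-digit minute 00-59.
    private static final Pattern HH_MM = Pattern.compile("^([01][0-9]|2[0-3]):[0-5][0-9]$");

    // Returns true only when the whole value is a valid HH:MM time.
    static boolean isValidTime(String value) {
        return value != null && HH_MM.matcher(value).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"09:30", "23:59", "24:00", "9:30", "12:60"}) {
            System.out.println(s + " valid=" + isValidTime(s));
        }
    }
}
```

Because both hour digits are mandatory here, a value such as 9:30 is rejected by the pattern itself.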
Regular Expressions - Dictionary List
Good day all -
I am trying to create a signature(s) to provide a minimalistic "content management" scenario. We have a list of about 150 words that we need to flag if they are seen in user data. I know how to create the regex string for a single word ... and can use the | pipe to separate the words to allow me to combine multiple words into a single signature ... but just how large is the STRING field? 255? 128? unlimited?
The idea hopefully is to use only 10 - 20 signatures to cover the whole list. Certainly hope to avoid having to write a new signature for each word!
Looking for suggestions and/or experiences of anyone else having attempted to do something like this.
Maybe someone found that you could insert unlimited words in the list but by doing so they overtaxed the sensor ... or that it appeared that using more than 10 words in a list was an iffy proposition.
All your inputs will be appreciated - whether I like what I hear or not! Thanks everyone.
Hank Schupp
It all depends on how many states the regular expression will create in the engine. The maximum is 64K bytes, which is a pretty long string. You will have to experiment to find the maximum number of words you can pipe into a single signature. I would recommend dividing the 150 words into different categories and writing one signature for each category. In general, writing one signature for 20 words will make it easy to manage.
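The idea of splitting a long word list into several smaller alternations can be sketched as follows. This is an illustration using java.util.regex, not the sensor's signature syntax, which has its own regex dialect and limits; the class name, method names and sample words are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: one alternation pattern per group of words.
class WordListPatterns {
    // Builds patterns of the form \b(?:word1|word2|...)\b with at most groupSize words each.
    static List<Pattern> build(List<String> words, int groupSize) {
        if (groupSize < 1) {
            throw new IllegalArgumentException("groupSize must be at least 1");
        }
        List<Pattern> patterns = new ArrayList<>();
        for (int start = 0; start < words.size(); start += groupSize) {
            int end = Math.min(start + groupSize, words.size());
            StringBuilder alternation = new StringBuilder();
            for (String word : words.subList(start, end)) {
                if (alternation.length() > 0) {
                    alternation.append('|');
                }
                alternation.append(Pattern.quote(word)); // treat each word literally
            }
            patterns.add(Pattern.compile("\\b(?:" + alternation + ")\\b", Pattern.CASE_INSENSITIVE));
        }
        return patterns;
    }

    // Returns the 0-based index of the first group that flags the text, or -1 if none does.
    static int firstGroupFlagging(List<Pattern> patterns, String text) {
        for (int i = 0; i < patterns.size(); i++) {
            if (patterns.get(i).matcher(text).find()) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("alpha", "beta", "gamma", "delta", "epsilon");
        List<Pattern> patterns = build(words, 2);
        System.out.println(patterns.size() + " patterns built");
        System.out.println("flagged by group " + firstGroupFlagging(patterns, "The GAMMA ray"));
    }
}
```

With 150 words and a group size of 20, this would give eight patterns; whether a given engine accepts alternations of that size still has to be found by experiment, as noted above.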
-
Regular expressions its URGENT !!!
I have a long string of regular expressions separated by "|" and I need to know which regular expression a particular string matched. How can I find that, and can I do it using java.util.regex?
Thanks in advance.
Consider using capturing groups. A better solution would be to split this long regular expression with alternations into smaller ones, which can considerably reduce backtracking. That also makes it easier to find which regular expression matches the target string.
Regards. -
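Both suggestions can be sketched with java.util.regex. In this minimal example the class name, method names and the three sample sub-expressions are placeholders, not taken from the question. The first method wraps each alternative in its own capturing group and reports which group is non-null; the second keeps the expressions separate and tries them in order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the three sample sub-expressions are placeholders.
class WhichAlternative {
    // Approach 1: one combined pattern with one capturing group per alternative.
    private static final Pattern COMBINED = Pattern.compile("(\\d+)|([a-z]+)|(\\s+)");

    // Approach 2: the same expressions kept as separate patterns.
    private static final List<Pattern> SEPARATE = Arrays.asList(
            Pattern.compile("\\d+"), Pattern.compile("[a-z]+"), Pattern.compile("\\s+"));

    // Returns the 1-based number of the alternative that matched the whole input, or 0 if none.
    static int byCapturingGroup(String input) {
        Matcher m = COMBINED.matcher(input);
        if (!m.matches()) {
            return 0;
        }
        for (int g = 1; g <= m.groupCount(); g++) {
            if (m.group(g) != null) { // only the group that took part in the match is non-null
                return g;
            }
        }
        return 0;
    }

    // Returns the 1-based position of the first separate pattern matching the whole input, or 0.
    static int bySeparatePatterns(String input) {
        for (int i = 0; i < SEPARATE.size(); i++) {
            if (SEPARATE.get(i).matcher(input).matches()) {
                return i + 1;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"123", "abc", "  ", "abc123"}) {
            System.out.println("'" + s + "' -> " + byCapturingGroup(s) + " / " + bySeparatePatterns(s));
        }
    }
}
```

The group-number approach only works if the sub-expressions contain no capturing groups of their own; any inner grouping would need to be non-capturing, written as (?:...), or the numbering shifts.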
Pattern matching regular expressions
I'm attempting to determine if a string matches a pattern of containing less than 100 alphanumeric characters a-z or 0-9 case insensitive. So my regular expression string looks like:
"^[a-zA-Z0-9]{0,100}$"
And I use something like...
Pattern pattern = Pattern.compile( regexString );
I'd like to modify my regex string to include the email 'at' symbol "@", so that the at symbol will be allowed. But my understanding of regex is very limited. How do I include an "or at symbol" in my regex expression?
Thanks for your help.
* Code by sabre150
private static final Pattern emailMatcher;
static {
// Build up the regular expression according to RFC821
// http://www.ietf.org/rfc/rfc0821.txt
// <x> ::= any one of the 128 ASCII characters (no exceptions)
String x_ = "\u0000-\u007f";
// <special> ::= "<" | ">" | "(" | ")" | "[" | "]" | "\" | "."
// | "," | ";" | ":" | "@" """ | the control
// characters (ASCII codes 0 through 31 inclusive and
// 127)
String special_ = "<>()\\[\\]\\\\\\.,;:@\"\u0000-\u001f\u007f";
// <c> ::= any one of the 128 ASCII characters, but not any
// <special> or <SP>
String c_ = "[" + x_ + "&&" + "[^" + special_ + "]&&[^ ]]";
// <char> ::= <c> | "\" <x>
String char_ = "(?:" + c_ + "|\\\\[" + x_ + "])";
// <string> ::= <char> | <char> <string>
String string_ = char_ + "+";
// <dot-string> ::= <string> | <string> "." <dot-string>
String dot_string_ = string_ + "(?:\\." + string_ + ")*";
// <q> ::= any one of the 128 ASCII characters except <CR>,
// <LF>, quote ("), or backslash (\)
String q_ = "["+x_+"&&[^\r\n\"\\\\]]";
// <qtext> ::= "\" <x> | "\" <x> <qtext> | <q> | <q> <qtext>
String qtext_ = "(?:\\\\[" + x_ + "]|" + q_ + ")+";
// <quoted-string> ::= """ <qtext> """
String quoted_string_ = "\"" + qtext_ + "\"";
// <local-part> ::= <dot-string> | <quoted-string>
String local_part_ = "(?:(?:" + dot_string_ + ")|(?:" + quoted_string_ + "))";
// <a> ::= any one of the 52 alphabetic characters A through Z
// in upper case and a through z in lower case
String a_ = "[a-zA-Z]";
// <d> ::= any one of the ten digits 0 through 9
String d_ = "[0-9]";
// <let-dig> ::= <a> | <d>
String let_dig_ = "[" + a_ + d_ + "]";
// <let-dig-hyp> ::= <a> | <d> | "-"
String let_dig_hyp_ = "[-" + a_ + d_ + "]";
// <ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>
// String ldh_str_ = let_dig_hyp_ + "+";
// RFC821 looks wrong since the production "<name> ::= <a> <ldh-str> <let-dig>"
// forces a name to have at least 3 characters and country codes such as
// uk,ca etc would be illegal! I shall change this to make the
// second term of <name> optional by making a zero-length ldh-str allowable.
String ldh_str_ = let_dig_hyp_ + "*";
// <name> ::= <a> <ldh-str> <let-dig>
String name_ = "(?:" + a_ + ldh_str_ + let_dig_ + ")";
// <number> ::= <d> | <d> <number>
String number_ = d_ + "+";
// <snum> ::= one, two, or three digits representing a decimal
// integer value in the range 0 through 255
String snum_ = "(?:25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})"; // {1,2} so one-digit values also match
// <dotnum> ::= <snum> "." <snum> "." <snum> "." <snum>
String dotnum_ = snum_ + "(?:\\." + snum_ + "){3}"; // + Dotted quad
// <element> ::= <name> | "#" <number> | "[" <dotnum> "]"
String element_ = "(?:" + name_ + "|#" + number_ + "|\\[" + dotnum_ + "\\])";
// <domain> ::= <element> | <element> "." <domain>
String domain_ = element_ + "(?:\\." + element_ + ")*";
// <mailbox> ::= <local-part> "@" <domain>
String mailbox_ = local_part_ + "@" + domain_;
emailMatcher = Pattern.compile(mailbox_);
System.out.println("Email address regex = " + emailMatcher);
}
Wow. Sheesh, sabre150, that's pretty impressive. I like it for two reasons. First, it avoids some false negatives that I would have gotten using the regex I mentioned. Like, [email protected] is a valid email address which my regex pattern has rejected and yours accepts. It's unusual but it's valid. And second, I like the way you have compartmentalized each rule so that changes, if any custom changes are desired, are easier to make. Like if I want to specifically aim for a particular domain for whatever reason. And you've commented it so that it is easier to read, for someone like myself who knows almost nothing about regex.
Thanks, Good stuff!
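For the narrower question as originally asked (allow "@" alongside letters and digits), the at sign can simply be added inside the character class. The sketch below is an illustration only; the class and method names are assumptions, and this is a character whitelist, not e-mail validation as in the RFC821 code above.

```java
import java.util.regex.Pattern;

// Hypothetical helper: letters, digits and '@' only, up to 100 characters.
class AtSignAllowed {
    // '@' has no special meaning inside a character class, so it needs no escaping.
    private static final Pattern ALNUM_OR_AT = Pattern.compile("^[a-zA-Z0-9@]{0,100}$");

    static boolean isAllowed(String s) {
        return ALNUM_OR_AT.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"user@example", "user@example.com", "a b"}) {
            System.out.println(s + " allowed=" + isAllowed(s));
        }
    }
}
```

Note that {0,100} accepts up to and including 100 characters; if strictly fewer than 100 is wanted, {0,99} would be the quantifier to use. A dot is still rejected, so full addresses such as user@example.com do not pass this pattern.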