Regular Expressions X Performance
I'd like know what do they you about the use of Regular Expressions? It's more fast?
Regular expressions can be interpreted by a DFA and the computational cost of its execution is the same cost of traversing the string char by char.
To find substrings there are algorithms that can skip a lot of chars during the traversing of the string and they result much faster than the analogus ones based on regular expression. But theese algorithms can't interpret regular expressions they work only on exact strings.
So when you're looking for an exact string the use of exact string matching is much faster than the use of regular expression matching.
http://en.wikipedia.org/wiki/String_searching_algorithm
http://en.wikipedia.org/wiki/Pattern_matching
http://en.wikipedia.org/wiki/Regular_expression
Bye Alessandro
Similar Messages
-
"Match Regular Expression" and "Match Pattern" vi's behave differently
Hi,
I have a simple string matching need and by experimenting found that the "Match Regular Expression" and "Match Pattern" vi's behave somewhat differently. I'd assume that the regular expression inputs on both would behave the same. A difference I've discovered is that the "|" character (the "vertical bar" character, commonly used as an "or" operator) is recognized as such in the Match Regular Expression vi, but not in the Match Pattern vi (where it is taken literally). Furthermore, I cannot find any documentation in Help (on-line or in LabVIEW) about the "|" character usage in regular expressions. Is this documented anywhere?
For example, suppose I want to match any of the following 4 words: "The" or "quick" or "brown" or "fox". The regular expression "The|quick|brown|fox" (without the quotes) works for the Match Regular Expression vi but not the Match Pattern vi. Below is a picture of the block diagram and the front panel results:
The Help says that the Match Regular Expression vi performs somewhat slower than the Match Pattern vi, so I started with the latter. But since it doesn't work for me, I'll use the former. But does anyone have any idea of the speed difference? I'd assume it is negligible in such a simple example.
Thanks!
Solved!
Go to Solution.Yep-
You hit a point that's frustrated me a time or two as well (and incidentally, caused some hair-pulling that I can ill afford)
The hint is in the help file:
for Match regular expression "The Match Regular Expression function gives you more options for matching
strings but performs more slowly than the Match Pattern function....Use regular
expressions in this function to refine searches....
Characters to Find
Regular Expression
VOLTS
VOLTS
A plus sign or a minus sign
[+-]
A sequence of one or more digits
[0-9]+
Zero or more spaces
\s* or * (that is, a space followed by an asterisk)
One or more spaces, tabs, new lines, or carriage returns
[\t \r \n \s]+
One or more characters other than digits
[^0-9]+
The word Level only if it
appears at the beginning of the string
^Level
The word Volts only if it
appears at the end of the string
Volts$
The longest string within parentheses
The first string within parentheses but not containing any
parentheses within it
\([^()]*\)
A left bracket
A right bracket
cat, cag, cot, cog, dat, dag, dot, and dag
[cd][ao][tg]
cat or dog
cat|dog
dog, cat
dog, cat cat dog,cat
cat cat dog, and so on
((cat )*dog)
One or more of the letter a
followed by a space and the same number of the letter a, that is, a a, aa aa, aaa aaa, and so
on
(a+) \1
For Match Pattern "This function is similar to the Search and Replace
Pattern VI. The Match Pattern function gives you fewer options for matching
strings but performs more quickly than the Match Regular Expression
function. For example, the Match Pattern function does not support the
parenthesis or vertical bar (|) characters.
Characters to Find
Regular Expression
VOLTS
VOLTS
All uppercase and lowercase versions of volts, that is, VOLTS, Volts, volts, and so on
[Vv][Oo][Ll][Tt][Ss]
A space, a plus sign, or a minus sign
[+-]
A sequence of one or more digits
[0-9]+
Zero or more spaces
\s* or * (that is, a space followed by an asterisk)
One or more spaces, tabs, new lines, or carriage returns
[\t \r \n \s]+
One or more characters other than digits
[~0-9]+
The word Level only if it begins
at the offset position in the string
^Level
The word Volts only if it
appears at the end of the string
Volts$
The longest string within parentheses
The longest string within parentheses but not containing any
parentheses within it
([~()]*)
A left bracket
A right bracket
cat, dog, cot, dot, cog, and so on.
[cd][ao][tg]
Frustrating- but still managable.
Jeff -
Regular expression vs oracle text performance
Does anyone have experience with comparig performance of regular expression vs oracle text?
We need to implement a text search on a large volume table, 100K-500K rows.
The select stmt will select from a VL, a view joining 2 tables, B and _TL.
We need to search 2 text columns from this _VL view.
Using regex seems less complex, but the deciding factor is of course performace.
Would oracle text search perform better than regular expression in general?
Thanks,
MargaretHi Dominc,
Thanks, we'll try both...
Would you be able to validate our code to create the multi-table index:
CREATE OR REPLACE PACKAGE requirements_util AS
PROCEDURE concat_columns(i_rowid IN ROWID, io_text IN OUT NOCOPY VARCHAR2);
END requirements_util;
CREATE OR REPLACE PACKAGE BODY requirements_util AS
PROCEDURE concat_columns(i_rowid IN ROWID, io_text IN OUT NOCOPY VARCHAR2)
AS
tl_req pjt_requirements_tl%ROWTYPE;
b_req pjt_requirements_b%ROWTYPE;
CURSOR cur_req_name (i_rqmt_id IN pjt_requirements_tl.rqmt_id%TYPE) IS
SELECT rqmt_name FROM pjt_requirements_tl
WHERE rqmt_id = i_rqmt_id;
PROCEDURE add_piece(i_add_str IN VARCHAR2) IS
lx_too_big EXCEPTION;
PRAGMA EXCEPTION_INIT(lx_too_big, -6502);
BEGIN
io_text := io_text||' '||i_add_str;
EXCEPTION WHEN lx_too_big THEN NULL; -- silently don't add the string.
END add_piece;
BEGIN
BEGIN
SELECT * INTO b_req FROM pjt_requirements_b WHERE ROWID = i_rowid;
EXCEPTION
WHEN NO DATA_FOUND THEN
RETURN;
END;
add_piece(b_req.req_code);
FOR tl_req IN cur_req_name(b_req.rqmt_id) LOOP
add_piece(tl_req.rqmt_name);
END concat_columns;
END requirements_util;
EXEC ctx_ddl.drop_section_group('rqmt_sectioner');
EXEC ctx_ddl.drop_preference('rqmt_user_ds');
BEGIN
ctx_ddl.create_preference('rqmt_user_ds', 'USER_DATASTORE');
ctx_ddl.set_attribute('rqmt_user_ds', 'procedure', sys_context('userenv','current_schema')||'.'||'requirements_util.concat_columns');
ctx_ddl.set_attribute('rqmt_user_ds', 'output_type', 'VARCHAR2');
END;
CREATE INDEX rqmt_cidx ON pjt_requirements_b(req_code)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS ('DATASTORE rqmt_user_ds
SYNC (ON COMMIT)'); -
Request some help, over procedure's performance uses regular expressions for its functinality
Hi All,
Below is the procedure, having functionalities of populating two tables. For first table, its a simple insertion process but for second table, we need to break the soruce record as per business requirement and then insert into the table. [Have used regular expressions for that]
Procedure works fine but it takes around 23 mins for processing 1mm of rows.
Since this procedure would be used, parallely by different ETL processes, so append hint is not recommended.
Is there any ways to improve its performance, or any suggestion if my approach is not optimized? Thanks for all help in advance.
CREATE OR REPLACE PROCEDURE SONARDBO.PRC_PROCESS_EXCEPTIONS_LOGS_TT
P_PROCESS_ID IN NUMBER,
P_FEED_ID IN NUMBER,
P_TABLE_NAME IN VARCHAR2,
P_FEED_RECORD IN VARCHAR2,
P_EXCEPTION_RECORD IN VARCHAR2
IS
PRAGMA AUTONOMOUS_TRANSACTION;
V_EXCEPTION_LOG_ID EXCEPTION_LOG.EXCEPTION_LOG_ID%TYPE;
BEGIN
V_EXCEPTION_LOG_ID :=EXCEPTION_LOG_SEQ.NEXTVAL;
INSERT INTO SONARDBO.EXCEPTION_LOG
EXCEPTION_LOG_ID, PROCESS_DATE, PROCESS_ID,EXCEPTION_CODE,FEED_ID,SP_NAME
,ATTRIBUTE_NAME,TABLE_NAME,EXCEPTION_RECORD
,DATA_STRUCTURE
,CREATED_BY,CREATED_TS
VALUES
( V_EXCEPTION_LOG_ID
,TRUNC(SYSDATE)
,P_PROCESS_ID
,'N/A'
,P_FEED_ID
,NULL
,NULL
,P_TABLE_NAME
,P_FEED_RECORD
,NULL
,USER
,SYSDATE
INSERT INTO EXCEPTION_ATTR_LOG
EXCEPTION_ATTR_ID,EXCEPTION_LOG_ID,EXCEPTION_CODE,ATTRIBUTE_NAME,SP_NAME,TABLE_NAME,CREATED_BY,CREATED_TS,ATTRIBUTE_VALUE
SELECT
EXCEPTION_ATTR_LOG_SEQ.NEXTVAL EXCEPTION_ATTR_ID
,V_EXCEPTION_LOG_ID EXCEPTION_LOG_ID
,REGEXP_SUBSTR(str,'[^|]*',1,1) EXCEPTION_CODE
,REGEXP_SUBSTR(str,'[^|]+',1,2) ATTRIBUTE_NAME
,'N/A' SP_NAME
,p_table_name
,USER
,SYSDATE
,REGEXP_SUBSTR(str,'[^|]+',1,3) ATTRIBUTE_VALUE
FROM
SELECT
REGEXP_SUBSTR(P_EXCEPTION_RECORD, '([^^])+', 1,t2.COLUMN_VALUE) str
FROM
DUAL t1 CROSS JOIN
TABLE
CAST
MULTISET
SELECT LEVEL
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT(P_EXCEPTION_RECORD, '([^^])+')
AS SYS.odciNumberList
) t2
WHERE REGEXP_SUBSTR(str,'[^|]*',1,1) IS NOT NULL
COMMIT;
EXCEPTION
WHEN OTHERS THEN
ROLLBACK;
RAISE;
END;
Many Thanks,
ArpitRegex's are known to be CPU intensive specially when dealing with large number of rows.
If you have to reduce the processing time, you need to tune the Select statements.
One suggested change could be to change the following query
SELECT
REGEXP_SUBSTR(P_EXCEPTION_RECORD, '([^^])+', 1,t2.COLUMN_VALUE) str
FROM
DUAL t1 CROSS JOIN
TABLE
CAST
MULTISET
SELECT LEVEL
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT(P_EXCEPTION_RECORD, '([^^])+')
AS SYS.odciNumberList
) t2
to
SELECT REGEXP_SUBSTR(P_EXCEPTION_RECORD, '([^^])+', 1,level) str
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT(P_EXCEPTION_RECORD, '([^^])+')
Before looking for any performance benefit, you need to ensure that this does not change your output.
How many substrings are you expecting in the P_EXCEPTION_RECORD? If less than 5, it will be better to opt for SUBSTR and INSTR combination as it might work well with the number of records you are working with. Only trouble is, you will have to write different SUBSTR and INSTR statements for each column to be fetched.
How are you calling this procedure? Is it not possible to work with Collections? Delimited strings are not a very good option as it requires splitting of the data every time you need to refer to. -
Help in regular expression matching
I have three expressions like
1) [(y2009)(y2011)]
2) [(y2008M5)(y2011M3)] or [(y2009M5)(y2010M12)]
3) [(y2009M1d20)(y2011M12d31)]
i want regular expression pattern for the above three expressions
I am using :
REGEXP_LIKE(timedomainexpression, '???[:digit:]{4}*[:digit:]{1,2}???[:digit:]{4}*[:digit:]{1,2}??', 'i');
but its giving results for all above expressions while i want different expression for each.
i hav used * after [:digit:]{4}, when i am using ? or . then its giving no results. Please help in this situation ASAP.
ThanksI dont get your question Can you post your desired output? and also give some sample data.
Please consider the following when you post a question.
1. New features keep coming in every oracle version so please provide Your Oracle DB Version to get the best possible answer.
You can use the following query and do a copy past of the output.
select * from v$version 2. This forum has a very good Search Feature. Please use that before posting your question. Because for most of the questions
that are asked the answer is already there.
3. We dont know your DB structure or How your Data is. So you need to let us know. The best way would be to give some sample data like this.
I have the following table called sales
with sales
as
select 1 sales_id, 1 prod_id, 1001 inv_num, 120 qty from dual
union all
select 2 sales_id, 1 prod_id, 1002 inv_num, 25 qty from dual
select *
from sales 4. Rather than telling what you want in words its more easier when you give your expected output.
For example in the above sales table, I want to know the total quantity and number of invoice for each product.
The output should look like this
Prod_id sum_qty count_inv
1 145 2 5. When ever you get an error message post the entire error message. With the Error Number, The message and the Line number.
6. Next thing is a very important thing to remember. Please post only well formatted code. Unformatted code is very hard to read.
Your code format gets lost when you post it in the Oracle Forum. So in order to preserve it you need to
use the {noformat}{noformat} tags.
The usage of the tag is like this.
<place your code here>\
7. If you are posting a *Performance Related Question*. Please read
{thread:id=501834} and {thread:id=863295}.
Following those guide will be very helpful.
8. Please keep in mind that this is a public forum. Here No question is URGENT.
So use of words like *URGENT* or *ASAP* (As Soon As Possible) are considered to be rude. -
Namburi,
When you said you used the Reg Exp tool, did you use it only as
preconfigured by the iMT migrate application wizard?
Because the default configuration of the regular expression tool will only
target the files in your ND project directories. If you wish to target
classes outside of the normal directory scope, you have to either modify the
"Source Directory" property OR create another instance of the regular
expression tool. See the "Tool" menu in the iMT to create additional tool
instances which can each be configured to target different sets of files
using different sets of rules.
Usually, I utilize 3 different sets of rules files on a given migration:
spider2jato.xml
these are the generic conversion rules (but includes the optimized rules for
ViewBean and Model based code, i.e. these rules do not utilize the
RequestManager since it is not needed for code running inside the ViewBean
or Model classes)
I run these rules against all files.
See the file download section of this forum for periodic updates to these
rules.
nonProjectFileRules.xml
these include rules that add the necessary
RequestManager.getRequestContext(). etc prefixes to many of the common
calls.
I run these rules against user module and any other classes that do not are
not ModuleServlet, ContainerView, or Model classes.
appXRules.xml
these rules include application specific changes that I discover while
working on the project. A common thing here is changing import statements
(since the migration tool moves ND project code into different jato
packaging structure, you sometime need to adjust imports in non-project
classes that previously imported ND project specific packages)
So you see, you are not limited to one set of rules at all. Just be careful
to keep track of your backups (the regexp tool provides several options in
its Expert Properties related to back up strategies).
----- Original Message -----
From: <vnamboori@y...>
Sent: Wednesday, August 08, 2001 6:08 AM
Subject: [iPlanet-JATO] Re: Use Of models in utility classes - Pease don't
forget about the regular expression potential
Thanks Matt, Mike, Todd
This is a great input for our migration. Though we used the existing
Regular Expression Mapping tool, we did not change this to meet our
own needs as mentioned by Mike.
We would certainly incorporate this to ease our migration.
Namburi
--- In iPlanet-JATO@y..., "Todd Fast" <toddwork@c...> wrote:
All--
Great response. By the way, the Regular Expression Tool uses thePerl5 RE
syntax as implemented by Apache OROMatcher. If you're doing lotsof these
sorts of migration changes manually, you should definitely buy theO'Reilly
book "Mastering Regular Expressions" and generate some rules toautomate the
conversion. Although they are definitely confusing at first,regular
expressions are fairly easy to understand with some documentation,and are
superbly effective at tackling this kind of migration task.
Todd
----- Original Message -----
From: "Mike Frisino" <Michael.Frisino@S...>
Sent: Tuesday, August 07, 2001 5:20 PM
Subject: Re: [iPlanet-JATO] Use Of models in utility classes -Pease don't
forget about the regular expression potential
Also, (and Matt's document may mention this)
Please bear in mind that this statement is not totally correct:
Since the migration tool does not do much of conversion for
these
utilities we have to do manually.Remember, the iMT is a SUITE of tools. There is the extractiontool, and
the translation tool, and the regular expression tool, and severalother
smaller tools (like the jar and compilation tools). It is correctto state
that the extraction and translation tools only significantlyconvert the
primary ND project objects (the pages, the data objects, and theproject
classes). The extraction and translation tools do minimumtranslation of the
User Module objects (i.e. they repackage the user module classes inthe new
jato module packages). It is correct that for all other utilityclasses
which are not formally part of the ND project, the extraction and
translation tools do not perform any migration.
However, the regular expression tool can "migrate" any arbitrary
file
(utility classes etc) to the degree that the regular expressionrules
correlate to the code present in the arbitrary file. So first andforemost,
if you have alot of spider code in your non-project classes youshould
consider using the regular expression tool and if warranted adding
additional rules to reduce the amount of manual adjustments thatneed to be
made. I can stress this enough. We can even help you write theregular
expression rules if you simply identify the code pattern you wish to
convert. Just because there is not already a regular expressionrule to
match your need does not mean it can't be written. We have notnearly
exhausted the possibilities.
For example if you say, we need to convert
CSpider.getDataObject("X");
To
RequestManager.getRequestContext().getModelManager().getModel(XModel.class);
Maybe we or somebody else in the list can help write that regularexpression if it has not already been written. For instance in thelast
updated spider2jato.xml file there is already aCSpider.getCommonPage("X")
rule:
<!--getPage to getViewBean-->
<mapping-rule>
<mapping-rule-primarymatch>
<![CDATA[CSpider[.\s]*getPage[\s]*\(\"([^"]*)\"]]>
</mapping-rule-primarymatch>
<mapping-rule-replacement>
<mapping-rule-match>
<![CDATA[CSpider[.\s]*getPage[\s]*\(\"([^"]*)\"]]>
</mapping-rule-match>
<mapping-rule-substitute>
<![CDATA[getViewBean($1ViewBean.class]]>
</mapping-rule-substitute>
</mapping-rule-replacement>
</mapping-rule>
Following this example a getDataObject to getModel would look
like this:
<mapping-rule>
<mapping-rule-primarymatch>
<![CDATA[CSpider[.\s]*getDataObject[\s]*\(\"([^"]*)\"]]>
</mapping-rule-primarymatch>
<mapping-rule-replacement>
<mapping-rule-match>
<![CDATA[CSpider[.\s]*getDataObject[\s]*\(\"([^"]*)\"]]>
</mapping-rule-match>
<mapping-rule-substitute>
<![CDATA[getModel($1Model.class]]>
</mapping-rule-substitute>
</mapping-rule-replacement>
</mapping-rule>
In fact, one migration developer already wrote that rule andsubmitted it
for inclusion in the basic set. I will post another upgrade to thebasic
regular expression rule set, look for a "file uploaded" posting.Also,
please consider contributing any additional generic rules that youhave
written for inclusion in the basic set.
Please not, that in some cases (Utility classes in particular)
the rule
application may be more effective as TWO sequention rules ratherthan one
monolithic rule. Again using the example above, it will convert
CSpider.getDataObject("Foo");
To
getModel(FooModel.class);
Now that is the most effective conversion for that code if that
code is in
a page or data object class file. But if that code is in a Utilityclass you
really want:
>
RequestManager.getRequestContext().getModelManager().getModel(FooModel.class
So to go from
getModel(FooModel.class);
To
RequestManager.getRequestContext().getModelManager().getModel(FooModel.class
You would apply a second rule AND you would ONLY run this rule
against
your utility classes so that you would not otherwise affect yourViewBean
and Model classes which are completely fine with the simplegetModel call.
<mapping-rule>
<mapping-rule-primarymatch>
<![CDATA[getModel\(]]>
</mapping-rule-primarymatch>
<mapping-rule-replacement>
<mapping-rule-match>
<![CDATA[getModel\(]]>
</mapping-rule-match>
<mapping-rule-substitute>
<![CDATA[RequestManager.getRequestContext().getModelManager().getModel(]]>
</mapping-rule-substitute>
</mapping-rule-replacement>
</mapping-rule>
A similer rule can be applied to getSession and other CSpider APIcalls.
For instance here is the rule for converting getSession calls toleverage
the RequestManager.
<mapping-rule>
<mapping-rule-primarymatch>
<![CDATA[getSession\(\)\.]]>
</mapping-rule-primarymatch>
<mapping-rule-replacement>
<mapping-rule-match>
<![CDATA[getSession\(\)\.]]>
</mapping-rule-match>
<mapping-rule-substitute>
<![CDATA[RequestManager.getSession().]]>
</mapping-rule-substitute>
</mapping-rule-replacement>
</mapping-rule>
----- Original Message -----
From: "Matthew Stevens" <matthew.stevens@e...>
Sent: Tuesday, August 07, 2001 12:56 PM
Subject: RE: [iPlanet-JATO] Use Of models in utility classes
Namburi,
I will post a document to the group site this evening which has
the
details
on various tactics of migrating these type of utilities.
Essentially,
you
either need to convert these utilities to Models themselves or
keep the
utilities as is and simply use the
RequestManager.getRequestContext.getModelManager().getModel()
to statically access Models.
For CSpSelect.executeImmediate() I have an example of customhelper
method
as a replacement whicch uses JDBC results instead of
CSpDBResult.
matt
-----Original Message-----
From: vnamboori@y... [mailto:<a href="/group/SunONE-JATO/post?protectID=081071113213093190112061186248100208071048">vnamboori@y...</a>]
Sent: Tuesday, August 07, 2001 3:24 PM
Subject: [iPlanet-JATO] Use Of models in utility classes
Hi All,
In the present ND project we have lots of utility classes.
These
classes in diffrent directory. Not part of nd pages.
In these classes we access the dataobjects and do themanipulations.
So we access dataobjects directly like
CSpider.getDataObject("do....");
and then execute it.
Since the migration tool does not do much of conversion forthese
utilities we have to do manually.
My question is Can we access the the models in the postmigration
sameway or do we need requestContext?
We have lots of utility classes which are DataObjectintensive. Can
someone suggest a better way to migrate this kind of code.
Thanks
Namburi
[email protected]
[email protected]
[Non-text portions of this message have been removed]
[email protected]
[email protected]Namburi,
When you said you used the Reg Exp tool, did you use it only as
preconfigured by the iMT migrate application wizard?
Because the default configuration of the regular expression tool will only
target the files in your ND project directories. If you wish to target
classes outside of the normal directory scope, you have to either modify the
"Source Directory" property OR create another instance of the regular
expression tool. See the "Tool" menu in the iMT to create additional tool
instances which can each be configured to target different sets of files
using different sets of rules.
Usually, I utilize 3 different sets of rules files on a given migration:
spider2jato.xml
these are the generic conversion rules (but includes the optimized rules for
ViewBean and Model based code, i.e. these rules do not utilize the
RequestManager since it is not needed for code running inside the ViewBean
or Model classes)
I run these rules against all files.
See the file download section of this forum for periodic updates to these
rules.
nonProjectFileRules.xml
these include rules that add the necessary
RequestManager.getRequestContext(). etc prefixes to many of the common
calls.
I run these rules against user module and any other classes that do not are
not ModuleServlet, ContainerView, or Model classes.
appXRules.xml
these rules include application specific changes that I discover while
working on the project. A common thing here is changing import statements
(since the migration tool moves ND project code into different jato
packaging structure, you sometime need to adjust imports in non-project
classes that previously imported ND project specific packages)
So you see, you are not limited to one set of rules at all. Just be careful
to keep track of your backups (the regexp tool provides several options in
its Expert Properties related to back up strategies).
----- Original Message -----
From: <vnamboori@y...>
Sent: Wednesday, August 08, 2001 6:08 AM
Subject: [iPlanet-JATO] Re: Use Of models in utility classes - Pease don't
forget about the regular expression potential
Thanks Matt, Mike, Todd
This is a great input for our migration. Though we used the existing
Regular Expression Mapping tool, we did not change this to meet our
own needs as mentioned by Mike.
We would certainly incorporate this to ease our migration.
Namburi
--- In iPlanet-JATO@y..., "Todd Fast" <toddwork@c...> wrote:
All--
Great response. By the way, the Regular Expression Tool uses thePerl5 RE
syntax as implemented by Apache OROMatcher. If you're doing lotsof these
sorts of migration changes manually, you should definitely buy theO'Reilly
book "Mastering Regular Expressions" and generate some rules toautomate the
conversion. Although they are definitely confusing at first,regular
expressions are fairly easy to understand with some documentation,and are
superbly effective at tackling this kind of migration task.
Todd
----- Original Message -----
From: "Mike Frisino" <Michael.Frisino@S...>
Sent: Tuesday, August 07, 2001 5:20 PM
Subject: Re: [iPlanet-JATO] Use Of models in utility classes -Pease don't
forget about the regular expression potential
Also, (and Matt's document may mention this)
Please bear in mind that this statement is not totally correct:
Since the migration tool does not do much of conversion for
these
utilities we have to do manually.Remember, the iMT is a SUITE of tools. There is the extractiontool, and
the translation tool, and the regular expression tool, and severalother
smaller tools (like the jar and compilation tools). It is correctto state
that the extraction and translation tools only significantlyconvert the
primary ND project objects (the pages, the data objects, and theproject
classes). The extraction and translation tools do minimumtranslation of the
User Module objects (i.e. they repackage the user module classes inthe new
jato module packages). It is correct that for all other utilityclasses
which are not formally part of the ND project, the extraction and
translation tools do not perform any migration.
However, the regular expression tool can "migrate" any arbitrary
file
(utility classes etc) to the degree that the regular expressionrules
correlate to the code present in the arbitrary file. So first andforemost,
if you have alot of spider code in your non-project classes youshould
consider using the regular expression tool and if warranted adding
additional rules to reduce the amount of manual adjustments thatneed to be
made. I can stress this enough. We can even help you write theregular
expression rules if you simply identify the code pattern you wish to
convert. Just because there is not already a regular expressionrule to
match your need does not mean it can't be written. We have notnearly
exhausted the possibilities.
For example if you say, we need to convert
CSpider.getDataObject("X");
To
RequestManager.getRequestContext().getModelManager().getModel(XModel.class);
Maybe we or somebody else in the list can help write that regularexpression if it has not already been written. For instance in thelast
updated spider2jato.xml file there is already aCSpider.getCommonPage("X")
rule:
<!--getPage to getViewBean-->
<mapping-rule>
<mapping-rule-primarymatch>
<![CDATA[CSpider[.\s]*getPage[\s]*\(\"([^"]*)\"]]>
</mapping-rule-primarymatch>
<mapping-rule-replacement>
<mapping-rule-match>
<![CDATA[CSpider[.\s]*getPage[\s]*\(\"([^"]*)\"]]>
</mapping-rule-match>
<mapping-rule-substitute>
<![CDATA[getViewBean($1ViewBean.class]]>
</mapping-rule-substitute>
</mapping-rule-replacement>
</mapping-rule>
Following this example a getDataObject to getModel would look
like this:
<mapping-rule>
<mapping-rule-primarymatch>
<![CDATA[CSpider[.\s]*getDataObject[\s]*\(\"([^"]*)\"]]>
</mapping-rule-primarymatch>
<mapping-rule-replacement>
<mapping-rule-match>
<![CDATA[CSpider[.\s]*getDataObject[\s]*\(\"([^"]*)\"]]>
</mapping-rule-match>
<mapping-rule-substitute>
<![CDATA[getModel($1Model.class]]>
</mapping-rule-substitute>
</mapping-rule-replacement>
</mapping-rule>
In fact, one migration developer already wrote that rule andsubmitted it
for inclusion in the basic set. I will post another upgrade to thebasic
regular expression rule set, look for a "file uploaded" posting.Also,
please consider contributing any additional generic rules that youhave
written for inclusion in the basic set.
Please not, that in some cases (Utility classes in particular)
the rule
application may be more effective as TWO sequention rules ratherthan one
monolithic rule. Again using the example above, it will convert
CSpider.getDataObject("Foo");
To
getModel(FooModel.class);
Now that is the most effective conversion for that code if that
code is in
a page or data object class file. But if that code is in a Utilityclass you
really want:
>
RequestManager.getRequestContext().getModelManager().getModel(FooModel.class
So to go from
getModel(FooModel.class);
To
RequestManager.getRequestContext().getModelManager().getModel(FooModel.class
You would apply a second rule AND you would ONLY run this rule
against
your utility classes so that you would not otherwise affect yourViewBean
and Model classes which are completely fine with the simplegetModel call.
<mapping-rule>
<mapping-rule-primarymatch>
<![CDATA[getModel\(]]>
</mapping-rule-primarymatch>
<mapping-rule-replacement>
<mapping-rule-match>
<![CDATA[getModel\(]]>
</mapping-rule-match>
<mapping-rule-substitute>
<![CDATA[RequestManager.getRequestContext().getModelManager().getModel(]]>
</mapping-rule-substitute>
</mapping-rule-replacement>
</mapping-rule>
A similer rule can be applied to getSession and other CSpider APIcalls.
For instance here is the rule for converting getSession calls toleverage
the RequestManager.
<mapping-rule>
<mapping-rule-primarymatch>
<![CDATA[getSession\(\)\.]]>
</mapping-rule-primarymatch>
<mapping-rule-replacement>
<mapping-rule-match>
<![CDATA[getSession\(\)\.]]>
</mapping-rule-match>
<mapping-rule-substitute>
<![CDATA[RequestManager.getSession().]]>
</mapping-rule-substitute>
</mapping-rule-replacement>
</mapping-rule>
----- Original Message -----
From: "Matthew Stevens" <matthew.stevens@e...>
Sent: Tuesday, August 07, 2001 12:56 PM
Subject: RE: [iPlanet-JATO] Use Of models in utility classes
Namburi,
I will post a document to the group site this evening which has
the
details
on various tactics of migrating these type of utilities.
Essentially,
you
either need to convert these utilities to Models themselves or
keep the
utilities as is and simply use the
RequestManager.getRequestContext.getModelManager().getModel()
to statically access Models.
For CSpSelect.executeImmediate() I have an example of customhelper
method
as a replacement whicch uses JDBC results instead of
CSpDBResult.
matt
-----Original Message-----
From: vnamboori@y... [mailto:<a href="/group/SunONE-JATO/post?protectID=081071113213093190112061186248100208071048">vnamboori@y...</a>]
Sent: Tuesday, August 07, 2001 3:24 PM
Subject: [iPlanet-JATO] Use Of models in utility classes
Hi All,
In the present ND project we have lots of utility classes.
These
classes in diffrent directory. Not part of nd pages.
In these classes we access the dataobjects and do themanipulations.
So we access dataobjects directly like
CSpider.getDataObject("do....");
and then execute it.
Since the migration tool does not do much of conversion forthese
utilities we have to do manually.
My question is Can we access the the models in the postmigration
sameway or do we need requestContext?
We have lots of utility classes which are DataObjectintensive. Can
someone suggest a better way to migrate this kind of code.
Thanks
Namburi
[email protected]
[email protected]
[Non-text portions of this message have been removed]
[email protected]
[email protected] -
Regular Expressions and Full-text Requests.
Hi,
i have just read that Berkeley DB XML doesnt support regexp in XQuery (what a pity),
do you know how to look-alike regular expressions in Query?
for example, i'd like to perform a full-text request, all tags that contains text like "Be.+ley" (it would return tags that contains text like "Berkeley" or "Beverley"), how can i do that?
Thx.Hi,
A performant way to do something like you want is with a query that mixes use of contains() (index optimized) and matches() (not index optimized). Something like this:
collection()//tag[contains(., "Be") and contains(., "ley") and matches(., "Be.+ley")]John -
Regular expressions and input streams
Hello,
I am trying to find a way to read characters from a stream and find matches in them with regular expressions. The problem is that the stream may contain a big amount of characters, so I can't read all of them, keep them in a big string, and then perform the searches. Is there a way to perform searches while I read the characters? I scanned the documentation of Pattern and Matcher and found nothing that can help.
Any ideas?
Thanks.Looking at the method
public Matcher Pattern.matcher(CharSequence input);
one could be excused for assuming that one could turn an InputStream into a CharSequence object since a 'stream' implies sequencial access and CharSequence sounds like it provides sequential access.
This would just require the implementation of 4 method of which two (toString() and subSequence()) could probably be ignored BUT the method you wouild have to implement are charAt() and length() which imply random access rather than sequencial access!
Using the built in regex your problem looks difficult. If you are just looking for 'equality' matches (simple searching) then you could use Knuth-Morris-Pratt algorithm or the Boyer-Moore algorithm. -
Regular expressions in ActionScript??
I have been looking at the Adobe publication Programming Action Script (pdf) and it
specifies ECMA-262 3rd edition specification. But the specification don't seem to
state exactly what type of regular expression engine and version is used.
Is it POSIX, or PERL compatible regular expressions (or both)?
I have read and used the classic O'Reilly text Mastering Regular Expressions
and coded regular expressions in javascript/php/etc (anywhere regular expressions
could be use, Apache configuration file, other server config files, etc etc etc)
There is a difference in the type of engine used, where as performance is
concerned, as well as the range of syntax valid in a particular implementation.
Thank You
JKhttp://www.regular-expressions.info/javascript.html
-
Java Built-in regular expressions versus Jakarta
Are there any advantages to using the Jakarta regular expression package, over the built-in Java regular expressions?
I wasn't able to find much information on the Jakarta regexp package, except for the Javadoc and I didn't find that very informative.
Thanks!
JeffWell, the String.replaceAll(String, String) method uses the Java regexp internally, and I'm not sure there's a way to change that. So that at least is one place that will use it. I don't know for sure, but Jakarta's ORO is supposed to be fully compatible with Perl 5, and also supports other regexp types as well, so if you need that aspect of it, then go with Jakarta. I heard some of Java's are a little limited in some capabilities. I can't find any particular pages that refer to any comparisons of them at to compare runtime performance.
-
Regular Expressions and Numbers in Strings
Looking for someone who has had experience in using Regular Expressions in the Classification Rule Builder.
We have an eVar that is collecting the number of search results in this fashion:
<Total Results>_<# of Item 1>_<# of Item 2>_<# of Item 3>_<# of Item 4>
Example output would look like this:
150_50_0_25_75
What we've done is initially create a Regular Expression that looks like this:
^(.+)\_(.+)\_(.+)\_(.+)\_(.+)$
The problem is, it appears in situations where the output contains a zero in one of the slots, the value is ignored and it receives the value in the next place over. Using the example output shown above, I would end up with values like this:
$0 150_50_0_25_75
$1 150
$2 50
$3 25
$4 75
$5 {null}
Here's the weird part. When I perform a test of a single record, it appears like it will work just fine, but when it actually runs in Omniture, it's not working as expected. Here's something else I'd like to know if it's possible to address. The five-place string is only the newest iteration of this approach. In the past, we started out with a two-place version, then three-place and then four. Any recommendations for handling all scenarios?
Any and all advice is welcome. Thanks in advance!Doing some playing around on rubular.com and thinking the Regular Expression should be build this way instead:
^(\d+)\_(\d+)\_(\d+)\_(\d+)\_(\d+)$
Again, still looking for any additional guidance from more experienced individuals. Thanks! -
Regular expression and pattern matching/replacing
I have a list of key words. It has around 1000 key word now but can grow to 5000 keywords.
My web application displays lot of texts which are stored in the database. My requirement is to scan each text for the occurance of any of the above keywords. If any keyword is present I have to replace that with some custom values, before showing it to the user.
I was thinking of using using regular expression for replacing the keyword in the text using matcher.replaceAll method as follows:
Pattern pattern = Pattern.compile(patternStr);
Matcher matcher = pattern.matcher(inputStr);
String output = matcher.replaceAll(replacementStr);
But My pattern string will have around 5000 keywords with the 'OR' Logical Operator like- keyword1| keyword2 I keyword3 | ..........
Will such a big pattern string adversly affect the performance? What can I do to speed up the performance? (Since my keyword list is not static i would prefer to do the replacement just before showing the text to the user)
Any suggestions are most welcome.I don't think a pure regex approach would be that slow, but it would be a maintenance nightmare. I think a combined regex/table-lookup approach would be best: use a regex to identify potential keywords, then look them up in the table to confirm. For instance, to find all Java keywords you could use the regex "\\b[a-z]{2,12}+\\b" to filter out anything that can't possibility be a keyword.
What are you going to replace the keywords with? Will it vary depending on which keyword is found? If so, you'll have to use a table--and you won't be able to use the replaceAll method, because it can't handle dynamically generated replacement values. You would have to use the lower-level appendReplacement and appendTail method instead. -
Matches from regular expression into collection
Hello,
I need to do the following:
I have a long string with some similar repeated data. I would like, using a regular expression, to extracts all matches in a collection. Is there a way of performing this task?
I have look through the owa_pattern package, but as far as I found out, I can extract only a simple match. Here is an exact quote:
"If multiple overlapping strings can match the regular expression, this function takes the longest match. " - http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28419/w_patt.htm
So what can I do if I want to get all the matches?
Thank you in anticipation. Any help would be appreciated.
Best regards,
beroetzI think your need a tokenizer-function.
If the string +:in_str+ is delimited by +:in_delimiter+ you could try this:
SELECT REGEXP_REPLACE(REGEXP_SUBSTR( :in_str || :in_delimiter, '(.*?)' || :in_delimiter, 1, LEVEL ), :in_delimiter, '') TOKEN
BULK COLLECT INTO :my_nested_table
FROM DUAL
CONNECT BY REGEXP_INSTR( :in_str || :in_delimiter, '(.*?)' || :in_delimiter, 1, LEVEL ) > 0
ORDER BY LEVEL ASC;
I wrote a string-to-textarray-tokenizer (and it's pendant) some times ago, being able to cut from certain positions within the string using regular expressions and return the elements into an nested table of varchar2. It looks like:
TYPE pos_arraytype IS TABLE OF POSITIVE ;
TYPE text_arraytype IS TABLE OF VARCHAR2(2000);
FUNCTION stringToTextarray(in_str IN VARCHAR2, in_pos_arr IN pos_arraytype, in_regexp_arr IN text_arraytype DEFAULT NULL, in_trim_strings IN BOOLEAN DEFAULT TRUE)
RETURN text_arraytype ;
in_str is the string to be tokenized
in_pos_arr is a table of positive values of positions in the string to be cut
in_regexp_arr is a table of regular expressions to use at each position declared by in_pos_arr
in_trim_strings is a flag, if the cutted element should be trimmed
using above for example:
in_str = 'Markus van Muster 347651234XY Musterdaam ABCDE'
in_pos_arr = (1, 13, 35, 35, 42)
in_regexp_arr = ('(.?){12}', '([^[:digit:]]?){22}', '[[:digit:]]{4}', '[[:alpha:]]{2}', '(.?){14}')
in_trim_strings = TRUE
RETURN collection ('Markus','van Muster','1234','XY','Musterdaam')
If you need the code, then tell me! I'm looking for....
Cheers,
Martin
Edited by: Nuerni on 17.10.2008 08:49 -
Hi
Can the following be done in OWB using operators
SELECT DISTINCT column1
FROM ( SELECT DISTINCT REGEXP_SUBSTR (description,
'[^[:blank:][:punct:] ]+',
1,
LEVEL)
column1,
LEVEL
FROM (SELECT FREE_TEXT DESCRIPTION FROM TEST_TABLE)
CONNECT BY LEVEL <=
LENGTH (REGEXP_REPLACE (description, '[[:alnum:]]'))
+ 1
ORDER BY LEVEL)
WHERE column1 IS NOT NULL
TEST_TABLE is a table with following data
FREE_TEXT
THE MAN IS WIELDING A SPADE
A%WOMAN,WAS SEEN.
1 THE'GIRL IS LAUGH$ING
Output as from SQL is
Column1
GIRL
1
SPADE
WIELDING
IS
THE
MAN
WAS
WOMAN
SEEN
ING
LAUGH
A
Cheers
BirdyHi David
I was trying to improve the performance of the above query after tips from some forums.
The new query is
SELECT REGEXP_SUBSTR ( free_text, '[^[:blank:][:punct:] ]+', 1, lvl)
FROM Test_table,
(SELECT LEVEL lvl
FROM (SELECT MAX (LENGTH (REGEXP_REPLACE ( free_text, '[[:alnum:]]'))) mx
FROM Test_table)
CONNECT BY LEVEL <= mx + 1)
WHERE lvl - 1 <= LENGTH (REGEXP_REPLACE ( free_text, '[[:alnum:]]'))
ORDER BY lvl;
The only issue I am facing is to get the max of length of regular expression.
I am not sure whether this will be possible in a mapping as we cannot have aggregations in a filter condition...
I even tried to first get the value MAX (LENGTH (REGEXP_REPLACE ( free_text, '[[:alnum:]]'))) in an aggregator and then use is further but the data flow will change as I will have to go through the aggregator.
Not sure how to go about this now. I guess I will have to embed it in a function.
The old query mapping works but is way too slow.
Birdy -
Which match gets used when you use OR ('|') to specify multiple possible matches in a regex, and there are multiple matches among the supplied patterns? The first one (in the order written) which matches? Or the one which matches the most characters?
To make this concrete, suppose that you want to split a String into lines, where the line delimiters are the same as the [line terminators used by Java regex|http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html#lt] :
A newline (line feed) character ('\n'),
A carriage-return character followed immediately by a newline character ("\r\n"),
A standalone carriage-return character ('\r'),
A next-line character ('\u0085'),
A line-separator character ('\u2028'), or
A paragraph-separator character ('\u2029)
This problem has [been considered before|http://forums.sun.com/thread.jspa?forumID=4&threadID=464846] .
If we ignore the idiotic microsoft two char \r\n sequence, then no problem; the Java code would be:
String[] lines = s.split("[\\n\\r\\u0085\\u2028\\u2029]");How do we add support for \r\n? If we try
String[] lines = s.split("[\\n\\r\\u0085\\u2028\\u2029]|\\r\\n");which pattern of the compound (OR) regex gets used if both match? The
[\\n\\r\\u0085\\u2028\\u2029]or the
\\r\\n?
For instance, if the above code is called when
s = "a\r\nb";and if the first pattern
[\\n\\r\\u0085\\u2028\\u2029]is used for the match when the \r is encountered, then the tokens will be
"a", "", "b"
because there is an empty String between the \r and following \n. On the other hand, if the rule is use the pattern which matches the most characters, then the
\\r\\n
pattern will match that entire \r\n and the tokens will be
"a", "b"
which is what you want.
On my particular box, using jdk 1.6.0_17, if I run this code
String s = "a\r\nb";
String[] lines = s.split("[\\n\\r\\u0085\\u2028\\u2029]|\\r\\n");
System.out.print(lines.length + " lines: ");
for (String line : lines) System.out.print(" \"" + line + "\"");
System.out.println();
if (true) return;the answer that I get is
3 lines: "a" "" "b"So it seems like the first listed pattern is used, if it matches.
Therefore, to get the desired behavior, it seems like I should use
"\\r\\n|[\\n\\r\\u0085\\u2028\\u2029]"instead as the pattern, since that will ensure that the 2 char sequence is first tried for matches. Indeed, if change the above code to use this pattern, it generates the desired output
2 lines: "a" "b"But what has me worried is that I cannot find any documentation concerning this "first pattern of an OR" rule. This means that maybe the Java regex engine could change in the future, which is worrisome.
The only bulletproof way that I know of to do line splitting is the complicated regex
"(?:(?<=\\r)\\n)" + "|" + "(?:\\r(?!\\n))" + "|" + "(?:\\r\\n)" + "|" + "\\u0085" + "|" + "\\u2028" + "|" + "\\u2029"Here, I use negative lookbehind and lookahead in the first two patterns to guarantee that they never match on the end or start of a \r\n, but only on isolated \n and \r chars. Thus, no matter which order the patterns above are applied by the regex engine, it will work correctly. I also used non-capturing groups
(?:X)
to avoid memory wastage (since I am only interested in grouping, and not capturing).
Is the above complicated regex the only reliable way to do line splitting?bbatman wrote:
Which match gets used when you use OR ('|') to specify multiple possible matches in a regex, and there are multiple matches among the supplied patterns? The first one (in the order written) which matches? Or the one which matches the most characters?
The longest match wins, normally. Except for alternation (or) as can be read from the innocent sentence
The Pattern engine performs traditional NFA-based matching with ordered alternation as occurs in Perl 5.
in the javadocs. More information can be found in Friedl's book, the relevant page of which google books shows at
[http://books.google.de/books?id=GX3w_18-JegC&pg=PA175&lpg=PA175&dq=regular+expression+%22ordered+alternation%22&source=bl&ots=PHqgNmlnM-&sig=OcDjANZKl0VpJY0igVxkQ3LXplg&hl=de&ei=Dcg7S43NIcSi_AbX-83EDQ&sa=X&oi=book_result&ct=result&resnum=1&ved=0CA0Q6AEwAA#v=onepage&q=&f=false|http://books.google.de/books?id=GX3w_18-JegC&pg=PA175&lpg=PA175&dq=regular+expression+%22ordered+alternation%22&source=bl&ots=PHqgNmlnM-&sig=OcDjANZKl0VpJY0igVxkQ3LXplg&hl=de&ei=Dcg7S43NIcSi_AbX-83EDQ&sa=X&oi=book_result&ct=result&resnum=1&ved=0CA0Q6AEwAA#v=onepage&q=&f=false]
If this link does not survive, search google for
regular expression "ordered alternation"
My first hit went right into Friedl's book.
Harald.
Maybe you are looking for
-
How do you move pictures from iPhoto to an SD Card on a late 2011 Mac Book Pro 15 inch?
How do you move pictures from iPhoto to an SD Card on a late 2011 Mac Book Pro 15 inch?
-
How to check null condition on a Object.......
Hi I have created a JTable and Iam retrieving the data from the Jtable as an Object. Object s=table.getValueAt(i,j) if(s!=null) dosomething(); }But the if loop is continuing even if there is a null value in the table.Where am I going wrong?
-
Hello Pals, Our client wants to go for split valuation. We have completed with all the settings related to split valuation in transaction OMWO & OMWC. Our valuation types are basically NEW, OVERHAUL & REPAIR, with Valaution category as C (STATUS). Th
-
one of my phones keeps doing strange things. First the bluetooth stops working (even when settings are on), then Find My iPhone then stops working. iPhone shuts down for no reason during usage, etc. the list can go on. I have rebooted, updated, e
-
As a Mac user, I usually do my own support. When I do have a problem, I search the web for others that are working through the problem. Very often threads in Apple Discussions appear in the results. What I find is that problems can be discussed for m