Regular expression again

Hi there,
I'm using this regular expression:
String regexp = "(\\s+)\\di(\\s+)";
Pattern p = Pattern.compile(regexp,CASE_INSENSITIVE);
Matcher m = p.matcher(inputString);
clean = m.replaceAll(" ");
With it, given the following inputString:
"a da diapason di da_R_randomChars_DI_DA"
I want to obtain the following:
"diapason da__R_randomChars_DI_DA"
that is to say, I want to match all occurrences of the character sequences di, a and da that are followed by one or more whitespace characters, or that follow one or more whitespace characters...
but it doesn't work... why?
Where am I going wrong?
Thanks

1) Your capture groups are wrong.
2) A character class [] matches any single one of the characters inside it, so something like [dai] would match d, a or i on their own - not the sequences di or da.
     String inputString = "a da diapason di da_R_randomChars_DI_DA";
     String regexp = "\\s+(di|a|da)\\s+";
     Pattern p = Pattern.compile(regexp,Pattern.CASE_INSENSITIVE);
     Matcher m = p.matcher(inputString);
     String clean = m.replaceAll(" ");
     System.out.println( inputString );
     System.out.println( clean );
     System.out.println( "diapason da_R_randomChars_DI_DA" );give you:
$ java Test
a da diapason di da_R_randomChars_DI_DA
a diapason da_R_randomChars_DI_DA
diapason da_R_randomChars_DI_DA
Which is almost there, but your regex requires at least one space before the word, so it never matches at the start of the line (which is why the leading "a" survives).
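One way to also catch that leading "a" (a sketch, not the poster's final solution) is to anchor on a word boundary instead of requiring whitespace on both sides:

import java.util.regex.Pattern;

public class Clean {
    public static void main(String[] args) {
        String inputString = "a da diapason di da_R_randomChars_DI_DA";
        // "di", "a" or "da" as a whole word followed by whitespace; \b also
        // matches at the start of the string, so the leading "a" goes too.
        // "_" counts as a word character, so "da_R..." and "_DI_DA" are kept.
        Pattern p = Pattern.compile("\\b(?:di|da|a)\\s+", Pattern.CASE_INSENSITIVE);
        System.out.println(p.matcher(inputString).replaceAll(""));
        // prints: diapason da_R_randomChars_DI_DA
    }
}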

Similar Messages

  • Those darn regular expressions again ...

    Greetings,
    I feel reluctant to ask this question because I sincerely hate regular expressions, but here is a regular expression question:
    Suppose I receive a byte stream from a device: after having received, say, an 'X' byte, some more bytes follow, followed by a number: two or three digits followed by a dot and then some more digits. I am interested in that number so I cooked up this:
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    public class Test {
         public static void main(String[] args) {
              Pattern pat= Pattern.compile("X.*(\\d{2,3}\\.\\d*)");
              String  str= "fooXbar123.456baz";
              Matcher mat= pat.matcher(str);
              if (mat.find())
                   System.out.println(str.substring(mat.start(1), mat.end(1)));
         }
    }
    The regular expression represents what I want to see: a capital X, some bytes and the number. B.t.w. I convert those bytes to chars so there is no problem there. The output of this little snippet is "23.456", i.e. it takes the shortest variant of group #1 (the '1' is eaten by the "dot-star" subexpression).
    I hope you understand why I hate those regular expressions so much; they're the devil's invention. My question boils down to: how can I find the longest variant of that group #1 expression? i.e. "123.456".
    A bit more info: the 'bar' part can contain almost anything, including digits. It can even be totally empty. The capital 'X' has to be present and "bar" doesn't contain a capital 'X'.
    kind regards,
    Jos
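    One common fix for the greedy dot-star (a sketch, not necessarily how the thread resolved it) is to make it reluctant, so the number group can start at the first qualifying digit; \d* is also tightened to \d+ since digits follow the dot:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Reluctant {
        public static void main(String[] args) {
            // ".*?" grows only as far as needed, so group 1 starts as early as possible.
            Pattern pat = Pattern.compile("X.*?(\\d{2,3}\\.\\d+)");
            Matcher mat = pat.matcher("fooXbar123.456baz");
            if (mat.find())
                System.out.println(mat.group(1)); // prints 123.456
        }
    }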

    JosAH wrote:
    Never did get my head round the 'pimping lemon'!
    If a word x y z t u is in a language L and if x y^n z t^n u is also in that language the pumping lemma applies and L is a context free language; that is so trivial ... ;-)
    Was that a low flying jet or just the 'pimping lemon' going straight over my head!
    >
    I used to teach APL to non-(scientists/engineers/mathematicians). One of the most unsatisfying jobs ever. I love APL but how do you teach APL to someone who does not understand matrices? How do you teach APL to someone who expects 1-1-1-1-1 to be -3? I still get agencies contacting me about APL jobs even though I last used it 25 years ago!
    I used to program APL on an old DEC LA/36 paper terminal. You had to lean over backwards because those APL symbols were printed on the front side of the keys. Therefore my knees blocked the paper feed so all my printouts turned into a mess ...
    I was lucky. I used an IBM (5110 springs to mind but at my age!) box and some IBM terminals attached to an IBM370. No silly paper tape. It just cost a fortune for every second one was connected to the IBM370.
    >
    How's the 'limp' Jos? Any better?
    Not really; I put too much strength on my left leg (the 'good' one) and my body thought something was wrong so it started a bad inflammation in that leg. I'm on inflammation-suppressing pills now. dammit.
    We are a real pair of crocks! I mixed and laid 6 ton of concrete and disposed of 8 ton of hardcore during Dec, Jan and Feb. My knee is swollen like a balloon and I'm on anti-inflammatory pills right now. BUT I have to eat a full meal before taking them and that means I have to take more pills to suppress my excess stomach acid. God I wish I was 50 again!
    I shall have to visit Holland sometime before we are both confined to wheelchairs.

  • Introduction to regular expressions ...

    I'm well aware that there are already some articles on this topic, but some people asked me to share some of my knowledge on it. Please take a look at this first part and let me know if you find it useful. If yes, I'm going to continue writing more parts using more and more complicated expressions - if you have questions or problems that you think could be solved through regular expressions, please post them.
    Introduction
    Oracle has always provided some character/string functions in its PL/SQL command set, such as SUBSTR, REPLACE or TRANSLATE. With 10g, Oracle finally gave us, the users, the developers and of course the DBAs, regular expressions. However, regular expressions, due to their sometimes cryptic rules, seem to be overlooked quite often, despite the existence of some very interesting use cases. Being one of the advocates of regular expressions, I thought I'd give the interested audience an introduction to these new functions in several installments.
    Having fun with regular expressions - Part 1
    Oracle offers the use of regular expression through several functions: REGEXP_INSTR, REGEXP_SUBSTR, REGEXP_REPLACE and REGEXP_LIKE. The second part of each function already gives away its purpose: INSTR for finding a position inside a string, SUBSTR for extracting a part of a string, REPLACE for replacing parts of a string. REGEXP_LIKE is a special case since it could be compared to the LIKE operator and is therefore usually used in comparisons like IF statements or WHERE clauses.
    Regular expressions excel, in my opinion, at searching and extracting strings, whether for finding or replacing certain strings or for checking certain formatting criteria. They're not very good at formatting strings themselves, except for some special cases I'm going to demonstrate.
    If you're not familiar with regular expressions, you should take a look at the definition in Oracle's user guide Using Regular Expressions With Oracle Database, and please note that there have been some changes and advancements in 10gR2. I'll provide examples that should work on both versions.
    Some of you probably already encountered this problem: checking a number inside a string, because, for whatever reason, a column was defined as VARCHAR2 and not as NUMBER as one would have expected.
    Let's check for all rows where column col1 does NOT include an unsigned integer. I'll use this SELECT for demonstrating different values and search patterns:
    WITH t AS (SELECT '456' col1
                 FROM dual
                UNION
               SELECT '123x'
                 FROM dual
                UNION  
               SELECT 'x123'
                 FROM dual
                UNION 
               SELECT 'y'
                 FROM dual
                UNION 
               SELECT '+789'
                 FROM dual
                UNION 
               SELECT '-789'
                 FROM dual
                UNION 
               SELECT '159-'
                 FROM dual
                UNION 
               SELECT '-1-'
                 FROM dual)
    SELECT t.col1
      FROM t
    WHERE NOT REGEXP_LIKE(t.col1, '^[0-9]+$')
    ;
    Let's take a look at the 2nd argument of this REGEXP function: '^[0-9]+$'. Translated it would mean: start at the beginning of the string, check if there's one or more characters in the range between '0' and '9' (also called a matching character list) until the end of this string. "^", "[", "]", "+", "$" are all metacharacters.
    To understand regular expressions, you have to "think" in regular expressions. Each regular expression tries to "fit" an available string into its pattern and returns a result, successful or not, depending on the function. The "art" of using regular expressions is to construct the right search pattern for a certain task. Functions like TRANSLATE or REPLACE have already taught you to use search patterns; regular expressions are just an extension of this paradigm. Another side note: most of the search patterns are placeholders for single characters, not strings.
    I'll take this example a bit further. What would happen if we removed the "$" in our example? "$" means: (until the) end of the string. Without it, this expression would only search for digits from the beginning until it encounters either another character or the end of the string. So this time, '123x' would be removed from the selection since it does fit the pattern.
    Another change: we will keep the "$" but remove the "^". This character has several meanings, but in this case it declares: (start from the) beginning of a string. Without it, the function will search for a part of a string that has only digits until the end of the searched string. 'x123' would now be removed from our selection.
    Now there's a question: what happens if I remove both, "^" and "$"? Well, just think about it. We now ask to find any string that contains at least one or more digits, so both '123x' and 'x123' will not show up in the result.
    So what if I want to look for signed integers, given that "+" is also used as a search metacharacter? Escaping is the name of the game. We'll just use '^\+[0-9]+$'. Did you notice the "\" before the first "+"? It is now a search pattern for the plus sign.
    Should signed integers include negative numbers as well? Of course they should, and I'll once again use a matching character list. In this list, I don't need to do escaping, although it is possible. The result string would now look like this: '^[+-]?[0-9]+$'. Did you notice the "?"? This is another metacharacter that changes the placeholder for plus and minus to an optional placeholder, which means: if there's a "+" or "-", that's ok, if there's none, that's also ok. Only if there's a different character, then again the search pattern will fail.
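    Since this POSIX-style pattern works the same way in most other regex engines, it is easy to sanity-check outside the database. A quick Java sketch (my addition, not part of the original article):

    import java.util.regex.Pattern;

    public class SignedIntegerCheck {
        public static void main(String[] args) {
            // Optional leading sign, then one or more digits - nothing else.
            Pattern p = Pattern.compile("^[+-]?[0-9]+$");
            for (String s : new String[] {"456", "123x", "x123", "y", "+789", "-789", "159-", "-1-"}) {
                System.out.println(s + " -> " + p.matcher(s).matches());
            }
        }
    }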
    Addendum: From here on, I found a mistake in my examples. If you had tested my old examples with test data that included multiple-sign strings, like "--", "-+", "++", they would have been filtered by the SELECT statement. I mistakenly used the "*" instead of the "?" operator. The reason why this is a bad idea can also be found in the user guide: the "*" metacharacter is defined as 0 to multiple occurrences.
    Looking at the values, one could ask: what about integers with a trailing sign? Quite simple, right? Let's just add another '[+-]?' and the search pattern would look like this: '^[+-]?[0-9]+[+-]?$'.
    Wait a minute, what happened to the row with the column value "-1-"?
    You probably already guessed it: the new pattern qualifies this one also as a valid string. I could now split this pattern into several conditions combined through a logical OR, but there's something even better: a logical OR inside the regular expression. Its symbol is "|", the pipe sign.
    Changing the search pattern again to something like '^[+-]?[0-9]+$|^[0-9]+[+-]?$' would now return the "-1-" value. But do I have to duplicate the same elements like "^" and "$", and what about more complicated, repeating elements in future examples? That's where subexpressions/grouping come into play. If I want the OR operator to apply only to certain parts of the search pattern, I can put those parts inside round brackets. '^([+-]?[0-9]+|[0-9]+[+-]?)$' serves the same purpose and allows for further checks without duplicating the whole pattern.
    Now looking for integers is nice, but what about decimal numbers? Those may be a bit more complicated, but all I have to do is again to think in (meta) characters. I'll just use an example where the decimal point is represented by ".", which again needs escaping, since it's also the place holder in regular expressions for "any character".
    Valid decimals in my example would be ".0", "0.0", "0.", "0" (an integer, of course) but not ".". If you want, you can test it with the TO_NUMBER function. Finding such an unsigned decimal number could then be formulated like this: from the beginning of the string we will either allow a decimal point plus any number of digits OR at least one digit plus an optional decimal point, optionally followed by any number of digits. Think about it for a minute: how would you formulate such a search pattern?
    Compare your solution to this one:
    '^(\.[0-9]+|[0-9]+(\.[0-9]*)?)$'
    Addendum: Here I have to use both "?" and "*" to make sure that I can have 0 to many digits after the decimal point, but only 0 to 1 occurrences of this substring. Otherwise, strings like "1.9.9.9" would be possible if I wrote it like this:
    '^(\.[0-9]+|[0-9]+(\.[0-9]*)*)$'
    Some of you might now say: Hey, what about signed decimal numbers? You could of course combine all the ideas so far and you would end up with a very long and almost unreadable search pattern, or you could start combining several regular expression functions. Think about it: why put all the search patterns into one function? Why not split the work into several steps, like "check for a valid decimal" and "check for the sign"?
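    Before adding signs, the unsigned-decimal pattern itself can be checked against the sample values above; a small Java sketch (my addition, not from the article):

    import java.util.regex.Pattern;

    public class DecimalCheck {
        public static void main(String[] args) {
            // The unsigned-decimal pattern: either ".digits" or "digits" with an optional ".digits" tail.
            Pattern p = Pattern.compile("^(\\.[0-9]+|[0-9]+(\\.[0-9]*)?)$");
            for (String s : new String[] {".0", "0.0", "0.", "0", "."}) {
                System.out.println("\"" + s + "\" -> " + p.matcher(s).matches());
            }
        }
    }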
    I'll just use another SELECT to show what I want to do:
    WITH t AS (SELECT '0' col1
                 FROM dual
                UNION
               SELECT '0.' 
                 FROM dual
                UNION
               SELECT '.0' 
                 FROM dual
                UNION
               SELECT '0.0' 
                 FROM dual
                UNION
               SELECT '-1.0' 
                 FROM dual
                UNION
               SELECT '.1-' 
                 FROM dual
                UNION
               SELECT '.' 
                 FROM dual
                UNION
               SELECT '-1.1-' 
                 FROM dual)
    SELECT t.*
      FROM t
    ;
    From this select, the only rows I need to find are those with the column values "." and "-1.1-". I'll start with a check for valid signs. Since I want to combine this with the check for valid decimals, I'll first try to extract a substring with valid signs through the REGEXP_SUBSTR function:
    NVL(REGEXP_SUBSTR(t.col1, '^([+-]?[^+-]+|[^+-]+[+-]?)$'), ' ')
    Remember the OR operator and the matching character collections? But why several "^"? Some of the metacharacters inside a search pattern can have different meanings, depending on their position and combination with other metacharacters. In this case, the pattern translates into: from the beginning of the string, search for an optional "+" or "-" followed by at least one other character that is not "+" or "-". The second pattern after the "|" OR operator does the same for a sign at the end of the string.
    This only checks for the sign, not whether the string otherwise contains only digits and a decimal point. If the search fails, for example when we have more than one sign as in "-1.1-", the function returns NULL. NULL and LIKE don't go together very well, so we'll just add NVL with a default value that tells the LIKE to ignore this string, in this case a space.
    All we have to do now is to combine the check for the sign and the check for a valid decimal number, but don't forget an option for the signs at the beginning or end of the string, otherwise your second check will fail on the signed decimals. Are you ready?
    Does your solution look a bit like this?
    WHERE NOT REGEXP_LIKE(NVL(REGEXP_SUBSTR(t.col1,
                               '^([+-]?[^+-]+|[^+-]+[+-]?)$'), ' '),
                           '^[+-]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)[+-]?$'
                          )
    Now the optional sign checks in the REGEXP_LIKE argument can be added to both ends, since the SUBSTR won't allow any string with signs on both ends. Thinking in regular expressions again.
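    The same two-step logic can be mirrored outside SQL to see which sample values survive both checks; a Java sketch (my addition, not from the article):

    import java.util.regex.Pattern;

    public class SignedDecimalCheck {
        public static void main(String[] args) {
            // Step 1: at most one sign, and only at one end of the string.
            Pattern sign = Pattern.compile("^([+-]?[^+-]+|[^+-]+[+-]?)$");
            // Step 2: a valid decimal with an optional sign at either end.
            Pattern decimal = Pattern.compile("^[+-]?(\\.[0-9]+|[0-9]+(\\.[0-9]*)?)[+-]?$");
            for (String s : new String[] {"0", "0.", ".0", "0.0", "-1.0", ".1-", ".", "-1.1-"}) {
                boolean valid = sign.matcher(s).matches() && decimal.matcher(s).matches();
                System.out.println("\"" + s + "\" -> " + (valid ? "valid" : "invalid"));
            }
        }
    }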
    Continued in Introduction to regular expressions ... continued.
    C.
    Fixed some embarrassing typos ... and mistakes.
    cd

    Excellent write up CD. Very nice indeed. Hopefully you'll be completing parts 2 and 3 some time soon. And with any luck, your article will encourage others to do the same....I know there's a few I'd like to see and a few I'd like to have a go at writing too :-)

  • Bracket in Regular Expression constant?

    I am a bit puzzled by the behavior I am experiencing in LV 2011. I hope to get some light from experts out there.
    I am trying to parse a messy ASCII header file and after having split it into individual lines (strings), I use the "Match Regular Expression" function to remove some of the info before the substantial information.
    Some of the strings include square brackets ([, ]), which are special characters for the function, therefore, as documented in the help, one needs to precede them with a backslash.
    Example:
    I want to parse the following line:
       #PR [PR_DEV,I,2]
    One way (which I am using because of considerations related to the rest of the header) is the following:
    Note that the first string constant is using "Code Display" whereas the second one is using "Normal Display".
    Why did I not put a backslash in front of the bracket in the first string, you may ask? Well, I did, but it disappeared after I typed the other characters. And reverting to "Normal Display" did not restore it.
    Of course, the first version does not parse the input string correctly, whereas the second one does it fine.
    In other words, the custom display string (which is convenient for cryptic codes such as \s*, or to distinguish between a space and a tab... or simply to enter tabs!) seems to mess with the \[ combo (likewise with the \] one).
    It is not a huge deal. I can use the "Normal Display" mode, but I tend to think that this qualifies as a hidden "feature". And again, it is still a pain in the ... when dealing with special characters such as tabs, etc...

    I think that [ is a special character which needs to be preceded by a backslash, but it is not one of the defined backslash characters (like \s). So, you need to put in two \\ to get one \ while in '\' Codes Display.
    You can put in any character by using \xx where the xx is a hex character using only upper case letters for A..F.  I converted the strings to byte arrays and tried to see what made the arrays match and the Match work.
    Lynn

  • URL paths and regular expressions in ASDM

    Some background info - I've recently switched to an ASA 5510 on 8.4(3) coming from a Checkpoint NGX platform (let's say fairly quickly and without much warning ). I have a couple questions and they're kind of similar so I'll post them up. I've read docs about regex and creating them both via command line and ASDM, but the examples always seem to include info I don't need or honestly something I don't understand yet (mainly related to defining class\inspect maps). If someone could provide a simple example of how to do these in ASDM that would help a lot in understanding how regular expressions are properly configured. So here we go.
    I know this is basic but I need to make sure I understand this properly - I have a single web server (so this won't be a global policy) where I need to allow access to a specific URL path\file and that's it. So we'll call it \test\testfile.doc. Any other access to any other path should be dropped. What's the best way to do this in ASDM (6.4)? I think if I saw a basic example for this I could figure out next few questions but I'll post them as well just in case.
    I have another single public web server (again this won't be a global policy) where I'd like to specify blocking file types, like .php, .exe., etc... again a basic example would be great.
    Lastly, and this is kind of related, but we have a single office/domain and sometimes we get spam from forged addresses appearing to be from our domain. On Checkpoint I used to use its built-in SMTP security server and could define if it received mail from *@mydomain.com to drop it because we would never receive mail externally from our own domain name. I saw something similar with ESMTP in ASDM and it looks kind of like how you set up the URL access mentioned above. Can I configure this in ASDM as well, and if so how?
    TIA for your help,
    Jordan

    /bump

  • Get all groups from a regular expression match

    Please help me understand how to use Java regular expressions:
    I have an expression similar to this:
    {noformat}"([^X]+)(X[^X]*)+"{noformat}This should match stuff like "asaasaXdfdfdfXXsdsfd".
    How does one access all the matches for the second group? (The second group has a Kleene operator
    added, so it is not really just one group --- but match.groupCount() is always 2.)
    Here is roughly the code:
    java.util.regex.Pattern pattern =
        java.util.regex.Pattern.compile(
            "([^X]+)(X[^X]*)+",
            java.util.regex.Pattern.MULTILINE);
    java.util.regex.Matcher matcher = pattern.matcher(text);
    matcher.find();
    int groupcount = matcher.groupCount();
    Also, without matcher.find() I get an IllegalStateException, which I also get if I use matcher.matches() instead
    of matcher.find().
    I am obviously missing something here. There is always at least one "X" in the string so shouldn't that pattern always
    match the whole string? Since there are often multiple X, shouldn't I get a group for each occurrence of X, followed
    by 0 or more other characters?
    But when I try to match everything by using "^([^X]+)(X[^X]*)+$" I get an "IllegalStateException: No match available" again.
    What is the correct way to do this?
    Edited by: johann_p on May 16, 2008 10:39 AM

    I am sorry I messed this up. Here is a SSCCE:
    import java.util.regex.Pattern;
    import java.util.regex.Matcher;
    class RegExp1 {
        public static void main(String[] args) {
          String testString = "first|aaaa | bbbb\n|cccc|ddddd";
          Pattern pattern = Pattern.compile("^([^|]+)(\\|[^|]*)+$");
          Matcher matcher = pattern.matcher(testString);
          matcher.find();
          int groupcount = matcher.groupCount();
          System.out.println("Found "+groupcount+" groups");
          System.out.println("Matcher: "+matcher);
          for (int i = 1; i <= groupcount; i++) {
            System.out.println("Match "+i+": "+testString.substring(matcher.start(i),matcher.end(i)));
          }
        }
    }
    I figured out a small bug in my first code that explains some of the exception oddities, but my principal question remains:
    how do I access all the matches that correspond to the second capturing group?
    In the example I would get "first" for Match 1 and "|ddddd" for Match 2, but how do I access all the matches??
    Thank you for your help!
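    A repeated capturing group only retains the text of its last repetition, which is why group 2 always ends up as "|ddddd" here. One way to get every delimited piece (a sketch, not taken from the thread) is to match the pieces one at a time with find():

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class RegExp2 {
        public static void main(String[] args) {
            String testString = "first|aaaa | bbbb\n|cccc|ddddd";
            // One pattern per piece instead of one repeated group for the whole line.
            Matcher m = Pattern.compile("\\|[^|]*").matcher(testString);
            while (m.find()) {
                System.out.println("Piece: " + m.group());
            }
            // Since '|' is just a delimiter here, testString.split("\\|") works too.
        }
    }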

  • Re: [iPlanet-JATO] Re: Use Of models in utility classes - Please don't forget about the regular expression potential

    Namburi,
    When you said you used the Reg Exp tool, did you use it only as
    preconfigured by the iMT migrate application wizard?
    Because the default configuration of the regular expression tool will only
    target the files in your ND project directories. If you wish to target
    classes outside of the normal directory scope, you have to either modify the
    "Source Directory" property OR create another instance of the regular
    expression tool. See the "Tool" menu in the iMT to create additional tool
    instances which can each be configured to target different sets of files
    using different sets of rules.
    Usually, I utilize 3 different sets of rules files on a given migration:
    spider2jato.xml
    these are the generic conversion rules (but includes the optimized rules for
    ViewBean and Model based code, i.e. these rules do not utilize the
    RequestManager since it is not needed for code running inside the ViewBean
    or Model classes)
    I run these rules against all files.
    See the file download section of this forum for periodic updates to these
    rules.
    nonProjectFileRules.xml
    these include rules that add the necessary
    RequestManager.getRequestContext(). etc prefixes to many of the common
    calls.
    I run these rules against user module classes and any other classes that are
    not ModuleServlet, ContainerView, or Model classes.
    appXRules.xml
    these rules include application specific changes that I discover while
    working on the project. A common thing here is changing import statements
    (since the migration tool moves ND project code into different jato
    packaging structure, you sometime need to adjust imports in non-project
    classes that previously imported ND project specific packages)
    So you see, you are not limited to one set of rules at all. Just be careful
    to keep track of your backups (the regexp tool provides several options in
    its Expert Properties related to back up strategies).
    ----- Original Message -----
    From: <vnamboori@y...>
    Sent: Wednesday, August 08, 2001 6:08 AM
    Subject: [iPlanet-JATO] Re: Use Of models in utility classes - Please don't
    forget about the regular expression potential
    Thanks Matt, Mike, Todd
    This is a great input for our migration. Though we used the existing
    Regular Expression Mapping tool, we did not change this to meet our
    own needs as mentioned by Mike.
    We would certainly incorporate this to ease our migration.
    Namburi
    --- In iPlanet-JATO@y..., "Todd Fast" <toddwork@c...> wrote:
    All--
    Great response. By the way, the Regular Expression Tool uses the Perl5 RE
    syntax as implemented by Apache OROMatcher. If you're doing lots of these
    sorts of migration changes manually, you should definitely buy the O'Reilly
    book "Mastering Regular Expressions" and generate some rules to automate the
    conversion. Although they are definitely confusing at first, regular
    expressions are fairly easy to understand with some documentation, and are
    superbly effective at tackling this kind of migration task.
    Todd
    ----- Original Message -----
    From: "Mike Frisino" <Michael.Frisino@S...>
    Sent: Tuesday, August 07, 2001 5:20 PM
    Subject: Re: [iPlanet-JATO] Use Of models in utility classes - Please don't
    forget about the regular expression potential
    Also, (and Matt's document may mention this)
    Please bear in mind that this statement is not totally correct:
    "Since the migration tool does not do much of conversion for these utilities we have to do manually."
    Remember, the iMT is a SUITE of tools. There is the extraction tool, the translation tool, the regular expression tool, and several other smaller tools (like the jar and compilation tools). It is correct to state that the extraction and translation tools only significantly convert the primary ND project objects (the pages, the data objects, and the project classes). The extraction and translation tools do minimal translation of the User Module objects (i.e. they repackage the user module classes in the new jato module packages). It is correct that for all other utility classes which are not formally part of the ND project, the extraction and translation tools do not perform any migration.
    However, the regular expression tool can "migrate" any arbitrary file (utility classes etc.) to the degree that the regular expression rules correlate to the code present in the arbitrary file. So first and foremost, if you have a lot of spider code in your non-project classes you should consider using the regular expression tool and, if warranted, adding additional rules to reduce the amount of manual adjustments that need to be made. I can't stress this enough. We can even help you write the regular expression rules if you simply identify the code pattern you wish to convert. Just because there is not already a regular expression rule to match your need does not mean it can't be written. We have not nearly exhausted the possibilities.
    For example if you say, we need to convert
    CSpider.getDataObject("X");
    To
    RequestManager.getRequestContext().getModelManager().getModel(XModel.class);
    Maybe we or somebody else in the list can help write that regular expression if it has not already been written. For instance, in the last
    updated spider2jato.xml file there is already a CSpider.getCommonPage("X")
    rule:
    <!--getPage to getViewBean-->
    <mapping-rule>
    <mapping-rule-primarymatch>
    <![CDATA[CSpider[.\s]*getPage[\s]*\(\"([^"]*)\"]]>
    </mapping-rule-primarymatch>
    <mapping-rule-replacement>
    <mapping-rule-match>
    <![CDATA[CSpider[.\s]*getPage[\s]*\(\"([^"]*)\"]]>
    </mapping-rule-match>
    <mapping-rule-substitute>
    <![CDATA[getViewBean($1ViewBean.class]]>
    </mapping-rule-substitute>
    </mapping-rule-replacement>
    </mapping-rule>
    Following this example a getDataObject to getModel would look
    like this:
    <mapping-rule>
    <mapping-rule-primarymatch>
    <![CDATA[CSpider[.\s]*getDataObject[\s]*\(\"([^"]*)\"]]>
    </mapping-rule-primarymatch>
    <mapping-rule-replacement>
    <mapping-rule-match>
    <![CDATA[CSpider[.\s]*getDataObject[\s]*\(\"([^"]*)\"]]>
    </mapping-rule-match>
    <mapping-rule-substitute>
    <![CDATA[getModel($1Model.class]]>
    </mapping-rule-substitute>
    </mapping-rule-replacement>
    </mapping-rule>
    In fact, one migration developer already wrote that rule and submitted it
    for inclusion in the basic set. I will post another upgrade to the basic
    regular expression rule set; look for a "file uploaded" posting. Also,
    please consider contributing any additional generic rules that you have
    written for inclusion in the basic set.
    Please note that in some cases (utility classes in particular) the rule
    application may be more effective as TWO sequential rules rather than one
    monolithic rule. Again using the example above, it will convert
    CSpider.getDataObject("Foo");
    To
    getModel(FooModel.class);
    Now that is the most effective conversion for that code if that
    code is in
    a page or data object class file. But if that code is in a utility class you
    really want:
    RequestManager.getRequestContext().getModelManager().getModel(FooModel.class);
    So to go from
    getModel(FooModel.class);
    To
    RequestManager.getRequestContext().getModelManager().getModel(FooModel.class);
    You would apply a second rule AND you would ONLY run this rule against
    your utility classes so that you would not otherwise affect your ViewBean
    and Model classes, which are completely fine with the simple getModel call.
    <mapping-rule>
    <mapping-rule-primarymatch>
    <![CDATA[getModel\(]]>
    </mapping-rule-primarymatch>
    <mapping-rule-replacement>
    <mapping-rule-match>
    <![CDATA[getModel\(]]>
    </mapping-rule-match>
    <mapping-rule-substitute>
    <![CDATA[RequestManager.getRequestContext().getModelManager().getModel(]]>
    </mapping-rule-substitute>
    </mapping-rule-replacement>
    </mapping-rule>
    A similar rule can be applied to getSession and other CSpider API calls.
    For instance, here is the rule for converting getSession calls to leverage
    the RequestManager.
    <mapping-rule>
    <mapping-rule-primarymatch>
    <![CDATA[getSession\(\)\.]]>
    </mapping-rule-primarymatch>
    <mapping-rule-replacement>
    <mapping-rule-match>
    <![CDATA[getSession\(\)\.]]>
    </mapping-rule-match>
    <mapping-rule-substitute>
    <![CDATA[RequestManager.getSession().]]>
    </mapping-rule-substitute>
    </mapping-rule-replacement>
    </mapping-rule>
    ----- Original Message -----
    From: "Matthew Stevens" <matthew.stevens@e...>
    Sent: Tuesday, August 07, 2001 12:56 PM
    Subject: RE: [iPlanet-JATO] Use Of models in utility classes
    Namburi,
    I will post a document to the group site this evening which has the details
    on various tactics for migrating these types of utilities. Essentially, you
    either need to convert these utilities to Models themselves, or keep the
    utilities as is and simply use
    RequestManager.getRequestContext().getModelManager().getModel()
    to statically access Models.
    For CSpSelect.executeImmediate() I have an example of a custom helper method
    as a replacement which uses JDBC results instead of CSpDBResult.
    matt
    -----Original Message-----
    From: vnamboori@y... [mailto:vnamboori@y...]
    Sent: Tuesday, August 07, 2001 3:24 PM
    Subject: [iPlanet-JATO] Use Of models in utility classes
    Hi All,
    In the present ND project we have lots of utility classes. These
    classes are in a different directory, not part of the ND pages.
    In these classes we access the dataobjects and do the manipulations.
    So we access dataobjects directly, like
    CSpider.getDataObject("do....");
    and then execute it.
    Since the migration tool does not do much of conversion for these
    utilities, we have to do it manually.
    My question is: can we access the models the same way post-migration,
    or do we need requestContext?
    We have lots of utility classes which are DataObject-intensive. Can
    someone suggest a better way to migrate this kind of code?
    Thanks
    Namburi
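    These $1-style rules use Perl5 syntax (via ORO), which is close enough to java.util.regex that a rule can be prototyped in plain Java before adding it to the rule set. A rough sketch of the getDataObject rule above (my own illustration, not part of the iMT):

    public class RulePreview {
        public static void main(String[] args) {
            String src = "CSpider.getDataObject(\"Foo\");";
            // Same match/substitute pair as the mapping rule above; the quote
            // character does not need extra escaping at the regex level in Java.
            String match = "CSpider[.\\s]*getDataObject[\\s]*\\(\"([^\"]*)\"";
            String out = src.replaceAll(match, "getModel($1Model.class");
            System.out.println(out); // getModel(FooModel.class);
        }
    }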


  • How to use special characters in regular expression

    HI all, I am new to regular expression.
    Can anyone please tell me the regular expression for characters which are used in regular expressions, like ", [, (, {, . etc.? Is there any particular expression to prefix before using these characters?

    Expression:
    < td .*? bgcolor = \" ( [^ \" ] +) \" \\s* .*? > ( .+? ) </td>
    It will search for an expression starting with <td ,
    .*? means any characters, zero or more times,
    then it will find bgcolor = , then a literal \" (a backslash before a character makes it that specific literal character),
    [ ^ \" ] is any character but not a literal \"; this means there has to be something between the "...." (if empty it won't match), and + means 1 or more times.
    Then again a literal \" , after that \\s* means zero or more spaces,
    then again .*? means any characters, zero or more times,
    then it will search for a literal > , then again any chars ( .+? ),
    Finally </td> will be searched.....!!
    So all expressions having this particular structure will be
    matched.
    Output :
    <td align="left" valign="top" bgcolor="ffffff" width="177">bla bla bla</td>

  • Regular Expressions in num-exp

    Hello All,
    I had a problem on my SRST gateway with num-exp inserting a repeating pattern into my 7-digit dialing when in fallback mode.
    For a brief example, the 7digit internal dialing is 21621.. or 21622..
    The num-exp statement of 'num-exp 2... 2162...' was not allowing me to 7-digit dial directly from one IP phone to another while in fallback mode.
    When I dialed 2162154 the 2162 would hit the num-exp and be expanded to 2162162.
    I have a work around that uses a voice translation-rule, applied to the call-manager-fallback config that will translate a 7-digit dialed string to the 4 digit dialed string which then hits the 4-digit to 7-digit num-exp and it is working fine.
    However, I was wondering if there is a way to  use regular expressions in num-exp so that perhaps I can skip the intermediate step of using the translation-rule. Based off my existing translation-rules that are working properly, I figured something like this might work for num-exp:
    'num-exp /^2\([12]..$\)/ /2162\1/'
    But when I try to issue a num-exp with a regular expression I get the following message.
    Incorrect format for Number macro pattern
            regular expression must be of the form  ^((\+)?([0-9#*A-F.]|(\\\*))+(\$)?)$
    I have tried a number of different combinations with no success.  I always get the same message.  The regular expression that I tried first was:
    'num-exp ^2... 2162...'
    This is when I first saw the "Incorrect format..." message and figured that is must be possible.  Is this just a generic warning similar to when you try to use complex regular expressions with the 'translation-rule' command vs. the 'voice translation-rule' command and in reality you cannot use regular expressions in the num-exp command?
    Thank you,
    Leo

    Hi Chris,
    Thank you for taking the time to answer my question.  It looks like the answer is no, num-exp does not support regular expressions.
    I don't insist on using num-exp for this I was just hoping to kill two birds with one stone and possibly skip the intermediate step of translating the 7-digits dial to 4-digits using a translation-rule just to expand from 4-digit to 7 again.  This is only an issue while in SRST if a user tries to dial using 7-digits.  We have a 7-digit internal dialing scheme and normally my num-exp is just to expand the 4 digits sent from the telco to our 7-digit internal dialing.  The problem is that both our prefix and part of our DID range start with 21 so while in SRST if a user tried to dial a 7-digit DN, say 2162154, after they dialed the 4th digit (2162) that pattern would hit the num-exp and get expanded to 2162162.  I was hoping to create a num-exp using a regular expression that would only expand a four digit string that begins with a 2 to seven digits and not any string that begins with a 2.  This would 1) expand the four digits sent from the telco and 2) not match a seven digit string that begins with a 2 such as 2162154 which may be dialed by a user.
    Again, this is only an issue while in SRST and I have a pretty good work around so I'm fine with not being able to use a regular expression as part of my num-exp config.  I just thought it would be a cool application of a regular expression if it was possible.
    Thanks again for answering my question.
    Leo

  • Regular expressions in Format Definition add-on

    Hello experts,
    I have a question about regular expressions. I am a newbie in regular expressions and I could use some help on this one. I tried some 6 hours, but I can't get solve it myself.
    Summary of my problem:
    In SAP Business One (patch level 42) it is possible to use bank statement processing. A file (full of regular expressions) is to be selected, so it can match certain criteria to the bank statement file. The bank statement file consists of a certain pattern (look at the attached code snippet).
    :61:071222D208,00N026
    :86:P  12345678BELASTINGDIENST       F8R03782497                $GH
    $0000009                         BETALINGSKENM. 123456789123456
    0 1234567891234560                                            
    :61:071225C758,70N078
    :86:0116664495 REGULA B.V. HELPMESTRAAT 243 B 5371 AM HARDCITY HARD
    CITY 48772-54314                                                  
    :61:071225C425,05N078
    :86:0329883585 J. MANSSHOT PATTRIOTISLAND 38 1996 PT HELMEN BIJBETA
    LING VOOR RELOOP RMP1 SET ORDERNR* 69866 / SPOEDIG LEVEREN    
    :61:071225C850,00N078
    :86:0105327212 POSE TELEFOONSTRAAT 43 6448 SL S-ROTTERDAM MIJN OR
    DERNR. 53846 REF. MAIL 21-02
    - I am in search of the right type of regular expression that is used by the Format Definition add-on (javascript, .NET, perl, JAVA, python, etc.)
    Besides that I need the regular expressions below, so the Format Definition will match the right lines from my bankfile.
    - a regular expression that selects lines starting with :61: and line :86: including next lines (if available), so in fact it has to select everything from :86: till :61: again.
    - a regular expression that selects the bank account number (position 5-14) from lines starting with :86:
    - a regular expression that selects all other info from lines starting with :86: (and following if any), so all positions that follow after the bank account number
    I am looking forward to the right solutions, I can give more info if you need any.

    Hello Hendri,
    Q1: I am in search of the right type of regular expression that is used by the Format Definition add-on (javascript, .NET, perl, JAVA, python etc.)
    Answer: Format Definition uses .NET regular expressions.
    You may refer the following examples. If necessary, I can send you a guide about how to use regular expression in Format Defnition. Thanks.
    Example 6
    Description:
    To match a field with an optional field in front. For example, ":61:0711211121C216,08N051NONREF" or ":61:071121C216,08N051NONREF", which comprises a record identification ":61:", a date in the form of YYMMDD, another optional date MMDD, one or two characters to signify the direction of money flow, a numeric amount value and some other information. The target to be matched is the numeric amount value.
    Regular expression:
    (?<=:61:\d(\d)?[a-zA-Z]{1,2})((\d(,\d*)?)|(,\d))
    Text:
    :61:0711211121C216,08N051NONREF
    Matches:
    1
    Tips:
    1.     All the fields in front of the target field are described in the look behind assertion embraced by (?<= and ). Especially, the optional field is embraced by parentheses and then a "?" (question mark). The sub expression for amount is copied from example 1. You can compose your own regular expression for such cases in the form of (?<=REGEX_FOR_FIELDS_IN_FRONT)(REGEX_FOR_TARGET_FIELD), in which REGEX_FOR_FIELDS_IN_FRONT and REGEX_FOR_TARGET_FIELD are respectively the regular expression for the fields in front and the target field. Keep the parentheses therein.
    Example 7
    Description:
    Find all numbers in the free text description, which are possibly document identifications, e.g. for invoices
    Regular expression:
    (?<=\b)(?<!\.)\d+(?=\b)(?!\.)
    Text:
    :86:GIRO  6890316
    ENERGETICA NATURA BENELU
    AFRIKAWEG 14
    HULST
    3187-A1176
    TRANSACTIEDATUM* 03-07-2007
    Matches:
    6
    Tips:
    1.     The regular expression given finds all digits between word boundaries except those with a prior dot or following dot; "." (dot) is escaped as \.
    2.     It may find some inaccurate matches, like the date in the text. If you want to exclude "-" (hyphen) as a prior or following character, resembling the case for "." (dot), the regular expression becomes (?<=\b)(?<!\.)(?<!-)\d+(?=\b)(?!\.)(?!-). The matches will be:
    :86:GIRO  6890316
    ENERGETICA NATURA BENELU
    AFRIKAWEG 14
    HULST
    3187-A1176
    TRANSACTIEDATUM* 03-07-2007
    You may lose some real values like "3187" before the "-".
    Example 8
    Description:
    Find BP account number in 9 digits with a prior "P" or "0" in the first position of free text description
    Regular expression:
    (?<=^(P|0))\d
    Text:
    0000006681 FORTIS ASR BETALINGSCENTRUM BV
    Matches:
    1
    Tips:
    1.     Use positive look behind assertion (?<=PRIOR_KEYWORD) to express the prior keyword.
    2.     "^" stands for starting the match from the beginning of the text. If the text includes the record identification, you may include it also in the look behind assertion. For example,
    :86:0000006681 FORTIS ASR BETALINGSCENTRUM BV
    The regular expression becomes
    (?<=:86:(P|0))\d
    Example 9
    Description:
    Following example 8, to find the possible BP name after BP account number, which is composed of letter, dot or space.
    Regular expression:
    (?<=^(P|0)\d)[a-zA-Z. ]*
    Text:
    0000006681 FORTIS ASR BETALINGSCENTRUM BV
    Matches:
    1
    Tips:
    1.     In this case, put BP account number regular expression into the look behind assertion.
    Example 10
    Description:
    Find the possible document identifications in a sub-record of the :86: record. A sub-record is like "?00", "?10" etc.  A possible document identification sub-record is made up of the following parts:
    •     keyword "RE", "RG", "R", "INV", "NR", "NO", "RECHN" or "RECHNUNG", and
    •     an optional group made up of the following:
         a separator of either a dot, hyphen or slash, and
         an optional space, and
         an optional string starting with keyword "NR" or "NO" followed by a separator of either a dot, hyphen or slash, and
         an optional space
    •     and finally the document identification in digits
    Regular expression:
    (?<=\?\d(RE|RG|R|INV|NR|NO|RECHN|RECHNUNG)((\.|-|/)\s?((NR|NO)(\.|-|/))?\s?)?)\d+
    Kind Regards
    -Yatsea
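    Java's java.util.regex supports the same lookahead/lookbehind constructs, so patterns in this style can be prototyped outside the add-on. A small sketch of the Example 7 idea (my own adaptation, using a plain \b instead of (?<=\b)):

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class NumberFinder {
        public static void main(String[] args) {
            String text = ":86:GIRO  6890316 ENERGETICA NATURA BENELU AFRIKAWEG 14 "
                        + "HULST 3187-A1176 TRANSACTIEDATUM* 03-07-2007";
            // Standalone digit runs that are not immediately preceded or followed by a dot.
            Matcher m = Pattern.compile("\\b(?<!\\.)\\d+\\b(?!\\.)").matcher(text);
            while (m.find()) {
                System.out.println(m.group());
            }
        }
    }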

  • Regular Expressions and Numbers in Strings

    Looking for someone who has had experience in using Regular Expressions in the Classification Rule Builder.
    We have an eVar that is collecting the number of search results in this fashion:
    <Total Results>_<# of Item 1>_<# of Item 2>_<# of Item 3>_<# of Item 4>
    Example output would look like this:
    150_50_0_25_75
    What we've done is initially create a Regular Expression that looks like this:
    ^(.+)\_(.+)\_(.+)\_(.+)\_(.+)$
    The problem is that, in situations where the output contains a zero in one of the slots, the value appears to be ignored and the slot receives the value from the next place over.  Using the example output shown above, I would end up with values like this:
    $0 150_50_0_25_75
    $1 150
    $2 50
    $3 25
    $4 75
    $5 {null}
    Here's the weird part.  When I perform a test of a single record, it appears like it will work just fine, but when it actually runs in Omniture, it's not working as expected.  Here's something else I'd like to know if it's possible to address.  The five-place string is only the newest iteration of this approach.  In the past, we started out with a two-place version, then three-place and then four.  Any recommendations for handling all scenarios?
    Any and all advice is welcome.  Thanks in advance!

    Doing some playing around on rubular.com and thinking the Regular Expression should be built this way instead:
    ^(\d+)\_(\d+)\_(\d+)\_(\d+)\_(\d+)$
    Again, still looking for any additional guidance from more experienced individuals.  Thanks!
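    I can't speak for how the Classification Rule Builder applies these internally, but as a quick sanity check of the patterns themselves, here is a minimal Java sketch. The split-based variant at the end is only one idea for coping with the older two-, three- and four-place values, not a rule builder feature:
     import java.util.regex.Matcher;
     import java.util.regex.Pattern;

     public class SearchResultsClassifier {
         public static void main(String[] args) {
             String value = "150_50_0_25_75";

             // The digit-only pattern proposed above; as a plain regex it captures the lone "0" in group 3.
             Matcher m = Pattern.compile("^(\\d+)_(\\d+)_(\\d+)_(\\d+)_(\\d+)$").matcher(value);
             if (m.matches()) {
                 for (int i = 1; i <= m.groupCount(); i++) {
                     System.out.println("$" + i + " = " + m.group(i)); // $1=150 $2=50 $3=0 $4=25 $5=75
                 }
             }

             // One way to cope with two- to five-place variants:
             // validate the overall shape, then split on the delimiter.
             if (value.matches("^\\d+(_\\d+){1,4}$")) {
                 String[] parts = value.split("_");
                 System.out.println("total = " + parts[0] + ", item slots = " + (parts.length - 1));
             }
         }
     }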

  • Regular expressions with boolean connectives (AND, OR, NOT) in Java?

    I'd like to use regular expression patterns that are made up of simple regex patterns connected via AND, OR, or NOT operators, in order to do some keyword-style pattern matching.
    A pattern could look like this:
    (.*Is there.*) && (.*library.*) && !((.*badword.*) || (^$))
    Is there any Java regex library that allows these operators?
    I know that in principle these operators should be available, since Regular languages are closed under union, intersection, and complement.

    AND is implicit, xy -- means x AND y
    That's not what I need, though, since this is just concatenation of a regex. Thus, /xy/ would not match the string "a y a x", because y precedes x.
    So it has to contain both x and y, but they could be in any order? You can't do that easily or generally. "x.*y|y.*x" would work here, but obviously it will get ugly factorially fast as you add more terms.
    You got that right: AND means the regex operands can appear in any order. That's why I'm looking for some regex library that does all this ugly work for me. Again, from a theoretical point of view, it IS possible to express the described semantics of AND with regular expressions, although they will get rather obfuscated.
    Unless somebody has done something similar in Java (e.g., for C++ there's Ragel: http://www.cs.queensu.ca/~thurston/ragel/), I will probably use some finite-state-machine library and compile the complex regexes into automata (which can be minimized using well-defined operations on FSMs).
    > You'd probably just be better off doing multiple calls to matches() or whatever.
    Yes, that's another possibility: do the boolean operators in Java itself.
    Of course, if you really are just looking for literals, then you can just use str.contains(a) && !str.contains(b) && (str.contains(c) || str.contains(d)). You don't seem to need regex--at least not from your example.
    OK, bad example, I do have "real" regexps in there :)

  • Help with Regular Expressions and regexp_replace

    Oh great Oracle Guru, can I get some help?
    I need to clean up the phone numbers that have been entered in the Oracle eBusiness per_phones table. Some of the phone numbers have dashes, some have spaces and some have other characters. I would just like to pull out the digits and then re-format the number.
    Ex.
    914-123-1234 .. output (914) 123-1234
    9141231234 ..again (914) 123-1234
    914 123 1234 .. (914) 123-1234
    myphone ... just null
    (914)-123-1234.. (914) 123-1234
    I really tried to understand the regular expression statements, but for some reason I just can't understand them.

    Hi,
    Welcome to the forum!
    I would create a user-defined function for this. I expect there will be a lot of exceptions to the regular rules (for example, strings that do not contain exactly 10 digits, such as '1-800-987-6543') that can be handled, but that would require lots of nested functions and other complicated code if you had to do it in a single statement.
    If you really want to do it with a regular expression:
    SELECT     phone_txt
    ,     REGEXP_REPLACE ( phone_txt
                     , '^\D*'          || -- 0 or more non-digits at the beginning of the string
                           '(\d\d\d)'     || -- \1 = 3 consecutive digits
                    '\D*'          || -- 0 or more non-digits
                           '(\d\d\d)'     || -- \2 = 3 consecutive digits
                    '\D*'          || -- 0 or more non-digits
    '(\d\d\d\d)'     || -- \3 = 4 consecutive digits
                    '\D*$'             -- 0 or more non-digits at the end of the string
                     , '(\1) \2-\3'
                     )          AS new_phone_txt
    FROM    table_x
    ;
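    Not part of the Oracle solution above, but if you ever need the same cleanup outside the database, a minimal Java sketch of the same idea (strip the non-digits, then reformat only when exactly ten digits remain) might look like this:
     public class PhoneFormatter {

         // Mirrors the idea of the REGEXP_REPLACE above: keep the digits,
         // and reformat only when exactly ten of them are found.
         static String formatPhone(String raw) {
             if (raw == null) {
                 return null;
             }
             String digits = raw.replaceAll("\\D", "");   // drop everything that is not a digit
             if (digits.length() != 10) {
                 return null;                             // e.g. "myphone", or numbers with more/fewer digits
             }
             return digits.replaceFirst("(\\d{3})(\\d{3})(\\d{4})", "($1) $2-$3");
         }

         public static void main(String[] args) {
             for (String s : new String[] {"914-123-1234", "9141231234", "914 123 1234", "myphone", "(914)-123-1234"}) {
                 System.out.println(s + " -> " + formatPhone(s));
             }
         }
     }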

  • Introduction to regular expressions ... continued.

    After some very positive feedback from Introduction to regular expressions ... I'm now continuing on this topic for the interested audience. As always, if you have questions or problems that you think could be solved through regular expression, please post them.
    Having fun with regular expressions - Part 2
    Finishing my example with decimal numbers, I thought about a method to test regular expressions. A question from another user who was looking for a way to show all possible combinations inspired me to write a small package.
    CREATE OR REPLACE PACKAGE regex_utils AS
      -- Regular Expression Utilities
      -- Version 0.1
      TYPE t_outrec IS RECORD(
        data VARCHAR2(255)
      );
      TYPE t_outtab IS TABLE OF t_outrec;
      FUNCTION gen_data(
        p_charset IN VARCHAR2 -- character set that is used for generation
      , p_length  IN NUMBER   -- maximum length of the generated strings
      ) RETURN t_outtab PIPELINED;
    END regex_utils;
    /
    CREATE OR REPLACE PACKAGE BODY regex_utils AS
    -- FUNCTION gen_data returns a collection of generated varchar2 elements
      FUNCTION gen_data(
        p_charset IN VARCHAR2 -- character set that is used for generation
      , p_length  IN NUMBER   -- maximum length of the generated strings
      ) RETURN t_outtab PIPELINED
      IS
        TYPE t_counter IS TABLE OF PLS_INTEGER INDEX BY PLS_INTEGER;
        v_counter t_counter;
        v_exit    BOOLEAN;
        v_string  VARCHAR2(255);
        v_outrec  t_outrec;
      BEGIN
        FOR max_length IN 1..p_length 
        LOOP
          -- init counter loop
          FOR i IN 1..max_length
          LOOP
            v_counter(i) := 1;
          END LOOP;
          -- start data generation loop
          v_exit := FALSE;
          WHILE NOT v_exit
          LOOP
            -- start generation
            v_string := '';
            FOR i IN 1..max_length
            LOOP
              v_string := v_string || SUBSTR(p_charset, v_counter(i), 1);
            END LOOP;
            -- set outgoing record
            v_outrec.data := v_string;
            -- now pipe the result
            PIPE ROW(v_outrec);
            -- increment loop
            <<inc_loop>>
            FOR i IN REVERSE 1..max_length
            LOOP
              v_counter(i) := v_counter(i) + 1;     
              IF v_counter(i) > LENGTH(p_charset) THEN
                 IF i > 1 THEN
                    v_counter(i) := 1;
                 ELSE
                    v_exit := TRUE;  
                 END IF;
              ELSE
                 -- no further processing required
                 EXIT inc_loop;  
              END IF;  
            END LOOP;        
          END LOOP; 
        END LOOP; 
        RETURN;
      END gen_data;
    END regex_utils;
    /
    This package is a brute force string generator using all possible combinations of characters from the given set, up to a maximum length. Together with the regular expressions, I can now show which combinations my solution would allow to pass. But see for yourself:
    SELECT *
      FROM (SELECT data col1
              FROM TABLE(regex_utils.gen_data('+-.0', 5))
           ) t
     WHERE REGEXP_LIKE(NVL(REGEXP_SUBSTR(t.col1,
                                         '^([+-]?[^+-]+|[^+-]+[+-]?)$'
                                        ), '#'
                          ),
                       '^[+-]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)[+-]?$'
                      )
    ;
    You will see some results which are perfectly valid for my definition of decimal numbers but haven't been mentioned, like '000' or '+.00'. From now on I will also use this package to verify the solutions I'll present to you and hopefully reduce my share of typos.
    Counting and finding certain characters or words in a string can be a tedious task. I'll show you how it's done with regular expressions. I'll start with an easy example, count all spaces in the string "Having fun with regular expressions.":
    SELECT NVL(LENGTH(REGEXP_REPLACE('Having fun with regular expressions', '[^ ]')), 0)
      FROM dual
      ;
    No surprise there. I'm replacing all characters except spaces with a null string. Since REGEXP_REPLACE assumes a NULL string as replacement argument, I can save on adding a third argument, which would look like this:
    REGEXP_REPLACE('Having fun with regular expressions', '[^ ]', '')
    So REPLACE will return all the spaces, which we can count with the LENGTH function. If there aren't any, I will get a NULL string, which is checked by the NVL function. If you want, you can play around by changing the space character to something else.
    A variation of this theme could be counting the number of words. Counting spaces and adding 1 to this result could be misleading if there are duplicate spaces. Thanks to regular expressions, I can of course eliminate duplicates.
    Using the old method on the string "Having fun with regular expressions" would return anything but the right number. This is, where Backreferences come into play. REGEXP_REPLACE uses them in the replacement argument, a backslash plus a single digit, like this: '\1'. To reference a string in a search pattern, I have to use subexpressions (remember the round brackets?).
    SELECT NVL(LENGTH(REGEXP_REPLACE('Having  fun  with  regular  expressions', '( )\1*|.', '\1')), 0)
      FROM dual
      ;
    You may have noticed that I changed from using the "^" NOT operator to using the "|" OR operator and the "." any-character placeholder. This neat little trick allows me to filter out all characters except the one we're looking for in the first place. "\1" as a backreference sits outside of our subexpression, since I don't want to count the trailing spaces, and it is used both in the search pattern and in the replacement argument.
    Still I'm not satisfied with this: What about leading/trailing blanks, what if there are any special characters, numbers, etc.? Finally, it's time to only count words. For the purpose of this demonstration, I define a word as one or more consecutive letters. If by now you're already thinking in regular expressions, the solution is not far away. One hint: you may want to check on the "i" match parameter which allows for case insensitive search. Another one: You won't need a back reference in the search pattern this time.
    Let's compare our solutions then, shall we?
    SELECT NVL(LENGTH(REGEXP_REPLACE('Having  fun  with  regular  expressions.  !',
                                     '([a-z])+|.', '\1', 1, 0, 'i')), 0)
      FROM dual;
    This time I don't use a backreference; the "+" operator (remember? one or more) will suffice. And since I want to count the occurrences, not the letters, I moved the "+" meta character outside of the subexpression. The "|." trick again proved to be useful.
    Case insensitive search does have its merits. It will only search for, but not transform, any found substring. If I want, for example, to extract any occurrence of the word fun, I'll just use the "i" match parameter and get this substring, whether it's written as "Fun", "FUN" or "fun". This can be very useful if you're looking, for example, for names of customers, streets, etc.
    Enough about counting, how about finding? What if I want to know the last occurrence of a certain character or string, for example the position of the last space in this string "Where is the last space?"?
    Addendum: Thanks to another forum member, I should mention that the INSTR function can do a reverse search by itself.
    WITH t AS (SELECT 'Where is the last space?' col1
                 FROM dual)
    SELECT INSTR(col1, ' ', -1)
      FROM DUAL;
    Now regular expressions are powerful, but there is no parameter that allows us to reverse the search direction. However, remembering that we have the "$" meta character that means (until the) end of string, all I have to do is use a search pattern that looks for a combination of space and non-space characters including the end of a string. Now compare the REGEXP_INSTR function to the previous solution:
    SELECT REGEXP_INSTR(t.col1, ' [^ ]*$')                       
      FROM t;
    So in this case, it'll remain a matter of taste what you want to use. If the search pattern has to look for the last occurrence of another regular expression, this is the way to solve such a requirement.
    One more thing about backreferences. They can be used for a sort of primitive "string swapping". If, for example, you have to transform column values like swapping first and last name, the backreference is your friend. Here's an example:
    SELECT REGEXP_REPLACE('John Doe', '^(.*) (.*)$', '\2, \1')
      FROM dual
      ;
    What about middle names, for example 'John J. Doe'? Look for yourself, it still works.
    You can even use that for strings with delimiters, for example reversing delimited "fields", turning this string '10~20~30~40~50' into '50~40~30~20~10'. Using REVERSE, I would get '05~04~03~02~01', so there has to be another way. Using backreferences, however, is limited to 9 subexpressions, which limits the following solution a bit if you need to process strings with more than 9 fields. If you want, you can think this example through and see if your solution matches mine.
    SELECT REGEXP_REPLACE('10~20~30~40~50',
                          '^(.*)~(.*)~(.*)~(.*)~(.*)$',
                          '\5~\4~\3~\2~\1'
                         )
      FROM dual;
    After what you've learned so far, that wasn't too hard, was it? Enough for now ...
    Continued in Introduction to regular expressions ... last part..
    C.
    Fixed some typos and a flawed example ...
    cd

    Thank you very much C. Awaiting other parts.... keep going.
    One German typo :-)
    "I'm replacing all characters except spaces mit a null string."
    I received a functional spec from my Dutch analyst in which it was written:
    tnsnames voor EDWH:
    PCESCRD1 = (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)
                                                   (HOST=blah.blah.blah.com)
                                                   (PORT=5227)))
               (CONNECT_DATA=(SID=pcescrd1)))
    db user: BW_I2_VIEWER  / BW_I2_VIEWER_SCRD1
    Had to look for translators.
    Cheers
    Sarma.

  • Converting String Characters into Regular Expression Automatically?

    Hi guys... is there any program or sample code available that converts a string of characters into a regular expression automatically when the program is run?
    Example:
    String Character Input: fnffffffffffnnnnnnnnnffffffnnfnnnnnnnnnfnnfnfnfffnfnfnfnfnfnnnnd
    When the program runs, it automatically convert into this :
    Regular Expression Output: f*d

    Hey guys, I am sorry for not providing all the information you need, as I was rushing off to an urgent meeting. For my string characters I only have a to n. All these characters are collected from sensors and stored in a database. From the many demos I have done, I found out that every demo collects a different string of characters, and these strings will not match the regular expressions I had created, due to several unwanted inputs and such. I have a lot of different types of plan activities and therefore a lot of regular expressions. If I put [a-z|0-9]*, it will capture all characters, but at the same time it will show only one plan. Therefore, I am looking for a way to take the strings I collected and let them form into regular expressions by themselves in the program, so that they appear as different plans in the output instead of being compared against the regular expressions I had created. Is there any way to do so?
    Please post again if there is anything you are still not clear about... thank you...
