Regular Expression - Select two words after specific string

Hi,
I am trying to select the two words/strings after the first word "door". I am using the search pattern (?<=door).\w+ but in this case I get the complete text after the word "door". I only want to select the two words after the first "door" in the complete text.
Can anybody help me?
Thanks!
Marco Snels

Hi Marco,
I'm relatively handy with RegEx but this seems like a problem where I would employ a little bit of RegEx and CTL, just to make life easier.
You can use the following RegEx (note: I didn't test this in Integrator, only in a RegEx testing tool) to extract the two words after door (but including door, unfortunately):
(?:door)[\s]\w+[\s]\w+
This would give you something like the following in your extracted field:
door is brown
You could then pass the result through a reformatter to remove "door" and the whitespace, and be on your way. It's not the best answer, but it should perform reasonably well and get you up and running.
Regards,
Patrick Rafferty
http://branchbird.com
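If the tool supports capture groups, the two words can be pulled out directly, so there is no need to strip "door" afterwards. Here is a minimal sketch in Java (not CTL, and not tested in Integrator). It assumes words are separated by whitespace and made of \w characters. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoWordsAfterDoor {
    // \bdoor anchors on a word that starts with "door", \s+ requires whitespace right
    // after it (so "doors" is skipped), and each (\w+) group captures one following word.
    private static final Pattern AFTER_DOOR = Pattern.compile("\\bdoor\\s+(\\w+)\\s+(\\w+)");

    // Returns the two words after the first "door", or null if nothing matches.
    static String twoWordsAfterFirstDoor(String text) {
        Matcher m = AFTER_DOOR.matcher(text);
        // find() stops at the first match, so later occurrences of "door" are ignored.
        if (!m.find()) {
            return null;
        }
        return m.group(1) + " " + m.group(2);
    }

    public static void main(String[] args) {
        System.out.println(twoWordsAfterFirstDoor("The door is brown and the back door is white"));
        // prints "is brown"
    }
}
```

If a tool only returns the whole match rather than groups, a fixed-length lookbehind such as (?<=\bdoor\s)\w+\s\w+ should give the same two words without "door" in front.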

Similar Messages

Regular expression help please. (extracting a string subset between two markers)

I haven't used regular expressions before, and I'm having trouble finding a regular expression to extract a string subset between two markers.
The string;
Header stuff I don't want
Header stuff I don't want
Header stuff I don't want
Header stuff I don't want
Header stuff I don't want
Header stuff I don't want
ERRORS 6
Info I want line 1
Info I want line 2
Info I want line 3
Info I want line 4
Info I want line 5
Info I want line 6
END_ERRORS
From the string above (this is read from a text file) I'm trying to extract the string subset between ERRORS 6 and END_ERRORS. The number of errors (6 in this case) can be any number 1 through 32, and the number of lines I want to extract will correspond with this number. I can supply this number from a calling VI if necessary.
My current solution, which works but is not very elegant;
(1) uses Match Regular Expression to return the string after matching ERRORS 6
(2) uses Match Regular Expression on the string returned by (1), returning all characters before the match END_ERRORS
Is there a way this can be accomplished using one Match Regular Expression? If so, could anyone suggest how, together with an explanation of how the given regular expression works?
Many thanks
Alan

I used a character class to catch any word or whitespace character. Putting this inside parentheses makes a submatch that you can get by expanding the Match Regular Expression node. The \d finds digits, and each of the two *s repeats the preceding term zero or more times. So \d* will find the '6' as well as '123456'.
Jim
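Here is the same single-regex idea (a \d run for the error count, plus a parenthesised word/whitespace character class as the submatch), sketched in Java. It is not necessarily Jim's exact expression, and I haven't checked it inside Match Regular Expression. The class and method names are hypothetical. It assumes the wanted lines contain only word characters and whitespace.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ErrorBlockExtractor {
    // "ERRORS \d+" matches the header for any count; ([\w\s]*?) lazily captures
    // word/whitespace characters up to the END_ERRORS marker.
    private static final Pattern BLOCK =
            Pattern.compile("ERRORS \\d+\\s+([\\w\\s]*?)\\s*END_ERRORS");

    // Returns the lines between the markers, or null if the markers are not found.
    static String extract(String text) {
        Matcher m = BLOCK.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "Header stuff I don't want",
                "Header stuff I don't want",
                "ERRORS 3",
                "Info I want line 1",
                "Info I want line 2",
                "Info I want line 3",
                "END_ERRORS");
        System.out.println(extract(text));
    }
}
```

If the error lines can contain punctuation (colons, quotes, dots), [\w\s] will stop the match. A variant such as (?s)ERRORS \d+\s+(.*?)\s*END_ERRORS accepts any characters. Note that neither version uses the count to limit the number of lines.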

Regular Expression to capture user's input string

I am writing a helper class to split a user input string into a String array according to the following pattern:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class TestDelimiter {
    static Pattern p = Pattern.compile("(.*?)[,;]{1}");
    static Matcher m;
    public static void main(String[] args) {
        String input = "AAA111111,BBB222222;CCC333333";
        m = p.matcher(input);
        while (m.find()) {
            String output = m.group(1);
            System.out.println(output);
        }
    }
}
Output:
AAA111111
BBB222222
My question is: how can I modify the regular expression so that CCC333333 (the last element) is also included?

roamer wrote:
>
Ok, let's not argue on this point.
>
Who's arguing?
>
I think I got the answer. For simplicity, I can just manually add a ";" or "," string after each user input. Just like:
>
You never said anything about that before. You asked how to split "AAA111,BBB222;CCC333" into its components, and you were given a correct answer.
Maybe you should rephrase your question to include this added requirement. Do you need the separator included in the final output, or why do you suddenly want to add things to the user input?
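Here is one common way to get the last element without touching the user input. It is not necessarily the answer given earlier in the thread: match the elements themselves ([^,;]+) instead of "something followed by a separator". This is a minimal sketch, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimiterSplit {
    // Match runs of characters that are NOT separators, so the last
    // element needs no trailing ',' or ';'.
    private static final Pattern TOKEN = Pattern.compile("[^,;]+");

    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "AAA111111,BBB222222;CCC333333";
        tokens(input).forEach(System.out::println);
        // String.split with a character-class delimiter yields the same three elements here.
        System.out.println(String.join("|", input.split("[,;]")));
    }
}
```

(.*?)(?:[,;]|$) looks like a smaller change, but with find() it also produces an extra empty match at the end of the input. Also, split("[,;]") returns empty strings for adjacent separators such as "A,,B", whereas the find() loop above skips them.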

Regular Expression for non-words

hello all!
can you help me construct a regular expression that will match non-word strings say "��". I will be needing this to filter words from a Microsoft Word Document.
Thanx!

I don't think this is a problem that should be solved with regex. You would have to convert your Word document to a String and use replaceAll() with "\\W" as the regex.
Correct me if I am wrong, but I thought that Word files were binary, so your first problem will be converting the file(s) to a String.
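Below is the replaceAll() suggestion, sketched on a plain Java String. Getting the text out of the Word file is a separate step that is not shown. The sketch contrasts \W, which also removes spaces, with [^\w\s], which keeps words and whitespace. The class and method names are hypothetical.

```java
class NonWordFilter {
    // \W matches any character outside [a-zA-Z0-9_], including spaces and punctuation.
    static String stripAllNonWord(String s) {
        return s.replaceAll("\\W", "");
    }

    // [^\w\s] keeps word characters and whitespace and removes only symbols/punctuation.
    static String stripSymbols(String s) {
        return s.replaceAll("[^\\w\\s]", "");
    }

    public static void main(String[] args) {
        String text = "Hello, world! \u00A7 42";
        System.out.println(stripAllNonWord(text)); // Helloworld42
        System.out.println(stripSymbols(text));
    }
}
```

In Java, \w is ASCII-only by default, so accented letters would also be treated as non-word characters. The Pattern.UNICODE_CHARACTER_CLASS flag (or (?U) in the pattern) changes that.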

Regular Expression to Locate Words with Character

I want to identify all the words in a document that are followed by the register mark (®) symbol.
I built what I thought was a regular expression that would search for a register mark preceded by alphanumeric characters and a space. So if my text contained the sentence "Adobe InDesign® is a great product.", the regular expression would find "InDesign®".
Below is the regular expression I composed. It grabs anything with a register mark, not just the register marks preceded by a space and alphanumeric characters. Where did I go wrong? I thought the \s would restrict the search to complete words with a register mark.
\s[a-zA-Z0-9]|®

\s is the special GREP code for "any kind of space" -- a regular space, a tab, a hard return, or any of ID's own white space codes. It has nothing to do with "complete words", because a word can appear at the start of a story, without any preceding space. It would also not find "InDesign®" because there is no space before it; there is a double quote instead.
Your GREP does not work because, well, you got the general idea (words may consist of the set of characters "a-z", "A-Z", and "0-9") but since you use the [..] without any other code, GREP will apply this rule once -- per character. If you want to find words of more than one character, you need to tell GREP "one or more of these, please": with a +.
Second, where did that | come from? It's the OR operator. Essentially, you are looking for
any space followed by one character from the set "a-z", "A-Z", and "0-9"
OR
the ® character
The 'word break' you were looking for is this code: \b, so you could search for "\b[a-zA-Z0-9]+" (note the '+' to allow more than one instance) -- but it's not necessary, because by default GREP grabs as much as it can. The set 'a-zA-Z0-9' etc. describes the allowed "word" characters, but you might prefer these: \l (ell) and \u for all lowercase and all uppercase characters -- they are shorter, and they automatically include accented characters, Greek, Russian, and a lot more. Similarly, \d (for "digits") is the shortcut for "0-9". And even better: \w is the shortcut for "word character", i.e., your set, but shorter and a bit better.
Try this one:
\w+~r
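For comparison, here is the same \w+~r search written as a Java regex. ~r is InDesign GREP's code for the registered mark, and Java has no such shortcut, so the sketch uses the regex escape for U+00AE instead. It assumes ASCII words, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegisteredWords {
    // \w+ = one or more word characters, followed by the registered sign (U+00AE).
    private static final Pattern MARKED = Pattern.compile("\\w+\\u00AE");

    // Collects every word that is immediately followed by the registered sign.
    static List<String> find(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = MARKED.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(find("Adobe InDesign\u00AE is a great product, and so is \"Acrobat\u00AE\"."));
    }
}
```

As in the GREP version, no leading space or \b is needed, because \w+ already grabs the whole word, even after a double quote. Java's \w does not cover accented letters unless Pattern.UNICODE_CHARACTER_CLASS is set.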

Regular expression breaks with \00 in input string

I wrote the following code to illustrate the problem.
Clearly the match should occur in both cases, but it does not occur if \00 is in the input string (within the < > ).
I am not even searching for \00 or \01. This makes regular expression matching rather useless if you have text that contains data in hex form. I am using LabVIEW 8.2. Is this a known bug? Is there a workaround?
Tammo
Message Edited by Tammo on 03-06-2008 12:26 PM
Attachments:
RegEx1.vi (56 KB)
BlockDiagram1.jpg (18 KB)
FrontPanel1.jpg (18 KB)

There was a brief discussion on this not too long ago.

Regular expression to check the sequence of strings

HI,
I am validating an EDI document like the following:
ISA*XX*XXXXXXXXXXXXXXX*XX*XXXXXXXXXXXXXXX*030130*0912*~IEA*1*000005900~
I need to create a regular expression that checks that ISA always occurs before IEA.
Please help me if you have any hints.
Thanks.

Thanks. I had taken that into consideration.
But using a regular expression, I would like to say:
ISA* only once
IEA* only once
and
ISA before IEA
Any hints on how to specify "before" (i.e. the sequence) using a regular expression?
I agree that sometimes basic String operations can do this.
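In a regex, "before" comes from the order in which things are written: the pattern only matches if ISA matches first and IEA later. Below is a minimal Java sketch under several assumptions. Every segment is terminated by ~, there are no line breaks, and the interchange must start with exactly one ISA segment and end with exactly one IEA segment, with any other segments in between. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class IsaIeaOrder {
    private static final Pattern ENVELOPE = Pattern.compile(
            "ISA\\*[^~]*~"                      // first segment must be ISA
          + "(?:(?!ISA\\*|IEA\\*)[^~]*~)*"      // middle segments: no further ISA or IEA
          + "IEA\\*[^~]*~");                    // last segment must be IEA

    // matches() requires the whole document to fit the pattern, which also
    // enforces "only once" for both ISA and IEA.
    static boolean isValid(String doc) {
        return ENVELOPE.matcher(doc).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("ISA*XX*XXXXXXXXXXXXXXX*XX*XXXXXXXXXXXXXXX*030130*0912*~IEA*1*000005900~"));
        System.out.println(isValid("IEA*1*000005900~ISA*XX*030130~"));
    }
}
```

The negative lookahead (?!ISA\*|IEA\*) is what stops a second ISA or IEA from sneaking in as a "middle" segment. Real interchanges with line breaks between segments would need the pattern adjusted.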

Regular Expression Abbreviation of Words

Suppose I have data in my column like:
Balla Ram Chog Mal College
Maharishi Dayanand University
Cambridge Public School
Now I want to write a query using regular expressions to find the abbreviations, e.g. the resulting data set should be:
BRCMC
MDU
CPS
How should I write a regexp for it?

One way, using SUBSTR and INSTR, tested on 10g.
with data as (
select 'Balla Ram Chog Mal College' col from dual union all
select 'Maharishi Dayanand University' col from dual union all
select 'Cambridge Public School' col from dual
)
select col, replace(ltrim(max(sys_connect_by_path(str, ',')) keep (dense_rank last order by r), ','), ',') abbr
from (
select col, substr(col, decode(level, 1, 1, instr(col, ' ', 1, level - 1) + 1), 1) str, level, row_number() over (partition by col order by level) r
from data
connect by level <= length(col) - length(replace(col, ' ')) + 1
       and col = prior col
       and prior sys_guid() is not null
order by col, level
)
group by col
start with r = 1
connect by r - 1 = prior r
       and col = prior col
       and prior sys_guid() is not null;
COL                           ABBR
Balla Ram Chog Mal College    BRCMC
Cambridge Public School       CPS
Maharishi Dayanand University MDU
With 11g, you will not need the outer query to concatenate the results; you can use LISTAGG directly, as demonstrated by Hashim.
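The question asked for a regular expression. The abbreviation can also be produced with a single regex replacement that keeps the first character of each word. Here is a sketch in Java. The class and method names are hypothetical. In Oracle 10g and later, REGEXP_REPLACE(col, '(\w)\w*\s*', '\1') should express the same idea, but I have not run it.

```java
class Abbreviator {
    // (\w) captures the first character of a word, \w* consumes the rest of it,
    // and \s* eats the following spaces; "$1" keeps only the captured character.
    static String abbreviate(String name) {
        return name.replaceAll("(\\w)\\w*\\s*", "$1");
    }

    public static void main(String[] args) {
        String[] names = {
            "Balla Ram Chog Mal College", "Maharishi Dayanand University", "Cambridge Public School"
        };
        for (String n : names) {
            System.out.println(n + " -> " + abbreviate(n));
        }
    }
}
```

Words are assumed to be separated by whitespace only. A hyphenated name such as "Jean-Paul" would count as two words, because the hyphen ends the \w* run.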

Regular Expression to split words

Hi all,
I want to extract the last word of a string, i.e. everything after the last space. The string is at most five words long.
I used the following query, but it does not work correctly:
SQL> SELECT REGEXP_SUBSTR('system hello sidval',
  '[a-z]+\S+') RESULT
  FROM DUAL;
RESULT
system
Examples:
1- If the string is "David from uk", the output is "uk".
2- If the string is "David john", the output is "john".
The maximum length of the string is five words.
Regards
Edited by: Ayham on Oct 7, 2012 12:01 PM

Try this:
SQL> SELECT REGEXP_SUBSTR('system hello sidval', '[a-z]+\S*$') RESULT FROM DUAL;
The extra $ anchors the match to the end of the string. Using * instead of + makes the trailing \S part optional, so a last word made only of letters (even a single letter) still matches.
bye
TPD
Edited by: TPD Opitz-Consulting com on 07.10.2012 21:35
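If "last word" simply means the last run of non-space characters, \S+ anchored at the end is enough. Here is a sketch in Java. The class and method names are hypothetical. A similar pattern such as '[^ ]+$' in REGEXP_SUBSTR is the obvious SQL counterpart, but I haven't tested it there.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastWord {
    // (\S+) = a run of non-space characters; \s*$ tolerates trailing spaces before the end.
    private static final Pattern LAST = Pattern.compile("(\\S+)\\s*$");

    // Returns the last word, or null for an empty or all-blank string.
    static String lastWord(String s) {
        Matcher m = LAST.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(lastWord("system hello sidval"));
        System.out.println(lastWord("David from uk"));
        System.out.println(lastWord("David john"));
    }
}
```

Unlike [a-z]+, \S+ also accepts uppercase letters and digits, which may or may not be what you want for your data.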

Replace after specific string

Hello people,
I have the below query on 9i:
SELECT ERRMESSAGE FROM USER_LOG;
And it will return:
http://somepath/somepath/somepath/somepath.aspx form visit at:prod - Client:10.10.101.243
http://someotherpath/someotherpath/someotherpath/someotherpath.aspx form visit at:prod - Client:10.20.11.203
2500 more records
I want to remove everything after the .aspx so that only the URL part remains, e.g.:
http://somepath/somepath/somepath/somepath.aspx
http://someotherpath/someotherpath/someotherpath/someotherpath.aspx
...
Thank you in advance for your help

>
drbiloukos wrote:
After the trim I saw that I also have .asp URLs.
How can I have both reported?
>
Try this way.
with user_log as
(   select 'http://somepath/somepath/somepath/somepath.aspx form visit at:prod - Client:10.10.101.243' ERRMESSAGE from dual union all
    select 'http://someotherpath/someotherpath/someotherpath/someotherpath.aspx form visit at:prod - Client:10.20.11.203' ERRMESSAGE from dual union all
    select 'http://someotherpath/someotherpath/someotherpath/someotherpath.asp form visit at:prod - Client:10.20.11.203' ERRMESSAGE from dual
)
SELECT
   CASE
    -- test '.aspx' before '.asp', because '.asp' is also found inside '.aspx'
    WHEN INSTR(errmessage,'.aspx')>0 THEN
      -- "- 1" makes the result end on the last character of the extension
      SUBSTR(errmessage,1,INSTR(errmessage,'.aspx') + LENGTH('.aspx') - 1)
    WHEN INSTR(errmessage,'.asp')>0 THEN
      SUBSTR(errmessage,1,INSTR(errmessage,'.asp') + LENGTH('.asp') - 1)
    ELSE
      errmessage
    END AS   trimmed_errmessage
FROM
    user_log
/
Edited by: Lokanath Giri on 13 December 2011, 4:38 PM
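Oracle 9i has no REGEXP functions (they arrived in 10g), which is why the answer uses INSTR and SUBSTR. For comparison, here is the regex version of the same trim, sketched in Java. The class and method names are hypothetical. On 10g and later, REGEXP_REPLACE(errmessage, '(\.aspx?)\s.*$', '\1') should be the equivalent, though I haven't run it.

```java
class AspxTrim {
    // (\.aspx?) matches ".asp" with an optional "x"; \s.*$ is the space and the rest of
    // the line, which is dropped. Lines without ".asp"/".aspx" are returned unchanged.
    static String trim(String msg) {
        return msg.replaceFirst("(\\.aspx?)\\s.*$", "$1");
    }

    public static void main(String[] args) {
        System.out.println(trim("http://somepath/somepath/somepath/somepath.aspx form visit at:prod - Client:10.10.101.243"));
        System.out.println(trim("http://someotherpath/someotherpath/someotherpath/someotherpath.asp form visit at:prod - Client:10.20.11.203"));
    }
}
```

The optional x? handles both extensions in one pattern, so the ordering issue in the CASE expression does not arise.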

Regular Expression string

Hi,
I'm looking for a single regular expression.
For example, my string is 'get data for comparison'. When I use the regular expression \Wcompar, it returns true.
But I don't want it to return true for the string 'get data for comparable' with the same regular expression.
Here, I do not want the string 'comparable', or any of the other forms of 'compar', to be selected.

It's not really clear what you need, perhaps something like this?
-- data:
with t as (
select 'get data for comparison' txt from dual union all
select 'get date for compar xxxxx' txt from dual union all
select 'compar xx' txt from dual union all
select 'compare xx' txt from dual union all
select 'xx compar' txt from dual union all
select 'xx compare' txt from dual
)
-- query:
select txt from t
where regexp_like(txt,'(\W|^)compar(\W|$)');
TXT
get date for compar xxxxx
compar xx
xx compar
(\W|^) means a non-word character or the start of the string.
(\W|$) means a non-word character or the end of the string.
Edited by: hm on 21.12.2011 05:29
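As far as I know, Oracle's regex dialect has no \b word boundary, which is why the answer spells it out as (\W|^) and (\W|$). Java does support \b. This sketch runs both forms over the sample rows for comparison. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class WholeWordCompar {
    // The answer's portable form: a non-word character or start/end of input on each side.
    static final Pattern ORACLE_STYLE = Pattern.compile("(\\W|^)compar(\\W|$)");
    // Java's \b word boundary expresses the same condition more directly.
    static final Pattern WORD_BOUNDARY = Pattern.compile("\\bcompar\\b");

    // find() searches anywhere in the string, like REGEXP_LIKE.
    static boolean matches(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "get data for comparison", "get date for compar xxxxx", "compar xx",
            "compare xx", "xx compar", "xx compare"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + matches(ORACLE_STYLE, s) + " / " + matches(WORD_BOUNDARY, s));
        }
    }
}
```

On these rows, both patterns accept exactly the three lines returned by the SQL query.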

Dumbfounded by Scanner processing String using regular expression

I was reading Bruce Eckel's book when I came across something interesting: extending Scanner with regular expressions. Unfortunately, I ran into an issue that doesn't make much sense to me: if the String that I am scanning contains an apostrophe, the Scanner doesn't produce anything. As soon as I take it out, it all works like a charm. Here is my example:
import java.util.Scanner;
import java.util.regex.*;
public class StringScan {
    public static void main(String[] args) {
        String input = "there's one caveat when scanning with regular expressions";
        Scanner scanner = new Scanner(input);
        String pattern = "[a-z]\\w+";
        while (scanner.hasNext(pattern)) {
            scanner.next(pattern);
            MatchResult match = scanner.match();
            String output = match.group();
            System.out.println(output);
        }
    }
}
What could be the reason? I imagined the apostrophe might for some reason be given a special meaning, but when I tried escaping it, it still didn't work.

Thanks for your prompt reply.
I have figured out what was wrong with my code, by the way. Since a single quote is not a word character, "there's" does not match [a-z]\w+. And because the very first token does not match, the scanner stops immediately. I rewrote my regex as "[a-z].*" and now it works.
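hasNext(pattern) tests the whole next token (tokens are whitespace-delimited by default), and the loop ends at the first token that fails. This sketch makes that visible by collecting the tokens for the original pattern and for a tighter alternative. [a-z][\w']* allows the apostrophe explicitly instead of accepting anything after the first letter, as "[a-z].*" does. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerTokens {
    // Collects tokens for as long as the next whole token matches the pattern.
    // Scanner stops at the first token that fails, so one bad token at the
    // start produces no output at all.
    static List<String> scan(String input, String pattern) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNext(pattern)) {
                tokens.add(scanner.next(pattern));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "there's one caveat when scanning with regular expressions";
        System.out.println(scan(input, "[a-z]\\w+"));    // [] because "there's" fails first
        System.out.println(scan(input, "[a-z][\\w']*")); // all eight words
    }
}
```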

EBS Search String for two words separate by other characters

Hi
Is there a way to write a search string which will recognize two words in a string?
For example :
PAYPAL TRANSFER WTT489798709898BOOK
I need a search string which will recognize PAYPAL and BOOK
I have tried (|)PAYPAL + BOOK( |$) and (|)PAYPAL / BOOK( |$), but the test is not successful.
Any help is appreciated.

Hi
Does SAP treat multiple record 88 lines on the EBS as one continuous line for a single 16 record?
I used the string ^PAYPAL ?*DOTCOM PARTNERS LP$ based on your suggestion and uploaded an EBS statement.
The system did not post anything, even though the string test displays the correct result when I enter the following line:
:PAYPAL TRANSFER 110403 522222222454W DOTCOM PARTNERS LP/
Our EBS statement has the following data:
16,169,20000,,,00000000000,/
88,OTHER REFERENCE:IA0000122222222/
88,PAYPAL TRANSFER 110403 522222222454W DOTCOM PARTNERS LP/
Any advice is appreciated.
Edited by: Kirti Bhardwaj on Jul 5, 2011 9:20 PM

Regular expression for 2nd occurrence of a substring in a string

Hi,
1)
I want to find the second occurrence of a substring in a string with a regular expression, so that I can modify only that one.
Ex: I have a string like ---> axe,afn,sdk,jdi,afn,mki,mki
In this string I want to change only the second occurrence of afn.
Which regular expression do I have to use?
Note that I have to use regular expressions only, no string manipulation methods (strictly).
2)
How can I apply multiple regular expressions to a single string? I.e., in the above instance I have to apply the same second-occurrence logic to the substring mki as well.
For this I have to use a single regular expression string that handles both substrings, mki and afn.
Thanks in advance,
Venkat

What do you mean by using a regex to get the index of a second substring? There is no method in Java that uses a regex to get the index of a substring.
There are various indexOf(...) methods for this:
String text = "axe,afn,sdk,jdi,afn,mki,mki";
String target = "afn";
int second = text.indexOf(target, text.indexOf(target) + 1);
System.out.println("second=" + second);
Of course, you can find the index of a group like this:
Matcher m = Pattern.compile(target + ".*?(" + target + ")").matcher(text);
System.out.println(m.find() ? "index=" + m.start(1) : "nothing found");
but there is no single method that handles this: you'll have to call find() and then start(...) on the Matcher instance, so the indexOf(...) approach is the preferable one, IMO.
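The question is about modifying the second occurrence, not just locating it. A regex-only way is to capture everything from the first occurrence up to the second and write that capture back in front of the replacement. Here is a minimal Java sketch. The class, method and replacement values are hypothetical. For part 2, it simply applies the same replacement once per target rather than trying to combine both targets into one expression.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SecondOccurrence {
    // Replaces only the second occurrence of target. Group 1 captures the first
    // occurrence plus everything up to the second one and is written back unchanged.
    static String replaceSecond(String text, String target, String replacement) {
        String t = Pattern.quote(target);                 // treat target as a literal
        Pattern p = Pattern.compile("(" + t + ".*?)" + t); // lazy .*? stops at the 2nd hit
        return p.matcher(text).replaceFirst("$1" + Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String text = "axe,afn,sdk,jdi,afn,mki,mki";
        String once = replaceSecond(text, "afn", "AFN2");
        System.out.println(once);                               // axe,afn,sdk,jdi,AFN2,mki,mki
        System.out.println(replaceSecond(once, "mki", "MKI2")); // axe,afn,sdk,jdi,AFN2,mki,MKI2
    }
}
```

The target is matched as a plain substring, so it would also be found inside a longer token such as "safn". If that matters, add delimiters or \b around it.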

A regular expression to detect a blank string...

Anyone know how to write a regular expression that will detect a blank string?
I.e., in a webservice xsd I'm adding a restriction to stop the user specifying a blank string for an element in the webservice operation call.
But I can't figure out a regular expression that will detect an entirely blank string but that will on the other hand allow the string to contain blank spaces in the text.
So the restriction should not allow "" or " " to be specified but will allow "Joe Bloggs" to be specified.
I tried [^ ]* but this won't allow "Joe Bloggs" to pass.
Any ideas?
Thanks,
Ruairi.

Hi ruairiw,
there is a shortcut for the set of whitespace characters in Java: the expression \s, which is equivalent to [ \t\n\x0B\f\r].
With this expression you can test whether a String consists only of whitespace characters.
Example:
String regex = "\\s*"; // the backslash must be escaped in a Java string literal
// equivalent to: String regex = "[ \\t\\n\\x0B\\f\\r]*";
System.out.println("".matches(regex)); // true
System.out.println(" ".matches(regex)); // true
System.out.println(" \r\n\t ".matches(regex)); // true
System.out.println("\n\nTom".matches(regex)); // false
System.out.println(" Tom Smith".matches(regex)); // false
Best Wishes
esprimo
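An XSD restriction describes the values that are allowed, so the pattern needs to be the inverse of \s*: "at least one non-whitespace character somewhere". Here is a Java sketch of that check. The class and method names are hypothetical. Like an XSD pattern facet, matches() must cover the whole value. In a schema, the same idea would be written roughly as <xs:pattern value="\s*\S[\s\S]*"/>, but I haven't validated that against a schema processor.

```java
import java.util.regex.Pattern;

class NonBlankCheck {
    // \s* skips leading whitespace, \S requires one non-whitespace character,
    // and [\s\S]* accepts anything after it, including line breaks.
    private static final Pattern NON_BLANK = Pattern.compile("\\s*\\S[\\s\\S]*");

    // True if the value contains at least one non-whitespace character.
    static boolean isAcceptable(String s) {
        return NON_BLANK.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] values = {"", " ", " \r\n\t ", "Joe Bloggs", "\n\nTom"};
        for (String v : values) {
            System.out.println("[" + v + "] -> " + isAcceptable(v));
        }
    }
}
```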
