Regular Expression to split words

Hi all,
I want to extract the last word of a string, i.e. everything after the last space. The string is at most five words long.
I used the following query, but it does not work:
SQL> SELECT REGEXP_SUBSTR('system hello sidval',
                          '[a-z]+\S+') RESULT
     FROM DUAL;
RESULT
system
SQL>
Examples:
1- If the string is
David from uk
the output is
uk
2- If the string is
David john
the output is
john
The maximum length of the string is five words.
Regards
Edited by: Ayham on Oct 7, 2012 12:01 PM
Edited by: Ayham on Oct 7, 2012 12:18 PM

Ayham wrote:
Hi all,
I want to extract the last word of a string, i.e. everything after the last space. The string is at most five words long.
I used the following query, but it does not work.
Try this:
SQL> SELECT REGEXP_SUBSTR('system hello sidval', '[a-z]+\S*$') RESULT FROM DUAL;
The extra $ anchors the match to the end of the line. Using * instead of + makes the trailing \S part optional, so the pattern also matches when the last word is a single letter.
bye
TPD
Edited by: TPD Opitz-Consulting com on 07.10.2012 21:35
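For readers who want the same thing outside the database, here is a minimal Java sketch of the end-anchored idea. It is not the poster's query: the class and method names are made up for illustration, and it assumes words are separated by whitespace.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastWordDemo {
    // \S+ is a run of non-whitespace characters, $ anchors it to the end of the input.
    private static final Pattern LAST_WORD = Pattern.compile("\\S+$");

    /** Returns the last whitespace-delimited word, or "" if the text has no word. */
    static String lastWord(String text) {
        Matcher m = LAST_WORD.matcher(text.trim());
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(lastWord("system hello sidval"));
        System.out.println(lastWord("David from uk"));
        System.out.println(lastWord("David john"));
    }
}
```

Trimming first means a trailing space does not hide the last word.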

Similar Messages

Regular Expression to Locate Words with Character

I want to identify all the words in a document that are followed by the register mark (®) symbol.
I built what I thought was a regular expression that would search for a register mark preceded by alphanumeric characters and a space. So if my text contained the sentence "Adobe InDesign® is a great product.", the regular expression would find "InDesign®".
Below is the regular expression I composed. It grabs anything with a register mark, not just the register marks preceded by a space and alphanumeric characters. Where did I go wrong? I thought the \s would restrict the search to complete words with a register mark.
\s[a-zA-Z0-9]|®

\s is the special GREP code for "any kind of space" -- a regular space, a tab, hard return, or any of ID's own white space codes. It has nothing to do with "complete words", because a word can appear at the start of a story, without any preceding space. It would also not find "InDesign®" because there is no space before it, there is a double quote instead.
Your GREP does not work because, well, you got the general idea (words may consist of the set of characters "a-z", "A-Z", and "0-9") but since you use the [..] without any other code, GREP will apply this rule once -- per character. If you want to find words of more than one character, you need to tell GREP "one or more of these, please": with a +.
Second, where did that | come from? It's the OR operator. Essentially, you are looking for
any space followed by one character from the set "a-z", "A-Z", and "0-9"
OR
the ® character
The 'word break' you were looking for is this code: \b, so you could search for "\b[a-zA-Z0-9]+" (note the '+' to allow more than one instance) -- but it's not necessary, because by default GREP grabs as much as it can. The set 'a-zA-Z0-9' etc. describes the allowed "word" characters, but you might prefer these: \l (ell) and \u for all lowercase and all uppercase characters -- they are shorter, and they automatically include accented characters, Greek, Russian, and a lot more. Similarly, \d (for "digits") is the shortcut for "0-9". And even better: \w is the shortcut for "word character", i.e., your set but then shorter and a bit better.
Try this one:
\w+~r
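Note that ~r is InDesign's own code for the register mark. Outside InDesign you would write the character itself. A rough Java sketch of the same "word characters followed by ®" search, with made-up class and method names, could look like this:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegisteredWordsDemo {
    // One or more word characters followed by the register mark (U+00AE).
    // UNICODE_CHARACTER_CLASS makes \w cover accented and non-Latin letters too.
    private static final Pattern WORD_WITH_MARK =
            Pattern.compile("\\w+\u00AE", Pattern.UNICODE_CHARACTER_CLASS);

    /** Collects every word that is immediately followed by the register mark. */
    static List<String> find(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = WORD_WITH_MARK.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(find("Adobe InDesign\u00AE is a great product."));
    }
}
```

No \s or \b is needed in front: the + quantifier is greedy, so each match starts at the beginning of the word.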

Regular Expression for non-words

hello all!
can you help me construct a regular expression that will match non-word strings say "��". I will be needing this to filter words from a Microsoft Word Document.
Thanx!

I don't think this is a problem that should be solved with regex. You would have to convert your Word document to a String and use replaceAll() with "\\W" as the regex.
Correct me if I am wrong, but I thought that Word files were binary, so your first problem will be to convert the file(s) to a String.
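To illustrate the replaceAll() suggestion, here is a small sketch that assumes the text has already been extracted from the Word file into a String (that step is not shown). The class and method names are hypothetical. Keep in mind that in Java \W is ASCII-only by default, so accented letters count as non-word characters unless you compile the pattern with Pattern.UNICODE_CHARACTER_CLASS.

```java
class NonWordDemo {
    /** Replaces every run of non-word characters with a single space. */
    static String stripNonWord(String text) {
        // "\\W+" in Java source is the regex \W+ : one or more characters
        // that are not a letter, digit or underscore.
        return text.replaceAll("\\W+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(stripNonWord("Hello, world!"));
    }
}
```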

Regular Expression Abbreviation of Words

Suppose I have got data in my column like
Balla Ram Chog Mal College
Maharishi Dayanand University
Cambridge Public School
Now I want to write a query using regular expressions to find out the abbreviations. e.g the resulting data set should be:
BRCMC
MDU
CPS
How should I write regexp for it ?

One way, using SUBSTR and INSTR, tested on 10g.
with data as
(
select 'Balla Ram Chog Mal College' col from dual union all
select 'Maharishi Dayanand University' col from dual union all
select 'Cambridge Public School' col from dual
)
select col, replace(ltrim(max(sys_connect_by_path(str, ',')) keep (dense_rank last order by r), ','), ',') abbr
from (
select col, substr(col, decode(level, 1, 1, instr(col, ' ', 1, level - 1) + 1), 1) str, level, row_number() over (partition by col order by level) r
from data
connect by level <= length(col) - length(replace(col, ' ')) + 1
       and col = prior col
       and prior sys_guid() is not null
order by col, level
)
group by col
start with r = 1
connect by r - 1 = prior r
       and col = prior col
       and prior sys_guid() is not null;
COL                           ABBR
Balla Ram Chog Mal College    BRCMC
Cambridge Public School       CPS
Maharishi Dayanand University MDU
With 11g, you will not require the Outer query to concatenate the results, you can directly use LISTAGG as demonstrated by Hashim.
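Since the original question asked for a regular expression, here is how the "first letter of each word" idea can be sketched with a single regex replace. This is Java rather than Oracle SQL, the names are made up, and it assumes words are separated by whitespace.

```java
class AbbreviationDemo {
    /** Keeps the first character of every whitespace-delimited word. */
    static String abbreviate(String name) {
        // (\S) captures the first character of a word, \S* eats the rest of
        // the word and \s* the separator; only the captured character is kept.
        return name.trim().replaceAll("(\\S)\\S*\\s*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(abbreviate("Balla Ram Chog Mal College"));
        System.out.println(abbreviate("Maharishi Dayanand University"));
        System.out.println(abbreviate("Cambridge Public School"));
    }
}
```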

Regular Expression - Select two words after specific string

Hi,
I am trying to select the two words/strings after the first word "door". I am using the search pattern (?<=door).\w+ but in this case I get the complete text after the word "door". I only want to select the two words after the first "door" in the complete text.
Can anybody help me?
Thanks!
Marco Snels

Hi Marco,
I'm relatively handy with RegEx but this seems like a problem where I would employ a little bit of RegEx and CTL, just to make life easier.
You can use the following RegEx (note: I didn't test this in Integrator, only in a RegEx testing tool) to extract the two words after door (but including door, unfortunately):
(?:door)[\s]\w+[\s]\w+
This would give you something like the following in your extracted field:
door is brown
You could then pass through a re-formatter to remove "door" and the whitespace and be on your way. Not the best answer but should perform reasonably well and get you up and going.
Regards,
Patrick Rafferty
http://branchbird.com
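If the tool at hand supports capturing groups, the extra re-formatting step can be avoided by capturing only the two words. A sketch in plain Java (not tested in Integrator; the class and method names are invented for the example):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoWordsAfterDemo {
    // "door", whitespace, then two words captured together as group 1.
    private static final Pattern AFTER_DOOR =
            Pattern.compile("door\\s+(\\w+\\s+\\w+)");

    /** Returns the two words after the first matching "door", or "" if none. */
    static String twoWordsAfterDoor(String text) {
        Matcher m = AFTER_DOOR.matcher(text);
        // find() stops at the first match, so later occurrences are ignored.
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        System.out.println(twoWordsAfterDoor("The door is brown and the door is open"));
    }
}
```

As written, "door" also matches inside longer words such as "outdoor". Add \b on both sides if that matters for your data.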

Quick regular expression question/help

Can someone help me with two regular expressions I need? I could spend a while trying to figure it out myself, however time is short and I would really like a foolproof, optimal solution (my attempt would be buggy).
Sample sentence
The population, is projected to reach 200,000, or more (by 2020).[7] This is {dummy} text.
The first regular expression
I need all brackets and every thing between them to be removed from a sentence.
Brackets such as: ( ), [ ] and { } .
I.e. Given the above sentence the following would be returned:
The population, is projected to reach 200,000, or more. This is text.
The second regular expression
If a word has a trailing comma character I need to add a whitespace between the word and the comma.
I.e. Given the sentence returned from the first regular expression, this regex would return:
The population *,* is projected to reach 200,000 *,* or more. This is text.
Many thanks to anyone who can help me with this!
Edited by: Myles on Jan 18, 2008 8:12 AM

http://java.sun.com/docs/books/tutorial/extra/regex/index.html
http://www.regular-expressions.info
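For what it's worth, both requests can be sketched in a few lines of Java. This is only one possible reading of the requirements: it assumes brackets are not nested, and that a "trailing comma" is a comma followed by whitespace or the end of the text. The class and method names are made up.

```java
class BracketCleanupDemo {
    /** Removes (...), [...] and {...} groups together with any whitespace before them. */
    static String removeBracketed(String text) {
        return text.replaceAll("\\s*(?:\\([^)]*\\)|\\[[^\\]]*\\]|\\{[^}]*\\})", "");
    }

    /** Inserts a space before every comma that ends a word. */
    static String spaceBeforeCommas(String text) {
        // The lookbehind needs a non-space before the comma, the lookahead
        // whitespace or end of input after it, so "200,000" stays intact.
        return text.replaceAll("(?<=\\S),(?=\\s|$)", " ,");
    }

    public static void main(String[] args) {
        String sample = "The population, is projected to reach 200,000, or more (by 2020).[7] This is {dummy} text.";
        String cleaned = removeBracketed(sample);
        System.out.println(cleaned);
        System.out.println(spaceBeforeCommas(cleaned));
    }
}
```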

SQL Injection and Java Regular Expression: How to match words?

Dear friends,
I am handling SQL injection attacks on our application with a Java regular expression. I use it to check whether malicious characters or keywords have been injected into a parameter value.
The denied characters and key words can be " ' ", " ; ", "insert", "delete" and so on. The expression I wrote is String pattern_str="('|;|insert|delete)+".
I know it is not correct: it does not match only the whole words insert or delete. Each character in the two words can be matched, which is not what I want. Do you have any idea how to match only whole words?
Thanks,
Ricky
Edited by: Ricky Ru on 28/04/2011 02:29

Avoid dynamic sql, avoid string concatenation and use bind variables and the risk is negligible.
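On the literal question of matching whole words only: word boundaries (\b) do that. The sketch below is purely illustrative, with hypothetical names and a token list taken from the question. A filter like this is easy to bypass, so it does not replace the bind-variable advice above.

```java
import java.util.regex.Pattern;

class DeniedTokenDemo {
    // A quote or semicolon anywhere, or the whole words insert / delete.
    // \b stops "inserted" or "reinsert" from matching.
    private static final Pattern DENIED =
            Pattern.compile("[';]|\\b(?:insert|delete)\\b", Pattern.CASE_INSENSITIVE);

    /** Returns true if the value contains a denied character or keyword. */
    static boolean containsDenied(String value) {
        return DENIED.matcher(value).find();
    }

    public static void main(String[] args) {
        System.out.println(containsDenied("please INSERT this"));
        System.out.println(containsDenied("inserted"));
    }
}
```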

Regular Expression - Extract words before the PLUS Sign ?

Dear All,
I have many words that contain a plus sign. I need to extract the part before the plus sign.
I am able to do this using String.indexOf or String.contains, but I would like to know whether there is a way to do this using a regular expression.
sample string
Kathire+san Output Kathire
World+islike Output World
Thanks,
J.Kathir

Here's one way.
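import java.util.regex.Pattern;
String input = "abc+def";
// "+" is a regex quantifier, so it has to be escaped to match a literal plus sign.
Pattern pat = Pattern.compile("\\+");
// split() cuts the input at every plus sign; element 0 is the part before the first one.
String beforePlus = pat.split(input)[0];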
Sun's Regular Expression Tutorial for Java
Regular-Expressions.info

Regular expression to replace "empty space" ( ) between words with +

Hallo!
When I wish to find in code something like this:
12144541 FirstWord SecondWord
regular expression for that is:
(\d{1,100})[\s-]\D{1,100}[\s-]\D{1,100}
Now, please help me to find a regular expression to replace
"empty space" ( ) between words with +
12144541 FirstWord SecondWord to become
12144541+FirstWord+SecondWord
Thank you very, very, very much!

A simple-minded solution is to use \s to match all
whitespace; e.g. find \s and replace with +. DW CS3, at least, is
smart enough to not replace end of line characters with the '+'
character if you limit your search & replace to text.
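Outside Dreamweaver the same find-and-replace is a one-liner. A small Java sketch with made-up names follows. It matches only spaces and tabs rather than \s, so line breaks are left alone, which mirrors the end-of-line remark above.

```java
class SpaceToPlusDemo {
    /** Replaces each run of spaces or tabs between tokens with a single '+'. */
    static String joinWithPlus(String line) {
        // trim() first so leading or trailing blanks do not turn into '+'.
        return line.trim().replaceAll("[ \\t]+", "+");
    }

    public static void main(String[] args) {
        System.out.println(joinWithPlus("12144541 FirstWord SecondWord"));
    }
}
```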

Want to replace a string containing consecutive repeating words to one using regular expression

Hi Experts,
I need a regular expression to replace all duplicate words in a string with one.
eg: 'Hello Hello World 4-4-5 etc etc' should be changed to 'Hello World 4-4-5 etc'.
I tried several patterns, but each had one problem or another, e.g. '(\w+\S\W)\1+' replaced with ' \1' and ' (\w+\W)\1+' replaced with ' \1', etc.
Thanks in advance
Tarique

Hi,
Translating what Frank said to Java would be something like this:
        StringBuilder result = new StringBuilder();
        String myString = "This is right right, that is wrong.";
        String[] words = myString.split(" ");
        String lastWord = "";
        for (String str : words) {
            if (!str.contains(lastWord))
                // The previous word does not occur in this one: keep it whole.
                result.append(str);
            else
                // The previous word occurs in this one (always true on the first
                // pass, where lastWord is empty): keep only what follows it.
                result.append(str.substring(lastWord.length()));
            lastWord = str;
            result.append(" ");
        }
        System.out.println(result);
If you didn't have periods and commas in your message it would be easier. The code is not 100% correct, and you will need to adapt it to your requirements.
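A regex-only alternative, offered as a sketch rather than a complete solution, is a backreference: capture a word, then match one or more whitespace-separated repeats of the same word. The names below are made up. Because the repeat must be preceded by whitespace, "4-4-5" is left untouched.

```java
class DedupWordsDemo {
    /** Collapses consecutive repeats of the same word into one occurrence. */
    static String dedupe(String text) {
        // \b(\w+)        a whole word, captured as group 1
        // (?:\s+\1\b)+   one or more repeats of that word after whitespace
        return text.replaceAll("\\b(\\w+)(?:\\s+\\1\\b)+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(dedupe("Hello Hello World 4-4-5 etc etc"));
        System.out.println(dedupe("This is right right, that is wrong."));
    }
}
```

The comparison is case-sensitive, so "Hello hello" is kept as is. Prefix the pattern with (?i) if that is not what you want.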

Regular expression to check a value if it contains a specific word.

Hi All,
How can i check if a certain word exists in a value in regular expression ?
I have an attribute called Race. Race can contain the following:
White, Non-Hispanic
Black, Non-Hispanic
White, Non Hispanic
Black, Non Hispanic
White, NonHispanic
Non-Hispanic, white
Non Hispanic - black
What i want is to check if my value contains the word "NON" (NON can be at the beginning, middle or end), if it does, parse it and return it.
This is what I have, however I want to make sure it covers all cases and not missing anything else
select REGEXP_SUBSTR(UPPER(trim('Black, Non-Hispanic')), '[NON]+') from dual;
Thanks in advance.

Rooney wrote:
Could you please explain what are the 2 ones's for ?
The two 1s are not really needed for this. It is just that the syntax requires those parameters when I add the fifth parameter.
http://docs.oracle.com/cd/E14072_01/server.112/e10592/functions148.htm
First 1 is where the search starts (same as in substr('Abc',1)).
Second 1 is the number of the occurrence. Here it means: return the first occurrence that was found. Replace it with 2 in my next example to see a (very slight) difference.
Also 'NON' alone will not cover all cases ?
But you don't have non alone. You have regexp with non + upper. The 'i' replaces the upper. Also the output is slightly different: the 'i' version will return the same capitalization as it was found in the original. It depends a little on what you want to achieve. And of course INSTR will give the same info as your version: if the result is > 0 it means NON was found.
with testdata as (select 'White,Non-Hispanic' str from dual union all
                  select 'Non-White,nOn-Hispanic' str from dual union all
                  select 'White,Hispanic' str from dual
                 )
/* end of test data creation */
select str,
      REGEXP_SUBSTR(UPPER(TRIM(str)), 'NON') regexp1,
      REGEXP_SUBSTR(str, 'NON',1,1,'i') regexp2,
      instr(upper(str),'NON') instr
from testdata;
STR                    REGEXP1                REGEXP2                INSTR
White,Non-Hispanic     NON                    Non                        7
Non-White,Non-Hispanic NON                    Non                        1
White,Hispanic                                                           0
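The same case-insensitive lookup can be sketched in Java. Like the 'i' variant above, it returns the text with the capitalization it has in the original. The class and method names are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindNonDemo {
    // CASE_INSENSITIVE plays the role of the 'i' match parameter.
    private static final Pattern NON = Pattern.compile("non", Pattern.CASE_INSENSITIVE);

    /** Returns the first occurrence of "non" as it appears in the value, or null. */
    static String firstNon(String value) {
        Matcher m = NON.matcher(value);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNon("White,Non-Hispanic"));
        System.out.println(firstNon("White,Hispanic"));
    }
}
```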

Regular expression on words with % wildcard

Hi,
I've got some processing working using regular expression where I need to process words e.g.
regexp_replace('word1 word2','(\w+)','myprefix{\1}') - results in - 'myprefixword1 myprefixword2'
However, if I'm presented with this; '%word0 word1% wo%d2 word3', then I need to treat % as special case and leave the word as is, so result here would be; - '%word0 word1% wo%d2 myprefixword3', is this achievable using regexp ?

And for those who don't know, I guess we should explain why we're having to expand single spaces to double spaces...
(I'll use the "¬" character to represent spaces to make it clearer to see)
If we have a string such as
word1¬word2¬word3
and we want to identify the words in the string (without using any special regexp word identifier) then we are going to use the spaces to identify the start and end of words. To make life easy, we manually put a space at the start and end of the string so we can say that each word in the string will have a space before and after it regardless of where it is in the string...
¬word1¬word2¬word3¬
However, when we specify what we want to search for we are going to say we want a space, followed by a number of characters (not spaces), followed by a space...
¬[^¬]*¬
So, ideally, you'd expect it to look through the string and say
¬word1¬word2¬word3¬
\_____/... found word1
¬word1¬word2¬word3¬
      \_____/... found word2
¬word1¬word2¬word3¬
            \_____/... found word3
Unfortunately, there is a problem. Once the first word has been found the pointer for searching the rest of the string is located on the next character after the match i.e.
¬word1¬word2¬word3¬
       ^
So it won't be able to pick out word2 and will only get to word3. Let's see it in action...
SQL> ed
Wrote file afiedt.buf
1 with t as (select ' word1 word2 word3 ' as txt from dual)
2 --
3 select regexp_replace(txt, ' [^ ]* ', 'xxxxx') as txt
4* from t
SQL> /
TXT
xxxxxword2xxxxx
SQL>
In order to deal with this, if we replace the single spaces with double spaces (not required at the start and end) our string looks like...
¬word1¬¬word2¬¬word3¬
So as it searches it finds word1 as a match and then the pointer in the string is located...
¬word1¬¬word2¬¬word3¬
       ^
... so the next match for the pattern of space-characters-space is word2 and then the pointer is located...
¬word1¬¬word2¬¬word3¬
              ^
... ready to find word 3. Example...
SQL> ed
Wrote file afiedt.buf
1 with t as (select ' word1  word2  word3 ' as txt from dual)
2 --
3 select regexp_replace(txt, ' [^ ]* ', 'xxxxx') as txt
4* from t
SQL> /
TXT
xxxxxxxxxxxxxxx
SQL>
Hopefully that's a little clearer. You just have to remember the "pointer" principle and the fact that once a match is found it is located on the character after the match.
;)
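The "pointer" behaviour is not specific to Oracle. Here is a small Java sketch that reproduces both results and then shows a different route to the original goal: lookarounds check the neighbouring characters without consuming them, so no space doubling is needed. This is an illustration with made-up names, and it relies on lookaround support that, as far as I know, Oracle's REGEXP functions do not offer.

```java
class PointerPrincipleDemo {
    /** Same pattern as the SQL example: space, non-spaces, space. */
    static String maskWords(String padded) {
        return padded.replaceAll(" [^ ]* ", "xxxxx");
    }

    /** Prefixes words made only of word characters; words touching % are left alone. */
    static String prefixPlainWords(String text) {
        // (?<!\S) = no non-space character before, (?!\S) = none after.
        // Neither lookaround consumes the separating space.
        return text.replaceAll("(?<!\\S)\\w+(?!\\S)", "myprefix$0");
    }

    public static void main(String[] args) {
        System.out.println(maskWords(" word1 word2 word3 "));   // single spaces
        System.out.println(maskWords(" word1  word2  word3 ")); // doubled spaces
        System.out.println(prefixPlainWords("%word0 word1% wo%d2 word3"));
    }
}
```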

Regular Expression - replaceAll() - how to replace words?

Hiya,
I have this regex to replace all instances of myWord:
String oldWord = "oldWord";
String newWord = "newWord";
String sentence = "some sentence that contains " + oldWord;
String newSentence = replaceWordsInSentence(sentence, oldWord, newWord);
private String replaceWordsInSentence(String sentence, String oldWord, String newWord) {
    return sentence.replaceAll("\b" + oldWord + "\b", newWord);
}
...it works in most instances, but when oldWord is at the end of the sentence it is not replaced. Presumably the problem is that "\b" is not a sufficient word boundary. Can someone help me out with the correct regular expression code?
Thanks,
James

Mel, you did appear to misunderstand as you thought points 2 and 3 were alternatives, but you now recognise that they are additional "shoulds".
Of course, I applied the extra backslash as soon as Joachim advised. Maybe you don't agree with my rationale, but I prefer the complete solution that will work in all instances... so was simply waiting for him to post a code example that included the latter 2 points as (although I understood the point of them perfectly) I was not sure how to implement them.
Have come up with the following, expanded, method...
    private String replaceWordsInSentence(String sentence, String oldWord, String newWord) {
        return sentence.replaceAll("\\b" + Pattern.quote(oldWord) + "\\b", Matcher.quoteReplacement(newWord));
    }
...works fine with the tests I have run. Joachim, can you confirm this is correct?
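For completeness, a self-contained version of the method above with its imports, plus a line showing why the single backslash failed: in a Java string literal "\b" is the backspace character, whereas "\\b" reaches the regex engine as the word-boundary token. The class name is made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordReplaceDemo {
    /** Replaces whole-word occurrences of oldWord, treating both words literally. */
    static String replaceWord(String sentence, String oldWord, String newWord) {
        // Pattern.quote stops regex metacharacters in oldWord from being interpreted;
        // Matcher.quoteReplacement does the same for $ and \ in newWord.
        return sentence.replaceAll("\\b" + Pattern.quote(oldWord) + "\\b",
                Matcher.quoteReplacement(newWord));
    }

    public static void main(String[] args) {
        // "\b" is one character (backspace, code 8), not a regex word boundary.
        System.out.println("\b".length() + " " + (int) "\b".charAt(0));
        System.out.println(replaceWord("some sentence that contains oldWord", "oldWord", "newWord"));
    }
}
```

One caveat: \b only behaves as expected when oldWord starts and ends with a word character.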

Help in regular expression matching

I have three expressions like
1) [(y2009)(y2011)]
2) [(y2008M5)(y2011M3)] or [(y2009M5)(y2010M12)]
3) [(y2009M1d20)(y2011M12d31)]
I want a regular expression pattern for each of the above three expressions.
I am using:
REGEXP_LIKE(timedomainexpression, '???[:digit:]{4}*[:digit:]{1,2}???[:digit:]{4}*[:digit:]{1,2}??', 'i');
but it returns results for all of the above expressions, while I want a different expression for each.
I have used * after [:digit:]{4}; when I use ? or . instead, it gives no results. Please help in this situation ASAP.
Thanks

I don't get your question. Can you post your desired output, and also give some sample data?
Please consider the following when you post a question.
1. New features keep coming in every Oracle version, so please provide your Oracle DB version to get the best possible answer.
You can use the following query and copy and paste the output.
select * from v$version
2. This forum has a very good search feature. Please use it before posting your question, because for most of the questions
that are asked the answer is already there.
3. We don't know your DB structure or what your data looks like, so you need to let us know. The best way would be to give some sample data like this.
I have the following table called sales
with sales
as
(
      select 1 sales_id, 1 prod_id, 1001 inv_num, 120 qty from dual
      union all
      select 2 sales_id, 1 prod_id, 1002 inv_num, 25 qty from dual
)
select *
from sales
4. Rather than describing what you want in words, it is easier when you give your expected output.
For example, in the above sales table, I want to know the total quantity and number of invoices for each product.
The output should look like this
Prod_id   sum_qty   count_inv
1         145       2
5. Whenever you get an error message, post the entire error message, with the error number, the message and the line number.
6. The next thing is very important to remember. Please post only well formatted code. Unformatted code is very hard to read.
Your code format gets lost when you post it in the Oracle Forum. So in order to preserve it you need to
use the {noformat}{noformat} tags.
The usage of the tag is like this.
<place your code here>\
7. If you are posting a *Performance Related Question*, please read
   {thread:id=501834} and {thread:id=863295}.
   Following those guides will be very helpful.
8. Please keep in mind that this is a public forum. Here no question is URGENT.
   So use of words like *URGENT* or *ASAP* (As Soon As Possible) is considered to be rude.
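Coming back to the actual question: going only by the three sample expressions, one pattern per form can be sketched as below. This is Java, not Oracle, and the class name and labels are invented. If you port it to REGEXP_LIKE, note that a POSIX class has to sit inside a bracket expression, i.e. [[:digit:]] rather than [:digit:], and that the literal brackets and parentheses need escaping.

```java
import java.util.regex.Pattern;

class TimeDomainDemo {
    // Building blocks for one bracketed term: year, year+month, year+month+day.
    private static final String Y = "y\\d{4}";
    private static final String YM = Y + "M\\d{1,2}";
    private static final String YMD = YM + "d\\d{1,2}";

    /** Wraps a term pattern as [(term)(term)] with the literal brackets escaped. */
    private static Pattern pair(String term) {
        return Pattern.compile("\\[\\(" + term + "\\)\\(" + term + "\\)\\]");
    }

    private static final Pattern YEAR = pair(Y);
    private static final Pattern MONTH = pair(YM);
    private static final Pattern DAY = pair(YMD);

    /** Classifies an expression as "year", "month", "day" or "unknown". */
    static String classify(String expr) {
        if (YEAR.matcher(expr).matches()) return "year";
        if (MONTH.matcher(expr).matches()) return "month";
        if (DAY.matcher(expr).matches()) return "day";
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(classify("[(y2009)(y2011)]"));
        System.out.println(classify("[(y2008M5)(y2011M3)]"));
        System.out.println(classify("[(y2009M1d20)(y2011M12d31)]"));
    }
}
```

matches() requires the whole string to fit the pattern, which is what keeps the three forms apart.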

Bracket in Regular Expression constant?

I am a bit puzzled by the behavior I am experiencing in LV 2011. I hope to get some light from experts out there.
I am trying to parse a messy ASCII header file and after having split it into individual lines (strings), I use the "Match Regular Expression" function to remove some of the info before the substantial information.
Some of the strings include square brackets ([, ]), which are special characters for the function, therefore, as documented in the help, one needs to precede them with a backslash.
Example:
I want to parse the following line:
#PR [PR_DEV,I,2]
One way (which I am using because of considerations related to the rest of the header) is the following:
Note that the first string constant is using "Code Display" whereas the second one is using "Normal Display".
Why did I not put a backslash in front of the bracket in the first string, you may ask? Well, I did, but it disappeared after I typed the other characters. And reverting to "Normal Display" did not restore it.
Of course, the first version does not parse the input string correctly, whereas the second one does it fine.
In other words, the custom display string (which is convenient for cryptic codes such as \s* or to distinguish between space and tab...or simply ENTER tabs!) seems to mess up with the \[ combo (likewise with the \] one).
It is not a huge deal. I can use the "Normal Display" mode, but I tend to think that this qualifies as a hidden "feature". And again, it is still a pain in the ... when dealing with special characters such as tabs, etc...

I think that [ is a special character which needs to be preceded by a backslash, but it is not one of the defined backslash characters (like \s). So, you need to put in two \\ to get one \ while in '\' Codes Display.
You can put in any character by using \xx where the xx is a hex character using only upper case letters for A..F. I converted the strings to byte arrays and tried to see what made the arrays match and the Match work.
Lynn
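The same double-escaping shows up in text-based languages. As an analogy only (this is Java, not LabVIEW, and the names are made up), a string literal needs "\\[" to hand the two characters \[ to the regex engine:

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketEscapeDemo {
    // In Java source "\\[" is the two characters \[ , which the regex engine
    // reads as a literal opening bracket rather than the start of a character class.
    private static final Pattern LINE =
            Pattern.compile("#PR \\[(\\w+),(\\w+),(\\d+)\\]");

    /** Returns the three comma-separated fields inside the brackets, or an empty list. */
    static List<String> fields(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return List.of();
        }
        return List.of(m.group(1), m.group(2), m.group(3));
    }

    public static void main(String[] args) {
        System.out.println(fields("#PR [PR_DEV,I,2]"));
    }
}
```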
