Regular Expression to Locate Words with Character

I want to identify all the words in a document that are followed by the register mark (®) symbol.
I built what I thought was a regular expression that would search for a register mark preceded by alphanumeric characters and a space. So if my text contained the sentence "Adobe InDesign® is a great product.", the regular expression would find "InDesign®".
Below is the regular expression I composed. It grabs anything with a register mark, not just the register marks preceded by a space and alphanumeric characters. Where did I go wrong? I thought the \s would restrict the search to complete words with a register mark.
\s[a-zA-Z0-9]|®

\s is the special GREP code for "any kind of space" -- a regular space, a tab, hard return, or any of ID's own white space codes. It has nothing to do with "complete words", because a word can appear at the start of a story, without any preceding space. It would also not find "InDesign®" because there is no space before it, there is a double quote instead.
Your GREP does not work because, well, you got the general idea (words may consist of the set of characters "a-z", "A-Z", and "0-9") but since you use the [..] without any other code, GREP will apply this rule once -- per character. If you want to find words of more than one character, you need to tell GREP "one or more of these, please": with a +.
Second, where did that | come from? It's the OR operator. Essentially, you are looking for
any space followed by one character from the set "a-z", "A-Z", and "0-9"
OR
the ® character
The 'word break' you were looking for is this code: \b, so you could search for "\b[a-zA-Z0-9]+" (note the '+' to allow more than one instance) -- but it's not necessary, because by default GREP grabs as much as it can. The set 'a-zA-Z0-9' etc. describes the allowed "word" characters, but you might prefer these: \l (ell) and \u for all lowercase and all uppercase characters -- they are shorter, and they automatically include accented characters, Greek, Russian, and a lot more. Similarly, \d (for "digits") is the shortcut for "0-9". And even better: \w is the shortcut for "word character", i.e., your set, only shorter and a bit better.
Try this one:
\w+~r
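To see the difference outside InDesign, here is a minimal sketch in Python. It assumes Python's re module rather than InDesign's GREP, so the literal ® stands in for the ~r code; the sample sentence is the one from the question.

```python
import re

text = "Adobe InDesign® is a great product."

# Original attempt: "a space plus ONE word character" OR "a bare ®".
original = re.findall(r"\s[a-zA-Z0-9]|®", text)

# Suggested fix: one or more word characters directly followed by ®.
fixed = re.findall(r"\w+®", text)

print(original)  # scattered one-letter hits plus the lone ®
print(fixed)     # only the complete word carrying the mark
```

The first pattern returns fragments because of the OR and the missing +; the second returns only the whole word with its mark.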

Similar Messages

Regular expressions for replacing text with sms language text

Hi, I'm trying to write a function which converts normal, correctly spelled text into the shorter sms language format, but I'm struggling to come up with the regular expressions I need to do so. Can anyone help?
1: remove surplus white space at the beginning of a sentence and at the end of a sentence.
e.g. " hello." --> "hello." OR "hello ." --> "hello."
2: remove the preceding and/or following space if there's a word then a number, possibly followed by another word
e.g. "come 2 me" --> "come2me" OR "dnt 4get" --> "dnt4get"
3: remove "aeiou" if word starts and ends with "!aeiou"
e.g. "text" --> "txt"

1. Use String's trim() method.
2. You can make the whitespace on either side optional: text = text.replaceAll("\\s*(\\d)\\s*", "$1");
3. This one has to be done in two steps:
import java.util.regex.*;

public class Test
{
  public static void main(String... args) throws Exception
  {
    String text = "The quick brown fox jumps over the lazy dog.";
    System.out.println(devowelize(text));
  }

  public static String devowelize(String str)
  {
    // Step one: find runs that start and end with consonants and contain vowels
    Pattern p = Pattern.compile(
      "[a-z&&[^aeiou]]++(?:[aeiou]++[a-z&&[^aeiou]]++)+",
      Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(str);
    StringBuffer sb = new StringBuffer();
    while (m.find())
    {
      // Step two: strip the vowels from each matched run
      m.appendReplacement(sb, m.group().replaceAll("[aeiou]+", ""));
    }
    m.appendTail(sb);
    return sb.toString();
  }
}
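For comparison, the three steps can be sketched in Python. This is only a rough port under a few assumptions: the function names tidy, squeeze_digits and devowelize are made up for illustration, the consonant class is spelled out by hand because Python has no character-class intersection, and tidy also removes a space before closing punctuation, which trim() alone would not do.

```python
import re

# Hand-written consonant class (stands in for Java's [a-z&&[^aeiou]]).
CONSONANTS = "b-df-hj-np-tv-z"
RUN = re.compile(rf"[{CONSONANTS}]+(?:[aeiou]+[{CONSONANTS}]+)+", re.IGNORECASE)


def tidy(text):
    """Step 1: trim both ends and drop whitespace before . ! or ?"""
    return re.sub(r"\s+([.!?])", r"\1", text.strip())


def squeeze_digits(text):
    """Step 2: remove optional whitespace on either side of a digit."""
    return re.sub(r"\s*(\d)\s*", r"\1", text)


def devowelize(text):
    """Step 3: find consonant-vowel-consonant runs, then strip their vowels."""
    return RUN.sub(lambda m: re.sub(r"[aeiou]+", "", m.group()), text)


print(tidy("hello ."))
print(squeeze_digits("come 2 me"))
print(devowelize("The quick brown fox jumps over the lazy dog."))
```

Like the Java version, devowelize has no word-boundary check, so a word such as "over" loses the vowel inside its consonant run even though it starts with a vowel.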

Regular Expression for non-words

hello all!
can you help me construct a regular expression that will match non-word strings say "��". I will be needing this to filter words from a Microsoft Word Document.
Thanx!

I don't think this is a problem that should be solved with regex. You would have to convert your Word document to a String and use replaceAll() with "\\W" as the regex.
Correct me if I am wrong but I thought that Word files were binary so your first problem will be to convert the file(s) to a String.

Regular Expression to Check number with at least one decimal point

Hi,
I would like to use REGEXP_LIKE to check a number with up to two digits and at least one decimal point:
Ex.
10.1
1.1
1
12
This is what I have so far. Any help would be appreciated. Thanks.
if regexp_like(v_expr, '^(\d{0,2})+(\.[0-9]{1})?$') t

Hi,
Whenever you have a question, post a little sample data (CREATE TABLE and INSERT statements, relevant columns only) for all the tables involved, and the results you want from that data.
Explain, using specific examples, how you get those results from that data.
Always say what version of Oracle you're using (e.g. 11.2.0.2.0).
See the forum FAQ: https://forums.oracle.com/message/9362002
SammyStyles wrote:
Hi,
I would like to use REGEXP_LIKE to check a number with up to two digits and at least one decimal point:
Ex.
10.1
1.1
1
12
This is what I have so far. Any help would be appreciated. Thanks.
if regexp_like(v_expr, '^(\d{0,2})+(\.[0-9]{1})?$') t
Do you really mean "up to two digits", that is, 3 or more digits is unacceptable? What if there are 0 digits? (0 is less than 2.)
Do you really mean "at least one decimal point", that is, 2, 3, 4 or more decimal points are okay? Include some examples when you post the sample data and results.
It might be more efficient without regular expressions. For example:
WHERE TRANSLATE ( str -- nothing except digits and dots
 , 'A.0123456789'
 , 'A'
 ) IS NULL
AND str LIKE '%.%' -- at least 1 dot
AND LENGTH ( REPLACE ( str -- up to 2 digits
 , '.'
 )
 ) <= 2
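Since the requirement is ambiguous, here is a small Python sketch of one possible reading, stated as an assumption: one or two digits, optionally followed by a dot and exactly one digit (which fits the four sample values). It also shows why the original pattern is too loose: the + after (\d{0,2}) lets the group repeat, so any number of digits gets through.

```python
import re

# Assumed rule: 1-2 digits, then an optional ".d" fractional part.
STRICT = re.compile(r"\d{1,2}(\.\d)?")

# The pattern from the question, minus the ^ and $ (fullmatch anchors for us).
LOOSE = re.compile(r"(\d{0,2})+(\.[0-9]{1})?")


def is_valid(value):
    return STRICT.fullmatch(value) is not None


for sample in ["10.1", "1.1", "1", "12", "123", "1.23", "1..1", ""]:
    print(repr(sample), is_valid(sample))

# The repeated group accepts five digits, which "up to two" should reject.
too_loose = LOOSE.fullmatch("12345") is not None
print(too_loose)
```

If "at least one decimal point" really means the dot is mandatory, drop the trailing ? from the assumed rule.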

Regular Expression Abbreviation of Words

Suppose I have got data in my column like
Balla Ram Chog Mal College
Maharishi Dayanand University
Cambridge Public School
Now I want to write a query using regular expressions to find out the abbreviations. e.g the resulting data set should be:
BRCMC
MDU
CPS
How should I write regexp for it ?

One way, using SUBSTR and INSTR, tested on 10g.
with data as
(
select 'Balla Ram Chog Mal College' col from dual union all
select 'Maharishi Dayanand University' col from dual union all
select 'Cambridge Public School' col from dual
)
select col, replace(ltrim(max(sys_connect_by_path(str, ',')) keep (dense_rank last order by r), ','), ',') abbr
from (
select col, substr(col, decode(level, 1, 1, instr(col, ' ', 1, level - 1) + 1), 1) str, level, row_number() over (partition by col order by level) r
from data
connect by level <= length(col) - length(replace(col, ' ')) + 1
 and col = prior col
 and prior sys_guid() is not null
order by col, level
)
group by col
start with r = 1
connect by r - 1 = prior r
 and col = prior col
 and prior sys_guid() is not null;
COL                            ABBR
Balla Ram Chog Mal College     BRCMC
Cambridge Public School        CPS
Maharishi Dayanand University  MDU
With 11g, you will not need the outer query to concatenate the results; you can use LISTAGG directly, as demonstrated by Hashim.
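Since the question asked for a regular expression, here is the idea as a minimal Python sketch: keep the first character of each word and discard the rest of the word plus the whitespace after it. The function name abbreviate is made up for illustration, and whether the same pattern behaves identically in Oracle's REGEXP_REPLACE has not been tested here.

```python
import re


def abbreviate(name):
    # (\w) captures the first letter; \w*\s* swallows the rest of the word
    # and any following whitespace, so only the captured letters remain.
    return re.sub(r"(\w)\w*\s*", r"\1", name)


for name in [
    "Balla Ram Chog Mal College",
    "Maharishi Dayanand University",
    "Cambridge Public School",
]:
    print(name, "->", abbreviate(name))
```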

Regular Expression Find and Replace with Wildcards

Hi!
For the life of me, I can't figure out the right way to do this.
I basically have a list of last names, first names. I want the last name to have a different css style than the first name.
So this is what I have now:
AAGAARD, TODD, S. 
AAMOT, KARI, 
AARON, MARJORIE, C. 
and this is what I need to have:
AAGAARD , TODD, S. 
AAMOT , KARI, 
AARON , MARJORIE, C. 
Any ideas?
Thanks!

Make a backup first.
In the Find field use:
(\w+),\s+([^<]+)<\/b>\s* 
In the Replace field use:
$1 $2 
Select Use regular expression. Light the blue touch paper, and click Replace All.

Regular Expression to spilt words

Hi all,
I want to split off the last word in a string, after the last space. The maximum length of the string is five words.
I used the following query, but it is not working.
SQL> SELECT REGEXP_SUBSTR('system hello sidval',
2 '[a-z]+\S+') RESULT
3 FROM DUAL;
RESULT
system
SQL>
Examples:
1- if the string is
David from uk
the output is
uk
2- if the string is
David john
the output is
john
The maximum length of the string is five words.
regards
Edited by: Ayham on Oct 7, 2012 12:01 PM
Edited by: Ayham on Oct 7, 2012 12:18 PM

Ayham wrote:
Hi all,
I want to split off the last word in a string, after the last space. The maximum length of the string is five words.
I used the following query, but it is not working.
Try this:
SQL> SELECT REGEXP_SUBSTR('system hello sidval', '[a-z]+\S*$') RESULT FROM DUAL;
The extra $ tells the regex to match at the end of the line. Using * instead of + makes the trailing \S part optional, so a last word of a single letter is matched as well.
bye
TPD
Edited by: TPD Opitz-Consulting com on 07.10.2012 21:35
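The same "anchor at the end" idea, as a minimal Python sketch. It is slightly more general than the Oracle pattern above in that it accepts any non-space characters (not only a-z) and tolerates trailing spaces; the function name last_word is made up for illustration.

```python
import re


def last_word(text):
    # (\S+) is the final run of non-space characters; \s*$ allows
    # trailing whitespace before the end of the string.
    match = re.search(r"(\S+)\s*$", text)
    return match.group(1) if match else None


for sample in ["system hello sidval", "David from uk", "David john"]:
    print(repr(sample), "->", last_word(sample))
```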

Regular expression matches string starts with &

Hello,
I am trying to write a regular expression that removes any string that starts with "&" and ends with ";". In other words, I am trying to remove anything similar to:
& nbsp; & quot; & lt; & gt;
Any help please.
This does not work:
select regexp_replace(ename, '^&[a-z]{2,4}[;]$') from emp;
Regards,
Fateh

Fateh wrote:
I am trying to write a Reg Exp that removes any string starts with "&" and Ends with ";" . In other words, I am trying to remove anything similar to:
& nbsp; & quot; & lt; & gt;
Those are entity references (without the whitespace after '&').
Do you really want to remove them, or do you actually want to convert them back to their corresponding characters but don't know how to do it?
SQL> set scan off
SQL> select utl_i18n.unescape_reference('&quot;Test&quot;:&nbsp;3&gt;2') from dual;
UTL_I18N.UNESCAPE_REFERENCE('&
"Test": 3>2
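Both options (removing the references or decoding them) can be compared in a short Python sketch. It uses Python's html.unescape in place of Oracle's utl_i18n.unescape_reference, and it also shows why the original attempt did nothing: the ^ and $ anchors only match when the entire value is a single entity.

```python
import html
import re

sample = "&quot;Test&quot;:&nbsp;3&gt;2"

# Anchored as in the question: matches only if the WHOLE string is one entity.
anchored = re.sub(r"^&[a-z]{2,4};$", "", sample)

# Without the anchors, every short named entity is removed wherever it occurs.
removed = re.sub(r"&[a-z]{2,4};", "", sample)

# Decoding instead of removing; &nbsp; becomes a no-break space (U+00A0).
decoded = html.unescape(sample)

print(anchored)
print(removed)
print(decoded)
```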

Regular Expression - Select two words after specific string

Hi,
I am trying to select the two words/strings after the first word "door". I am using the search pattern (?<=door).\w+ but in this case I get the complete text after the word "door". I only want to select the two words after the first "door" in the complete text.
Can anybody help me?
Thanks!
Marco Snels

Hi Marco,
I'm relatively handy with RegEx but this seems like a problem where I would employ a little bit of RegEx and CTL, just to make life easier.
You can use the following RegEx (note: I didn't test this in Integrator, only in a RegEx testing tool) to extract the two words after door (but including door, unfortunately):
(?:door)[\s]\w+[\s]\w+
This would give you something like the following in your extracted field:
door is brown
You could then pass through a re-formatter to remove "door" and the whitespace and be on your way. Not the best answer but should perform reasonably well and get you up and going.
Regards,
Patrick Rafferty
http://branchbird.com
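If the regex engine supports capture groups, the extra clean-up step can be avoided by capturing only the two words. Here is a minimal Python sketch; the sample sentence is invented, and whether Integrator exposes capture groups in the same way is not something verified here.

```python
import re

text = "The door is brown and the other door is open."

# Match the first "door", then capture exactly two following words.
match = re.search(r"\bdoor\s+(\w+\s+\w+)", text)
two_words = match.group(1) if match else None

print(two_words)
```

re.search stops at the first occurrence, so only the words after the first "door" are returned.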

Match Regular Expression won't work with Null

Is that right? I don't see it in the documentation. I can use it on \01 , just not \00.
Is there a way around this problem? I know that Match Pattern works, but I want to use it with separate partial matches (a|b) which Match Pattern does not support.

Here's a possibility:
If you try to set the constant "\00" to "\0" with the '\' Code Display on, it just converts it back to "\00" on the display.
The function uses the PCRE library. From the library documentation (the pcrepattern man page):
"After \0 up to two further octal digits are read. In both cases, if there
are fewer than two digits, just those that are present are used. Thus the
sequence \0\x\07 specifies two binary zeros followed by a BEL character
(code value 7). Make sure you supply two digits after the initial zero if the
pattern character that follows is itself an octal digit."
So, what if, behind the scenes, LV is actually feeding the match function just a "\0"? I'm guessing (but haven't been able to verify) it would match *any* input string, immediately, with an offset of zero. Testing with random search strings shows behavior that might indicate this.
If the above is true, getting around it might be hard, since you're at the mercy of LV as to exactly how it calls that library.
Fun stuff... okay, back to work with me. Good luck,
Joe Z.

Find text using regular expression and add highlight annotation

Hi Friends
Is it possible to find text using a regular expression and add a highlight annotation using a plugin?

A plugin can use the PDWordFinder to get a list of the words on a page, and their location. That's all that the API offers for searching. Of course, you can use a regular expression library to work with that word list.

Quick regular expression question/help

Can someone help me with two regular expressions I need? I could spend a while trying to figure it out myself, however time's short and I really would like to get a foolproof, optimal solution (my attempt would be buggy).
Sample sentence
The population, is projected to reach 200,000, or more (by 2020).[7] This is {dummy} text.
The first regular expression
I need all brackets and every thing between them to be removed from a sentence.
Brackets such as: ( ), [ ] and { } .
I.e. Given the above sentence the following would be returned:
The population, is projected to reach 200,000, or more. This is text.
The second regular expression
If a word has a trailing comma character I need to add a whitespace between the word and the comma.
I.e. Given the sentence returned from the first regular expression, this regex would return:
The population *,* is projected to reach 200,000 *,* or more. This is text.
Many thanks to anyone who can help me with this!
Edited by: Myles on Jan 18, 2008 8:12 AM

http://java.sun.com/docs/books/tutorial/extra/regex/index.html
http://www.regular-expressions.info
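For the two expressions described in the question, here is a minimal sketch in Python (the same patterns should carry over to java.util.regex with doubled backslashes, though that is untested here). It assumes brackets are never nested, and the function names are made up for illustration.

```python
import re

# One bracketed group of any of the three kinds, plus whitespace before it.
BRACKETED = re.compile(r"\s*(?:\([^)]*\)|\[[^\]]*\]|\{[^}]*\})")

# A comma that follows a word character and is followed by whitespace or the
# end of the string, so the comma inside 200,000 is left alone.
TRAILING_COMMA = re.compile(r"(?<=\w),(?=\s|$)")


def strip_brackets(sentence):
    return BRACKETED.sub("", sentence)


def space_commas(sentence):
    return TRAILING_COMMA.sub(" ,", sentence)


sentence = ("The population, is projected to reach 200,000, or more "
            "(by 2020).[7] This is {dummy} text.")
first = strip_brackets(sentence)
second = space_commas(first)
print(first)
print(second)
```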

Finding Words with more than Two Vowels (Regex)

Hello all, I've been working on this for quite some time now. I need to use a regular expression to find words that contain more than two vowels. I am getting stuck.
Here is what I have so far. I am using emacs to find them in a text file.
I use C-M-s and the expression /<[^aeiou]*[aeiou][^aeiou]/>
It finds words with one vowel, but I need to find if it has more than two, and I'm not sure how to go about doing that.
Any help is appreciated!

alphaniner wrote:
This better not be a homework question...
[aeiou].*[aeiou].*[aeiou]
or, more succinctly (I think...)
\([aeiou].*\)\{3\}
I tested it with grep on a file with one word per line. Seems to work in that context. More than one word per line and it breaks. I know nothing of emacs or your data, so I have no idea if it will suffice.
I'd also suggest you go back over your expression and put into words exactly what you think it is doing. I'm no regex expert, but it doesn't seem at all fit for what you're trying to do.
Thanks that seemed to work!
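The patterns above use .* and therefore work per line. To test each word separately even with several words on a line, the gaps between the vowels can be restricted to word characters. A minimal Python sketch follows; it assumes "more than two" means at least three vowels and counts only a, e, i, o, u.

```python
import re

# Three vowels inside one word: \w* cannot cross a space, and the \b
# anchors make the match cover the whole word.
THREE_VOWELS = re.compile(
    r"\b\w*[aeiou]\w*[aeiou]\w*[aeiou]\w*\b", re.IGNORECASE
)

text = "a quick education is beautiful"
words = THREE_VOWELS.findall(text)
print(words)
```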

Regular Expression Q.

Dear all,
I have been trying to remove non-printable characters from a string, but I want to exclude CHR(10). How can it be done? I have tried the query below, but it strips out every non-printable character, including the line-feed character. I have tried \0xA0, but it is ignored by Oracle, since the documentation says that Oracle evaluates by byte and not by display character. Any help is much appreciated. Thanks.
SELECT regexp_replace( address, '[[:cntrl:]]','') FROM emp_data;
Regards,
Kueh.

KA Kueh wrote:
I wanted to strip out all control character except the chr(10) and with the [:cntrl:] character class it will strip out all control characters inclusive of the chr(10). So your solution still does not do the trick. Thanks.
Oops, I missed that. You could use regexp_replace to produce a list of the control characters in the address string, then strip CHR(10) from that list, and also strip CHR(0), which either has a special meaning to regexp or is a bug:
SQL> select regexp_replace('A'||chr(0)||'B',chr(0)) from dual;
REG
A B
SQL>
SQL> select regexp_replace('A'||chr(0)||'B','['||chr(0)||']') from dual;
select regexp_replace('A'||chr(0)||'B','['||chr(0)||']') from dual
ERROR at line 1:
ORA-12726: unmatched bracket in regular expression
SQL> select regexp_replace('A'||chr(0)||'B','[\'||chr(0)||']') from dual;
select regexp_replace('A'||chr(0)||'B','[\'||chr(0)||']') from dual
ERROR at line 1:
ORA-12726: unmatched bracket in regular expression
SQL>
Anyway:
SQL> with t as (
2             select 'ABC'||CHR(0)||CHR(1)||CHR(10)||CHR(11)||'DEF'||CHR(5)||CHR(1)||'GHI' str from dual
3            )
4 select regexp_replace(
5                         replace(str,chr(0)),
6                         '['||replace(regexp_replace(replace(str,chr(0)),'[^[:cntrl:]]'),chr(10))||']'
7                        )
8    from t
9 /
REGEXP_REP
ABC
DEFGHI
SQL>
So you can try:
select regexp_replace(
                       replace(address,chr(0)),
                       '['||replace(regexp_replace(replace(address,chr(0)),'[^[:cntrl:]]'),chr(10))||']'
                      )
from emp_data
/
SY.
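In an engine that accepts escaped ranges inside a bracket expression, the same result needs no dynamically built pattern. Here is a minimal Python sketch, assuming "control character" means the ASCII control range (0-31 and 127); it reuses the test string from the session above.

```python
import re

# ASCII control characters, with a gap left at \x0a (line feed, CHR(10)).
CONTROL_EXCEPT_LF = re.compile(r"[\x00-\x09\x0b-\x1f\x7f]")

raw = "ABC\x00\x01\n\x0bDEF\x05\x01GHI"
clean = CONTROL_EXCEPT_LF.sub("", raw)

print(repr(clean))
```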

I need help renaming a file using regular expressions in Bridge.

Hi,
I work at a university, and we are working through files for our Thesis and Dissertations. We have been renaming them to make them more consistent. I am just wondering if there is a regular expression that could help with this process?
Here are some examples of current file names:
THESIS 1981 H343G
Thesis 1981 g996e
THESIS-1981-A543G
I don't need to change the actual names of the files, just how they are formatted:
Proper case on Thesis.
Hyphens (-) in place of all white space.
First letter capital, last letter lowercase on the call no. (H343g).
So the list above should look like:
Thesis-1981-H343g
Thesis-1981-G996e
Thesis-1981-A543g
I have seen people do some pretty cool things with regular expressions! Any help would be greatly appreciated. Thanks!

You would be better off using a script to do this, as I don't think it would be possible with the Bridge rename. Here is an example.
Using ExtendScript Toolkit or a plain text editor, copy the code in and save it as Filename.jsx.
This needs to be saved into the correct folder. To find it, go to the Preferences in Bridge and select Startup Scripts; this will open the folder where the script is to be saved.
Once this is done, close and restart Bridge.
To use: go to the Tools menu and select Rename PDFs.
Make sure you first test the code with a few files copied into a separate folder, to make sure it does what you want.
The script will process all PDF files in the selected folder.
#target bridge
if( BridgeTalk.appName == "bridge" ) {
// Add a "Rename PDFs" command to the end of the Tools menu
renamePDFs = MenuElement.create("command", "Rename PDFs", "at the end of Tools");
renamePDFs.onSelect = function () {
app.document.deselectAll();
var thumbs = app.document.getSelection("pdf");
for( var z in thumbs){
var Name = decodeURI(thumbs[z].spec.name);
// Lowercase, turn white space into hyphens, then split into word-year-callno.pdf
var parts = Name.toLowerCase().replace(/\s/g,'-').match(/(.*)(-)(.*)(-)(.*)(\.pdf)/);
if( !parts ) continue; // skip names that do not fit the pattern
// Proper case on the first part
var NewName = parts[1].replace(/^[a-z]/, function(s){ return s.toUpperCase() });
// Call number in upper case, except for a lowercase last letter
NewName += parts[2]+parts[3]+parts[4]+parts[5].toUpperCase().replace(/[A-Z]$/, function(s){ return s.toLowerCase() });
NewName += parts[6];
thumbs[z].spec.rename(NewName);
}
};
}
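The naming rule itself can be tried out away from Bridge before any files are touched. A minimal Python sketch of just the string transformation follows; it works on the name without the .pdf extension, assumes a four-digit year in the middle, and the function name normalize is made up for illustration.

```python
import re

# word, 4-digit year, call number, separated by white space and/or hyphens
NAME = re.compile(r"([A-Za-z]+)[\s-]+(\d{4})[\s-]+([A-Za-z0-9]+)")


def normalize(stem):
    match = NAME.fullmatch(stem.strip())
    if match is None:
        return stem  # leave names that do not fit the pattern untouched
    word, year, call_no = match.groups()
    # Upper-case the call number except for its last character.
    call_no = call_no[:-1].upper() + call_no[-1].lower()
    return f"{word.capitalize()}-{year}-{call_no}"


for stem in ["THESIS 1981 H343G", "Thesis 1981 g996e", "THESIS-1981-A543G"]:
    print(stem, "->", normalize(stem))
```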
