Regular Expression to Locate Words with Character

I want to identify all the words in a document that are followed by the register mark (®) symbol.
I built what I thought was a regular expression that would search for a register mark preceded by alphanumeric characters and a space. So if my text contained the sentence "Adobe InDesign® is a great product.", the regular expression would find "InDesign®".
Below is the regular expression I composed. It grabs anything with a register mark, not just the register marks preceded by a space and alphanumeric characters. Where did I go wrong? I thought the \s would restrict the search to complete words with a register mark.
\s[a-zA-Z0-9]|®

\s is the special GREP code for "any kind of space" -- a regular space, a tab, hard return, or any of ID's own white space codes. It has nothing to do with "complete words", because a word can appear at the start of a story, without any preceding space. It would also not find "InDesign®" because there is no space before it; there is a double quote instead.
Your GREP does not work because, well, you got the general idea (words may consist of the set of characters "a-z", "A-Z", and "0-9") but since you use the [..] without any other code, GREP will apply this rule once -- per character. If you want to find words of more than one character, you need to tell GREP "one or more of these, please": with a +.
Second, where did that | come from? It's the OR operator. Essentially, you are looking for
      any space followed by one character from the set "a-z", "A-Z", and "0-9"
OR
      the ® character
The 'word break' you were looking for is this code: \b, so you could search for "\b[a-zA-Z0-9]+" (note the '+' to allow more than one instance) -- but it's not necessary, because by default GREP grabs as much as it can. The set 'a-zA-Z0-9' etc. describes the allowed "word" characters, but you might prefer these: \l (ell) and \u for all lowercase and all uppercase characters -- they are shorter, and they automatically include accented characters, Greek, Russian, and a lot more. Similarly, \d (for "digits") is the shortcut for "0-9". And even better: \w is the shortcut for "word character", i.e., your set but shorter and a bit better.
Try this one:
\w+~r
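
To see the difference outside InDesign, here is a minimal sketch in Python. It assumes Python's re module, which has no ~r code, so the ® character is written literally; the variable names are only for illustration.

```python
import re

text = "Adobe InDesign® is a great product."

# Original pattern: "a space plus ONE word character" OR "a register mark".
original_hits = re.findall(r"\s[a-zA-Z0-9]|®", text)

# Suggested pattern: one or more word characters followed by the register mark.
fixed_hits = re.findall(r"\w+®", text)

print(original_hits)
print(fixed_hits)  # ['InDesign®']
```

The first list contains a separate hit for every "space plus letter" pair and one for the bare ®, which is why the original expression seemed to grab everything.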

Similar Messages

  • Regular expressions for replacing text with sms language text

    Hi, I'm trying to write a function which converts normal, correctly spelled text into the shorter sms language format but struggling to come up with the regular expressions i need to do so, can anyone help?
    1: remove surplus white space at the beginning of a sentence and at the end of a sentence.
    e.g. " hello." --> "hello." OR "hello ." --> "hello."
    2: remove the preceding and/or following space if there's a word, then a number, possibly followed by another word
    e.g. "come 2 me" --> "come2me" OR "dnt 4get" --> "dnt4get"
    3: remove "aeiou" if word starts and ends with "!aeiou"
    e.g. "text" --> "txt"

    1. Use String's trim() method.
    2. You can make the whitespace on either side optional:
         text = text.replaceAll("\\s*(\\d)\\s*", "$1");
    3. This one has to be done in two steps:
    import java.util.regex.*;
    public class Test
    {
      public static void main(String... args) throws Exception
      {
        String text = "The quick brown fox jumps over the lazy dog.";
        System.out.println(devowelize(text));
      }
      public static String devowelize(String str)
      {
        // Step 1: find runs that start and end with consonants and have vowels in between.
        Pattern p = Pattern.compile(
          "[a-z&&[^aeiou]]++(?:[aeiou]++[a-z&&[^aeiou]]++)+",
          Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(str);
        StringBuffer sb = new StringBuffer();
        while (m.find())
        {
          // Step 2: strip the vowels from each match.
          m.appendReplacement(sb, m.group().replaceAll("[aeiou]+", ""));
        }
        m.appendTail(sb);
        return sb.toString();
      }
    }
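
For comparison, the three rules can be sketched in Python as well. This is only an illustration under some assumptions: the function name to_sms is made up, word boundaries (\b) are added so that rule 3 only touches whole words that start and end with a consonant, and an extra substitution handles the "hello ." case from rule 1.

```python
import re

CONSONANT = r"[b-df-hj-np-tv-z]"

# A whole word that starts and ends with consonants and has vowels inside.
WORD = re.compile(rf"\b{CONSONANT}+(?:[aeiou]+{CONSONANT}+)+\b", re.IGNORECASE)


def to_sms(text: str) -> str:
    # Rule 1: trim the ends and drop white space before closing punctuation.
    text = text.strip()
    text = re.sub(r"\s+([.!?])", r"\1", text)
    # Rule 2: remove white space on either side of a digit.
    text = re.sub(r"\s*(\d)\s*", r"\1", text)
    # Rule 3: remove the vowels inside each matching word.
    return WORD.sub(
        lambda m: re.sub(r"[aeiou]+", "", m.group(), flags=re.IGNORECASE), text
    )


print(to_sms("come 2 me"))
print(to_sms("text"))
```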

  • Regular Expression for non-words

    hello all!
    can you help me construct a regular expression that will match non-word strings say "������". I will be needing this to filter words from a Microsoft Word Document.
    Thanx!

    I don't think this is a problem that should be solved with regex. You would have to convert your Word document to a String and use replaceAll() with "\\W" as the regex.
    Correct me if I am wrong but I thought that Word files were binary so your first problem will be to convert the file(s) to a String.
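
Once the document text is available as a string, the \W idea could look like this in Python. It is a rough sketch: the function name is hypothetical, and collapsing each run of non-word characters into one space is an assumption about the desired result.

```python
import re


def strip_non_word(text: str) -> str:
    # \W matches any character that is not a letter, digit or underscore;
    # each run of such characters is replaced by a single space.
    return re.sub(r"\W+", " ", text).strip()


print(strip_non_word("hello, world!! \u2022\u2022 ok"))
```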

  • Regular Expression to Check number with at least one decimal point

    Hi,
    I would like to use REGEXP_LIKE to check a number with up to two digits and at least one decimal point:
    Ex.
    10.1
    1.1
    1
    12
    This is what I have so far. Any help would be appreciated. Thanks.
    if regexp_like(v_expr, '^(\d{0,2})+(\.[0-9]{1})?$') t

    Hi,
    Whenever you have a question, post a little sample data (CREATE TABLE and INSERT statements, relevant columns only) for all the tables involved, and the results you want from that data.
    Explain, using specific examples, how you get those results from that data.
    Always say what version of Oracle you're using (e.g. 11.2.0.2.0).
    See the forum FAQ: https://forums.oracle.com/message/9362002
    Do you really mean "up to two digits", that is, 3 or more digits is unacceptable?  What if there are 0 digits?  (0 is less than 2.)
    Do you really mean "at least one decimal point", that is, 2, 3, 4 or more decimal points are okay?  Include some examples when you post the sample data and results.
    It might be more efficient without regular expressions.  For example
    WHERE   TRANSLATE ( str              -- nothing except digits and dots
                      , 'A.0123456789'
                      , 'A'
                      )   IS NULL
    AND     str           LIKE '%.%'     -- at least 1 dot
    AND     LENGTH ( REPLACE ( str       -- up to 2 digits
                             , '.'
                             )
                   )     <= 2
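
Since the exact rule is still open, the following Python sketch fixes one possible reading as an assumption: one or two digits, optionally followed by a dot and exactly one more digit. That reading accepts all four sample values; the names NUMBER and is_valid are made up for the example.

```python
import re

# Assumed rule: 1-2 digits, then optionally a dot and a single digit.
NUMBER = re.compile(r"\d{1,2}(?:\.\d)?")


def is_valid(value: str) -> bool:
    # fullmatch() anchors the pattern at both ends of the string.
    return NUMBER.fullmatch(value) is not None


for sample in ["10.1", "1.1", "1", "12", "123", "1.23"]:
    print(sample, is_valid(sample))
```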

  • Regular Expression Abbreviation of Words

    Suppose I have got data in my column like
    Balla Ram Chog Mal College
    Maharishi Dayanand University
    Cambridge Public School
    Now I want to write a query using regular expressions to find out the abbreviations. e.g the resulting data set should be:
    BRCMC
    MDU
    CPS
    How should I write regexp for it ?

    One way, using SUBSTR and INSTR, tested on 10g.
    with data as
    (
      select 'Balla Ram Chog Mal College' col from dual union all
      select 'Maharishi Dayanand University' col from dual union all
      select 'Cambridge Public School' col from dual
    )
    select col, replace(ltrim(max(sys_connect_by_path(str, ',')) keep (dense_rank last order by r), ','), ',') abbr
      from (
    select col, substr(col, decode(level, 1, 1, instr(col, ' ', 1, level - 1) + 1), 1) str, level, row_number() over (partition by col order by level) r
      from data
    connect by level <= length(col) - length(replace(col, ' ')) + 1
           and col = prior col
           and prior sys_guid() is not null
    order by col, level
    )
    group by col
    start with r = 1
    connect by r - 1 = prior r
           and col = prior col
           and prior sys_guid() is not null;
    COL                           ABBR
    Balla Ram Chog Mal College    BRCMC
    Cambridge Public School       CPS 
    Maharishi Dayanand University MDU
    With 11g, you will not require the Outer query to concatenate the results, you can directly use LISTAGG as demonstrated by Hashim.
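
The regular-expression idea itself is small: take the first character after every word boundary. Here is a minimal Python sketch of that idea (not Oracle syntax; the function name is only illustrative).

```python
import re


def abbreviate(name: str) -> str:
    # \b\w matches the first word character of every word.
    return "".join(re.findall(r"\b\w", name))


for college in [
    "Balla Ram Chog Mal College",
    "Maharishi Dayanand University",
    "Cambridge Public School",
]:
    print(college, "->", abbreviate(college))
```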

  • Regular Expression Find and Replace with Wildcards

    Hi!
    For the world of me, I can't figure out the right way to do this.
    I basically have a list of last names, first names. I want the last name to have a different css style than the first name.
    So this is what I have now:
    <b>AAGAARD, TODD, S.</b><br>
    <b>AAMOT, KARI,</b> <br>
    <b>AARON, MARJORIE, C. </b> <br>
    and this is what I need to have:
    <span class="LastName">AAGAARD</span>  <span class="FirstName">, TODD, S. </span> <br />
    <span class="LastName">AAMOT</span> <span class="FirstName">, KARI,</span> <br/>
    <span class="LastName">AARON</span> <span class="FirstName">, MARJORIE, C.</span> <br/>
    Any ideas?
    Thanks!

    Make a backup first.
    In the Find field use:
    <b>(\w+),\s+([^<]+)<\/b>\s*<br>
    In the Replace field use:
    <span class="LastName">$1</span> <span class="FirstName">$2</span><br />
    Select Use regular expression. Light the blue touch paper, and click Replace All.
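
The same find and replace can be tried out in Python before touching the real file. This sketch makes two assumptions beyond the answer above: the comma is put back in front of the first name so the result matches the requested output, and trailing spaces before </b> are dropped. Python writes group references as \1 and \2 instead of $1 and $2.

```python
import re

PATTERN = re.compile(r"<b>(\w+),\s+([^<]+?)\s*</b>\s*<br>")
REPLACEMENT = (
    r'<span class="LastName">\1</span> '
    r'<span class="FirstName">, \2</span> <br />'
)


def restyle(html: str) -> str:
    # Wrap the last name and the rest of the name in separate spans.
    return PATTERN.sub(REPLACEMENT, html)


print(restyle("<b>AAGAARD, TODD, S.</b><br>"))
print(restyle("<b>AARON, MARJORIE, C. </b> <br>"))
```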

  • Regular Expression to spilt words

    Hi all,
    I want to extract the last word in a string, i.e. everything after the last space. The maximum length of the string is five words.
    I used the following query, but it does not work.
    SQL> SELECT REGEXP_SUBSTR('system hello sidval',
      2  '[a-z]+\S+') RESULT
      3  FROM DUAL;
    RESULT
    system
    SQL>
    Examples:
    1- if string is
    Daivd  from  uk
    output is
    uk
    2- if string is
    David john
    output is
    john
    The maximum length of the string is five words.
    regards
    Edited by: Ayham on Oct 7, 2012 12:01 PM
    Edited by: Ayham on Oct 7, 2012 12:18 PM

    Ayham wrote:
    Hi all,
    I want to extract the last word in a string, i.e. everything after the last space. The maximum length of the string is five words.
    I used the following query, but it does not work.
    Try this:
    SQL> SELECT REGEXP_SUBSTR('system hello sidval',  '[a-z]+\S*$') RESULT  FROM DUAL;
    The extra $ tells the regex to match at the end of the line. The * instead of the + makes the trailing \S part optional, so the pattern also matches when nothing follows the letters.
    bye
    TPD
    Edited by: TPD Opitz-Consulting com on 07.10.2012 21:35
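
For strings with trailing spaces (as in the "Daivd  from  uk" example), a lookahead that skips white space before the end is one option. The following Python sketch shows the idea; the function name is invented, and this is not Oracle syntax.

```python
import re


def last_word(text: str):
    # \S+ is a run of non-space characters; the lookahead requires that
    # only optional white space remains between it and the end of the string.
    match = re.search(r"\S+(?=\s*$)", text)
    return match.group() if match else None


print(last_word("Daivd  from  uk    "))
print(last_word("David john"))
```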

  • Regular expression matches string starts with &

    Hello,
    I am trying to write a Reg Exp that removes any string starts with "&" and Ends with ";" . In other words, I am trying to remove anything similar to:
    & nbsp;  & quot; & lt; & gt;  Any help please.
    This does not work:
    select regexp_replace(ename, '^&[a-z]{2,4}[;]$') from emp;
    Regards,
    Fateh

    Fateh wrote:
    I am trying to write a Reg Exp that removes any string starts with "&" and Ends with ";" . In other words, I am trying to remove anything similar to:
    & nbsp;  & quot; & lt; & gt; 
    Those are entity references (without the whitespace after '&').
    Do you really want to remove them, or do you actually want to convert them back to their corresponding characters but don't know how to do it?
    SQL> set scan off
    SQL> select utl_i18n.unescape_reference('&quot;Test&quot;:&nbsp;3&gt;2') from dual;
    UTL_I18N.UNESCAPE_REFERENCE('&
    "Test": 3>2

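Both options, removing the entity references or converting them back, can be compared in a short Python sketch. Note that the pattern has no ^ and $ anchors, which is the likely reason the original attempt matched nothing unless the whole value was a single entity. The sample string is taken from the query above; html.unescape is Python's standard-library counterpart and is used here only as an illustration.

```python
import html
import re

sample = "&quot;Test&quot;:&nbsp;3&gt;2"

# Option 1: remove every entity reference (no ^ or $ anchors).
removed = re.sub(r"&[a-z]{2,4};", "", sample)

# Option 2: convert the references back to their characters.
decoded = html.unescape(sample)

print(removed)
print(decoded)
```
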
  • Regular Expression - Select two words after specific string

    Hi,
    I am trying to select the two words/strings after the first word "door". I am using the search pattern (?<=door).\w+ but in this case I get the complete text after the word "door". I only want to select the two words after the first "door" in the complete text.
    Can anybody help me?
    Thanks!
    Marco Snels

    Hi Marco,
    I'm relatively handy with RegEx but this seems like a problem where I would employ a little bit of RegEx and CTL, just to make life easier.
    You can use the following RegEx (note: I didn't test this in Integrator, only in a RegEx testing tool) to extract the two words after door (but including door, unfortunately):
    (?:door)[\s]\w+[\s]\w+
    This would give you something like the following in your extracted field:
    door is brown
    You could then pass through a re-formatter to remove "door" and the whitespace and be on your way. Not the best answer but should perform reasonably well and get you up and going.
    Regards,
    Patrick Rafferty
    http://branchbird.com
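
Where the tool supports capture groups, the re-formatter step can be avoided by capturing only the two words. Here is a hedged Python sketch of that idea; it has not been tried in Integrator, and the function name is made up.

```python
import re


def two_words_after_door(text: str):
    # search() stops at the first "door"; group 1 holds the next two words.
    match = re.search(r"door\s+(\w+\s+\w+)", text)
    return match.group(1) if match else None


print(two_words_after_door("The door is brown and the other door is open."))
```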

  • Match Regular Expression won't work with Null

    Is that right? I don't see it in the documentation. I can use it on \01 , just not \00.
    Is there a way around this problem? I know that Match Pattern works, but I want to use it with separate partial matches (a|b) which Match Pattern does not support.

    Here's a possibility:
    If you try to set the constant "\00" to "\0" with the '\' Code Display on, it just converts it back to "\00" on the display.
    The function uses the PCRE library.  From the library documentation (the pcrepattern man page):
    "After \0 up to two further octal digits are read. In both cases, if there
    are fewer than two digits, just those that are present are used. Thus the
    sequence \0\x\07 specifies two binary zeros followed by a BEL character
    (code value 7). Make sure you supply two digits after the initial zero if the
    pattern character that follows is itself an octal digit."
    So, what if, behind the scenes, LV is actually feeding the match function just a "\0"?  I'm guessing (but haven't been able to verify) it would match *any* input string, immediately, with an offset of zero.  Testing with random search strings shows behavior that might indicate this.
    If the above is true, getting around it might be hard, since you're at the mercy of LV as to exactly how it calls that library.
    Fun stuff... okay, back to work with me.  Good luck,
    Joe Z.

  • Find text using regular expression and add highlight annotation

    Hi Friends
    Is it possible to find text using a regular expression and add a highlight annotation using a plugin?

    A plugin can use the PDWordFinder to get a list of the words on a page, and their location. That's all that the API offers for searching. Of course, you can use a regular expression library to work with that word list.

  • Quick regular expression question/help

    Can someone help me with two regular expressions I need? I could spend a while trying to figure it out myself, but time's short and I would really like a foolproof, optimal solution (my attempt would be buggy).
    Sample sentence
    The population, is projected to reach 200,000, or more (by 2020).[7] This is {dummy} text.
    The first regular expression
    I need all brackets and every thing between them to be removed from a sentence.
    Brackets such as: ( ), [ ] and { } .
    I.e. Given the above sentence the following would be returned:
    The population, is projected to reach 200,000, or more. This is text.
    The second regular expression
    If a word has a trailing comma character I need to add a whitespace between the word and the comma.
    I.e. Given the sentence returned from the first regular expression, this regex would return:
    The population *,* is projected to reach 200,000 *,* or more. This is text.
    Many thanks to anyone who can help me with this!
    Edited by: Myles on Jan 18, 2008 8:12 AM

    http://java.sun.com/docs/books/tutorial/extra/regex/index.html
    http://www.regular-expressions.info
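
As a starting point, the two expressions could be sketched like this. The example is in Python rather than Java, does not handle nested brackets, and uses invented names; the patterns themselves should carry over to java.util.regex with the usual backslash doubling.

```python
import re

# A bracketed span of any of the three kinds, plus any white space before it.
BRACKETED = re.compile(r"\s*(?:\([^)]*\)|\[[^\]]*\]|\{[^}]*\})")

# A comma directly after a word character and before white space or the end.
TRAILING_COMMA = re.compile(r"(?<=\w),(?=\s|$)")


def remove_bracketed(sentence: str) -> str:
    return BRACKETED.sub("", sentence)


def space_before_commas(sentence: str) -> str:
    return TRAILING_COMMA.sub(" ,", sentence)


sample = ("The population, is projected to reach 200,000, or more "
          "(by 2020).[7] This is {dummy} text.")
step1 = remove_bracketed(sample)
step2 = space_before_commas(step1)
print(step1)
print(step2)
```

The lookahead in the second pattern is what leaves the comma inside 200,000 alone.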

  • Finding Words with more than Two Vowels (Regex)

    Hello all, I've been working on this for quite some time now.  I need to use a regular expression to find words that contain more than two vowels.  I am getting stuck.
    Here is what I have so far.  I am using emacs to find them in a text file.
    I use C-M-s and the expression /<[^aeiou]*[aeiou][^aeiou]/>
    It finds words with one vowel, but I need to find if it has more than two, and I'm not sure how to go about doing that.
    Any help is appreciated!

    alphaniner wrote:
    This better not be a homework question...
    [aeiou].*[aeiou].*[aeiou]
    or, more succinctly (I think...)
    \([aeiou].*\)\{3\}
    I tested it with grep on a file with one word per line.  Seems to work in that context.  More than one word per line and it breaks.  I know nothing of emacs or your data, so I have no idea if it will suffice.
    I'd also suggest you go back over your expression and put into words exactly what you think it is doing.  I'm no regex expert, but it doesn't seem at all fit for what you're trying to do.
    Thanks that seemed to work!
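
For text with several words per line, the pattern has to be kept from running across word boundaries. One way, sketched here in Python rather than emacs or grep syntax, is to allow only word characters between the vowels. "More than two vowels" is read as "at least three", and the names are illustrative.

```python
import re

# Three times: any consonant-like word characters followed by one vowel,
# then the rest of the word. [^\Waeiou] means "word character, not a vowel".
THREE_VOWELS = re.compile(r"\b(?:[^\Waeiou]*[aeiou]){3}\w*", re.IGNORECASE)


def words_with_three_vowels(text: str):
    return THREE_VOWELS.findall(text)


print(words_with_three_vowels("the quick education of a beautiful sky"))
```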

  • Regular Expression Q.

    Dear all,
    I have been trying to remove non-printable characters from a string, but I want to keep CHR(10). How can it be done? I have tried the statement below, but it strips out every non-printable character, including the line-feed character. I have tried \0xA0, but it is ignored by Oracle, since the documentation says that Oracle evaluates by byte and not by the display character. Any help is much appreciated. Thanks.
    SELECT regexp_replace( address, '[[:cntrl:]]','') FROM emp_data;
    Regards,
    Kueh.

    KA Kueh wrote:
    I wanted to strip out all control character except the chr(10) and with the [:cntrl:] character class it will strip out all control characters inclusive of the chr(10). So your solution still does not do the trick. Thanks.
    Oops, I missed that. You could use regexp_replace to produce a list of control characters in the address string, then strip CHR(10) and also strip CHR(0), which either has a special meaning to regexp or is a bug:
    SQL> select regexp_replace('A'||chr(0)||'B',chr(0)) from dual;
    REG
    A B
    SQL>
    SQL> select regexp_replace('A'||chr(0)||'B','['||chr(0)||']') from dual;
    select regexp_replace('A'||chr(0)||'B','['||chr(0)||']') from dual
    ERROR at line 1:
    ORA-12726: unmatched bracket in regular expression
    SQL> select regexp_replace('A'||chr(0)||'B','[\'||chr(0)||']') from dual;
    select regexp_replace('A'||chr(0)||'B','[\'||chr(0)||']') from dual
    ERROR at line 1:
    ORA-12726: unmatched bracket in regular expression
    SQL>
    Anyway:
    SQL> with t as (
      2             select 'ABC'||CHR(0)||CHR(1)||CHR(10)||CHR(11)||'DEF'||CHR(5)||CHR(1)||'GHI' str from dual
      3            )
      4  select  regexp_replace(
      5                         replace(str,chr(0)),
      6                         '['||replace(regexp_replace(replace(str,chr(0)),'[^[:cntrl:]]'),chr(10))||']'
      7                        )
      8    from  t
      9  /
    REGEXP_REP
    ABC
    DEFGHI
    SQL>
    So you can try:
    select  regexp_replace(
                           replace(address,chr(0)),
                           '['||replace(regexp_replace(replace(address,chr(0)),'[^[:cntrl:]]'),chr(10))||']'
                          )
      from  emp_data
    /
    SY.
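
In regex flavours that accept character ranges with escapes, the same result can be had by listing the ASCII control range with a gap at line feed. This Python sketch illustrates the idea on the test string used above; it is not Oracle syntax, and the names are made up.

```python
import re

# ASCII control characters are 0-31 and 127; 0x0A (line feed) is left out.
CONTROL_EXCEPT_LF = re.compile(r"[\x00-\x09\x0b-\x1f\x7f]")


def strip_control(text: str) -> str:
    return CONTROL_EXCEPT_LF.sub("", text)


sample = "ABC\x00\x01\n\x0bDEF\x05\x01GHI"
print(repr(strip_control(sample)))
```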

  • I need help renaming a file using regular expressions in Bridge.

    Hi,
    I work at a university, and we are working through files for our Thesis and Dissertations. We have been renaming them to make them more consistent. I am just wondering if there is a regular expression that could help with this process?
    Here are some examples of current file names:
    THESIS 1981 H343G
    Thesis 1981 g996e
    THESIS-1981-A543G
    I don't need to change the actual names of the files, just how they are formatted:
    Proper case on Thesis.
    Hyphens(-) in all white space.
    First letter capital, last letter lowercase on the call no (H343g)
    So the list above should look like;
    Thesis-1981-H343g
    Thesis-1981-G996e
    Thesis-1981-A543g
    I have seen people do some pretty cool things with regular expressions! Any help would be greatly appreciated. Thanks!

    You would be better off using a script to do this as an example as I don't think it would be possible in the Bridge re-name.
    Using ExtendScript Toolkit or a Plain text editor copy the code into either and save it out as Filename.jsx
    This needs to be saved into the correct folder. this is found by going to the preferences in Bridge, selecting Startup Scripts, this will open the folder where the script is to be saved.
    Once this is done close and re-start Bridge.
    To use: go to the Tools menu and select Rename PDFs.
    Make sure you test the code with a few copied files in a separate folder first to make sure it does what you want.
    The script will do all PDF files in the selected folder.
    #target bridge
    if( BridgeTalk.appName == "bridge" ) {
        // Add a "Rename PDFs" command to the end of the Tools menu.
        renamePDFs = MenuElement.create("command", "Rename PDFs", "at the end of Tools");
        renamePDFs.onSelect = function () {
            // Clear the selection so the script works on all PDFs in the folder.
            app.document.deselectAll();
            var thumbs = app.document.getSelection("pdf");
            for( var z in thumbs){
                var Name = decodeURI(thumbs[z].spec.name);
                // Lowercase, turn white space into hyphens, then split into word-year-callno.pdf.
                var parts = Name.toLowerCase().replace(/\s/g,'-').match(/(.*)(-)(.*)(-)(.*)(\.pdf)/);
                if( !parts ) continue; // skip names that do not fit the pattern
                var NewName = parts[1].replace(/^[a-z]/, function(s){ return s.toUpperCase() });
                NewName += parts[2]+parts[3]+parts[4]+parts[5].toUpperCase().replace(/[A-Z]$/, function(s){ return s.toLowerCase() });
                NewName += parts[6];
                thumbs[z].spec.rename(NewName);
            }
        };
    }
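
The renaming rule itself can be tried on plain strings before running anything in Bridge. The Python sketch below only computes the new name and does not touch any files; the function name and the assumption that every name has exactly three parts (word, year, call number) are mine.

```python
import re


def normalize(name: str) -> str:
    # Split on any run of white space or hyphens.
    parts = re.split(r"[\s-]+", name.strip())
    if len(parts) != 3:
        raise ValueError(f"unexpected file name: {name!r}")
    word, year, call_no = parts
    # Call number: all upper case except the last character.
    call_no = call_no.upper()
    call_no = call_no[:-1] + call_no[-1].lower()
    # Proper case on the first word, hyphens between the parts.
    return f"{word.capitalize()}-{year}-{call_no}"


for old in ["THESIS 1981 H343G", "Thesis 1981 g996e", "THESIS-1981-A543G"]:
    print(old, "->", normalize(old))
```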
