Regular expressions for replacing text with sms language text

Hi, I'm trying to write a function that converts normal, correctly spelled text into the shorter SMS-language format, but I'm struggling to come up with the regular expressions I need. Can anyone help?
1: Remove surplus white space at the beginning and at the end of a sentence.
e.g. " hello." --> "hello." OR "hello ." --> "hello."
2: Remove the preceding and/or following space when a number sits between words (a word, then a number, possibly followed by another word).
e.g. "come 2 me" --> "come2me" OR "dnt 4get" --> "dnt4get"
3: Remove the vowels "aeiou" from a word that starts and ends with a non-vowel.
e.g. "text" --> "txt"

1. Use String's trim() method.
2. You can make the whitespace on either side optional:
  text = text.replaceAll("\\s*(\\d)\\s*", "$1");
3. This one has to be done in two steps:
import java.util.regex.*;
public class Test
{
  public static void main(String... args) throws Exception
  {
    String text = "The quick brown fox jumps over the lazy dog.";
    System.out.println(devowelize(text));
  }
  public static String devowelize(String str)
  {
    // Step one: find runs that start and end with consonants and contain vowels
    Pattern p = Pattern.compile(
      "[a-z&&[^aeiou]]++(?:[aeiou]++[a-z&&[^aeiou]]++)+",
      Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(str);
    StringBuffer sb = new StringBuffer();
    while (m.find())
    {
      // Step two: strip the vowels from each match
      m.appendReplacement(sb, m.group().replaceAll("[aeiou]+", ""));
    }
    m.appendTail(sb);
    return sb.toString();
  }
}
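Steps 1 and 2 can be combined in one small helper. This is only a sketch: the class name WhitespaceTidy and the method tidy are hypothetical, and it assumes that sentence-ending punctuation is limited to ".", "!" and "?". Note that trim() only touches the two ends of the string, so the "hello ." case needs one extra replacement.

```java
class WhitespaceTidy {
    // Hypothetical helper covering steps 1 and 2 from the question.
    static String tidy(String text) {
        // Step 1a: drop leading and trailing whitespace.
        String s = text.trim();
        // Step 1b: drop whitespace in front of sentence-ending punctuation
        // (assumption: only '.', '!' and '?' end a sentence).
        s = s.replaceAll("\\s+([.!?])", "$1");
        // Step 2: drop optional whitespace on either side of a digit.
        return s.replaceAll("\\s*(\\d)\\s*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(tidy(" hello ."));   // hello.
        System.out.println(tidy("come 2 me"));  // come2me
        System.out.println(tidy("dnt 4get"));   // dnt4get
    }
}
```

The result of tidy() could then be passed to devowelize() for step 3.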

Similar Messages

  • Regular Expressions for converting HTML to Structured Plain Text

    I'm writing a PL/SQL function that will convert HTML to plain text, but still preserve some of the formatting/line breaks. One of my challenges is in writing a regular expression to capture the text blocks while ignoring the markup. I'm trying to write an expression that will grab all of the text between start/end tags, but discard the tags. For example, to find all of the text between a start/end paragraph, I want to do something like:
    REGEXP_REPLACE('<p style="text-align:center;">This is the body of the paragraph</p>', '<p.*>(.*)</p>', '\1||v_crlf' )
    where \1 returns the contents of the paragraph and v_crlf (declared earlier in the function) inserts a line break. I know there are more general expressions that will remove all tags, but I want to specifically identify the tags so I can process them appropriately. This way I can easily convert HTML to plain text for email and reporting without having to keep two versions around. Any help would be greatly appreciated. Once I get this worked out, I will repost with the function code for others to use. Thanks.
    Edited by: jritschel on Oct 26, 2010 9:58 AM

    Here's a function I wrote for an app. I'm not making any promises about its accuracy, as the app was just a proof of concept and never made it to production.
    function strip_html( p_clob in clob )
    return clob
    is
        l_out clob;
        l_test  number := 0;
        l_max_loops constant number := 20;
        i   pls_integer := 0;
    begin
        l_out := regexp_replace(p_clob,'<br>|<br />',chr(13)||chr(10),1,0,'imn');
        l_out := regexp_replace(l_out,'<p>',chr(13)||chr(10),1,0,'imn');
        l_out := replace(l_out,'<li>',chr(13)||chr(10)||'*<li>');
        l_out := regexp_replace(l_out,'<b>(.+?)</b>','*\1*',1,0,'imn');
        l_out := regexp_replace(l_out,'<u>(.+?)</u>','_\1_',1,0,'imn');
        loop
            l_test := regexp_instr(l_out,'<([A-Z][A-Z0-9]*)[^>]*>.*?</\1>',1,1,0,'imn');
            exit when l_test = 0 or i > l_max_loops;
            l_out := regexp_replace(l_out,'<([A-Z][A-Z0-9]*)[^>]*>(.*?)</\1>','\2',1,0,'imn');
            i := i + 1;
        end loop;
        return l_out;
    end strip_html;
    The loop is there to handle nested HTML.
    Tyler Muth
    http://tylermuth.wordpress.com
    "Applied Oracle Security: Developing Secure Database and Middleware Environments": http://sn.im/aos.book
    Edited by: Tyler on Oct 26, 2010 10:03 AM

  • Regular Expression for replacing Leading Spaces

    I don't claim to be any expert on Regular Expressions and even after reading CD's introduction to Reg Exp, I still can't figure out this one which I'm sure must be very basic.
    I want to replace all the leading spaces in a string with "." chrs. I could do this using the common replace/substr/instr functions, but I reckoned it would be possible in a single regular regexp_replace call.
    So far I've got this...
    SQL> select regexp_replace('      FRED BLOGS    WAS HERE    ', '^([:space:])*', '.')
      2  as result
      3  from dual;
    RESULT
    .      FRED BLOGS    WAS HERE
    SQL>
    Which is replacing the start of the line with a "." and not the spaces.
    But I want my result to be:
    RESULT
    ......FRED BLOGS    WAS HERE
    SQL>
    Cheers

    That was a very good solution.
    Can you explain the significance of "| " in the code? I could work out the other parts.
    I tried running the code in the two cases below.
    When I put a space after the | symbol, it prints the * many times:
    SQL> SELECT col1, REGEXP_REPLACE(col1, ' ([^ ]+.*)|','*\1')
    2 FROM (SELECT ' FRED BLOGS WAS HERE ' col1 FROM dual );
    COL1
    REGEXP_REPLACE(COL1,'([^]+.*)|','*\1')
    FRED BLOGS WAS HERE
    *FRED BLOGS    WAS HERE
    SQL> SELECT col1, REGEXP_REPLACE(col1, ' ([^ ]+.*)| ','*\1')
    2 FROM (SELECT ' FRED BLOGS WAS HERE ' col1 FROM dual );
    COL1
    REGEXP_REPLACE(COL1,'([^]+.*)|','*\1')
    FRED BLOGS WAS HERE
    ******FRED BLOGS WAS HERE
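    For comparison only, here is how the same "replace each leading space with a dot" task looks in Java, whose regex engine has the \G anchor (end of the previous match). This is a sketch, not an Oracle solution: the class and method names are hypothetical, and it is not claimed here that Oracle's REGEXP_REPLACE supports \G.

```java
class LeadingSpaces {
    // Hypothetical helper: turns each leading space into a '.'.
    static String dotLeadingSpaces(String s) {
        // \G matches at the start of the input and then only where the
        // previous match ended, so the replacement stops at the first
        // character that is not a space.
        return s.replaceAll("\\G ", ".");
    }

    public static void main(String[] args) {
        // Six leading spaces become six dots; inner and trailing spaces stay.
        System.out.println(dotLeadingSpaces("      FRED BLOGS    WAS HERE    "));
    }
}
```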

  • Regular expression for replace

    I have the following sample values in a table column ( first name, middle initial, last name)
    Assemblymember Von K. Wright
    Assemblyman Kalvin J. Rowling
    Assemblywoman Debby J. Sanders
    How can I write a regular expression where I can extract the above values as follows
    Wright V
    Rowling K
    Sanders D
    Thanks

    possibility of some other ways too...
    with rt as
    (select 'Von K. Wright' str from dual union all
    select 'Kalvin J. Rowling' from dual union all
    select 'Debby J. Sanders' from dual)
    select str,regexp_replace(str, '(.*) (.*) (.*)', '\3 \1') str from rt;
    Row#     STR     STR_1
    1     Von K. Wright     Wright Von
    2     Kalvin J. Rowling     Rowling Kalvin
    3     Debby J. Sanders     Sanders Debby
    with rt as
    (select 'Von K. Wright' str from dual union all
    select 'Kalvin J. Rowling' from dual union all
    select 'Debby J. Sanders' from dual)
    select str,regexp_replace(str, '(.*) (.*) (.*)', '\3 ') || substr(regexp_replace(str, '(.*) (.*) (.*)', '\1'),1,1) str from rt;
    Row#     STR     STR_1
    1     Von K. Wright     Wright V
    2     Kalvin J. Rowling     Rowling K
    3     Debby J. Sanders     Sanders D

  • English language text with Hebrew language text ?

    Hi,
    We are using SAPscript in Hebrew. If we mix any English text with the Hebrew, the English words come out in the wrong order.
    Please let me know if there is any solution for this.
    Thanks
    Venkatesh P

    Hi,
    The English words are out of order because text is printed from right to left in the HE language.
    The system reads from left to right but prints from right to left.
    You may also find that the Hebrew text is printed in reverse. If possible, get images and use them in the script.
    Regards,
    Raju.

  • Grouping & Back-references with regular expressions on Replace Text window

    I really appreciate the inclusion of the Regular Expressions in the search & replace feature. One thing I am missing is back-references in the replacement expression. For instance, in the unix tools vi or sed, I might do something like this:
    s/\(firstPart\) \(secondPart\) \(oldThirdPart\)/\2 \1 newThirdPart/g
    which would allow me to switch the places of firstPart and secondPart, and totally replace thirdPart. If grouping and back-references are already present in the Replace Text window, how does one correctly invoke them?

    duplicate of Grouping & Back-references with regular expressions on Replace Text window

  • Regular expression to replace "empty space" ( ) between words with +

    Hello!
    When I wish to find something like this in code:
    12144541 FirstWord SecondWord
    the regular expression for that is:
    (\d{1,100})[\s-]\D{1,100}[\s-]\D{1,100}
    Now, please help me find a regular expression to replace the
    "empty space" ( ) between words with +, so that
    12144541 FirstWord SecondWord becomes
    12144541+FirstWord+SecondWord
    Thank you very, very, very much!

    A simple-minded solution is to use \s to match all
    whitespace; e.g. find \s and replace with +. DW CS3, at least, is
    smart enough to not replace end of line characters with the '+'
    character if you limit your search & replace to text.

  • Need a regular expression for the text field

    Hi,
    I need a regular expression for a text field.
    If the value is alphanumeric, it should contain at least 3 characters,
    and if the value is numeric [0-9], there is no limit on the number of characters in the field.
    Any help is appreciated.
    thanks
    bharathi.

    Try the following in the change event:
    r=/^[a-z]{1,3}$|^\d+$/i;
    if (!r.test(xfa.event.newText))
    xfa.event.change="";
    Kyle

  • Regular expression for BBcode list to html list

    Hi,
    We are migrating a BBforum to a Jive forum.
    The BBforum data contains BBCode strings. I found the following code after googling:
    import java.util.HashMap;
    import java.util.Map;

    public static String bbcode(String text) {
        String html = text;
        Map<String, String> bbMap = new HashMap<String, String>();
        bbMap.put("(\r\n|\r|\n|\n\r)", "<br/>");
        // Closing tag written as [/b], the usual BBCode form
        bbMap.put("\\[b\\](.+?)\\[/b\\]", "<strong>$1</strong>");
        for (Map.Entry<String, String> entry : bbMap.entrySet()) {
            html = html.replaceAll(entry.getKey(), entry.getValue());
        }
        return html;
    }
    I have BBCode in a format like
    [list] [*]blue[*]red[*] green[list]
    I have to replace this with <ul><li>blue</li><li>red</li>
    Can anyone suggest a Java regular expression that does the replacement above?
    Edited by: 875452 on Jul 31, 2011 8:03 AM

    Moderator advice: Please read the announcement(s) at the top of the forum listings and the FAQ linked from every page. They are there for a purpose.
    Then edit your post and format the code correctly.
    Moderator action: Moved from Development Tools » General Questions
    db
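    One possible way to handle the list case in Java is to match each list block first and then split its body on the [*] marker. This is a minimal sketch under stated assumptions: the class name BbList and the method convertLists are hypothetical, the closing tag is assumed to be [/list] (a bare [list] closer is accepted too), and nested lists are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BbList {
    // One list block: opening [list], a lazy body, then [/list] or [list].
    private static final Pattern LIST = Pattern.compile(
        "\\[list\\](.*?)\\[/?list\\]",
        Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: rewrites every BBCode list block as an HTML <ul>.
    static String convertLists(String text) {
        Matcher m = LIST.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            StringBuilder ul = new StringBuilder("<ul>");
            // Split the list body on the [*] item marker.
            for (String item : m.group(1).split("\\[\\*\\]")) {
                String trimmed = item.trim();
                if (!trimmed.isEmpty()) {
                    ul.append("<li>").append(trimmed).append("</li>");
                }
            }
            ul.append("</ul>");
            // quoteReplacement keeps any '$' or '\' in the items literal.
            m.appendReplacement(out, Matcher.quoteReplacement(ul.toString()));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints <ul><li>blue</li><li>red</li><li>green</li></ul>
        System.out.println(convertLists("[list] [*]blue[*]red[*] green[/list]"));
    }
}
```

    Doing the split in code rather than in a single replaceAll call keeps the pattern simple, because the number of items in a list is not fixed.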

  • What is the regular expression for the end of a story?

    Forgive me if this is wrong forum for asking this, but I'm trying to use the Find command using GREP and I need to know the regular expression for the end of a story. (Or, the last character of a story.) Thanks in advance.

    I'd try search for .\z (that's a dot in front) which ought to find the very last character in the story, and replace with $0 and your additional text.
    You know you can use a keyboard shortcut to move your cursor to the end of any story, right? Ctrl + End on Windows, Cmd + End, I think, on Mac. Unless you want to do this to every single story in the document, I would think you might be just as well off to put your text on the clipboard, put the cursor in the story and hit the key combo followed by Ctrl/Cmd + V to paste.

  • Regular Expression For Dreamweaver

    I still haven't had the time to really become a professional when it comes to regular expressions, and sadly I am in need of one and am finding it difficult to wrap my head around.
    In a text file I have hundreds of instances like the following:
    {Click here to visit my website}{http://www.adobe.com/}
    I need a regular expression for Dreamweaver that I can run within the "Find and Replace" window to switch the order of the above elements to:
    {http://www.adobe.com/}{Click here to visit my website}
    Can anyone provide some guidance? I'm coming up short due to my lack of experience with regular expressions.
    Thank you in advance!

    So you have a string that starts { and goes until the first }.  Then you have another string exactly the same.  And you want to swap them.  I'm not making any assumption that the second one has to look like a URL (that's a whole other minefield, but perhaps you could do something simple like it must start with http). 
    You don't specify how your text file is divided up, have you got this as a complete line to itself, or is it just  a huge block of text?  Preferably as individual lines.
    I don't have Dreamweaver, but this worked for me in Notepad++
    Find: ^{(.*?)}{(.*?)}$
    Replace with: {\2}{\1}
    My file looked like this:
    {Click here to visit my website}{http://www.adobe.com/}
    {some other site}{http://www.example.com/foo}
    And doing a Replace All ended up like this:
    {http://www.adobe.com/}{Click here to visit my website}
    {http://www.example.com/foo}{some other site}

  • Regular Expression for Invalid Number

    Hi everyone,
    I am using oracle version as follows:
    SQL> select * from v$version;
    BANNER
    Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Prod
    PL/SQL Release 10.2.0.4.0 - Production
    CORE    10.2.0.4.0      Production
    TNS for 32-bit Windows: Version 10.2.0.4.0 - Production
    NLSRTL Version 10.2.0.4.0 - Production
    I am using regular expression to replace invalid values from a table.
    I received oracle error stating "ORA-01722 invalid number"
    My query looks like this:
    SELECT DISTINCT
    MRC_KEY,
    PURPOSE_CD,
    RESIDENCE_DESC,
    to_number(regexp_replace(ICAP_GEN_MADAPTIVE,'[+. ]?0?0?(\d+)[-.]?','\1')) as ICAP_GEN_MADAPTIVE,
    From
    MRRC_INT
    I am not sure what are the invalid values in the table so I can write regexp accordingly.
    Any guidance is highly appreciated!
    Thanks in advance
    J

    Or use DML error logging:
    create table t1
      (col1 number);
    exec dbms_errlog.create_error_log ('t1','t1_errors')
    insert into t1
    with t as
      (select '1' col from dual union all
       select '1.1' col from dual union all
       select '.11' col from dual union all
       select '0.11' col from dual union all
       select '-1' col from dual union all
       select '1,1' col from dual union all
       select '11a' col from dual union all
       select '1d' col from dual union all
       select '1e6' col from dual union all
       select '1e6.1' col from dual union all
       select '1e' col from dual
      )
    select col
    from t
    log errors into t1_errors
    reject limit 20;
    col col1 for 999,999,999.99
    select * from t1;
               COL1
               1.00
               1.10
                .11
                .11
              -1.00
       1,000,000.00
    col col1 for a30
    select * from t1_errors;
    ORA_ERR_NUMBER$ ORA_ERR_MESG$                  ORA_ERR_ROWID$       OR ORA_ERR_TAG$         COL1
               1722 ORA-01722: invalid number                           I                       1,1
               1722 ORA-01722: invalid number                           I                       11a
               1722 ORA-01722: invalid number                           I                       1d
               1722 ORA-01722: invalid number                           I                       1e6.1
               1722 ORA-01722: invalid number                           I                       1e

  • Regular expression for LOV?

    I have a list of strings in an LOV. I tried filtering it by typing in "^disk" in the search bar, which I hope will return a list of strings starting with "disk", but I failed.
    Any idea on how to use regular expression for LOVs? Thanks!

    Hi Buffalo,
    I have a select list item on page 1 named :P1_EMPNAME with this LOV query:
    select ename as d, ename as r from emp WHERE REGEXP_LIKE(ename,:P1_SEARCH) or :P1_SEARCH IS NULL
    I also have a search text box on page 1 named :P1_SEARCH.
    When I run the page, all the employee names are displayed in the LOV list item by default.
    When I enter ^buffalo in the search text item and click the submit button, the list item LOV shows the employee buffalo.
    If you want all the entries that start with S, search for ^s
    To match entries that end with R, use r$
    please try this link http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28424/adfns_regexp.htm
    Thanks
    Logaa

  • Request for help with the performance of a procedure that uses regular expressions

    Hi All,
            Below is a procedure that populates two tables. For the first table it's a simple insert, but for the second table we need to break up the source record as per the business requirement and then insert it into the table. [I have used regular expressions for that.]
            The procedure works fine, but it takes around 23 minutes to process 1 million rows.
            Since this procedure will be used in parallel by different ETL processes, the append hint is not recommended.
            Is there any way to improve its performance, or any suggestion if my approach is not optimal?  Thanks in advance for all help.
    CREATE OR REPLACE PROCEDURE SONARDBO.PRC_PROCESS_EXCEPTIONS_LOGS_TT
    (
         P_PROCESS_ID       IN        NUMBER,
         P_FEED_ID          IN        NUMBER,
         P_TABLE_NAME       IN        VARCHAR2,
         P_FEED_RECORD      IN        VARCHAR2,
         P_EXCEPTION_RECORD IN        VARCHAR2
    )
        IS
        PRAGMA AUTONOMOUS_TRANSACTION;
        V_EXCEPTION_LOG_ID     EXCEPTION_LOG.EXCEPTION_LOG_ID%TYPE;
        BEGIN
        V_EXCEPTION_LOG_ID :=EXCEPTION_LOG_SEQ.NEXTVAL;
             INSERT INTO SONARDBO.EXCEPTION_LOG
             (
                 EXCEPTION_LOG_ID, PROCESS_DATE, PROCESS_ID,EXCEPTION_CODE,FEED_ID,SP_NAME
                ,ATTRIBUTE_NAME,TABLE_NAME,EXCEPTION_RECORD
                ,DATA_STRUCTURE
                ,CREATED_BY,CREATED_TS
             )
             VALUES
             (   V_EXCEPTION_LOG_ID
                ,TRUNC(SYSDATE)
                ,P_PROCESS_ID
                ,'N/A'
                ,P_FEED_ID
                ,NULL
                ,NULL
                ,P_TABLE_NAME
                ,P_FEED_RECORD
                ,NULL
                ,USER
                ,SYSDATE
             );
            INSERT INTO EXCEPTION_ATTR_LOG
            (
                EXCEPTION_ATTR_ID,EXCEPTION_LOG_ID,EXCEPTION_CODE,ATTRIBUTE_NAME,SP_NAME,TABLE_NAME,CREATED_BY,CREATED_TS,ATTRIBUTE_VALUE
            )
            SELECT
                EXCEPTION_ATTR_LOG_SEQ.NEXTVAL          EXCEPTION_ATTR_ID
                ,V_EXCEPTION_LOG_ID                     EXCEPTION_LOG_ID
                ,REGEXP_SUBSTR(str,'[^|]*',1,1)         EXCEPTION_CODE
                ,REGEXP_SUBSTR(str,'[^|]+',1,2)         ATTRIBUTE_NAME
                ,'N/A'                                  SP_NAME
                ,p_table_name
                ,USER
                ,SYSDATE
                ,REGEXP_SUBSTR(str,'[^|]+',1,3)         ATTRIBUTE_VALUE
            FROM
            (
            SELECT
                 REGEXP_SUBSTR(P_EXCEPTION_RECORD, '([^^])+', 1,t2.COLUMN_VALUE) str
            FROM
                DUAL t1 CROSS JOIN
                        TABLE
                        (
                            CAST
                            (
                                MULTISET
                                (
                                    SELECT LEVEL
                                    FROM DUAL
                                    CONNECT BY LEVEL <= REGEXP_COUNT(P_EXCEPTION_RECORD, '([^^])+')
                                )
                                AS SYS.odciNumberList
                            )
                        ) t2
            )
            WHERE REGEXP_SUBSTR(str,'[^|]*',1,1) IS NOT NULL;
            COMMIT;
           EXCEPTION
             WHEN OTHERS THEN
             ROLLBACK;
             RAISE;
        END;
    Many Thanks,
    Arpit

    Regexes are known to be CPU-intensive, especially when dealing with a large number of rows.
    If you have to reduce the processing time, you need to tune the SELECT statements.
    One suggested change could be to change the following query
    SELECT
                 REGEXP_SUBSTR(P_EXCEPTION_RECORD, '([^^])+', 1,t2.COLUMN_VALUE) str
            FROM
                DUAL t1 CROSS JOIN
                        TABLE
                        (
                            CAST
                            (
                                MULTISET
                                (
                                    SELECT LEVEL
                                    FROM DUAL
                                    CONNECT BY LEVEL <= REGEXP_COUNT(P_EXCEPTION_RECORD, '([^^])+')
                                )
                                AS SYS.odciNumberList
                            )
                        ) t2
    to
    SELECT REGEXP_SUBSTR(P_EXCEPTION_RECORD, '([^^])+', 1,level) str
    FROM DUAL
    CONNECT BY LEVEL <= REGEXP_COUNT(P_EXCEPTION_RECORD, '([^^])+')
    Before looking for any performance benefit, you need to ensure that this does not change your output.
    How many substrings are you expecting in the P_EXCEPTION_RECORD? If less than 5, it will be better to opt for SUBSTR and INSTR combination as it might work well with the number of records you are working with. Only trouble is, you will have to write different SUBSTR and INSTR statements for each column to be fetched.
    How are you calling this procedure? Is it not possible to work with Collections? Delimited strings are not a very good option as it requires splitting of the data every time you need to refer to.

  • Using Regular Expressions to replace Quotes in Strings

    I am writing a program that generates Java files and there are Strings that are used that contain Quotes. I want to use regular expressions to replace " with \" when it is written to the file. The code I was trying to use was:
    String temp = "\"Hello\" i am a \"variable\"";
    temp = temp.replaceAll("\"","\\\\\"");
    however, this does not work and when i print out the code to the file the resulting code appears as:
    String someVar = ""Hello" i am a "variable"";
    and not as:
    String someVar = "\"Hello\" i am a \"variable\"";
    I am assuming my regular expression is wrong. If it is, could someone explain to me how to fix it so that it will work?
    Thanks in advance.

    Thanks, apparently I'm just doing something weird that I need to look at a little bit harder.
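    For reference, the replaceAll call quoted in the question does insert a backslash before each quote, which fits the conclusion that the problem lay elsewhere. The sketch below (hypothetical class and method names) shows that call next to String.replace, which treats both arguments as literal text and so needs fewer backslashes.

```java
class QuoteEscape {
    // Literal replacement: the target is a quote, the replacement is backslash + quote.
    static String escapeWithReplace(String s) {
        return s.replace("\"", "\\\"");
    }

    // Regex replacement: a backslash is special in the replacement string,
    // so a literal backslash takes two there and four in Java source.
    static String escapeWithReplaceAll(String s) {
        return s.replaceAll("\"", "\\\\\"");
    }

    public static void main(String[] args) {
        String temp = "\"Hello\" i am a \"variable\"";
        // Both lines print: \"Hello\" i am a \"variable\"
        System.out.println(escapeWithReplace(temp));
        System.out.println(escapeWithReplaceAll(temp));
    }
}
```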
