Regular expressions for replacing text with sms language text

Hi, I'm trying to write a function that converts normal, correctly spelled text into the shorter SMS language format, but I'm struggling to come up with the regular expressions I need. Can anyone help?
1: Remove surplus white space at the beginning and at the end of a sentence.
e.g. " hello." --> "hello." OR "hello ." --> "hello."
2: Remove the preceding and/or following space if there's a word, then a number, possibly followed by another word.
e.g. "come 2 me" --> "come2me" OR "dnt 4get" --> "dnt4get"
3: Remove "aeiou" if the word starts and ends with a non-vowel ("!aeiou").
e.g. "text" --> "txt"

1. Use String's trim() method.
2. You can make the whitespace on either side optional:
text = text.replaceAll("\\s*(\\d)\\s*", "$1");
3. This one has to be done in two steps:
import java.util.regex.*;
public class Test
{
  public static void main(String... args) throws Exception
  {
    String text = "The quick brown fox jumps over the lazy dog.";
    System.out.println(devowelize(text));
  }
  public static String devowelize(String str)
  {
    // Step 1: find runs of letters that start and end with a consonant.
    Pattern p = Pattern.compile(
      "[a-z&&[^aeiou]]++(?:[aeiou]++[a-z&&[^aeiou]]++)+",
      Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(str);
    StringBuffer sb = new StringBuffer();
    while (m.find())
    {
      // Step 2: delete the vowels inside each run ((?i) also covers upper case).
      m.appendReplacement(sb, m.group().replaceAll("(?i)[aeiou]+", ""));
    }
    m.appendTail(sb);
    return sb.toString();
  }
}
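For rules 1 and 2, a minimal Java sketch might look like this. The class and method names (SmsWhitespace, tidySentence, glueDigits) are made up for illustration, and the extra pattern for a space before the final punctuation mark is an assumption based on the "hello ." example, since trim() on its own only handles the two ends of the string.

```java
class SmsWhitespace {
    // Rule 1: drop leading/trailing whitespace, then any whitespace
    // sitting just before a final '.', '!' or '?' (assumed sentence enders).
    static String tidySentence(String text) {
        return text.trim().replaceAll("\\s+([.!?])$", "$1");
    }

    // Rule 2: remove optional whitespace on either side of each digit.
    static String glueDigits(String text) {
        return text.replaceAll("\\s*(\\d)\\s*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(tidySentence("hello ."));
        System.out.println(glueDigits("come 2 me"));
    }
}
```

Note that rule 2 as written also glues a digit to any word next to it, so a sentence such as "room 4 rent" loses both spaces; whether that is wanted depends on the text being converted.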

Similar Messages

Regular Expressions for converting HTML to Structured Plain Text

I'm writing a PL/SQL function that will convert HTML to plain text, but still preserve some of the formatting/line breaks. One of my challenges is in writing a regular expression to capture the text blocks while ignoring the markup. I'm trying to write an expression that will grab all of the text between start/end tags, but discard the tags. For example, to find all of the text between a start/end paragraph, I want to do something like:
REGEXP_REPLACE('This is the body of the paragraph', '<p.*>(.*)', '\1||v_crlf' )
where \1 returns the contents of the paragraph and v_crlf (declared earlier in the function) inserts a line break. I know there are more general expressions that will remove all tags, but I want to specifically identify the tags so I can process them appropriately. This way I can easily convert HTML to plain text for email and reporting without having to keep two versions around. Any help would be greatly appreciated. Once I get this worked out, I will repost with the function code for others to use. Thanks.
Edited by: jritschel on Oct 26, 2010 9:58 AM

Here's a function I wrote for an app. I'm not making any promises on its accuracy, as the app was just a proof of concept and never made it to production.
function strip_html( p_clob in clob )
return clob
is
 l_out clob;
 l_test number := 0;
 l_max_loops constant number := 20;
 i pls_integer := 0;
begin
 l_out := regexp_replace(p_clob,' | ',chr(13)||chr(10),1,0,'imn');
 l_out := regexp_replace(l_out,'',chr(13)||chr(10),1,0,'imn');
 l_out := replace(l_out,'<li>',chr(13)||chr(10)||'*<li>');
 l_out := regexp_replace(l_out,'(.+?)','*\1*',1,0,'imn');
 l_out := regexp_replace(l_out,'(.+?)','_\1_',1,0,'imn');
 loop
 l_test := regexp_instr(l_out,'<([A-Z][A-Z0-9]*)[^>]*>.*?</\1>',1,1,0,'imn');
 exit when l_test = 0 or i > l_max_loops;
 l_out := regexp_replace(l_out,'<([A-Z][A-Z0-9]*)[^>]*>(.*?)</\1>','\2',1,0,'imn');
 i := i + 1;
 end loop;
 return l_out;
end strip_html;
The loop is there to handle nested HTML.
Tyler Muth
http://tylermuth.wordpress.com
"Applied Oracle Security: Developing Secure Database and Middleware Environments": http://sn.im/aos.book
Edited by: Tyler on Oct 26, 2010 10:03 AM
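The idea of looping until no nested element is left can also be sketched in Java. This is only an illustration of the same technique, not a port of the PL/SQL function: the class name TagStripper, the method stripTags and the 20-pass cap (mirroring l_max_loops) are assumptions, and regex-based stripping will not cope with every kind of real-world HTML.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagStripper {
    // An element whose closing tag repeats the opening tag's name (\1);
    // group 2 is the element's content.
    private static final Pattern ELEMENT = Pattern.compile(
        "<([A-Za-z][A-Za-z0-9]*)[^>]*>(.*?)</\\1>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private static final int MAX_PASSES = 20;

    // Each pass unwraps the outermost matching elements; repeating the
    // pass peels off nested elements one level at a time.
    static String stripTags(String html) {
        String out = html;
        for (int pass = 0; pass < MAX_PASSES; pass++) {
            Matcher m = ELEMENT.matcher(out);
            if (!m.find()) {
                break;
            }
            out = m.replaceAll("$2");
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<div><p>Hello <b>world</b></p></div>"));
    }
}
```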

Regular Expression for replacing Leading Spaces

I don't claim to be any expert on Regular Expressions and even after reading CD's introduction to Reg Exp, I still can't figure out this one which I'm sure must be very basic.
I want to replace all the leading spaces in a string with "." chrs. I could do this using the common replace/substr/instr functions, but I reckoned it would be possible in a single regular regexp_replace call.
So far I've got this...
SQL> select regexp_replace('      FRED BLOGS    WAS HERE    ', '^([:space:])*', '.')
2 as result
3 from dual;
RESULT
.      FRED BLOGS    WAS HERE
SQL>
Which is replacing the start of the line with a "." and not the spaces.
But I want my result to be:-
RESULT
......FRED BLOGS    WAS HERE
SQL>
Cheers

That was a very good solution.
Can you explain the significance of the "| " in the code? The other things I could trace out.
I tried to run the code with these 2 cases.
When I put a space after the | symbol (second case) it prints the * many times.
SQL> SELECT col1, REGEXP_REPLACE(col1, ' ([^ ]+.*)|','*\1')
2 FROM (SELECT ' FRED BLOGS WAS HERE ' col1 FROM dual );
COL1
REGEXP_REPLACE(COL1,'([^]+.*)|','*\1')
FRED BLOGS WAS HERE
*FRED BLOGS    WAS HERE
SQL> SELECT col1, REGEXP_REPLACE(col1, ' ([^ ]+.*)| ','*\1')
2 FROM (SELECT ' FRED BLOGS WAS HERE ' col1 FROM dual );
COL1
REGEXP_REPLACE(COL1,'([^]+.*)|','*\1')
FRED BLOGS WAS HERE
******FRED BLOGS WAS HERE
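For comparison, here is how the original leading-spaces problem could be handled in Java rather than Oracle SQL. It relies on the \G anchor, which matches at the start of the input and then only where the previous match ended, so only the unbroken run of spaces at the front is touched. The names LeadingSpaces and dotLeadingSpaces are hypothetical.

```java
class LeadingSpaces {
    // Replace each leading space with '.', leaving later spaces alone.
    // \G ties every match to the end of the previous one, so matching
    // stops at the first non-space character.
    static String dotLeadingSpaces(String s) {
        return s.replaceAll("\\G ", ".");
    }

    public static void main(String[] args) {
        System.out.println(dotLeadingSpaces("      FRED BLOGS    WAS HERE    "));
    }
}
```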

Regular expression for replace

I have the following sample values in a table column ( first name, middle initial, last name)
Assemblymember Von K. Wright
Assemblyman Kalvin J. Rowling
Assemblywoman Debby J. Sanders
How can I write a regular expression where I can extract the above values as follows
Wright V
Rowling K
Sanders D
Thanks

possibility of some other ways too...
with rt as
(select 'Von K. Wright' str from dual union all
select 'Kalvin J. Rowling' from dual union all
select 'Debby J. Sanders' from dual)
select str,regexp_replace(str, '(.*) (.*) (.*)', '\3 \1') str from rt;
Row#     STR     STR_1
1     Von K. Wright     Wright Von
2     Kalvin J. Rowling     Rowling Kalvin
3     Debby J. Sanders     Sanders Debby
with rt as
(select 'Von K. Wright' str from dual union all
select 'Kalvin J. Rowling' from dual union all
select 'Debby J. Sanders' from dual)
select str,regexp_replace(str, '(.*) (.*) (.*)', '\3 ') || substr(regexp_replace(str, '(.*) (.*) (.*)', '\1'),1,1) str from rt;
Row#     STR     STR_1
1     Von K. Wright     Wright V
2     Kalvin J. Rowling     Rowling K
3     Debby J. Sanders     Sanders D
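The same "last name plus first initial" extraction can be sketched in Java with one pattern that also skips a leading title such as "Assemblymember". This is a sketch under assumptions: the input is taken to end in "First M. Last" with a one-letter middle initial followed by a dot, and NameFormatter and lastNameAndInitial are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameFormatter {
    // group 1: first letter of the first name, group 2: last name.
    // Anything before the first name (such as a title) is ignored because
    // find() may start the match anywhere in the string.
    private static final Pattern NAME =
        Pattern.compile("(\\w)\\w*\\s+\\w\\.\\s+(\\w+)\\s*$");

    static String lastNameAndInitial(String fullName) {
        Matcher m = NAME.matcher(fullName);
        // Return the input unchanged when it does not fit the assumed shape.
        return m.find() ? m.group(2) + " " + m.group(1) : fullName;
    }

    public static void main(String[] args) {
        System.out.println(lastNameAndInitial("Assemblymember Von K. Wright"));
    }
}
```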

English language text with Hebrew language text ?

Hi,
We are using a Hebrew-language SAPscript. If we mix any English text with the Hebrew, the English words come out in the wrong order.
Please let me know if there is any solution for this.
Thanks
Venkatesh P

Hi,
The English words are out of order because the text is printed from right to left in the HE language.
The system reads from left to right but prints from right to left.
You may also find that the Hebrew text is printed in reverse. If possible, get images and use them in the script.
Regards,
Raju.

Grouping & Back-references with regular expressions on Replace Text window

I really appreciate the inclusion of the Regular Expressions in the search & replace feature. One thing I am missing is back-references in the replacement expression. For instance, in the unix tools vi or sed, I might do something like this:
s/$firstPart$ $secondPart$ $oldThirdPart$/\2 \1 newThirdPart/g
which would allow me to switch the places of firstPart and secondPart, and totally replace thirdPart. If grouping and back-references are already present in the Replace Text window, how does one correctly invoke them?
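Whether the Replace Text window supports this is not settled in this thread. As a point of comparison only, Java's own regex API uses parentheses for grouping and $1, $2, ... for back-references in the replacement string. The class and method names below are made up for the example.

```java
class SwapParts {
    // Swap the first two words and replace the third with fixed text,
    // mirroring the sed expression s/.../\2 \1 newThirdPart/g.
    static String swap(String line) {
        return line.replaceAll("(\\w+) (\\w+) (\\w+)", "$2 $1 newThirdPart");
    }

    public static void main(String[] args) {
        System.out.println(swap("alpha beta gamma"));
    }
}
```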

duplicate of Grouping & Back-references with regular expressions on Replace Text window

Regular expression to replace "empty space" ( ) between words with +

Hallo!
When I wish to find in code something like this:
12144541 FirstWord SecondWord
regular expression for that is:
(\d{1,100})[\s-]\D{1,100}[\s-]\D{1,100}
Now, please help me to find a regular expression to replace
"empty space" ( ) between words with +
12144541 FirstWord SecondWord to become
12144541+FirstWord+SecondWord
Thank you very, very, very much!

A simple-minded solution is to use \s to match all
whitespace; e.g. find \s and replace with +. DW CS3, at least, is
smart enough to not replace end of line characters with the '+'
character if you limit your search & replace to text.
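Outside Dreamweaver, the same replacement can be sketched in Java. To avoid touching line ends, this version only replaces spaces or tabs that have a non-whitespace character on both sides; that restriction and the names PlusJoiner and joinWithPlus are assumptions added for the example.

```java
class PlusJoiner {
    // Replace each run of spaces/tabs that sits between two non-whitespace
    // characters with a single '+'. Line breaks are left alone.
    static String joinWithPlus(String line) {
        return line.replaceAll("(?<=\\S)[ \\t]+(?=\\S)", "+");
    }

    public static void main(String[] args) {
        System.out.println(joinWithPlus("12144541 FirstWord SecondWord"));
    }
}
```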

Need a regular expression for the text field

Hi ,
I need a regular expression for a text field.
If the value is alphanumeric then a minimum of 3 characters should be there,
and if the value is numeric ([0-9]) then there is no limit on the number of characters in that field.
Any help is appreciated...
thanks
bharathi.

Try the following in the change event:
r=/^[a-z]{1,3}$|^\d+$/i;
if (!r.test(xfa.event.newText))
xfa.event.change="";
Kyle
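One possible reading of the requirement (all digits: any length; otherwise letters and digits: at least 3 characters) can be written as a single alternation. The Java sketch below shows that reading only; it is not the form script above, and FieldValidator and isValid are hypothetical names.

```java
import java.util.regex.Pattern;

class FieldValidator {
    // Either digits only (any length of 1 or more), or at least three
    // letters/digits. This encodes one interpretation of the question.
    private static final Pattern VALID = Pattern.compile("\\d+|[A-Za-z0-9]{3,}");

    static boolean isValid(String value) {
        // matches() requires the whole value to fit the pattern.
        return VALID.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("7") + " " + isValid("ab") + " " + isValid("ab1"));
    }
}
```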

Regular expression for BBcode list to html list

Hi,
we are migrating BBforum to Jive forum.
BBforum has data that contains BBcode strings. I found the following code after googling.
public static String bbcode(String text) {
    String html = text;
    Map<String, String> bbMap = new HashMap<String, String>();
    bbMap.put("(\r\n|\r|\n|\n\r)", " ");
    bbMap.put("\\[b\\](.+?)\\[b\\]", "$1");
    // Apply every BBcode pattern in turn to the running result.
    for (Map.Entry<String, String> entry : bbMap.entrySet()) {
        html = html.replaceAll(entry.getKey(), entry.getValue());
    }
    return html;
}
I have BBcode in a format like
[list] [*]blue[*]red[*] green[list]
I have to replace this with <ul><li>blue</li><li>red</li>
Can anyone suggest a Java regular expression that does the replacement above?
Edited by: 875452 on Jul 31, 2011 8:03 AM

Moderator advice: Please read the announcement(s) at the top of the forum listings and the FAQ linked from every page. They are there for a purpose.
Then edit your post and format the code correctly.
Moderator action: Moved from Development Tools » General Questions
db
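No answer to the list question appears in this thread, so here is one hedged way to approach it in Java. It assumes the list is closed with [/list] (the sample in the question shows [list] twice, which may be an extraction slip), and the names BbList and listsToHtml are invented for the sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BbList {
    // Assumes [list] ... [/list] with items introduced by [*].
    private static final Pattern LIST = Pattern.compile(
        "\\[list\\](.*?)\\[/list\\]",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static String listsToHtml(String text) {
        Matcher m = LIST.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            StringBuilder ul = new StringBuilder("<ul>");
            // Split the list body on the literal [*] marker.
            for (String item : m.group(1).split("\\[\\*\\]")) {
                String trimmed = item.trim();
                if (!trimmed.isEmpty()) {
                    ul.append("<li>").append(trimmed).append("</li>");
                }
            }
            ul.append("</ul>");
            // quoteReplacement keeps '$' or '\' in item text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(ul.toString()));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(listsToHtml("[list] [*]blue[*]red[*] green[/list]"));
    }
}
```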

What is the regular expression for the end of a story?

Forgive me if this is wrong forum for asking this, but I'm trying to use the Find command using GREP and I need to know the regular expression for the end of a story. (Or, the last character of a story.) Thanks in advance.

I'd try search for .\z (that's a dot in front) which ought to find the very last character in the story, and replace with $0 and your additional text.
You know you can use a keyboard shortcut to move your cursor to the end of any story, right? Ctrl + End on Windows, Cmd + End, I think, on Mac. Unless you want to do this to every single story in the document, I would think you might be just as well off to put your text on the clipboard, put the cursor in the story and hit the key combo followed by Ctrl/Cmd + V to paste.
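The ".\z plus $0" idea is not specific to InDesign. As an illustration only, the same pattern behaves like this in Java's regex engine, where (?s) lets the dot match a line break too. StoryEnd and appendAtEnd are hypothetical names, and InDesign's GREP dialect may differ in details.

```java
import java.util.regex.Matcher;

class StoryEnd {
    // Match the very last character of the text (\z = end of input) and
    // put it back via $0, followed by the extra text.
    static String appendAtEnd(String story, String extra) {
        return story.replaceFirst("(?s).\\z", "$0" + Matcher.quoteReplacement(extra));
    }

    public static void main(String[] args) {
        System.out.println(appendAtEnd("Once upon a time.", " The End"));
    }
}
```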

Regular Expression For Dreamweaver

I still haven't had the time to really become a professional when it comes to regular expressions, and sadly I am in need of one and am finding it difficult to wrap my head around.
In a text file I have hundreds of instances like the following:
{Click here to visit my website}{http://www.adobe.com/}
I need a regular expression for Dreamweaver that I can run within the "Find and Replace" window to switch the order of the above elements to:
{http://www.adobe.com/}{Click here to visit my website}
Can anyone provide some guidance? I'm coming up short due to my lack of experience with regular expressions.
Thank you in advance!

So you have a string that starts { and goes until the first }. Then you have another string exactly the same. And you want to swap them. I'm not making any assumption that the second one has to look like a URL (that's a whole other minefield, but perhaps you could do something simple like it must start with http).
You don't specify how your text file is divided up, have you got this as a complete line to itself, or is it just a huge block of text? Preferably as individual lines.
I don't have Dreamweaver, but this worked for me in Notepad++
Find: ^{(.*?)}{(.*?)}$
Replace with: {\2}{\1}
My file looked like this:
{Click here to visit my website}{http://www.adobe.com/}
{some other site}{http://www.example.com/foo}
And doing a Replace All ended up like this:
{http://www.adobe.com/}{Click here to visit my website}
{http://www.example.com/foo}{some other site}
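The same find/replace pair carries over to Java almost unchanged, which may help if the file ends up being processed in code instead of an editor. The sketch assumes one pair per line, as in the sample file; BraceSwap and swapPairs are illustrative names.

```java
class BraceSwap {
    // (?m) makes ^ and $ match at line boundaries; the two lazy groups
    // capture the contents of the first and second brace pair.
    static String swapPairs(String text) {
        return text.replaceAll("(?m)^\\{(.*?)\\}\\{(.*?)\\}$", "{$2}{$1}");
    }

    public static void main(String[] args) {
        System.out.println(swapPairs("{Click here to visit my website}{http://www.adobe.com/}"));
    }
}
```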

Regular Expression for Invalid Number

Hi everyone,
I am using oracle version as follows:
SQL> select * from v$version;
BANNER
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Prod
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for 32-bit Windows: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production
I am using regular expression to replace invalid values from a table.
I received oracle error stating "ORA-01722 invalid number"
My query looks like this:
SELECT DISTINCT
MRC_KEY,
PURPOSE_CD,
RESIDENCE_DESC,
to_number(regexp_replace(ICAP_GEN_MADAPTIVE,'[+. ]?0?0?(\d+)[-.]?','\1')) as ICAP_GEN_MADAPTIVE
From
MRRC_INT
I am not sure what are the invalid values in the table so I can write regexp accordingly.
Any guidance is highly appreciated!
Thanks in advance
J

Or use DML error logging:
create table t1
(col1 number);
exec dbms_errlog.create_error_log ('t1','t1_errors')
insert into t1
with t as
(select '1' col from dual union all
   select '1.1' col from dual union all
   select '.11' col from dual union all
   select '0.11' col from dual union all
   select '-1' col from dual union all
   select '1,1' col from dual union all
   select '11a' col from dual union all
   select '1d' col from dual union all
   select '1e6' col from dual union all
   select '1e6.1' col from dual union all
   select '1e' col from dual
)
select col
from t
log errors into t1_errors
reject limit 20;
col col1 for 999,999,999.99
select * from t1;
           COL1
           1.00
           1.10
            .11
            .11
          -1.00
   1,000,000.00
col col1 for a30
select * from t1_errors;
ORA_ERR_NUMBER$ ORA_ERR_MESG$                  ORA_ERR_ROWID$       OR ORA_ERR_TAG$         COL1
           1722 ORA-01722: invalid number                           I                       1,1
           1722 ORA-01722: invalid number                           I                       11a
           1722 ORA-01722: invalid number                           I                       1d
           1722 ORA-01722: invalid number                           I                       1e6.1
           1722 ORA-01722: invalid number                           I                       1e
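If the values can be pulled into application code, a similar audit can be sketched in Java by trying to parse each value and collecting the failures. This is an assumption-laden analogue, not a replacement for DML error logging: BigDecimal's parsing rules are not the same as Oracle's TO_NUMBER, and NumberAudit and unparseable are made-up names.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NumberAudit {
    // Return the values that BigDecimal refuses to parse.
    static List<String> unparseable(List<String> values) {
        List<String> bad = new ArrayList<>();
        for (String v : values) {
            try {
                new BigDecimal(v.trim());
            } catch (NumberFormatException e) {
                bad.add(v);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        System.out.println(unparseable(Arrays.asList(
            "1", "1.1", ".11", "0.11", "-1", "1,1", "11a", "1d", "1e6", "1e6.1", "1e")));
    }
}
```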

Regular expression for LOV?

I have a list of strings in an LOV. I tried filtering it by typing in "^disk" in the search bar, which I hope will return a list of strings starting with "disk", but I failed.
Any idea on how to use regular expression for LOVs? Thanks!

HI Buffalo,
I have a select list item on my page 1 named :P1_EMPNAME, with this LOV query:
select ename as d, ename as r from emp WHERE REGEXP_LIKE(ename,:P1_SEARCH) or :P1_SEARCH IS NULL
I have a search text box on page 1 named :P1_SEARCH.
When I run the page, all the employee names are displayed in the LOV list item by default.
I entered ^buffalo in the search text item and clicked the submit button, and it showed the employee buffalo in my list item LOV.
If you want all the entries that start with S, search for ^s
End with R, use r$
please try this link http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28424/adfns_regexp.htm
Thanks
Logaa
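The filtering logic of that LOV query (keep everything when the search box is empty, otherwise keep the entries the regex finds) can be sketched in Java as follows. Case-insensitive matching is an assumption made for the example, and LovFilter and filter are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class LovFilter {
    // Keep every value when the search is empty; otherwise keep the values
    // in which the regex is found (anchors such as ^ and $ work as usual).
    static List<String> filter(List<String> values, String search) {
        if (search == null || search.isEmpty()) {
            return new ArrayList<>(values);
        }
        Pattern p = Pattern.compile(search, Pattern.CASE_INSENSITIVE);
        List<String> hits = new ArrayList<>();
        for (String v : values) {
            if (p.matcher(v).find()) {
                hits.add(v);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(filter(Arrays.asList("Disk A", "Hard disk", "disk2"), "^disk"));
    }
}
```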

Request for help with the performance of a procedure that uses regular expressions for its functionality

Hi All,
 Below is the procedure, which populates two tables. For the first table it is a simple insert, but for the second table we need to break up the source record as per the business requirement and then insert it into the table. [I have used regular expressions for that.]
 The procedure works fine, but it takes around 23 minutes to process 1 million rows.
 Since this procedure will be used in parallel by different ETL processes, the APPEND hint is not recommended.
 Are there any ways to improve its performance, or any suggestions if my approach is not optimal? Thanks in advance for all help.
CREATE OR REPLACE PROCEDURE SONARDBO.PRC_PROCESS_EXCEPTIONS_LOGS_TT
 (
 P_PROCESS_ID IN NUMBER,
 P_FEED_ID IN NUMBER,
 P_TABLE_NAME IN VARCHAR2,
 P_FEED_RECORD IN VARCHAR2,
 P_EXCEPTION_RECORD IN VARCHAR2
 )
 IS
 PRAGMA AUTONOMOUS_TRANSACTION;
 V_EXCEPTION_LOG_ID EXCEPTION_LOG.EXCEPTION_LOG_ID%TYPE;
 BEGIN
 V_EXCEPTION_LOG_ID :=EXCEPTION_LOG_SEQ.NEXTVAL;
 -- First table: one row per feed record.
 INSERT INTO SONARDBO.EXCEPTION_LOG
 (
 EXCEPTION_LOG_ID, PROCESS_DATE, PROCESS_ID,EXCEPTION_CODE,FEED_ID,SP_NAME
 ,ATTRIBUTE_NAME,TABLE_NAME,EXCEPTION_RECORD
 ,DATA_STRUCTURE
 ,CREATED_BY,CREATED_TS
 )
 VALUES
 ( V_EXCEPTION_LOG_ID
 ,TRUNC(SYSDATE)
 ,P_PROCESS_ID
 ,'N/A'
 ,P_FEED_ID
 ,NULL
 ,NULL
 ,P_TABLE_NAME
 ,P_FEED_RECORD
 ,NULL
 ,USER
 ,SYSDATE
 );
 -- Second table: one row per '^'-separated entry, split into '|'-separated fields.
 INSERT INTO EXCEPTION_ATTR_LOG
 (
 EXCEPTION_ATTR_ID,EXCEPTION_LOG_ID,EXCEPTION_CODE,ATTRIBUTE_NAME,SP_NAME,TABLE_NAME,CREATED_BY,CREATED_TS,ATTRIBUTE_VALUE
 )
 SELECT
 EXCEPTION_ATTR_LOG_SEQ.NEXTVAL EXCEPTION_ATTR_ID
 ,V_EXCEPTION_LOG_ID EXCEPTION_LOG_ID
 ,REGEXP_SUBSTR(str,'[^|]*',1,1) EXCEPTION_CODE
 ,REGEXP_SUBSTR(str,'[^|]+',1,2) ATTRIBUTE_NAME
 ,'N/A' SP_NAME
 ,p_table_name
 ,USER
 ,SYSDATE
 ,REGEXP_SUBSTR(str,'[^|]+',1,3) ATTRIBUTE_VALUE
 FROM
 (
 SELECT
 REGEXP_SUBSTR(P_EXCEPTION_RECORD, '([^^])+', 1,t2.COLUMN_VALUE) str
 FROM
 DUAL t1 CROSS JOIN
 TABLE
 (
 CAST
 (
 MULTISET
 (
 SELECT LEVEL
 FROM DUAL
 CONNECT BY LEVEL <= REGEXP_COUNT(P_EXCEPTION_RECORD, '([^^])+')
 )
 AS SYS.odciNumberList
 )
 ) t2
 )
 WHERE REGEXP_SUBSTR(str,'[^|]*',1,1) IS NOT NULL;
 COMMIT;
 EXCEPTION
 WHEN OTHERS THEN
 ROLLBACK;
 RAISE;
 END;
Many Thanks,
Arpit

Regexes are known to be CPU-intensive, especially when dealing with a large number of rows.
If you have to reduce the processing time, you need to tune the SELECT statements.
One suggested change could be to change the following query
SELECT
 REGEXP_SUBSTR(P_EXCEPTION_RECORD, '([^^])+', 1,t2.COLUMN_VALUE) str
 FROM
 DUAL t1 CROSS JOIN
 TABLE
 (
 CAST
 (
 MULTISET
 (
 SELECT LEVEL
 FROM DUAL
 CONNECT BY LEVEL <= REGEXP_COUNT(P_EXCEPTION_RECORD, '([^^])+')
 )
 AS SYS.odciNumberList
 )
 ) t2
to
SELECT REGEXP_SUBSTR(P_EXCEPTION_RECORD, '([^^])+', 1,level) str
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT(P_EXCEPTION_RECORD, '([^^])+')
Before looking for any performance benefit, you need to ensure that this does not change your output.
How many substrings are you expecting in P_EXCEPTION_RECORD? If fewer than 5, it may be better to opt for a SUBSTR and INSTR combination, as it might perform well with the number of records you are working with. The only trouble is that you will have to write separate SUBSTR and INSTR expressions for each column to be fetched.
How are you calling this procedure? Is it not possible to work with collections? Delimited strings are not a very good option, as they require splitting the data every time you need to refer to it.
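To make the "plain substring search instead of regex" suggestion concrete, here is a small Java sketch of the same splitting done with indexOf, the rough counterpart of INSTR/SUBSTR. It assumes entries separated by '^' and fields separated by '|' as in the procedure, skips empty tokens, and uses invented names (ExceptionRecordParser, splitOn, parse). It says nothing about how the PL/SQL version would perform.

```java
import java.util.ArrayList;
import java.util.List;

class ExceptionRecordParser {
    // Split on a single delimiter character without any regex,
    // skipping empty tokens.
    static List<String> splitOn(String s, char delim) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        while (start <= s.length()) {
            int end = s.indexOf(delim, start);
            if (end < 0) {
                end = s.length();
            }
            if (end > start) {
                parts.add(s.substring(start, end));
            }
            start = end + 1;
        }
        return parts;
    }

    // One inner list per '^'-separated entry, holding its '|'-separated fields.
    static List<List<String>> parse(String exceptionRecord) {
        List<List<String>> rows = new ArrayList<>();
        for (String entry : splitOn(exceptionRecord, '^')) {
            rows.add(splitOn(entry, '|'));
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(parse("E1|NAME|bob^E2|AGE|x"));
    }
}
```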

Using Regular Expressions to replace Quotes in Strings

I am writing a program that generates Java files and there are Strings that are used that contain Quotes. I want to use regular expressions to replace " with \" when it is written to the file. The code I was trying to use was:
String temp = "\"Hello\" i am a \"variable\"";
temp = temp.replaceAll("\"","\\\\\"");
however, this does not work and when i print out the code to the file the resulting code appears as:
String someVar = ""Hello" i am a "variable"";
and not as:
String someVar = "\"Hello\" i am a \"variable\"";
I am assuming my regular expression is wrong. If it is, could someone explain to me how to fix it so that it will work?
Thanks in advance.

Thanks, apparently I'm just doing something weird that I just need to look at a little bit harder.
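For anyone landing on this thread, two ways of escaping the quotes are sketched below. String.replace works on literal text, so no regex escaping is involved; with replaceAll, Matcher.quoteReplacement avoids counting backslashes by hand. QuoteEscaper and its method names are invented for the example.

```java
import java.util.regex.Matcher;

class QuoteEscaper {
    // Literal replacement: every " becomes \" (no regex involved).
    static String escapeQuotes(String s) {
        return s.replace("\"", "\\\"");
    }

    // Regex replacement: quoteReplacement makes the backslash in the
    // replacement text literal.
    static String escapeQuotesRegex(String s) {
        return s.replaceAll("\"", Matcher.quoteReplacement("\\\""));
    }

    public static void main(String[] args) {
        System.out.println(escapeQuotes("\"Hello\" i am a \"variable\""));
    }
}
```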
