Splitting a large string using a regular expression when it contains special characters

I have a huge string (XML) containing normal characters (a-z, A-Z and 0-9) as well as special characters (<, >, ?, &, ', ", ;, / etc.).
I need to split this string wherever it ends with </document>.
For example:
Original string:
<document>
<item>sdf</item>
<item><text>sd</text></item>
</document>
<document>hi</document>
The above string has to be split into two parts, since it contains two document tags.
Can anybody help me resolve this issue? I can use StringTokenizer, the String split method or the regular expression API.

manas589 wrote:
I used DOM and SAX parsers and got a few exceptions. Again, I don't have the right to change the XML, so I thought I'd go with a regular expression or some other way to do my job.

If the file actually comes in lines like what you posted, you should just be able to compare the contents of each line to see if it contains "</document>" or whatever you're looking for. I wouldn't use regex unless I needed another problem.

manas589 wrote:
I got an exception like: Caused by: org.xml.sax.SAXParseException: The entity "nbsp" was referenced, but not declared.
     at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
     at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
     at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)

So then it isn't even XML.
Edit: sorry, I just realized why you're considering all of these heavy-duty ideas. It's just that you don't know how to break the string into lines. You do it like this:
BufferedReader br = new BufferedReader(new StringReader(theNotXMLString));
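
The line-by-line idea can be sketched roughly as follows. This is only an illustration under assumptions: the class and method names (DocumentSplitter, splitByLines, splitByRegex) are made up for the example, and it assumes each closing </document> tag marks the end of one part. The second method shows a regex alternative that splits after every closing tag using a lookbehind, which also works when the input has no line breaks.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class DocumentSplitter {
    // Hypothetical helper: reads the text line by line and closes a part
    // whenever a line contains "</document>".
    static List<String> splitByLines(String text) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = br.readLine()) != null) {
                current.append(line).append('\n');
                if (line.contains("</document>")) {
                    parts.add(current.toString().trim());
                    current.setLength(0);
                }
            }
        } catch (IOException e) {
            // A StringReader should not fail, but readLine() declares IOException.
            throw new UncheckedIOException(e);
        }
        String rest = current.toString().trim();
        if (!rest.isEmpty()) {
            parts.add(rest); // trailing text without a closing tag
        }
        return parts;
    }

    // Alternative: split at the zero-width position right after each closing tag.
    static List<String> splitByRegex(String text) {
        List<String> parts = new ArrayList<>();
        for (String piece : text.split("(?<=</document>)")) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                parts.add(trimmed);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String input = "<document>\n<item>sdf</item>\n</document>\n<document>hi</document>\n";
        for (String part : splitByLines(input)) {
            System.out.println("--- part ---");
            System.out.println(part);
        }
    }
}
```

Neither method parses the content, so an undeclared entity such as &nbsp; does not matter here.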

Similar Messages

Format string using Regular Expression

Input string output format...
SELECT q'<select ab_c "ABC", efg "EFG" from dual>' str FROM DUAL
Output:
STR
select ab_c "ABC", efg "EFG" from dual
Required output format using regular expression...
STR
select 'ab_c' "ABC", 'efg' "EFG" from dual

Regular expressions have many limitations as parsing tools, and you didn't specify the rules you wanted. This expression puts quotes around the non blank string before a quoted string:
SELECT regexp_replace(q'<select ab_c "ABC", efg "EFG" from dual>',
                      '([^" ]+)( +"[^ ]*")' , '''\1''\2' ) str FROM DUAL;
STR
select 'ab_c' "ABC", 'efg' "EFG" from dual
It is not robust - a missing " will confuse it, and you should be using bind variables anyway.

Getting non numeric strings using regular expression

Hi guys,
I want to get the list of string values in a table that contain no numeric characters.
I have a string column named A in a table named B.
I have written the following code, but it seems to be incorrect.
Please help me out.
SELECT
A FROM
B
WHERE
regexp_like(A, '([^[:digit:]])')
Thanks in advance.

That will give you every one that has at least one non-numeric character, if you want ones which contain no numeric characters then it should be
regexp_like(A,'^[^0-9]*$')

Filter String using Regular Expression

Hello,
I have an application that monitors serial communication between a PC and device. The message protocol is a byte stream that I convert to a string to parse into pretty messages. The start of the string is always "10 02", but if the string is preceded with another "10" like this "10 10 02" it is part of a message. I've been trying to use a regular expression with the Search and Replace VI. My regex is "[^10]\s10\s02" which almost works but it cuts off part of the message:
Before:
10 03 10 02
After:
10 0 <= missing the "3"
10 02
Here's what I'm doing:
Any ideas on what I'm missing? I've attached a simple example.
Thanks
Message Edited by Derek Price on 02-14-2008 08:37 PM
Attachments:
Filter Beginning Message1.vi ‏14 KB
FilterMessageRegex1.png ‏7 KB

Try this approach.
Do a search and replace on '10\s02' and replace it with '\r\n10\s02'.
Then do another search and replace on '10\r\n10\s02' and replace it with '10\s10\s02'.
See attached.
Randall Pursley
Attachments:
Message Filter.PNG ‏18 KB
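
A note on why the original pattern eats the "3": [^10] is a character class that matches any single character other than '1' or '0', so it consumes the last digit of the previous byte. One possible alternative is a negative lookbehind, which checks the preceding text without consuming it. The sketch below uses Java's regex engine purely for illustration; whether the LabVIEW Search and Replace VI accepts the same lookbehind syntax is an assumption that would need checking, and the class and method names are made up.

```java
import java.util.regex.Pattern;

class MessageSplitter {
    // "10 02" that is NOT immediately preceded by "10 " (an escaped 10 inside a message).
    private static final Pattern START = Pattern.compile("(?<!10\\s)10\\s02");

    // Puts a line break in front of every real start marker; $0 is the matched text.
    static String markStarts(String stream) {
        return START.matcher(stream).replaceAll("\n$0");
    }

    public static void main(String[] args) {
        System.out.println(markStarts("10 03 10 02")); // break inserted before the second "10 02"
        System.out.println(markStarts("10 10 02"));    // left untouched
    }
}
```

This does not resolve ambiguous streams such as "10 10 10 02"; handling those would need knowledge of the protocol's escaping rules.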

How to Capture Multiple Line String using Regular Expression?

Hi,
I have a simple program like this:
What I want to accomplish is to capture everything between >>start and >>end using a single Match Regular Expression node. It seems that setting multiple? to True or False does not help.
I am using LabVIEW 2012.
If it is impossible to capture it using a single node, that is fine. But I want to make sure that I am making full use of this node without combining several others.
Thank you!
TailOfGon
Certified LabVIEW Architect 2013

Thank you for the fast response! Your solution worked in the example case
After I saw your post, I was finally able to move forward. But I still wanted to use the dot, because of the limited set of characters that \w matches.
I modified your regular expression a bit more, and now it seems to work for all characters:
>>start((?:\s|.)*)>>end
Thanks!
TailOfGon
Certified LabVIEW Architect 2013
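
For comparison, many regex engines offer a "dot matches newline" mode, which avoids the (?:\s|.) alternation. The sketch below shows the idea in Java with Pattern.DOTALL and a lazy quantifier so that the capture stops at the first >>end. It is only an illustration: the names are made up, and whether the LabVIEW 2012 node accepts an equivalent inline flag such as (?s) is an assumption I have not verified.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BlockCapture {
    // DOTALL lets '.' match line terminators; '*?' stops at the first ">>end".
    private static final Pattern BLOCK = Pattern.compile(">>start(.*?)>>end", Pattern.DOTALL);

    // Returns the text between the markers, or null if the markers are not found.
    static String between(String text) {
        Matcher m = BLOCK.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(between(">>start\nline 1\nline 2\n>>end"));
    }
}
```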

Dumbfounded by Scanner processing String using regular expression

I was reading Bruce Eckel's book when I came across something interesting: extending Scanner with regular expressions. Unfortunately, I was confronted with an issue that doesn't make much sense to me: if the String that I am scanning contains a hyphen, the Scanner doesn't produce anything. As soon as I take it out, it all works like a charm. Here is my example:
import java.util.Scanner;
import java.util.regex.*;
public class StringScan {
    public static void main(String[] args) {
        String input = "there's one caveat when scanning with regular expressions";
        Scanner scanner = new Scanner(input);
        String pattern = "[a-z]\\w+";
        // Print every token for as long as the next token matches the pattern.
        while (scanner.hasNext(pattern)) {
            scanner.next(pattern);
            MatchResult match = scanner.match();
            String output = match.group();
            System.out.println(output);
        }
    }
}
What could be the reason? I imagined it could be because the hyphen for some reason gets given a special meaning but when I tried escaping it, it still didn't work.

Thanks for your prompt reply.
I have figured out what was wrong with my code, by the way. Since a single quote is not a word character, it does not match \w+. And as the very first input token does not match, the scanner stops immediately. I rewrote my regex to "[a-z].*" and now it does work.
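
The behaviour described above can be reproduced with a small sketch. The helper name scan and the second pattern (which simply adds the apostrophe to the allowed characters) are assumptions made for illustration; the point is that hasNext(pattern) only succeeds when the whole next whitespace-delimited token matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class TokenScan {
    // Collects tokens for as long as the next complete token matches the pattern.
    static List<String> scan(String input, String pattern) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNext(pattern)) {
                tokens.add(scanner.next(pattern));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "there's one caveat";
        System.out.println(scan(input, "[a-z]\\w+"));     // stops at the first token
        System.out.println(scan(input, "[a-z][\\w']+"));  // apostrophe allowed
    }
}
```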

Replace a string using regular expression from powershell

I want to replace the following:
'browserName': 'firefox'
with :
'browserName': 'chrome'
then I tried this:
(get-content $conffile) -replace "^('browserName': ')\S+","browserName': 'chrome' |set-content $conffile
But nothing happened.
Could somebody tell me how to write the regular expression here? Thanks a lot.

Second person today with the same question.
(get-content $conffile) | %{ $_ -replace "'browserName':\s+'firefox'","'browserName': 'chrome'" } | set-content $conffile
\_(ツ)_/

Filter Strings using regular expressions

Requirements:
1. I have a table with different names.
2. I input a word (string) through a text box.
3. I filter the table using the input string from the text box, with this code:
((DefaultRowSorter)table_customer.getRowSorter()).setRowFilter(RowFilter.regexFilter(regex, indices));
4. The regex is obtained as follows:
String regex = "";
String text = txtFilterText.getText();
regex = "^(?i)" + text + ".*"; // "starts with" filter
regex = ".*" + text + ".*"; // "contains" filter
regex = "(?i)[" + text + ".*]"; // "doesn't start with" filter
regex = ".*(?i)" + text + "$"; // "ends with" filter
I need help with the "doesn't contain" and "doesn't end with" filters. Please help me out.
Anees

Double post
Reply here: http://forum.java.sun.com/thread.jspa?threadID=5231406
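
For readers who land here first, one possible approach to the two missing filters is a negative lookahead. The sketch below is an illustration only: the class and method names are made up, Pattern.quote is added on the assumption that the typed text should be treated literally, and the accepts helper uses Matcher.find() because that is how RowFilter.regexFilter is documented to test values. Wrapping an existing filter in RowFilter.notFilter may be a simpler route in Swing.

```java
import java.util.regex.Pattern;

class NameFilters {
    // Matches only values that do not contain the text anywhere (case-insensitive).
    static String doesNotContain(String text) {
        return "(?i)^(?!.*" + Pattern.quote(text) + ").*$";
    }

    // Matches only values that do not end with the text (case-insensitive).
    static String doesNotEndWith(String text) {
        return "(?i)^(?!.*" + Pattern.quote(text) + "$).*$";
    }

    // Mimics a find()-based filter for a single cell value.
    static boolean accepts(String regex, String value) {
        return Pattern.compile(regex).matcher(value).find();
    }

    public static void main(String[] args) {
        System.out.println(accepts(doesNotContain("an"), "Anees"));  // false
        System.out.println(accepts(doesNotEndWith("es"), "Aneesa")); // true
    }
}
```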

Using Regular Expressions to replace Quotes in Strings

I am writing a program that generates Java files and there are Strings that are used that contain Quotes. I want to use regular expressions to replace " with \" when it is written to the file. The code I was trying to use was:
String temp = "\"Hello\" i am a \"variable\"";
temp = temp.replaceAll("\"","\\\\\"");
however, this does not work and when i print out the code to the file the resulting code appears as:
String someVar = ""Hello" i am a "variable"";
and not as:
String someVar = "\"Hello\" i am a \"variable\"";
I am assuming my regular expression is wrong. If it is, could someone explain to me how to fix it so that it will work?
Thanks in advance.

Thanks, apparently I'm just doing something weird that I just need to look at a little bit harder.
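
For what it's worth, the replaceAll call in the question does produce \" for each quote, which fits the conclusion that the problem was elsewhere. A minimal sketch comparing it with the non-regex String.replace, which needs fewer backslashes (the class and method names are made up for illustration):

```java
class QuoteEscaper {
    // Literal replacement: no regex involved, so only Java string escaping applies.
    static String escapeQuotes(String s) {
        return s.replace("\"", "\\\"");
    }

    // Regex version from the question: in a replacement string, a backslash
    // must itself be escaped, hence the extra pair.
    static String escapeQuotesRegex(String s) {
        return s.replaceAll("\"", "\\\\\"");
    }

    public static void main(String[] args) {
        String temp = "\"Hello\" i am a \"variable\"";
        System.out.println(escapeQuotes(temp));
        System.out.println(escapeQuotesRegex(temp));
    }
}
```

If the strings can also contain backslashes, those would need escaping before the quotes when generating Java source.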

String extract using regular expression

Hi
I have text like this "<a>45</a><ct>Hi</ct><R>45 85</R><H>Here</H>" .I want to extract using regular expression or any techniques the text between <R> and </R> also need to replace the space with pipe between 45 and 85 like "45|85"
Edited by: vishnu prakash on Mar 2, 2012 4:42 AM

Hi,
Here's one way:
REPLACE ( REGEXP_REPLACE ( txt
                         , '.*<R>(.*)</R>.*'
                         , '\1'
                         )
        , ' '
        , '|'
        )
This assumes there is only one <R> tag in txt.
Always say which version of Oracle you're using. The expression above will work in Oracle 10 and up, but starting in Oracle 11 you can use REGEXP_SUBSTR rather than the less intuitive REGEXP_REPLACE.
Edited by: Frank Kulash on Mar 2, 2012 7:48 AM

Change particular characters in a string by using regular expressions ...

Hello Everyone,
I am trying to write a function using Oracle's regular expression function REGEXP_REPLACE, but I have not succeeded so far.
My problem is as follows: I have text in a column, for example 'sdfsdf Sdfdfs Sdfd', and I want to replace all s and S characters with X so that the text looks like 'XdfXdf XdfdfX Xdfd'.
Is it possible using regular expressions in Oracle?
Can you give me some clues?
Thank you

SQL> SELECT
  2  regexp_replace('sdfsdf Sdfdfs Sdfd','s|S','X') from dual;
REGEXP_REPLACE('SD
XdfXdf XdfdfX Xdfd
Regards,
Achyut

How to define a regular expression using regular expressions

Hi,
I am looking for some regular expression pattern which will identify a regular expression.
Also, is it possible to know how does the compile method of Pattern class in java.util.regex package work when it is given a String containing a regex. ie. is there any mechanism to validate regular expression using regular expression pattern.
Regards,
Abhisek

It is impossible to recognize an (in)valid regular expression string using a
regular expression. Google for 'pumping lemma' for a formal proof.
kind regards,
Jos
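
In practice, the usual way to validate a regex string in Java is to let Pattern.compile do the work and catch the PatternSyntaxException it throws on invalid input. A minimal sketch (the class and method names are made up for illustration):

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexValidator {
    // Returns true if java.util.regex accepts the string as a pattern.
    static boolean isValid(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // e.getDescription() and e.getIndex() say what went wrong and where.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("[a-z]+"));    // true
        System.out.println(isValid("(unclosed")); // false
    }
}
```

This only answers whether this particular engine accepts the pattern; other regex flavours may disagree.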

Using Regular Expressions for Completion

I'm trying to build a text completer for a simple little editor. The general idea is that I have a regular expression which describes the syntax of an expression and a set of strings which are all semantically valid cases of the expression (the latter of which is not particularly important to my problem). I would like to be able to determine, using the expression described, whether or not a section of text is capable of beginning a syntactically valid expression, not matching it.
For example, given the expression
"#[A-Za-z0-9]+#" the string "#name#" is syntactically valid, whereas the string "#_blarg" is not. What I would like to do is be able to determine that "#partial" has the potential to match the pattern with more input, even if it doesn't yet. Specifically, the eventual use will be in such a case as the string X=#partial+3. If the cursor is positioned before the "+" and my user presses the completion keystroke, I want to recognize that "#partial" is what I need to recognize. Also, positioning the cursor immediately after the "=" and pressing the keystroke will do nothing, since nothing before the "=" is capable of matching the pattern properly.
Is this possible? I don't have to use this exact approach, but it is important that I be able to use the regular expression in detecting a partially completed expression. If I can, the set of regular expressions which already exist in the code can be used to drive the auto completer. Otherwise, I'll have to write a special recognition module for each case; that wouldn't be pretty.
Thanks for your time! I'll provide other information upon request, if it'd help. :)

Thank you both for discussing this; it has definitely helped me in reaching a better understanding of uncle_alice's answer to my problem. I've adjusted my code to use this approach and, for the most part, it seems to work.
I say "for the most part" because I am compiling Patterns with the case insensitivity flag. This appears to do horrible, horrible things. Take a look at the following code, modified from uncle_alice's example:
String[] str = {"#test#hello", "#tes", "blargblarg", "", "#test#", "S"};
String rgx = "#[A-Za-z0-9]+#";
Pattern pc = Pattern.compile(rgx);
Pattern pi = Pattern.compile(rgx, Pattern.CASE_INSENSITIVE);
for (String s : str) {
    System.out.println("    For string: " + s);
    for (Pattern p : new Pattern[]{pc, pi}) { // once for each pattern
        Matcher m = p.matcher(s);
        if (m.matches()) {
            System.out.printf("Matched '%s'", m.group());
        } else {
            System.out.print("No match");
        }
        System.out.println("; hitEnd = " + m.hitEnd());
    }
}
That produces the following output:
    For string: #test#hello
No match; hitEnd = false
No match; hitEnd = true
    For string: #tes
No match; hitEnd = true
No match; hitEnd = true
    For string: blargblarg
No match; hitEnd = false
No match; hitEnd = true
    For string:
No match; hitEnd = true
No match; hitEnd = true
    For string: #test#
Matched '#test#'; hitEnd = false
Matched '#test#'; hitEnd = false
    For string: S
No match; hitEnd = false
No match; hitEnd = true
It would seem that, with the case-insensitive flag set, hitEnd always returns true unless a match is found. Why is this? I find it quite confusing.
I can adjust my design to accommodate this if the problem cannot be circumvented; however, I'd like to understand what is going wrong here. :)
Cheers! Thanks so much for all your help!
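
For reference, the hitEnd() approach discussed in this thread can be wrapped up roughly like this. It is a sketch with made-up names (PartialMatch, Result, classify) and it only exercises the case-sensitive pattern, since the case-insensitive behaviour reported above is exactly what is in question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PartialMatch {
    enum Result { MATCH, PARTIAL, NO_MATCH }

    // MATCH: the whole input matches.
    // PARTIAL: the match failed only because the input ran out, so more
    //          input could still turn it into a match.
    // NO_MATCH: the match failed before reaching the end of the input.
    static Result classify(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        if (m.matches()) {
            return Result.MATCH;
        }
        return m.hitEnd() ? Result.PARTIAL : Result.NO_MATCH;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("#[A-Za-z0-9]+#");
        for (String s : new String[] {"#name#", "#partial", "#_blarg"}) {
            System.out.println(s + " -> " + classify(p, s));
        }
    }
}
```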

Finding URLs using regular expression.

I have a requirement where the user will type some text containing URLs, like "Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747. Thank you". This text has to be modified as below before saving it to the database.
"Please visit this site <a href='http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747'>http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747</a>. Thank you"
I am using the regular expression (http|https)://.+?\\s, which marks the end of the URL with a whitespace character. This pattern doesn't work if the URL is located at the end of the string, since there will be no space at the end.
For example if the string is "Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747" the regex will fail.
My actual problem is to find the URL irrespective of its position within the string.
Pattern urlPattern = Pattern.compile("(http|https)://.+?\\s", Pattern.CASE_INSENSITIVE);
Matcher matcher = urlPattern.matcher(plainText);
Map stringIndexMap = new HashMap();
// Search the input string for urlPattern...
while (matcher.find()) {
    String urlString = matcher.group();
    // Store the URLs in a hash map with their indices as keys...
    stringIndexMap.put(new Integer(matcher.start()), urlString.trim());
}
Set keySet = stringIndexMap.keySet();
Iterator it = keySet.iterator();
// Iterate over the hash map containing the URLs...
while (it.hasNext()) {
    String urlString = (String) stringIndexMap.get(it.next());
    /*
     * Replace the URL string in the input text with
     * <a href="#" onclick="window.open('<urlString>')"> using the string index
     */
    clickableURLString.replace(clickableURLString.indexOf(urlString),
            clickableURLString.indexOf(urlString) + urlString.length(),
            "<a href=\"#\" onclick=\"window.open('" + urlString
            + "')\">" + urlString + "</a>");
}
return clickableURLString.toString();

The end of the input is '$' as a regex.
import java.util.regex.*;
public class Prasanna {
    public static void main(String[] args) {
        String text
            = "Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747";
        // String regex = "(http|https)://.+?(?:\\s|$)"; // this works
        String regex = "(http|https)://[^ ]+";           // this also works
        Pattern pat = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        Matcher mat = pat.matcher(text);
        // Print every URL found in the text.
        while (mat.find()) {
            System.out.println(mat.group());
        }
    }
}
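
Building on that, the replacement step itself can be done in one pass with Matcher.replaceAll, which avoids the index bookkeeping in the question. The sketch below is an assumption-laden illustration: the class and method names are made up, and the lookahead that leaves a sentence-ending period outside the URL is my own guess at the intended behaviour, based on the expected output in the question.

```java
import java.util.regex.Pattern;

class UrlLinker {
    // Lazily take non-space characters until what follows is an optional
    // period and then whitespace or the end of the input.
    private static final Pattern URL = Pattern.compile(
            "https?://\\S+?(?=\\.?(?:\\s|$))", Pattern.CASE_INSENSITIVE);

    // $0 inserts the matched URL as-is, so characters such as '$' in the
    // URL are not reinterpreted as group references.
    static String linkify(String text) {
        return URL.matcher(text).replaceAll("<a href='$0'>$0</a>");
    }

    public static void main(String[] args) {
        System.out.println(linkify("Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747. Thank you"));
    }
}
```

The URL is not HTML-escaped here, so a URL containing a single quote would break the attribute; real code would need to escape it first.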

Pattern matching using Regular expression

Hi,
I am working on pattern matching using regular expressions. In the table, I have 2 columns, A and B.
A has a value like 'A499BPAU4A32A386KBCZ4C13C41D20E'
B has a value like '*CZ4*M11*7NQ+RDR+RSM-R9A-R9B'
The requirement is that I have to match the codes of B in A. If a code has a * sign, it must be present in A; for example, 'CZ4' should exist in string A.
The issue I am facing is that there are 2 values with a * sign. The code works fine for the first match (CZ4) but it does not look further, as M11 does not exist in A.
I used the condition
AND instr(A,substr(REGEXP_SUBSTR(B, '*[^*]{3}'),2) ,1)=0
First of all, is it possible to match multiple patterns in one condition?
If yes, please suggest how.
Thanks

user2544469 wrote:
Thanks a lot Frank. This query worked wonderful for the test data I have provided however I have some concerns:
- query does not include the column BOOK which is a mandatory check.

Sorry, that was my mistake. It was a very easy mistake to make, since you posted sample data where it didn't matter. Instead of doing a cross-join between vn and got_must_have_cnt, do an inner join, using book. That means book will have to be in got_must_have_cnt, and all the sub-queries from which it descends. Look for comments that say "March 22".
If you want to treat '+' in test_cat.codes as '*', then the simplest thing is probably just to use REPLACE, so that when the table has '+', you use '*' instead.
WITH     got_token_cnt     AS
(
     SELECT     cat
     ,     book                                        -- Added March 22
     ,     REPLACE (codes, '+', '*') AS codes          -- If desired. Changed March 22
     ,     LENGTH (codes) - LENGTH ( TRANSLATE ( codes
                                               , 'x*+-'
                                               , 'x'
                                               )
                                   ) AS token_cnt
     FROM    test_cat
)
,     cntr     AS
(
     SELECT     LEVEL     AS n
     FROM     ( SELECT MAX (token_cnt)     AS max_token_cnt
                FROM   got_token_cnt
              )
     CONNECT BY     LEVEL     <= max_token_cnt
)
,     got_tokens     AS
(
     SELECT     t.cat
     ,     t.book                                      -- Added March 22
     ,     REGEXP_SUBSTR ( t.codes
                         , '[*+-]'
                         , 1
                         , c.n
                         )          AS token_type
     ,     SUBSTR ( REGEXP_SUBSTR ( t.codes
                                  , '[*+-][^*+-]*'
                                  , 1
                                  , c.n
                                  )
                  , 2
                  )          AS token
     FROM     got_token_cnt     t
     JOIN     cntr          c ON     c.n     <= t.token_cnt
)
,     got_must_have_cnt     AS
(
     SELECT       cat, book                            -- Changed March 22
     ,       COUNT (CASE WHEN token_type = '*' THEN 1 END) AS must_have_cnt
     FROM       got_tokens
     GROUP BY cat, book                                -- Changed March 22
)
SELECT       mh.cat
,       vn.vn_no
FROM       got_must_have_cnt     mh
JOIN                    vn ON mh.book     = vn.book    -- Changed March 22
LEFT OUTER JOIN      got_tokens     gt ON     mh.cat   = gt.cat
                                 AND INSTR (vn.codes, gt.token) > 1
GROUP BY mh.cat
,            mh.must_have_cnt
,            vn.vn_no
HAVING       COUNT (CASE WHEN gt.token_type = '*' THEN 1 END)     = mh.must_have_cnt
AND       COUNT (CASE WHEN gt.token_type = '-' THEN 1 END)     = 0
ORDER BY mh.cat
;
user2544469 wrote:
- query is very slow with 60000 records in vn table. Cost is somewhere around 36000.

See these threads:
When your query takes too long ...
HOW TO: Post a SQL statement tuning request - template posting
Relational databases were designed to have (at most) one piece of information in each column. If you decide to have multiple items in the same column (as you have a variable number of tokens in the codes column), don't be surprised if that makes things slower and more complicated. Most of the query I posted, and perhaps most of the time needed, is just to normalize the data. If you stored the data in a normalized form, perhaps something like got_tokens, then you wouldn't need the first 3 sub-queries that I posted.
Edited by: Frank Kulash on Mar 22, 2011 12:04 PM
