Splitting a large string with a regular expression when it contains special characters

I have a huge string (XML) containing normal characters (a-z, A-Z and 0-9) as well as special characters (<, >, ?, &, ', ", ;, / etc.).
I need to split this string wherever it ends with </document>.
For example:
Original string:
<document>
<item>sdf</item>
<item><text>sd</text></item>
</document>
<document>hi</document>
The string above has to be split into two parts, since it contains two document elements.
Can anybody help me resolve this issue? I can use StringTokenizer, the String.split method, or the regular expression API.
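A minimal Java sketch of the split itself, assuming every document ends with a literal </document> and that the tag never appears nested, inside a comment or inside CDATA (the class and method names are made up for illustration). The lookbehind (?<=</document>) is zero-width, so each piece keeps its closing tag, and \s* swallows the line break between documents:

```java
import java.util.Arrays;

public class DocumentSplitter {

    // Split right after each closing </document> tag. The lookbehind is zero-width,
    // so the tag itself stays at the end of each piece; \s* consumes the whitespace
    // separating consecutive documents. A trailing empty piece is dropped by split().
    static String[] splitDocuments(String xml) {
        return xml.split("(?<=</document>)\\s*");
    }

    public static void main(String[] args) {
        String xml = "<document>\n"
                + "<item>sdf</item>\n"
                + "<item><text>sd</text></item>\n"
                + "</document>\n"
                + "<document>hi</document>";
        String[] parts = splitDocuments(xml);
        System.out.println(parts.length + " documents");
        Arrays.stream(parts).forEach(p -> System.out.println("---\n" + p));
    }
}
```

With the sample above this yields two pieces: the multi-line first document and <document>hi</document>. Because the input is not guaranteed to be well-formed, a split like this is a pragmatic workaround, not a substitute for a parser.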

manas589 wrote:
I used DOM and SAX parsers and got a few exceptions. Again, I don't have the right to change the XML, so I thought I would go with regular expressions or some other way to get my job done.

If the file actually comes in lines like what you posted, you should just be able to check each line to see whether it contains "</document>" or whatever you're looking for. I wouldn't use regex unless I needed another problem.

I got an exception like: Caused by: org.xml.sax.SAXParseException: The entity "nbsp" was referenced, but not declared.
     at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
     at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
     at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)

So then it isn't even XML (nbsp is an HTML entity; XML itself predefines only amp, lt, gt, quot and apos).
Edit: sorry, I just realized why you're considering all of these heavy-duty ideas. It's just that you don't know how to break the string into lines. You do it like this:
BufferedReader br = new BufferedReader(new StringReader(theNotXMLString));
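Following that suggestion, here is a sketch that reads the text line by line and starts a new piece after every line containing </document>. It assumes, as in the sample, that each closing tag ends its line; LineSplitter and splitByLines are hypothetical names:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class LineSplitter {

    // Accumulate lines until one contains "</document>", then start a new piece.
    static List<String> splitByLines(String text) {
        List<String> docs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = br.readLine()) != null) {
                current.append(line).append('\n');
                if (line.contains("</document>")) {
                    docs.add(current.toString());
                    current.setLength(0);
                }
            }
        } catch (IOException e) {
            // A StringReader will not really fail, but readLine() declares IOException.
            throw new UncheckedIOException(e);
        }
        if (current.length() > 0) {
            docs.add(current.toString()); // trailing text without a closing tag
        }
        return docs;
    }

    public static void main(String[] args) {
        String text = "<document>\n<item>sdf</item>\n</document>\n<document>hi</document>";
        splitByLines(text).forEach(d -> System.out.print("---\n" + d));
    }
}
```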

Similar Messages

  • Format string using Regular Expression

    Input string and current output:
    SELECT q'<select ab_c "ABC", efg "EFG" from dual>' str FROM DUAL
    Output:
    STR                                 
    select ab_c "ABC", efg "EFG" from dual
    Required output, using a regular expression:
    STR                                 
    select 'ab_c' "ABC", 'efg' "EFG" from dual

    Regular expressions have many limitations as parsing tools, and you didn't specify the rules you wanted. This expression puts quotes around the non-blank string before a quoted string:
    SELECT regexp_replace(q'<select ab_c "ABC", efg "EFG" from dual>',
                          '([^" ]+)( +"[^ ]*")' , '''\1''\2' ) str FROM DUAL;
    STR
    select 'ab_c' "ABC", 'efg' "EFG" from dual
    It is not robust - a missing " will confuse it, and you should be using bind variables anyway.
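    For comparison, the same pattern works with Java's String.replaceAll; the main difference is that Java writes backreferences in the replacement as $1 and $2 rather than \1 and \2. A sketch (class and method names are illustrative) with the same robustness caveat as above:

```java
public class SqlQuoter {

    // Same pattern as the REGEXP_REPLACE call above:
    // group 1 = a run of characters that are neither a double quote nor a blank,
    // group 2 = the blanks plus the double-quoted alias that follows it.
    static String quoteBeforeAlias(String sql) {
        return sql.replaceAll("([^\" ]+)( +\"[^ ]*\")", "'$1'$2");
    }

    public static void main(String[] args) {
        // prints: select 'ab_c' "ABC", 'efg' "EFG" from dual
        System.out.println(quoteBeforeAlias("select ab_c \"ABC\", efg \"EFG\" from dual"));
    }
}
```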

  • Getting non numeric strings using regular expression

    Hi guys,
    I want to get the list of string values in a table that contain no numeric characters.
    I have a string column named A in a table named B.
    I have written the following code, but it seems to be incorrect.
    Please help me out.
    SELECT
    A FROM
    B
    WHERE
    regexp_like(A, '([^[:digit:]])')
    Thanks in advance.

    That will give you every row that has at least one non-numeric character. If you want the ones that contain no numeric characters, it should be
    regexp_like(A,'^[^0-9]*$')
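    The difference between the two conditions is easier to see in Java terms. REGEXP_LIKE succeeds if the pattern matches anywhere in the string, like Matcher.find(), so only the ^...$ anchors turn it into a whole-string test, which is what String.matches() does implicitly. A small sketch (the helper names are hypothetical):

```java
import java.util.regex.Pattern;

public class NonNumericCheck {

    private static final Pattern NON_DIGIT = Pattern.compile("[^0-9]");

    // Like REGEXP_LIKE(A, '[^0-9]'): true if ANY character is a non-digit.
    static boolean hasNonDigit(String s) {
        return NON_DIGIT.matcher(s).find();
    }

    // Like REGEXP_LIKE(A, '^[^0-9]*$'): every character must be a non-digit.
    static boolean hasNoDigits(String s) {
        return s.matches("[^0-9]*"); // String.matches requires a whole-string match
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abc", "abc123", "123"}) {
            System.out.printf("%-7s hasNonDigit=%b hasNoDigits=%b%n", s, hasNonDigit(s), hasNoDigits(s));
        }
    }
}
```

Note that "abc123" passes the original condition but not the anchored one.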

  • Filter String using Regular Expression

    Hello,
    I have an application that monitors serial communication between a PC and a device. The message protocol is a byte stream that I convert to a string to parse into pretty messages. The start of a message is always "10 02", but if the "10 02" is preceded by another "10", as in "10 10 02", it is part of a message. I've been trying to use a regular expression with the Search and Replace VI. My regex is "[^10]\s10\s02", which almost works, but it cuts off part of the message:
    Before:
    10 03 10 02
    After:
    10 0   <= missing the "3"
    10 02
    Here's what I'm doing:
    Any ideas on what I'm missing?  I've attached a simple example.
    Thanks
    Message Edited by Derek Price on 02-14-2008 08:37 PM
    Attachments:
    Filter Beginning Message1.vi 14 KB
    FilterMessageRegex1.png 7 KB

    Try this approach.
    Do a search and replace on '10\s02', replacing it with '\r\n10\s02'.
    Then do another search and replace on '10\r\n10\s02', replacing it with '10\s10\s02'.
    See attached.
    Randall Pursley
    Attachments:
    Message Filter.PNG 18 KB
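    Why the "3" disappears: [^10] is a character class, so it matches and consumes one character that is neither '1' nor '0', and that character is replaced along with the rest. Zero-width lookarounds avoid consuming anything. The sketch below is Java rather than LabVIEW (it does not assume the Search and Replace VI accepts the same lookbehind syntax), and it assumes each byte is shown as two hex digits separated by single spaces; the names are hypothetical:

```java
import java.util.Arrays;
import java.util.List;

public class MessageSplitter {

    // Split on the single space that precedes a "10 02" start marker, but only when
    // the byte before that space is not another "10" (an escaped 10 inside a message).
    // (?<!10) and (?=10 02) are zero-width, so no message bytes are lost.
    static List<String> splitMessages(String stream) {
        return Arrays.asList(stream.split("(?<!10) (?=10 02)"));
    }

    public static void main(String[] args) {
        System.out.println(splitMessages("10 03 10 02"));
        System.out.println(splitMessages("10 02 AA 10 10 02 BB 10 03 10 02 CC 10 03"));
    }
}
```

This does not handle an escaped 10 that is immediately followed by a real start marker ("10 10 10 02"); that case needs extra logic, for example counting the run of consecutive 10 bytes.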

  • How to Capture Multiple Line String using Regular Expression?

    Hi, 
    I have a simple program like this:
    What I want to accomplish is to capture everything between >>start and >>end using a single Match Regular Expression node. Setting multiline? to True or False does not seem to help.
    I am using LabVIEW 2012.
    If it is impossible to capture it using a single node, that is fine. But I want to make sure that I can make full use of this node without combining several others.
    Thank you!
    TailOfGon
    Certified LabVIEW Architect 2013

    Thank you for the fast response! Your solution worked in the example case.
    After I saw your post, I was finally able to move forward. But I still wanted to use the dot, because of the limited set of characters that \w matches.
    I made some more modifications to your regular expression, and now it seems to work for all characters:
    >>start((?:\s|.)*)>>end
    Thanks!
    TailOfGon
    Certified LabVIEW Architect 2013
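    The same behaviour can be reproduced in Java, where '.' also excludes line terminators unless the DOTALL flag (inline (?s)) is set. A sketch comparing the two, with made-up helper names. With DOTALL set, plain .* does the job of (?:\s|.)*, and avoiding the alternation matters in Java, because a repeated group like (?:\s|.)* may overflow the stack on very long inputs:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultilineCapture {

    // Returns the text between >>start and >>end, or null if there is no match.
    // Without DOTALL, '.' stops at '\n', so a block spanning lines is not found.
    static String between(String text, boolean dotAll) {
        int flags = dotAll ? Pattern.DOTALL : 0;
        Matcher m = Pattern.compile(">>start(.*)>>end", flags).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "header\n>>start\nline one\nline two\n>>end\nfooter";
        System.out.println(between(text, false)); // null
        System.out.println(between(text, true));  // the two lines, with surrounding newlines
    }
}
```

Like the original expression, the greedy .* runs to the last >>end in the text; use .*? if there can be several blocks.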

  • Dumbfounded by Scanner processing String using regular expression

    I was reading Bruce Eckel's book when I came across something interesting: extending Scanner with regular expressions. Unfortunately, I was confronted with an issue that doesn't make much sense to me: if the String that I am scanning contains a hyphen, the Scanner doesn't produce anything. As soon as I take it out, it all works like a charm. Here is my example:
    import java.util.Scanner;
    import java.util.regex.*;

    public class StringScan {
        public static void main(String[] args) {
            String input = "there's one caveat when scanning with regular expressions";
            Scanner scanner = new Scanner(input);
            String pattern = "[a-z]\\w+";
            while (scanner.hasNext(pattern)) {
                scanner.next(pattern);
                MatchResult match = scanner.match();
                String output = match.group();
                System.out.println(output);
            }
        }
    }
    What could be the reason? I imagined it could be because the hyphen for some reason gets given a special meaning, but when I tried escaping it, it still didn't work.

    Thanks for your prompt reply.
    I have figured out what was wrong with my code, by the way. Since a single quote is not a word character, it does not match \w+. And as the very first input token does not match, the scanner stops immediately. I rewrote my regex to "[a-z].*" and now it works.
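    A sketch of that diagnosis (ScannerTokens and scanTokens are hypothetical names). Scanner first splits the input on whitespace, then hasNext(pattern) asks whether the whole next token matches, so the loop ends as soon as one token fails:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerTokens {

    // Collect tokens while the NEXT complete token matches the pattern.
    static List<String> scanTokens(String input, String pattern) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNext(pattern)) {
                tokens.add(scanner.next(pattern));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "there's one caveat when scanning with regular expressions";
        System.out.println(scanTokens(input, "[a-z]\\w+")); // [] : \w does not include '
        System.out.println(scanTokens(input, "[a-z].*"));   // all eight words
    }
}
```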

  • Replace a string using regular expression from powershell

    I want to replace the following:
    'browserName': 'firefox'
    with :
    'browserName': 'chrome'
    Then I tried this:
    (get-content $conffile) -replace "^('browserName': ')\S+","browserName': 'chrome'" |set-content $conffile
    But nothing happened.
    Could somebody tell me how to write the regular expression here? Thanks a lot.

    Second person today with the same question.
    (get-content $conffile) |%{$_ -replace "'browserName':\s+'firefox'","'browserName': 'chrome'"} | set-content $conffile
    ¯\_(ツ)_/¯
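    One plausible reason the original attempt did nothing is the ^ anchor: Get-Content returns one string per line, and if the line in the file is indented, ^ never matches at the key. That is an assumption about the file, but it is easy to check. A Java sketch of the same replacement (helper names are hypothetical; PowerShell's -replace uses the .NET regex engine, which should behave the same for these simple patterns):

```java
public class BrowserSwap {

    // Unanchored: replaces the key/value pair wherever it appears in the line.
    static String swap(String line) {
        return line.replaceAll("'browserName':\\s*'firefox'", "'browserName': 'chrome'");
    }

    // Same pattern anchored with ^: only fires when the key is the first thing on the line.
    static String anchoredSwap(String line) {
        return line.replaceAll("^'browserName':\\s*'firefox'", "'browserName': 'chrome'");
    }

    public static void main(String[] args) {
        String indented = "    'browserName': 'firefox',";
        System.out.println(swap(indented));         // indentation and comma kept, value now 'chrome'
        System.out.println(anchoredSwap(indented)); // unchanged: ^ sits before the spaces
    }
}
```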

  • Filter Strings using regular expressions

    Requirements:
    1. I have a table with different names.
    2. I input a word (string) through a text box.
    3. I filter the table with the input string using the code
    ((DefaultRowSorter)table_customer.getRowSorter()).setRowFilter(RowFilter.regexFilter(regex, indices));
    4. The regex is obtained as follows:
    String regex = "";
    String text = txtFilterText.getText();
    regex = "^(?i)" + text + ".*"; // for "starts with" filter
    regex = "." + text + "."; // for "contains" filter
    regex = "(?i)[" + text + ".*]"; // for "doesn't start with" filter
    regex = ".*(?i)" + text + "$"; // for "ends with" filter
    I need help with the "doesn't contain" and "doesn't end with" filters. Please help me out.
    Anees

    Double post
    Reply here: http://forum.java.sun.com/thread.jspa?threadID=5231406
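    For the missing filters, negative lookahead is one way to express "doesn't". RowFilter.regexFilter tests each cell with Matcher.find(), so a pattern only has to match somewhere; anchoring a lookahead at ^ then turns it into a condition on the whole value. A sketch with hypothetical helper names. Pattern.quote makes the typed text literal, so characters such as . or ( don't act as regex syntax:

```java
import java.util.regex.Pattern;

public class FilterRegexes {

    // Positive filters: anchored or unanchored literals, case-insensitive.
    static String startsWith(String text)    { return "(?i)^" + Pattern.quote(text); }
    static String contains(String text)      { return "(?i)" + Pattern.quote(text); }
    static String endsWith(String text)      { return "(?i)" + Pattern.quote(text) + "$"; }

    // Negative filters: a zero-width (?!...) at ^ rejects the whole value if the
    // forbidden text appears at the start, anywhere, or at the end.
    static String notStartsWith(String text) { return "(?i)^(?!" + Pattern.quote(text) + ")"; }
    static String notContains(String text)   { return "(?i)^(?!.*" + Pattern.quote(text) + ")"; }
    static String notEndsWith(String text)   { return "(?i)^(?!.*" + Pattern.quote(text) + "$)"; }

    // Mimics the inclusion test of RowFilter.regexFilter, which uses Matcher.find().
    static boolean accepts(String regex, String value) {
        return Pattern.compile(regex).matcher(value).find();
    }

    public static void main(String[] args) {
        String text = "son";
        for (String name : new String[] {"Anderson", "Brown", "Sonia"}) {
            System.out.printf("%-8s notContains=%b notEndsWith=%b%n",
                    name, accepts(notContains(text), name), accepts(notEndsWith(text), name));
        }
        // In the table: setRowFilter(RowFilter.regexFilter(notContains(text), indices));
    }
}
```

Anderson is rejected by both negative filters, Brown passes both, and Sonia passes notEndsWith but is rejected by notContains, since the filters ignore case.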

  • Using Regular Expressions to replace Quotes in Strings

    I am writing a program that generates Java files, and some of the Strings it uses contain quotes. I want to use regular expressions to replace " with \" when they are written to the file. The code I was trying to use was:
    String temp = "\"Hello\" i am a \"variable\"";
    temp = temp.replaceAll("\"","\\\\\"");
    However, this does not work: when I print the code out to the file, the resulting code appears as:
    String someVar = ""Hello" i am a "variable"";
    and not as:
    String someVar = "\"Hello\" i am a \"variable\"";
    I am assuming my regular expression is wrong. If it is, could someone explain to me how to fix it so that it will work?
    Thanks in advance.

    Thanks, apparently I'm just doing something weird that I need to look at a little harder.
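    For the record, the replaceAll call in the question is correct: the Java literal "\\\\\"" is the three characters \\" and replaceAll reads that as a literal backslash followed by a quote. If no regex is needed, String.replace does the same thing literally. A quick check (class and method names are illustrative):

```java
public class QuoteEscaper {

    // Regex route: in a replacement string, \\ means one literal backslash,
    // so each " in the input becomes \" in the output.
    static String withReplaceAll(String s) {
        return s.replaceAll("\"", "\\\\\"");
    }

    // Literal route: no regex involved; the Java literal here is just the 2 characters \"
    static String withReplace(String s) {
        return s.replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        String temp = "\"Hello\" i am a \"variable\"";
        // both print: String someVar = "\"Hello\" i am a \"variable\"";
        System.out.println("String someVar = \"" + withReplaceAll(temp) + "\";");
        System.out.println("String someVar = \"" + withReplace(temp) + "\";");
    }
}
```

If the generated file still shows unescaped quotes, the escaped string is probably not the one being written out.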

  • String extract using regular expression

    Hi
    I have text like this: "<a>45</a><ct>Hi</ct><R>45 85</R><H>Here</H>". I want to extract the text between <R> and </R>, using a regular expression or any other technique, and also replace the space between 45 and 85 with a pipe, like "45|85".
    Edited by: vishnu prakash on Mar 2, 2012 4:42 AM

    Hi,
    Here's one way:
    REPLACE ( REGEXP_REPLACE ( txt
                             , '.*<R>(.*)</R>.*'
                             , '\1'
                             )
            , ' '
            , '|'
            )
    This assumes there is only one <R> tag in txt.
    Always say which version of Oracle you're using. The expression above will work in Oracle 10 and up, but starting in Oracle 11 you can use REGEXP_SUBSTR rather than the less intuitive REGEXP_REPLACE.
    Edited by: Frank Kulash on Mar 2, 2012 7:48 AM
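    Outside the database, the same extraction is a capture group plus a plain replace. A Java sketch, assuming a single <R> element as above (RTagExtractor and extractR are hypothetical names); the reluctant .*? stops at the first </R>:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RTagExtractor {

    // Capture group 1 holds whatever sits between <R> and the first following </R>.
    private static final Pattern R_TAG = Pattern.compile("<R>(.*?)</R>");

    // Returns the <R> content with spaces turned into pipes, or null if there is no <R> element.
    static String extractR(String text) {
        Matcher m = R_TAG.matcher(text);
        return m.find() ? m.group(1).replace(' ', '|') : null;
    }

    public static void main(String[] args) {
        System.out.println(extractR("<a>45</a><ct>Hi</ct><R>45 85</R><H>Here</H>")); // 45|85
    }
}
```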

  • Change particular characters in a string using regular expressions ...

    Hello everyone,
    I am trying to write a function using Oracle's regular expression function REGEXP_REPLACE, but I have not succeeded so far.
    My problem is as follows: I have a text in a column, for example 'sdfsdf Sdfdfs Sdfd', and I want to replace all s and S characters with X to make the text look like 'XdfXdf XdfdfX Xdfd'.
    Is it possible using regular expressions in Oracle?
    Can you give me some clues?
    Thank you

    SQL> SELECT
      2  regexp_replace('sdfsdf Sdfdfs Sdfd','s|S','X') from dual;
    REGEXP_REPLACE('SD
    XdfXdf XdfdfX Xdfd
    Regards,
    Achyut
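    The same substitution in Java, as a sketch (method names are illustrative). A character class [sS] is a slightly tidier way to write s|S. In Oracle, REGEXP_REPLACE(col, 's', 'X', 1, 0, 'i') with the 'i' match parameter should have the same effect:

```java
public class ReplaceS {

    // [sS] is a character class matching either a lowercase or an uppercase s.
    static String replaceS(String s) {
        return s.replaceAll("[sS]", "X");
    }

    // Same result with the inline case-insensitive flag.
    static String replaceSIgnoreCase(String s) {
        return s.replaceAll("(?i)s", "X");
    }

    public static void main(String[] args) {
        System.out.println(replaceS("sdfsdf Sdfdfs Sdfd")); // XdfXdf XdfdfX Xdfd
    }
}
```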

  • How to define a regular expression using regular expressions

    Hi,
    I am looking for a regular expression pattern that will identify a regular expression.
    Also, is it possible to know how the compile method of the Pattern class in the java.util.regex package works when it is given a String containing a regex? I.e., is there any mechanism to validate a regular expression using a regular expression pattern?
    Regards,
    Abhisek

    It is impossible to recognize an (in)valid regular expression string using a
    regular expression. Google for 'pumping lemma' for a formal proof.
    kind regards,
    Jos
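    In practice the simplest validity check is to let the regex compiler itself decide: Pattern.compile throws a PatternSyntaxException for malformed input, and the exception reports a description and the index of the error. A sketch (RegexValidator and isValidRegex are hypothetical names):

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexValidator {

    // Valid if and only if java.util.regex accepts it.
    static boolean isValidRegex(String candidate) {
        try {
            Pattern.compile(candidate);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"[a-z]+", "a(b", "[a-z"}) {
            System.out.println(s + " -> " + isValidRegex(s));
        }
    }
}
```

This answers "is it valid for Java" only; other regex dialects accept different syntax.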

  • Using Regular Expressions for Completion

    I'm trying to build a text completer for a simple little editor. The general idea is that I have a regular expression which describes the syntax of an expression and a set of strings which are all semantically valid cases of the expression (the latter of which is not particularly important to my problem). I would like to be able to determine, using that expression, whether a section of text is capable of beginning a syntactically valid expression, not just whether it matches.
    For example, given the expression
    "#[A-Za-z0-9]+#", the string "#name#" is syntactically valid, whereas the string "#_blarg" is not. What I would like to do is determine that "#partial" has the potential to match the pattern with more input, even if it doesn't yet. Specifically, the eventual use will be in a case such as the string X=#partial+3. If the cursor is positioned before the "+" and my user presses the completion keystroke, I want to recognize "#partial" as the text to complete. Also, positioning the cursor immediately after the "=" and pressing the keystroke should do nothing, since nothing before the "=" is capable of matching the pattern.
    Is this possible? I don't have to use this exact approach, but it is important that I be able to use the regular expression in detecting a partially completed expression. If I can, the set of regular expressions which already exist in the code can be used to drive the auto completer. Otherwise, I'll have to write a special recognition module for each case; that wouldn't be pretty.
    Thanks for your time! I'll provide other information upon request, if it'd help. :)

    Thank you both for discussing this; it has definitely helped me in reaching a better understanding of uncle_alice's answer to my problem. I've adjusted my code to use this approach and, for the most part, it seems to work.
    I say "for the most part" because I am compiling Patterns with the case insensitivity flag. This appears to do horrible, horrible things. Take a look at the following code, modified from uncle_alice's example:
    String[] str = {"#test#hello", "#tes", "blargblarg", "", "#test#", "S"};
    String rgx = "#[A-Za-z0-9]+#";
    Pattern pc = Pattern.compile(rgx);
    Pattern pi = Pattern.compile(rgx, Pattern.CASE_INSENSITIVE);
    for (String s : str) {
        System.out.println("    For string: " + s);
        for (Pattern p : new Pattern[]{pc, pi}) { // once for each pattern
            Matcher m = p.matcher(s);
            if (m.matches()) {
                System.out.printf("Matched '%s'", m.group());
            } else {
                System.out.print("No match");
            }
            System.out.println("; hitEnd = " + m.hitEnd());
        }
    }
    That produces the following output:
        For string: #test#hello
    No match; hitEnd = false
    No match; hitEnd = true
        For string: #tes
    No match; hitEnd = true
    No match; hitEnd = true
        For string: blargblarg
    No match; hitEnd = false
    No match; hitEnd = true
        For string:
    No match; hitEnd = true
    No match; hitEnd = true
        For string: #test#
    Matched '#test#'; hitEnd = false
    Matched '#test#'; hitEnd = false
        For string: S
    No match; hitEnd = false
    No match; hitEnd = true
    It would seem that, with the case-insensitive flag set, hitEnd always returns true unless a match is found. Why is this? I find it quite confusing.
    I can adjust my design to accommodate this if the problem cannot be circumvented; however, I'd like to understand what is going wrong here. :)
    Cheers! Thanks so much for all your help!
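    A sketch of the completion check itself, using matches() plus hitEnd(): if matches() fails but hitEnd() is true, the engine ran out of input while still trying, so more typing might still produce a match. Since [A-Za-z0-9] already covers both cases, this sketch simply leaves CASE_INSENSITIVE off rather than depending on how hitEnd behaves with that flag (PrefixCheck and couldStillMatch are hypothetical names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrefixCheck {

    // true if text is a complete match, or a prefix that more input might complete.
    static boolean couldStillMatch(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.matches() || m.hitEnd();
    }

    public static void main(String[] args) {
        // [A-Za-z0-9] already accepts both cases, so no CASE_INSENSITIVE flag.
        Pattern p = Pattern.compile("#[A-Za-z0-9]+#");
        for (String s : new String[] {"#test#hello", "#tes", "blargblarg", "", "#test#", "S"}) {
            System.out.println("'" + s + "' -> " + couldStillMatch(p, s));
        }
    }
}
```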

  • Finding URLs using regular expression.

    I have a requirement where the user will type some text containing URLs, like "Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747. Thank you". This text has to be modified as below before saving it to the database.
    "Please visit this site <a href='http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747'>http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747</a>. Thank you"
    I am using the regular expression (http|https)://.+?\\s, which marks the end of the URL with a whitespace character. This pattern doesn't work if the URL is located at the end of the string, since there will be no space at the end.
    For example, if the string is "Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747", the regex will fail.
    My actual problem is to find the URL irrespective of its position within the string.
    Pattern urlPattern = Pattern.compile("(http|https)://.+?\\s", Pattern.CASE_INSENSITIVE);
    Matcher matcher = urlPattern.matcher(plainText);
    Map<Integer, String> stringIndexMap = new HashMap<>();
    // Searching the input string for urlPattern...
    while (matcher.find()) {
        String urlString = matcher.group();
        // Storing the urls in a hashmap with their indices as keys...
        stringIndexMap.put(matcher.start(), urlString.trim());
    }
    // Assumed: clickableURLString is a StringBuilder initialised from plainText
    StringBuilder clickableURLString = new StringBuilder(plainText);
    Set<Integer> keySet = stringIndexMap.keySet();
    Iterator<Integer> it = keySet.iterator();
    // Iterating over the hashmap containing urls...
    while (it.hasNext()) {
        String urlString = stringIndexMap.get(it.next());
        /*
         * Replacing the url string in the input text with <a href="#" onclick="window.open('<urlString>')"
         * using String index
         */
        int start = clickableURLString.indexOf(urlString);
        clickableURLString.replace(start, start + urlString.length(),
                "<a href=\"#\" onclick=\"window.open('" + urlString
                + "')\">" + urlString + "</a>");
    }
    return clickableURLString.toString();

    The end of the input is '$' as a regex.
    import java.util.regex.*;

    public class Prasanna {
        public static void main(String[] args) {
            String text
                = "Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747";
            // String regex = "(http|https)://.+?(?:\\s|$)"; // this works
            String regex = "(http|https)://[^ ]+";           // this also works
            Pattern pat = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
            Matcher mat = pat.matcher(text);
            while (mat.find()) {
                System.out.println(mat.group());
            }
        }
    }
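    Taking the linking step a bit further: a lookahead can end the URL at whitespace or at the end of input without consuming either, and replaceAll with $0 builds the anchor tag directly instead of going through an index map. A sketch, assuming that a single trailing '.', ',', '!' or '?' is sentence punctuation rather than part of the URL (a heuristic, not a rule); Linkify is a hypothetical name, and the URL is not HTML-escaped here, which real code storing user input should do:

```java
public class Linkify {

    // Reluctant \S+? stops at the first point where the URL is followed by optional
    // punctuation and then whitespace or the end of input ($), so a URL at the very
    // end of the text is found too, and a trailing "." stays outside the link.
    private static final String URL = "(?i)https?://\\S+?(?=[.,!?]?(?:\\s|$))";

    static String linkify(String text) {
        // $0 is the whole match; characters copied from the match (even '$') are literal.
        return text.replaceAll(URL, "<a href='$0'>$0</a>");
    }

    public static void main(String[] args) {
        System.out.println(linkify("Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747. Thank you"));
        System.out.println(linkify("Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747"));
    }
}
```

For the first input this produces exactly the target string from the question.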

  • Pattern matching using Regular expression

    Hi,
    I am working on pattern matching using regular expressions. In the table, I have 2 columns, A and B.
    A has value 'A499BPAU4A32A386KBCZ4C13C41D20E'
    B has value like '*CZ4*M11*7NQ+RDR+RSM-R9A-R9B'
    The requirement is that I have to match the codes of B in A. If a code has a * sign, it must be present in A; for example, 'CZ4' should exist in string A.
    The issue I am facing is that there are 2 values with a * sign. The code works fine for the first match (CZ4), but it does not look further, as M11 does not exist in A.
    I used the condition
    AND instr(A,substr(REGEXP_SUBSTR(B, '*[^*]{3}'),2) ,1)=0
    First of all, is it possible to match multiple patterns in one condition?
    If yes, please suggest how.
    Thanks

    user2544469 wrote:
    Thanks a lot, Frank. This query worked wonderfully for the test data I provided; however, I have some concerns:
    - The query does not include the column BOOK, which is a mandatory check.

    Sorry, that was my mistake. It was a very easy mistake to make, since you posted sample data where it didn't matter. Instead of doing a cross-join between vn and got_must_have_cnt, do an inner join, using book. That means book will have to be in got_must_have_cnt, and all the sub-queries from which it descends. Look for comments that say "March 22".
    If you want to treat '+' in test_cat.codes as '*', then the simplest thing is probably just to use REPLACE, so that when the table has '+', you use '*' instead.
    WITH    got_token_cnt   AS
    (
         SELECT  cat
         ,       book                                          -- Added March 22
         ,       REPLACE (codes, '+', '*')  AS codes           -- If desired.  Changed March 22
         ,       LENGTH (codes) - LENGTH ( TRANSLATE ( codes
                                                     , 'x*+-'
                                                     , 'x'
                                                     )
                                         )  AS token_cnt
         FROM    test_cat
    )
    ,       cntr    AS
    (
         SELECT  LEVEL   AS n
         FROM    (  SELECT  MAX (token_cnt)  AS max_token_cnt
                    FROM    got_token_cnt
                 )
         CONNECT BY  LEVEL  <= max_token_cnt
    )
    ,       got_tokens      AS
    (
         SELECT  t.cat
         ,       t.book                                        -- Added March 22
         ,       REGEXP_SUBSTR ( t.codes
                               , '[*+-]'
                               , 1
                               , c.n
                               )         AS token_type
         ,       SUBSTR ( REGEXP_SUBSTR ( t.codes
                                        , '[*+-][^*+-]*'
                                        , 1
                                        , c.n
                                        )
                        , 2
                        )                AS token
         FROM    got_token_cnt   t
         JOIN    cntr            c  ON  c.n  <= t.token_cnt
    )
    ,       got_must_have_cnt   AS
    (
         SELECT    cat, book                                   -- Changed March 22
         ,         COUNT (CASE WHEN token_type = '*' THEN 1 END)  AS must_have_cnt
         FROM      got_tokens
         GROUP BY  cat, book                                   -- Changed March 22
    )
    SELECT    mh.cat
    ,         vn.vn_no
    FROM             got_must_have_cnt  mh
    JOIN             vn                     ON   mh.book  = vn.book              -- Changed March 22
    LEFT OUTER JOIN  got_tokens         gt  ON   mh.cat   = gt.cat
                                            AND  INSTR (vn.codes, gt.token) > 1
    GROUP BY  mh.cat
    ,         mh.must_have_cnt
    ,         vn.vn_no
    HAVING    COUNT (CASE WHEN gt.token_type = '*' THEN 1 END)  = mh.must_have_cnt
    AND       COUNT (CASE WHEN gt.token_type = '-' THEN 1 END)  = 0
    ORDER BY  mh.cat
    - The query is very slow with 60000 records in the vn table. The cost is somewhere around 36000.

    See these threads:
    When your query takes too long ...
    HOW TO: Post a SQL statement tuning request - template posting
    Relational databases were designed to have (at most) one piece of information in each column. If you decide to have multiple items in the same column (as you have a variable number of tokens in the codes column), don't be surprised if that makes things slower and more complicated. Most of the query I posted, and perhaps most of the time needed, is just to normalize the data. If you stored the data in a normalized form, perhaps something like got_tokens, then you wouldn't need the first 3 sub-queries that I posted.
    Edited by: Frank Kulash on Mar 22, 2011 12:04 PM
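    If some of the processing can move out of SQL, the token logic is simple to express in Java. A sketch mirroring the HAVING clause above: every '*' code must occur in A and no '-' code may occur in A, while '+' codes are not checked. It uses a plain substring test (the query uses INSTR (...) > 1), and CodeMatcher and qualifies are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CodeMatcher {

    // One token = a type character (*, + or -) followed by the code up to the next type character.
    private static final Pattern TOKEN = Pattern.compile("([*+-])([^*+-]*)");

    // true if every '*' code occurs in a and no '-' code does; '+' codes are ignored.
    static boolean qualifies(String a, String b) {
        Matcher m = TOKEN.matcher(b);
        while (m.find()) {
            String type = m.group(1);
            boolean present = a.contains(m.group(2));
            if (type.equals("*") && !present) {
                return false; // a required code is missing
            }
            if (type.equals("-") && present) {
                return false; // an excluded code is present
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String a = "A499BPAU4A32A386KBCZ4C13C41D20E";
        System.out.println(qualifies(a, "*CZ4*M11*7NQ+RDR+RSM-R9A-R9B")); // false: M11 is missing
        System.out.println(qualifies(a, "*CZ4*C13-R9A"));                 // true
    }
}
```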

Maybe you are looking for

  • I have iphone 5. Have no sound and home button is not working pls help.

    My iPhone 5 has just stopped giving any sound: no ringtones, no key clicks or button sounds, nothing at all. And the home button has stopped working. Has anyone heard of software 6.1.4 being corrupt? I'm just updating the software to 7.1 at the moment. If

  • Sort order problem

    I have a minor, albeit annoying, problem that I can't seem to solve. I have one recently taken photo in my Aperture library which for whatever reason seems to have a preview that is dated as the oldest photo in my library. It displays in the correct

  • Can 10.4 included with my new computer be installed on my old?

    Hi, I just bought a new iMac 20" about a week ago. Along with that, I have my Powerbook 12" which I owned previous to this purchase. I have installation disks with my new iMac and want to know if I can install the OS on my Powerbook so they're both o

  • I can't make my first download in AppStore!

    When I try to make my first App Store download on my new iPhone 4, I'm asked to fill in my Password for Apple ID. When I'm done and press OK, I get the same message and I have to fill it in again and again. If I fill in a wrong pass, it says "Wrong p

  • I can't open .doc files attached to the mail.

    After upgrading to the latest version (31.6.00), I cannot open .doc files. I use Open Office 4.1.1. No problem with .xls or .pdf files. But neither the attachments in my sent mails nor the attachments in incoming mails type .doc can be opened by clicki