Regex pattern extraction

Hi,
I want to know how regex can be used
to extract a part of the sentence within tags.
For example:
<p> example text </p>
I want only the text within the p tags, i.e. "example text" only.
-Mani
Message was edited by:
a.mani24

i want only the info within the /p tags. ie "example text" only.

If you want to parse HTML then you should use an HTML parser unless at least one of the following is true:
1. The HTML is very strictly structured (almost always machine generated).
2. The searched-for item is very unique and very simple.
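
For the simple, strictly structured case described above, a minimal sketch might look like the following. The class and method names (ParagraphText, extract) are made up for illustration, and the pattern assumes p tags are never nested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the thread: collects the text of every <p>...</p> pair.
final class ParagraphText {
    // (?is): case-insensitive, and "." also matches line breaks.
    // <p\b[^>]*> accepts optional attributes but rejects tags such as <pre>.
    // (.*?) is reluctant, so each match stops at the nearest closing tag.
    private static final Pattern P_TAG =
            Pattern.compile("(?is)<p\\b[^>]*>(.*?)</p>");

    static List<String> extract(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = P_TAG.matcher(html);
        while (m.find()) {
            result.add(m.group(1).trim()); // group(1) is the text between the tags
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("<p> example text </p>")); // prints [example text]
    }
}
```

Anything less regular than this (nested tags, comments, unclosed tags) is where a real HTML parser becomes the safer choice.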

Similar Messages

  • Searching Site Content Using REGEX Patterns

    Intent: Detect content in SharePoint 2013 lists and libraries that matches a REGEX pattern, like social security numbers.
    SharePoint 2013 only exposes KQL and FQL languages. 
    http://msdn.microsoft.com/en-us/library/office/jj163973.aspx
    I am comfortable writing this as an App.  I do not know how to pass the search index through a REGEX match.  Perhaps there is a way to access the data on more of a server model instead of through the client APIs?

    Hi  Eric,
    To achieve this, you can write a Content Enrichment Web Service to extract regex patterns from a managed property.
    Here is a blog you can refer to:
    SharePoint 2013 Content Enrichment: Regular Expression Data Extraction:
    http://blogs.technet.com/b/peter_dempsey/archive/2013/12/04/sharepoint-2013-content-enrichment-regular-expression-data-extraction.aspx
    Reference:
    http://msdn.microsoft.com/en-us/library/office/jj163982.aspx
    http://blogs.msdn.com/b/richard_dizeregas_blog/archive/2013/06/19/advanced-content-enrichment-in-sharepoint-2013-search.aspx
    Thanks,
    Eric

  • Regex pattern, filter delimiter in sql code

    Hi,
    The problem I'm having is that the regex pattern below is not catching the beginning "go" and ending "go" of a string.
    "(?iu)[(?<=\\s)]\\bgo\\b(?=\\s)"
    The idea is catching the "whole word", in this case the word is "go" so if the word is at the beginning of the string or at the end, i still want to include it.
    So, for example:
    "go select * from table1 go" -> should catch 2 "go"s but catches 0
    "go go# select * from table1 --go go" -> should also catch 2 "go"s but catches 0
    "go go select * from table1 go go" -> should catch 4 "go"s but catches 2
    I have the "[(?<=\\s)]" and the "(?=\\s)" so that the word "go" when next to a special character is not included, for example "--go".
    The problem is that this also negates the beginning and ending of the string.
    Code to test example: It should split at 1st, 2nd and last "go", but only splits at the 2nd "go".
    String s = "go go select * from table1 --go go";
    String delimiter = "go";
    String[] queries = s.split("(?iu)[(?<=\\s)]\\b" + delimiter + "\\b(?=\\s)");
    for (int i = 0; i < queries.length; i++) {
         System.out.println(queries[i]);
    }
    I really need to fix this but I'm not having much success.
    Any help will be appreciated, thanks in advance.

    Yes,
    I prefer this one: Regex Powertoy (interactive regular expressions)
    It's not 100% perfect, but you can see with my example and this online tester that the 1st "go" is not matched. And this is a problem for me.
    I want to eliminate the special characters like "#go" or "-go" but i don't want to eliminate the end and start of string.
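
A possible explanation and fix for the pattern above: "[(?<=\\s)]" is a character class, not a lookbehind, so it consumes one literal character before "go" and can never match at the start of the string, while "(?=\\s)" requires a whitespace character after "go" and so fails at the end. The sketch below swaps in negative lookarounds, "(?<!\\S)" and "(?!\\S)", which also succeed at the string boundaries. The class and method names are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the thread.
final class GoDelimiter {
    // (?<!\S) = not preceded by a non-whitespace character (true at start of input)
    // (?!\S)  = not followed by a non-whitespace character (true at end of input)
    static Pattern wholeToken(String word) {
        return Pattern.compile("(?iu)(?<!\\S)" + Pattern.quote(word) + "(?!\\S)");
    }

    // Counts how many times the word occurs as a stand-alone token.
    static int count(String input, String word) {
        Matcher m = wholeToken(word).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count("go select * from table1 go", "go"));          // 2
        System.out.println(count("go go# select * from table1 --go go", "go")); // 2
        System.out.println(count("go go select * from table1 go go", "go"));    // 4
    }
}
```

The same pattern string can be passed to String.split() if the pieces between the delimiters are what you need.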

  • Regex Pattern For this String ":=)"

    Hello All
    True, this isn't the place to ask this but here it is anyway, if you can help please do.
    Can anybody tell me the regex pattern for this String:
    ":=)"
    Thanks
    John

    Yep, cheers it's ":=\\)"
    public class Test {
         public static void main( String args[] ) {
              String s = "one:=)two:=)three:=)four";
              // ")" is a regex metacharacter, so it is escaped to match literally
              String ss[] = s.split( ":=\\)" );
              for( int i=0; i<ss.length; i++ )
                   System.out.println( "ss["+i+"] = {" + ss[i] + "}" );
         }
    }
    resulting in:
    ss[0] = {one}
    ss[1] = {two}
    ss[2] = {three}
    ss[3] = {four}

  • Java Regex Pattern

    Hello,
    I have parsed a text file and want to use a Java regex pattern to get the statuses "warning" and "ok" (an "ok" should only be picked up when it follows a "warning"). Does anyone have an idea how to find the "ok" that follows the "warning" status? Thanks in advance!
    text example
    121; test test; test0; ok; test test
    121; test test; test0; ok; test test
    123; test test; test1; warning; test test
    124; test test; test1; ok; test test
    125; test test; test2; warning; test test
    126; test test; test3; warning; test test
    127; test test; test4; warning; test test
    128; test test; test2; ok; test test
    129; test test; test3; ok; test test
    java code:
    String flag = "warning";
    while ((line = bs.readLine()) != null) {
         String[] tokens = line.split(";");
         for (int i = 1; i < tokens.length; i++) {
              Pattern pattern = Pattern.compile(flag);
              // trim() because each field keeps its leading space after the split
              Matcher matcher = pattern.matcher(tokens[i].trim());
              if (matcher.matches()) {
                   // save into a list
              }
         }
    }

    Sorry, I'll try to explain it in more detail. I want to parse this text file and save the statuses "warning" and "ok" into a list. The catch is that I only need the "ok" lines that follow a "warning" for the same test: if there is a "test1 warning", then I am looking for "test1 ok".
    121; content; test0; ok; 12444      <-- that i don't want to have
    123; content; test1; warning; 126767
    124; content; test1; ok; 1265        <-- that i need to have
    121; content; test9; ok; 12444      <-- that i don't want to have
    125; content; test2; warning; 2376
    126; content; test3; warning; 78787
    128; content; test2; ok; 877666    <-- that i need to have
    129; content; test3; ok; 877666    <-- that i need to have
    // here maybe a regex pattern could deal with my problem
    // if "warning|ok" then list all elements with the status "warning" and "ok"
    String flag = "warning";
    while ((line = bs.readLine()) != null) {
         String[] tokens = line.split(";");
         for (int i = 1; i < tokens.length; i++) {
              Pattern pattern = Pattern.compile(flag);
              // trim() because each field keeps its leading space after the split
              Matcher matcher = pattern.matcher(tokens[i].trim());
              if (matcher.matches()) {
                   // save into a list
              }
         }
    }
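
A regex on its own cannot remember which test already had a warning, so one way to approach the requirement above is to split each line and keep a set of warned test names. The sketch below assumes the layout from the sample (test name in the third field, status in the fourth) and assumes each warning is "used up" by the first ok that follows it. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not from the thread.
final class WarningThenOk {
    // Returns the "ok" lines whose test name had an earlier "warning" line.
    static List<String> okAfterWarning(List<String> lines) {
        Set<String> warned = new HashSet<>();
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            // The regex delimiter swallows the spaces around each ';'
            String[] fields = line.trim().split("\\s*;\\s*");
            if (fields.length < 4) {
                continue; // not a status line
            }
            String name = fields[2];
            String status = fields[3];
            if (status.equals("warning")) {
                warned.add(name);
            } else if (status.equals("ok") && warned.remove(name)) {
                result.add(line); // an ok that follows a warning for the same test
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "121; content; test0; ok; 12444",
                "123; content; test1; warning; 126767",
                "124; content; test1; ok; 1265",
                "121; content; test9; ok; 12444",
                "125; content; test2; warning; 2376",
                "126; content; test3; warning; 78787",
                "128; content; test2; ok; 877666",
                "129; content; test3; ok; 877666");
        for (String line : okAfterWarning(lines)) {
            System.out.println(line); // prints the lines starting with 124, 128 and 129
        }
    }
}
```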

  • Util.regex.Pattern documentation

    The 1.5 documentation for util.regex.Pattern defines quantifiers that are greedy, reluctant, or possessive. The definitions of these quantifiers seem to be the same. For example, X?, X??, and X?+ are each defined as "X, once or not at all." Is this a mistake? If not, what's the difference among greedy, reluctant, and possessive?

    It's not a mistake, it's just incomplete. A normal (greedy) quantifier matches as many times as it can, but will back off if necessary to achieve an overall match. A reluctant quantifier matches the minimum number of times that it has to, and only tries to match more if that's the only way to achieve an overall match. A possessive quantifier matches as many times as it can and never backs off, even if that makes an overall match impossible. Here's a demonstration:
    import java.util.regex.*;
    public class Test
    {
      public static void main(String[] args)
      {
        String input = "XXXXX";
        Pattern p1 = Pattern.compile("(X+)(X+)");   // greedy
        Pattern p2 = Pattern.compile("(X+?)(X+)");  // reluctant
        Pattern p3 = Pattern.compile("(X++)(X+)");  // possessive
        Matcher m = p1.matcher(input);
        if (m.matches())
           System.out.println("p1:\t" + m.group(1) + "\t" + m.group(2));
        m = p2.matcher(input);
        if (m.matches())
           System.out.println("p2:\t" + m.group(1) + "\t" + m.group(2));
        m = p3.matcher(input);
        if (m.matches())
           System.out.println("p3:\t" + m.group(1) + "\t" + m.group(2));
      }
    }
    This prints (p3 produces no output because it never matches):
    p1:     XXXX    X
    p2:     X       XXXX
    In p1, the X+ in the first group initially matches all five X's, then hands off to the second group. The X+ there has to match at least one X, but there are none left. So the first group gives up one of its X's, the second group matches it, and Bob's your uncle.
    In p2, the X+? has to match at least one X, so it does, then hands off to the second group, which happily gobbles up the rest of the input.
    In p3, the X++ matches all the X's, but refuses to back off and give the X+ in the second group the one X it needs, so the match fails.

  • Applying REGEX-pattern into XML File

    I have the following problem:
    I have an XML file, let's say...
    <NODE>
    <NODE1 attr1="a1" attr2="a2">
         <NAME> abc</NAME>
         <VERSION> 1.0</VERSION>
    </NODE1>
    <NODE2 attr1="a3" attr2="a4">
         <NAME> xyz</NAME>
         <VERSION> 3.1</VERSION>
    </NODE2>
    </NODE>
    I need to know how I can get the values of <NAME></NAME> and <VERSION></VERSION> without using DOM.
    Since my XML file is pretty big and DOM will take a lot of memory, I want to avoid it.
    Can anybody suggest a regex pattern that I can apply to the XML file (after converting it into a String)?
    Thanks in advance

    That worked perfectly. I assumed ( insert comment here ) that the members of the Properties objects were Strings, and therefore followed the same rules where "\" characters are concerned.
    Thank you for pointing out the difference between the two objects, I am not sure how long it would have taken me to figure that out.
    Regards,
    John Gooch
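
For the NAME/VERSION question above, a regex can work as long as the file is as regular as the sample: no nested markup, CDATA, or comments inside those two elements. The following is only a sketch under that assumption, with invented class and method names. A streaming parser such as SAX is the more robust way to avoid DOM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the thread.
final class NameVersionScan {
    // Group 1 is the element name, \1 forces the matching closing tag,
    // and group 2 is the text with surrounding whitespace left out.
    private static final Pattern FIELD =
            Pattern.compile("<(NAME|VERSION)>\\s*([^<]*?)\\s*</\\1>");

    static List<String> scan(String xml) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(xml);
        while (m.find()) {
            out.add(m.group(1) + "=" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<NODE><NODE1 attr1=\"a1\" attr2=\"a2\">\n"
                + "     <NAME> abc</NAME>\n"
                + "     <VERSION> 1.0</VERSION>\n"
                + "</NODE1>\n"
                + "<NODE2 attr1=\"a3\" attr2=\"a4\">\n"
                + "     <NAME> xyz</NAME>\n"
                + "     <VERSION> 3.1</VERSION>\n"
                + "</NODE2></NODE>";
        System.out.println(scan(xml)); // prints [NAME=abc, VERSION=1.0, NAME=xyz, VERSION=3.1]
    }
}
```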

  • Regex - Pattern for positive numbers

    Hi,
    I wanna check for positive numbers.
    My code so far:
    Pattern p = Pattern.compile("\\d+");
    Matcher m = p.matcher(str);
    boolean b = m.matches();
    But I don't know how to check for positive numbers (including 0).
    Thanks
    Jonny

    Just to make your life easier:
    package samples;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    /**
     * @author notivago
     */
    public class Positive {
        public static void main(String[] args) {
            String input = "- 12 +10 10 -12 15 -12,000 10,000 5,000.42";
            // The lookbehind rejects numbers preceded by a minus sign, a dot or a comma
            Pattern p = Pattern.compile( "\\b(?<!-\\s?|\\.|,)([0-9]+(?:,?[0-9]{3})*(?:\\.[0-9]*)?)" );
            Matcher matcher = p.matcher( input );
            while( matcher.find() ) {
                System.out.println( "Match: " + matcher.group(1) );
            }
        }
    }

  • Help with regex pattern matching.

    Hi everyone
    I am trying to write a regex that will extract each of the links from a piece of HTML code. The sample piece of HTML is as follows:
    <td class="content" valign="top">
         <!-- BODY CONTENT -->
    <script language="JavaScript"
    src="http://chat.livechatinc.net/licence/1023687/script.cgi?lang=en&groups=0"></script>
    <a href="makeReservation.html">Making a reservation</a><br/>
    <a href="changeAccount.html">Changing my account</a><br/>
    <a href="viewBooking.html">Viewing my bookings</a><br/>
    I am interested in extracting each link and the corresponding text for that link into groups.
    So far I have the following regex:
    <td class="content" valign="top">.*?<a href="(.*?)">(.*?)</a><br>
    However, this regex only matches the first line in the block of links, but I need to match each line in the block of links.
    Any ideas? Any suggestions are appreciated as always.
    Thanks.

    Hi sabre,
    thanks for the reply.
    I am already using a while loop with matcher.find(), but it still only returns the first link based on my regex.
    the code is as follows.
    private static final Pattern MENU_ITEM_PATTERN = compilePattern("<td class=\"content\" valign=\"top\">.*?<a href=\"(.*?)\">(.*?)</a><br>");
    private LinkedHashMap<String,String> findHelpLinks(String body) {
        LinkedHashMap<String, String> helpLinks = new LinkedHashMap<String,String>();
        String link;
        String linkText;
        Matcher matcher = MENU_ITEM_PATTERN.matcher(body);
        while(matcher.find()){
            link = matcher.group(1);
            linkText = matcher.group(2);
            if(link != null && linkText != null){
                helpLinks.put(link,linkText);
            }
        }
        return helpLinks;
    }
    private static Pattern compilePattern(String pattern) {
        // Flags are bit masks, so they are combined with | rather than +
        return Pattern.compile(pattern, Pattern.DOTALL | Pattern.MULTILINE
            | Pattern.CASE_INSENSITIVE);
    }
    Any ideas?
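
One likely reason the loop above finds only one link: the pattern starts with the <td ...> prefix, and that prefix occurs only once in the page, so find() can succeed at most once. The pattern also expects <br> while the sample shows <br/>. A minimal sketch that matches each anchor on its own is below. The class and method names are invented, and it assumes href is the first attribute of each link, as in the sample.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the thread.
final class HelpLinks {
    // One match per <a ...>...</a>; no <td> prefix and no trailing <br> required.
    private static final Pattern LINK = Pattern.compile(
            "<a\\s+href=\"([^\"]*)\"[^>]*>(.*?)</a>",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Maps each href to its link text, in document order.
    static Map<String, String> find(String body) {
        Map<String, String> links = new LinkedHashMap<>();
        Matcher m = LINK.matcher(body);
        while (m.find()) {
            links.put(m.group(1), m.group(2));
        }
        return links;
    }

    public static void main(String[] args) {
        String body = "<td class=\"content\" valign=\"top\">\n"
                + "<a href=\"makeReservation.html\">Making a reservation</a><br/>\n"
                + "<a href=\"changeAccount.html\">Changing my account</a><br/>\n"
                + "<a href=\"viewBooking.html\">Viewing my bookings</a><br/>";
        System.out.println(find(body).size()); // prints 3
    }
}
```

If only the links inside the content cell matter, one option is to locate that cell first and run the link pattern on just that substring.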

  • Java Regex Question extract Substring

    Hello
    I've read the regex course at http://www.regenechsen.de/regex_de/regex_1_de.html but the regex rules described in the course and their behavior in the "real world" don't make sense. For example, in the whole string: <INPUT TYPE="Text" name="Input_Vorname">
    the matcher should extract only the field name, i.e. "Input_Vorname". I tried a lot of patterns, such as:
    "name="(.*?)\"";
    "<.*name=\"(.*)\".?>";
    "<.*?name=\"(w+)\".*>";
    "name=\".*\"";
    and so on. But nothing (NOTHING) works. Either it finds everything or nothing. What's wrong?
    Can somebody explain what I did wrong and where my train of thought went astray?
    Roland

    When you use the matches() method, the regex has to match the entire input, but if you use find(), the Matcher will search for a substring that matches the regex. Here's how you would use it:
      String nameRegex = "name=\"(.*?)\"";
      Pattern namePattern = Pattern.compile(nameRegex,Pattern.CASE_INSENSITIVE);
      Matcher nameMatcher = namePattern.matcher(token);
      if (nameMatcher.find()) {
        String fieldName = nameMatcher.group(1);
      }
    But the main issue is that you're using the wrong method(s) to retrieve the name. The start() and end() methods return the start and end positions of the entire match, but you're only interested in whatever was matched by the subexpression inside the parentheses (round brackets). That's called a capturing group, and groups are numbered according to the order in which they appear, so you should be using start(1) and end(1) instead of start() and end(). Or you can just use group(1), as I've done here, which returns the same thing as your substring() call.
    Knowing that, you could go ahead and use matches(), with an appropriate regex:
      String nameRegex = "<.*?name=\"(\\w+)\".*?>";
      Pattern namePattern = Pattern.compile(nameRegex,Pattern.CASE_INSENSITIVE);
      Matcher nameMatcher = namePattern.matcher(token);
      if (nameMatcher.matches()) {
        String fieldName = nameMatcher.group(1);
      }

  • Regex Pattern help.

    My friend pedrofire, who's probably around the forums somewhere, and I are newbies. We are trying to read a log file line and process it correctly, but we are having some difficulty creating the right expression pattern.
    My log has lines like:
    User 'INEXIST' with session 'ax1zjd8yEeHh' added content '769' with uri 'http://mail.yahoo.com/'.
    User 'INEXIST' with session 'ax1zjd8yEeHh' changed folder from 'E-mails' to 'Milhagem'.
    User 'INEXIST' with session 'a8jXrY_N38ja' updated all content of folder 'Bancos'.
    i need to get the following data
    USER - [INEXIST]
    SESSION - [ax1zjd8yEeHh]
    ACTION - [added] or [changed] or [updated].
    Getting the user and the session is easy, but I am having difficulty grabbing the action, because I need to take just the action word without blank spaces, ignoring the words "content", "folder" or "all".
    I've been trying this for hours, but for a newbie it is a little difficult.
    Any help is welcome
    Thanks
    Peter Redman

    Hi,
    How about something like:
    import java.util.regex.*;
    public class RegexpTest
    {
       // Groups: 1 = user, 2 = session, 3 = first word after the session (the action)
       private static final Pattern p = Pattern.compile(
             "^User '([^']+)' with session '([^']+)' ([^ ]+) .*$" );
       public static void main( String[] argv )
       {
          find( "User 'INEXIST' with session 'ax1zjd8yEeHh' added content '769' with uri 'http://mail.yahoo.com/'." );
          find( "User 'INEXIST' with session 'ax1zjd8yEeHh' changed folder from 'E-mails' to 'Milhagem'." );
          find( "User 'INEXIST' with session 'a8jXrY_N38ja' updated all content of folder 'Bancos'." );
       }
       public static void find( String text )
       {
          System.out.println( "Text: " + text );
          Matcher m = p.matcher( text );
          if ( ! m.matches() ) return;
          String user = m.group(1);
          String session = m.group(2);
          String action = m.group(3);
          System.out.println( "User: " + user );
          System.out.println( "Session: " + session );
          System.out.println( "Action: " + action );
       }
    }
    which results in:
    Text: User 'INEXIST' with session 'ax1zjd8yEeHh' added content '769' with uri 'http://mail.yahoo.com/'.
    User: INEXIST
    Session: ax1zjd8yEeHh
    Action: added
    Text: User 'INEXIST' with session 'ax1zjd8yEeHh' changed folder from 'E-mails' to 'Milhagem'.
    User: INEXIST
    Session: ax1zjd8yEeHh
    Action: changed
    Text: User 'INEXIST' with session 'a8jXrY_N38ja' updated all content of folder 'Bancos'.
    User: INEXIST
    Session: a8jXrY_N38ja
    Action: updated
    You should probably change the Pattern to be less explicit about what it matches, i.e. change spaces to \\s+ or something similar.
    Ol.
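
As a rough illustration of the "less explicit" suggestion above, here is a sketch that swaps every literal space for \\s+ and uses find() so the rest of the line does not have to be described at all. The class and method names are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the thread.
final class LogLine {
    // \s+ tolerates tabs or repeated spaces; (\S+) is the first word after the session.
    private static final Pattern LINE = Pattern.compile(
            "^User\\s+'([^']+)'\\s+with\\s+session\\s+'([^']+)'\\s+(\\S+)");

    // Returns {user, session, action}, or null when the line has another shape.
    static String[] parse(String text) {
        Matcher m = LINE.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = parse(
                "User 'INEXIST' with session 'a8jXrY_N38ja' updated all content of folder 'Bancos'.");
        System.out.println(parts[0] + " " + parts[1] + " " + parts[2]);
        // prints INEXIST a8jXrY_N38ja updated
    }
}
```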

  • Regex pattern question

    Hi,
    I'm trying to get my feet wet with Java and regular expressions. I've done a lot of it in Perl, but need some help with Java.
    I have an XML file (I'm also working through the SAX tutorial, but this question is related to regex) that has multiple elements, and each element has a title tag:
    <element lev1>10<title>element title</title>
    <element lev2>20<title>another element title</title>
    </element lev2>
    </element lev1>
    If I have the following pattern:
    Pattern Title = Pattern.compile("(?i)<title>([^<]*)</title>");
    that picks up the titles, but I can't distinguish which title belongs to which element. Basically what I want to have is:
    Pattern coreTitle = Pattern.compile("(?i)<element lev1>(**any thing that isn't an </title> tag**)</title>");
    I've looked through the tutorials and will keep looking. I'm sure it's in there somewhere, but if someone could point me in the right direction, that would be great.
    thanks,
    bp

    Just guessing, but maybe...
    Pattern.compile("(?i)<element lev1>*<title>([^<]*)</title>");
    But it seems that things like parsing with SAX (or loading to a DOM) or XPath would be much better suited to parsing out XML than regexp.
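
A note on the guessed pattern above: ">*" means "zero or more > characters", so it will not skip the "10" between the element tag and its title. A sketch that does, using [^<]* for "anything up to the next tag", is below. It assumes the title is the first child tag of the element, as in the sample, and the helper name titleOf is invented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the thread.
final class ElementTitle {
    // Returns the title that directly follows the given opening tag name, or null.
    static String titleOf(String xml, String element) {
        Pattern p = Pattern.compile(
                "(?i)<" + Pattern.quote(element) + ">[^<]*<title>([^<]*)</title>");
        Matcher m = p.matcher(xml);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String xml = "<element lev1>10<title>element title</title>\n"
                + "<element lev2>20<title>another element title</title>\n"
                + "</element lev2>\n"
                + "</element lev1>";
        System.out.println(titleOf(xml, "element lev1")); // prints element title
        System.out.println(titleOf(xml, "element lev2")); // prints another element title
    }
}
```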

  • Java.util.regex.pattern and * - + /

    hi...
    i'm korean... so I can't speak english.. sorry..^^
    but i have a problem..
    import java.util.regex.*;
    public class Operator
    {
         public static void main(String args[])
         {
              String operator="/";
    ////////////////////////////////////////////////////////////// error point..
              Pattern pattern=Pattern.compile(operator);
              Matcher m=pattern.matcher("- ----* / */* /+");
              int count=0;
              while(m.find()) {
                   count++;
              }
              System.out.println(count);
         }
    }
    Exception in thread "main" java.util.regex.PatternSyntaxException: Dangling meta character '+' near index 0
    +
    operator : / - : ok...
    operator : * + : error...
    i had to use + *..
    what's problem??

    Are you using matches()? Then keep in mind that it requires that the entire String is matched by the RE.
    pattern.matcher("about:foobar").matches(); //will return false, as "foobar" is not matched by your pattern
    pattern.matcher("about:").matches(); //will return true
    pattern.matcher("about:foobar").find(); //will return true
    pattern.matcher("notabout:foobar").find(); // will return false
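
On the original "+" and "*" question: both are quantifier metacharacters, so compiling them on their own throws the "Dangling meta character" exception shown above. Pattern.quote() (or a manual escape such as "\\+") makes the operator a literal. The sketch below counts each operator in the string from the question. The class name is made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the thread.
final class OperatorCount {
    // Pattern.quote turns the operator into a literal, so "+" and "*" are safe.
    static int count(String operator, String text) {
        Matcher m = Pattern.compile(Pattern.quote(operator)).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "- ----* / */* /+";
        System.out.println(count("+", text)); // prints 1
        System.out.println(count("*", text)); // prints 3
        System.out.println(count("/", text)); // prints 3
        System.out.println(count("-", text)); // prints 5
    }
}
```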

  • Regex Pattern Match in an extremely long string

    I need to search a file containing one extremely long line (approximately 1 million characters). The pattern I want to search for is "ABC" repeated at least n times in a row, where n is supplied by the user. I need the position where this pattern is found. How is this best done? I tried to break the input into blocks of 100000 characters at a time, as reading too many characters causes the 'java out of memory' error.
    Then I converted each block to a String in order to use regex to search. My problem is how to ensure that the last few characters of the current block are also searched. How should the regex be written to do this? Will breaking the input file into multiple lines help?
    eg:
    Searching for ABC as long as it appears at least 3 times consecutively, i.e. (ABCABCABC)
    Original Line = XXXXXXXABCABCABCXXXXXXABCX
    The first block of 13 characters read is XXXXXXXABCABC
    The second block of 13 characters read is ABCXXXXXXABCX
    The search result should be position 7 and position 22

    If the sequence of characters is longer than a few hundred KB, then turning it into a String requires you to have enough heap space available in the JVM to store the entire String.
    If that is a problem, an alternative solution is to have a while loop over an InputStream that reads from the source of characters (a file, a network connection, stdin etc.) and looks for the string. Keep a ring buffer the size of the query string, and read the data from the InputStream into it. Then for each character read, compare the content of the ring buffer to the query string.
    This way you will not use more heap space than the size of the query-string, and the size of whatever buffer you use in your InputStream (8KB for the empty constructor of BufferedInputStream at the moment) plus the odds'n ends from the implementation.
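
A variation on the streaming idea above, as a sketch only: instead of a ring buffer it keeps a single counter of how many characters of the repeated unit have matched so far, so block boundaries never matter. It assumes the first character of the unit does not occur again inside the unit (true for "ABC", false for something like "AAB"). All names are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not from the thread.
final class RepeatScanner {
    // Returns the start offset of every run of at least minRepeats consecutive units.
    // Assumption: unit.charAt(0) appears nowhere else in unit.
    static List<Long> findRuns(Reader in, String unit, int minRepeats) throws IOException {
        List<Long> starts = new ArrayList<>();
        long pos = 0;       // offset of the character being examined
        long runStart = -1; // offset where the current run began
        long matched = 0;   // characters of the current run matched so far
        int c;
        while ((c = in.read()) != -1) { // wrap file readers in a BufferedReader
            if (c == unit.charAt((int) (matched % unit.length()))) {
                if (matched == 0) {
                    runStart = pos;
                }
                matched++;
            } else {
                if (matched / unit.length() >= minRepeats) {
                    starts.add(runStart);
                }
                // The mismatching character may itself begin a new run.
                matched = (c == unit.charAt(0)) ? 1 : 0;
                runStart = pos;
            }
            pos++;
        }
        if (matched / unit.length() >= minRepeats) {
            starts.add(runStart); // run that reaches the end of the input
        }
        return starts;
    }

    // Convenience overload for in-memory text.
    static List<Long> findRuns(String text, String unit, int minRepeats) {
        try {
            return findRuns(new StringReader(text), unit, minRepeats);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String sample = "XXXXXXXABCABCABCXXXXXXABCX";
        System.out.println(findRuns(sample, "ABC", 3)); // prints [7]
        System.out.println(findRuns(sample, "ABC", 1)); // prints [7, 22]
    }
}
```

With the sample line, position 22 is only reported when n is 1, because "ABC" occurs just once there.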

  • Regex pattern

    I am using (?!^)\\b which produces all tokens and delimiters from a string.split();
    I would like it to also separate a ');' pair into ')' and ';'
    Any way to modify my expression to do so?

    Always Learning wrote:
    > I need help with regular expression pattern matching please.
    > I would like to separate words, punctuation, parenthesis, and other characters using split()
    > So far I have come up with string.split("(?!^)\\b|;") which separates things nicely and lops off the end of line.
    You can't have tested it very thoroughly since it can't possibly get even close to "separate things nicely".
    > I would like it to ignore spaces and return just the things desired above.
    I really don't understand this.
    > Can you help? Seems hard to find folks with regex experience.
    There are plenty of us here who can help with regex but most, like me, expect a better requirement specification. Please spend some time providing a better specification.
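
One reading of the requirement above (words kept whole, every punctuation character as its own token, spaces dropped) is easier to express with find() than with split(). This is only a guess at the intended behaviour, sketched with an invented class name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the thread.
final class Tokens {
    // Either a run of word characters, or one character that is neither word nor space.
    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\w\\s]");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("print(x);")); // prints [print, (, x, ), ;]
    }
}
```

Because each punctuation character is matched individually, a ");" pair comes out as ")" and ";".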
