[Regular Expressions] Saving a variable number of matches

I'm stuck with the following problem and I don't seem to be able to solve it without lots of ifs and elses.
I've got a program that you can pass patterns to as parameters. The program receives the patterns as a single string.
The string could look like this:
a:i:foo r::bar t:ei:bark
or like this:
a:i:foo
What I'm hinting at is that the string consists of several parts with the same structure. Each structure can be matched and saved with:
([art]:[ei]{0,2}:.*)
Now I want my regular expression to match all the occurrences without first checking the string for something that indicates how many structures it contains. The following does not seem to work:
([art]:[ei]{0,2}:.*)+
So now I'm looking for something that matches one or more occurrences of the structure and saves each one for future use.
I'd be really happy if someone could help me out here.
Last edited by n0stradamus (2012-05-03 20:27:02)

Procyon wrote:
--> echo "a:i:foo r::bar t:ei:bark" | sed 's/\([art]:[ei]\{0,2\}:[^ ]*\)/1/'
1 r::bar t:ei:bark
--> echo "a:i:foo r::bar t:ei:bark" | sed 's/\([art]:[ei]\{0,2\}:[^ ]*\)/1/g'
1 1 1
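sed's g flag simply restarts the search right after the previous match. The same loop can be written in C with the POSIX regcomp()/regexec() functions that glibc provides. This is a sketch, not a library API: find_all and its calling convention are hypothetical, and the BRE interval \{0,2\} from the sed command becomes {0,2} because REG_EXTENDED is used.

```c
#include <regex.h>
#include <stddef.h>

/* Collect the offsets of up to max non-overlapping matches of the POSIX
 * ERE pattern in text, the way sed's g flag or Python's re.findall would.
 * Offsets stored in out[] are relative to the start of text.
 * Returns the number of matches stored, or -1 if the pattern is invalid. */
static int find_all(const char *pattern, const char *text,
                    regmatch_t *out, int max)
{
    regex_t re;
    regmatch_t m;
    size_t pos = 0;              /* where the next search starts */
    int n = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* REG_NOTBOL: after the first call, text + pos is not the start of a line. */
    while (n < max &&
           regexec(&re, text + pos, 1, &m, pos == 0 ? 0 : REG_NOTBOL) == 0) {
        out[n].rm_so = (regoff_t)pos + m.rm_so;
        out[n].rm_eo = (regoff_t)pos + m.rm_eo;
        n++;
        if (m.rm_eo == m.rm_so) {
            /* Empty match: step over one character so the loop makes progress. */
            if (text[pos + (size_t)m.rm_eo] == '\0')
                break;
            pos += (size_t)m.rm_eo + 1;
        } else {
            pos += (size_t)m.rm_eo;  /* resume right after this match */
        }
    }
    regfree(&re);
    return n;
}
```

For example, find_all("[art]:[ei]{0,2}:[^ ]*", "a:i:foo r::bar t:ei:bark", ms, 8) stores the spans [0,7), [8,14) and [15,24), that is, a:i:foo, r::bar and t:ei:bark.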
If [^ ]* is not usable (because values may contain spaces), you need a non-greedy .* and a non-consuming lookahead for " [art]:".
In Python's re module, this is .*?(?=( [art]:|$))
>>> import re
>>> m=re.findall("([art]:[ei]{0,2}:.*?(?=( [art]:|$)))","a:i:foo r::bar t:ei:bark")
>>> print(m)
[('a:i:foo', ' r:'), ('r::bar', ' t:'), ('t:ei:bark', '')]
Exactly what I was looking for! I didn't know that you could specify .* to stop at a certain sequence of characters.
Could you please point me to some materials where I can read up on the topic?
Back to the regex: it works fine in Python, but sadly that is not the language I'm using.
The program I need this for is written in C and until now the regex functions from glibc worked fine for me.
Have I missed a function similar to re.findall in glibc?
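As far as I know, glibc has no findall-style function: regexec() reports only the leftmost match (plus its subexpressions), so you call it in a loop. POSIX EREs also lack the lazy .*? and the (?=...) lookahead used in the Python answer. One workaround, sketched below under the assumption that a new item always starts at the beginning of the string or right after a single space, is to match only the [art]:[ei]{0,2}: headers and let each item run until the space in front of the next header. The function name split_items is hypothetical.

```c
#include <regex.h>
#include <string.h>

/* Split text into items of the form "[art]:[ei]{0,2}:value" where a value
 * may itself contain spaces. POSIX EREs have neither lazy quantifiers nor
 * lookahead, so only the item headers are matched; each item then runs up
 * to the space in front of the next header, or to the end of the string.
 * Stores up to max spans in out[] (offsets relative to text) and returns
 * how many were stored, or -1 if the pattern fails to compile. */
static int split_items(const char *text, regmatch_t *out, int max)
{
    regex_t re;
    regmatch_t m;
    size_t len = strlen(text);
    size_t pos = 0;
    int n = 0;

    /* Assumption: a header starts the string or follows a single space. */
    if (regcomp(&re, "(^| )[art]:[ei]{0,2}:", REG_EXTENDED) != 0)
        return -1;

    while (pos < len &&
           regexec(&re, text + pos, 1, &m, pos == 0 ? 0 : REG_NOTBOL) == 0) {
        size_t start = pos + (size_t)m.rm_so;
        if (text[start] == ' ')
            start++;                                  /* skip the separating space */
        if (n > 0)
            out[n - 1].rm_eo = (regoff_t)(start - 1); /* previous item ends at that space */
        if (n == max)
            break;                                    /* out[] is full */
        out[n].rm_so = (regoff_t)start;
        out[n].rm_eo = (regoff_t)len;                 /* provisional: to end of string */
        n++;
        pos += (size_t)m.rm_eo;                       /* continue after this header */
    }
    regfree(&re);
    return n;
}
```

With s = "a:i:foo bar r::baz t:ei:bark", split_items(s, it, 8) returns 3 spans covering "a:i:foo bar", "r::baz" and "t:ei:bark"; each can be printed with printf("%.*s\n", (int)(it[i].rm_eo - it[i].rm_so), s + it[i].rm_so). Like the lookahead version, it will split a value that itself contains something like " r:".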

Similar Messages

  • Regular expression result into variable??

    I'm dealing with log files here. I use regular expressions to detect the values. How can I save these values into variables, for example 25-07-2005 into "date"?
    This is what my output statement looks like:
    myOutput1.print(matcher1.group(1));
    matcher1.group() is how I get the value matched by the regular expression.

    Thanks. Actually, I have already matched it in groups and that works fine. The problem is that I'm dealing with log files: what I'm trying to do is reformat the collected log files, for example changing "8 Nov" to 8-11-2005.
    When I match it into a group, it is just finding the pattern, not saving it into a variable. If it were saved in a variable, it would be easier.
    How can I change "8 Nov" to 8-11-2005? Can anyone help me with some simple code?
    Thanks
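    The question is about Java's Matcher, but the idea carries over to any engine: capture the day and the month name in separate groups, then build the new string from the captured text. Below is a minimal C sketch using glibc's POSIX regex functions. reformat_date is a hypothetical helper; the year is passed in because "8 Nov" does not contain one, and "%02d" instead of "%d" would give zero-padded output such as 25-07-2005.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Find the first "D Mon" date in line (e.g. "8 Nov") and write it to out
 * as "D-M-YEAR". The year is a parameter because "8 Nov" has none.
 * Returns 0 on success, -1 if no date was found or the regex failed. */
static int reformat_date(const char *line, int year, char *out, size_t outsz)
{
    static const char *const months[12] = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"};
    regex_t re;
    regmatch_t g[3];            /* g[0]: whole match, g[1]: day, g[2]: month */
    int rc = -1;

    if (regcomp(&re, "([0-9]{1,2}) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)",
                REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, line, 3, g, 0) == 0) {
        int day = 0;
        for (regoff_t i = g[1].rm_so; i < g[1].rm_eo; i++)
            day = day * 10 + (line[i] - '0');      /* captured digits -> int */
        for (int m = 0; m < 12; m++) {
            if (strncmp(line + g[2].rm_so, months[m], 3) == 0) {
                snprintf(out, outsz, "%d-%d-%d", day, m + 1, year);
                rc = 0;
                break;
            }
        }
    }
    regfree(&re);
    return rc;
}
```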

  • Regular Expressions and String variables

    Hi,
    I am attempting to implement a system for searching text files for regular expression matches (similar to something like TextPad, etc.).
    Looking at the regular expression API, it appears that you can only match using string variables. I just wanted to make sure this is true. Some of these files might be large and I feel uneasy about loading them into ginormous Strings. Is this the only way to do it? Can I make a String as big as I want?
    Thanks,
    -Mike

    Newlines are only a problem if you're reading the text line-by-line and applying the regexp to each line. It wouldn't catch expressions that span lines.
    @sabre150: your note re: CharSequence -- so what you're suggesting is to implement a CharSequence that wraps the file contents, and then use the regexps on the whole thing? I like the idea, but it seems like it would only be easy to implement if the file uses a fixed-width character set. Or am I missing something...?
    You are correct for the most basic implementation. It is very easy to create a CharSequence for fixed-width character sets using RandomAccessFile. Once you go to character sets such as UTF-8, more effort is required.
    As long as the regex is moving forward through the CharSequence one char at a time, there is no problem, because one can wrap a Reader; but once it backtracks, one needs random access and therefore a buffer. I have used a ring buffer for this, which seems to work OK, but of course this will not allow the regex to move to any point in the CharSequence.
    'uncle_alice' is the regex king round here, so listen to him.
    :-( I should read further ahead next time!
    Message was edited by:
    sabre150

  • Regular Expression Character Sets with Pattern and Matcher

    Hi,
    I am a little bit confused about a regular expression I am writing; it works in other languages but not in Java.
    The regular expression is meant to match LaTeX commands from a file, and is as follows:
    \\begin{command}([.|\n\r\s]*)\\end{command}
    This does not work in Java but does in PHP, C, etc.
    The part that is strange is the . character. If placed as .* it works, but if placed as [.]* it doesn't. Does this mean that . cannot be placed in a character range in Java?
    Any help very much appreciated.
    Kind Regards
    Paul Bain

    In PHP it seems that the "." still works as an any-character operator inside character classes.
    The regular expression posted did not work, but it does if I do:
    \\begin{command}((.|[\n\r\s])*)?\\end{command}
    Basically what I'm trying to match is a block of LaTeX, so the \\begin{command} and \\end{command} are LaTeX, not regex, although the \\ is a single backslash in LaTeX. I basically want to match any block which starts with the begin command and ends with the end command, so the part of the regular expression that really counts is the bit in the middle, ((.|[\n\r\s])*)?
    Am I right in saying that the "?" will prevent the engine from matching from the first \\begin to the last \\end in the following example:
    \\begin{command}
    some stuff
    \\end{command}
    \\begin{command}
    some stuff
    \\end{command}
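    Java, like POSIX, treats . inside a bracket expression as a literal dot, which is why [.]* behaves differently from .*. Two more details matter for the block question: the trailing ? in ((.|[\n\r\s])*)? only makes the whole group optional (Java's reluctant form is *?), and POSIX matching is leftmost-longest, so a greedy body runs from the first \begin to the last \end. The sketch below checks these points with glibc's regcomp()/regexec(); ere_search is a hypothetical helper, and the braces are written as [{] and [}] to keep them literal.

```c
#include <regex.h>
#include <stddef.h>

/* Search text for the POSIX ERE pattern. Returns 1 if found, 0 if not,
 * -1 if the pattern does not compile. If span is not NULL it receives
 * the offsets of the whole match. */
static int ere_search(const char *pattern, const char *text, regmatch_t *span)
{
    regex_t re;
    regmatch_t m;
    int found;

    /* No REG_NEWLINE: newline is an ordinary character, so . matches it. */
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    found = regexec(&re, text, 1, &m, 0) == 0;
    if (found && span != NULL)
        *span = m;
    regfree(&re);
    return found;
}
```

    With two \begin{command} ... \end{command} blocks in one string, the pattern \\begin[{]command[}](.*)\\end[{]command[}] (written with doubled backslashes in C source) matches from the start of the first block to the end of the second.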

  • Cannot get regular expression to return true in String.matches()

    Hi,
    My String that I'm attempting to match a regular expression against is: value=='ORIG')
    My regular expression is: value=='ORIG'\\) The double backslash is included to escape ')', which is a regular expression special character.
    However, when I call the String.matches() method for this regular expression it returns false. Where am I going wrong?
    Thanks.

    The string doesn't contain what you think it contains, or you made a mistake in your implementation.
    public class Bar {
       public static void main(final String... args) {
          final String s = "value=='ORIG')";
          // matches() succeeds only if the whole string matches the regex
          System.out.println(s.matches("value=='ORIG'\\)")); // Prints "true"
       }
    }
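    String.matches() requires the entire string to match. glibc's regexec(), by contrast, accepts a match anywhere in the string, so whole-string matching needs explicit anchors. The escaping works the same way as in Java: the regex \) is written "\\)" in a C string literal. A sketch; full_match is a hypothetical helper.

```c
#include <regex.h>
#include <stdio.h>

/* Emulate Java's String.matches(): succeed only if the POSIX ERE matches
 * the whole of text. The pattern is wrapped as ^(...)$ before compiling.
 * Returns 1 on a full match, 0 otherwise, -1 on error. */
static int full_match(const char *ere, const char *text)
{
    char anchored[256];
    regex_t re;
    int len, ok;

    len = snprintf(anchored, sizeof anchored, "^(%s)$", ere);
    if (len < 0 || len >= (int)sizeof anchored)
        return -1;                    /* pattern too long for the buffer */
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    ok = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```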

  • SQL Injection and Java Regular Expression: How to match words?

    Dear friends,
    I am handling SQL injection attacks against our application with Java regular expressions. I use them to detect whether malicious characters or keywords have been injected into a parameter value.
    The denied characters and keywords include " ' ", " ; ", "insert", "delete" and so on. The expression I wrote is String pattern_str="('|;|insert|delete)+".
    I know it is not correct: it cannot be used to match only the whole words insert or delete. Each character in the two words can be matched, and that is not what I want. Do you have any idea how to match only the whole word?
    Thanks,
    Ricky
    Edited by: Ricky Ru on 28/04/2011 02:29

    Avoid dynamic SQL and string concatenation, use bind variables, and the risk is negligible.
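    Bind variables are the real fix, as the reply says. For the narrower regex question (matching insert or delete only as whole words), Java has the \b word-boundary assertion, and glibc's regcomp() accepts the GNU extensions \< and \> (start and end of a word), which are not part of POSIX. A C sketch; contains_keyword is hypothetical and is not a sufficient injection filter (note that it also flags an ordinary apostrophe, as in O'Brien).

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if text contains the whole word insert or delete (any case),
 * or a ' or ; character; 0 if not; -1 if the pattern fails to compile.
 * \< and \> are GNU regex extensions marking the start and end of a word. */
static int contains_keyword(const char *text)
{
    regex_t re;
    int found;

    if (regcomp(&re, "\\<(insert|delete)\\>|[';]",
                REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
        return -1;
    found = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return found;
}
```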

  • "Match Regular Expression" and "Match Pattern" vi's behave differently

    Hi,
    I have a simple string matching need and by experimenting found that the "Match Regular Expression" and "Match Pattern" vi's behave somewhat differently. I'd assume that the regular expression inputs on both would behave the same. A difference I've discovered is that the "|" character (the "vertical bar" character, commonly used as an "or" operator) is recognized as such in the Match Regular Expression vi, but not in the Match Pattern vi (where it is taken literally). Furthermore, I cannot find any documentation in Help (on-line or in LabVIEW) about the "|" character usage in regular expressions. Is this documented anywhere?
    For example, suppose I want to match any of the following 4 words: "The" or "quick" or "brown" or "fox". The regular expression "The|quick|brown|fox" (without the quotes) works for the Match Regular Expression vi but not the Match Pattern vi. Below is a picture of the block diagram and the front panel results:
    The Help says that the Match Regular Expression vi performs somewhat slower than the Match Pattern vi, so I started with the latter. But since it doesn't work for me, I'll use the former. But does anyone have any idea of the speed difference? I'd assume it is negligible in such a simple example.
    Thanks!

    Yep.
    You hit a point that's frustrated me a time or two as well (and incidentally, caused some hair-pulling that I can ill afford).
    The hint is in the help file.
    For Match Regular Expression: "The Match Regular Expression function gives you more options for matching strings but performs more slowly than the Match Pattern function. ... Use regular expressions in this function to refine searches. ..."
    Characters to find: Regular expression
    VOLTS: VOLTS
    A plus sign or a minus sign: [+-]
    A sequence of one or more digits: [0-9]+
    Zero or more spaces: \s* or  * (that is, a space followed by an asterisk)
    One or more spaces, tabs, new lines, or carriage returns: [\t \r \n \s]+
    One or more characters other than digits: [^0-9]+
    The word Level only if it appears at the beginning of the string: ^Level
    The word Volts only if it appears at the end of the string: Volts$
    The longest string within parentheses: \(.*\)
    The first string within parentheses but not containing any parentheses within it: \([^()]*\)
    A left bracket: \[
    A right bracket: \]
    cat, cag, cot, cog, dat, dag, dot, and dog: [cd][ao][tg]
    cat or dog: cat|dog
    dog, cat dog, cat cat dog, cat cat cat dog, and so on: ((cat )*dog)
    One or more of the letter a followed by a space and the same number of the letter a, that is, a a, aa aa, aaa aaa, and so on: (a+) \1
    For Match Pattern: "This function is similar to the Search and Replace Pattern VI. The Match Pattern function gives you fewer options for matching strings but performs more quickly than the Match Regular Expression function. For example, the Match Pattern function does not support the parenthesis or vertical bar (|) characters."
    Characters to find: Regular expression
    VOLTS: VOLTS
    All uppercase and lowercase versions of volts, that is, VOLTS, Volts, volts, and so on: [Vv][Oo][Ll][Tt][Ss]
    A space, a plus sign, or a minus sign: [ +-]
    A sequence of one or more digits: [0-9]+
    Zero or more spaces: \s* or  * (that is, a space followed by an asterisk)
    One or more spaces, tabs, new lines, or carriage returns: [\t \r \n \s]+
    One or more characters other than digits: [~0-9]+
    The word Level only if it begins at the offset position in the string: ^Level
    The word Volts only if it appears at the end of the string: Volts$
    The longest string within parentheses: (.*)
    The longest string within parentheses but not containing any parentheses within it: ([~()]*)
    A left bracket: \[
    A right bracket: \]
    cat, dog, cot, dot, cog, and so on: [cd][ao][tg]
    Frustrating, but still manageable.
    Jeff
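    A loose C analogy (not LabVIEW itself): POSIX basic regular expressions also treat | as a literal character, while extended regular expressions treat it as alternation, much like the difference between Match Pattern and Match Regular Expression described above. A sketch with glibc's regcomp(); matches_with_flags is a hypothetical helper.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if pattern matches somewhere in text, 0 if not, -1 on a
 * compile error. Pass REG_EXTENDED in cflags for an ERE, 0 for a BRE. */
static int matches_with_flags(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    int found;

    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    found = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return found;
}
```

    With REG_EXTENDED, "The|quick|brown|fox" finds "fox" in "a fox"; as a BRE, the same pattern only matches the literal text The|quick|brown|fox.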

  • Matches from regular expression into collection

    Hello,
    I need to do the following:
    I have a long string with some similar repeated data. Using a regular expression, I would like to extract all matches into a collection. Is there a way to perform this task?
    I have looked through the owa_pattern package, but as far as I can tell, I can extract only a single match. Here is an exact quote:
    "If multiple overlapping strings can match the regular expression, this function takes the longest match. " - http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28419/w_patt.htm
    So what can I do if I want to get all the matches?
    Thank you in anticipation. Any help would be appreciated.
    Best regards,
    beroetz

    I think you need a tokenizer function.
    If the string :in_str is delimited by :in_delimiter, you could try this:
    SELECT REGEXP_REPLACE(REGEXP_SUBSTR( :in_str || :in_delimiter, '(.*?)' || :in_delimiter, 1, LEVEL ), :in_delimiter, '') TOKEN
    BULK COLLECT INTO :my_nested_table
    FROM DUAL
    CONNECT BY REGEXP_INSTR( :in_str || :in_delimiter, '(.*?)' || :in_delimiter, 1, LEVEL ) > 0
    ORDER BY LEVEL ASC;
    Some time ago I wrote a string-to-text-array tokenizer (and its counterpart) that can cut at given positions within the string using regular expressions and return the elements in a nested table of VARCHAR2. It looks like this:
    TYPE pos_arraytype IS TABLE OF POSITIVE ;
    TYPE text_arraytype IS TABLE OF VARCHAR2(2000);
    FUNCTION stringToTextarray(in_str IN VARCHAR2, in_pos_arr IN pos_arraytype, in_regexp_arr IN text_arraytype DEFAULT NULL, in_trim_strings IN BOOLEAN DEFAULT TRUE)
    RETURN text_arraytype ;
    in_str is the string to be tokenized
    in_pos_arr is a table of positive values of positions in the string to be cut
    in_regexp_arr is a table of regular expressions to use at each position declared by in_pos_arr
    in_trim_strings is a flag indicating whether each cut element should be trimmed
    Using the above, for example:
    in_str = 'Markus van Muster 347651234XY Musterdaam ABCDE'
    in_pos_arr = (1, 13, 35, 35, 42)
    in_regexp_arr = ('(.?){12}', '([^[:digit:]]?){22}', '[[:digit:]]{4}', '[[:alpha:]]{2}', '(.?){14}')
    in_trim_strings = TRUE
    RETURN collection ('Markus','van Muster','1234','XY','Musterdaam')
    If you need the code, then tell me! I'm looking for....
    Cheers,
    Martin
    Edited by: Nuerni on 17.10.2008 08:49

  • Reprasenting regular expression in XML

    Hi,
    I have the following regular expression to represent a 13-digit number enclosed in a pair of brackets:
    String r= "\\([0-9]{13,}+\\)"
    However, when I represent it in an XML tag as
    <my>
    <rs>\\([0-9]{13,}+\\)</rs>
    </my>
    and read it using a DOM parser, the sequence does not match at all.
    Can anybody please tell me how to represent String r in XML?
    Thx

    If the content in the XML file is
    \\([0-9]{13,}+\\)
    to represent this in a Java String, you need to code
    String target = "\\\\([0-9]{13,}+\\\\)";
    The backslash (\) is an escape prefix in Java. So to get a single backslash, you need to specify 2 of them. Since you want two backslashes in the string, you need to specify 4 of them.
    Dave Patterson
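    The escaping levels are easy to mix up. Doubling backslashes is a rule of source-code string literals (Java's and C's alike); XML gives the backslash no special meaning, so text read at runtime, from XML or anywhere else, reaches the regex engine unchanged. The C sketch below shows what happens when an extra level of doubling reaches the engine: \\ becomes a literal backslash and the parentheses turn into a group. ere_found is a hypothetical helper, and the possessive + from the original pattern is dropped because possessive quantifiers are a Java feature that POSIX EREs do not have.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if the POSIX ERE pattern matches somewhere in text, 0 if not,
 * -1 if it fails to compile. */
static int ere_found(const char *pattern, const char *text)
{
    regex_t re;
    int found;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    found = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return found;
}
```

    The C literal "\\([0-9]{13,}\\)" hands the engine \([0-9]{13,}\), which matches "(1234567890123)". The literal "\\\\([0-9]{13,}\\\\)" hands it \\([0-9]{13,}\\), which instead requires literal backslashes around the digits.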

  • Regular expressions and back references

    Just wanted to know if anyone else noticed that.
    In the javadoc of java.util.regex.Pattern, the "Back references" section says that you use \n to refer to a capturing group, but it does not work. To refer to a capturing group one needs to use a "$" sign, which is not standard for this type of operation.
    For example, the following code should work according to the API and most other regular expression engines:
    Pattern.compile("([A-Z])").matcher("ThisIsATestString").replaceAll(" \1");
    But to make this work you need to use:
    Pattern.compile("([A-Z])").matcher("ThisIsATestString").replaceAll(" $1");
    So, is this just a doc bug or am I missing something?
    Does anyone have any idea why Sun chose to use the "$" sign instead of the usual "\" sign?
    TIA,
    Shaul

    The doc you're referring to is talking about using back-references within the regex, not in the replacement string. For instance, if you wanted to find all instances of things like "foo-foo" or "bar-bar", you would use a Pattern like
    Pattern p = Pattern.compile("([a-z]+)-\\1");
    For the most part, they've made the syntax the same as Perl's regexes, and that's why they use $n instead of \n in the replacement string. The replacement string is described in the Matcher javadoc.
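    The same distinction exists in C: POSIX regex supports \1 inside a pattern, while regexec() has no replacement feature at all, so a replacement string would be assembled by hand from the regmatch_t offsets. A sketch of the in-pattern back-reference; is_doubled_word is hypothetical, and it uses a basic RE because POSIX specifies back-references for BREs.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if text is exactly "word-word" with the same lowercase word
 * twice, 0 if not, -1 on a compile error. In a BRE, \( \) make a group
 * and \1 refers back to whatever that group matched. */
static int is_doubled_word(const char *text)
{
    regex_t re;
    int ok;

    /* [a-z][a-z]* means "one or more letters" (BRE has no portable +). */
    if (regcomp(&re, "^\\([a-z][a-z]*\\)-\\1$", 0) != 0)
        return -1;
    ok = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```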

  • Search for a regular expression in a 1D array?

    I've got a 1D array of strings, and I want to find the row with "Date: yy/mm/dd" in it. Since yy/mm/dd will not necessarily be the same between runs, I can't search for a fixed "Date: yy/mm/dd" string.
    I tried using Search 1D Array with "Date: " and "Date: *" as my element input, but it didn't find either of them.
    I don't know where in vi.lib the function is; otherwise I'd attempt to modify it to take regular expressions. My off-the-cuff attempt (looping through the array and using Match Regular Expression) had some odd errors and still didn't find the row.
    Thanks!

    What do you define as a "row"? Is each row a single array element? Since your array elements are strings, each array element itself could be a multiline|multirow string itself that might need to be analyzed one row at a time.
    To look for patterns:
    If you have LabVIEW 8.0 or higher, you can use "Match regular expression". Else you can use "Match Pattern". It really depends how specific you need to be. Are there lines that start with "Date:" but don't contain a formatted date afterwards?
    To search for array elements starting with simple string "Date:", use "match first string".
    LabVIEW Champion. Do more with less code and in less time.
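    As a C analogue of the loop the poster describes (not LabVIEW code): compile the pattern once, then test each element until one matches. find_date_row is hypothetical, and it assumes each array element is one row and the date uses two-digit yy/mm/dd fields.

```c
#include <regex.h>
#include <stddef.h>

/* Return the index of the first row containing "Date: yy/mm/dd",
 * or -1 if no row matches (or the pattern fails to compile). */
static int find_date_row(const char *const *rows, int nrows)
{
    regex_t re;
    int idx = -1;

    if (regcomp(&re, "Date: [0-9]{2}/[0-9]{2}/[0-9]{2}",
                REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    for (int i = 0; i < nrows; i++) {
        if (regexec(&re, rows[i], 0, NULL, 0) == 0) {
            idx = i;                  /* first matching row wins */
            break;
        }
    }
    regfree(&re);
    return idx;
}
```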

  • Exception while using Regular Expressions

    Hi, I'm using the following regular expression in my code for pattern matching:
    REGEXP_LIKE(LLT_NAME,''[^[:digit:]^[:alpha:]]'||l_temp_token||'[^[:digit:]^[:alpha:]]|^'||l_temp_token||'[^[:digit:]^[:alpha:]]|[^[:digit:]^[:alpha:]]'||l_temp_token||'$|^'||l_temp_token||'$'',''i'')
    here, l_temp_token is the string to be matched.
    The problem is that when length(l_temp_token) is > 100, it throws the following exception:
    ORA-12733 regular expression too long
    But my requirement is that the input string can be up to 250 characters long.
    Does anyone know the maximum size for regular expressions, or whether there is any way to increase it?
    Thanks in Advance

    Could you explain what rule you're trying to verify with this regular expression? Maybe there's an alternative.
    C.

  • Search for a regular expression in TextEdit's Find panel?

    Is it possible to search for a regular expression using TextEdit's Find panel? (This used to be possible in 'Step.)
    Help yields nothing, but perhaps there's some hidden technique?
    Thanks.


  • WebTest - VS2013 - Extract Regular Expression - Required=False

    I have an "Extract Regular Expression" rule that may or may not be able to extract a value. I set its Required property to False. The next block checks for the existence of that Context Parameter: if the Context Parameter is there, it does some work; if it is not there, the condition is skipped.
    When I run my WebTest, it tells me that the extraction rule passed, but the Context Parameter was never created, so the condition is skipped.
    If I set Required=True, everything works as expected.
    I remembered that there was a defect in VS2010 with Required=False, so I added this extraction plug-in:
    using Microsoft.VisualStudio.TestTools.WebTesting;
    using Microsoft.VisualStudio.TestTools.WebTesting.Rules;
    [System.ComponentModel.DisplayName("Extract Regular Expression Not Required")]
    [System.ComponentModel.Description("Extract text matching a regex and add it to the test context. If no match is found, do not fail webtest. This gets around a VSTS bug where 'Required' property of the Extract Regular Expression rule is not honored.")]
    public class ExtractRegularExpressionNotRequired : ExtractRegularExpression
    {
        public override void Extract(object sender, ExtractionEventArgs e)
        {
            base.Extract(sender, e);
            e.Success = true; // never fail the webtest, even when nothing matched
        }
    }
    That too passes the regular expression but fails to create the Context Parameter.

    By setting the Required property to False, I expect that a Context Parameter is not created if the regular expression is not found, and that it is created if the regular expression is found.
    So when I run my test, it tells me that the regular expression was found, but my Context Parameter was not created.
    When I created this webtest in VS2010, I found that there was a bug in Extract Regular Expression when the Required property was False. This thread showed the code for a custom extraction rule to fix the problem. When I run this webtest in VS2013, neither the built-in Extract Regular Expression nor the custom Extract Regular Expression Not Required creates the Context Parameter when the regex is found.
    Try this yourself:
    Create a webtest
    Add "Extract Regular Expression"
    Set the Regular expression to . (this will match any one character)
    Set Required to False
    Set the Context Parameter Name to TEST
    Run the webtest
    Although the result from the extraction rule will be Passed, there is no Context Parameter TEST created. 
    If you create the custom Extract Regular Expression Not Required plugin, the same problem occurs.

  • Regular Expression Escaped Digit "\d" Illegal Escape Character

    Hello,
    I'm trying to write a regular expression to determine whether a String matches a date format defined as YYYYMMDD. For example, March 11, 2009 would be "20090311".
    For the time being I don't care if an invalid month or day is entered. I've attempted both of the following:
    if (date.matches("(19|20)\d{4}")) {
      // warn the user
    }
    and
    if (java.util.regex.Pattern.matches("(19|20)\d{4}", date)) {
      // warn the user
    }
    Both yield Illegal Escape Character compilation errors for the "\d" part of the regular expression.
    http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html#sum
    Says that "\d" is the predefined digit character class. So at this point, I don't know what I'm doing wrong. I realize I could just define the character class myself, and use a pattern like "(19|20)[0-9]{4}", but I would like to know why "\d" isn't being recognized by the compiler.
    Thanks,
    Paul
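    The compiler error comes from the string literal, not the regex: the regex \d has to be written "\\d" in Java source, and in C an unknown escape such as "\d" likewise draws a compiler diagnostic. Separately, POSIX EREs (as used by glibc's regcomp()) do not define \d at all, so [0-9] or [[:digit:]] is the portable choice there. Note also that YYYYMMDD is eight digits, so after the 19 or 20 prefix six more digits are needed, not four. A sketch; looks_like_yyyymmdd is hypothetical and, like the question, ignores invalid months and days.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if s is exactly eight digits starting with 19 or 20 (the shape
 * of YYYYMMDD, month and day unchecked), 0 if not, -1 on a compile error. */
static int looks_like_yyyymmdd(const char *s)
{
    regex_t re;
    int ok;

    /* [0-9] instead of \d: POSIX EREs have no \d shorthand. */
    if (regcomp(&re, "^(19|20)[0-9]{6}$", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    ok = regexec(&re, s, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```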

    paulwooten wrote:
    Can someone give me an explanation of heuristics, as they might apply to SimpleDateFormat? Does this mean that if the format was similar the parser might figure it out? Say, if instead of "yyyyMMdd", it was "yyyyddMM", or "yyMMdd"?
    No. Since all of these are valid formats, there's no way for the parser to distinguish them.
    Or does this have to do with rejecting February 29, and other dates like that?
    That's the one. When setLenient(false) is called, February 29th is only accepted in leap years.
    It will also reject January 57th when lenient is set to false (try parsing that with lenient=true; you'll be surprised).
    I've read some of the Wikipedia article about heuristics, but I'm confused as to how it would apply to this example.
    Don't concentrate too much on the term heuristics. Just remember: lenient=true means that not-really-correct dates will be accepted; lenient=false means stricter checks.
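    C has a close analogue of lenient parsing: mktime() from <time.h> normalises out-of-range fields, so January 57th silently becomes February 26th. A strict check can compare the fields before and after normalisation. This is only an analogy, not how SimpleDateFormat works internally; is_valid_date is a hypothetical helper.

```c
#include <time.h>

/* Strict date check: return 1 if year-month-day is a real calendar date,
 * 0 otherwise. mktime() normalises out-of-range fields (the "lenient"
 * behaviour), so any change to the fields means the input was invalid. */
static int is_valid_date(int year, int month, int day)
{
    struct tm t = {0};

    t.tm_year = year - 1900;   /* tm_year counts from 1900 */
    t.tm_mon = month - 1;      /* tm_mon counts from 0 */
    t.tm_mday = day;
    t.tm_hour = 12;            /* midday keeps DST shifts from changing the date */
    t.tm_isdst = -1;           /* let mktime decide whether DST applies */
    if (mktime(&t) == (time_t)-1)
        return 0;
    return t.tm_year == year - 1900 && t.tm_mon == month - 1 && t.tm_mday == day;
}
```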
