Regular expression to exclude an integer

Does anyone know of a simple regular expression pattern that can be used in an XML schema restriction to exclude a few integers from the entire integer set?
For example, say I want the schema to validate an XML document that has an element called 'playerId' whose value can be any integer BUT 1000. The schema segment for the validation could look like:
<xsd:restriction base="xsd:integer">
<xsd:pattern value="<<pattern string>>"/>
</xsd:restriction>
What <<pattern string>> can I use to validate that the value IS NOT 1000?
I tried a few, such as ((\d*)-(1000)), [^(1000)] and \d*[^(1000)], but none worked. Any help will be greatly appreciated.

Why don't you derive from an integer type instead of a string? I know this seems ridiculously verbose for such a simple restriction, but it should do what you want:
<xsd:attribute name="root">
  <xsd:simpleType>
    <xsd:union>
      <xsd:simpleType>
        <xsd:restriction base="xsd:integer">
          <xsd:maxInclusive value="999"/>
        </xsd:restriction>
      </xsd:simpleType>
      <xsd:simpleType>
        <xsd:restriction base="xsd:integer">
          <xsd:minInclusive value="1001"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:union>
  </xsd:simpleType>
</xsd:attribute>
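
If a pattern facet is still preferred, one option is to enumerate the ways a digit string can differ from 1000, because XML Schema regular expressions have no lookahead and always match the whole value. The sketch below checks such an alternation with Python's re.fullmatch as a stand-in for a schema validator. The name NOT_1000 is made up for illustration, and values with leading zeros (for example 01000) are not handled.

```python
import re

# Hypothetical pattern: every integer literal except 1000 and +1000.
# Only plain groups and character classes are used, so the same string
# should carry over to an xsd:pattern value (an assumption, not tested there).
NOT_1000 = (
    r"-[0-9]+"               # any negative number, including -1000
    r"|\+?("
    r"[0-9]{1,3}"            # fewer than four digits
    r"|[0-9]{5,}"            # more than four digits
    r"|[02-9][0-9]{3}"       # four digits, first digit is not 1
    r"|1[1-9][0-9]{2}"       # starts with 1, second digit is not 0
    r"|10[1-9][0-9]"         # starts with 10, third digit is not 0
    r"|100[1-9]"             # starts with 100, last digit is not 0
    r")"
)


def is_allowed(value):
    """Return True when the whole value matches, as a pattern facet would require."""
    return re.fullmatch(NOT_1000, value) is not None


for sample in ("999", "1000", "1001", "-1000", "+1000"):
    print(sample, is_allowed(sample))
```

The union of two integer ranges shown above stays the more readable choice; the pattern only becomes attractive when the base type has to remain a string.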

Similar Messages

  • Regular Expression Exclude Certain Characters

    I am building this with Apex 3.2
    I have a validation on a page item of type regular expression
    Validation Expression 1 :   P13_TARGET_BMI
    Validation Expression 2:    ^[[:alnum:]-]*$
    This allows all letters, digits and the - symbol
    Now I need to exclude certain alpha characters, ie German letters Ä Ü Ö
    and spaces if possible
    Basically if the user types in Ä, Ü, Ö, the validation should fail
    Cheers
    Gus

    Hi, Gus,
    The most efficient thing would be to list all the acceptable characters instead of using [:alnum:].
    REGEXP_LIKE ( str
                , '^[a-z0-9A-Záéíóúæ... -]+$'
                )
    If you can't do that, then use a separate test to check for the exceptional alphabetic characters
         REGEXP_LIKE ( str
                     , '^[[:alnum:] -]+$'
                     )
    AND  str = TRANSLATE ( str
                         , 'xÄÜÖ'
                         , 'x'
                         )
    You can use regular expressions for the Ä Ü Ö testing, if you want to.
    There's nothing tricky about spaces; by default, they have no special meaning in regular expressions, and they never have any special meaning inside square brackets.
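
The two approaches from the reply can be tried out side by side. This is a rough Python sketch rather than Oracle code: Python's re module has no [:alnum:] class, so str.isalnum() stands in for it, and the function names are invented for the example. Spaces are rejected here, as the question asked.

```python
import re

# Approach 1: list every acceptable character explicitly.
WHITELIST = re.compile(r"[A-Za-z0-9-]*")


def is_valid_whitelist(value):
    """Accept only ASCII letters, digits and the hyphen."""
    return WHITELIST.fullmatch(value) is not None


# Approach 2: accept any alphanumeric character, then test the exceptions separately.
EXCLUDED = set("ÄÜÖ")


def is_valid_with_exclusions(value):
    """Accept alphanumerics and hyphens, except the explicitly excluded letters."""
    return all((ch.isalnum() or ch == "-") and ch not in EXCLUDED for ch in value)


for sample in ("BMI-25", "Ärger", "é25", "a b"):
    print(sample, is_valid_whitelist(sample), is_valid_with_exclusions(sample))
```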

  • Regular Expressions, include and exclude xml-files

    Hi!
    I have a Java program that should read XML files from a directory. The directory could contain several types of files, but the program should only read files with a certain pattern.
    The file names will look like this:
    "resultset_27_23.xml"
    where the numbers will change, but the rest of the file name is the same (resultset_XX_XX.xml).
    But in the same directory there will also be files with the following pattern:
    "resultset_27_23_attachment1.xml"
    Here, the numbers could change in the same way as in the files above, and the number after the text (attachment) could also differ from file to file. Those files should not be read by the program.
    I have tried to write a regular expression pattern that only reads the first file type and excludes the other ones, but it won't work.
    Does anyone have a solution to my problem? It is possible to use either just one pattern, or two patterns; one for the files that should be included, and one for the files that should be excluded.

    So you only want files that match resultset_XX_XX.xml? Will the numbers always be two digits each? Assuming so:
    "^resultset_\\d\\d_\\d\\d\\.xml$"
    Depending on which methods you use, the ^ and $ may or may not be necessary.
    If that's not what you meant, please clarify.
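
The same idea, sketched in Python for quick experimentation. It assumes the numbers may have any number of digits (hence \d+ instead of \d\d), and the function name is made up. fullmatch plays the role of the ^ and $ anchors.

```python
import re

# One or more digits in each position; the dot before "xml" is escaped.
RESULTSET = re.compile(r"resultset_\d+_\d+\.xml")


def wanted(names):
    """Keep only names that match the pattern from start to end."""
    return [name for name in names if RESULTSET.fullmatch(name)]


files = [
    "resultset_27_23.xml",
    "resultset_27_23_attachment1.xml",
    "resultset_5_104.xml",
    "resultset_27_23xxml",
]
print(wanted(files))
```

Because the whole name has to match, the attachment files are excluded without a second pattern.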

  • Regular expression for excluding something

    Hi,
    I am using Java to implement a parser and I got a problem with a regular expression.
    I want to fetch some file names such as "blabla.txt", "blabla.doc" but not "blabla.jpg".
    I started with
    Now my problem is how to filter out "blabla.jpg".
    Any ideas are welcome.
    Pengyou

    pengyou wrote:
    I did not test it in Java (I don't have my Java tester at hand right now), but I believe that if it is fine there, it should also be fine with the online tester: http://www.regular-expressions.info/javascriptexample.html
    Could you also test against that tester?
    No! Why should I? This is a Java forum and I assume you will be using it with Java, so why should I waste time testing it outside Java?
    Edited by: sabre150 on Dec 16, 2009 1:06 PM
    I took pity on you and tested it on that site. It works as expected and excludes file names ending in jpg.
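
The pattern under discussion is not shown in the thread, so the following is only one possible way to do it: a negative lookahead that rejects any name ending in .jpg. It is written in Python here; the lookahead syntax is the same in java.util.regex, but the names below are invented for the example.

```python
import re

# Reject the name if, looking ahead from the start, it ends in ".jpg".
NOT_JPG = re.compile(r"^(?!.*\.jpg$).+$", re.IGNORECASE)


def accepted(names):
    """Return the file names that do not end in .jpg."""
    return [name for name in names if NOT_JPG.match(name)]


print(accepted(["blabla.txt", "blabla.doc", "blabla.jpg"]))
```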

  • Unix Log Monitoring regular expression not picking up alerts

    Hi,
    We are moving our unix monitoring to SCOM 2012 SP1 rollup 4.
    What I have got working is individual alert logging of Unix Log alerts by exporting the MP, changing the <IndividualAlerts> value to true, removing the suppression XML section and then reimporting the MP.
    What I am trying to do is use the regular expression to perform the suppression of specific events (such as event codes).
    The expression is:
    ((?i:warning)(?!(.*1222)|(.*1001)))
    i.e. search the log for "warning" (not case sensitive), then check whether event 1222 or 1001 exists; if so, return no match; if not, return a match.
    I used the built-in test function in SCOM when creating the rule and the tests come back as expected, but when I inject test lines into the Unix log, no alerts get generated.
    I suspect the syntax is not being accepted on the system (it's running Red Hat 6).
    I have tested this with regex tools and it works.
    When I try to test it on the server I get:
    [root@bld02 ~]# grep ((?i:Warning)(?!(.*1222)|(.*1001))) /var/log/messages
    -bash: !: event not found
    [root@bld02 ~]# tail /var/log/messages
    Nov 13 15:07:26 bld02 root: SCOM Test Warning Event ID 1001 Round 18
    Nov 13 15:07:29 bld02 root: SCOM Test Warning Event ID 1000 Round 18
    Nov 13 15:07:35 bld02 root: SCOM Test Warning Event ID 1002 Round 18
    So I am expecting 2 alerts to be generated.
    SCOM tests to show expression working:
    Test 1 Matching
    Test 2 to exclude
    Need some help with this, Thankyou in advance :)

    Hello,
    Here's an example of modifying the MP to exclude particular events.  Firstly, I created a log file rule using the MP template that is fairly inclusive - matching the string Warning (with either a lower or upper case W).
    I then exported the MP, and modified the rule.  I set the IndividualAlerts = true and removed the AlertSuppression element, so that every matched line will fire a unique alert.  You don't have to remove the AlertSuppression, but you should use individual alerts so that the exclusion logic doesn't exclude concurrent events that you actually want to match.
    Implementing the exclusion logic involves the addition of a System.ExpressionFilter definition in the rule. This will use a conditional evaluation of the //row element of the data item.  Here's an example of a dataitem matching an individual row:
    <DataItem type="System.Event.Data" time="2013-11-15T10:33:14.8839662-08:00" sourceHealthServiceId="667FF365-70DD-6607-5B66-F9F95253B29F">
    <EventOriginId>{86AB962D-2F44-29FD-A909-B99FF6FEB2C5}</EventOriginId>
    <PublisherId>{EC7EA4B1-0EA5-7E8E-701F-82FEF3367BC4}</PublisherId>
    <PublisherName>WSManEventProvider</PublisherName>
    <EventSourceName>WSManEventProvider</EventSourceName>
    <Channel>WSManEventProvider</Channel>
    <LoggingComputer/>
    <EventNumber>0</EventNumber>
    <EventCategory>3</EventCategory>
    <EventLevel>0</EventLevel>
    <UserName/>
    <RawDescription>Detected Entry: warning 1002</RawDescription>
    <CollectDescription Type="Boolean">true</CollectDescription>
    <EventData>
    <DataItem type="SCXLogProviderDataSourceData" time="2013-11-15T10:33:14.8839662-08:00" sourceHealthServiceId="667FF365-70DD-6607-5B66-F9F95253B29F">
    <SCXLogProviderDataSourceData>
    <row>warning 1002</row>
    </SCXLogProviderDataSourceData>
    </DataItem>
    </EventData>
    <EventDisplayNumber>0</EventDisplayNumber>
    <EventDescription>Detected Entry: warning 1002</EventDescription>
    </DataItem>
    Here is the rule in the MP XML.  The <ConditionDetection>...</ConditionDetection> content was what I added to do the exclusion filtering:
    <Rule ID="LogFileTemplate_66b86eaded094c309ffd2631b8367a32.Alert" Enabled="false" Target="Unix!Microsoft.Unix.Computer" ConfirmDelivery="false" Remotable="true" Priority="Normal" DiscardLevel="100">
    <Category>EventCollection</Category>
    <DataSources>
    <DataSource ID="EventDS" TypeID="Unix!Microsoft.Unix.SCXLog.VarPriv.DataSource">
    <Host>$Target/Property[Type="Unix!Microsoft.Unix.Computer"]/PrincipalName$</Host>
    <LogFile>/tmp/test</LogFile>
    <UserName>$RunAs[Name="Unix!Microsoft.Unix.ActionAccount"]/UserName$</UserName>
    <Password>$RunAs[Name="Unix!Microsoft.Unix.ActionAccount"]/Password$</Password>
    <RegExpFilter>warning</RegExpFilter>
    <IndividualAlerts>true</IndividualAlerts>
    </DataSource>
    </DataSources>
    <ConditionDetection TypeID="System!System.ExpressionFilter" ID="Filter">
    <Expression>
    <RegExExpression>
    <ValueExpression>
    <XPathQuery Type="String">//row</XPathQuery>
    </ValueExpression>
    <Operator>DoesNotContainSubstring</Operator>
    <Pattern>1001</Pattern>
    </RegExExpression>
    </Expression>
    </ConditionDetection>
    <WriteActions>
    <WriteAction ID="GenerateAlert" TypeID="Health!System.Health.GenerateAlert">
    <Priority>1</Priority>
    <Severity>2</Severity>
    <AlertName>Log File Alert: ExclusionExample</AlertName>
    <AlertDescription>$Data/EventDescription$</AlertDescription>
    </WriteAction>
    </WriteActions>
    </Rule>
    I traced this with the Workflow Analyzer as I tested, which shows the logic being applied.  Here is the exclusion happening:
    Here's more info on the definition of an ExpressionFilter:
    http://msdn.microsoft.com/en-us/library/ee692979.aspx
    And more information on Regular Expressions in MPs:
    http://support.microsoft.com/kb/2702651/en-us
    You can also have multiple Expressions in the ExpressionFilter joined by OR or AND operators.
    Also, if you are comfortable with the MP authoring, you can just skip the step of creating the rules in the MP template and just author your own MP with the VSAE tool:
    http://social.technet.microsoft.com/wiki/contents/articles/18085.scom-2012-authoring-unixlinux-log-file-monitoring-rules.aspx
    www.operatingquadrant.com
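
The lookahead expression from the question can be checked outside SCOM against the three log lines that were posted. The sketch below uses Python's re module (3.6 or later for the scoped (?i:...) flag) purely as a test bench; whether the agent on the monitored host accepts the same syntax is a separate question. As an aside, the "event not found" message in the shell session comes from bash history expansion of the unquoted "!", and GNU grep would also need -P for lookaheads.

```python
import re

# The expression from the question, unchanged.
PATTERN = re.compile(r"((?i:warning)(?!(.*1222)|(.*1001)))")

LOG_LINES = [
    "Nov 13 15:07:26 bld02 root: SCOM Test Warning Event ID 1001 Round 18",
    "Nov 13 15:07:29 bld02 root: SCOM Test Warning Event ID 1000 Round 18",
    "Nov 13 15:07:35 bld02 root: SCOM Test Warning Event ID 1002 Round 18",
]

# A line would raise an alert when the pattern is found anywhere in it.
alerts = [line for line in LOG_LINES if PATTERN.search(line)]

for line in alerts:
    print(line)
```

With these inputs the 1001 line is filtered out and the other two remain, which matches the expectation stated in the question.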

  • Introduction to regular expressions ...

    I'm well aware that there are already some articles on that topic, but some people asked me to share some of my knowledge on it. Please take a look at this first part and let me know if you find it useful. If yes, I'm going to continue writing more parts using more and more complicated expressions. If you have questions or problems that you think could be solved through regular expressions, please post them.
    Introduction
    Oracle has always provided some character/string functions in its PL/SQL command set, such as SUBSTR, REPLACE or TRANSLATE. With 10g, Oracle finally gave us, the users, the developers and of course the DBAs, regular expressions. However, regular expressions, due to their sometimes cryptic rules, seem to be overlooked quite often, despite the existence of some very interesting use cases. Being one of the advocates of regular expressions, I thought I'd give the interested audience an introduction to these new functions in several installments.
    Having fun with regular expressions - Part 1
    Oracle offers the use of regular expression through several functions: REGEXP_INSTR, REGEXP_SUBSTR, REGEXP_REPLACE and REGEXP_LIKE. The second part of each function already gives away its purpose: INSTR for finding a position inside a string, SUBSTR for extracting a part of a string, REPLACE for replacing parts of a string. REGEXP_LIKE is a special case since it could be compared to the LIKE operator and is therefore usually used in comparisons like IF statements or WHERE clauses.
    Regular expressions excel, in my opinion, at searching for and extracting strings, whether for finding or replacing certain strings or for checking certain formatting criteria. They're not very good at formatting strings themselves, except for some special cases I'm going to demonstrate.
    If you're not familiar with regular expression, you should take a look at the definition in Oracle's user guide Using Regular Expressions With Oracle Database, and please note that there have been some changes and advancements in 10g2. I'll provide examples, that should work on both versions.
    Some of you probably already encountered this problem: checking a number inside a string, because, for whatever reason, a column was defined as VARCHAR2 and not as NUMBER as one would have expected.
    Let's check for all rows where column col1 does NOT include an unsigned integer. I'll use this SELECT for demonstrating different values and search patterns:
    WITH t AS (SELECT '456' col1
                 FROM dual
                UNION
               SELECT '123x'
                 FROM dual
                UNION  
               SELECT 'x123'
                 FROM dual
                UNION 
               SELECT 'y'
                 FROM dual
                UNION 
               SELECT '+789'
                 FROM dual
                UNION 
               SELECT '-789'
                 FROM dual
                UNION 
               SELECT '159-'
                 FROM dual
                UNION 
               SELECT '-1-'
                 FROM dual
              )
    SELECT t.col1
      FROM t
     WHERE NOT REGEXP_LIKE(t.col1, '^[0-9]+$')
    ;
    Let's take a look at the 2nd argument of this REGEXP function: '^[0-9]+$'. Translated it would mean: start at the beginning of the string, check if there's one or more characters in the range between '0' and '9' (also called a matching character list) until the end of this string. "^", "[", "]", "+", "$" are all metacharacters.
    To understand regular expressions, you have to "think" in regular expressions. Each regular expression tries to "fit" an available string into its pattern and returns a result, successful or not, depending on the function. The "art" of using regular expressions is to construct the right search pattern for a certain task. Using functions like TRANSLATE or REPLACE already taught you to use search patterns; regular expressions are just an extension of this paradigm. Another side note: most of the search patterns are placeholders for single characters, not strings.
    I'll take this example a bit further. What would happen if we removed the "$" in our example? "$" means: (until the) end of a string. Without it, the expression would only search for digits from the beginning until it encounters either another character or the end of the string. So this time, '123x' and '159-' would be removed from the selection since they do fit into the pattern.
    Another change: we will keep the "$" but remove the "^". This character has several meanings, but in this case it declares: (start from the) beginning of a string. Without it, the function will search for a part of a string that has only digits until the end of the searched string. 'x123', '+789' and '-789' would now be removed from our selection.
    Now there's a question: what happens if I remove both "^" and "$"? Well, just think about it. We now ask to find any string that contains at least one digit, so neither '123x' nor 'x123' will show up in the result; of our test values, only 'y' is left.
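
The effect of dropping each anchor can be reproduced outside the database. The sketch below is Python, not Oracle SQL: re.search is used because, like REGEXP_LIKE, it looks for a match anywhere in the string, and the helper name rows_not_matching is invented to mimic the WHERE NOT REGEXP_LIKE(...) filter.

```python
import re

# The eight test values from the WITH clause above.
VALUES = ["456", "123x", "x123", "y", "+789", "-789", "159-", "-1-"]


def rows_not_matching(pattern, values=VALUES):
    """Mimic WHERE NOT REGEXP_LIKE(col1, pattern): keep values without any match."""
    return [value for value in values if re.search(pattern, value) is None]


for pattern in (r"^[0-9]+$", r"^[0-9]+", r"[0-9]+$", r"[0-9]+"):
    print(pattern, rows_not_matching(pattern))
```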
    So what if I want to look for a signed integer, given that "+" is also used as a metacharacter in search expressions? Escaping is the name of the game. We'll just use '^\+[0-9]+$'. Did you notice the "\" before the first "+"? This is now a search pattern for the plus sign.
    Should signed integers include negative numbers as well? Of course they should, and I'll once again use a matching character list. In this list, I don't need to do escaping, although it is possible. The result string would now look like this: '^[+-]?[0-9]+$'. Did you notice the "?"? This is another metacharacter that changes the placeholder for plus and minus to an optional placeholder, which means: if there's a "+" or "-", that's ok, if there's none, that's also ok. Only if there's a different character, then again the search pattern will fail.
    Addendum: From this on, I found a mistake in my examples. If you would have tested my old examples with test data that would have included multiple signs strings, like "--", "-+", "++", they would have been filtered by the SELECT statement. I mistakenly used the "*" instead of the "?" operator. The reason why this is a bad idea, can also be found in the user guide: the "*" meta character is defined as 0 to multiple occurrences.
    Looking at the values, one could ask the question: what about the integers with a trailing sign? Quite simple, right? Let's just add another '[+-]?' and the search pattern would look like this: '^[+-]?[0-9]+[+-]?$'.
    Wait a minute, what happened to the row with the column value "-1-"?
    You probably already guessed it: the new pattern qualifies this one also as a valid string. I could now split this pattern into several conditions combined through a logical OR, but there's something even better: a logical OR inside the regular expression. Its symbol is "|", the pipe sign.
    Changing the search pattern again to something like this '^[+-]?[0-9]+$|^[0-9]+[+-]?$' [1] would return now the "-1-" value. Do I have to duplicate the same elements like "^" and "$", what about more complicated, repeating elements in future examples? That's where subexpressions/grouping comes into play. If I want only certain parts of the search pattern using an OR operator, we can put those inside round brackets. '^([+-]?[0-9]+|[0-9]+[+-]?)$' serves the same purpose and allows for further checks without duplicating the whole pattern.
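
The sign handling steps can be compared in the same way. Again this is a Python sketch standing in for the Oracle query, with an invented helper name; the "--5" value is an extra test string added to show the "*" versus "?" difference mentioned in the addendum.

```python
import re

VALUES = ["456", "123x", "x123", "y", "+789", "-789", "159-", "-1-"]


def rows_not_matching(pattern, values=VALUES):
    """Mimic WHERE NOT REGEXP_LIKE(col1, pattern)."""
    return [value for value in values if re.search(pattern, value) is None]


LEADING_SIGN = r"^[+-]?[0-9]+$"                 # optional sign in front
BOTH_ENDS = r"^[+-]?[0-9]+[+-]?$"               # also accepts "-1-"
EITHER_END = r"^([+-]?[0-9]+|[0-9]+[+-]?)$"     # sign in front OR at the end

for pattern in (LEADING_SIGN, BOTH_ENDS, EITHER_END):
    print(pattern, rows_not_matching(pattern))

# "*" allows any number of signs, "?" allows at most one.
print(bool(re.search(r"^[+-]*[0-9]+$", "--5")), bool(re.search(LEADING_SIGN, "--5")))
```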
    Now looking for integers is nice, but what about decimal numbers? Those may be a bit more complicated, but all I have to do is again to think in (meta) characters. I'll just use an example where the decimal point is represented by ".", which again needs escaping, since it's also the place holder in regular expressions for "any character".
    Valid decimals in my example would be ".0", "0.0", "0.", "0" (integer of course) but not ".". If you want, you can test it with the TO_NUMBER function. Finding such an unsigned decimal number could then be formulated like this: from the beginning of a string we will either allow a decimal point plus at least one digit OR at least one digit plus an optional decimal point followed by any number of digits. Think about it for a minute, how would you formulate such a search pattern?
    Compare your solution to this one:
    '^(\.[0-9]+|[0-9]+(\.[0-9]*)?)$'
    Addendum: Here I have to use both "?" and "*" to make sure that I can have 0 to many digits after the decimal point, but only 0 to 1 occurrence of this substring. Otherwise, strings like "1.9.9.9" would be possible, if I wrote it like this:
    '^(\.[0-9]+|[0-9]+(\.[0-9]*)*)$'
    Some of you now might say: Hey, what about signed decimal numbers? You could of course combine all the ideas so far and you will end up with a very long and almost unreadable search pattern, or you start combining several regular expression functions. Think about it: Why put all the search patterns into one function? Why not split those into several steps like "check for a valid decimal" and "check for sign"?
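
The unsigned decimal pattern and its faulty "*" variant can be compared directly. This is a small Python sketch with invented names, not Oracle code; the two patterns are copied from the text above.

```python
import re

DECIMAL = re.compile(r"^(\.[0-9]+|[0-9]+(\.[0-9]*)?)$")       # group allowed 0 or 1 times
DECIMAL_STAR = re.compile(r"^(\.[0-9]+|[0-9]+(\.[0-9]*)*)$")  # group allowed any number of times


def is_decimal(value, pattern=DECIMAL):
    """Return True when the value fits the given decimal pattern."""
    return pattern.search(value) is not None


for sample in (".0", "0.0", "0.", "0", ".", "1.9.9.9"):
    print(repr(sample), is_decimal(sample), is_decimal(sample, DECIMAL_STAR))
```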
    I'll just use another SELECT to show what I want to do:
    WITH t AS (SELECT '0' col1
                 FROM dual
                UNION
               SELECT '0.' 
                 FROM dual
                UNION
               SELECT '.0' 
                 FROM dual
                UNION
               SELECT '0.0' 
                 FROM dual
                UNION
               SELECT '-1.0' 
                 FROM dual
                UNION
               SELECT '.1-' 
                 FROM dual
                UNION
               SELECT '.' 
                 FROM dual
                UNION
               SELECT '-1.1-' 
                 FROM dual
              )
    SELECT t.*
      FROM t
    ;
    From this select, the only rows I need to find are those with the column values "." and "-1.1-". I'll start this with a check for valid signs. Since I want to combine this with the check for valid decimals, I'll first try to extract a substring with valid signs through the REGEXP_SUBSTR function:
    NVL(REGEXP_SUBSTR(t.col1, '^([+-]?[^+-]+|[^+-]+[+-]?)$'), ' ')
    Remember the OR operator and the matching character collections? But several "^"? Some of the metacharacters inside a search pattern can have different meanings, depending on their positions and combination with other metacharacters. In this case, the pattern translates into: from the beginning of the string search for an optional "+" or "-" followed by at least one other character that is not "+" or "-". The second pattern after the "|" OR operator does the same for a sign at the end of the string.
    This only checks for a sign but not whether there are also only digits and a decimal point inside the string. If the search fails, for example when we have more than one sign like in "-1.1-", the function returns NULL. NULL and LIKE don't go together very well, so we'll just add NVL with a default value that tells the LIKE to ignore this string, in this case a space.
    All we have to do now is to combine the check for the sign and the check for a valid decimal number, but don't forget an option for the signs at the beginning or end of the string, otherwise your second check will fail on the signed decimals. Are you ready?
    Does your solution look a bit like this?
    WHERE NOT REGEXP_LIKE(NVL(REGEXP_SUBSTR(t.col1,
                                            '^([+-]?[^+-]+|[^+-]+[+-]?)$'),
                              ' '),
                          '^[+-]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)[+-]?$'
                         )
    Now the optional sign checks in the REGEXP_LIKE argument can be added to both ends, since the SUBSTR won't allow any string with signs on both ends. Thinking in regular expressions again.
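
The two-step check (first the sign test, then the decimal test) can be sketched as follows. It is a Python approximation of the REGEXP_SUBSTR/NVL/REGEXP_LIKE combination, with invented names; a single space plays the role of the NVL default.

```python
import re

SIGN_OK = re.compile(r"^([+-]?[^+-]+|[^+-]+[+-]?)$")
SIGNED_DECIMAL = re.compile(r"^[+-]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)[+-]?$")

VALUES = ["0", "0.", ".0", "0.0", "-1.0", ".1-", ".", "-1.1-"]


def is_valid(value):
    """Step 1: at most one sign, at one end. Step 2: what remains is a decimal."""
    sign_match = SIGN_OK.search(value)
    candidate = sign_match.group(0) if sign_match else " "  # like NVL(..., ' ')
    return SIGNED_DECIMAL.search(candidate) is not None


invalid = [value for value in VALUES if not is_valid(value)]
print(invalid)
```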
    Continued in Introduction to regular expressions ... continued.
    C.
    Fixed some embarrassing typos ... and mistakes.
    cd

    Excellent write up CD. Very nice indeed. Hopefully you'll be completing parts 2 and 3 some time soon. And with any luck, your article will encourage others to do the same....I know there's a few I'd like to see and a few I'd like to have a go at writing too :-)

  • Regular expressions in Format Definition add-on

    Hello experts,
    I have a question about regular expressions. I am a newbie in regular expressions and I could use some help on this one. I tried for some 6 hours, but I can't solve it myself.
    Summary of my problem:
    In SAP Business One (patch level 42) it is possible to use bank statement processing. A file (full of regular expressions) is to be selected, so it can match certain criteria to the bank statement file. The bank statement file consists of a certain pattern (look at the attached code snippet).
    :61:071222D208,00N026
    :86:P  12345678BELASTINGDIENST       F8R03782497                $GH
    $0000009                         BETALINGSKENM. 123456789123456
    0 1234567891234560                                            
    :61:071225C758,70N078
    :86:0116664495 REGULA B.V. HELPMESTRAAT 243 B 5371 AM HARDCITY HARD
    CITY 48772-54314                                                  
    :61:071225C425,05N078
    :86:0329883585 J. MANSSHOT PATTRIOTISLAND 38 1996 PT HELMEN BIJBETA
    LING VOOR RELOOP RMP1 SET ORDERNR* 69866 / SPOEDIG LEVEREN    
    :61:071225C850,00N078
    :86:0105327212 POSE TELEFOONSTRAAT 43 6448 SL S-ROTTERDAM MIJN OR
    DERNR. 53846 REF. MAIL 21-02
    - I am in search of the right type of regular expression that is used by the Format Definition add-on (javascript, .NET, perl, JAVA, python, etc.)
    Besides that I need the regular expressions below, so the Format Definition will match the right lines from my bankfile.
    - a regular expression that selects lines starting with :61: and line :86: including next lines (if available), so in fact it has to select everything from :86: till :61: again.
    - a regular expression that selects the bank account number (position 5-14) from lines starting with :86:
    - a regular expression that selects all other info from lines starting with :86: (and following if any), so all positions that follow after the bank account number
    I am looking forward to the right solutions, I can give more info if you need any.

    Hello Hendri,
    Q1: I am in search of the right type of regular expression that is used by the Format Definition add-on (javascript, .NET, perl, JAVA, python, etc.)
    Answer: Format Definition uses .NET regular expressions.
    You may refer to the following examples. If necessary, I can send you a guide about how to use regular expressions in Format Definition. Thanks.
    Example 6
    Description:
    To match a field with an optional field in front. For example, “:61:0711211121C216,08N051NONREF” or “:61:071121C216,08N051NONREF”, which consists of a record identification “:61:”, a date in the form YYMMDD, another optional date MMDD, one or two characters to signify the direction of money flow, a numeric amount value and some other information. The target to be matched is the numeric amount value.
    Regular expression:
    (?<=:61:\d(\d)?[a-zA-Z]{1,2})((\d(,\d*)?)|(,\d))
    Text:
    :61:0711211121C216,08N051NONREF
    Matches:
    1
    Tips:
    1.     All the fields in front of the target field are described in the look behind assertion embraced by (?<= and ). In particular, the optional field is embraced by parentheses followed by a “?” (question mark). The sub expression for amount is copied from example 1. You can compose your own regular expression for such cases in the form of (?<=REGEX_FOR_FIELDS_IN_FRONT)(REGEX_FOR_TARGET_FIELD), in which REGEX_FOR_FIELDS_IN_FRONT and REGEX_FOR_TARGET_FIELD are respectively the regular expression for the fields in front and the target field. Keep the parentheses therein.
    Example 7
    Description:
    Find all numbers in the free text description, which are possibly document identifications, e.g. for invoices
    Regular expression:
    (?<=\b)(?<!\.)\d+(?=\b)(?!\.)
    Text:
    :86:GIRO  6890316
    ENERGETICA NATURA BENELU
    AFRIKAWEG 14
    HULST
    3187-A1176
    TRANSACTIEDATUM* 03-07-2007
    Matches:
    6
    Tips:
    1.     The regular expression given finds all digits between word boundaries except those with a prior dot or following dot; “.” (dot) is escaped as \.
    2.     It may find some inaccurate matches, like the date in the text. If you want to exclude “-” (hyphen) as prior or following character, in the same way as for “.” (dot), the regular expression becomes (?<=\b)(?<!\.)(?<!-)\d+(?=\b)(?!\.)(?!-). The matches will be:
    :86:GIRO  6890316
    ENERGETICA NATURA BENELU
    AFRIKAWEG 14
    HULST
    3187-A1176
    TRANSACTIEDATUM* 03-07-2007
    You may lose some real values like “3187” before the “-”.
    Example 8
    Description:
    Find BP account number in 9 digits with a prior “P” or “0” in the first position of free text description
    Regular expression:
    (?<=^(P|0))\d
    Text:
    0000006681 FORTIS ASR BETALINGSCENTRUM BV
    Matches:
    1
    Tips:
    1.     Use positive look behind assertion (?<=PRIOR_KEYWORD) to express the prior keyword.
    2.     “^” means that the match starts from the beginning of the text. If the text includes the record identification, you may include it also in the look behind assertion. For example,
    :86:0000006681 FORTIS ASR BETALINGSCENTRUM BV
    The regular expression becomes
    (?<=:86:(P|0))\d
    Example 9
    Description:
    Following example 8, to find the possible BP name after BP account number, which is composed of letter, dot or space.
    Regular expression:
    (?<=^(P|0)\d)[a-zA-Z. ]*
    Text:
    0000006681 FORTIS ASR BETALINGSCENTRUM BV
    Matches:
    1
    Tips:
    1.     In this case, put BP account number regular expression into the look behind assertion.
    Example 10
    Description:
    Find the possible document identifications in a sub-record of :86: record. Sub-record is like “?00”, “?10” etc.  A possible document identification sub-record is made up of the following parts:
    •     keyword “RE”, “RG”, “R”, “INV”, “NR”, “NO”, “RECHN” or “RECHNUNG”, and
    •     an optional group made up of following:
         a separator of either a dot, hyphen or slash, and
         an optional space, and
         an optional string starting with keyword “NR” or “NO” followed by a separator of either a dot, hyphen or slash, and
         an optional space
    •     and finally document identification in digits
    Regular expression:
    (?<=\?\d(RE|RG|R|INV|NR|NO|RECHN|RECHNUNG)((\.|-|/)\s?((NR|NO)(\.|-|/))?\s?)?)\d+
    Kind Regards
    -Yatsea
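
Example 6 can be tried outside the add-on with a capturing group instead of a look behind. This is a Python sketch, not the .NET expression the add-on expects: Python's re module only supports fixed-width look behind, so the fields in front are matched normally and the amount is taken from group 1. The digit counts (six for YYMMDD, four for MMDD) are assumptions taken from the description, and the name amount_of is invented.

```python
import re

# Assumed layout from the description of Example 6:
# ":61:", YYMMDD, optional MMDD, one or two letters, then the amount.
AMOUNT = re.compile(r":61:\d{6}(?:\d{4})?[A-Za-z]{1,2}(\d+(?:,\d*)?|,\d+)")


def amount_of(line):
    """Return the amount field of a :61: line, or None for other lines."""
    match = AMOUNT.match(line)
    return match.group(1) if match else None


for line in (
    ":61:0711211121C216,08N051NONREF",
    ":61:071121C216,08N051NONREF",
    ":61:071222D208,00N026",
    ":86:GIRO  6890316",
):
    print(line, "->", amount_of(line))
```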

  • Problems with java regular expressions

    Hi everybody,
    Could someone please help me sort out an issue with Java regular expressions? I have been using regular expressions in Python for years and I cannot figure out how to do what I am trying to do in Java.
    For example, I have this code in java:
    import java.util.regex.*;

    String text = "abc";
    Pattern p = Pattern.compile("(a)b(c)");
    Matcher m = p.matcher(text);
    if (m.matches()) {
        int count = m.groupCount();
        System.out.println("Groups found " + String.valueOf(count));
        for (int i = 0; i < count; i++) {
            System.out.println("group " + String.valueOf(i) + " " + m.group(i));
        }
    }
    My expectation is that group 0 would capture "abc", group 1 "a" and group 2 "c". Yet I get this:
    Groups found 2
    group 0 abc
    group 1 a
    I have tried other patterns and input text but the issue remains the same: no matter what, I cannot capture any parenthesized expression found in the pattern except for the first one. I tried the same example with Jakarta Regexp 1.5 and that works without any problems; I get what I expect.
    I am using Java 1.5.0 on Mac OS X 10.4.
    Thanks to all who can help.

    paulcw wrote:
    If the group count is X, then there are X plus one groups to go through: 0 for the whole match, then 1 through X for the individual groups.
    It does seem confusing that the designers chose to exclude the zero-group from the group count, but the documentation is clear.
    Matcher.groupCount():
    Group zero denotes the entire pattern by convention. It is not included in this count.
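
The same counting convention exists in Python's re module, which may make the off-by-one easier to see for someone coming from Python. In this sketch the group count comes from the compiled pattern's groups attribute (the counterpart of groupCount()), and the loop runs from 0 through the count inclusive.

```python
import re

match = re.fullmatch(r"(a)b(c)", "abc")

# Number of capturing groups in the pattern; group 0 is not included.
count = match.re.groups

# Iterate 0..count inclusive: 0 is the whole match, 1..count are the groups.
groups = [match.group(i) for i in range(count + 1)]

print("Groups found", count)
for index, text in enumerate(groups):
    print("group", index, text)
```

In the Java loop from the question, the equivalent change would be to use i <= count as the loop condition.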

  • Regular expressions, using pattern and matcher but not include the pattern

    Hi all
    i have a regular expression but in my matcher it is including the text that is in my regular expression.
    ie
    String str="-------------------stuff===========";
    Pattern mainBody = Pattern.compile("-----(.*?)=====", Pattern.MULTILINE);
    This matches -------------------stuff=====
    Now I expect to get some - in there as I match from the start, but I don't want to have the = in the match. How do I do a match that excludes the matching expressions?

    nevermind, figured it out
    when i do myMatch.group(1) it gives me just my match
    sorry for wasting time :)
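
For readers who land here with the same problem, the difference between the whole match and group 1 looks like this. It is a Python sketch; the same group numbering applies to java.util.regex. The test string is rebuilt with an assumed number of dashes and equals signs, and the second pattern (with -+ and =+) is a suggestion that is not part of the thread.

```python
import re

# Stand-in for the string in the question; the exact counts are an assumption.
text = "-" * 19 + "stuff" + "=" * 11

match = re.search(r"-----(.*?)=====", text)
whole = match.group(0)   # includes the delimiters
inner = match.group(1)   # only what the parentheses captured

# Letting the delimiters consume all dashes and equals signs leaves just the body.
tight = re.search(r"-+(.*?)=+", text).group(1)

print(whole)
print(inner)
print(tight)
```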

  • Pattern matching regular expressions

    I'm attempting to determine whether a string consists of at most 100 alphanumeric characters (a-z or 0-9, case insensitive). So my regular expression string looks like:
    "^[a-zA-Z0-9]{0,100}$"
    And I use something like...
    Pattern pattern = Pattern.compile( regexString );
    I'd like to modify my regex string to include the email 'at' symbol "@", so that the at symbol will be allowed. But my understanding of regex is very limited. How do I include an "or at symbol" in my regex expression?
    Thanks for your help.

    * Code by sabre150
    private static final Pattern emailMatcher; // needs: import java.util.regex.Pattern;
        static {
            // Build up the regular expression according to RFC821
            // http://www.ietf.org/rfc/rfc0821.txt
            // <x> ::= any one of the 128 ASCII characters (no exceptions)
            String x_ = "\u0000-\u007f";
            // <special> ::= "<" | ">" | "(" | ")" | "[" | "]" | "\" | "."
            //              | "," | ";" | ":" | "@"  """ | the control
            //              characters (ASCII codes 0 through 31 inclusive and
            //              127)
            String special_ = "<>()\\[\\]\\\\\\.,;:@\"\u0000-\u001f\u007f";
            // <c> ::= any one of the 128 ASCII characters, but not any
            //             <special> or <SP>
            String c_ = "[" + x_ + "&&" + "[^" + special_ + "]&&[^ ]]";
            // <char> ::= <c> | "\" <x>
            String char_ = "(?:" + c_ + "|\\\\[" + x_ + "])";
            // <string> ::= <char> | <char> <string>
            String string_ = char_ + "+";
            // <dot-string> ::= <string> | <string> "." <dot-string>
            String dot_string_ = string_ + "(?:\\." + string_ + ")*";
            // <q> ::= any one of the 128 ASCII characters except <CR>,
            //               <LF>, quote ("), or backslash (\)
            String q_ = "["+x_+"&&[^\r\n\"\\\\]]";
            // <qtext> ::=  "\" <x> | "\" <x> <qtext> | <q> | <q> <qtext>
            String qtext_ = "(?:\\\\[" + x_ + "]|" + q_ + ")+";
            // <quoted-string> ::=  """ <qtext> """
            String quoted_string_ = "\"" + qtext_ + "\"";
            // <local-part> ::= <dot-string> | <quoted-string>
            String local_part_ = "(?:(?:" + dot_string_ + ")|(?:" + quoted_string_ + "))";
            // <a> ::= any one of the 52 alphabetic characters A through Z
            //              in upper case and a through z in lower case
            String a_ = "[a-zA-Z]";
            // <d> ::= any one of the ten digits 0 through 9
            String d_ = "[0-9]";
            // <let-dig> ::= <a> | <d>
            String let_dig_ = "[" + a_ + d_ + "]";
            // <let-dig-hyp> ::= <a> | <d> | "-"
            String let_dig_hyp_ = "[-" + a_ + d_ + "]";
            // <ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>
            // String ldh_str_ = let_dig_hyp_ + "+";
            // RFC821 looks wrong since the production "<name> ::= <a> <ldh-str> <let-dig>"
            // forces a name to have at least 3 characters and country codes such as
            // uk,ca etc would be illegal! I shall change this to make the
            // second term of <name> optional by make a zero length ldh-str allowable.
            String ldh_str_ = let_dig_hyp_ + "*";
            // <name> ::= <a> <ldh-str> <let-dig>
            String name_ = "(?:" + a_ + ldh_str_ + let_dig_ + ")";
            // <number> ::= <d> | <d> <number>
            String number_ = d_ + "+";
            // <snum> ::= one, two, or three digits representing a decimal
            //              integer value in the range 0 through 255
            String snum_ = "(?:25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})";
            // <dotnum> ::= <snum> "." <snum> "." <snum> "." <snum>
            String dotnum_ = snum_ + "(?:\\." + snum_ + "){3}"; // + Dotted quad
            // <element> ::= <name> | "#" <number> | "[" <dotnum> "]"
            String element_ = "(?:" + name_ + "|#" + number_ + "|\\[" + dotnum_ + "\\])";
            // <domain> ::=  <element> | <element> "." <domain>
            String domain_ = element_ + "(?:\\." + element_ + ")*";
            // <mailbox> ::= <local-part> "@" <domain>
            String mailbox_ = local_part_ + "@" + domain_;
            emailMatcher = Pattern.compile(mailbox_);
            System.out.println("Email address regex = " + emailMatcher);
        }

    Wow. Sheesh, sabre150 that's pretty impressive. I like it for two reasons. First it avoids some false negatives that I would have gotten using the regex I mentioned. Like, [email protected] is a valid email address which my regex pattern has rejected and yours accepts. It's unusual but it's valid. And second I like the way you have compartmentalized each rule so that changes, if any custom changes are desired, are easier to make. Like if I want to specifically aim for a particular domain for whatever reason. And you've commented it so that it is easier to read, for someone like myself who knows almost nothing about regex.
    Thanks, Good stuff!
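
    For the narrower question as originally asked (allowing "@" next to letters and digits), a minimal sketch is to add the character to the existing class. The class name AllowAtSymbol and the method isValid are hypothetical names for illustration only.

```java
import java.util.regex.Pattern;

class AllowAtSymbol {
    // Same pattern as in the question, with '@' added inside the character class.
    static final Pattern ALNUM_OR_AT = Pattern.compile("^[a-zA-Z0-9@]{0,100}$");

    static boolean isValid(String s) {
        return ALNUM_OR_AT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("user@example"));     // '@' is now accepted
        System.out.println(isValid("user@example.com")); // still rejected: '.' is not in the class
    }
}
```

    Inside a character class no "or" operator is needed; every listed character is an alternative. Note that this pattern still rejects dots, so it is not an email validator, which is presumably why the longer RFC821-based pattern was offered.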

  • FM9 SDL Authoring Assistant Regular Expression Syntax?

    I'm trying to trick SDL into identifying words that are not approved by STE.
    Under "Configure|Style and Linguistic Checks|User Defined Rules" the program allows regular expressions to create custom rules.
    I have all other options in the Utility unchecked.
    I am by no means a pro at regular expressions but was able to create a pretty solid command at http://regexlib.com/RETester.aspx.
    The idea is to create an expression that looks for any word other than those separated by vertical bars.
    For the test text "this is not the way that should work. this is not the way that should work."
    \b(?:(?!should|not|way|this|is|that).)+
    returns: the work the work
    At that website, I can change the excluded words and it works every time. Change the test text, same thing, still works.
    Perfect! I ripped every approved word in STE into the formula and it (SDL) only returns words at the end of the sentence that are followed by periods and question marks. So I added "\." to the exclusion list in the expression and it only found words next to question marks. I excluded question marks and now it finds nothing. I don't understand this, as I wasn't aware that I had any criteria in the expression that dictate functionality only at the end of the sentence.
    I have an O'Reilly book to refer to; if anyone can give me a shove in the right direction as to which set of rules to adhere to, I would appreciate it. Why did negative word matching have to be my introduction to this subject?

    I tried your expression in a couple of regex tools and it seems to parse as you wanted it to. I suspect that the SDL implementation doesn't follow the unix/linux standards. I haven't used the tool and the usage documentation is nonexistent, except for the limited flash-based demo.
    The SDL knowledgebase states that their regex filter uses the .NET regex flavour, and I believe the differences are explained in the "Mastering Regular Expressions" book.
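
    One thing worth checking independently of the tool: the posted pattern applies the negative lookahead to every character, so its matches can include spaces and punctuation. A word-level variant anchors the lookahead at the start of each word instead. The sketch below is written in Java rather than in SDL's own engine, so it is only an illustration of the technique; the class name UnapprovedWords and the method find are hypothetical, and whether SDL accepts the same syntax is not confirmed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnapprovedWords {
    // Returns every word in text that is not one of the approved words.
    static List<String> find(String text, String... approved) {
        StringBuilder alternation = new StringBuilder();
        for (String word : approved) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(word));
        }
        // \b(?!(?:w1|w2|...)\b)\w+ : at a word start, refuse approved whole words
        Pattern p = Pattern.compile(
            "\\b(?!(?:" + alternation + ")\\b)\\w+", Pattern.CASE_INSENSITIVE);
        List<String> hits = new ArrayList<String>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "this is not the way that should work. this is not the way that should work.";
        System.out.println(find(text, "should", "not", "way", "this", "is", "that"));
    }
}
```

    The trailing \b inside the lookahead keeps an approved word such as "is" from blocking longer words that merely start with it.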

  • Checking a number sequence with regular expressions

    Hello,
    Suppose I have a text in the pattern:
    A1=ha,A2=bla,A3=cha,...
    I don't know how many sections of "A#=$" (# denotes number, $ denotes text) will be in the text, but I want to verify that the numbers of the A's form the natural ascending number sequence (i.e 1,2,3,...). I prefer to use regular expressions to do this, but if there's another way, I will be glad to hear it too.
    Therefore my question is: How can I use regular expressions to check for a sequence of numbers? I know I can search for groups I've caught previously in the expression, but how can I compute the next number in the sequence from the group and search for the result?
    Thank you very much!

    What I'd do--and I'm not saying this is optimal, just what pops immediately to mind--is have a regex that matches "A(\\d+)=" (assuming the ha, bla, cha can never be "A1" etc.--if they can, you can still do it, but it's more complicated), then you iterate with the Matcher, and each time, you get the Integer.valueOf what you matched. You keep track of the last value, and compare the current to the last. If current is < last (or <= last, depending on your requirements), fail.
    Something like this. Treat it as a sketch and adjust the details to your requirements.
    // needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
    Matcher m = Pattern.compile("A(\\d+)=").matcher(input);
    int maxSoFar = Integer.MIN_VALUE;
    while (m.find()) {
        // group(1) is the run of digits captured by (\\d+)
        int current = Integer.parseInt(m.group(1));
        if (current <= maxSoFar) {
            // fail
        } else {
            maxSoFar = current;
        }
    }
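
    If the requirement is the exact natural sequence 1, 2, 3, ... rather than merely ascending numbers, the same loop can compare each number against a running counter. This is a sketch under assumptions: the class name SequenceCheck and the method isNaturalSequence are made up for illustration, the values after "=" are assumed never to contain text like "A1=", and numbers with leading zeros (for example "A01") are deliberately treated as invalid.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SequenceCheck {
    private static final Pattern SECTION = Pattern.compile("A(\\d+)=");

    // True if the A-numbers appear exactly as 1, 2, 3, ... and at least one exists.
    static boolean isNaturalSequence(String input) {
        Matcher m = SECTION.matcher(input);
        int expected = 1;
        while (m.find()) {
            // String comparison avoids overflow on very long digit runs
            if (!m.group(1).equals(String.valueOf(expected))) {
                return false;
            }
            expected++;
        }
        return expected > 1;
    }

    public static void main(String[] args) {
        System.out.println(isNaturalSequence("A1=ha,A2=bla,A3=cha"));
        System.out.println(isNaturalSequence("A1=ha,A3=bla"));
    }
}
```

    The regex only extracts the numbers; the arithmetic ("next number in the sequence") stays in ordinary code, because a regular expression by itself cannot compute a successor.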

  • Finding URLs using regular expression.

    I have a requirement where the user will type some text containing URLs like "Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747. Thank you". This text has to be modified as below before saving it to the database.
    "Please visit this site <a href='http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747'>http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747</a>. Thank you"
    I am using the regular expression (http|https)://.+?\\s, which marks the end of the URL with a whitespace character. This pattern doesn't work if the URL is located at the end of the string, since there will be no space at the end.
    For example if the string is "Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747" the regex will fail.
    My actual problem is to find the URL irrespective of its position within the string.
    Pattern urlPattern = Pattern.compile("(http|https)://.+?\\s", Pattern.CASE_INSENSITIVE);
    Matcher matcher = urlPattern.matcher(plainText);
    Map stringIndexMap = new HashMap();
    // Searching the input string for urlPattern...
    while (matcher.find()) {
        String urlString = matcher.group();
        // Storing the urls in a hashmap with their indices as keys...
        stringIndexMap.put(new Integer(matcher.start()), urlString.trim());
    }
    Set keySet = stringIndexMap.keySet();
    Iterator it = keySet.iterator();
    // Iterating over the hashmap containing urls...
    while (it.hasNext()) {
        String urlString = (String) stringIndexMap.get(it.next());
        /*
         * Replacing the url string in the input text with <a href="#" onclick="window.open('<urlString>')"
         * using String index
         */
        clickableURLString.replace(clickableURLString.indexOf(urlString),
                clickableURLString.indexOf(urlString) + urlString.length(),
                "<a href=\"#\" onclick=\"window.open('" + urlString
                + "')\">" + urlString + "</a>");
    }
    return clickableURLString.toString();

    In a regex, '$' matches the end of the input.
    import java.util.regex.*;

    public class Prasanna{
      public static void main(String[] args){
        String text
          = "Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747";
        // String regex = "(http|https)://.+?(?:\\s|$)"; // this works
        String regex = "(http|https)://[^ ]+";          // this also works
        Pattern pat = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        Matcher mat = pat.matcher(text);
        while (mat.find()){
          System.out.println(mat.group());
        }
      }
    }
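
    Building on that, here is a sketch of the whole replacement step in one pass, without the intermediate map. It is an illustration under assumptions: the class name LinkifyUrls and the method linkify are hypothetical, a single trailing punctuation mark such as the period in the first example is assumed not to belong to the URL, and no HTML escaping of characters like & is attempted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkifyUrls {
    // Scheme, then as few non-whitespace characters as possible, stopping where
    // an optional punctuation mark is followed by whitespace or end of input.
    private static final Pattern URL = Pattern.compile(
        "https?://\\S+?(?=[.,;:!?]?(?:\\s|$))", Pattern.CASE_INSENSITIVE);

    static String linkify(String plainText) {
        Matcher m = URL.matcher(plainText);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String url = m.group();
            String anchor = "<a href='" + url + "'>" + url + "</a>";
            // quoteReplacement keeps '$' and '\' in the URL from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(anchor));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(linkify(
            "Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747. Thank you"));
    }
}
```

    Using appendReplacement avoids the indexOf lookups in the original code, which would hit the wrong occurrence if the same URL appeared twice in the text.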

  • CharAt in a keystroke event using regular expression

    I'm working on a Canadian postal code field.  I'm using a regular expression to validate the value.
    //accepts "A9A 9A9", "A9A9A9" or "" as valid entries, excluding letters D, F, I, O, Q and U, as well as W and Z as the first character.
    var re = /^[ABCEGHJKLMNPRSTVXY]\d[ABCEGHJKLMNPRSTVWXYZ]( )?\d[ABCEGHJKLMNPRSTVWXYZ]\d$|^$/i;
    if(re.test(event.value) == false){
      app.alert("Code postal invalide", 0, 0,"CODE POSTAL");
      event.value = "";
    }
    I also added a keystroke event to prescan every digit and toUpperCase() them.
    //accepts letters, numbers, backspace or spaces
    var re = /[ABCEGHJKLMNPRSTVWXYZ]|\d|^$|\s/i;
    if (!event.willCommit){
      if(re.test(event.change) == false){
        event.rc = false;
      }
      else{
        event.change = event.change.toUpperCase();
      }
    }
    I want to replicate the behavior of an arbitrary mask where I can force a character to appear at a specific position.  In this example, instead of letting the user decide whether he adds a "space" or not between the two parts of the code, I want to:
    1-Remove "\s" in the keystroke regexp so the user cannot use "space";
    2-Replace "( )?" for "( )" in the validation regexp;
    3-Force a space at charAt(3).
    Is it possible to have it appear while the user is still typing, or is my only option to change the value once committed or as a custom format script?
    In the same manner, is it possible to specify a different keystroke regexp depending on the position of each character?
    charAt(0), charAt(2), charAt(5) would be letters
    charAt(1), charAt(4), charAt(6) would be numbers
    charAt(3) would automatically add a "space"
    In the end, I want the same behavior as an arbitrary mask without the annoying alert when entering a wrong digit.

    The entire string validation is different from the keystroke validation. The keystroke validation can check each keystroke as it is typed while the string validation can only process the entire string at one time.
    You need to break your RegExp to describe the requirement for each character. You can even specify the possible number of repetitions of the pattern.
    A discussion on how to create keystroke validations would be far too long for a post in a forum.
    Acumen has an ebook about Acrobat forms, with a couple of chapters on simple validation and formatting scripts.
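
    To make the "one rule per character position" idea concrete, here is a language-neutral sketch of the logic, written in Java rather than Acrobat JavaScript. It does not use any Acrobat API: the class name PostalCodeMask and the method type are hypothetical, and wiring the same logic into a keystroke event (event.value, event.change, event.rc) is left as an assumption to verify against the Acrobat documentation.

```java
import java.util.regex.Pattern;

class PostalCodeMask {
    private static final Pattern FIRST_LETTER = Pattern.compile("[ABCEGHJKLMNPRSTVXY]");
    private static final Pattern LETTER = Pattern.compile("[ABCEGHJKLMNPRSTVWXYZ]");
    private static final Pattern DIGIT = Pattern.compile("\\d");

    // Returns the field value after the keystroke, or null if it is rejected.
    static String type(String current, char typed) {
        String value = current;
        if (value.length() == 3) {
            value += " "; // position 3 is always the forced space
        }
        int pos = value.length();
        if (pos > 6) {
            return null; // "A9A 9A9" is already complete
        }
        String ch = String.valueOf(Character.toUpperCase(typed));
        // positions 0, 2, 5 are letters; positions 1, 4, 6 are digits
        Pattern rule = pos == 0 ? FIRST_LETTER
                     : (pos == 2 || pos == 5) ? LETTER
                     : DIGIT;
        return rule.matcher(ch).matches() ? value + ch : null;
    }

    public static void main(String[] args) {
        String value = "";
        for (char c : "k1a0b1".toCharArray()) {
            value = type(value, c);
        }
        System.out.println(value);
    }
}
```

    Each keystroke is checked against the rule for the position it would occupy, so a rejected character never reaches the field and no alert is needed.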

  • Search  application that can handle regular expressions

    For my PhD, I am desperately seeking an OS X 10.6 application that can search through all my data. The application needs to have *efficient search algorithms* for complex pattern searching.
    For example I want to search all my documents having the word cancer inside the file.
    Not only search for the filename cancer BUT search inside the documents with all the extensions (pdf, rtf, doc, etc.)
    In Windows I use +Filelocator Pro+.
    Is there a Mac OSX application like +Filelocator Pro+ for search?
    An application that can handle regular expression support, with any of the following options:
    +Export results to Text, command line options, Network drive searching, Boolean searches (e.g. AND, OR, and NOT), Perl compatible regexp option, Built in file viewer, Word, Excel and PDF searching, Open Office, Word Perfect option using IFilters, Unicode support, support for: ZIP, RAR, CAB, 7-Zip, ARJ, Bzip, CHM, CPIO, DEB, DMG, GZIP, HFS, ISO, LZH, MSI, NSIS, RPM, TAR, UDF, WIM, XAR, Z formats, Active Scripting support, Export as Text, CSV, XML, HTML, or XSLT custom format, File attribute searching , Relative date/time searches, Repositionable contents pane, Search within search, Exclude folders list,+ etc

    You might try the freeware EasyFind. It allows boolean and wildcard searches. How many of the other features in your "wish list" it offers I haven't checked. If EasyFind doesn't offer sufficient power, take a look at FoxTrot Personal Search or FoxTrot Professional Search.
    And of course there's always grep which can be incredibly powerful once you learn all its ins and outs.
    Regards.
