Ignoring whitespace in regular expression

Dear all,
could anyone advice me how Oracle evaluate the following query:
SQL> select regexp_substr('Baba black','b[a-z]+ [a-z]+',1,1,'x') from dual;
REGEX
blackI thought it would return 'ba', but instead it returns 'black'. How come 'ba' in the first word isn't detected by the pattern?
Best regards,
Val

Hi, Val,
Valerie Debonair wrote:
Dear all,
could anyone advice me how Oracle evaluate the following query:
SQL> select regexp_substr('Baba black','b[a-z]+ [a-z]+',1,1,'x') from dual;
REGEX
blackI thought it would return 'ba', but instead it returns 'black'. How come 'ba' in the first word isn't detected by the pattern?Aside from case-sensitivity, that expression would never return a 2-character string, because you're looking for at least 3 characacters:
'b[a-z]+ [a-z]+'
1 2      3(1) the letter b
(2) any letter(s)
(3) any letter(s) again
If you want Non-Greedy searching (where the + sign matches as little as possible, not as much as possible) us +? instead of +
SELECT      REGEXP_SUBSTR ( 'Baba black'
                , 'b[a-z]+? [a-z]+?'
                , 1
                , 1
                , 'xi'
FROM      dual;Ouptut:
REG
BabYou could get the same results by explicitly saying you want exactly 2 characters after the 'b', like this:
SELECT      REGEXP_SUBSTR ( 'Baba black'
                , 'b[a-z]{2}'
                , 1
                , 1
                , 'xi'
FROM      dual;

Similar Messages

  • What's wrong with the regular expression?

    Hi all,
    For the life of me I can not figure out what is wrong with this regular expression
    .*\bA specific phrase\.\b.*
    This is just an example the actual phrase can be an specific phrase. My problem comes when the specific phrase ends in a period. I've escaped the period but it still gives me an error. The only time I don't get an error is when I take off the end boundry character which will not suffice as a solution. I need to be able to capture all the text before and after said phrase. If the phrase doesn't have a period it would look like this...
    .*\bA specific phrase\b.*
    which works fine. So what is it about the \.\b combination that is not matching?
    I've been banging my head on this for a while and I'm getting nowhere.
    The application highlights text that comes from a server. The user builds custom highlights that have some options. Highlight entire line, match partial word, and ignore case. The code that builds my pattern is here
    String strHighlight = _strHighlight;
            strHighlight = strHighlight.replaceAll("\\*", "\\\\*");
            strHighlight = strHighlight.replaceAll("\\.", "\\\\.");
            String strPattern = strHighlight;
            if(_bEntireParagraph)
                if(_bPartialWord)
                    strPattern = ".*" + strHighlight + ".*";
                else               
                    strPattern = ".*\\b" + strHighlight + "\\b.*";           
            else
                if(_bPartialWord)
                    strPattern = strHighlight;
                else               
                    strPattern = "\\b" + strHighlight + "\\b";  
            if(_bIgnoreCase)
                _patHighlight = Pattern.compile(strPattern, Pattern.CASE_INSENSITIVE);
            else
                _patHighlight = Pattern.compile(strPattern);So for example I matching the phrase: The dog ate the cat. And that phrase came over in the following text: Look there's a dog. The dog ate the cat. "Oh my!"
    And my user has the entire line and ignore case options selected then my regex woud look like this: .*\bThe dog ate the cat\b.*
    That should get highlighted, but for some reason it doesn't. Correct me if I'm wrong but doesn't the regex read as follows:
    any characters
    word boundry
    The dog ate the cat[period]
    word boundry
    any characters until newline.
    Any help will be much appreciated

    A word boundary (in the context of regexes) is a position that is either followed by a word character and not preceded by one (start of word) or preceded by a word character and not followed by one (end of word). A word character is defined as a letter, a digit, or an underscore. Since a period is not a word character, the only way the position following it could be a word boundary is if the next character is a letter, digit or underscore. But a sentence-ending period is always followed by whitespace, if anything, so it makes no sense to look for a word boundary there. I think, instead of \b, you should use negative lookarounds, like so:   strPattern = ".*(?<!\\w)" + strHighlight + "(?!\\w).*";

  • Introduction to regular expressions ... last part.

    Continued from Introduction to regular expressions ... continued., here's the third and final part of my introduction to regular expressions. As always, if you find mistakes or have examples that you think could be solved through regular expressions, please post them.
    Having fun with regular expressions - Part 3
    In some cases, I may have to search for different values in the same column. If the searched values are fixed, I can use the logical OR operator or the IN clause, like in this example (using my brute force data generator from part 2):
    SELECT data
      FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE data IN ('abc', 'xyz', '012');There are of course some workarounds as presented in this asktom thread but for a quick solution, there's of course an alternative approach available. Remember the "|" pipe symbol as OR operator inside regular expressions? Take a look at this:
    SELECT data
      FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE REGEXP_LIKE(data, '^(abc|xyz|012)$')
    ;I can even use strings composed of values like 'abc, xyz ,  012' by simply using another regular expression to replace "," and spaces with the "|" pipe symbol. After reading part 1 and 2 that shouldn't be too hard, right? Here's my "thinking in regular expression": Replace every "," and 0 or more leading/trailing spaces.
    Ready to try your own solution?
    Does it look like this?
    SELECT data
      FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE REGEXP_LIKE(data, '^(' || REGEXP_REPLACE('abc, xyz ,  012', ' *, *', '|') || ')$')
    ;If I wouldn't use the "^" and "$" metacharacter, this SELECT would search for any occurence inside the data column, which could be useful if I wanted to combine LIKE and IN clause. Take a look at this example where I'm looking for 'abc%', 'xyz%' or '012%' and adding a case insensitive match parameter to it:
    SELECT data
      FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE REGEXP_LIKE(data, '^(abc|xyz|012)', 'i')
    ; An equivalent non regular expression solution would have to look like this, not mentioning other options with adding an extra "," and using the INSTR function:
    SELECT data
      FROM (SELECT data, LOWER(DATA) search
              FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE search LIKE 'abc%'
        OR search LIKE 'xyz%'
        OR search LIKE '012%'
    SELECT data
      FROM (SELECT data, SUBSTR(LOWER(DATA), 1, 3) search
              FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE search IN ('abc', 'xyz', '012')
    ;  I'll leave it to your imagination how a complete non regular example with 'abc, xyz ,  012' as search condition would look like.
    As mentioned in the first part, regular expressions are not very good at formatting, except for some selected examples, such as phone numbers, which in my demonstration, have different formats. Using regular expressions, I can change them to a uniform representation:
    WITH t AS (SELECT '123-4567' phone
                 FROM dual
                UNION
               SELECT '01 345678'
                 FROM dual
                UNION
               SELECT '7 87 8787'
                 FROM dual
    SELECT t.phone, REGEXP_REPLACE(REGEXP_REPLACE(phone, '[^0-9]'), '(.{3})(.*)', '(\1)-\2')
      FROM t
    ;First, all non digit characters are beeing filtered, afterwards the remaining string is put into a "(xxx)-xxxx" format, but not cutting off any phone numbers that have more than 7 digits. Using such a conversion could also be used to check the validity of entered data, and updating the value with a uniform format afterwards.
    Thinking about it, why not use regular expressions to check other values about their formats? How about an IP4 address? I'll do this step by step, using 127.0.0.1 as the final test case.
    First I want to make sure, that each of the 4 parts of an IP address remains in the range between 0-255. Regular expressions are good at string matching but they don't allow any numeric comparisons. What valid strings do I have to take into consideration?
    Single digit values: 0-9
    Double digit values: 00-99
    Triple digit values: 000-199, 200-255 (this one will be the trickiest part)
    So far, I will have to use the "|" pipe operator to match all of the allowed combinations. I'll use my brute force generator to check if my solution works for a single value:
    SELECT data
      FROM TABLE(regex_utils.gen_data('0123456789', 3))
    WHERE REGEXP_LIKE(data, '^(25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})$') 
    ; More than 255 records? Leading zeros are allowed, but checking on all the records, there's no value above 255. First step accomplished. The second part is to make sure, that there are 4 such values, delimited by a "." dot. So I have to check for 0-255 plus a dot 3 times and then check for another 0-255 value. Doesn't sound to complicated, does it?
    Using first my brute force generator, I'll check if I've missed any possible combination:
    SELECT data
      FROM TABLE(regex_utils.gen_data('03.', 15))
    WHERE REGEXP_LIKE(data,
                       '^((25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})$'
    ;  Looks good to me. Let's check on some sample data:
    WITH t AS (SELECT '127.0.0.1' ip
                 FROM dual
                UNION 
               SELECT '256.128.64.32'
                 FROM dual            
    SELECT t.ip
      FROM t WHERE REGEXP_LIKE(t.ip,
                       '^((25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})$'
    ;  No surprises here. I can take this example a bit further and try to format valid addresses to a uniform representation, as shown in the phone number example. My goal is to display every ip address in the "xxx.xxx.xxx.xxx" format, using leading zeros for 2 and 1 digit values.
    Regular expressions don't have any format models like for example the TO_CHAR function, so how could this be achieved? Thinking in regular expressions, I first have to find a way to make sure, that each single number is at least three digits wide. Using my example, this could look like this:
    WITH t AS (SELECT '127.0.0.1' ip
                 FROM dual
    SELECT t.ip, REGEXP_REPLACE(t.ip, '([0-9]+)(\.?)', '00\1\2')
      FROM t
    ;  Look at this: leading zeros. However, that first value "00127" doesn't look to good, does it? If you thought about using a second regular expression function to remove any excess zeros, you're absolutely right. Just take the past examples and think in regular expressions. Did you come up with something like this?
    WITH t AS (SELECT '127.0.0.1' ip
                 FROM dual
    SELECT t.ip, REGEXP_REPLACE(REGEXP_REPLACE(t.ip, '([0-9]+)(\.?)', '00\1\2'),
                                '[0-9]*([0-9]{3})(\.?)', '\1\2'
      FROM t
    ;  Think about the possibilities: Now you can sort a table with unformatted IP addresses, if that is a requirement in your application or you find other values where you can use that "trick".
    Since I'm on checking INET (internet) type of values, let's do some more, for example an e-mail address. I'll keep it simple and will only check on the
    "[email protected]", "[email protected]" and "[email protected]" format, where x represents an alphanumeric character. If you want, you can look up the corresponding RFC definition and try to build your own regular expression for that one.
    Now back to this one: At least one alphanumeric character followed by an "@" at sign which is followed by at least one alphanumeric character followed by a "." dot and exactly 3 more alphanumeric characters or 2 more characters followed by a "." dot and another 2 characters. This should be an easy one, right? Use some sample e-mail addresses and my brute force generator, you should be able to verify your solution.
    Here's mine:
    SELECT data
      FROM TABLE(regex_utils.gen_data('a1@.', 9))
    WHERE REGEXP_LIKE(data, '^[[:alnum:]]+@[[:alnum:]]+(\.[[:alnum:]]{3,4}|(\.[[:alnum:]]{2}){2})$', 'i'); Checking on valid domains, in my opinion, should be done in a second function, to keep the checks by itself simple, but that's probably a discussion about readability and taste.
    How about checking a valid URL? I can reuse some parts of the e-mail example and only have to decide what type of URLs I want, for example "http://", "https://" and "ftp://", any subdomain and a "/" after the domain. Using the case insensitive match parameter, this shouldn't take too long, and I can use this thread's URL as a test value. But take a minute to figure that one out for yourself.
    Does it look like this?
    WITH t AS (SELECT 'Introduction to regular expressions ... last part. URL
                 FROM dual
                UNION
               SELECT 'http://x/'
                 FROM dual
    SELECT t.URL
      FROM t
    WHERE REGEXP_LIKE(t.URL, '^(https*|ftp)://(.+\.)*[[:alnum:]]+(\.[[:alnum:]]{3,4}|(\.[[:alnum:]]{2}){2})/', 'i')
    Update: Improvements in 10g2
    All of you, who are using 10g2 or XE (which includes some of 10g2 features) may want to take a look at several improvements in this version. First of all, there are new, perl influenced meta characters.
    Rewriting my example from the first lesson, the WHERE clause would look like this:
    WHERE NOT REGEXP_LIKE(t.col1, '^\d+$')Or my example with searching decimal numbers:
    '^(\.\d+|\d+(\.\d*)?)$'Saves some space, doesn't it? However, this will only work in 10g2 and future releases.
    Some of those meta characters even include non matching lists, for example "\S" is equivalent to "[^ ]", so my example in the second part could be changed to:
    SELECT NVL(LENGTH(REGEXP_REPLACE('Having fun with regular expressions', '\S')), 0)
      FROM dual
      ;Other meta characters support search patterns in strings with newline characters. Just take a look at the link I've included.
    Another interesting meta character is "?" non-greedy. In 10g2, "?" not only means 0 or 1 occurrence, it means also the first occurrence. Let me illustrate with a simple example:
    SELECT REGEXP_SUBSTR('Having fun with regular expressions', '^.* +')
      FROM dual
      ;This is old style, "greedy" search pattern, returning everything until the last space.
    SELECT REGEXP_SUBSTR('Having fun with regular expressions', '^.* +?')
      FROM dual
      ;In 10g2, you'd get only "Having " because of the non-greedy search operation. Simulating that behavior in 10g1, I'd have to change the pattern to this:
    SELECT REGEXP_SUBSTR('Having fun with regular expressions', '^[^ ]+ +')
      FROM dual
      ;Another new option is the "x" match parameter. It's purpose is to ignore whitespaces in the searched string. This would prove useful in ignoring trailing/leading spaces for example. Checking on unsigned integers with leading/trailing spaces would look like this:
    SELECT REGEXP_SUBSTR(' 123 ', '^[0-9]+$', 1, 1, 'x')
      FROM dual
      ;However, I've to be careful. "x" would also allow " 1 2 3 " to qualify as valid string.
    I hope you enjoyed reading this introduction and hope you'll have some fun with using regular expressions.
    C.
    Fixed some typos ...
    Message was edited by:
    cd
    Included 10g2 features
    Message was edited by:
    cd

    Can I write this condition with only one reg expr in Oracle (regexp_substr in my example)?I meant to use only regexp_substr in select clause and without regexp_like in where clause.
    but for better understanding what I'd like to get
    next example:
    a have strings of two blocks separated by space.
    in the first block 5 symbols of [01] in the second block 3 symbols of [01].
    In the first block it is optional to meet one (!), in the second block it is optional to meet one (>).
    The idea is to find such strings with only one reg expr using regexp_substr in the select clause, so if the string does not satisfy requirments should be passed out null in the result set.
    with t as (select '10(!)010 10(>)1' num from dual union all
    select '1112(!)0 111' from dual union all --incorrect because of '2'
    select '(!)10010 011' from dual union all
    select '10010(!) 101' from dual union all
    select '10010 100(>)' from dual union all
    select '13001 110' from dual union all -- incorrect because of '3'
    select '100!01 100' from dual union all --incorrect because of ! without (!)
    select '100(!)1(!)1 101' from dual union all -- incorrect because of two occurencies of (!)
    select '1001(!)10 101' from dual union all --incorrect because of length of block1=6
    select '1001(!)10 1011' from dual union all) --incorrect because of length of block2=4
    select '10110 1(>)11(>)0' from dual union all)--incorrect because of two occurencies of (>)
    select '1001(>)1 11(!)0' from dual)--incorrect because (!) and (>) are met not in their blocks
    --end of test data

  • Regular Expressions find and replace

    Hi ,
    I have a question on using Regular Expressions in Java(java.util.regex).
    Problem Description:
    I have a string (say for example strHTML) which contains the whole HTML code of a webpage. I want to be able to search for all the image source tags and check whether they are absolute urls to the image source(for eg. <img src="www.google.com/images/logo.gif" >) or relative(for eg. <img src="../images/logo.gif" >).
    If they are realtive urls to the image path, then I wish to replace them with their absolute urls throughout the webpage(in this case inside string strHTML).
    I have to do it inside a servlet and hence have to use java.
    I tried . This is the code. It doesn't match and replace and goes inside an infinite loop i.e probably the pattern matches everything.
    //Change all images to actual http addresses FOR example change src="../images/logo.gif" to src="http://www.google.com/../images/logo.gif"
              String ddurl="http://www.google.com/";
    String strHTML=" < img src=\"../images/logo.gif\" alt=\"Google logo\">";
    Pattern p = Pattern.compile ("(?i)src[\\s]*=[\\s]*[\"\']([./]*.*)[\"\']");
    Matcher m = p.matcher (strHTML);
    while(m.find())
    m.replaceAll(ddurl+m.group(1));
    what is wrong in this?
    Thanks,
    Rajiv

    Right, here's the full monte (whatever that means):import java.util.regex.*;
    public class Test1
      public static void main(String[] args)
        String domain = "http://www.google.com/";
        String strHTML =
          " < img src=\"images/logo.gif\" alt=\"Google logo\">\n" +
          " <img alt=\"Google logo\" src=images/logo.gif >\n" +
          " <IMG SRC=\"/images/logo.gif\" alt=\"Google logo\">\n" +
          " <img alt=\"Google logo\" src=../images/logo.gif>\n" +
          " <img src=http://www.yahoo.com/images/logo.gif alt=\"Yahoo logo\">";
        String regex =
          "(<\\s*img.+?src\\s*=\\s*)   # Capture preliminaries in $1.  \n" +
          "(?:                         # First look for URL in quotes. \n" +
          "   ([\"\'])                 #   Capture open quote in $2.   \n" +
          "   (?!http:)                #   If it isn't absolute...     \n" +
          "   /?(.+?)                  #    ...capture URL in $3       \n" +
          "   \\2                      #   Match the closing quote     \n" +
          " |                          # Look for non-quoted URL.      \n" +
          "   (?!http:)                #   If it isn't absolute...     \n" +
          "   /?([^\\s>]+)             #    ...capture URL in $4       \n" +
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.COMMENTS);
        Matcher m = p.matcher(strHTML);
        StringBuffer sbuf = new StringBuffer();
        while (m.find())
          String relURL = m.group(3) != null ? m.group(3) : m.group(4);
          m.appendReplacement(sbuf, "$1\"" + domain + relURL + "\"");
        m.appendTail(sbuf);
        System.out.println(sbuf.toString());
    }First off, observe that I'm using free-spacing (or "COMMENTS") mode to make the regex easier to read--all the whitespace and comments will be ignored by the Pattern compiler. I also used the CASE_INSENSITIVE flag instead of an embedded (?i), just to remove some clutter. By the way, your second (?i) was redundant; the first one would remain in effect until "turned off" with a (?-i). Another way to localize a flag's effect by using it within a non-capturing group, e.g., (?i:img).
    As jaylogan said, the best way to filter out absolute URL's is by using a negative lookahead, and that's what I've done here. The problem of optional quotes I addressed by trying to match first with quotes, then without. The all-in-one approach might work with URL's, since they can't (AFAIK) contain whitespace anyway, but the alternation method can be used to match any attribute/value pair. It's also, I feel, easier to understand and maintain. Unfortunately, it also means that you can't use replaceAll(), since you have to determine which alternative matched before doing the replacement, but the long version is still pretty simple (especially when you can just copy it from the javadoc for the appendReplacement() method, as I did).

  • Help with regular expression needed

    Hi,
    Perhaps someone here can help me with my regular expression I'm trying to build in my Java code.
    The regular expression that I'm looking to build consists of any non-whitespace character up until it finds one or two <>= symbols and then any character thereafter. So both these Strings would match the expression:
    City 1==London
    Age>=18
    The regular expression that I'm using is as follows:
    (\\S+)([><=]){1,2}(.+)However, group 1 always retrieves the first <>= symbol as in "City 1=". How can I make the <>= part greedy so that it retrieves both operator symbols?
    Thanks.

    Make the first group, the non-spaces, reluctant:
    "(\\S+?)([<>=]{1,2})(.+)"

  • [SOLVED]ZSH and regular expressions

    Hi
    I am getting into regular expressions and i have noticed that with my .zshrc file i have some problem. In bash this expression works:
    \^\[^#]
    but not also in zsh. I have also noted that regular expression works fine with other zshrc configurations found in archwiki (like grml) but i want to have my configuration. And i really can't find what command make a difference
    My .zshrc file is pulled from this site https://github.com/slashbeast/things/bl … s/DOTzshrc.
    # .zshrc
    # Author: Piotr Karbowski <[email protected]>
    # License: beerware.
    # Basic zsh config.
    umask 077
    ZDOTDIR=${ZDOTDIR:-${HOME}}
    ZSHDDIR="${HOME}/.config/zsh.d"
    HISTFILE="${ZDOTDIR}/.zsh_history"
    HISTSIZE='10000'
    SAVEHIST="${HISTSIZE}"
    export EDITOR="/usr/bin/vim"
    export TMP="$HOME/tmp"
    export TEMP="$TMP"
    export TMPDIR="$TMP"
    export TMPPREFIX="${TMPDIR}/zsh"
    if [ ! -d "${TMP}" ]; then mkdir "${TMP}"; fi
    if ! [[ "${PATH}" =~ "^${HOME}/bin" ]]; then
    export PATH="${HOME}/bin:${PATH}"
    fi
    # Not all servers have terminfo for rxvt-256color. :<
    if [ "${TERM}" = 'rxvt-256color' ] && ! [ -f '/usr/share/terminfo/r/rxvt-256color' ] && ! [ -f '/lib/terminfo/r/rxvt-256color' ] && ! [ -f "${HOME}/.terminfo/r/rxvt-256color" ]; then
    export TERM='rxvt-unicode'
    fi
    # Colors.
    red='\e[0;31m'
    RED='\e[1;31m'
    green='\e[0;32m'
    GREEN='\e[1;32m'
    yellow='\e[0;33m'
    YELLOW='\e[1;33m'
    blue='\e[0;34m'
    BLUE='\e[1;34m'
    purple='\e[0;35m'
    PURPLE='\e[1;35m'
    cyan='\e[0;36m'
    CYAN='\e[1;36m'
    NC='\e[0m'
    # Functions
    if [ -f '/etc/profile.d/prll.sh' ]; then
    . "/etc/profile.d/prll.sh"
    fi
    run_under_tmux() {
    # Run $1 under session or attach if such session already exist.
    # $2 is optional path, if no specified, will use $1 from $PATH.
    # If you need to pass extra variables, use $2 for it as in example below..
    # Example usage:
    # torrent() { run_under_tmux 'rtorrent' '/usr/local/rtorrent-git/bin/rtorrent'; }
    # mutt() { run_under_tmux 'mutt'; }
    # irc() { run_under_tmux 'irssi' "TERM='screen' command irssi"; }
    # There is a bug in linux's libevent...
    # export EVENT_NOEPOLL=1
    command -v tmux >/dev/null 2>&1 || return 1
    if [ -z "$1" ]; then return 1; fi
    local name="$1"
    if [ -n "$2" ]; then
    local file_path="$2"
    else
    local file_path="command ${name}"
    fi
    if tmux has-session -t "${name}" 2>/dev/null; then
    tmux attach -d -t "${name}"
    else
    tmux new-session -s "${name}" "${file_path}" \; set-option status \; set set-titles-string "${name} (tmux@${HOST})"
    fi
    t() { run_under_tmux rtorrent; }
    irc() { run_under_tmux irssi "TERM='screen' command irssi"; }
    over_ssh() {
    if [ -n "${SSH_CLIENT}" ]; then
    return 0
    else
    return 1
    fi
    reload () {
    exec "${SHELL}" "$@"
    confirm() {
    local answer
    echo -ne "zsh: sure you want to run '${YELLOW}$@${NC}' [yN]? "
    read -q answer
    echo
    if [[ "${answer}" =~ ^[Yy]$ ]]; then
    command "${=1}" "${=@:2}"
    else
    return 1
    fi
    confirm_wrapper() {
    if [ "$1" = '--root' ]; then
    local as_root='true'
    shift
    fi
    local runcommand="$1"; shift
    if [ "${as_root}" = 'true' ] && [ "${USER}" != 'root' ]; then
    runcommand="sudo ${runcommand}"
    fi
    confirm "${runcommand}" "$@"
    poweroff() { confirm_wrapper --root $0 "$@"; }
    reboot() { confirm_wrapper --root $0 "$@"; }
    hibernate() { confirm_wrapper --root $0 "$@"; }
    detox() {
    if [ "$#" -ge 1 ]; then
    confirm detox "$@"
    else
    command detox "$@"
    fi
    has() {
    local string="${1}"
    shift
    local element=''
    for element in "$@"; do
    if [ "${string}" = "${element}" ]; then
    return 0
    fi
    done
    return 1
    begin_with() {
    local string="${1}"
    shift
    local element=''
    for element in "$@"; do
    if [[ "${string}" =~ "^${element}" ]]; then
    return 0
    fi
    done
    return 1
    termtitle() {
    case "$TERM" in
    rxvt*|xterm|nxterm|gnome|screen|screen-*)
    local prompt_host="${(%):-%m}"
    local prompt_user="${(%):-%n}"
    local prompt_char="${(%):-%~}"
    case "$1" in
    precmd)
    printf '\e]0;%s@%s: %s\a' "${prompt_user}" "${prompt_host}" "${prompt_char}"
    preexec)
    printf '\e]0;%s [%s@%s: %s]\a' "$2" "${prompt_user}" "${prompt_host}" "${prompt_char}"
    esac
    esac
    git_check_if_worktree() {
    # This function intend to be only executed in chpwd().
    # Check if the current path is in git repo.
    # We would want stop this function, on some big git repos it can take some time to cd into.
    if [ -n "${skip_zsh_git}" ]; then
    git_pwd_is_worktree='false'
    return 1
    fi
    # The : separated list of paths where we will run check for git repo.
    # If not set, then we will do it only for /root and /home.
    if [ "${UID}" = '0' ]; then
    # running 'git' in repo changes owner of git's index files to root, skip prompt git magic if CWD=/home/*
    git_check_if_workdir_path="${git_check_if_workdir_path:-/root:/etc}"
    else
    git_check_if_workdir_path="${git_check_if_workdir_path:-/home}"
    git_check_if_workdir_path_exclude="${git_check_if_workdir_path_exclude:-${HOME}/_sshfs}"
    fi
    if begin_with "${PWD}" ${=git_check_if_workdir_path//:/ }; then
    if ! begin_with "${PWD}" ${=git_check_if_workdir_path_exclude//:/ }; then
    local git_pwd_is_worktree_match='true'
    else
    local git_pwd_is_worktree_match='false'
    fi
    fi
    if ! [ "${git_pwd_is_worktree_match}" = 'true' ]; then
    git_pwd_is_worktree='false'
    return 1
    fi
    # todo: Prevent checking for /.git or /home/.git, if PWD=/home or PWD=/ maybe...
    # damn annoying RBAC messages about Access denied there.
    if [ -d '.git' ] || [ "$(git rev-parse --is-inside-work-tree 2> /dev/null)" = 'true' ]; then
    git_pwd_is_worktree='true'
    git_worktree_is_bare="$(git config core.bare)"
    else
    unset git_branch git_worktree_is_bare
    git_pwd_is_worktree='false'
    fi
    git_branch() {
    git_branch="$(git symbolic-ref HEAD 2>/dev/null)"
    git_branch="${git_branch##*/}"
    git_branch="${git_branch:-no branch}"
    git_dirty() {
    if [ "${git_worktree_is_bare}" = 'false' ] && [ -n "$(git status --untracked-files='no' --porcelain)" ]; then
    git_dirty='%F{green}*'
    else
    unset git_dirty
    fi
    precmd() {
    # Set terminal title.
    termtitle precmd
    if [ "${git_pwd_is_worktree}" = 'true' ]; then
    git_branch
    git_dirty
    git_prompt=" %F{blue}[%F{253}${git_branch}${git_dirty}%F{blue}]"
    else
    unset git_prompt
    fi
    preexec() {
    # Set terminal title along with current executed command pass as second argument
    termtitle preexec "${(V)1}"
    chpwd() {
    git_check_if_worktree
    man() {
    if command -v vimmanpager >/dev/null 2>&1; then
    PAGER="vimmanpager" command man "$@"
    else
    command man "$@"
    fi
    # Are we running under grsecurity's RBAC?
    rbac_auth() {
    local auth_to_role='admin'
    if [ "${USER}" = 'root' ]; then
    if ! grep -qE '^RBAC:' "/proc/self/status" && command -v gradm > /dev/null 2>&1; then
    echo -e "\n${BLUE}*${NC} ${GREEN}RBAC${NC} Authorize to '${auth_to_role}' RBAC role."
    gradm -a "${auth_to_role}"
    fi
    fi
    #rbac_auth
    # Check if we started zsh in git worktree, useful with tmux when your new zsh may spawn in source dir.
    git_check_if_worktree
    if [ "${git_pwd_is_worktree}" = 'true' ]; then
    git_branch
    git_dirty
    git_prompt=" %F{blue}[%F{253}${git_branch}${git_dirty}%F{blue}]"
    else
    unset git_prompt
    fi
    # Le features!
    # extended globbing, awesome!
    setopt extendedGlob
    # zmv - a command for renaming files by means of shell patterns.
    autoload -U zmv
    # zargs, as an alternative to find -exec and xargs.
    autoload -U zargs
    # Turn on command substitution in the prompt (and parameter expansion and arithmetic expansion).
    setopt promptsubst
    # Control-x-e to open current line in $EDITOR, awesome when writting functions or editing multiline commands.
    autoload -U edit-command-line
    zle -N edit-command-line
    bindkey '^x^e' edit-command-line
    # Include user-specified configs.
    if [ ! -d "${ZSHDDIR}" ]; then
    mkdir -p "${ZSHDDIR}" && echo "# Put your user-specified config here." > "${ZSHDDIR}/example.zsh"
    fi
    for zshd in $(ls -A ${HOME}/.config/zsh.d/^*.(z)sh$); do
    . "${zshd}"
    done
    # Completion.
    autoload -Uz compinit
    compinit
    zstyle ':completion:*' matcher-list 'm:{a-z}={A-Z}'
    zstyle ':completion:*' completer _expand _complete _ignored _approximate
    zstyle ':completion:*' menu select=2
    zstyle ':completion:*' select-prompt '%SScrolling active: current selection at %p%s'
    zstyle ':completion::complete:*' use-cache 1
    zstyle ':completion:*:descriptions' format '%U%F{cyan}%d%f%u'
    # If running as root and nice >0, renice to 0.
    if [ "$USER" = 'root' ] && [ "$(cut -d ' ' -f 19 /proc/$$/stat)" -gt 0 ]; then
    renice -n 0 -p "$$" && echo "# Adjusted nice level for current shell to 0."
    fi
    # Fancy prompt.
    if over_ssh && [ -z "${TMUX}" ]; then
    prompt_is_ssh='%F{blue}[%F{red}SSH%F{blue}] '
    elif over_ssh; then
    prompt_is_ssh='%F{blue}[%F{253}SSH%F{blue}] '
    else
    unset prompt_is_ssh
    fi
    case $USER in
    root)
    PROMPT='%B%F{cyan}%m%k %(?..%F{blue}[%F{253}%?%F{blue}] )${prompt_is_ssh}%B%F{blue}%1~${git_prompt}%F{blue} %# %b%f%k'
    PROMPT='%B%F{blue}%n@%m%k %(?..%F{blue}[%F{253}%?%F{blue}] )${prompt_is_ssh}%B%F{cyan}%1~${git_prompt}%F{cyan} %# %b%f%k'
    esac
    # Ignore lines prefixed with '#'.
    setopt interactivecomments
    # Ignore duplicate in history.
    setopt hist_ignore_dups
    # Prevent record in history entry if preceding them with at least one space
    setopt hist_ignore_space
    # Nobody need flow control anymore. Troublesome feature.
    #stty -ixon
    setopt noflowcontrol
    # Fix for tmux on linux.
    case "$(uname -o)" in
    'GNU/Linux')
    export EVENT_NOEPOLL=1
    esac
    # Aliases
    alias cp='cp -iv'
    alias rcp='rsync -v --progress'
    alias rmv='rsync -v --progress --remove-source-files'
    alias mv='mv -iv'
    alias rm='rm -iv'
    alias rmdir='rmdir -v'
    alias ln='ln -v'
    alias chmod="chmod -c"
    alias chown="chown -c"
    if command -v colordiff > /dev/null 2>&1; then
    alias diff="colordiff -Nuar"
    else
    alias diff="diff -Nuar"
    fi
    alias grep='grep --colour=auto'
    alias egrep='egrep --colour=auto'
    alias ls='ls --color=auto --human-readable --group-directories-first --classify'
    # Keys.
    case $TERM in
    rxvt*|xterm*)
    bindkey "^[[7~" beginning-of-line #Home key
    bindkey "^[[8~" end-of-line #End key
    bindkey "^[[3~" delete-char #Del key
    bindkey "^[[A" history-beginning-search-backward #Up Arrow
    bindkey "^[[B" history-beginning-search-forward #Down Arrow
    bindkey "^[Oc" forward-word # control + right arrow
    bindkey "^[Od" backward-word # control + left arrow
    bindkey "^H" backward-kill-word # control + backspace
    bindkey "^[[3^" kill-word # control + delete
    linux)
    bindkey "^[[1~" beginning-of-line #Home key
    bindkey "^[[4~" end-of-line #End key
    bindkey "^[[3~" delete-char #Del key
    bindkey "^[[A" history-beginning-search-backward
    bindkey "^[[B" history-beginning-search-forward
    screen|screen-*)
    bindkey "^[[1~" beginning-of-line #Home key
    bindkey "^[[4~" end-of-line #End key
    bindkey "^[[3~" delete-char #Del key
    bindkey "^[[A" history-beginning-search-backward #Up Arrow
    bindkey "^[[B" history-beginning-search-forward #Down Arrow
    bindkey "^[Oc" forward-word # control + right arrow
    bindkey "^[Od" backward-word # control + left arrow
    bindkey "^H" backward-kill-word # control + backspace
    bindkey "^[[3^" kill-word # control + delete
    esac
    bindkey "^R" history-incremental-pattern-search-backward
    bindkey "^S" history-incremental-pattern-search-forward
    if [ -f ~/.alert ]; then cat ~/.alert; fi
    Thanks for all the help.
    Last edited by Shark (2013-05-11 22:32:24)

    Raynman wrote:
    "This expression doesn't work", "It doesn't work" ...
    Could you try being a bit more specific?
    Firstly, i am sorry i didn't post the output. I should have know better.
    Secondly, chill out.
    I have used above regex with grep command. Output from terminal is:
    zsh: bad pattern: ^[^#]
    In bash it works perfectly.
    If i issue "setopt re_match_pcre" i have the same ouput as above.
    EDIT: If i issue "unsetopt no_match" it actually works but i have to change the regex from "\^\[^#]" to "\^[^#]" otherwise i get the same output as above. In bash both options work.
    Last edited by Shark (2013-05-11 22:07:21)

  • Regular expression question (should be an easy one...)

    i'm using java to build a parser. im getting an expression, which i split on a white-space.
    how can i build a regular-expression that will enable me to split only on unquoted space? example:
    for the expression:
    (X=33 AND Y=44) OR (Z="hello world" AND T=2)
    I will get the following values split:
    (X=33
    AND
    Y=34)
    OR
    (Z="hello world"
    AND
    T=2)
    and not:
    (Z="
    hello
    world"
    thank you very much!

    Instead of splitting on whitespace to get a list of tokens, use Matcher.find() to match the tokens themselves: import java.util.*;
    import java.util.regex.*;
    public class Test
      public static void main(String[] args) throws Exception
        String str = "(X=33 AND Y=44) OR (Z=\"hello world\" AND T=2)";
        List<String> tokens = new ArrayList<String>();
        Matcher m = Pattern.compile("[^\\s\"]+(?:\".*?\")?").matcher(str);
        while (m.find())
          tokens.add(m.group());
        System.out.println(tokens);
    }{code} The regex I used is based on the assumptions that there will be at most one run of quoted text per token, that it will always appear in the right hand side of an expression, and that the closing quote will always mark the end of the token.  If the rules are more complicated (as sabre150 suggested), a more complicated regex will be needed.  You might be better off doing the parsing the old-fashioned way, with out regexes.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           

  • Introduction to regular expressions ...

    I'm well aware that there are already some articles on that topic, some people asked me to share some of my knowledge on this topic. Please take a look at this first part and let me know if you find this useful. If yes, I'm going to continue on writing more parts using more and more complicated expressions - if you have questions or problems that you think could be solved through regular expression, please post them.
    Introduction
    Oracle has always provided some character/string functions in its PL/SQL command set, such as SUBSTR, REPLACE or TRANSLATE. With 10g, Oracle finally gave us, the users, the developers and of course the DBAs regular expressions. However, regular expressions, due to their sometimes cryptic rules, seem to be overlooked quite often, despite the existence of some very interesing use cases. Beeing one of the advocates of regular expression, I thought I'll give the interested audience an introduction to these new functions in several installments.
    Having fun with regular expressions - Part 1
    Oracle offers the use of regular expression through several functions: REGEXP_INSTR, REGEXP_SUBSTR, REGEXP_REPLACE and REGEXP_LIKE. The second part of each function already gives away its purpose: INSTR for finding a position inside a string, SUBSTR for extracting a part of a string, REPLACE for replacing parts of a string. REGEXP_LIKE is a special case since it could be compared to the LIKE operator and is therefore usually used in comparisons like IF statements or WHERE clauses.
    Regular expressions excel, in my opinion, in search and extraction of strings, using that for finding or replacing certain strings or check for certain formatting criterias. They're not very good at formatting strings itself, except for some special cases I'm going to demonstrate.
    If you're not familiar with regular expression, you should take a look at the definition in Oracle's user guide Using Regular Expressions With Oracle Database, and please note that there have been some changes and advancements in 10g2. I'll provide examples, that should work on both versions.
    Some of you probably already encountered this problem: checking a number inside a string, because, for whatever reason, a column was defined as VARCHAR2 and not as NUMBER as one would have expected.
    Let's check for all rows where column col1 does NOT include an unsigned integer. I'll use this SELECT for demonstrating different values and search patterns:
    WITH t AS (SELECT '456' col1
                 FROM dual
                UNION
               SELECT '123x'
                 FROM dual
                UNION  
               SELECT 'x123'
                 FROM dual
                UNION 
               SELECT 'y'
                 FROM dual
                UNION 
               SELECT '+789'
                 FROM dual
                UNION 
               SELECT '-789'
                 FROM dual
                UNION 
               SELECT '159-'
                 FROM dual
                UNION 
               SELECT '-1-'
                 FROM dual
    SELECT t.col1
      FROM t
    WHERE NOT REGEXP_LIKE(t.col1, '^[0-9]+$')
    ;Let's take a look at the 2nd argument of this REGEXP function: '^[0-9]+$'. Translated it would mean: start at the beginning of the string, check if there's one or more characters in the range between '0' and '9' (also called a matching character list) until the end of this string. "^", "[", "]", "+", "$" are all Metacharacters.
    To understand regular expressions, you have to "think" in regular expressions. Each regular expression tries to "fit" an available string into its pattern and returns a result beeing successful or not, depending on the function. The "art" of using regular expressions is to construct the right search pattern for a certain task. Using functions like TRANSLATE or REPLACE did already teach you using search patterns, regular expressions are just an extension to this paradigma. Another side note: most of the search patterns are placeholders for single characters, not strings.
    I'll take this example a bit further. What would happen if we would remove the "$" in our example? "$" means: (until the) end of a string. Without this, this expression would only search digits from the beginning until it encounters either another character or the end of the string. So this time, '123x' would be removed from the SELECTION since it does fit into the pattern.
    Another change: we will keep the "$" but remove the "^". This character has several meanings, but in this case it declares: (start from the) beginning of a string. Without it, the function will search for a part of a string that has only digits until the end of the searched string. 'x123' would now be removed from our selection.
    Now there's a question: what happens if I remove both, "^" and "$"? Well, just think about it. We now ask to find any string that contains at least one or more digits, so both '123x' and 'x123' will not show up in the result.
    So what if I want to look for signed integer, since "+" is also used for a search expression. Escaping is the name of the game. We'll just use '^\+[0-9]+$' Did you notice the "\" before the first "+"? This is now a search pattern for the plus sign.
    Should signed integers include negative numbers as well? Of course they should, and I'll once again use a matching character list. In this list, I don't need to do escaping, although it is possible. The result string would now look like this: '^[+-]?[0-9]+$'. Did you notice the "?"? This is another metacharacter that changes the placeholder for plus and minus to an optional placeholder, which means: if there's a "+" or "-", that's ok, if there's none, that's also ok. Only if there's a different character, then again the search pattern will fail.
    Addendum: From this on, I found a mistake in my examples. If you would have tested my old examples with test data that would have included multiple signs strings, like "--", "-+", "++", they would have been filtered by the SELECT statement. I mistakenly used the "*" instead of the "?" operator. The reason why this is a bad idea, can also be found in the user guide: the "*" meta character is defined as 0 to multiple occurrences.
    Looking at the values, one could ask the question: what about the integers with a trailing sign? Quite simple, right? Let's just add another '[+-] and the search pattern would look like this: '^[+-]?[0-9]+[+-]?$'.
    Wait a minute, what happened to the row with the column value "-1-"?
    You probably already guessed it: the new pattern qualifies this one also as a valid string. I could now split this pattern into several conditions combined through a logical OR, but there's something even better: a logical OR inside the regular expression. It's symbol is "|", the pipe sign.
    Changing the search pattern again to something like this '^[+-]?[0-9]+$|^[0-9]+[+-]?$' [1] would return now the "-1-" value. Do I have to duplicate the same elements like "^" and "$", what about more complicated, repeating elements in future examples? That's where subexpressions/grouping comes into play. If I want only certain parts of the search pattern using an OR operator, we can put those inside round brackets. '^([+-]?[0-9]+|[0-9]+[+-]?)$' serves the same purpose and allows for further checks without duplicating the whole pattern.
    Now looking for integers is nice, but what about decimal numbers? Those may be a bit more complicated, but all I have to do is again to think in (meta) characters. I'll just use an example where the decimal point is represented by ".", which again needs escaping, since it's also the place holder in regular expressions for "any character".
    Valid decimals in my example would be ".0", "0.0", "0.", "0" (integer of course) but not ".". If you want, you can test it with the TO_NUMBER function. Finding such an unsigned decimal number could then be formulated like this: from the beginning of a string we will either allow a decimal point plus any number of digits OR at least one digits plus an optional decimal point followed by optional any number of digits. Think about it for a minute, how would you formulate such a search pattern?
    Compare your solution to this one:
    '^(\.[0-9]+|[0-9]+(\.[0-9]*)?)$'
    Addendum: Here I have to use both "?" and "*" to make sure, that I can have 0 to many digits after the decimal point, but only 0 to 1 occurrence of this substrings. Otherwise, strings like "1.9.9.9" would be possible, if I would write it like this:
    '^(\.[0-9]+|[0-9]+(\.[0-9]*)*)$'Some of you now might say: Hey, what about signed decimal numbers? You could of course combine all the ideas so far and you will end up with a very long and almost unreadable search pattern, or you start combining several regular expression functions. Think about it: Why put all the search patterns into one function? Why not split those into several steps like "check for a valid decimal" and "check for sign".
    I'll just use another SELECT to show what I want to do:
    WITH t AS (SELECT '0' col1
                 FROM dual
                UNION
               SELECT '0.' 
                 FROM dual
                UNION
               SELECT '.0' 
                 FROM dual
                UNION
               SELECT '0.0' 
                 FROM dual
                UNION
               SELECT '-1.0' 
                 FROM dual
                UNION
               SELECT '.1-' 
                 FROM dual
                UNION
               SELECT '.' 
                 FROM dual
                UNION
               SELECT '-1.1-' 
                 FROM dual
    SELECT t.*
      FROM t
    ;From this select, the only rows I need to find are those with the column values "." and "-1.1-". I'll start this with a check for valid signs. Since I want to combine this with the check for valid decimals, I'll first try to extract a substring with valid signs through the REGEXP_SUBSTR function:
    NVL(REGEXP_SUBSTR(t.col1, '^([+-]?[^+-]+|[^+-]+[+-]?)$'), ' ')Remember the OR operator and the matching character collections? But several "^"? Some of the meta characters inside a search pattern can have different meanings, depending on their positions and combination with other meta characters. In this case, the pattern translates into: from the beginning of the string search for "+" or "-" followed by at least another character that is not "+" or "-". The second pattern after the "|" OR operator does the same for a sign at the end of the string.
    This only checks for a sign but not if there also only digits and a decimal point inside the string. If the search string fails, for example when we have more than one sign like in the "-1.1-", the function returns NULL. NULL and LIKE don't go together very well, so we'll just add NVL with a default value that tells the LIKE to ignore this string, in this case a space.
    All we have to do now is to combine the check for the sign and the check for a valid decimal number, but don't forget an option for the signs at the beginning or end of the string, otherwise your second check will fail on the signed decimals. Are you ready?
    Does your solution look a bit like this?
    WHERE NOT REGEXP_LIKE(NVL(REGEXP_SUBSTR(t.col1,
                               '^([+-]?[^+-]+|[^+-]+[+-]?)$'),
                           '^[+-]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)[+-]?$'
                          )Now the optional sign checks in the REGEXP_LIKE argument can be added to both ends, since the SUBSTR won't allow any string with signs on both ends. Thinking in regular expression again.
    Continued in Introduction to regular expressions ... continued.
    C.
    Fixed some embarrassing typos ... and mistakes.
    cd

    Excellent write up CD. Very nice indeed. Hopefully you'll be completing parts 2 and 3 some time soon. And with any luck, your article will encourage others to do the same....I know there's a few I'd like to see and a few I'd like to have a go at writing too :-)

  • Regular expressions and input streams

    Hello,
    I am trying to find a way to read characters from a stream and find matches in them with regular expressions. The problem is that the stream may contain a big amount of characters, so I can't read all of them, keep them in a big string, and then perform the searches. Is there a way to perform searches while I read the characters? I scanned the documentation of Pattern and Matcher and found nothing that can help.
    Any ideas?
    Thanks.

    Looking at the method
    public Matcher Pattern.matcher(CharSequence input);
    one could be excused for assuming that one could turn an InputStream into a CharSequence object since a 'stream' implies sequencial access and CharSequence sounds like it provides sequential access.
    This would just require the implementation of 4 method of which two (toString() and subSequence()) could probably be ignored BUT the method you wouild have to implement are charAt() and length() which imply random access rather than sequencial access!
    Using the built in regex your problem looks difficult. If you are just looking for 'equality' matches (simple searching) then you could use Knuth-Morris-Pratt algorithm or the Boyer-Moore algorithm.

  • Regular Expressions and comments

    Hello,
    I've got a problem with regular expressions. I Want to find special words like "todo" in java comments, but wasn't able to manage that satisfactory. I hope someone of you can give me an advice! :-)
    example of a comment:
    * comment text. todo: what is still todo.
    First of all, I tried this:
    /\*(.*?)todo(.*?)\*/
    but this version seems to ignore the borders of the comment and shows me also parts of the code.
    Then I tried this:
    /\*([^/]*?)todo(.*?)\*/
    this version returns good results, but doesn't notice html formatting like
    * <code> .... </code> text. todo: text
    So my idea now is to check whether a star precedes the slash, but I don't really know how to combine it.
    Or is there another simplier solution? Thanks for your hints!
    Greetz Jan

    Thanks for your answer, but when I take this expression the programm seems to hang-up. An operation that usually finishs in 3 secs didn't even in 10 minutes. :-( Do you have any idea what could be wrong?
    btw. can I take this one:
    pattern = Pattern.compile("/\\*(?:[^\\*]+|\\*(?!/))*todo.*?\\*/", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
    matcher = pattern.matcher(text);
    text = matcher.replaceAll("");or do I have to take a this instead of replaceAll():
    while (matcher.find()) {
        text = text.substring(0, matcher.start(1)) + text.substring(matcher.end(1), text.length());
        matcher = pattern.matcher(text);
    }any hints?
    Greetz Jan

  • Regular Expressions and Numbers in Strings

    Looking for someone who has had experience in using Regular Expressions in the Classification Rule Builder.
    We have an eVar that is collecting the number of search results in this fashion:
    <Total Results>_<# of Item 1>_<# of Item 2>_<# of Item 3>_<# of Item 4>
    Example output would look like this:
    150_50_0_25_75
    What we've done is initially create a Regular Expression that looks like this:
    ^(.+)\_(.+)\_(.+)\_(.+)\_(.+)$
    The problem is, it appears in situations where the output contains a zero in one of the slots, the value is ignored and it receives the value in the next place over.  Using the example output shown above, I would end up with values like this:
    $0 150_50_0_25_75
    $1 150
    $2 50
    $3 25
    $4 75
    $5 {null}
    Here's the weird part.  When I perform a test of a single record, it appears like it will work just fine, but when it actually runs in Omniture, it's not working as expected.  Here's something else I'd like to know if it's possible to address.  The five-place string is only the newest iteration of this approach.  In the past, we started out with a two-place version, then three-place and then four.  Any recommendations for handling all scenarios?
    Any and all advice is welcome.  Thanks in advance!

    Doing some playing around on rubular.com and thinking the Regular Expression should be build this way instead:
    ^(\d+)\_(\d+)\_(\d+)\_(\d+)\_(\d+)$
    Again, still looking for any additional guidance from more experienced individuals.  Thanks!

  • Regular Expressions - Logical AND

    I know this isn't Java, and it's not really an algorithm, but I can't figure this out. I hope amongst this bright group of developers someone can help me.
    I am searching for a regular expressions that will match a series of words.
    Example:
    Given the words: "ship book"
    What regular expression could be used to find both the word "ship" and the "book"?
    I have found one expression that will do it ... ship.*book|book.*ship
    But that expression doesn't scale. Does anyone know of a better way?
    Thanks,
    BacMan

    Hi,
    How about something like:
    public class Regexp1
       private static final Pattern all = Pattern.compile(
             "^(\\s*\\b(monkey|turnip|ship|book)\\b\\s*)*$" );
       private static final Pattern p = Pattern.compile(
             "\\s*\\b(monkey|turnip|ship|book)\\b\\s*" );
       public static void main( String[] argv )
          find( "ship book turnip monkey" );
          find( "monkey giraffe mango" );
          find( "ship shipship ship" );
       public static void find( String text )
          System.out.println( "Text: " + text );
          Matcher m = p.matcher( text );
          System.out.println( "Matches all: " + all.matcher( text ).matches() );
          while ( m.find() )
             System.out.println( "Matching word: '" + m.group(1) + "'" );
       }which will produce this when run:
    Text: ship book turnip monkey
    Matches all: true
    Matching word: 'ship'
    Matching word: 'book'
    Matching word: 'turnip'
    Matching word: 'monkey'
    Text: monkey giraffe mango
    Matches all: false
    Matching word: 'monkey'
    Text: ship shipship ship
    Matches all: false
    Matching word: 'ship'
    Matching word: 'ship'Pattern all tests to see if all the words are present.
    Pattern p finds each matching word and ignores others.
    Ol.

  • Using Regular Expressions to Find Quoted Text

    I have run into a couple problems with the following code.
    1) Slash-Star and Slash-Slash commented text must be ignored.
    2) It does not detect backslashed quotes, or if that backslash is backslashed.
    Can this be accomplished with Regular Expressions, or should I implement this using if/indexOf logic?
    Thank You in advance,
    Brian
        * Finds position of next quoted string in a line
        * of source code.
        * If no strings exist, then a Pointer position of
        * (0,0) is returned.
        * @param startPos position to start search from
        * @param argText  the line of text to search
        * @returns next string position
       public Pointer getQuotedStringPosition(int startPos, String aString) {
          String argText = new String( aString );
          Pattern p = Pattern.compile("[\"][^\"]+[\"]");
          Matcher m = p.matcher( argText.substring(startPos); );
          if( m.find() )
             return new Pointer( m.start() + startPos, m.end() + startPos );
          else
             return new Pointer( 0, 0 ); // indicates nothing was found
       }

    YATArchivist was right about the regular expressions.
    I think I've got it but somebody test it if you want. Let me know what you find.
    I've included a barebones Position class as well...
    import java.util.regex.*;
    import java.io.*;
    import java.util.*;
      @author Joshua A. Logan, Jr.
    public class RegexTest
       private static final String SLASH_SLASH = "(//.*)";
       private static final String SLASH_STAR =
                               "(/\\*(?:[^\\*]|(?:\\*(?!/)))+(\\*/)?)";
       private static final Pattern COMMENT_PATTERN =
                         Pattern.compile( SLASH_SLASH + "|" + SLASH_STAR );
       private static final Pattern QUOTED_STRING_PATTERN =
                      Pattern.compile( "\"  ( (?:(\\\\.) | [^\\\"])*+  )     \"",
                                       Pattern.COMMENTS );
       // Breaking the above regular expression down, you'd have:
       //   "  ( (?: (\\ .)  |  [^\\ "]  ) *+ )   "
       //   ^          ^     ^     ^       ^      ^
       //   |          |     |     |       |      |
       //   1          2     3     4       5      6
       // which matches:
       // 1) The starting quote...
       // Followed by something that is either:
       // 2) some escaped sequence ( e.g. _\n_  or even _\"_ ),
       // 3)                ...or...
       // 4) a character that is neither a _\_ nor a _"_ .
       // 5) Keep searching this as much as possible, w/o giving up
       //                    any found text at the end.
       //        Note: the text found would be in group(1)
       // 6) Finally, find the ending quote!!
       public static Position [] getQuotedStringPosition( final String text )
          Matcher cm = COMMENT_PATTERN.matcher( text ),
                  qm = QUOTED_STRING_PATTERN.matcher( text );
          final int len = text.length();
          int startPos = 0;
          List positions = new ArrayList();
          while ( startPos < len )
             if ( cm.find(startPos) )
                int commStart = cm.start(),
                    commEnd   = cm.end();
                // are we starting @ a comment?
                if ( commStart == startPos )
                   startPos = commEnd;
                else if ( qm.find(startPos) )
                   // Search for unescaped strings in here.
                   int stringStart = qm.start(1),
                       stringEnd   = qm.end(1);
                   // Is the quote start after comment start?
                   if ( stringStart > commStart )
                      startPos = commEnd; // restart search after comment end...
                   else if ( (stringEnd > commEnd) ||
                             (stringEnd < commStart) )
                      // In this case, the "comment" is actually part of
                      // the quoted string. We found a match.
                      positions.add( new Position(text, qm.group(1),
                                                  stringStart,
                                                  stringEnd) );
                      int quoteEnd = qm.end();
                      startPos = quoteEnd;
                   else
                      throw new IllegalStateException( "illegal case" );
                else
                   startPos = commEnd;
             else
                // no comments were found. Search for unescaped strings.
                int quoteEnd = len;
                if ( qm.find( startPos ) ) {
                   quoteEnd = qm.end();
                   positions.add( new Position(text,
                                               qm.group(1),
                                               qm.start(1),
                                               qm.end(1)) );
                startPos = quoteEnd;
          return positions.isEmpty() ? Position.EMPTY_ARRAY
                                     : (Position[])positions.toArray(
                                              Position.EMPTY_ARRAY);
       public static void main( String [] args )
          try
             BufferedReader br = new BufferedReader(
                      new InputStreamReader(System.in) );
             String input = null;
             final String prompt = "\nText (q to quit): ";
             System.out.print( prompt );
             while ( (input = br.readLine()) != null )
                if ( input.equals("q") ) return;
                Position [] matches = getQuotedStringPosition( input );
                // What does it do?
                for ( int i = 0, max = matches.length; i < max; i++ )
                   System.out.println( "-->" + matches[i] );
                System.out.print( prompt );
          catch ( Exception e )
             System.out.println ( "Exception caught: " + e.getMessage () );
    class Position
       public Position( String target,
                        String match,
                        int start,
                        int end )
          this.target = target;
          this.match = match;
          this.start = start;
          this.end = end;
       public String toString()
          return "match==" + match + ",{" + start + "," + end + "}";
       final String target;
       final int start;
       final int end;
       final String match;
       public static final Position [] EMPTY_ARRAY = { };
    }

  • Using regular expressions

    Hi Experts,
    After going through some documentation on regular expressions in Oracle I have tried to draw some conclusions about the same. As I wasn’t much confident on how the patterns are built, I have tried to interpret them by looking at the output. It’s basically a reverse engineering I have tried to do.
    Please let me know if my interpretations are correct. Any additions /suggestions/corrections are most welcome.
    Some of the examples may lack conclusions, please ignore those.
    select regexp_substr('1PSN/231_3253/ABc','^([[:alnum:]]*)') from dual;
    Output: 1PSN
    Interpreted as:
    ^ From the start of the source string
    ([[:alnum:]]*) zero or more occurrences of alphanumeric characters
    select regexp_substr('@@/231_3253/ABc','@*([[:alnum:]]+)') from dual;
    Output: 231
    Interpreted as:
    @* Search for zero or more occurrences of @
    ([[:alnum:]]+) followed by one or more occurrences of alphanumeric characters
    Note: In the above example oracle looks for @(zero times or more) immediately followed by alphanumeric characters.
    Since a '/' comes between @ and 231 the o/p is 0 occurences of @ + one or more occurrences of alphanumerics.
    select regexp_substr('1@/231_3253/ABc','@+([[:alnum:]]*)') from dual;
    Output: @
    Interpreted as:
    @+ one or more ocurrences of @
    ([[:alnum:]]*) followed by 0 or more occurrences of alphanumerics
    select regexp_substr('1@/231_3253/ABc','@+([[:alnum:]]+)') from dual;
    Output: Null
    Interpreted as:
    @+ one or more occurences of @
    ([[:alnum:]]+) followed by one or more occurences of aplhanumerics
    select regexp_substr('@1PSN/231_3253/ABc125','([[:digit:]]+)$') from dual;
    Output: 125
    Interpreted as:
    ([[:digit:]]+) one or more occurences of digits only
    $ at the end of the string
    select regexp_substr('@1PSN/231_3253/ABc','([^[:digit:]]+)$') from dual;
    output: /ABc
    Interpreted as:
    ([^[:digit:]]+)$ one or more occurrences of non-digit literals at the end of the string
    '^' inside square brackets marks the negation of the class
    Look for http:// followed by a substring of one or more alphanumeric characters and optionally, a period (.)
    SELECT REGEXP_SUBSTR('Go to http://www.oracle.com/products and click on database','http://([[:alnum:]]+\.?){3,4}/?') RESULT
    FROM dual;
    Output: http://www.oracle.com
    Interpreted as:
    [[:alnum:]]+ one or more occurences of alplanumeric characters
    \.? dot optionally (backslash represents escape sequence,? represents optionally)
    {3,4} 3 or 4 times
    /? followed by forward slash optionally
    If you have www.oracle.co.uk; {3,4} extracts it for you as well
    Validate email:
    select case  when
           REGEXP_LIKE('[email protected]',
                       '^([[:alnum:]]+(\_?|\.))([[:alnum:]]*)@([[:alnum:]]+)(.([[:alnum:]]+)){1,2}$') then 'Match Found'
           else 'No Match Found'
           end
    as output from dual;
    Interpreted as:
    ([[:alnum:]]+(\_?|\.)) one or more occurrences of alpha numerics optionally followed by an underscore or dot
    ([[:alnum:]]*) followed by 0 or more occurrences of alplhanumerics
    @ followed by @
    ([[:alnum:]]+) followed by one or more occurrences of alplhanumerics
    (.([[:alnum:]]+)){1,2} followed by a dot followed by alphanumerics from once till max of twice (Ex- .com or .co.uk)
    Output: Match Found
    Input: [email protected]
    Output: Match Found
    Input: [email protected]
    Output: No Match Found
    Truncate the part, ending with digits
    select regexp_substr('Yahoo11245@US','^.*[[:digit:]]',1) from dual;
    Output: Yahoo11245
    select regexp_substr('*Yahoo*11245@US','^.*[[:digit:]]',1) from dual;
    Output: *Yahoo*11245
    Interpreted as:
    .* zero or more occurrences of any characters (dot signifies any character)
    Replace 2 to 8 spaces with single space
    select regexp_replace('Hello   you      OPs       there','[[:space:]]{2,8}',' ')
    from dual;
    Search for control characters
    select case  when
           regexp_like('Super' || chr(13) || 'Star' ,'[[:cntrl:]]')
                  then 'Match Found'
           else 'No Match Found'
           end
    as output from dual;
    Output: Match Found
    Search for lower case letters only with a string length varying from a min of 3 to max of 12
    select case  when
           regexp_like('terminator' ,'^[[:lower:]]{3,12}$')
                  then 'Match Found'
           else 'No Match Found'
           end
    as output from dual;
    4th character must be a special character
    select case  when
           regexp_like('ter*minator' ,'^...[^[:alnum:]]')
                  then 'Match Found'
           else 'No Match Found'
           end
    as output from dual;
    Ouput: Match Found
    Case Sensitive Search
    select case  when
           regexp_like('Republic Of  Africa' ,'of','c')
                  then 'Match Found'
           else 'No Match Found'
           end
    as output from dual;
    Output: No match found
    c stands for case sensitive
    select case  when
           regexp_like('Republic Of  africa' ,'of','i')
                  then 'Match Found'
           else 'No Match Found'
           end
    as output from dual;
    Output: Match Found
    i stands for case insensitive
    Two consecutive occurences of characters from a to z
    select regexp_substr('Republicc Of Africaa' ,'([a-z])\1', 1,1,'i') from dual;
    Output: cc
    Interpreted as:
    ([a-z]) character set a-z
    \1 consecutive occurence of any character
    1 starting from 1st character in the string
    1 First occurence
    i case insensitive
    Three consecutive occurences of characters from 6 to 9
    select case  when
           regexp_like('Patch 10888 applied' ,'([7-9])\1\1')
                  then 'Match Found'
           else 'No Match Found'
           end
    as output from dual;
    Output: Match Found
    Phone validator:
    select case  when
           regexp_like('123-44-5555' ,'^[0-9]{3}-[0-9]{2}-[0-9]{4}$')
                  then 'Match Found'
           else 'No Match Found'
           end
    as output from dual;
    Output: Match Found
    Input: 111-222-3333
    Output: No match found
    Interpreted as:
    ^ start of the string
    [0-9]{3} three ocurrences of digits from 0-9
    - followed by hyphen
    [0-9]{2} two ocurrences of digits from 0-9
    - followed by hyphen
    [0-9]{4} four ocurrences of digits from 0-9
    $ end of the string
    ************************************************************************Source Links:
    http://www.psoug.org/reference/regexp.html
    http://www.oracle.com/technology/obe/obe10gdb/develop/regexp/regexp.htm
    Edited by: Preta on Feb 25, 2010 4:38 PM
    Corrected the example for www.oracle.com
    Edited by: Preta Incorported Logan's comments

    Hi,
    It looks like you have a good understanding of how regular expressions work.
    You can put comments like the ones in your message directly in the code. For example, your validate e-mail code could be re-written
    select      case 
             when REGEXP_LIKE ( '[email protected]'
                        , '^'          || -- Starting from the beginning of the string
                        '('          || -- Begin \1
                          '[[:alnum:]]+'|| --     0 or more alphnumerics
                          '(\_?|\.)'     || --     optional underscore or dot
                        ')'          || -- End \1
                        '([[:alnum:]]*)'|| -- 0 or more alphnumerics
                        '@'          || -- @ sign
                        '([[:alnum:]]+)'|| -- 1 or more alpanumerics
                        '('          || -- Begin \5
                          '\.'          || --   dot
                          '([[:alnum:]]+)'
                                  || --   1 or more alphanumerics
                        ')'          || -- End \5
                        '{1,2}'          || -- \5 can occur 1 or 2 times
                        '$'             -- End of string
             then 'Match Found'
                    else 'No Match Found'
                end          as output
    from      dual;I find this easier to debug and maintain.
    There's no denying, it does make the code very long. You be the judge of when to do this.
    You use parentheses and \ unnceccessarily sometimes. That's not really an error; if you find they make the code easier to develop and maintain, use them as much as you like.
    For example, about the 4th line of the regular expression as I formatted it above:
    '(\_?|\.)'     || --     optional underscore or dotUnderscore has no special meaning in regular expressions (only in LIKE), so you don't have to escape it.
    I might write that line:
    '(_|\.)?'     || --     optional underscore or dotjust because I think it's clearer.
    I think you forgot a \ about 7 lines later:
    '\.'          || --   dotBe very careful about testing patterns that include literal dots; always make sure that a random character, like ~ , fails in a place where a dot is expected.

  • Using regular expressions for validating time fields

    Similar to my problem with converting a big chunk of validation into smaller chunks of functions I am trying to use Regular Expressions to handle the validation of many, many time fields in a flexible working time sheet.
    I have a set of FormCalc scripts to calculate the various values for days, hours and the gain/loss of hours over a four week period. For these scripts to work the time format must be in HH:MM.
    Accessibility guidelines nix any use of message box pop ups so I wanted to get around this by having a hidden/visible field with warning text but can't get it to work.
    So far I have:
    var r = new RegExp(); // Create a new Regular Expression Object
    r.compile ("^[00-99]:\\] + [00-59]");
    var result = r.test(this.rawValue);
    if (result == true){
    true;
    form1.flow.page.parent.part2.part2body.errorMessage.presence = "visible";
    else (result == false){
    false;
    form1.flow.page.parent.part2.part2body.errorMessage.presence = "hidden";
    Any help would be appreciated!

    Date and time fields are tricky because you have to consider the formattedValue versus the rawValue. If I am going to use regular expressions to do validation I find it easier to make them text fields and ignore the time patterns (formattedValue). Something like this works (as far as my very brief testing goes) for 24 hour time where time format is HH:MM.
    // form1.page1.subform1.time_::exit - (JavaScript, client)
    var error = false;
    form1.page1.subform1.errorMsg.rawValue = "";
    if (!(this.isNull)) {
      var time_ = this.rawValue;
      if (time_.length != 5) {
        error = true;
      else {
        var regExp = /^([01]?[0-9]|2[0-3]):[0-5][0-9]$/;
        if (!(regExp.test(time_))) {
          error = true;
    if (error == true) {
      form1.page1.subform1.errorMsg.rawValue = "The time must be in the format HH:MM where HH is 00-23 and MM is 00-59.";
      form1.page1.subform1.errorMsg.presence = "visible";
    Steve

Maybe you are looking for

  • Log entries in Service Consupmtion Layer Application Logs of Duet Enterpr

    Dear all, I need your inputs to solve the below mentioned issue, There is a communication developed between MS sharepoint and SAP. By using Mapper Classes for each and every operations the conversion of data format from Sharepoint to SAP and vicevers

  • Using bind variables in creating new recordgroup

    hi i am creating a new recordgroup and assigning it to the LOV using forms personalization in new item instance trigger . when i use the bind variables in the record group query i am getting errored out. if i remove the bind variables in the query th

  • Activating dynamic date calculation in variant

    Per the link below, I used to be able to activate "dynamic date calculation" in variant. <link to blocked site removed by moderator> Our system has been upgraded to 6.0 and now we do not the same variant attribues screen where I can activate the "D"

  • Summary column value changing to NULL in RDF

    Hello Everybody. For the existing query column in RDF, which is used by one of the summary column as source. If the query is getting changed as per the bussiness requirement, the source of the Summary column is changing to NULL. Is there any way not

  • Adding authorizations to standard proflies

    Dear all, Can anyone tell me how to add authorizations to standard profiles. For example there are system standard proflies like ' f_fico_all ' , s_abap_all , m_all. They have the name of having full auth in a particular module .. but they are empty.