(NEWB) Regular Expressions (SOLVED)

I have a set of shipment identifiers in the following format:
(date, military-style) + (3-digit destination identifier) + (optionally, a hyphen with flags for special circumstances)
Two examples follow:
20070806THO
20070823ANC-1
I'm trying to select the distinct destination identifiers for all shipments we've shipped, so I'm running a select distinct regexp_replace(...) to return the distinct identifiers (so, '20070823ANC-1' should return 'ANC'). I'm running into problems with regexp_replace, and I don't know why:
select regexp_replace('20070807THO', '^\d{8}(.{3}).*$', '\1') from dual;
REGEXP_REPL
20070807THOWhat is wrong with my expression?

how about this
SQL> WITH T AS
  2        (SELECT '20070806THO'  str  FROM dual
  3         UNION ALL
  4         SELECT '20070823ANC-1' FROM dual
  5            )
  6  SELECT SUBSTR(str,9,3) FROM T;
SUB
THO
ANC
i don't have 10g right now, you need regexp_substr                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               

Similar Messages

  • Regular expression help to solve sys_refcursor for a record

    In reference to my thread Question on sys_refcursor with record type , I thought it can be solved differently. That is:
    I have a string like '8:1706,1194,1817~1:1217,1613,1215,1250'
    I need to do some manipulation using regular expressions and acheive some thing like
    select * from <table> where
    c1 in (8,1)
    and c2 in (1706,1194,1817,1217,1613,1215,1250);Is it possible using regular expressions in a single select statement?

    Hi,
    Clearance 6`- 8`` wrote:
    Your understanding is absolutely correct. But unfortunately it did not work Frank.
    SQL> SELECT COUNT (*)
    2    FROM (SELECT sp.*
    3            FROM spml sp, spml_assignment spag
    4           WHERE sp.spml_id = spag.spml_id
    5             AND spag.class_of_svc_id = 8
    6             AND spag.service_type_id IN (1706, 1194, 1817)
    7             AND spag.carrier_id = 4445
    8             AND NVL (spag.haulage_type_id, -1) = NVL (NULL, -1)
    9             AND spag.effdate = TO_DATE ('01/01/2000', 'mm/dd/yyyy')
    10             AND spag.unit_id = 5
    11             AND sales_org_id = 1
    12          UNION ALL
    13          SELECT sp.*
    14            FROM spml sp, spml_assignment spag
    15           WHERE sp.spml_id = spag.spml_id
    16             AND spag.class_of_svc_id = 1
    17             AND spag.service_type_id IN (1217, 1613, 1215, 1250)
    18             AND spag.carrier_id = 4445
    19             AND NVL (spag.haulage_type_id, -1) = NVL (NULL, -1)
    20             AND spag.effdate = TO_DATE ('01/01/2000', 'mm/dd/yyyy')
    21             AND spag.unit_id = 5
    22             AND sales_org_id = 1);
    COUNT(*)
    88
    SQL> SELECT COUNT (*)
    2    FROM spml sp, spml_assignment spag
    3   WHERE sp.spml_id = spag.spml_id
    4     AND spag.carrier_id = 4445
    5     AND NVL (spag.haulage_type_id, -1) = NVL (NULL, -1)
    6     AND spag.effdate = TO_DATE ('01/01/2000', 'mm/dd/yyyy')
    7     AND spag.unit_id = 5
    8     AND sales_org_id = 1
    9     AND REGEXP_LIKE ('8:1706,1194,1817~1:1217,1613,1215,1250',
    10                      '(^|~)' || spag.class_of_svc_id || ':'
    11                     )
    12     AND REGEXP_LIKE ('8:1706,1194,1817~1:1217,1613,1215,1250',
    13                      '(:|,)' || spag.service_type_id || '(,|$)'
    14                     );
    COUNT(*)
    140
    SQL> Edited by: Clearance 6`- 8`` on Aug 11, 2009 8:04 PMJust serving what you ordered!
    Originally, you said you were looking for something that produced the same result as
    where   c1 in (8, 1)
    and      c2 in (1706, 1194, 1817, 1217, 1613, 1215, 1250)that is, any of the c1s could be paired with any of the c2s.
    Now it looks like what you want is
    where     (     c1 = 8
         and     c2 IN (1706, 1194, 1817)
    or     (     c1 = 1
         and     c2 IN (1217, 1613, 1215, 1250)
         )that is, c1=8 and c2=1250 is no good; neither is c1=1 and c2=1706.
    In that case, try
    WHERE     REGEXP_LIKE ( s
                  , '(^|~)' || c1
                         || ':([0-9]+,)*'
                         || c2
                         || '(,|~|$)'
                  )

  • [solved]Issue concerning regular expressions

    Perhaps the problem in it's entirety is a bit more involved that just regular expressions, but for a start, this small problem is what I need help with.
    In short, I have some data which can be in the of some text, an IP address or an URL.
    In case that it's URL, it will usually, if not always, be in the form of a base URL, with or without one or more subdomains.
    I need to strip away these sub domains so that only the base URL remain, thus:
    abc.def.ghi.com
    becomes
    ghi.com
    forgetting for the moment that some top level domains have two parts, it seems so easy, yes I cannot for the life of me figure out how to do it, using only the basic command line tool available such as sed, awk, and so on and so forth, and thus I hope that someone here can and will help me. It is of course also possible that the process is simply to involved that a simple command line will do, in which case I suppose a bahs or python script will have to do, but I really hope not. In any case I will appreciate any help and/or advise you can give me.
    Best regards.
    Last edited by zacariaz (2015-05-07 13:38:47)

    WorMzy wrote:
    Is this a homework exercise?
    What have you tried so far? It should be easy enough to accomplish using a combination of rev and cut, although I'm sure there are more elegant solutions using awk.
    Mod note: Moving to Newbie Corner
    I wish it were homework, but no.
    The rev idea is interesting, and I'll look into it, but as it is, an awk solution would be preferable, as this is only part of a larger issue, and I'd like to keep it as simple as possible. Also, as for topdomain with two parts, it's still somewhat involved.
    As for what I've tried, a lot as I only stopped and went to bed when I realized the sun was raising, but the main problem is that I don't really know how to attack the problem, and of course I'm not exactly a master of regular expressions.
    Anyway thanks.

  • [SOLVED]ZSH and regular expressions

    Hi
    I am getting into regular expressions and i have noticed that with my .zshrc file i have some problem. In bash this expression works:
    \^\[^#]
    but not also in zsh. I have also noted that regular expression works fine with other zshrc configurations found in archwiki (like grml) but i want to have my configuration. And i really can't find what command make a difference
    My .zshrc file is pulled from this site https://github.com/slashbeast/things/bl … s/DOTzshrc.
    # .zshrc
    # Author: Piotr Karbowski <[email protected]>
    # License: beerware.
    # Basic zsh config.
    umask 077
    ZDOTDIR=${ZDOTDIR:-${HOME}}
    ZSHDDIR="${HOME}/.config/zsh.d"
    HISTFILE="${ZDOTDIR}/.zsh_history"
    HISTSIZE='10000'
    SAVEHIST="${HISTSIZE}"
    export EDITOR="/usr/bin/vim"
    export TMP="$HOME/tmp"
    export TEMP="$TMP"
    export TMPDIR="$TMP"
    export TMPPREFIX="${TMPDIR}/zsh"
    if [ ! -d "${TMP}" ]; then mkdir "${TMP}"; fi
    if ! [[ "${PATH}" =~ "^${HOME}/bin" ]]; then
    export PATH="${HOME}/bin:${PATH}"
    fi
    # Not all servers have terminfo for rxvt-256color. :<
    if [ "${TERM}" = 'rxvt-256color' ] && ! [ -f '/usr/share/terminfo/r/rxvt-256color' ] && ! [ -f '/lib/terminfo/r/rxvt-256color' ] && ! [ -f "${HOME}/.terminfo/r/rxvt-256color" ]; then
    export TERM='rxvt-unicode'
    fi
    # Colors.
    red='\e[0;31m'
    RED='\e[1;31m'
    green='\e[0;32m'
    GREEN='\e[1;32m'
    yellow='\e[0;33m'
    YELLOW='\e[1;33m'
    blue='\e[0;34m'
    BLUE='\e[1;34m'
    purple='\e[0;35m'
    PURPLE='\e[1;35m'
    cyan='\e[0;36m'
    CYAN='\e[1;36m'
    NC='\e[0m'
    # Functions
    if [ -f '/etc/profile.d/prll.sh' ]; then
    . "/etc/profile.d/prll.sh"
    fi
    run_under_tmux() {
    # Run $1 under session or attach if such session already exist.
    # $2 is optional path, if no specified, will use $1 from $PATH.
    # If you need to pass extra variables, use $2 for it as in example below..
    # Example usage:
    # torrent() { run_under_tmux 'rtorrent' '/usr/local/rtorrent-git/bin/rtorrent'; }
    # mutt() { run_under_tmux 'mutt'; }
    # irc() { run_under_tmux 'irssi' "TERM='screen' command irssi"; }
    # There is a bug in linux's libevent...
    # export EVENT_NOEPOLL=1
    command -v tmux >/dev/null 2>&1 || return 1
    if [ -z "$1" ]; then return 1; fi
    local name="$1"
    if [ -n "$2" ]; then
    local file_path="$2"
    else
    local file_path="command ${name}"
    fi
    if tmux has-session -t "${name}" 2>/dev/null; then
    tmux attach -d -t "${name}"
    else
    tmux new-session -s "${name}" "${file_path}" \; set-option status \; set set-titles-string "${name} (tmux@${HOST})"
    fi
    t() { run_under_tmux rtorrent; }
    irc() { run_under_tmux irssi "TERM='screen' command irssi"; }
    over_ssh() {
    if [ -n "${SSH_CLIENT}" ]; then
    return 0
    else
    return 1
    fi
    reload () {
    exec "${SHELL}" "$@"
    confirm() {
    local answer
    echo -ne "zsh: sure you want to run '${YELLOW}$@${NC}' [yN]? "
    read -q answer
    echo
    if [[ "${answer}" =~ ^[Yy]$ ]]; then
    command "${=1}" "${=@:2}"
    else
    return 1
    fi
    confirm_wrapper() {
    if [ "$1" = '--root' ]; then
    local as_root='true'
    shift
    fi
    local runcommand="$1"; shift
    if [ "${as_root}" = 'true' ] && [ "${USER}" != 'root' ]; then
    runcommand="sudo ${runcommand}"
    fi
    confirm "${runcommand}" "$@"
    poweroff() { confirm_wrapper --root $0 "$@"; }
    reboot() { confirm_wrapper --root $0 "$@"; }
    hibernate() { confirm_wrapper --root $0 "$@"; }
    detox() {
    if [ "$#" -ge 1 ]; then
    confirm detox "$@"
    else
    command detox "$@"
    fi
    has() {
    local string="${1}"
    shift
    local element=''
    for element in "$@"; do
    if [ "${string}" = "${element}" ]; then
    return 0
    fi
    done
    return 1
    begin_with() {
    local string="${1}"
    shift
    local element=''
    for element in "$@"; do
    if [[ "${string}" =~ "^${element}" ]]; then
    return 0
    fi
    done
    return 1
    termtitle() {
    case "$TERM" in
    rxvt*|xterm|nxterm|gnome|screen|screen-*)
    local prompt_host="${(%):-%m}"
    local prompt_user="${(%):-%n}"
    local prompt_char="${(%):-%~}"
    case "$1" in
    precmd)
    printf '\e]0;%s@%s: %s\a' "${prompt_user}" "${prompt_host}" "${prompt_char}"
    preexec)
    printf '\e]0;%s [%s@%s: %s]\a' "$2" "${prompt_user}" "${prompt_host}" "${prompt_char}"
    esac
    esac
    git_check_if_worktree() {
    # This function intend to be only executed in chpwd().
    # Check if the current path is in git repo.
    # We would want stop this function, on some big git repos it can take some time to cd into.
    if [ -n "${skip_zsh_git}" ]; then
    git_pwd_is_worktree='false'
    return 1
    fi
    # The : separated list of paths where we will run check for git repo.
    # If not set, then we will do it only for /root and /home.
    if [ "${UID}" = '0' ]; then
    # running 'git' in repo changes owner of git's index files to root, skip prompt git magic if CWD=/home/*
    git_check_if_workdir_path="${git_check_if_workdir_path:-/root:/etc}"
    else
    git_check_if_workdir_path="${git_check_if_workdir_path:-/home}"
    git_check_if_workdir_path_exclude="${git_check_if_workdir_path_exclude:-${HOME}/_sshfs}"
    fi
    if begin_with "${PWD}" ${=git_check_if_workdir_path//:/ }; then
    if ! begin_with "${PWD}" ${=git_check_if_workdir_path_exclude//:/ }; then
    local git_pwd_is_worktree_match='true'
    else
    local git_pwd_is_worktree_match='false'
    fi
    fi
    if ! [ "${git_pwd_is_worktree_match}" = 'true' ]; then
    git_pwd_is_worktree='false'
    return 1
    fi
    # todo: Prevent checking for /.git or /home/.git, if PWD=/home or PWD=/ maybe...
    # damn annoying RBAC messages about Access denied there.
    if [ -d '.git' ] || [ "$(git rev-parse --is-inside-work-tree 2> /dev/null)" = 'true' ]; then
    git_pwd_is_worktree='true'
    git_worktree_is_bare="$(git config core.bare)"
    else
    unset git_branch git_worktree_is_bare
    git_pwd_is_worktree='false'
    fi
    git_branch() {
    git_branch="$(git symbolic-ref HEAD 2>/dev/null)"
    git_branch="${git_branch##*/}"
    git_branch="${git_branch:-no branch}"
    git_dirty() {
    if [ "${git_worktree_is_bare}" = 'false' ] && [ -n "$(git status --untracked-files='no' --porcelain)" ]; then
    git_dirty='%F{green}*'
    else
    unset git_dirty
    fi
    precmd() {
    # Set terminal title.
    termtitle precmd
    if [ "${git_pwd_is_worktree}" = 'true' ]; then
    git_branch
    git_dirty
    git_prompt=" %F{blue}[%F{253}${git_branch}${git_dirty}%F{blue}]"
    else
    unset git_prompt
    fi
    preexec() {
    # Set terminal title along with current executed command pass as second argument
    termtitle preexec "${(V)1}"
    chpwd() {
    git_check_if_worktree
    man() {
    if command -v vimmanpager >/dev/null 2>&1; then
    PAGER="vimmanpager" command man "$@"
    else
    command man "$@"
    fi
    # Are we running under grsecurity's RBAC?
    rbac_auth() {
    local auth_to_role='admin'
    if [ "${USER}" = 'root' ]; then
    if ! grep -qE '^RBAC:' "/proc/self/status" && command -v gradm > /dev/null 2>&1; then
    echo -e "\n${BLUE}*${NC} ${GREEN}RBAC${NC} Authorize to '${auth_to_role}' RBAC role."
    gradm -a "${auth_to_role}"
    fi
    fi
    #rbac_auth
    # Check if we started zsh in git worktree, useful with tmux when your new zsh may spawn in source dir.
    git_check_if_worktree
    if [ "${git_pwd_is_worktree}" = 'true' ]; then
    git_branch
    git_dirty
    git_prompt=" %F{blue}[%F{253}${git_branch}${git_dirty}%F{blue}]"
    else
    unset git_prompt
    fi
    # Le features!
    # extended globbing, awesome!
    setopt extendedGlob
    # zmv - a command for renaming files by means of shell patterns.
    autoload -U zmv
    # zargs, as an alternative to find -exec and xargs.
    autoload -U zargs
    # Turn on command substitution in the prompt (and parameter expansion and arithmetic expansion).
    setopt promptsubst
    # Control-x-e to open current line in $EDITOR, awesome when writting functions or editing multiline commands.
    autoload -U edit-command-line
    zle -N edit-command-line
    bindkey '^x^e' edit-command-line
    # Include user-specified configs.
    if [ ! -d "${ZSHDDIR}" ]; then
    mkdir -p "${ZSHDDIR}" && echo "# Put your user-specified config here." > "${ZSHDDIR}/example.zsh"
    fi
    for zshd in $(ls -A ${HOME}/.config/zsh.d/^*.(z)sh$); do
    . "${zshd}"
    done
    # Completion.
    autoload -Uz compinit
    compinit
    zstyle ':completion:*' matcher-list 'm:{a-z}={A-Z}'
    zstyle ':completion:*' completer _expand _complete _ignored _approximate
    zstyle ':completion:*' menu select=2
    zstyle ':completion:*' select-prompt '%SScrolling active: current selection at %p%s'
    zstyle ':completion::complete:*' use-cache 1
    zstyle ':completion:*:descriptions' format '%U%F{cyan}%d%f%u'
    # If running as root and nice >0, renice to 0.
    if [ "$USER" = 'root' ] && [ "$(cut -d ' ' -f 19 /proc/$$/stat)" -gt 0 ]; then
    renice -n 0 -p "$$" && echo "# Adjusted nice level for current shell to 0."
    fi
    # Fancy prompt.
    if over_ssh && [ -z "${TMUX}" ]; then
    prompt_is_ssh='%F{blue}[%F{red}SSH%F{blue}] '
    elif over_ssh; then
    prompt_is_ssh='%F{blue}[%F{253}SSH%F{blue}] '
    else
    unset prompt_is_ssh
    fi
    case $USER in
    root)
    PROMPT='%B%F{cyan}%m%k %(?..%F{blue}[%F{253}%?%F{blue}] )${prompt_is_ssh}%B%F{blue}%1~${git_prompt}%F{blue} %# %b%f%k'
    PROMPT='%B%F{blue}%n@%m%k %(?..%F{blue}[%F{253}%?%F{blue}] )${prompt_is_ssh}%B%F{cyan}%1~${git_prompt}%F{cyan} %# %b%f%k'
    esac
    # Ignore lines prefixed with '#'.
    setopt interactivecomments
    # Ignore duplicate in history.
    setopt hist_ignore_dups
    # Prevent record in history entry if preceding them with at least one space
    setopt hist_ignore_space
    # Nobody need flow control anymore. Troublesome feature.
    #stty -ixon
    setopt noflowcontrol
    # Fix for tmux on linux.
    case "$(uname -o)" in
    'GNU/Linux')
    export EVENT_NOEPOLL=1
    esac
    # Aliases
    alias cp='cp -iv'
    alias rcp='rsync -v --progress'
    alias rmv='rsync -v --progress --remove-source-files'
    alias mv='mv -iv'
    alias rm='rm -iv'
    alias rmdir='rmdir -v'
    alias ln='ln -v'
    alias chmod="chmod -c"
    alias chown="chown -c"
    if command -v colordiff > /dev/null 2>&1; then
    alias diff="colordiff -Nuar"
    else
    alias diff="diff -Nuar"
    fi
    alias grep='grep --colour=auto'
    alias egrep='egrep --colour=auto'
    alias ls='ls --color=auto --human-readable --group-directories-first --classify'
    # Keys.
    case $TERM in
    rxvt*|xterm*)
    bindkey "^[[7~" beginning-of-line #Home key
    bindkey "^[[8~" end-of-line #End key
    bindkey "^[[3~" delete-char #Del key
    bindkey "^[[A" history-beginning-search-backward #Up Arrow
    bindkey "^[[B" history-beginning-search-forward #Down Arrow
    bindkey "^[Oc" forward-word # control + right arrow
    bindkey "^[Od" backward-word # control + left arrow
    bindkey "^H" backward-kill-word # control + backspace
    bindkey "^[[3^" kill-word # control + delete
    linux)
    bindkey "^[[1~" beginning-of-line #Home key
    bindkey "^[[4~" end-of-line #End key
    bindkey "^[[3~" delete-char #Del key
    bindkey "^[[A" history-beginning-search-backward
    bindkey "^[[B" history-beginning-search-forward
    screen|screen-*)
    bindkey "^[[1~" beginning-of-line #Home key
    bindkey "^[[4~" end-of-line #End key
    bindkey "^[[3~" delete-char #Del key
    bindkey "^[[A" history-beginning-search-backward #Up Arrow
    bindkey "^[[B" history-beginning-search-forward #Down Arrow
    bindkey "^[Oc" forward-word # control + right arrow
    bindkey "^[Od" backward-word # control + left arrow
    bindkey "^H" backward-kill-word # control + backspace
    bindkey "^[[3^" kill-word # control + delete
    esac
    bindkey "^R" history-incremental-pattern-search-backward
    bindkey "^S" history-incremental-pattern-search-forward
    if [ -f ~/.alert ]; then cat ~/.alert; fi
    Thanks for all the help.
    Last edited by Shark (2013-05-11 22:32:24)

    Raynman wrote:
    "This expression doesn't work", "It doesn't work" ...
    Could you try being a bit more specific?
    Firstly, i am sorry i didn't post the output. I should have know better.
    Secondly, chill out.
    I have used above regex with grep command. Output from terminal is:
    zsh: bad pattern: ^[^#]
    In bash it works perfectly.
    If i issue "setopt re_match_pcre" i have the same ouput as above.
    EDIT: If i issue "unsetopt no_match" it actually works but i have to change the regex from "\^\[^#]" to "\^[^#]" otherwise i get the same output as above. In bash both options work.
    Last edited by Shark (2013-05-11 22:07:21)

  • [solved] Need a little help with sed and regular expressions

    Hello!
    I am shure this is something easy for most of you
    I want to make a script, which converts filenames of my ripped MP3s (replaces '_' with spaces, removes leading track numbers...)
    But I have some problems:
    j=$(echo $j | sed 's/_\+/ /g')
    j=$(echo $j | sed 's/^[0-9]{0,3}//g')
    j=$(echo $j | sed 's/[^ ]-[^ ]/ - /g')
    j=$(echo $j | sed 's/_\+/ /g') << this is working fine (converts all "_" to spaces)
    j=$(echo $j | sed 's/^[0-9]{0,3}//g') << is NOT working, why??
    For Example in "01-somebody_feat_someone-somemusic.mp3" the leading "01" number is NOT being removed..
    j=$(echo $j | sed 's/[^ ]-[^ ]/ - /g') << how can I insert spaces before and after the "-"?
    So that "someone-somemusic" becomes "someone - somemusic" (but only where "-" is surrounded by letters)
    Last edited by cyberius (2011-07-27 18:50:54)

    For sed, you must escape { and } to use them as you want (just slap a \ before them).
    For the last expression, capture the letter before/after the dash -- use \( and \) -- and then substitute it for something like "\1 -" and then "- \1". You'll want to split this into two pieces, one for the front and one for the back so you can get "somemusic -someband" the way you want without a bunch of cases.
    Edit: Or, you could just do a replace for "-" to be " - " and then have another expression to reduce spaces. I see you've used \+ before, so I'm guessing you can figure that out
    Also, sed has the -e switch so you can do multiple different expressions with one invocation.
    Also (also), have you looked into something like Picard with automatic track renaming? You can even customize how they are renamed.
    Edit (2): Also^3, check out prename. There are different versions, ones which use PCRE and ones that use other standards, but it is for renaming files based on regular expressions, which is what you're doing. In any case, you might want to put you script into the User made scripts thread when you feel more comfortable and get some more critiquing, if you're interested.
    Last edited by jac (2011-07-26 23:13:27)

  • Regular expressions in Format Definition add-on

    Hello experts,
    I have a question about regular expressions. I am a newbie in regular expressions and I could use some help on this one. I tried some 6 hours, but I can't get solve it myself.
    Summary of my problem:
    In SAP Business One (patch level 42) it is possible to use bank statement processing. A file (full of regular expressions) is to be selected, so it can match certain criteria to the bank statement file. The bank statement file consists of a certain pattern (look at the attached code snippet).
    :61:071222D208,00N026
    :86:P  12345678BELASTINGDIENST       F8R03782497                $GH
    $0000009                         BETALINGSKENM. 123456789123456
    0 1234567891234560                                            
    :61:071225C758,70N078
    :86:0116664495 REGULA B.V. HELPMESTRAAT 243 B 5371 AM HARDCITY HARD
    CITY 48772-54314                                                  
    :61:071225C425,05N078
    :86:0329883585 J. MANSSHOT PATTRIOTISLAND 38 1996 PT HELMEN BIJBETA
    LING VOOR RELOOP RMP1 SET ORDERNR* 69866 / SPOEDIG LEVEREN    
    :61:071225C850,00N078
    :86:0105327212 POSE TELEFOONSTRAAT 43 6448 SL S-ROTTERDAM MIJN OR
    DERNR. 53846 REF. MAIL 21-02
    - I am in search of the right type of regular expression that is used by the Format Definition add-on (javascript, .NET, perl, JAVA, python, etc.)
    Besides that I need the regular expressions below, so the Format Definition will match the right lines from my bankfile.
    - a regular expression that selects lines starting with :61: and line :86: including next lines (if available), so in fact it has to select everything from :86: till :61: again.
    - a regular expression that selects the bank account number (position 5-14) from lines starting with :86:
    - a regular expression that selects all other info from lines starting with :86: (and following if any), so all positions that follow after the bank account number
    I am looking forward to the right solutions, I can give more info if you need any.

    Hello Hendri,
    Q1:I am in search of the right type of regular expression that is used by the Format Definition add-on (javascript, .NET, perl, JAVA, pythonetc.)
    Answer: Format Definition uses .Net regular expression.
    You may refer the following examples. If necessary, I can send you a guide about how to use regular expression in Format Defnition. Thanks.
    Example 6
    Description:
    To match a field with an optional field in front. For example, u201C:61:0711211121C216,08N051NONREFu201D or u201C:61:071121C216,08N051NONREFu201D, which comprises of a record identification u201C:61:u201D, a date in the form of YYMMDD, anther optional date MMDD, one or two characters to signify the direction of money flow, a numeric amount value and some other information. The target to be matched is the numeric amount value.
    Regular expression:
    (?<=:61:\d(\d)?[a-zA-Z]{1,2})((\d(,\d*)?)|(,\d))
    Text:
    :61:0711211121C216,08N051NONREF
    Matches:
    1
    Tips:
    1.     All the fields in front of the target field are described in the look behind assertion embraced by (?<= and ). Especially, the optional field is embraced by parentheses and then a u201C?u201D  (question mark). The sub expression for amount is copied from example 1. You can compose your own regular expression for such cases in the form of (?<=REGEX_FOR_FIELDS_IN_FRONT)(REGEX_FOR_TARGET_FIELD), in which REGEX_FOR_FIELDS_IN_FRONT and REGEX_FOR_TARGET_FIELD are respectively the regular expression for the fields in front and the target field. Keep the parentheses therein.
    Example 7
    Description:
    Find all numbers in the free text description, which are possibly document identifications, e.g. for invoices
    Regular expression:
    (?<=\b)(?<!\.)\d+(?=\b)(?!\.)
    Text:
    :86:GIRO  6890316
    ENERGETICA NATURA BENELU
    AFRIKAWEG 14
    HULST
    3187-A1176
    TRANSACTIEDATUM* 03-07-2007
    Matches:
    6
    Tips:
    1.     The regular expression given finds all digits between word boundaries except those with a prior dot or following dot; u201C.u201D (dot) is escaped as \.
    2.     It may find out some inaccurate matches, like the date in text. If you want to exclude u201C-u201D (hyphen) as prior or following character, resemble the case for u201C.u201D (dot), the regular expression becomes (?<=\b)(?<!\.)(?<!-)\d+(?=\b)(?!\.)(?!-). The matches will be:
    :86:GIRO  6890316
    ENERGETICA NATURA BENELU
    AFRIKAWEG 14
    HULST
    3187-A1176
    TRANSACTIEDATUM* 03-07-2007
    You may lose some real values like u201C3187u201D before the u201C-u201D.
    Example 8
    Description:
    Find BP account number in 9 digits with a prior u201CPu201D or u201C0u201D in the first position of free text description
    Regular expression:
    (?<=^(P|0))\d
    Text:
    0000006681 FORTIS ASR BETALINGSCENTRUM BV
    Matches:
    1
    Tips:
    1.     Use positive look behind assertion (?<=PRIOR_KEYWORD) to express the prior keyword.
    2.     u201C^u201D stands for that match starts from the beginning of the text. If the text includes the record identification, you may include it also in the look behind assertion. For example,
    :86:0000006681 FORTIS ASR BETALINGSCENTRUM BV
    The regular expression becomes
    (?<=:86:(P|0))\d
    Example 9
    Description:
    Following example 8, to find the possible BP name after BP account number, which is composed of letter, dot or space.
    Regular expression:
    (?<=^(P|0)\d)[a-zA-Z. ]*
    Text:
    0000006681 FORTIS ASR BETALINGSCENTRUM BV
    Matches:
    1
    Tips:
    1.     In this case, put BP account number regular expression into the look behind assertion.
    Example 10
    Description:
    Find the possible document identifications in a sub-record of :86: record. Sub-record is like u201C?00u201D, u201C?10u201D etc.  A possible document identification sub-record is made up of the following parts:
    u2022     keyword u201CREu201D, u201CRGu201D, u201CRu201D, u201CINVu201D, u201CNRu201D, u201CNOu201D, u201CRECHNu201D or u201CRECHNUNGu201D, and
    u2022     an optional group made up of following:
         a separator of either a dot, hyphen or slash, and
         an optional space, and
         an optional string starting with keyword u201CNRu201D or u201CNOu201D followed by a separator of either a dot, hyphen or slash, and
         an optional space
    u2022     and finally document identification in digits
    Regular expression:
    (?<=\?\d(RE|RG|R|INV|NR|NO|RECHN|RECHNUNG)((\.|-|/)\s?((NR|NO)(\.|-|/))?\s?)?)\d+
    Kind Regards
    -Yatsea

  • Urgent!!! Problem in regular expression for matching braces

    Hi,
    For the example below, can I write a regular expression to store getting key, value pairs.
    example: ((abc def) (ghi jkl) (a ((b c) (d e))) (mno pqr) (a ((abc def))))
    in the above example
    abc is key & def is value
    ghi is key & jkl is value
    a is key & ((b c) (d e)) is value
    and so on.
    can anybody pls help me in resolving this problem using regular expressions...
    Thanks in advance

    "((key1 value1) (key2 value2) (key3 ((key4 value4)
    (key5 value5))) (key6 value6) (key7 ((key8 value8)
    (key9 value9))))"
    I want to write a regular expression in java to parse
    the above string and store the result in hash table
    as below
    key1 value1
    key2 value2
    key3 ((key4 value4) (key5 value5))
    key4 value4
    key5 value5
    key6 value6
    key7 ((key8 value8) (key9 value9))
    key8 value8
    key9 value9
    please let me know, if it is not possible with
    regular expressions the effective way of solving itYes, it is possible with a recursive regular expression.
    Unfortunately Java does not provide a recursive regular expression construct.
    $_ = "((key1 value1) (key2 value2) (key3 ((key4 value4) (key5 value5))) (key6 value6) (key7 ((key8 value8) (key9 value9))))";
    my $paren;
       $paren = qr/
               [^()]+  # Not parens
             |
               (??{ $paren })  # Another balanced group (not interpolated yet)
        /x;
    my $r = qr/^(.*?)\((\w+?) (\w+?|(??{$paren}))\)\s*(.*?)$/;
    while ($_) {
         match()
    # operates on $_
    sub match {
         my @v;
         @v = m/$r/;
         if (defined $v[3]) {
              $_ = $v[2];
              while (/\(/) {
                   match();
              print "\"",$v[1],"\" \"",$v[2],"\"";
              $_ = $v[0].$v[3];
         else { $_ = ""; }
    C:\usr\schodtt\src\java\forum\n00b\regex>perl recurse.pl
    "key1" "value1"
    "key2" "value2"
    "key4" "value4"
    "key5" "value5"
    "key3" "((key4 value4) (key5 value5))"
    "key6" "value6"
    "key8" "value8"
    "key9" "value9"
    "key7" "((key8 value8) (key9 value9))"
    C:\usr\schodtt\src\java\forum\n00b\regex>

  • Bracket in Regular Expression constant?

    I am a bit puzzled by the behavior I am experiencing in LV 2011. I hope to get some light from experts out there.
    I am trying to parse a messy ASCII header file and after having split it into individual lines (strings), I use the "Match Regular Expression" function to remove some of the info before the substantial information.
    Some of the strings include square brackets ([, ]), which are special characters for the function, therefore, as documented in the help, one needs to precede them with a backslash.
    Example:
    I want to parse the following line:
       #PR [PR_DEV,I,2]
    One way (which I am using because of considerations related to the rest of the header) is the the following:
    Note that the first string constant is using "Code Display" whereas the second one is using "Normal Display".
    Why did I not put a backslash in front of the bracket in the first string, you may ask? Well, I did, but it disappeared after I typed the other characters. And reverting to "Normal Display" did not restore it.
    Of course, the first version does not parse the input string correctly, whereas the second one does it fine.
    In other words, the custom display string (which is convenient for cryptic codes such as \s* or to distinguish between space and tab...or simply ENTER tabs!) seems to mess up with the \[ combo (likewise with the \] one).
    It is not a huge deal. I can use the "Normal Display" mode, but I tend to think that this qualifies as a hidden "feature". And again, it is still a pain in the ... when dealing with special characters such as tabs, etc...
    Solved!
    Go to Solution.

    I think that [ is a special character which needs to be preceded by a backslash, but it is not one of the defined backslash characters (like \s). So, you need to put in two \\ to get one \ while in '\' Codes Display.
    You can put in any character by using \xx where the xx is a hex character using only upper case letters for A..F.  I converted the strings to byte arrays and tried to see what made the arrays match and the Match work.
    Lynn

  • Litte help with regular expression?

    Greetings all,
    I have a simple regular expression "(\\w+)\\s(\\w+)\\s(.+)"
    Which I want to match against the strings like "Acetobacter pasteurianus LMD22.1"
    But this always fails whenever there is a dot (.) character like "LMD22.1" in above string.
    How to solve this ?
    Thanks in advance.

    Shouldn't that be Acinetobacter?
    edit: nope, I'm wrong, you're right.
    Edited by: Encephalopathic on Apr 7, 2009 7:34 PM

  • In a Regular expressions I can set up an "OR" statement?

    HI, I'm using e-tester version 8.2..My problem is about regular expessions.
    I need to catch a dynamic value from a Form Field (My-TxtBox).. this textbox gets pre populated data ( when the customer has info in the database).. this is the code:
    *<input name="My-TxtBox" type="text" value="XXXX" id="ID-TxtBox"*
    ---- ( XXXX = dynamic number prepopulated by the web app)
    So, I'm using this regular expression:
    *<input name="My-TxtBox" type="text" value="(.+?)" id="ID-TxtBox"*
    -----At this point, everything is fine.. but there is an exception
    I'm getting a big problem when the customer doesn't have data in the server, the code is NOT like this
    <input name="My-TxtBox" type="text" value="" id="ID-TxtBox"
    When the customer doesn't have data in the server, the code is more like this:
    *<input name="My-TxtBox" type="text" id="ID-TxtBox"*
    ---- (please note that there is not a value parameter now)
    So, I think there is no way to create a CDV that will work for both cases? any idea to solve this?
    i was thinking that maybe in the reg exps sintax you can create an "OR" statement.. my idea was to create a CDV
    that works for both cases.. when there is the "value=" string and when there is not.
    something like this
    This CDV returns the dynamic value when there is the "value=" string =
    <input name="My-TxtBox" type="text" value="(.+?)" id="ID-TxtBox"
    And this CDV returns "" when there is no "value=" string = Without value:
    <input name="My-TxtBox" type="text" (.*?)id="ID-TxtBox"
    My idea is to place something like this in some point of the CDV = *{* value="(.+?)" *OR* (.*?) *}*
    so my dream is to create a CDV similar to this:
    <input name="My-TxtBox" type="text" { value="(.+?)" *OR* (.*?) }id="ID-TxtBox
    I was searching on google but I simply don't get an answer....it is posible to place an OR statement into a Reg Exp and how the sintax is? ..
    Regards.. I appreciate your time.

    Hola,
    You can use a regular expression such as:
    <input name="My-TxtBox" type="text" ?v?a?l?u?e?=?"?(.*?)"? id="ID-TxtBox"
    Note that:
    * Note that there is a question mark sign (?) after each letter that may appear in the string. This means that the letter may or may not appear.
    * Note that instead of using (.+?) you should use (.*?).
    This means that will match any character that appears zero or mutliple times. The question mark here means that is non-greedy, meaning that it will not include in the .* matching anything like the rest of the pattern (in this case the rest of the pattern is "? id="ID-TxtBox").
    * Note that the question mark in front of the v of value is there to match a space that may or may not exists.
    Few other facts:
    * In regular expressions the parenthesys determine sequence of operations and mark groups. Such groups can be referenced in code (not in etester but in general). In eTester it will always get the value of the first group (first group = first set of parenthesys).
    * ORs in regular expressions can be expressed with the pipe "|" (without the quotes), but you will need parenthesys in this case which would not allow you to capture the group of characters that you want.
    Regards,
    [Z]{1}uriel C?
    Edited by: Zuriel on Oct 5, 2009 3:07 PM
    Edited to avoid having the text changed by the forum formatting options.

  • Problem in creating a Regular Expression with gnu

    Hi All,
    iam trying to create a regular expression using gnu package api..
    gnu.regex.RE;
    i need to validate the browser's(MSIE) userAgent through my regular expression
    userAgent is like :First one ==> Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
    i wrote an regular expression like this:
    Mozilla.*(.*)\\s*(.*)compatible;\\s*MSIE(.*)\\s*(.*)([0-9]\\.[0-9])(.*);\\s*(.*)Windows(.*)\\s*NT(.*)\\s*5.0(.*)
    Actaully this is validating my userAgent and returns true, my problem is, it is returning true if userAgent is having more words at the end after Windows NT 5.0 like Second One ==> Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0) Testing
    i want the regularExpression pattern to validate the First one and return true for it, and has to return false for the Second one..
    my code is:
    import gnu.regexp.*;
    import gnu.regexp.REException;
    public class TestRegexp
    public static boolean getUserAgentDetails(String userAgent)
         boolean isvalid = false;
         RE regexp = new RE("Mozilla.*(.*)\\s*(.*)compatible;\\s*MSIE(.*)\\s*(.*)([0-9]\\.[0-9])(.*);\\s*(.*)Windows(.*)\\s*NT(.*)\\s*5.0(.*)");
         isvalid = regexp.isMatch(userAgent);
         return isvalid;
    public static void main(String a[])
         String userAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)";
         boolean regoutput = getUserAgentDetails(userAgent);
         System.out.println("***** regoutput is ****** " + regoutput);
    }please help me in solving this..
    Thanks in Advance..
    thanx,
    krishna

    Ofcourse, i can do comparision with simple string matching..
    but problem is the userAgent that i want to support is for all the MSIE versions ranging from 5.0 onwards, so there will the version difference of IE like MSIE 6.0..! or MSIE 5.5 some thing like that..
    any ways i will try with StringTokenizer once..!
    seems that will do my work..
    Thanks,
    krishna

  • Introduction to regular expressions ...

    I'm well aware that there are already some articles on that topic, some people asked me to share some of my knowledge on this topic. Please take a look at this first part and let me know if you find this useful. If yes, I'm going to continue on writing more parts using more and more complicated expressions - if you have questions or problems that you think could be solved through regular expression, please post them.
    Introduction
    Oracle has always provided some character/string functions in its PL/SQL command set, such as SUBSTR, REPLACE or TRANSLATE. With 10g, Oracle finally gave us, the users, the developers and of course the DBAs regular expressions. However, regular expressions, due to their sometimes cryptic rules, seem to be overlooked quite often, despite the existence of some very interesing use cases. Beeing one of the advocates of regular expression, I thought I'll give the interested audience an introduction to these new functions in several installments.
    Having fun with regular expressions - Part 1
    Oracle offers the use of regular expression through several functions: REGEXP_INSTR, REGEXP_SUBSTR, REGEXP_REPLACE and REGEXP_LIKE. The second part of each function already gives away its purpose: INSTR for finding a position inside a string, SUBSTR for extracting a part of a string, REPLACE for replacing parts of a string. REGEXP_LIKE is a special case since it could be compared to the LIKE operator and is therefore usually used in comparisons like IF statements or WHERE clauses.
    Regular expressions excel, in my opinion, in search and extraction of strings, using that for finding or replacing certain strings or check for certain formatting criterias. They're not very good at formatting strings itself, except for some special cases I'm going to demonstrate.
    If you're not familiar with regular expression, you should take a look at the definition in Oracle's user guide Using Regular Expressions With Oracle Database, and please note that there have been some changes and advancements in 10g2. I'll provide examples, that should work on both versions.
    Some of you probably already encountered this problem: checking a number inside a string, because, for whatever reason, a column was defined as VARCHAR2 and not as NUMBER as one would have expected.
    Let's check for all rows where column col1 does NOT include an unsigned integer. I'll use this SELECT for demonstrating different values and search patterns:
    WITH t AS (SELECT '456' col1
                 FROM dual
                UNION
               SELECT '123x'
                 FROM dual
                UNION  
               SELECT 'x123'
                 FROM dual
                UNION 
               SELECT 'y'
                 FROM dual
                UNION 
               SELECT '+789'
                 FROM dual
                UNION 
               SELECT '-789'
                 FROM dual
                UNION 
               SELECT '159-'
                 FROM dual
                UNION 
               SELECT '-1-'
                 FROM dual
    SELECT t.col1
      FROM t
    WHERE NOT REGEXP_LIKE(t.col1, '^[0-9]+$')
    ;Let's take a look at the 2nd argument of this REGEXP function: '^[0-9]+$'. Translated it would mean: start at the beginning of the string, check if there's one or more characters in the range between '0' and '9' (also called a matching character list) until the end of this string. "^", "[", "]", "+", "$" are all Metacharacters.
    To understand regular expressions, you have to "think" in regular expressions. Each regular expression tries to "fit" an available string into its pattern and returns a result beeing successful or not, depending on the function. The "art" of using regular expressions is to construct the right search pattern for a certain task. Using functions like TRANSLATE or REPLACE did already teach you using search patterns, regular expressions are just an extension to this paradigma. Another side note: most of the search patterns are placeholders for single characters, not strings.
    I'll take this example a bit further. What would happen if we would remove the "$" in our example? "$" means: (until the) end of a string. Without this, this expression would only search digits from the beginning until it encounters either another character or the end of the string. So this time, '123x' would be removed from the SELECTION since it does fit into the pattern.
    Another change: we will keep the "$" but remove the "^". This character has several meanings, but in this case it declares: (start from the) beginning of a string. Without it, the function will search for a part of a string that has only digits until the end of the searched string. 'x123' would now be removed from our selection.
    Now there's a question: what happens if I remove both, "^" and "$"? Well, just think about it. We now ask to find any string that contains at least one or more digits, so both '123x' and 'x123' will not show up in the result.
    So what if I want to look for signed integer, since "+" is also used for a search expression. Escaping is the name of the game. We'll just use '^\+[0-9]+$' Did you notice the "\" before the first "+"? This is now a search pattern for the plus sign.
    Should signed integers include negative numbers as well? Of course they should, and I'll once again use a matching character list. In this list, I don't need to do escaping, although it is possible. The result string would now look like this: '^[+-]?[0-9]+$'. Did you notice the "?"? This is another metacharacter that changes the placeholder for plus and minus to an optional placeholder, which means: if there's a "+" or "-", that's ok, if there's none, that's also ok. Only if there's a different character, then again the search pattern will fail.
    Addendum: From this on, I found a mistake in my examples. If you would have tested my old examples with test data that would have included multiple signs strings, like "--", "-+", "++", they would have been filtered by the SELECT statement. I mistakenly used the "*" instead of the "?" operator. The reason why this is a bad idea, can also be found in the user guide: the "*" meta character is defined as 0 to multiple occurrences.
    Looking at the values, one could ask the question: what about the integers with a trailing sign? Quite simple, right? Let's just add another '[+-] and the search pattern would look like this: '^[+-]?[0-9]+[+-]?$'.
    Wait a minute, what happened to the row with the column value "-1-"?
    You probably already guessed it: the new pattern qualifies this one also as a valid string. I could now split this pattern into several conditions combined through a logical OR, but there's something even better: a logical OR inside the regular expression. It's symbol is "|", the pipe sign.
    Changing the search pattern again to something like this '^[+-]?[0-9]+$|^[0-9]+[+-]?$' [1] would return now the "-1-" value. Do I have to duplicate the same elements like "^" and "$", what about more complicated, repeating elements in future examples? That's where subexpressions/grouping comes into play. If I want only certain parts of the search pattern using an OR operator, we can put those inside round brackets. '^([+-]?[0-9]+|[0-9]+[+-]?)$' serves the same purpose and allows for further checks without duplicating the whole pattern.
    Now looking for integers is nice, but what about decimal numbers? Those may be a bit more complicated, but all I have to do is again to think in (meta) characters. I'll just use an example where the decimal point is represented by ".", which again needs escaping, since it's also the place holder in regular expressions for "any character".
    Valid decimals in my example would be ".0", "0.0", "0.", "0" (integer of course) but not ".". If you want, you can test it with the TO_NUMBER function. Finding such an unsigned decimal number could then be formulated like this: from the beginning of a string we will either allow a decimal point plus any number of digits OR at least one digits plus an optional decimal point followed by optional any number of digits. Think about it for a minute, how would you formulate such a search pattern?
    Compare your solution to this one:
    '^(\.[0-9]+|[0-9]+(\.[0-9]*)?)$'
    Addendum: Here I have to use both "?" and "*" to make sure, that I can have 0 to many digits after the decimal point, but only 0 to 1 occurrence of this substrings. Otherwise, strings like "1.9.9.9" would be possible, if I would write it like this:
    '^(\.[0-9]+|[0-9]+(\.[0-9]*)*)$'Some of you now might say: Hey, what about signed decimal numbers? You could of course combine all the ideas so far and you will end up with a very long and almost unreadable search pattern, or you start combining several regular expression functions. Think about it: Why put all the search patterns into one function? Why not split those into several steps like "check for a valid decimal" and "check for sign".
    I'll just use another SELECT to show what I want to do:
    WITH t AS (SELECT '0' col1
                 FROM dual
                UNION
               SELECT '0.' 
                 FROM dual
                UNION
               SELECT '.0' 
                 FROM dual
                UNION
               SELECT '0.0' 
                 FROM dual
                UNION
               SELECT '-1.0' 
                 FROM dual
                UNION
               SELECT '.1-' 
                 FROM dual
                UNION
               SELECT '.' 
                 FROM dual
                UNION
               SELECT '-1.1-' 
                 FROM dual
    SELECT t.*
      FROM t
    ;From this select, the only rows I need to find are those with the column values "." and "-1.1-". I'll start this with a check for valid signs. Since I want to combine this with the check for valid decimals, I'll first try to extract a substring with valid signs through the REGEXP_SUBSTR function:
    NVL(REGEXP_SUBSTR(t.col1, '^([+-]?[^+-]+|[^+-]+[+-]?)$'), ' ')Remember the OR operator and the matching character collections? But several "^"? Some of the meta characters inside a search pattern can have different meanings, depending on their positions and combination with other meta characters. In this case, the pattern translates into: from the beginning of the string search for "+" or "-" followed by at least another character that is not "+" or "-". The second pattern after the "|" OR operator does the same for a sign at the end of the string.
    This only checks for a sign but not if there also only digits and a decimal point inside the string. If the search string fails, for example when we have more than one sign like in the "-1.1-", the function returns NULL. NULL and LIKE don't go together very well, so we'll just add NVL with a default value that tells the LIKE to ignore this string, in this case a space.
    All we have to do now is to combine the check for the sign and the check for a valid decimal number, but don't forget an option for the signs at the beginning or end of the string, otherwise your second check will fail on the signed decimals. Are you ready?
    Does your solution look a bit like this?
    WHERE NOT REGEXP_LIKE(NVL(REGEXP_SUBSTR(t.col1,
                               '^([+-]?[^+-]+|[^+-]+[+-]?)$'),
                           '^[+-]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)[+-]?$'
                          )Now the optional sign checks in the REGEXP_LIKE argument can be added to both ends, since the SUBSTR won't allow any string with signs on both ends. Thinking in regular expression again.
    Continued in Introduction to regular expressions ... continued.
    C.
    Fixed some embarrassing typos ... and mistakes.
    cd

    Excellent write up CD. Very nice indeed. Hopefully you'll be completing parts 2 and 3 some time soon. And with any luck, your article will encourage others to do the same....I know there's a few I'd like to see and a few I'd like to have a go at writing too :-)

  • Use regular expressions to extract .llb filename from path

    I am trying to be clever (always a dangerous thing) and use a regular expression to extract the name of a library from a filepath converted to a string.   Whilst I appreciate there are other ways to do this, regex would appear to be a very powerful neat way, should I be able to get it to work.
    i.e if I have a string of the type, C:\applications\versions\library.llb\toplevel.vi, I want to be able to extract library.llb from the string, given that it will be of variable length, may include numbers & spaces and may be within a file hierarchy of variable depth.   In other words, I want to extract the portion of the string between the last \ that ends with .llb
    The best I have managed so far, is \\+.*llb which returned everything minus the drive letter and the toplevel.vi
    Can anyone help me achieve this, or am i better using an alternative method (e.g filepath to array string, search for .llb)
    Thanks
    Matt
    Solved!
    Go to Solution.

    Hi Matt,
    attached you'll find two other options.
    Mike
    Message Edited by MikeS81 on 04-13-2010 01:30 PM
    Attachments:
    Path.PNG ‏7 KB

  • Introduction to regular expressions ... continued.

    After some very positive feedback from Introduction to regular expressions ... I'm now continuing on this topic for the interested audience. As always, if you have questions or problems that you think could be solved through regular expression, please post them.
    Having fun with regular expressions - Part 2
    Finishing my example with decimal numbers, I thought about a method to test regular expressions. A question from another user who was looking for a way to show all possible combinations inspired me in writing a small package.
    CREATE OR REPLACE PACKAGE regex_utils AS
      -- Regular Expression Utilities
      -- Version 0.1
      TYPE t_outrec IS RECORD(
        data VARCHAR2(255)
      TYPE t_outtab IS TABLE OF t_outrec;
      FUNCTION gen_data(
        p_charset IN VARCHAR2 -- character set that is used for generation
      , p_length  IN NUMBER   -- length of the generated
      ) RETURN t_outtab PIPELINED;
    END regex_utils;
    CREATE OR REPLACE PACKAGE BODY regex_utils AS
    -- FUNCTION gen_data returns a collection of generated varchar2 elements
      FUNCTION gen_data(
        p_charset IN VARCHAR2 -- character set that is used for generation
      , p_length  IN NUMBER   -- length of the generated
      ) RETURN t_outtab PIPELINED
      IS
        TYPE t_counter IS TABLE OF PLS_INTEGER INDEX BY PLS_INTEGER;
        v_counter t_counter;
        v_exit    BOOLEAN;
        v_string  VARCHAR2(255);
        v_outrec  t_outrec;
      BEGIN
        FOR max_length IN 1..p_length 
        LOOP
          -- init counter loop
          FOR i IN 1..max_length
          LOOP
            v_counter(i) := 1;
          END LOOP;
          -- start data generation loop
          v_exit := FALSE;
          WHILE NOT v_exit
          LOOP
            -- start generation
            v_string := '';
            FOR i IN 1..max_length
            LOOP
              v_string := v_string || SUBSTR(p_charset, v_counter(i), 1);
            END LOOP;
            -- set outgoing record
            v_outrec.data := v_string;
            -- now pipe the result
            PIPE ROW(v_outrec);
            -- increment loop
            <<inc_loop>>
            FOR i IN REVERSE 1..max_length
            LOOP
              v_counter(i) := v_counter(i) + 1;     
              IF v_counter(i) > LENGTH(p_charset) THEN
                 IF i > 1 THEN
                    v_counter(i) := 1;
                 ELSE
                    v_exit := TRUE;  
                 END IF;
              ELSE
                 -- no further processing required
                 EXIT inc_loop;  
              END IF;  
            END LOOP;        
          END LOOP; 
        END LOOP; 
      END gen_data;
    END regex_utils;
    /This package is a brute force string generator using all possible combinations of a characters in a string up to a maximum length. Together with the regular expressions, I can now show what combinations my solution would allow to pass. But see for yourself:
    SELECT *
      FROM (SELECT data col1
              FROM TABLE(regex_utils.gen_data('+-.0', 5))
           ) t
    WHERE REGEXP_LIKE(NVL(REGEXP_SUBSTR(t.col1,
                                         '^([+-]?[^+-]+|[^+-]+[+-]?)$'
                       '^[+-]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)[+-]?$'
    ;You will see some results, which are perfectly valid for my definition of decimal numbers but haven't been mentioned, like '000' or '+.00'. From now on I will also use this package to verify the solutions I'll present to you and hopefully reduce my share of typos.
    Counting and finding certain characters or words in a string can be a tedious task. I'll show you how it's done with regular expressions. I'll start with an easy example, count all spaces in the string "Having fun with regular expressions.":
    SELECT NVL(LENGTH(REGEXP_REPLACE('Having fun with regular expressions', '[^ ]')), 0)
      FROM dual
      ;No surprise there. I'm replacing all characters except spaces with a null string. Since REGEXP_REPLACE assumes a NULL string as replacement argument, I can save on adding a third argument, which would look like this:
    REGEXP_REPLACE('Having fun with regular expressions', '[^ ]', '')So REPLACE will return all the spaces which we can count with the LENGTH function. If there aren't any, I will get a NULL string, which is checked by the NVL function. If you want you can play around by changing the space character to somethin else.
    A variation of this theme could be counting the number of words. Counting spaces and adding 1 to this result could be misleading if there are duplicate spaces. Thanks to regular expressions, I can of course eliminate duplicates.
    Using the old method on the string "Having fun with regular expressions" would return anything but the right number. This is, where Backreferences come into play. REGEXP_REPLACE uses them in the replacement argument, a backslash plus a single digit, like this: '\1'. To reference a string in a search pattern, I have to use subexpressions (remember the round brackets?).
    SELECT NVL(LENGTH(REGEXP_REPLACE('Having  fun  with  regular  expressions', '( )\1*|.', '\1')))
      FROM dual
      ;You may have noticed that I changed from using the "^" as a NOT operator to using the "|" OR operator and the "." any character placeholder. This neat little trick allows to filter all other characters except the one we're looking in the first place. "\1" as backreference is outside of our subexpression since I don't want to count the trailing spaces and is used both in the search pattern and the replacement argument.
    Still I'm not satisfied with this: What about leading/trailing blanks, what if there are any special characters, numbers, etc.? Finally, it's time to only count words. For the purpose of this demonstration, I define a word as one or more consecutive letters. If by now you're already thinking in regular expressions, the solution is not far away. One hint: you may want to check on the "i" match parameter which allows for case insensitive search. Another one: You won't need a back reference in the search pattern this time.
    Let's compare our solutions than, shall we?
    SELECT NVL(LENGTH(REGEXP_REPLACE('Having  fun  with  regular  expressions.  !',
                                     '([a-z])+|.', '\1', 1, 0, 'i')), 0)
      FROM dual;This time I don't use a backreference, the "+" operator (remember? 1 or more) will suffice. And since I want to count the occurences, not the letters, I moved the "+" meta character outside of the subexpression. The "|." trick again proved to be useful.
    Case insensitive search does have its merits. It will only search but not transform the any found substring. If I want, for example, extract any occurence of the word fun, I'll just use the "i" match parameter and get this substring, whether it's written as "Fun", "FUN" or "fun". Can be very useful if you're looking for example for names of customers, streets, etc.
    Enough about counting, how about finding? What if I want to know the last occurence of a certain character or string, for example the postition of the last space in this string "Where is the last space?"?
    Addendum: Thanks to another forum member, I should mention that using the INSTR function can do a reverse search by itself.[i]
    WITH t AS (SELECT 'Where is the last space?' col1
                 FROM dual)
    SELECT INSTR(col1, ' ', -1)
      FROM DUAL;Now regular expressions are powerful, but there is no parameter that allows us to reverse the search direction. However, remembering that we have the "$" meta character that means (until the) end of string, all I have to do is use a search pattern that looks for a combination of space and non-space characters including the end of a string. Now compare the REGEXP_INSTR function to the previous solution:
    SELECT REGEXP_INSTR(t.col1, ' [^ ]*$')                       
      FROM t;So in this case, it'll remain a matter of taste what you want to use. If the search pattern has to look for the last occurrence of another regular expression, this is the way to solve such a requirement.
    One more thing about backreferences. They can be used for a sort of primitive "string swapping". If for example you have to transform column values like swapping first and last name, backreferenc is your friend. Here's an example:
    SELECT REGEXP_REPLACE('John Doe', '^(.*) (.*)$', '\2, \1')
      FROM dual
      ;What about middle names, for example 'John J. Doe'? Look for yourself, it still works.
    You can even use that for strings with delimiters, for example reversing delimited "fields" like in this string '10~20~30~40~50' into '50~40~30~20~10'. Using REVERSE, I would get '05~04~03~02~01', so there has to be another way. Using backreferences however is limited to 9 subexpressions, which limits the following solution a bit, if you need to process strings with more than 9 fields. If you want, you can think this example through and see if your solution matches mine.
    SELECT REGEXP_REPLACE('10~20~30~40~50',
                          '^(.*)~(.*)~(.*)~(.*)~(.*)$',
                          '\5~\4~\3~\2~\1'
      FROM dual;After what you've learned so far, that wasn't too hard, was it? Enough for now ...
    Continued in Introduction to regular expressions ... last part..
    C.
    Fixed some typos and a flawed example ...
    cd

    Thank you very much C. Awaiting other parts.... keep going.
    One german typo :-)
    I'm replacing all characters except spaces mit anull string.I received a functional spec from my Dutch analyst in which it is written
    tnsnames voor EDWH:
    PCESCRD1 = (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)
                                                   (HOST=blah.blah.blah.com)
                                                   (PORT=5227)))
               (CONNECT_DATA=(SID=pcescrd1)))
    db user: BW_I2_VIEWER  / BW_I2_VIEWER_SCRD1Had to look for translators.
    Cheers
    Sarma.

  • Introduction to regular expressions ... last part.

    Continued from Introduction to regular expressions ... continued., here's the third and final part of my introduction to regular expressions. As always, if you find mistakes or have examples that you think could be solved through regular expressions, please post them.
    Having fun with regular expressions - Part 3
    In some cases, I may have to search for different values in the same column. If the searched values are fixed, I can use the logical OR operator or the IN clause, like in this example (using my brute force data generator from part 2):
    SELECT data
      FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE data IN ('abc', 'xyz', '012');There are of course some workarounds as presented in this asktom thread but for a quick solution, there's of course an alternative approach available. Remember the "|" pipe symbol as OR operator inside regular expressions? Take a look at this:
    SELECT data
      FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE REGEXP_LIKE(data, '^(abc|xyz|012)$')
    ;I can even use strings composed of values like 'abc, xyz ,  012' by simply using another regular expression to replace "," and spaces with the "|" pipe symbol. After reading part 1 and 2 that shouldn't be too hard, right? Here's my "thinking in regular expression": Replace every "," and 0 or more leading/trailing spaces.
    Ready to try your own solution?
    Does it look like this?
    SELECT data
      FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE REGEXP_LIKE(data, '^(' || REGEXP_REPLACE('abc, xyz ,  012', ' *, *', '|') || ')$')
    ;If I wouldn't use the "^" and "$" metacharacter, this SELECT would search for any occurence inside the data column, which could be useful if I wanted to combine LIKE and IN clause. Take a look at this example where I'm looking for 'abc%', 'xyz%' or '012%' and adding a case insensitive match parameter to it:
    SELECT data
      FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE REGEXP_LIKE(data, '^(abc|xyz|012)', 'i')
    ; An equivalent non regular expression solution would have to look like this, not mentioning other options with adding an extra "," and using the INSTR function:
    SELECT data
      FROM (SELECT data, LOWER(DATA) search
              FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE search LIKE 'abc%'
        OR search LIKE 'xyz%'
        OR search LIKE '012%'
    SELECT data
      FROM (SELECT data, SUBSTR(LOWER(DATA), 1, 3) search
              FROM TABLE(regex_utils.gen_data('abcxyz012', 4))
    WHERE search IN ('abc', 'xyz', '012')
    ;  I'll leave it to your imagination how a complete non regular example with 'abc, xyz ,  012' as search condition would look like.
    As mentioned in the first part, regular expressions are not very good at formatting, except for some selected examples, such as phone numbers, which in my demonstration, have different formats. Using regular expressions, I can change them to a uniform representation:
    WITH t AS (SELECT '123-4567' phone
                 FROM dual
                UNION
               SELECT '01 345678'
                 FROM dual
                UNION
               SELECT '7 87 8787'
                 FROM dual
    SELECT t.phone, REGEXP_REPLACE(REGEXP_REPLACE(phone, '[^0-9]'), '(.{3})(.*)', '(\1)-\2')
      FROM t
    ;First, all non digit characters are beeing filtered, afterwards the remaining string is put into a "(xxx)-xxxx" format, but not cutting off any phone numbers that have more than 7 digits. Using such a conversion could also be used to check the validity of entered data, and updating the value with a uniform format afterwards.
    Thinking about it, why not use regular expressions to check other values about their formats? How about an IP4 address? I'll do this step by step, using 127.0.0.1 as the final test case.
    First I want to make sure, that each of the 4 parts of an IP address remains in the range between 0-255. Regular expressions are good at string matching but they don't allow any numeric comparisons. What valid strings do I have to take into consideration?
    Single digit values: 0-9
    Double digit values: 00-99
    Triple digit values: 000-199, 200-255 (this one will be the trickiest part)
    So far, I will have to use the "|" pipe operator to match all of the allowed combinations. I'll use my brute force generator to check if my solution works for a single value:
    SELECT data
      FROM TABLE(regex_utils.gen_data('0123456789', 3))
    WHERE REGEXP_LIKE(data, '^(25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})$') 
    ; More than 255 records? Leading zeros are allowed, but checking on all the records, there's no value above 255. First step accomplished. The second part is to make sure, that there are 4 such values, delimited by a "." dot. So I have to check for 0-255 plus a dot 3 times and then check for another 0-255 value. Doesn't sound to complicated, does it?
    Using first my brute force generator, I'll check if I've missed any possible combination:
    SELECT data
      FROM TABLE(regex_utils.gen_data('03.', 15))
    WHERE REGEXP_LIKE(data,
                       '^((25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})$'
    ;  Looks good to me. Let's check on some sample data:
    WITH t AS (SELECT '127.0.0.1' ip
                 FROM dual
                UNION 
               SELECT '256.128.64.32'
                 FROM dual            
    SELECT t.ip
      FROM t WHERE REGEXP_LIKE(t.ip,
                       '^((25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})$'
    ;  No surprises here. I can take this example a bit further and try to format valid addresses to a uniform representation, as shown in the phone number example. My goal is to display every ip address in the "xxx.xxx.xxx.xxx" format, using leading zeros for 2 and 1 digit values.
    Regular expressions don't have any format models like for example the TO_CHAR function, so how could this be achieved? Thinking in regular expressions, I first have to find a way to make sure, that each single number is at least three digits wide. Using my example, this could look like this:
    WITH t AS (SELECT '127.0.0.1' ip
                 FROM dual
    SELECT t.ip, REGEXP_REPLACE(t.ip, '([0-9]+)(\.?)', '00\1\2')
      FROM t
    ;  Look at this: leading zeros. However, that first value "00127" doesn't look to good, does it? If you thought about using a second regular expression function to remove any excess zeros, you're absolutely right. Just take the past examples and think in regular expressions. Did you come up with something like this?
    WITH t AS (SELECT '127.0.0.1' ip
                 FROM dual
    SELECT t.ip, REGEXP_REPLACE(REGEXP_REPLACE(t.ip, '([0-9]+)(\.?)', '00\1\2'),
                                '[0-9]*([0-9]{3})(\.?)', '\1\2'
      FROM t
    ;  Think about the possibilities: Now you can sort a table with unformatted IP addresses, if that is a requirement in your application or you find other values where you can use that "trick".
    Since I'm on checking INET (internet) type of values, let's do some more, for example an e-mail address. I'll keep it simple and will only check on the
    "[email protected]", "[email protected]" and "[email protected]" format, where x represents an alphanumeric character. If you want, you can look up the corresponding RFC definition and try to build your own regular expression for that one.
    Now back to this one: At least one alphanumeric character followed by an "@" at sign which is followed by at least one alphanumeric character followed by a "." dot and exactly 3 more alphanumeric characters or 2 more characters followed by a "." dot and another 2 characters. This should be an easy one, right? Use some sample e-mail addresses and my brute force generator, you should be able to verify your solution.
    Here's mine:
    SELECT data
      FROM TABLE(regex_utils.gen_data('a1@.', 9))
    WHERE REGEXP_LIKE(data, '^[[:alnum:]]+@[[:alnum:]]+(\.[[:alnum:]]{3,4}|(\.[[:alnum:]]{2}){2})$', 'i'); Checking on valid domains, in my opinion, should be done in a second function, to keep the checks by itself simple, but that's probably a discussion about readability and taste.
    How about checking a valid URL? I can reuse some parts of the e-mail example and only have to decide what type of URLs I want, for example "http://", "https://" and "ftp://", any subdomain and a "/" after the domain. Using the case insensitive match parameter, this shouldn't take too long, and I can use this thread's URL as a test value. But take a minute to figure that one out for yourself.
    Does it look like this?
    WITH t AS (SELECT 'Introduction to regular expressions ... last part. URL
                 FROM dual
                UNION
               SELECT 'http://x/'
                 FROM dual
    SELECT t.URL
      FROM t
    WHERE REGEXP_LIKE(t.URL, '^(https*|ftp)://(.+\.)*[[:alnum:]]+(\.[[:alnum:]]{3,4}|(\.[[:alnum:]]{2}){2})/', 'i')
    Update: Improvements in 10g2
    All of you, who are using 10g2 or XE (which includes some of 10g2 features) may want to take a look at several improvements in this version. First of all, there are new, perl influenced meta characters.
    Rewriting my example from the first lesson, the WHERE clause would look like this:
    WHERE NOT REGEXP_LIKE(t.col1, '^\d+$')Or my example with searching decimal numbers:
    '^(\.\d+|\d+(\.\d*)?)$'Saves some space, doesn't it? However, this will only work in 10g2 and future releases.
    Some of those meta characters even include non matching lists, for example "\S" is equivalent to "[^ ]", so my example in the second part could be changed to:
    SELECT NVL(LENGTH(REGEXP_REPLACE('Having fun with regular expressions', '\S')), 0)
      FROM dual
      ;Other meta characters support search patterns in strings with newline characters. Just take a look at the link I've included.
    Another interesting meta character is "?" non-greedy. In 10g2, "?" not only means 0 or 1 occurrence, it means also the first occurrence. Let me illustrate with a simple example:
    SELECT REGEXP_SUBSTR('Having fun with regular expressions', '^.* +')
      FROM dual
      ;This is old style, "greedy" search pattern, returning everything until the last space.
    SELECT REGEXP_SUBSTR('Having fun with regular expressions', '^.* +?')
      FROM dual
      ;In 10g2, you'd get only "Having " because of the non-greedy search operation. Simulating that behavior in 10g1, I'd have to change the pattern to this:
    SELECT REGEXP_SUBSTR('Having fun with regular expressions', '^[^ ]+ +')
      FROM dual
      ;Another new option is the "x" match parameter. It's purpose is to ignore whitespaces in the searched string. This would prove useful in ignoring trailing/leading spaces for example. Checking on unsigned integers with leading/trailing spaces would look like this:
    SELECT REGEXP_SUBSTR(' 123 ', '^[0-9]+$', 1, 1, 'x')
      FROM dual
      ;However, I've to be careful. "x" would also allow " 1 2 3 " to qualify as valid string.
    I hope you enjoyed reading this introduction and hope you'll have some fun with using regular expressions.
    C.
    Fixed some typos ...
    Message was edited by:
    cd
    Included 10g2 features
    Message was edited by:
    cd

    Can I write this condition with only one reg expr in Oracle (regexp_substr in my example)?I meant to use only regexp_substr in select clause and without regexp_like in where clause.
    but for better understanding what I'd like to get
    next example:
    a have strings of two blocks separated by space.
    in the first block 5 symbols of [01] in the second block 3 symbols of [01].
    In the first block it is optional to meet one (!), in the second block it is optional to meet one (>).
    The idea is to find such strings with only one reg expr using regexp_substr in the select clause, so if the string does not satisfy requirments should be passed out null in the result set.
    with t as (select '10(!)010 10(>)1' num from dual union all
    select '1112(!)0 111' from dual union all --incorrect because of '2'
    select '(!)10010 011' from dual union all
    select '10010(!) 101' from dual union all
    select '10010 100(>)' from dual union all
    select '13001 110' from dual union all -- incorrect because of '3'
    select '100!01 100' from dual union all --incorrect because of ! without (!)
    select '100(!)1(!)1 101' from dual union all -- incorrect because of two occurencies of (!)
    select '1001(!)10 101' from dual union all --incorrect because of length of block1=6
    select '1001(!)10 1011' from dual union all) --incorrect because of length of block2=4
    select '10110 1(>)11(>)0' from dual union all)--incorrect because of two occurencies of (>)
    select '1001(>)1 11(!)0' from dual)--incorrect because (!) and (>) are met not in their blocks
    --end of test data

Maybe you are looking for

  • When will the Bonjour protocols be updated for use in an Enterprise?

    Bonjour protocols really need an upgrade. Usage of iOS devices in a large VPN'd network is difficult to say the least.  These limitations are making it impossible to recommend usage of iOS equipment.  In some cases I have found a work around to get i

  • SBLIVE 5.1 : dts drops using spdif under zoomplayer and recl

    Hello there... I really need a big help. I'm usign a personnale HT PC done with athlon 2800 XP, gforce 5200 and SBLIVE 5.. I plugged SBLIVE using spdif out connector to my sherwood R-945 ( DTS, DD, DPL ... ). Everything works fine under Dolby Digital

  • Vfx3 (new)

    Hi all My question basically was my process is order and then billing.and its a order releated billing and all config is set like that.Even the billing relevance at itemcat level is order rel billing.In VOFA config is there is not option to put order

  • NsUniqueId in sun java directory server 6.3 EE

    Hello in the Directory server each user has a nsUniqueId what is the utility of such field? can be used as unique identifier by external applications? how does the directory server handles this ID? is it possible to be reused eg: creation of an user

  • IPhoto backup and external storage

    I have run out of disc space on my macbook. It is full of photos. Everything is backed up with time machine, but I am going to buy another external HD to store all the old photos. I know I can just copy the whole iPhoto library to the HD, but is ther