Usage of Regular Expression

I am trying to use Regular expression to split the full name into First Name , Middle Name , Last
Name . Some of the records may not have a middle name . Tried few variations .. no luck so far.
Thanks in advance
SQL> With V_Data As
2 ( SELECT 'Walt F Disney' Fn FROM Dual
3 Union All
4 SELECT 'Walt Disney' Fn FROM Dual
5 )
6 Select fn ,
7 Regexp_Substr(Fn,'[^ ]+', 1, 1) Firstname ,
8 Regexp_Substr(Fn, ' [^ ]+ ') Middle ,
9 Regexp_Substr(Fn,'[^ ]+', 1, 3)Lastname
10 From V_Data;
FN FIRSTNAME MIDDLE LASTNAME
Walt F Disney Walt F Disney
Walt Disney Walt
SQL>
SQL> With V_Data As
2 ( SELECT 'Walt F Disney' Fn FROM Dual
3 Union All
4 SELECT 'Walt Disney' Fn FROM Dual
5 )
6 Select fn ,
7 Regexp_Substr(Fn,'[^ ]+', 1, 1) Firstname ,
8 Regexp_Substr(Fn, ' [^ ]+ ') Middle ,
9 Regexp_Substr(Fn,'[^ ]+', 1, 2)Lastname
10 From V_Data;
FN FIRSTNAME MIDDLE LASTNAME
Walt F Disney Walt F F
Walt Disney Walt Disney

Hi,
Assuming you have (at most) 3 "words" (that is, no names like 'Nikolai F S Grundtvig')
SELECT fn ,
     REGEXP_SUBSTR ( Fn, '[^ ]+')        AS Firstname ,
     TRIM (REGEXP_SUBSTR (Fn, ' .+ ')) AS Middle ,
     REGEXP_SUBSTR (Fn, '[^ ]+$')         AS Lastname
FROM      V_Data;Middle (iif present) will be the only substring with spaces before and after it.
Lastname is the last word, not necessarily the 3rd.
If you do have names like 'J R R Tolkein', where Middle = 'R R', you can use REGEXP_REPLACE:
SELECT fn ,
     REGEXP_SUBSTR ( Fn, '[^ ]+')        AS Firstname ,
     TRIM ( REGEXP_REPLACE ( Fn
                            , '^[^ ]+ (.*) [^ ]+$'
                     , '\1'
          )                 )             AS Middle ,
     REGEXP_SUBSTR (Fn, '[^ ]+$')         AS Lastname
FROM      V_Data;Edited by: Frank Kulash on Jan 12, 2010 7:57 PM
Added multi-word middle solution.
When you post formatted text on this site (and code should always be formatted), type these 6 characters:
(all small letters, inside curly brackets) before and after each formatted section. This is especially important with regular expressions, because brackets are interpreted as markup outside of code tags.

Similar Messages

"Match Regular Expression" and "Match Pattern" vi's behave differently

Hi,
I have a simple string matching need and by experimenting found that the "Match Regular Expression" and "Match Pattern" vi's behave somewhat differently. I'd assume that the regular expression inputs on both would behave the same. A difference I've discovered is that the "|" character (the "vertical bar" character, commonly used as an "or" operator) is recognized as such in the Match Regular Expression vi, but not in the Match Pattern vi (where it is taken literally). Furthermore, I cannot find any documentation in Help (on-line or in LabVIEW) about the "|" character usage in regular expressions. Is this documented anywhere?
For example, suppose I want to match any of the following 4 words: "The" or "quick" or "brown" or "fox". The regular expression "The|quick|brown|fox" (without the quotes) works for the Match Regular Expression vi but not the Match Pattern vi. Below is a picture of the block diagram and the front panel results:
The Help says that the Match Regular Expression vi performs somewhat slower than the Match Pattern vi, so I started with the latter. But since it doesn't work for me, I'll use the former. But does anyone have any idea of the speed difference? I'd assume it is negligible in such a simple example.
Thanks!
Solved!
Go to Solution.

Yep-
You hit a point that's frustrated me a time or two as well (and incidentally, caused some hair-pulling that I can ill afford)
The hint is in the help file:
for Match regular expression "The Match Regular Expression function gives you more options for matching
strings but performs more slowly than the Match Pattern function....Use regular
expressions in this function to refine searches....
Characters to Find
Regular Expression
VOLTS
VOLTS
A plus sign or a minus sign
[+-]
A sequence of one or more digits
[0-9]+
Zero or more spaces
\s* or * (that is, a space followed by an asterisk)
One or more spaces, tabs, new lines, or carriage returns
[\t \r \n \s]+
One or more characters other than digits
[^0-9]+
The word Level only if it
appears at the beginning of the string
^Level
The word Volts only if it
appears at the end of the string
Volts$
The longest string within parentheses
The first string within parentheses but not containing any
parentheses within it
$[^()]*$
A left bracket
A right bracket
cat, cag, cot, cog, dat, dag, dot, and dag
[cd][ao][tg]
cat or dog
cat|dog
dog, cat
dog, cat cat dog,cat
cat cat dog, and so on
((cat )*dog)
One or more of the letter a
followed by a space and the same number of the letter a, that is, a a, aa aa, aaa aaa, and so
on
(a+) \1
For Match Pattern "This function is similar to the Search and Replace
Pattern VI. The Match Pattern function gives you fewer options for matching
strings but performs more quickly than the Match Regular Expression
function. For example, the Match Pattern function does not support the
parenthesis or vertical bar (|) characters.
Characters to Find
Regular Expression
VOLTS
VOLTS
All uppercase and lowercase versions of volts, that is, VOLTS, Volts, volts, and so on
[Vv][Oo][Ll][Tt][Ss]
A space, a plus sign, or a minus sign
[+-]
A sequence of one or more digits
[0-9]+
Zero or more spaces
\s* or * (that is, a space followed by an asterisk)
One or more spaces, tabs, new lines, or carriage returns
[\t \r \n \s]+
One or more characters other than digits
[~0-9]+
The word Level only if it begins
at the offset position in the string
^Level
The word Volts only if it
appears at the end of the string
Volts$
The longest string within parentheses
The longest string within parentheses but not containing any
parentheses within it
([~()]*)
A left bracket
A right bracket
cat, dog, cot, dot, cog, and so on.
[cd][ao][tg]
Frustrating- but still managable.
Jeff

Regular expression usage question

Hi there.
I have a 200 bytes EBCDIC variable record which I need to break down into fields. Fields are positional and are either text, binary numbers, packed-decimal and 64bytes long numbers.
My question is. Can regular expression handle this complex data.
I want to isolate each field into their corresponding format. EBCDIC into ASCII text, binary into java Integer and so on.
The reason for using reqular expression is because the record format could change and regular expression would be easier to modify without having to change the code.
Your words of advice are highly appreciated.
Please advice.
Regards,
Ulises

Regular expressions? I don't think so.
If you have a situation where positions 1-3 might be a binary number like client number, and the format might change so it moves to positions 12-14, then you could certainly write a record-format class to encapsulate that sort of information. In fact that would be a very good idea. But I can't imagine how a regular expression would help in getting a number out of three bytes, for example.

Help in regular expression matching

I have three expressions like
1) [(y2009)(y2011)]
2) [(y2008M5)(y2011M3)] or [(y2009M5)(y2010M12)]
3) [(y2009M1d20)(y2011M12d31)]
i want regular expression pattern for the above three expressions
I am using :
REGEXP_LIKE(timedomainexpression, '???[:digit:]{4}*[:digit:]{1,2}???[:digit:]{4}*[:digit:]{1,2}??', 'i');
but its giving results for all above expressions while i want different expression for each.
i hav used * after [:digit:]{4}, when i am using ? or . then its giving no results. Please help in this situation ASAP.
Thanks

I dont get your question Can you post your desired output? and also give some sample data.
Please consider the following when you post a question.
1. New features keep coming in every oracle version so please provide Your Oracle DB Version to get the best possible answer.
You can use the following query and do a copy past of the output.
select * from v$version 2. This forum has a very good Search Feature. Please use that before posting your question. Because for most of the questions
that are asked the answer is already there.
3. We dont know your DB structure or How your Data is. So you need to let us know. The best way would be to give some sample data like this.
I have the following table called sales
with sales
as
 select 1 sales_id, 1 prod_id, 1001 inv_num, 120 qty from dual
 union all
 select 2 sales_id, 1 prod_id, 1002 inv_num, 25 qty from dual
select *
from sales 4. Rather than telling what you want in words its more easier when you give your expected output.
For example in the above sales table, I want to know the total quantity and number of invoice for each product.
The output should look like this
Prod_id sum_qty count_inv
1 145 2 5. When ever you get an error message post the entire error message. With the Error Number, The message and the Line number.
6. Next thing is a very important thing to remember. Please post only well formatted code. Unformatted code is very hard to read.
Your code format gets lost when you post it in the Oracle Forum. So in order to preserve it you need to
use the {noformat}{noformat} tags.
The usage of the tag is like this.
<place your code here>\
7. If you are posting a *Performance Related Question*. Please read
 {thread:id=501834} and {thread:id=863295}.
 Following those guide will be very helpful.
8. Please keep in mind that this is a public forum. Here No question is URGENT.
 So use of words like *URGENT* or *ASAP* (As Soon As Possible) are considered to be rude.

[SOLVED]ZSH and regular expressions

Hi
I am getting into regular expressions and i have noticed that with my .zshrc file i have some problem. In bash this expression works:
\^\[^#]
but not also in zsh. I have also noted that regular expression works fine with other zshrc configurations found in archwiki (like grml) but i want to have my configuration. And i really can't find what command make a difference
My .zshrc file is pulled from this site https://github.com/slashbeast/things/bl … s/DOTzshrc.
# .zshrc
# Author: Piotr Karbowski <[email protected]>
# License: beerware.
# Basic zsh config.
umask 077
ZDOTDIR=${ZDOTDIR:-${HOME}}
ZSHDDIR="${HOME}/.config/zsh.d"
HISTFILE="${ZDOTDIR}/.zsh_history"
HISTSIZE='10000'
SAVEHIST="${HISTSIZE}"
export EDITOR="/usr/bin/vim"
export TMP="$HOME/tmp"
export TEMP="$TMP"
export TMPDIR="$TMP"
export TMPPREFIX="${TMPDIR}/zsh"
if [ ! -d "${TMP}" ]; then mkdir "${TMP}"; fi
if ! [[ "${PATH}" =~ "^${HOME}/bin" ]]; then
export PATH="${HOME}/bin:${PATH}"
fi
# Not all servers have terminfo for rxvt-256color. :<
if [ "${TERM}" = 'rxvt-256color' ] && ! [ -f '/usr/share/terminfo/r/rxvt-256color' ] && ! [ -f '/lib/terminfo/r/rxvt-256color' ] && ! [ -f "${HOME}/.terminfo/r/rxvt-256color" ]; then
export TERM='rxvt-unicode'
fi
# Colors.
red='\e[0;31m'
RED='\e[1;31m'
green='\e[0;32m'
GREEN='\e[1;32m'
yellow='\e[0;33m'
YELLOW='\e[1;33m'
blue='\e[0;34m'
BLUE='\e[1;34m'
purple='\e[0;35m'
PURPLE='\e[1;35m'
cyan='\e[0;36m'
CYAN='\e[1;36m'
NC='\e[0m'
# Functions
if [ -f '/etc/profile.d/prll.sh' ]; then
. "/etc/profile.d/prll.sh"
fi
run_under_tmux() {
# Run $1 under session or attach if such session already exist.
# $2 is optional path, if no specified, will use $1 from $PATH.
# If you need to pass extra variables, use $2 for it as in example below..
# Example usage:
# torrent() { run_under_tmux 'rtorrent' '/usr/local/rtorrent-git/bin/rtorrent'; }
# mutt() { run_under_tmux 'mutt'; }
# irc() { run_under_tmux 'irssi' "TERM='screen' command irssi"; }
# There is a bug in linux's libevent...
# export EVENT_NOEPOLL=1
command -v tmux >/dev/null 2>&1 || return 1
if [ -z "$1" ]; then return 1; fi
local name="$1"
if [ -n "$2" ]; then
local file_path="$2"
else
local file_path="command ${name}"
fi
if tmux has-session -t "${name}" 2>/dev/null; then
tmux attach -d -t "${name}"
else
tmux new-session -s "${name}" "${file_path}" \; set-option status \; set set-titles-string "${name} (tmux@${HOST})"
fi
t() { run_under_tmux rtorrent; }
irc() { run_under_tmux irssi "TERM='screen' command irssi"; }
over_ssh() {
if [ -n "${SSH_CLIENT}" ]; then
return 0
else
return 1
fi
reload () {
exec "${SHELL}" "$@"
confirm() {
local answer
echo -ne "zsh: sure you want to run '${YELLOW}$@${NC}' [yN]? "
read -q answer
echo
if [[ "${answer}" =~ ^[Yy]$ ]]; then
command "${=1}" "${=@:2}"
else
return 1
fi
confirm_wrapper() {
if [ "$1" = '--root' ]; then
local as_root='true'
shift
fi
local runcommand="$1"; shift
if [ "${as_root}" = 'true' ] && [ "${USER}" != 'root' ]; then
runcommand="sudo ${runcommand}"
fi
confirm "${runcommand}" "$@"
poweroff() { confirm_wrapper --root $0 "$@"; }
reboot() { confirm_wrapper --root $0 "$@"; }
hibernate() { confirm_wrapper --root $0 "$@"; }
detox() {
if [ "$#" -ge 1 ]; then
confirm detox "$@"
else
command detox "$@"
fi
has() {
local string="${1}"
shift
local element=''
for element in "$@"; do
if [ "${string}" = "${element}" ]; then
return 0
fi
done
return 1
begin_with() {
local string="${1}"
shift
local element=''
for element in "$@"; do
if [[ "${string}" =~ "^${element}" ]]; then
return 0
fi
done
return 1
termtitle() {
case "$TERM" in
rxvt*|xterm|nxterm|gnome|screen|screen-*)
local prompt_host="${(%):-%m}"
local prompt_user="${(%):-%n}"
local prompt_char="${(%):-%~}"
case "$1" in
precmd)
printf '\e]0;%s@%s: %s\a' "${prompt_user}" "${prompt_host}" "${prompt_char}"
preexec)
printf '\e]0;%s [%s@%s: %s]\a' "$2" "${prompt_user}" "${prompt_host}" "${prompt_char}"
esac
esac
git_check_if_worktree() {
# This function intend to be only executed in chpwd().
# Check if the current path is in git repo.
# We would want stop this function, on some big git repos it can take some time to cd into.
if [ -n "${skip_zsh_git}" ]; then
git_pwd_is_worktree='false'
return 1
fi
# The : separated list of paths where we will run check for git repo.
# If not set, then we will do it only for /root and /home.
if [ "${UID}" = '0' ]; then
# running 'git' in repo changes owner of git's index files to root, skip prompt git magic if CWD=/home/*
git_check_if_workdir_path="${git_check_if_workdir_path:-/root:/etc}"
else
git_check_if_workdir_path="${git_check_if_workdir_path:-/home}"
git_check_if_workdir_path_exclude="${git_check_if_workdir_path_exclude:-${HOME}/_sshfs}"
fi
if begin_with "${PWD}" ${=git_check_if_workdir_path//:/ }; then
if ! begin_with "${PWD}" ${=git_check_if_workdir_path_exclude//:/ }; then
local git_pwd_is_worktree_match='true'
else
local git_pwd_is_worktree_match='false'
fi
fi
if ! [ "${git_pwd_is_worktree_match}" = 'true' ]; then
git_pwd_is_worktree='false'
return 1
fi
# todo: Prevent checking for /.git or /home/.git, if PWD=/home or PWD=/ maybe...
# damn annoying RBAC messages about Access denied there.
if [ -d '.git' ] || [ "$(git rev-parse --is-inside-work-tree 2> /dev/null)" = 'true' ]; then
git_pwd_is_worktree='true'
git_worktree_is_bare="$(git config core.bare)"
else
unset git_branch git_worktree_is_bare
git_pwd_is_worktree='false'
fi
git_branch() {
git_branch="$(git symbolic-ref HEAD 2>/dev/null)"
git_branch="${git_branch##*/}"
git_branch="${git_branch:-no branch}"
git_dirty() {
if [ "${git_worktree_is_bare}" = 'false' ] && [ -n "$(git status --untracked-files='no' --porcelain)" ]; then
git_dirty='%F{green}*'
else
unset git_dirty
fi
precmd() {
# Set terminal title.
termtitle precmd
if [ "${git_pwd_is_worktree}" = 'true' ]; then
git_branch
git_dirty
git_prompt=" %F{blue}[%F{253}${git_branch}${git_dirty}%F{blue}]"
else
unset git_prompt
fi
preexec() {
# Set terminal title along with current executed command pass as second argument
termtitle preexec "${(V)1}"
chpwd() {
git_check_if_worktree
man() {
if command -v vimmanpager >/dev/null 2>&1; then
PAGER="vimmanpager" command man "$@"
else
command man "$@"
fi
# Are we running under grsecurity's RBAC?
rbac_auth() {
local auth_to_role='admin'
if [ "${USER}" = 'root' ]; then
if ! grep -qE '^RBAC:' "/proc/self/status" && command -v gradm > /dev/null 2>&1; then
echo -e "\n${BLUE}*${NC} ${GREEN}RBAC${NC} Authorize to '${auth_to_role}' RBAC role."
gradm -a "${auth_to_role}"
fi
fi
#rbac_auth
# Check if we started zsh in git worktree, useful with tmux when your new zsh may spawn in source dir.
git_check_if_worktree
if [ "${git_pwd_is_worktree}" = 'true' ]; then
git_branch
git_dirty
git_prompt=" %F{blue}[%F{253}${git_branch}${git_dirty}%F{blue}]"
else
unset git_prompt
fi
# Le features!
# extended globbing, awesome!
setopt extendedGlob
# zmv - a command for renaming files by means of shell patterns.
autoload -U zmv
# zargs, as an alternative to find -exec and xargs.
autoload -U zargs
# Turn on command substitution in the prompt (and parameter expansion and arithmetic expansion).
setopt promptsubst
# Control-x-e to open current line in $EDITOR, awesome when writting functions or editing multiline commands.
autoload -U edit-command-line
zle -N edit-command-line
bindkey '^x^e' edit-command-line
# Include user-specified configs.
if [ ! -d "${ZSHDDIR}" ]; then
mkdir -p "${ZSHDDIR}" && echo "# Put your user-specified config here." > "${ZSHDDIR}/example.zsh"
fi
for zshd in $(ls -A ${HOME}/.config/zsh.d/^*.(z)sh$); do
. "${zshd}"
done
# Completion.
autoload -Uz compinit
compinit
zstyle ':completion:*' matcher-list 'm:{a-z}={A-Z}'
zstyle ':completion:*' completer _expand _complete _ignored _approximate
zstyle ':completion:*' menu select=2
zstyle ':completion:*' select-prompt '%SScrolling active: current selection at %p%s'
zstyle ':completion::complete:*' use-cache 1
zstyle ':completion:*:descriptions' format '%U%F{cyan}%d%f%u'
# If running as root and nice >0, renice to 0.
if [ "$USER" = 'root' ] && [ "$(cut -d ' ' -f 19 /proc/$$/stat)" -gt 0 ]; then
renice -n 0 -p "$$" && echo "# Adjusted nice level for current shell to 0."
fi
# Fancy prompt.
if over_ssh && [ -z "${TMUX}" ]; then
prompt_is_ssh='%F{blue}[%F{red}SSH%F{blue}] '
elif over_ssh; then
prompt_is_ssh='%F{blue}[%F{253}SSH%F{blue}] '
else
unset prompt_is_ssh
fi
case $USER in
root)
PROMPT='%B%F{cyan}%m%k %(?..%F{blue}[%F{253}%?%F{blue}] )${prompt_is_ssh}%B%F{blue}%1~${git_prompt}%F{blue} %# %b%f%k'
PROMPT='%B%F{blue}%n@%m%k %(?..%F{blue}[%F{253}%?%F{blue}] )${prompt_is_ssh}%B%F{cyan}%1~${git_prompt}%F{cyan} %# %b%f%k'
esac
# Ignore lines prefixed with '#'.
setopt interactivecomments
# Ignore duplicate in history.
setopt hist_ignore_dups
# Prevent record in history entry if preceding them with at least one space
setopt hist_ignore_space
# Nobody need flow control anymore. Troublesome feature.
#stty -ixon
setopt noflowcontrol
# Fix for tmux on linux.
case "$(uname -o)" in
'GNU/Linux')
export EVENT_NOEPOLL=1
esac
# Aliases
alias cp='cp -iv'
alias rcp='rsync -v --progress'
alias rmv='rsync -v --progress --remove-source-files'
alias mv='mv -iv'
alias rm='rm -iv'
alias rmdir='rmdir -v'
alias ln='ln -v'
alias chmod="chmod -c"
alias chown="chown -c"
if command -v colordiff > /dev/null 2>&1; then
alias diff="colordiff -Nuar"
else
alias diff="diff -Nuar"
fi
alias grep='grep --colour=auto'
alias egrep='egrep --colour=auto'
alias ls='ls --color=auto --human-readable --group-directories-first --classify'
# Keys.
case $TERM in
rxvt*|xterm*)
bindkey "^[[7~" beginning-of-line #Home key
bindkey "^[[8~" end-of-line #End key
bindkey "^[[3~" delete-char #Del key
bindkey "^[[A" history-beginning-search-backward #Up Arrow
bindkey "^[[B" history-beginning-search-forward #Down Arrow
bindkey "^[Oc" forward-word # control + right arrow
bindkey "^[Od" backward-word # control + left arrow
bindkey "^H" backward-kill-word # control + backspace
bindkey "^[[3^" kill-word # control + delete
linux)
bindkey "^[[1~" beginning-of-line #Home key
bindkey "^[[4~" end-of-line #End key
bindkey "^[[3~" delete-char #Del key
bindkey "^[[A" history-beginning-search-backward
bindkey "^[[B" history-beginning-search-forward
screen|screen-*)
bindkey "^[[1~" beginning-of-line #Home key
bindkey "^[[4~" end-of-line #End key
bindkey "^[[3~" delete-char #Del key
bindkey "^[[A" history-beginning-search-backward #Up Arrow
bindkey "^[[B" history-beginning-search-forward #Down Arrow
bindkey "^[Oc" forward-word # control + right arrow
bindkey "^[Od" backward-word # control + left arrow
bindkey "^H" backward-kill-word # control + backspace
bindkey "^[[3^" kill-word # control + delete
esac
bindkey "^R" history-incremental-pattern-search-backward
bindkey "^S" history-incremental-pattern-search-forward
if [ -f ~/.alert ]; then cat ~/.alert; fi
Thanks for all the help.
Last edited by Shark (2013-05-11 22:32:24)

Raynman wrote:
"This expression doesn't work", "It doesn't work" ...
Could you try being a bit more specific?
Firstly, i am sorry i didn't post the output. I should have know better.
Secondly, chill out.
I have used above regex with grep command. Output from terminal is:
zsh: bad pattern: ^[^#]
In bash it works perfectly.
If i issue "setopt re_match_pcre" i have the same ouput as above.
EDIT: If i issue "unsetopt no_match" it actually works but i have to change the regex from "\^\[^#]" to "\^[^#]" otherwise i get the same output as above. In bash both options work.
Last edited by Shark (2013-05-11 22:07:21)

FM9 SDL Authoring Assitant Regular Expression Syntax?

I'm trying to trick SDL into identifying words that are not approved by STE.
Under "Configure|Style and Linguistic Checks|User Defined Rules" the program allows regular expressions to create custom rules.
I have all other options in the Utility unchecked.
I am by no means a pro at regular expressions but was able to create a pretty solid command at http://regexlib.com/RETester.aspx.
The idea is to create an expression that looks for any word other than those seperated by vertical bars.
For the test text "this is not the way that should work. this is not the way that should work."
\b(?:(?!should|not|way|this|is|that).)+
returns: the work the work
At that website, I can change the excluded words and it works every time. Change the test text, same thing, still works.
Perfect! I ripped every approved word in STE into the formula and it (SDL) only returns words at the end of the sentence that are followed by a periods and question marks. So I added"\." to the exclusion list in the expression and it only found words next to question marks. I excluded question marks and now it finds nothing. I don't understand this as I wasn't aware that I had any criteria in the expression that dictates functionality only at the end of the sentence.
I have an O'reilly book to refer to, if anyone can give me a shove in the right direction as to which set of rules to adhere to, I would appreciate it. Why did negative word matching have to be my introduction to this subject?

I tried your expression in a couple of regex tools and it seems to parse as you wanted it to. I suspect that the SDL implementation doesn't follow the unix/linux standards. I haven't used the tool and the usage documentation is non-existant, except for the limited flash-based demo.
From the SDL knowledgebase, it states that their regex filter uses the .NET regex flavour and I believe that the differences on this are explained in the "Mastering Regular Expressions" book.

Evaluate Regular expression complexity

Hi all
I've a problem on regular expression usage in my application.
I'm using a regular expression to identify objects and fetch them to be served depending on an input string with has to be matched.
each object has a property representing a regular expression to be matched to be candidate for fetching.
my program receive an external input string, then loops on the full objects collection identifying which are the object whose regular expression match the input stream.
doing an example:
obj1) key = "J*SDK"
obj2) key = "Ja*6*"
obj3) key = "JEE*"
if the input string is "Java 6.0 SDK" obj1 and obj2 are cadidate, while obj3 is discarded.
up to now everithing is fine, now here is my question:
i want only one object as output and I want the one best matching my input string.
this means that
-> obj1 is matching 4 chatacter ans has only one wildchar
-> obj2 is matching 3 characters and has two wildcard
so obj2 is discarded since it's regular expression is more complex than the obj1 one
my problem is HOW to evaulate correctly such complexity for each candidate object to be able to choose my best object.
is there some formal rule / api for this?
I'd like to match all wildcards into the regex, but doing this "by hand" would surely result in some bug due to some missing case, so a "third party" API or a grammar rule would be useful.
hoping for you help.
regards
Michele Sacchetti

ok, after days and days of research i came up to this solution:
1) I used this (http://www.brics.dk/automaton) package for regular expression which let me access the internal state automa data
2) use the getShortestExample() method to retrieve the shortest string matching the given regExp
3) evaluate the Levenshtein Distance between the given string and the one to be matched
PROs:
1) the regexp logic is fully handled by the same state machine which take cares of pattern matching in the first phase
2) the library provide me a non-regexp string to be used with comparison (e.g Levenshtein Distance evaluation)
CONs:
1) the methods getShortestExample is unaware of string to be matched, so if i use "aab|aaa" to match "aab" the method gets the first shortest sort alfabetically, that is "aaa", so I get a LD of 1 even if it should be 0, but it's quite a good deal for my application.
@endasil : your solution rely on grouping, and is based on a pre-parsing done manually so it basically went back to the "manual" parsing i wanted to avoid
Another way I'd like to give a try but had to give up was to use ANTLR (www.antlr.org) to create a parser for regular expression and then evaluating the resulting "tree size" of the parser, but wasn't able to find a formal description of RegExp grammar on the net.
do you have any suggestion or comment on my solution (or other to give a try? )

Regular expression - Replace a part of an expression

Dear All,
This is not a business requirement, just trying to practice regexp.
Suppose we want to replace a part of a regular expression from a string. As an example, in the string 'THIS Number 124356 Is to Change.This Number 5 Also', I am trying to replace the last digit of each number with $ sign.
Output will be
'THIS Number 12435$ Is to Change.This Number $ Also'.
Will this be possible using a single regexp_replce?
Thanks in advance.

MichaelS wrote:
I am trying to achieve this using a SINGLE regexp_replace.Here we go:
SQL> with t as (
select 'ABC124556def568gh236klJ258' str from dual union all
select 'THIS Number 124356 Is to Change.This Number 5 Also' str from dual
select str, regexp_replace(str, '(\d{0,})\d{1}', '\1$') str2
from t
STR                                                     STR2
ABC124556def568gh236klJ258                              ABC12455$def56$gh23$klJ25$
THIS Number 124356 Is to Change.This Number 5 Also      THIS Number 12435$ Is to Change.This Number $ Also
2 rows selected.
Nice..
Learning for me ..
Never thought of usage like {0,} ... kepping the end value OPEN.
I was trying with \d+?\d, which was not working..

How to set keepalive check for regular expression

Hi
We are using css110501 CSS.
Right now the keepalives on services are set using hash values.
But i want to change this keepalives to implement keepalives with regular expression checking.
Any Ideas?

The CSS supports it's own native scripting language that can be used to write keepalives among other things.
Here is an example of a script that I wrote to check a page for some specific text:
! Filename: ap-kal-statpage
! Parameters: None - must be coded in script
! Description:
! This script will attempt to connect to a host and
! "GET" an html page. The script checks the contents
! of the page for a particular string. If the string
! is found, the script passes.
! Failure Upon:
! 1. The correct arguments are not supplied.
! 2. The CSS is unable to connect to the host.
! 3. The string is not found in the page.
no echo
if ${ARGS}[#] "LT" "4"
echo "Usage: ap-kal-portlist \'Hostname Port Page String ...\'"
exit script 1
endbranch
set host "${ARGS}[1]"
set port "${ARGS}[2]"
set page "${ARGS}[3]"
set string "${ARGS}[4]"
set EXIT_MSG "Host ${host} not responding on TCP port ${port}."
socket connect host ${host} port ${port} tcp
socket send ${SOCKET} "GET ${page} HTTP/1.0\n\n"
set EXIT_MSG "String was not found."
socket waitfor ${SOCKET} "${string}" 200
socket disconnect ${SOCKET}
echo "String ${string} was found."
no set EXIT_MSG
exit script 0

Logical AND in Java Regular Expressions

I'm trying to implement logical AND using Java Regular Expressions.
I couldn't figure out how to do it after reading Java docs and textbooks. I can do something like "abc.*def", which means that I'm looking for strings which have "abc", then anything, then "def", but it is not "pure" logical AND - I will not find "def.*abc" this way.
Any ideas, how to do it ?
Baken

First off, looks like you're really talking about an "OR", not an "AND" - you want it to match abc.*def OR def.*abc right? If you tried to match abc.*def AND def.*abc nothing would ever match that, as no string can begin with both "abc" and "def", just like no numeric value can be both 2 and 5.
Anyway, maybe regex isn't the right tool for this job. Can you not simply programmatically match it yourself using String methods? You want it to match if the string "starts with" abc and "ends with" def, or vice-versa. Just write some simple code.

Help in Regular expression

Hello..
I wanted to write a regular expression to match the foll string..
 
 NEW ORLEANS, Louisiana (CNN) 
-- Two years after Hurricane Katrina devastated coastal areas of Louisiana and Mississippi, residents say much of America has forgotten their plight.
 
I tried doing..
Matcher matcher= Pattern.compile(" ([^<^>]+?)", Pattern.CASE_INSENSITIVE).matcher(story);
Its not working...
is there any other soln?

Theres probably a better way to do this but here's a way that works.
import java.util.regex.*;
public class RegexTester{
public static void main(String[] args){
 String text =
 " " +
 " NEW ORLEANS, Louisiana (CNN) " +
 "-- Two years after Hurricane Katrina devastated coastal areas of Louisiana and Mississippi," +
 "residents say much of America has forgotten their plight." +
 " ";
 String regex = ">((?:\\s*[\\S&&[^<>]]+\\s*)*?)<";
 Pattern p = Pattern.compile(regex);
 Matcher m = p.matcher(text);
 while(m.find()){
 System.out.println("Match: '" + m.group(1) + "'");
}

Regular expression alphabets

Hi
I want to retrieve the data if the data contains a character or a space or '-' thru select query .
Please help me in writing the combination of 3 with regular expression.
Thanks!!

VT wrote:
Hi,
Try this
SELECT *
FROM <TABLE> WHERE REGEXP_LIKE(<COLUMN>, '[a-z -][A-Z -]');cheers
VTThat won't work as it's expecting at least two characters with the first having to be a-z (lower case) or space or "-" followed by A-Z (upper case) or space or "-".
The correct way is either:
[a-zA-Z -]or
[[:alpha:] -]using the alpha set is often preferable as it can work differently with different character sets/languages rather than restricting to just the a-zA-Z ranges.
Generating a reference for your own database characterset/language can be useful...
SQL> select level-1 as asc_code, decode(chr(level-1), regexp_substr(chr(level-1), '[[:print:]]'), CHR(level-1)) as chr,
2 decode(chr(level-1), regexp_substr(chr(level-1), '[[:graph:]]'), 1) is_graph,
3 decode(chr(level-1), regexp_substr(chr(level-1), '[[:blank:]]'), 1) is_blank,
4 decode(chr(level-1), regexp_substr(chr(level-1), '[[:alnum:]]'), 1) is_alnum,
5 decode(chr(level-1), regexp_substr(chr(level-1), '[[:alpha:]]'), 1) is_alpha,
6 decode(chr(level-1), regexp_substr(chr(level-1), '[[:digit:]]'), 1) is_digit,
7 decode(chr(level-1), regexp_substr(chr(level-1), '[[:cntrl:]]'), 1) is_cntrl,
8 decode(chr(level-1), regexp_substr(chr(level-1), '[[:lower:]]'), 1) is_lower,
9 decode(chr(level-1), regexp_substr(chr(level-1), '[[:upper:]]'), 1) is_upper,
10 decode(chr(level-1), regexp_substr(chr(level-1), '[[:print:]]'), 1) is_print,
11 decode(chr(level-1), regexp_substr(chr(level-1), '[[:punct:]]'), 1) is_punct,
12 decode(chr(level-1), regexp_substr(chr(level-1), '[[:space:]]'), 1) is_space,
13 decode(chr(level-1), regexp_substr(chr(level-1), '[[:xdigit:]]'), 1) is_xdigit
14 from dual
15 connect by level <= 256
16 /
ASC_CODE C IS_GRAPH IS_BLANK IS_ALNUM IS_ALPHA IS_DIGIT IS_CNTRL IS_LOWER IS_UPPER IS_PRINT IS_PUNCT IS_SPACE IS_XDIGIT
 0 1
 1 1
 2 1
 3 1
 4 1
 5 1
 6 1
 7 1
 8 1
 9 1 1
 10 1 1
 11 1 1
 12 1 1
 13 1 1
 14 1
 15 1
 16 1
 17 1
 18 1
 19 1
 20 1
 21 1
 22 1
 23 1
 24 1
 25 1
 26 1
 27 1
 28 1
 29 1
 30 1
 31 1
 32 1 1 1
 33 ! 1 1 1
 34 " 1 1 1
 35 # 1 1 1
 36 $ 1 1 1
 37 % 1 1 1
 38 & 1 1 1
 39 ' 1 1 1
 40 ( 1 1 1
 41 ) 1 1 1
 42 * 1 1 1
 43 + 1 1 1
 44 , 1 1 1
 45 - 1 1 1
 46 . 1 1 1
 47 / 1 1 1
 48 0 1 1 1 1 1
 49 1 1 1 1 1 1
 50 2 1 1 1 1 1
 51 3 1 1 1 1 1
 52 4 1 1 1 1 1
 53 5 1 1 1 1 1
 54 6 1 1 1 1 1
 55 7 1 1 1 1 1
 56 8 1 1 1 1 1
 57 9 1 1 1 1 1
 58 : 1 1 1
 59 ; 1 1 1
 60 < 1 1 1
 61 = 1 1 1
 62 > 1 1 1
 63 ? 1 1 1
 64 @ 1 1 1
 65 A 1 1 1 1 1 1
 66 B 1 1 1 1 1 1
 67 C 1 1 1 1 1 1
 68 D 1 1 1 1 1 1
 69 E 1 1 1 1 1 1
 70 F 1 1 1 1 1 1
 71 G 1 1 1 1 1
 72 H 1 1 1 1 1
 73 I 1 1 1 1 1
 74 J 1 1 1 1 1
 75 K 1 1 1 1 1
 76 L 1 1 1 1 1
 77 M 1 1 1 1 1
 78 N 1 1 1 1 1
 79 O 1 1 1 1 1
 80 P 1 1 1 1 1
 81 Q 1 1 1 1 1
 82 R 1 1 1 1 1
 83 S 1 1 1 1 1
 84 T 1 1 1 1 1
 85 U 1 1 1 1 1
 86 V 1 1 1 1 1
 87 W 1 1 1 1 1
 88 X 1 1 1 1 1
 89 Y 1 1 1 1 1
 90 Z 1 1 1 1 1
 91 [ 1 1 1
 92 \ 1 1 1
 93 ] 1 1 1
 94 ^ 1 1 1
 95 _ 1 1 1
 96 ` 1 1 1
 97 a 1 1 1 1 1 1
 98 b 1 1 1 1 1 1
 99 c 1 1 1 1 1 1
 100 d 1 1 1 1 1 1
 101 e 1 1 1 1 1 1
 102 f 1 1 1 1 1 1
 103 g 1 1 1 1 1
 104 h 1 1 1 1 1
 105 i 1 1 1 1 1
 106 j 1 1 1 1 1
 107 k 1 1 1 1 1
 108 l 1 1 1 1 1
 109 m 1 1 1 1 1
 110 n 1 1 1 1 1
 111 o 1 1 1 1 1
 112 p 1 1 1 1 1
 113 q 1 1 1 1 1
 114 r 1 1 1 1 1
 115 s 1 1 1 1 1
 116 t 1 1 1 1 1
 117 u 1 1 1 1 1
 118 v 1 1 1 1 1
 119 w 1 1 1 1 1
 120 x 1 1 1 1 1
 121 y 1 1 1 1 1
 122 z 1 1 1 1 1
 123 { 1 1 1
 124 | 1 1 1
 125 } 1 1 1
 126 ~ 1 1 1
 127 1
 128 Ç 1 1 1
etc.
{code}

Help in query using regular expression

HI,
I need a help to get the below output using regular expression query. Please help me.
SELECT REGEXP_SUBSTR ('PWRPKG(P/W+P/L+CC)', '[^+]+', 1, lvl) val, lvl
FROM DUAL,(SELECT LEVEL lvl FROM DUAL
CONNECT BY LEVEL <=(SELECT MAX ( LENGTH ('PWRPKG(P/W+P/L+CC)') - LENGTH (REPLACE ('PWRPKG(P/W+P/L+CC)','+',NULL))+ 1) FROM DUAL));
I need the output as
correct result:
==============
val lvl
P/W 1
P/L 2
CC 3
But i tried the above it is not coming the above result. Please help me where i did a mistake.
Thanks in advance

Frank gave you a solution in your other thread. You could simplify it if you are on 11g:
SQL> select * from table_x
2 /
TXT
TECHPKG(INTELLI CC+FRT SONAR)
PWRPKG(P/W+P/L+CC)
select txt,
 regexp_substr(
 txt,
 '(.*\()*([^+)]+)',
 1,
 column_value,
 null,
 2
 ) element,
 column_value element_number
from table_x,
 table(
 cast(
 multiset(
 select level
 from dual
 connect by level <= regexp_count(txt,'\+') + 1
 as sys.OdciNumberList
order by rowid,
 column_value
TXT ELEMENT ELEMENT_NUMBER
TECHPKG(INTELLI CC+FRT SONAR) INTELLI CC 1
TECHPKG(INTELLI CC+FRT SONAR) FRT SONAR 2
PWRPKG(P/W+P/L+CC) P/W 1
PWRPKG(P/W+P/L+CC) P/L 2
PWRPKG(P/W+P/L+CC) CC 3
SQL> SY.

Query help in regular expression

Hi all,
SELECT * FROM emp11
WHERE INSTR(ENAME,'A',1,2) >0;
Please let me know the equivalent query using regular expressions.
i have tried this after going through oracle regular expressions documentation.
SELECT * FROM emp11
WHERE regexp_LIKE(ename,'A{2}')
Any help in this regard would be highly appreciated .
Thanks,
P Prakash

please go here
Introduction to regular expressions ...
Thanks,
P Prakash

Urgent!!! Problem in regular expression for matching braces

Hi,
For the example below, can I write a regular expression to store getting key, value pairs.
example: ((abc def) (ghi jkl) (a ((b c) (d e))) (mno pqr) (a ((abc def))))
in the above example
abc is key & def is value
ghi is key & jkl is value
a is key & ((b c) (d e)) is value
and so on.
can anybody pls help me in resolving this problem using regular expressions...
Thanks in advance

"((key1 value1) (key2 value2) (key3 ((key4 value4)
(key5 value5))) (key6 value6) (key7 ((key8 value8)
(key9 value9))))"
I want to write a regular expression in java to parse
the above string and store the result in hash table
as below
key1 value1
key2 value2
key3 ((key4 value4) (key5 value5))
key4 value4
key5 value5
key6 value6
key7 ((key8 value8) (key9 value9))
key8 value8
key9 value9
please let me know, if it is not possible with
regular expressions the effective way of solving itYes, it is possible with a recursive regular expression.
Unfortunately Java does not provide a recursive regular expression construct.
$_ = "((key1 value1) (key2 value2) (key3 ((key4 value4) (key5 value5))) (key6 value6) (key7 ((key8 value8) (key9 value9))))";
my $paren;
   $paren = qr/
           [^()]+ # Not parens
         |
           (??{ $paren }) # Another balanced group (not interpolated yet)
    /x;
my $r = qr/^(.*?)$(\w+?) (\w+?|(??{$paren}))$\s*(.*?)$/;
while ($_) {
     match()
# operates on $_
sub match {
     my @v;
     @v = m/$r/;
     if (defined $v[3]) {
          $_ = $v[2];
          while (/\(/) {
               match();
          print "\"",$v[1],"\" \"",$v[2],"\"";
          $_ = $v[0].$v[3];
     else { $_ = ""; }
C:\usr\schodtt\src\java\forum\n00b\regex>perl recurse.pl
"key1" "value1"
"key2" "value2"
"key4" "value4"
"key5" "value5"
"key3" "((key4 value4) (key5 value5))"
"key6" "value6"
"key8" "value8"
"key9" "value9"
"key7" "((key8 value8) (key9 value9))"
C:\usr\schodtt\src\java\forum\n00b\regex>

Usage of Regular Expression

Similar Messages

Maybe you are looking for