OR ('|') in regular expressions (e.g. split a String into lines)
Which match gets used when you use OR ('|') to specify multiple possible matches in a regex, and there are multiple matches among the supplied patterns? The first one (in the order written) which matches? Or the one which matches the most characters?
To make this concrete, suppose that you want to split a String into lines, where the line delimiters are the same as the [line terminators used by Java regex|http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html#lt]:
A newline (line feed) character ('\n'),
A carriage-return character followed immediately by a newline character ("\r\n"),
A standalone carriage-return character ('\r'),
A next-line character ('\u0085'),
A line-separator character ('\u2028'), or
A paragraph-separator character ('\u2029')
This problem has [been considered before|http://forums.sun.com/thread.jspa?forumID=4&threadID=464846].
If we ignore the idiotic Microsoft two-char \r\n sequence, then there is no problem; the Java code would be:
String[] lines = s.split("[\\n\\r\\u0085\\u2028\\u2029]");
How do we add support for \r\n? If we try
String[] lines = s.split("[\\n\\r\\u0085\\u2028\\u2029]|\\r\\n");
which pattern of the compound (OR) regex gets used if both match: the [\\n\\r\\u0085\\u2028\\u2029] or the \\r\\n?
For instance, if the above code is called when
s = "a\r\nb";
and if the first pattern [\\n\\r\\u0085\\u2028\\u2029] is used for the match when the \r is encountered, then the tokens will be
"a", "", "b"
because there is an empty String between the \r and the following \n. On the other hand, if the rule is to use the pattern which matches the most characters, then the \\r\\n pattern will match that entire \r\n and the tokens will be
"a", "b"
which is what you want.
On my particular box, using jdk 1.6.0_17, if I run this code
String s = "a\r\nb";
String[] lines = s.split("[\\n\\r\\u0085\\u2028\\u2029]|\\r\\n");
System.out.print(lines.length + " lines: ");
for (String line : lines) System.out.print(" \"" + line + "\"");
System.out.println();
the answer that I get is
3 lines: "a" "" "b"
So it seems like the first listed pattern is used, if it matches.
Therefore, to get the desired behavior, it seems like I should use
"\\r\\n|[\\n\\r\\u0085\\u2028\\u2029]"
instead as the pattern, since that ensures that the 2-char sequence is tried first. Indeed, if I change the above code to use this pattern, it generates the desired output
2 lines: "a" "b"
But what has me worried is that I cannot find any documentation concerning this "first pattern of an OR" rule. This means that maybe the Java regex engine could change in the future, which is worrisome.
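A minimal self-contained sketch of the experiment above, running both orderings side by side (the class and constant names are illustrative, not from the post). It relies on what the java.util.regex.Pattern Javadoc calls "ordered alternation": at a given position the alternatives are tried left to right, and the first one that succeeds is used even if a later one would match more characters.

```java
import java.util.Arrays;

public class AlternationOrderDemo {
    // Character class first: at the '\r' position the class alternative succeeds first.
    static final String CLASS_FIRST = "[\\n\\r\\u0085\\u2028\\u2029]|\\r\\n";
    // Two-char sequence first: CR LF is tried before the single-character class.
    static final String CRLF_FIRST = "\\r\\n|[\\n\\r\\u0085\\u2028\\u2029]";

    public static void main(String[] args) {
        String s = "a\r\nb";
        System.out.println(Arrays.toString(s.split(CLASS_FIRST)));  // [a, , b]
        System.out.println(Arrays.toString(s.split(CRLF_FIRST)));   // [a, b]
    }
}
```

With the class listed first, the lone \r is consumed by the class and the following \n produces the empty token; with \\r\\n listed first, the two-character sequence wins.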
The only bulletproof way that I know of to do line splitting is the complicated regex
"(?:(?<!\\r)\\n)" + "|" + "(?:\\r(?!\\n))" + "|" + "(?:\\r\\n)" + "|" + "\\u0085" + "|" + "\\u2028" + "|" + "\\u2029"
Here, I use negative lookbehind and lookahead in the first two patterns to guarantee that they never match at the end or start of a \r\n, but only on isolated \n and \r chars. Thus, no matter in which order the regex engine applies the patterns above, it will work correctly. I also used non-capturing groups
(?:X)
to avoid memory wastage (since I am only interested in grouping, and not capturing).
Is the above complicated regex the only reliable way to do line splitting?
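For comparison, here is a sketch of the lookaround approach with the lookbehind written as a negative one, (?<!\\r)\\n, which is what the description calls for (a positive (?<=\\r) would match only the \n inside \r\n and would never split on a lone \n). The alternatives are joined in two different orders to show that the result does not depend on ordering. The non-capturing groups are left out because | already has the lowest precedence in a regex. Class and field names are illustrative, and String.join assumes Java 8 or later.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class LookaroundLineSplit {
    // Each alternative can only start matching where the others cannot,
    // so the order in which they are listed does not affect the result.
    static final String LONE_LF = "(?<!\\r)\\n";   // LF not preceded by CR
    static final String LONE_CR = "\\r(?!\\n)";    // CR not followed by LF
    static final String CRLF = "\\r\\n";
    static final String OTHERS = "[\\u0085\\u2028\\u2029]";

    static final Pattern FORWARD =
            Pattern.compile(String.join("|", LONE_LF, LONE_CR, CRLF, OTHERS));
    static final Pattern REVERSED =
            Pattern.compile(String.join("|", OTHERS, CRLF, LONE_CR, LONE_LF));

    public static void main(String[] args) {
        String s = "a\r\nb\nc\rd\u2028e";
        System.out.println(Arrays.toString(FORWARD.split(s)));   // [a, b, c, d, e]
        System.out.println(Arrays.toString(REVERSED.split(s)));  // [a, b, c, d, e]
    }
}
```

On Java 8 and later there is also the \R linebreak matcher, which the Pattern Javadoc describes as equivalent to \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]. It uses the same "two-char sequence first" ordering, but it also treats vertical tab and form feed as line breaks, so s.split("\\R") is not an exact drop-in for the six terminators listed in the question.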
bbatman wrote:
Which match gets used when you use OR ('|') to specify multiple possible matches in a regex, and there are multiple matches among the supplied patterns? The first one (in the order written) which matches? Or the one which matches the most characters?
Normally the longest match wins, except for alternation (OR), as can be read from the innocent sentence
The Pattern engine performs traditional NFA-based matching with ordered alternation as occurs in Perl 5.
in the javadocs. More information can be found in Friedl's book (Mastering Regular Expressions); Google Books shows the relevant page at
[http://books.google.de/books?id=GX3w_18-JegC&pg=PA175&lpg=PA175&dq=regular+expression+%22ordered+alternation%22&source=bl&ots=PHqgNmlnM-&sig=OcDjANZKl0VpJY0igVxkQ3LXplg&hl=de&ei=Dcg7S43NIcSi_AbX-83EDQ&sa=X&oi=book_result&ct=result&resnum=1&ved=0CA0Q6AEwAA#v=onepage&q=&f=false|http://books.google.de/books?id=GX3w_18-JegC&pg=PA175&lpg=PA175&dq=regular+expression+%22ordered+alternation%22&source=bl&ots=PHqgNmlnM-&sig=OcDjANZKl0VpJY0igVxkQ3LXplg&hl=de&ei=Dcg7S43NIcSi_AbX-83EDQ&sa=X&oi=book_result&ct=result&resnum=1&ved=0CA0Q6AEwAA#v=onepage&q=&f=false]
If this link does not survive, search Google for
regular expression "ordered alternation"
My first hit went right into Friedl's book.
Harald.
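A small illustration of the distinction Harald draws, using only java.util.regex (the helper name firstMatch is made up): alternation stops at the first alternative that succeeds, while a greedy quantifier prefers the longer option.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OrderedAlternationDemo {
    // Returns the text of the first match of regex in input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("a|ab", "abc"));  // a  (first alternative wins)
        System.out.println(firstMatch("ab|a", "abc"));  // ab
        System.out.println(firstMatch("ab?", "abc"));   // ab (greedy ? takes the longer option)
    }
}
```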
Similar Messages
How to split this string into 4 sections to a max 35 characters
Hello,
Does anyone have an idea how I can achieve this, please?
I have this string
Expense_Inv_8- ExpenseInv_7- Exp001- Expense_Inv_6- Expense_Inv_5- Expense_Inv_4- Expense_Inv_3- Expense_Inv_2- Expense_inv1
and I need to display them in sections separated by ';' as explained below
Section 1 Section 2 Section 3 Section 4
Expense_Inv_8- ExpenseInv_7- Exp001;Expense_Inv_6- Expense_Inv_5;Expense_Inv_4- Expense_Inv_3;Expense_Inv_2;
I need to split this string into 4 sections separated by ';'; each section should be no more than 35 characters, and if null it should end ;;;
Section 1: 35 characters, ended by ;
Section 2: broken off after Expense_Inv_5 (because Expense_Inv_4 would take it over 35 characters)
Section 3: should only take Expense_Inv_4- Expense_Inv_3, because adding Expense_Inv_2 would take it over 35 characters; each record in the string is separated by '-'
Section 4: displays the remainder of the string
regards
Ade
Hi,
Welcome to the forum!
Whenever you ask a question, it helps if you post a little sample data (CREATE TABLE and INSERT statements, relevant columns only) and the results you want from that data.
I think I understand the problem well enough to attempt a solution, but if the query below isn't right, please post that information.
WITH cntr AS
(
SELECT LEVEL AS n
FROM dual
CONNECT BY LEVEL <= ( SELECT MAX (LENGTH (txt))
FROM table_x
)
)
, got_best_path AS
(
SELECT id
, txt
, MAX ( SYS_CONNECT_BY_PATH ( TO_CHAR (c.n, '99')
) AS best_path
FROM cntr c
JOIN table_x x ON c.n <= LENGTH (x.txt)
START WITH c.n = 1
CONNECT BY c.n - PRIOR c.n BETWEEN 1
AND :section_length
AND x.id = PRIOR x.id
AND SUBSTR ( x.txt
, c.n
, 1
) = '-'
AND LEVEL <= :section_cnt
GROUP BY id
, txt
)
, got_pos AS
(
SELECT id
, REPLACE ( txt
) || ';' AS txt
, best_path
, TO_NUMBER (REGEXP_SUBSTR (best_path, '[0-9]+', 1, 2)) AS pos_2
, TO_NUMBER (REGEXP_SUBSTR (best_path, '[0-9]+', 1, 3)) AS pos_3
, TO_NUMBER (REGEXP_SUBSTR (best_path, '[0-9]+', 1, 4)) AS pos_4
FROM got_best_path
)
SELECT id
, SUBSTR (txt, 1 , NVL ( pos_2 , :section_length)) AS section_1
, SUBSTR (txt, pos_2 + 1, NVL ((pos_3 - pos_2), :section_length)) AS section_2
, SUBSTR (txt, pos_3 + 1, NVL ((pos_4 - pos_3), :section_length)) AS section_3
, SUBSTR (txt, pos_4 + 1, :section_length ) AS section_4
FROM got_pos
;
As written, this requires SQL*Plus 9 (or higher). You can have multiple versions of SQL*Plus on the same client, if you really need to keep the older version.
:section_length is the maximum length of each section (35, as you stated the problem).
:section_cnt is the number of sections. In the query above, this is 4. If you change it, you not only have to change the bind variable, but you have to change the hard-coded SELECT clauses of the main query and the last sub-query (that is, got_pos).
MODEL or PL/SQL would probably be better ways to solve this problem.
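For anyone doing this packing on the Java side instead, a greedy sketch might look like the following. It assumes records are separated by "- " exactly as in the sample, that every section is terminated by ';' (empty ones included), and that records which no longer fit once all sections are used are dropped; the class and method names are hypothetical. It is a plain greedy fill, not a translation of the CONNECT BY query above. On the sample string, section 4 comes out as "Expense_Inv_2- Expense_inv1", since the OP describes section 4 as the remainder of the string.

```java
import java.util.ArrayList;
import java.util.List;

public class SectionPacker {
    // Greedily fills up to maxSections sections of at most maxLen characters each.
    // A single record longer than maxLen still gets a section of its own.
    // Records that do not fit after the last section is full are dropped.
    static String pack(String input, int maxSections, int maxLen) {
        List<StringBuilder> sections = new ArrayList<>();
        for (int i = 0; i < maxSections; i++) {
            sections.add(new StringBuilder());
        }
        int current = 0;
        for (String item : input.split("- ")) {
            if (item.isEmpty()) {
                continue;
            }
            StringBuilder sb = sections.get(current);
            // 2 accounts for the "- " separator that joins records inside a section
            if (sb.length() > 0 && sb.length() + 2 + item.length() > maxLen) {
                current++;                    // start the next section
                if (current == maxSections) {
                    break;                    // no sections left
                }
                sb = sections.get(current);
            }
            if (sb.length() > 0) {
                sb.append("- ");
            }
            sb.append(item);
        }
        StringBuilder out = new StringBuilder();
        for (StringBuilder sb : sections) {
            out.append(sb).append(';');      // empty sections still get ';'
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "Expense_Inv_8- ExpenseInv_7- Exp001- Expense_Inv_6- Expense_Inv_5- "
                 + "Expense_Inv_4- Expense_Inv_3- Expense_Inv_2- Expense_inv1";
        System.out.println(pack(s, 4, 35));
    }
}
```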
Splitting a string into 4 equal parts
Hi All,
I have a string with a maximum of 100 characters.
I want to split this string into 4 equal parts.
Any help will be appreciated.
Regards
Hi Rajeev,
Use this sample code:
import java.util.StringTokenizer;

class SplitString {
    public static void main(String[] arguments) {
        // wdComponentAPI comes from the SAP Web Dynpro controller this snippet was written for;
        // outside that context, System.out.println can replace the reportSuccess calls.
        StringTokenizer ex1, ex2; // Declare StringTokenizer objects
        int count = 0;
        String strOne = "one two three four five";
        ex1 = new StringTokenizer(strOne); // Split on whitespace (default)
        while (ex1.hasMoreTokens()) {
            count++;
            wdComponentAPI.getMessageManager().reportSuccess("Token " + count + " is " + ex1.nextToken());
        }
        count = 0; // Reset counter
        String strTwo = "item one,item two,item three,item four"; // Comma separated
        ex2 = new StringTokenizer(strTwo, ","); // Split on comma
        while (ex2.hasMoreTokens()) {
            count++;
            wdComponentAPI.getMessageManager().reportSuccess("Token " + count + " is " + ex2.nextToken());
        }
    }
}
Thanks
Anup
Edited by: Anup Bharti on Oct 27, 2008 12:36 PM
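The StringTokenizer sample above splits on delimiters rather than into equal-sized pieces. For the question as asked, a minimal sketch using substring might look like this; it assumes "equal" means the piece lengths differ by at most one character when the length is not a multiple of 4, and the names are hypothetical.

```java
public class EqualParts {
    // Splits s into n consecutive pieces whose lengths differ by at most one;
    // the first (s.length() % n) pieces get one extra character.
    static String[] split(String s, int n) {
        String[] parts = new String[n];
        int base = s.length() / n;
        int extra = s.length() % n;
        int start = 0;
        for (int i = 0; i < n; i++) {
            int len = base + (i < extra ? 1 : 0);
            parts[i] = s.substring(start, start + len);
            start += len;
        }
        return parts;
    }

    public static void main(String[] args) {
        for (String part : split("abcdefghij", 4)) {
            System.out.println(part);  // abc, def, gh, ij (one per line)
        }
    }
}
```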
Using Regular Expressions to replace Quotes in Strings
I am writing a program that generates Java files and there are Strings that are used that contain Quotes. I want to use regular expressions to replace " with \" when it is written to the file. The code I was trying to use was:
String temp = "\"Hello\" i am a \"variable\"";
temp = temp.replaceAll("\"","\\\\\"");
However, this does not work, and when I print out the code to the file, the resulting code appears as:
String someVar = ""Hello" i am a "variable"";
and not as:
String someVar = "\"Hello\" i am a \"variable\"";
I am assuming my regular expression is wrong. If it is, could someone explain to me how to fix it so that it will work?
Thanks in advance.
Thanks, apparently I'm just doing something weird that I just need to look at a little bit harder.
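The replacement string in the question is in fact correct, which matches the OP's own conclusion. A quick self-contained check (class and method names are illustrative) compares it with String.replace, which does a literal replacement and avoids the double escaping; since both produce the escaped text, the problem is likely elsewhere in the code that writes the file.

```java
public class EscapeQuotes {
    // Regex version from the question. The Java literal "\\\\\"" is the three
    // characters \\" and, in a replacement string, \\ stands for one literal backslash.
    static String escapeWithRegex(String s) {
        return s.replaceAll("\"", "\\\\\"");
    }

    // Literal version: String.replace(CharSequence, CharSequence) applies no regex
    // or replacement-escaping rules, so only ordinary Java escaping is needed.
    static String escapeLiteral(String s) {
        return s.replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        String temp = "\"Hello\" i am a \"variable\"";
        System.out.println(escapeWithRegex(temp));  // \"Hello\" i am a \"variable\"
        System.out.println(escapeLiteral(temp));    // \"Hello\" i am a \"variable\"
    }
}
```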
A regular expression to detect a blank string...
Anyone know how to write a regular expression that will detect a blank string?
I.e., in a webservice xsd I'm adding a restriction to stop the user specifying a blank string for an element in the webservice operation call.
But I can't figure out a regular expression that will detect an entirely blank string but that will on the other hand allow the string to contain blank spaces in the text.
So the restriction should not allow "" or " " to be specified but will allow "Joe Bloggs" to be specified.
I tried [^ ]* but this won't allow "Joe Bloggs" to pass.
Any ideas?
Thanks,
Ruairi.
Hi ruairiw,
there is a shortcut for the set of whitespace chars in Java: the expression \s, which is equivalent to [ \t\n\f\r\u000b].
With this expression you can test whether a String consists only of whitespace chars.
Example:
String regex = "\\s*"; // the backslash needs to be escaped
// same as String regex = "[ \t\n\f\r\u000b]*";
System.out.println("".matches(regex)); // true
System.out.println(" ".matches(regex)); // true
System.out.println(" \r\n\t ".matches(regex)); // true
System.out.println("\n\nTom".matches(regex)); // false
System.out.println(" Tom Smith".matches(regex)); // false
Best Wishes
esprimo
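The \\s* test above reports when a string is blank; the question asks for the opposite, a pattern that accepts only strings containing at least one non-whitespace character. A minimal Java sketch (names are illustrative):

```java
import java.util.regex.Pattern;

public class NonBlankCheck {
    // At least one non-whitespace character anywhere; (?s) lets '.' also match line breaks.
    static final Pattern NOT_BLANK = Pattern.compile("(?s).*\\S.*");

    static boolean isAcceptable(String s) {
        return NOT_BLANK.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable(""));            // false
        System.out.println(isAcceptable("   "));         // false
        System.out.println(isAcceptable("Joe Bloggs"));  // true
        System.out.println(isAcceptable(" \n Tom "));    // true
    }
}
```

For the XSD restriction itself, .*\S.* is the analogous pattern, since XSD patterns are implicitly anchored to the whole value. Treat that as a suggestion to test: in XSD, . does not match line breaks, so values containing newlines may behave differently.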
Regular expression - get longest number from string
I believe it is an easy one, but I can't get it. Let's say I have a string 'A1_1000' and I want to extract the 1000 using a regular expression. When I feed it to Match Regular Expression I get '1', which is not the longest number. I know other ways of doing that, but I want a clean solution in one step. Does anybody know the right regular expression to accomplish that? Thanks!
LV 2011, Win7
ceties wrote:
This is the best solution I was able to come with. I am just wondering if there is "smoother way" without the cycle.
Since multiple checks are required, I would tend to believe that we do have to loop through the possibilities. In this example
I start checking at offset "0" into the string for a number. Provided I find a number, I check if it is longer than any previous number I found, and if so, save the new longer number in the shift register.
Have fun!
Ben
Message Edited by Ben on 04-15-2009 09:23 AM
Ben Rayner
Attachments:
Find_Longest.PNG 33 KB
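Ben's loop can be mirrored in Java for comparison (this is not LabVIEW code, and the names are illustrative): a single match returns the leftmost run of digits, so the sketch iterates over every match and keeps the longest one, taking the first on ties.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LongestNumber {
    // Scans every run of digits and keeps the longest one (first one wins on ties).
    static String longestNumber(String s) {
        Matcher m = Pattern.compile("\\d+").matcher(s);
        String best = "";
        while (m.find()) {
            if (m.group().length() > best.length()) {
                best = m.group();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestNumber("A1_1000"));  // 1000
    }
}
```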
Unique regular expression to check if a string contains letters and numbers
Hi all,
How can I verify if a string contains numbers AND letters using a regular expression only?
I can do that with using 3 different expressions ([0-9],[a-z],[A-Z]) but is there a unique regular expression to do that?
Thanks all
Darin.K wrote:
Missed the requirements:
single regex:
^([[:alpha:]]+[[:digit:]]+|[[:digit:]]+[[:alpha:]])[[:alnum:]]*$
You either have 1 or more digits followed by 1 or more letters or 1 or more letters followed by 1 or more digits. Once that is out of the way, the rest of the string must be all alphanumerics. The ^ and $ make sure the whole string is included in the match.
(I have not tested this at all, just typed it up hopefully I got all the brackets in the right place).
I think you just made my point. TWICE. While the lex class would be much more readable as a ring.... I know all my brackets are in the correct places and don't need to hope.
Jeff
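[[:alpha:]] and friends are POSIX bracket-expression syntax; java.util.regex spells the same (ASCII-only) classes \p{Alpha}, \p{Digit} and \p{Alnum}. A sketch translating Darin's regex, next to an alternative that uses two lookaheads to demand a letter and a digit somewhere (class and field names are illustrative):

```java
import java.util.regex.Pattern;

public class LettersAndDigits {
    // Darin's regex with Java's POSIX-style classes.
    static final Pattern DARIN = Pattern.compile(
            "^(\\p{Alpha}+\\p{Digit}+|\\p{Digit}+\\p{Alpha})\\p{Alnum}*$");

    // Alternative: lookaheads require at least one letter and one digit.
    static final Pattern LOOKAHEAD = Pattern.compile(
            "^(?=.*\\p{Alpha})(?=.*\\p{Digit})\\p{Alnum}+$");

    public static void main(String[] args) {
        for (String s : new String[] {"abc123", "1a2b", "abc", "123", "ab-12"}) {
            System.out.println(s + " -> " + DARIN.matcher(s).matches()
                    + " / " + LOOKAHEAD.matcher(s).matches());
        }
    }
}
```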
Cannot get regular expression to return true in String.matches()
Hi,
My String that I'm attempting to match a regular expression against is: value=='ORIG')
My regular expression is: value=='ORIG'\\) (the double backslash escapes ')', which is a regular expression special character).
However, when I call the String.matches() method for this regular expression it returns false. Where am I going wrong?
Thanks.
The string doesn't contain what you think it contains, or you made a mistake in your implementation.
public class Bar {
public static void main(final String... args) {
final String s = "value=='ORIG')";
System.out.println(s.matches("value=='ORIG'\\)")); // Prints "true"
}
}
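When the pattern is just literal text, Pattern.quote avoids hand-escaping each metacharacter. A sketch with an illustrative class and method name:

```java
import java.util.regex.Pattern;

public class QuoteDemo {
    // Pattern.quote wraps the text in \Q...\E, so characters such as ')' are
    // matched literally without hand-written escapes.
    static boolean matchesLiterally(String s, String literal) {
        return s.matches(Pattern.quote(literal));
    }

    public static void main(String[] args) {
        System.out.println(matchesLiterally("value=='ORIG')", "value=='ORIG')"));  // true
    }
}
```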
How to split a string into tokens and iterate through the tokens
Guys,
I want to split a string like 'Value1#Value2' using the delimiter #.
I want the tokens to be populated in a array-like (or any other convenient structure) so that I can iterate through them in a stored procedure. (and not just print them out)
I got a function on this link,
http://www.orafaq.com/forum/t/11692/0/
which returns a VARRAY. Can anybody help me how to iterate over the VARRAY, or suggest a different alternative to the split please ?
Thanks.
RTFM: http://download-uk.oracle.com/docs/cd/B19306_01/appdev.102/b14261/collections.htm#sthref1146
or
http://www.oracle-base.com/articles/8i/Collections8i.php
Easy Question: How to split concatenated string into multiple rows?
Hi folks,
this might be an easy question.
How can I split a concatenated string into multiple rows using SQL query?
INPUT:
select 'AAA,BBB,CC,DDDD' as data from dual
Delimiter = ','
Expected output:
data
AAA
BBB
CC
DDDD
I'm looking for something like an "opposite of sys_connect_by_path" function.
Thanks,
Tomas
Here is the SUBSTR/INSTR version of the solution:
SQL> WITH test_data AS
(
SELECT ',' || 'AAA,BBB,CC,DDDD' || ',' AS DATA FROM DUAL
)
SELECT SUBSTR
(
DATA
, INSTR (DATA, ',', 1, LEVEL) + 1
, INSTR (DATA, ',', 1, LEVEL + 1) - INSTR (DATA, ',', 1, LEVEL) - 1
) AS NEW_STRING
FROM test_data
CONNECT BY LEVEL <= LENGTH(REGEXP_REPLACE(DATA,'[^,]','')) - 1
/
NEW_STRING
AAA
BBB
CC
DDDD
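To see what each LEVEL computes in the query above, the same INSTR/SUBSTR arithmetic can be mirrored in Java. This is purely illustrative: nthIndexOf is a hypothetical helper standing in for INSTR(DATA, ',', 1, n), except that it returns 0-based positions where INSTR is 1-based.

```java
import java.util.ArrayList;
import java.util.List;

public class InstrSplit {
    // 0-based position of the n-th occurrence of c (n >= 1), or -1 if there is none.
    static int nthIndexOf(String s, char c, int n) {
        int pos = -1;
        for (int i = 0; i < n; i++) {
            pos = s.indexOf(c, pos + 1);
            if (pos < 0) {
                return -1;
            }
        }
        return pos;
    }

    static List<String> split(String csv) {
        String data = "," + csv + ",";                                   // ',' || data || ','
        int pieces = data.length() - data.replace(",", "").length() - 1; // comma count - 1
        List<String> out = new ArrayList<>();
        for (int level = 1; level <= pieces; level++) {                  // CONNECT BY LEVEL
            int from = nthIndexOf(data, ',', level) + 1;                 // just after the n-th comma
            int to = nthIndexOf(data, ',', level + 1);                   // the (n+1)-th comma
            out.add(data.substring(from, to));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(split("AAA,BBB,CC,DDDD"));  // [AAA, BBB, CC, DDDD]
    }
}
```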
Split a string into multiple internal tables
Hi all,
I need to split a string based internal table into multiple internal tables based on some sub strings in that string based internal table...
High priority help me out...
eg...
a | jhkhjk | kljdskj |lkjdlj |
b | kjhdkjh | kldjkj |
c | jndojkok |
d |
This data, which is in the application server file, is brought into an internal table as text. Now I need to send 'a' to one internal table, 'b' to another internal table, and so on... help me
<Priority downgraded>
Edited by: Suhas Saha on Oct 12, 2011 12:24 PM
Hi pradeep,
eg...
a | jhkhjk | kljdskj |lkjdlj |
b | kjhdkjh | kldjkj |
c | jndojkok |
d |
As per your statement "Now i need to send 'a' to one internal table, 'b' to one internal table"
Do you want only a to one internal table and b to one internal table
OR
Do you want the whole row of the internal table i mean
a | jhkhjk | kljdskj |lkjdlj | to 1 internal table
Having the case of an internal table which is of type string,
1) Loop through the internal table. LOOP AT lt_tab INTO lwa_tab.
2) Get the work area contents and get the first char wa_tab-string+0(1)
3) FIELD-SYMBOLS: <t_itab> TYPE ANY TABLE.
w_tabname = p_table.
CREATE DATA w_dref TYPE TABLE OF (w_tabname).
ASSIGN w_dref->* TO <t_itab>.
Follow the link
http://www.sap-img.com/ab030.htm
http://www.sapdev.co.uk/tips/dynamic-structure.htm
and then, based on the sy-tabix values, you will get that many internal tables
<FS> = wa_tab-string+0(1)
append <FS>
OR
USE SPLIT statement at the relevant seperator
revert for further clarification
Thanks
Sri
Edited by: SRIKANTH P on Oct 12, 2011 12:36 PM
Splitting a string into respective fields of dynamic internal table
Hi,
I've a string concatenated with a separator. I've to split the string and assign it to the respective fields of an internal table, which is dynamic.
Table name will be passed through selection screen. The data is coming from another system via RFC.
Eg : String ITAB :
100;89001;EN;Material1;MATERIAL1
100;89002;EN;Material2;MATERIAL2
The String ITAB may contain any master data. Let's say the above data is from MAKT table. So, I want to assign the above data to the respective fields of MAKT internal table(Dynamic).
I heard, this requirement can be achieved using some standard CLASS.
Please help me in doing this task.
Regards,
Sunny
Hello,
you can use dynamic programming for this issue, i.e.:
DATA: gv_table_name TYPE string,
gr_type_desc TYPE REF TO cl_abap_typedescr,
gr_struct_desc TYPE REF TO cl_abap_structdescr,
gr_table_desc TYPE REF TO cl_abap_tabledescr,
gv_t TYPE c,
gv_comp TYPE i,
gr_table_ref TYPE REF TO data,
gr_struc_ref TYPE REF TO data.
DATA: gt_itab TYPE TABLE OF string,
gt_split TYPE TABLE OF string,
gv_str TYPE string.
FIELD-SYMBOLS: <table> TYPE ANY TABLE,
<struct> TYPE ANY,
<comp> TYPE ANY.
APPEND '100;89001;EN;Material1;MATERIAL1' TO gt_itab.
APPEND '100;89002;EN;Material2;MATERIAL2' TO gt_itab.
"go!
gv_table_name = 'MAKT'.
cl_abap_tabledescr=>describe_by_name(
EXPORTING p_name = gv_table_name
RECEIVING p_descr_ref = gr_type_desc
EXCEPTIONS type_not_found = 4 ).
gr_struct_desc ?= gr_type_desc.
gr_table_desc = cl_abap_tabledescr=>create( gr_struct_desc ).
CREATE DATA gr_table_ref TYPE HANDLE gr_table_desc.
CREATE DATA gr_struc_ref TYPE HANDLE gr_struct_desc.
ASSIGN gr_table_ref->* TO <table>.
ASSIGN gr_struc_ref->* TO <struct>.
DESCRIBE FIELD <struct> TYPE gv_t COMPONENTS gv_comp.
LOOP AT gt_itab INTO gv_str.
CLEAR: gt_split.
SPLIT gv_str AT ';' INTO TABLE gt_split.
DO gv_comp TIMES.
READ TABLE gt_split INTO gv_str INDEX sy-index.
ASSIGN COMPONENT sy-index OF STRUCTURE <struct> TO <comp>.
<comp> = gv_str.
CLEAR gv_str.
ENDDO.
INSERT <struct> INTO TABLE <table>.
ENDLOOP.
After this code you will have all data in <table> field symbol in proper type.
Regards,
Jacek
Regular expression to get substring from string
Hi,
I’m having the following problem:
SELECT REGEXP_SUBSTR(';first field;ir-second field-02;ir-second field-01; third field','.*ir-(.*)-01.*') FROM dual
This is the select that I have, with a Java expression!
In Java I’m able to find the right expression to retrieve what I want, but I don’t know how to adapt it for Oracle!
In oracle I was trying to do something like this:
NVL(SUBSTR(REGEXP_SUBSTR(CONCAT(';', list),';ir-[^01;]+'),LENGTH(';ir-')+1,LENGTH(REGEXP_SUBSTR(CONCAT(';',list),';ir-[^01;]+'))), ' ') AS result
But it doesn’t work because “ir” can repeat in other parameters.
“ir-something-01” only appears once.
Is there logic in Oracle similar to result groups in Java?
best regards,
Ricardo Tomás
rctomas wrote:
Hi,
In java I’m able to do find the right expression to retrieve what I want
Well, it would be nice to tell us what that right expression would be :). Anyway, is this what you are looking for:
SQL> SELECT REGEXP_SUBSTR(';first field;ir-second field-02;ir-second field-01; third field',';ir-([^;]*)-01')
from dual
/
REGEXP_SUBSTR(';FIR
;ir-second field-01
SY.
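On the Java side, which the OP mentions, the part in parentheses is read from a capturing group. A sketch with illustrative names is below. Note that SY.'s query returns the whole match, ;ir-second field-01; Oracle 11g and later document a sixth subexpression argument for REGEXP_SUBSTR (for example REGEXP_SUBSTR(list, ';ir-([^;]*)-01', 1, 1, NULL, 1)) that returns just the group, which is worth verifying against your version.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupExtract {
    // Returns the text captured by group 1, or null when there is no match.
    static String extract(String input) {
        Matcher m = Pattern.compile(";ir-([^;]*)-01").matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String list = ";first field;ir-second field-02;ir-second field-01; third field";
        System.out.println(extract(list));  // second field
    }
}
```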
Splitting a string into separate values, but the array length is variable
We are a cabinet manufacturer. I'm trying to write a report to show the location of individual hinges on a door. I have a string that will look something like these examples:
60, 540
60, 540, 956
60, 540, 956, 1340
It may have 2, 3, 4, or 5 locations. (Don't think I've ever seen one over 5).
So I have a formula that says:
Split({Doors.HingeCenterLines},',')[1]
This will return 60 - the first figure in my string.
I need to have all of them separated, but the critical thing is the last measurement. I need it to be the height of the door minus the last hinge location. For example, if I have a 1400mm tall door, the 4th example string listed above would be generated. I want my entries to look like this:
60
540
956
60
My string always references the hinge relationship to the bottom of the door, but for the top hinge only, I need it to be based on the distance from the top of the door.
Can anyone help with this? Thanks in advance!
Try this code please:
whileprintingrecords;
local stringvar array arr := Split({Doors.HingeCenterLines},',');
local numbervar temp;
local stringvar fin_string;
local numbervar i;
For i := 1 to ubound(arr) do
(
    If i = ubound(arr) then
    (
        temp := {Door_height} - tonumber(arr[i]);
        fin_string := fin_string + totext(temp,0,"")
    )
    else
        fin_string := fin_string + arr[i] + chr(13)
);
fin_string;
You can then right-click the field > Format Field > Common tab > Check the 'Can Grow' option.
-Abhilash -
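The same loop in Java, for anyone prototyping the logic outside Crystal Reports (names are hypothetical, and positions are assumed to be whole millimetres): every hinge keeps its distance from the bottom except the last one, which becomes the door height minus that position.

```java
import java.util.ArrayList;
import java.util.List;

public class HingePositions {
    // Parses "60, 540, 956, 1340" and replaces the last (top) hinge position
    // with its distance from the top of the door.
    static List<Integer> fromEdges(String centerLines, int doorHeight) {
        String[] parts = centerLines.split(",");
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < parts.length; i++) {
            int pos = Integer.parseInt(parts[i].trim());
            out.add(i == parts.length - 1 ? doorHeight - pos : pos);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fromEdges("60, 540, 956, 1340", 1400));  // [60, 540, 956, 60]
    }
}
```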
Split item quantity into line items
Hi Gurus,
I work in CRM 7.0. I created a Quote with 1 line item (Monitor) and quantity is 10 pcs. Then I create a Service Contract as follow-up document from that Quote. My requirement is to get 10 line items with 1 piece of Monitor as quantity in each line item instead of getting 1 line item with 10 pcs of Monitor in it.
1 line item with 10 Monitors -> 10 line items with 1 Monitor in each line item.
Could you advice a solution please?
Regards,
Alex
Hi Alex,
Could you get a solution for this? We are facing the same problem.
We are not able to split the reference line items based on quantity for complaint creation.
Any help will be greatly appreciated.
thanks,
Spurthi