Dumbfounded by Scanner processing String using regular expression
I was reading Bruce Eckel's book when I came across something interesting: extending Scanner with regular expressions. Unfortunately, I was confronted with an issue that doesn't make much sense to me: if the String that I am scanning contains a hyphen, the Scanner doesn't produce anything. As soon as I take it out, it all works like a charm. Here is my example:
import java.util.Scanner;
import java.util.regex.*;
public class StringScan {
public static void main (String [] args){
String input = "there's one caveat when scanning with regular expressions";
Scanner scanner = new Scanner (input);
String pattern = "[a-z]\\w+";
while (scanner.hasNext(pattern)){
scanner.next(pattern);
MatchResult match = scanner.match();
String output = match.group();
System.out.println(output);
}
}
}
What could be the reason? I imagined the hyphen might for some reason be given a special meaning, but when I tried escaping it, it still didn't work.
Thanks for your prompt reply.
I have figured out what was wrong with my code, by the way. Since a single quote is not a word character, it does not match \w+. And as the very first input token does not match, the scanner stops immediately. I rewrote my regex to "[a-z].*" and now it does work.
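The behaviour described in this thread can be reproduced in a few lines. This is only a minimal sketch, not code from the book; the class and method names (ScannerTokenDemo, scanTokens) are made up for illustration. Scanner.hasNext(pattern) returns true only when the whole next whitespace-delimited token matches, so the loop ends at the first token that does not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerTokenDemo {
    // Hypothetical helper: collects tokens for as long as the next
    // complete token matches the given pattern, then stops.
    static List<String> scanTokens(String input, String pattern) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNext(pattern)) {
                tokens.add(scanner.next(pattern));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "there's one caveat when scanning with regular expressions";
        // The apostrophe in the first token is not matched by \w+.
        System.out.println(scanTokens(input, "[a-z]\\w+"));
        // ".*" accepts any characters after the first letter.
        System.out.println(scanTokens(input, "[a-z].*"));
    }
}
```

With the first pattern the list stays empty because the very first token, "there's", is rejected; with the second pattern every token is accepted.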
Similar Messages
-
Format string using Regular Expression
Input string output format...
SELECT q'<select ab_c "ABC", efg "EFG" from dual>' str FROM DUAL
Output:
STR
select ab_c "ABC", efg "EFG" from dual
Required output format using regular expression...
STR
select 'ab_c' "ABC", 'efg' "EFG" from dual
Regular expressions have many limitations as parsing tools, and you didn't specify the rules you wanted. This expression puts quotes around the non-blank string before a quoted string:
{code}
SELECT regexp_replace(q'<select ab_c "ABC", efg "EFG" from dual>',
'([^" ]+)( +"[^ ]*")' , '''\1''\2' ) str FROM DUAL;
STR
select 'ab_c' "ABC", 'efg' "EFG" from dual
{code}
It is not robust - a missing " will confuse it, and you should be using bind variables anyway. -
Splitting a large string using regular expression which contains special characters
I have a huge string (XML) containing normal characters a-z, A-Z and 0-9 as well as special characters (<, >, ?, &, ', ", ;, / etc.).
I need to split this string wherever it ends with </document>,
for example:
Original String:
<document>
<item>sdf</item>
<item><text>sd</text</item>
</document>
<document>hi</document>
The string above has to be split into two parts, since it contains two document tags.
Can anybody help me resolve this issue? I can use StringTokenizer, the String split method, or the regular expression API.
manas589 wrote:
I used DOM and SAX parsers and got a few exceptions. Also, I don't have the right to change the XML, so I thought I would go with a regular expression or some other way to do my job.
If the file actually comes in lines like what you posted, you should just be able to compare the contents of each line to see if it contains "</document>" or whatever you're looking for. I wouldn't use regex unless I needed another problem.
I got an exception like: Caused by: org.xml.sax.SAXParseException: The entity "nbsp" was referenced, but not declared.
at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
So then it isn't even XML.
Edit: sorry, I just realized why you're considering all of these heavy-duty ideas. It's just that you don't know how to break the string into lines. You do it like this:
BufferedReader br = new BufferedReader(new StringReader(theNotXMLString)); -
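The line-by-line idea from the reply in this thread could look roughly like the sketch below. It assumes, as the reply does, that every closing </document> tag sits on a line that ends a document; the class and method names (DocumentSplitter, splitDocuments) are invented for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class DocumentSplitter {
    // Hypothetical helper: reads the text line by line and closes a chunk
    // whenever a line contains "</document>". Text after the last closing
    // tag is dropped.
    static List<String> splitDocuments(String text) {
        List<String> documents = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = br.readLine()) != null) {
                current.append(line).append('\n');
                if (line.contains("</document>")) {
                    documents.add(current.toString());
                    current.setLength(0);
                }
            }
        } catch (IOException e) {
            // Reading from a String should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return documents;
    }

    public static void main(String[] args) {
        String text = "<document>\n<item>sdf</item>\n</document>\n<document>hi</document>";
        System.out.println(splitDocuments(text).size());
    }
}
```

No XML parser is involved, so undeclared entities such as nbsp do not matter to this approach.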
Filter String using Regular Expression
Hello,
I have an application that monitors serial communication between a PC and device. The message protocol is a byte stream that I convert to a string to parse into pretty messages. The start of the string is always "10 02", but if the string is preceded with another "10" like this "10 10 02" it is part of a message. I've been trying to use a regular expression with the Search and Replace VI. My regex is "[^10]\s10\s02" which almost works but it cuts off part of the message:
Before:
10 03 10 02
After:
10 0 <= missing the "3"
10 02
Here's what I'm doing:
Any ideas on what I'm missing? I've attached a simple example.
Thanks
Message Edited by Derek Price on 02-14-2008 08:37 PM
Attachments:
Filter Beginning Message1.vi 14 KB
FilterMessageRegex1.png 7 KB
Try this approach.
Do search and replace on '10\s02' and replace with '\r\n10\s02'
Then do another search and replace on '10\r\n10\s02' with '10\s10\s02'
See attached.
Randall Pursley
Attachments:
Message Filter.PNG 18 KB -
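One note on the original regex in this thread: [^10] is a character class that matches a single character that is neither '1' nor '0', so it consumes the "3" in front of the match. A negative lookbehind checks the preceding text without consuming it. The sketch below shows the idea in Java rather than LabVIEW; whether the LabVIEW Search and Replace function accepts lookbehind is not verified here, and the names FrameSplitter and markFrameStarts are made up.

```java
class FrameSplitter {
    // Hypothetical helper: puts a line break in front of every "10 02"
    // that is not directly preceded by "10 " (an escaped 10).
    static String markFrameStarts(String hexDump) {
        return hexDump.replaceAll("(?<!10 )10 02", "\n10 02");
    }

    public static void main(String[] args) {
        System.out.println(markFrameStarts("10 03 10 02"));
        System.out.println(markFrameStarts("10 10 02"));
    }
}
```

A sequence such as "10 10 10 02" stays ambiguous with this simple rule, so a real protocol parser would probably need to walk the bytes pair by pair.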
How to Capture Multiple Line String using Regular Expression?
Hi,
I have a simple program like this:
What I want to accomplish is to capture everything between >>start and >>end using a single Match Regular Expression node. It seems that setting multiple? to True or False does not help.
I am using LabVIEW 2012.
If it is impossible to capture it using a single node, that is fine. But I want to make sure that I can make full use of this node without combining several others.
Thank you!
TailOfGon
Certified LabVIEW Architect 2013
Thank you for the fast response! Your solution worked in the example case.
After I saw your post, I was finally able to step forward. But I still wanted to make use of dot notation due to the limitation of characters that match with \w.
I made some more modification to your regular expression then now it seems working for all characters:
>>start((?:\s|.)*)>>end
Thanks!
TailOfGon
Certified LabVIEW Architect 2013 -
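For comparison, the same capture can be written with the "dot matches newline" flag instead of the (?:\s|.) alternation. The sketch below is Java, not LabVIEW, and the names BlockCapture and capture are invented for the example; in PCRE-style engines the inline form of the flag is (?s).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BlockCapture {
    // DOTALL lets '.' match line terminators, so a single group can span
    // several lines. The reluctant .*? stops at the first >>end.
    private static final Pattern BLOCK =
            Pattern.compile(">>start(.*?)>>end", Pattern.DOTALL);

    // Hypothetical helper: returns the text between the markers, or null.
    static String capture(String text) {
        Matcher m = BLOCK.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(capture(">>start\nline 1\nline 2\n>>end"));
    }
}
```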
Replace a string using regular expression from powershell
I want to replace the following:
'browserName': 'firefox'
with :
'browserName': 'chrome'
then I tried this:
(get-content $conffile) -replace "^('browserName': ')\S+","browserName': 'chrome' |set-content $conffile
But nothing happened.
Could somebody tell me how to write the regular expression here? Thanks a lot.
Second person today with the same question.
(Get-Content $conffile) | ForEach-Object { $_ -replace "'browserName':\s+'firefox'", "'browserName': 'chrome'" } | Set-Content $conffile
\_(ツ)_/ -
Filter Strings using regular expressions
Requirements.
1.I have a table with different names.
2.I input a word(string) through a text box.
3.I filter table using the input string through text box using the code
((DefaultRowSorter)table_customer.getRowSorter()).setRowFilter(RowFilter.regexFilter(regex, indices));
4.regex is obtained as follows.
String regex = "";
String text = txtFilterText.getText();
regex = "^(?i)" + text + ".*"; //for starts with filter
regex = "." + text + ".";//for contains filter
regex = "(?i)[" + text + ".*]";//for doesnt start with filter
regex = ".*(?i)" + text + "$";//for end with filter
I need help with the "doesn't contain" and "doesn't end with" filters. Please help me out.
Aneesh
Double post
Reply here: http://forum.java.sun.com/thread.jspa?threadID=5231406 -
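For the "doesn't contain" and "doesn't end with" filters asked about in this thread, one common technique is a negative lookahead anchored at the start of the value. The following is a sketch under assumptions: the class and method names are made up, Pattern.quote is added so that user input is treated literally, and the accepts helper uses Matcher.find() on the assumption that this mirrors how RowFilter.regexFilter tests a cell.

```java
import java.util.regex.Pattern;

class NegativeFilters {
    // Matches only values that do not contain the text (case-insensitive).
    static String doesNotContain(String text) {
        return "(?i)^(?!.*" + Pattern.quote(text) + ").*$";
    }

    // Matches only values that do not end with the text (case-insensitive).
    static String doesNotEndWith(String text) {
        return "(?i)^(?!.*" + Pattern.quote(text) + "$).*$";
    }

    // Hypothetical stand-in for the table filter: true if the regex is found.
    static boolean accepts(String regex, String value) {
        return Pattern.compile(regex).matcher(value).find();
    }

    public static void main(String[] args) {
        System.out.println(accepts(doesNotContain("an"), "Bob"));
        System.out.println(accepts(doesNotEndWith("na"), "Anna"));
    }
}
```

Another option that avoids lookaheads is to wrap a positive filter in RowFilter.notFilter.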
Getting non numeric strings using regular expression
Hi Guys ,
I want to get list of string values in table which contains no numeric values .....
I have a string column name A and table name B .
I have written following code , but it seems it is incorrect .
Plz help me out .....
SELECT
A FROM
B
WHERE
regexp_like(A, '([^[:digit:]])')
Thanks in advance ....
That will give you every one that has at least one non-numeric character, if you want ones which contain no numeric characters then it should be
regexp_like(A,'^[^0-9]*$') -
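The difference between the two Oracle patterns in this thread is easier to see side by side. The sketch below restates them in Java; the class and method names are made up, and NULL or empty-string handling in Oracle is not modelled.

```java
import java.util.regex.Pattern;

class DigitChecks {
    // Like regexp_like(A, '[^[:digit:]]'): true as soon as at least one
    // non-digit character occurs anywhere in the string.
    static boolean hasNonDigit(String s) {
        return Pattern.compile("[^0-9]").matcher(s).find();
    }

    // Like regexp_like(A, '^[^0-9]*$'): true only if no character of the
    // whole string is a digit.
    static boolean hasNoDigits(String s) {
        return s.matches("[^0-9]*");
    }

    public static void main(String[] args) {
        System.out.println(hasNonDigit("abc1"));
        System.out.println(hasNoDigits("abc1"));
    }
}
```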
Using Regular Expressions to replace Quotes in Strings
I am writing a program that generates Java files and there are Strings that are used that contain Quotes. I want to use regular expressions to replace " with \" when it is written to the file. The code I was trying to use was:
String temp = "\"Hello\" i am a \"variable\"";
temp = temp.replaceAll("\"","\\\\\"");
however, this does not work and when i print out the code to the file the resulting code appears as:
String someVar = ""Hello" i am a "variable"";
and not as:
String someVar = "\"Hello\" i am a \"variable\"";
I am assuming my regular expression is wrong. If it is, could someone explain to me how to fix it so that it will work?
Thanks in advance.
Thanks, apparently I'm just doing something weird that I just need to look at a little bit harder.
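As far as can be told from the snippet, the replaceAll call in the question already turns each quote into a backslash followed by a quote, which fits the follow-up that the problem was somewhere else. The sketch below shows two ways to write the same replacement with less backslash counting; the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;

class QuoteEscaper {
    // Plain literal replacement: no regex involved, so only Java string
    // escaping applies.
    static String escapeQuotes(String s) {
        return s.replace("\"", "\\\"");
    }

    // Regex replacement: quoteReplacement protects the backslash, which
    // would otherwise be read as an escape in the replacement string.
    static String escapeQuotesRegex(String s) {
        return s.replaceAll("\"", Matcher.quoteReplacement("\\\""));
    }

    public static void main(String[] args) {
        String temp = "\"Hello\" i am a \"variable\"";
        System.out.println(escapeQuotes(temp));
        System.out.println(escapeQuotesRegex(temp));
    }
}
```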
-
Change particular characters in a string by using regular expressions ...
Hello Everyone,
I am trying to write a function using Oracle's regular expression function REGEXP_REPLACE, but I have not succeeded so far.
My problem is as follows: I have text in a column, for example 'sdfsdf Sdfdfs Sdfd', and I want to replace all s and S characters with X so that the text looks like 'XdfXdf XdfdfX Xdfd'.
Is it possible by using regular expressions in Oracle?
Can you give me some clues?
Thank you
SQL> SELECT
2 regexp_replace('sdfsdf Sdfdfs Sdfd','s|S','X') from dual;
REGEXP_REPLACE('SD
XdfXdf XdfdfX Xdfd
Regards,
Achyut -
String extract using regular expression
Hi
I have text like this "<a>45</a><ct>Hi</ct><R>45 85</R><H>Here</H>" .I want to extract using regular expression or any techniques the text between <R> and </R> also need to replace the space with pipe between 45 and 85 like "45|85"
Edited by: vishnu prakash on Mar 2, 2012 4:42 AM
Hi,
Here's one way:
REPLACE ( REGEXP_REPLACE ( txt
, '.*<R>(.*)</R>.*'
, '\1'
)
, ' '
, '|'
)
This assumes there is only one <R> tag in txt.
Always say which version of Oracle you're using. The expression above will work in Oracle 10 and up, but starting in Oracle 11 you can use REGEXP_SUBSTR rather than the less intuitive REGEXP_REPLACE.
Edited by: Frank Kulash on Mar 2, 2012 7:48 AM -
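The same extract-then-replace idea from this thread, sketched in Java for comparison. The names TagExtractor and extractPiped are invented, and like the SQL version the sketch only looks at the first <R> element.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagExtractor {
    // Reluctant group so the match stops at the first closing </R>.
    private static final Pattern R_TAG = Pattern.compile("<R>(.*?)</R>");

    // Hypothetical helper: returns the content of the first <R> element
    // with spaces turned into pipes, or null if there is no <R> element.
    static String extractPiped(String txt) {
        Matcher m = R_TAG.matcher(txt);
        if (!m.find()) {
            return null;
        }
        return m.group(1).replace(' ', '|');
    }

    public static void main(String[] args) {
        System.out.println(extractPiped("<a>45</a><ct>Hi</ct><R>45 85</R><H>Here</H>"));
    }
}
```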
Trying to use regular expressions to convert names to Title Case
I'm trying to change names to their proper case for most common names in North America (esp. the U.S.).
Some examples are in the comments of the included code below.
My problem is that *retName = retName.replaceAll("( [^ ])([^ ]+)", "$1".toUpperCase() + "$2");* does not work as I expect. It seems that the toUpperCase method call does not actually do anything to the identified group.
Everything else works as I expect.
I'm hoping that I do not have to iterate through each character of the string, upshifting the characters that follow spaces.
Any help from you RegEx experts will be appreciated.
{code}
/**
 * Converts names in some random case into proper Name Case. This method does not have the
 * extra processing that would be necessary to convert street addresses.
 * This method does not add or remove punctuation.
 * Examples:
 * DAN MARINO --> Dan Marino
 * old macdonald --> Old Macdonald <-- Can't capitalize the 'D" because of Ernst Mach
 * ROY BLOUNT, JR. --> Roy Blount, Jr.
 * CAROL mosely-BrAuN --> Carol Mosely-Braun
 * Tom Jones --> Tom Jones
 * ST.LOUIS --> St. Louis
 * ST.LOUIS, MO --> St. Louis, Mo <-- Avoid City Names plus State Codes
 * This is a work in progress that will need to be updated as new exceptions are found.
 */
public static String toNameCase(String name) {
    /*
     * Basic plan:
     * 1. Strategically create double spaces in front of characters to be capitalized
     * 2. Capitalize characters with preceding spaces
     * 3. Remove double spaces.
     */
    // Make the string all lower case
    String retName = name.trim().toLowerCase();
    // Collapse strings of spaces to single spaces
    retName = retName.replaceAll("[ ]+", " ");
    // "mc" names
    retName = retName.replaceAll("( mc)", " $1");
    // Ensure there is one space after periods and commas
    retName = retName.replaceAll("(\\.|,)([^ ])", "$1 $2");
    // Add 2 spaces after periods, commas, hyphens and apostrophes
    retName = retName.replaceAll("(\\.|,|-|')", "$1  ");
    // Add a double space to the front of the string
    retName = "  " + retName;
    // Upshift each character that is preceded by a space
    // For some reason this doesn't work
    retName = retName.replaceAll("( [^ ])([^ ]+)", "$1".toUpperCase() + "$2");
    // Remove double spaces
    retName = retName.replaceAll("  ", "");
    return retName;
}
Edited by: FuzzyBunnyFeet on Jan 17, 2011 10:56 AM
Edited by: FuzzyBunnyFeet on Jan 17, 2011 10:57 AM
Hopefully someone will still be able to provide a RegEx solution, but until that time here is a working method.
Also, if people have suggestions of other rules for letter capitalization in names, I am interested in those too.
/**
 * Converts names in some random case into proper Name Case. This method does not have the
 * extra processing that would be necessary to convert street addresses.
 * This method does not add or remove punctuation.
 * Examples:
 * CAROL mosely-BrAuN --> Carol Mosely-Braun
 * carol o'connor --> Carol O'Connor
 * DAN MARINO --> Dan Marino
 * eD mCmAHON --> Ed McMahon
 * joe amcode --> Joe Amcode <-- Embedded "mc"
 * mr.t --> Mr. T <-- Inserted space
 * OLD MACDONALD --> Old Macdonald <-- Can't capitalize the 'D" because of Ernst Mach
 * old mac donald --> Old Mac Donald
 * ROY BLOUNT,JR. --> Roy Blount, Jr.
 * ST.LOUIS --> St. Louis
 * ST.LOUIS,MO --> St. Louis, Mo <-- Avoid City Names plus State Codes
 * Tom Jones --> Tom Jones
 * This is a work in progress that will need to be updated as new exceptions are found.
 */
public static String toNameCase(String name) {
    /*
     * Basic plan:
     * 1. Strategically create double spaces in front of characters to be capitalized
     * 2. Capitalize characters with preceding spaces
     * 3. Remove double spaces.
     */
    // Make the string all lower case
    String workStr = name.trim().toLowerCase();
    // Collapse strings of spaces to single spaces
    workStr = workStr.replaceAll("[ ]+", " ");
    // "mc" names: two marker spaces before the existing space and two after "mc"
    workStr = workStr.replaceAll("( mc)", "  $1  ");
    // Ensure there is one space after periods and commas
    workStr = workStr.replaceAll("(\\.|,)([^ ])", "$1 $2");
    // Add 2 spaces after periods, commas, hyphens and apostrophes
    workStr = workStr.replaceAll("(\\.|,|-|')", "$1  ");
    // Add a double space to the front of the string
    workStr = "  " + workStr;
    // Upshift each character that is preceded by a space and remove double spaces
    // Can't upshift using regular expressions and String methods
    // workStr = workStr.replaceAll("( [^ ])([^ ]+)", "$1".toUpperCase() + "$2");
    StringBuilder titleCase = new StringBuilder();
    for (int i = 0; i < workStr.length(); i++) {
        if (workStr.charAt(i) == ' ') {
            // A double space is only a marker: skip it without copying it
            if (i + 1 < workStr.length() && workStr.charAt(i + 1) == ' ') {
                i += 2;
            }
            // Copy any remaining (real) spaces
            while (i < workStr.length() && workStr.charAt(i) == ' ') {
                titleCase.append(workStr.charAt(i++));
            }
            // Upshift the first character after the spaces
            if (i < workStr.length()) {
                titleCase.append(workStr.substring(i, i + 1).toUpperCase());
            }
        } else {
            titleCase.append(workStr.charAt(i));
        }
    }
    return titleCase.toString();
}
{code} -
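On why the replaceAll line in this thread has no effect: "$1".toUpperCase() is evaluated before replaceAll ever runs, and upper-casing the two characters $ and 1 gives "$1" again. To transform a captured group, the match has to be processed in code. The sketch below uses Matcher.appendReplacement for that; it is only an illustration of the technique with made-up names (UpshiftDemo, upshiftAfterSpace), not a drop-in replacement for the poster's method.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UpshiftDemo {
    private static final Pattern AFTER_SPACE = Pattern.compile(" ([^ ])");

    // Hypothetical helper: upper-cases every character that directly
    // follows a space.
    static String upshiftAfterSpace(String s) {
        Matcher m = AFTER_SPACE.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Build the replacement from the actual captured text.
            String replacement = " " + m.group(1).toUpperCase();
            // quoteReplacement guards against '$' or '\' in the input.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(upshiftAfterSpace(" dan marino"));
    }
}
```

On Java 9 and later, Matcher.replaceAll also accepts a function of the match result, which expresses the same thing more compactly.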
Request some help with the performance of a procedure that uses regular expressions for its functionality
Hi All,
Below is a procedure that populates two tables. For the first table it is a simple insert, but for the second table we need to break up the source record as per the business requirement and then insert the pieces into the table. [I have used regular expressions for that]
The procedure works fine, but it takes around 23 minutes to process 1 million rows.
Since this procedure will be used in parallel by different ETL processes, the append hint is not recommended.
Is there any way to improve its performance, or any suggestion if my approach is not optimal? Thanks in advance for all help.
CREATE OR REPLACE PROCEDURE SONARDBO.PRC_PROCESS_EXCEPTIONS_LOGS_TT
(
P_PROCESS_ID IN NUMBER,
P_FEED_ID IN NUMBER,
P_TABLE_NAME IN VARCHAR2,
P_FEED_RECORD IN VARCHAR2,
P_EXCEPTION_RECORD IN VARCHAR2
)
IS
PRAGMA AUTONOMOUS_TRANSACTION;
V_EXCEPTION_LOG_ID EXCEPTION_LOG.EXCEPTION_LOG_ID%TYPE;
BEGIN
V_EXCEPTION_LOG_ID :=EXCEPTION_LOG_SEQ.NEXTVAL;
INSERT INTO SONARDBO.EXCEPTION_LOG
(
EXCEPTION_LOG_ID, PROCESS_DATE, PROCESS_ID,EXCEPTION_CODE,FEED_ID,SP_NAME
,ATTRIBUTE_NAME,TABLE_NAME,EXCEPTION_RECORD
,DATA_STRUCTURE
,CREATED_BY,CREATED_TS
)
VALUES
( V_EXCEPTION_LOG_ID
,TRUNC(SYSDATE)
,P_PROCESS_ID
,'N/A'
,P_FEED_ID
,NULL
,NULL
,P_TABLE_NAME
,P_FEED_RECORD
,NULL
,USER
,SYSDATE
);
INSERT INTO EXCEPTION_ATTR_LOG
(
EXCEPTION_ATTR_ID,EXCEPTION_LOG_ID,EXCEPTION_CODE,ATTRIBUTE_NAME,SP_NAME,TABLE_NAME,CREATED_BY,CREATED_TS,ATTRIBUTE_VALUE
)
SELECT
EXCEPTION_ATTR_LOG_SEQ.NEXTVAL EXCEPTION_ATTR_ID
,V_EXCEPTION_LOG_ID EXCEPTION_LOG_ID
,REGEXP_SUBSTR(str,'[^|]*',1,1) EXCEPTION_CODE
,REGEXP_SUBSTR(str,'[^|]+',1,2) ATTRIBUTE_NAME
,'N/A' SP_NAME
,p_table_name
,USER
,SYSDATE
,REGEXP_SUBSTR(str,'[^|]+',1,3) ATTRIBUTE_VALUE
FROM
(
SELECT
REGEXP_SUBSTR(P_EXCEPTION_RECORD, '([^^])+', 1,t2.COLUMN_VALUE) str
FROM
DUAL t1 CROSS JOIN
TABLE
(
CAST
(
MULTISET
(
SELECT LEVEL
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT(P_EXCEPTION_RECORD, '([^^])+')
)
AS SYS.odciNumberList
)
) t2
)
WHERE REGEXP_SUBSTR(str,'[^|]*',1,1) IS NOT NULL;
COMMIT;
EXCEPTION
WHEN OTHERS THEN
ROLLBACK;
RAISE;
END;
Many Thanks,
Arpit
Regexes are known to be CPU-intensive, especially when dealing with a large number of rows.
If you have to reduce the processing time, you need to tune the SELECT statements.
One suggested change could be to change the following query
SELECT
REGEXP_SUBSTR(P_EXCEPTION_RECORD, '([^^])+', 1,t2.COLUMN_VALUE) str
FROM
DUAL t1 CROSS JOIN
TABLE
(
CAST
(
MULTISET
(
SELECT LEVEL
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT(P_EXCEPTION_RECORD, '([^^])+')
)
AS SYS.odciNumberList
)
) t2
to
SELECT REGEXP_SUBSTR(P_EXCEPTION_RECORD, '([^^])+', 1,level) str
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT(P_EXCEPTION_RECORD, '([^^])+')
Before looking for any performance benefit, you need to ensure that this does not change your output.
How many substrings are you expecting in the P_EXCEPTION_RECORD? If less than 5, it will be better to opt for SUBSTR and INSTR combination as it might work well with the number of records you are working with. Only trouble is, you will have to write different SUBSTR and INSTR statements for each column to be fetched.
How are you calling this procedure? Is it not possible to work with Collections? Delimited strings are not a very good option as it requires splitting of the data every time you need to refer to. -
Using regular expressions in java
Does anyone of you know a good source or a tutorial for using regular expressions in java.
I just want to look at some examples....
Thanks
thanks a lot... i have one more query
Boundary matchers
^ The beginning of a line
$ The end of a line
\b A word boundary
\B A non-word boundary
\A The beginning of the input
\G The end of the previous match
\Z The end of the input but for the final terminator, if any
\z The end of the input
If I want to use the $ character for comparing with a string (text), how can I use it?
E.g. if it is $120 I get a hit,
but if it is anything other than that it should not hit. -
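To match a literal dollar sign, as asked in this thread, the $ has to be escaped, because unescaped it is the end-of-line boundary matcher from the list above. A minimal sketch with made-up class and method names:

```java
import java.util.regex.Pattern;

class DollarMatch {
    // "\\$" in Java source is the regex \$, a literal dollar sign.
    static boolean isExactly120Dollars(String text) {
        return text.matches("\\$120");
    }

    // Pattern.quote escapes every metacharacter in an arbitrary literal.
    static boolean matchesLiteral(String literal, String text) {
        return Pattern.matches(Pattern.quote(literal), text);
    }

    public static void main(String[] args) {
        System.out.println(isExactly120Dollars("$120"));
        System.out.println(isExactly120Dollars("$121"));
    }
}
```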
How to define a regular expression using regular expressions
Hi,
I am looking for some regular expression pattern which will identify a regular expression.
Also, is it possible to know how does the compile method of Pattern class in java.util.regex package work when it is given a String containing a regex. ie. is there any mechanism to validate regular expression using regular expression pattern.
Regards,
Abhisek
It is impossible to recognize an (in)valid regular expression string using a
regular expression. Google for 'pumping lemma' for a formal proof.
kind regards,
Jos
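Since a regular expression cannot validate regular expressions, as the reply in this thread points out, the usual practical check in Java is to let Pattern.compile try to parse the string and catch the exception it throws on bad syntax. A minimal sketch; the class and method names are made up:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexValidator {
    // Hypothetical helper: true if java.util.regex accepts the string
    // as a pattern, false if compiling it fails.
    static boolean isValidRegex(String candidate) {
        try {
            Pattern.compile(candidate);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("[a-z]\\w+"));
        System.out.println(isValidRegex("[a-z"));
    }
}
```

This only answers whether the syntax is accepted by Java's regex dialect, not whether the pattern does what its author intended.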