Find text by regular expression

I'm struggling with a problem. I have some files that contain a string like:
"name": "someName"
I can find these using egrep like so:
egrep -r '"name": "[^"]*"' /some/folder/path/
This works admirably, except that it returns more than I really want. It gives me something like this:
/path/to/Chrome/extension/manifest.json: "name": "Extension Name"
What I really want is just the name itself, eg:
Extension Name
I don't care about anything else. I'm running these searches from an AppleScript, which already knows what folder it's looking in, and just needs to know 1) if a match was found, and if so, 2) what the name was.
How can I do this? Is there a way to do a search that only returns a particular portion of the results, or would I need to run the results through something else to filter it further?
Thanks in advance!

MrHoffman wrote:
If I've guessed correctly at what you're doing
Heh... I imagine you have. I wasn't sure who might frequent this forum, so didn't want to make any assumptions.
You may or may not know that I have an EtreCheck-like script that I use in conjunction with my Adware Removal Tool. It has proven to be extremely helpful in locating new adware, but it is still somewhat limited in getting information on Firefox and Chrome extensions. It has proven difficult to get names, especially for Chrome extensions. I'm working hard on trying to improve it.
An excerpt from a sample manifest.json file in a Chrome extension - modified slightly for clarity - looks like this:
   "manifest_version": 2,
   "name": "Some extension name",
   "offline_enabled": true,
   "version": "6.3"
I'm using egrep to search for this "name" key so that I don't have to make too many assumptions about the internal structure of the extension folder. I know the manifest.json file is in there somewhere, and will have this key in it, so this is an easy way to find that name. To just display the name, the simple egrep I'm using is adequate, but in some cases there's special data in that name string that indicates where to find a localized name, so I need to be able to do something special in those cases, and need just the name string and nothing else.
The solution turns out to be a combination of several responses. The addition of the -o and -h flags gets me close to what I want, and passing it through cut with a double-quote as the delimiter works to trim it down perfectly. So now I can do:
egrep -roh '"name": "([^"]*)"' /path/to/folder/ | cut -d \" -f 4
The egrep will return:
"name": "Some extension name"
Passing it through cut with the above parameters gives exactly what I need... just:
Some extension name
Thanks to all for the advice!
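
For anyone doing the same extraction outside the shell: the capturing group that egrep ignores in the pattern above can do the job that cut does. A minimal Java sketch, assuming the manifest text has already been read into a string (the class and method names are illustrative, not taken from any existing tool):

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ManifestName {
    // Same pattern as the egrep call; group 1 holds only the text between the quotes.
    private static final Pattern NAME = Pattern.compile("\"name\": \"([^\"]*)\"");

    // Returns the first name found, or an empty Optional when there is no match.
    static Optional<String> extractName(String manifestText) {
        Matcher m = NAME.matcher(manifestText);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

An empty result answers "was a match found", and the wrapped value answers "what was the name", which are the two things the calling script needs.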

Similar Messages

Find text using regular expression and add highlight annotation

Hi Friends
Is it possible to find text using a regular expression and add a highlight annotation using a plugin?

A plugin can use the PDWordFinder to get a list of the words on a page, and their location. That's all that the API offers for searching. Of course, you can use a regular expression library to work with that word list.

Finding URLs using regular expression.

I have a requirement where the user will type some text containing URLs like "Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747. Thank you". This text has to be modified as below before saving it to the database.
"Please visit this site <a href='http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747'>http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747</a>. Thank you"
I am using the regular expression (http|https)://.+?\\s, which marks the end of the URL with a whitespace character. This pattern doesn't work if the URL is located at the end of the string, since there will be no space at the end.
For example if the string is "Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747" the regex will fail.
My actual problem is to find the URL irrespective of its position within the string.
Pattern urlPattern = Pattern.compile("(http|https)://.+?\\s", Pattern.CASE_INSENSITIVE);
Matcher matcher = urlPattern.matcher(plainText);
Map stringIndexMap = new HashMap();
// Searching the input string for urlPattern...
while (matcher.find()) {
    String urlString = matcher.group();
    // Storing the urls in a hashmap with their indices as keys....
    stringIndexMap.put(new Integer(matcher.start()), urlString.trim());
}
Set keySet = stringIndexMap.keySet();
Iterator it = keySet.iterator();
// Iterating over the hashmap containing urls...
// (clickableURLString is declared outside this excerpt; its declaration is not shown in the post.)
while (it.hasNext()) {
    String urlString = (String) stringIndexMap.get(it.next());
    /*
     * Replacing the url string in the input text with <a href="#" onclick="window.open('<urlString>')"
     * using String index
     */
    clickableURLString.replace(clickableURLString.indexOf(urlString),
            clickableURLString.indexOf(urlString) + urlString.length(),
            "<a href=\"#\" onclick=\"window.open('" + urlString
                    + "')\">" + urlString + "</a>");
}
return clickableURLString.toString();

The end of the input is '$' as a regex.
import java.util.regex.*;

public class Prasanna {
    public static void main(String[] args) {
        String text
            = "Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747";
        // String regex = "(http|https)://.+?(?:\\s|$)"; // this works
        String regex = "(http|https)://[^ ]+"; // this also works
        Pattern pat = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        Matcher mat = pat.matcher(text);
        while (mat.find()) {
            System.out.println(mat.group());
        }
    }
}
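
Going one step further than finding the URL, here is a hedged sketch of the replacement itself. It is not the code from either post: the class name, the method name and the choice to leave trailing punctuation outside the link are my own assumptions. The point it illustrates is that the URL in the question contains `$`, which is special in a Java replacement string, so the replacement has to go through Matcher.quoteReplacement.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlLinker {
    // A URL runs to the next whitespace or the end of input, but must not
    // end in sentence punctuation, so "…7747. Thank you" keeps its period outside.
    private static final Pattern URL =
        Pattern.compile("https?://\\S*[^\\s.,;:!?]", Pattern.CASE_INSENSITIVE);

    static String linkify(String plainText) {
        Matcher m = URL.matcher(plainText);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String url = m.group();
            String anchor = "<a href='" + url + "'>" + url + "</a>";
            // quoteReplacement stops '$' and '\' in the URL from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(anchor));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The sketch does no HTML escaping, so a URL containing a single quote would break the attribute; real input would need that handled before saving to the database.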

Find/replace and regular expression problem

Hello, i'm using find and replace with a regular expression
for the first time. I have it checkmarked and it's finding my text
but it's missing (not highlighting) the ')' at the end of the line.
Here's my code:
[($[0-9]+US)]
it's supposed to find everything inside the square brackets -
but it misses the closing parenthesis after . I need
to find this string and replace with nothing to remove the string
from any/all pages. Is there a reason why it's missing the closing
parenthesis? I was actually able to add a few more parenthesis
(e.g. "))))") before OR after the closing square bracket and it
still found the original text minus the closing bracket and the
extra parenthesis didn't prevent the text from being found.
Any help is appreciated!
James...

WyattEA wrote:
> Hello, i'm using find and replace with a regular expression for the first time. I
> have it checkmarked and it's finding my text but it's missing (not
> highlighting) the ')' at the end of the line. Here's my code:
>
> [($[0-9]+US)]
That's not how square brackets work: they define a character class, which matches one character out of the set listed inside, not the sequence as a whole.
Try:
\(\$\d+US\)
A left paren, followed by the dollar sign, followed by at least one digit, followed by US, followed by a right paren.
Mick
>
> it's supposed to find everything inside the square brackets - but it misses
> the closing parenthesis after . I need to find this string and replace
> with nothing to remove the string from any/all pages. Is there a reason why
> it's missing the closing parenthesis? I was actually able to add a few more
> parenthesis (e.g. "))))") before OR after the closing square bracket and it
> still found the original text minus the closing bracket and the extra
> parenthesis didn't prevent the text from being found.
>
> Any help is appreciated!
>
> James...
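
To see the escaped pattern at work outside Dreamweaver, here is a small Java sketch. The sample text and the class name are made up for illustration; only the pattern follows the description in the reply above.

```java
class PriceTagRemover {
    // \( and \) are literal parentheses, \$ is a literal dollar sign,
    // \d+ is one or more digits. Each backslash is doubled for the Java string.
    private static final String PRICE_TAG = "\\(\\$\\d+US\\)";

    // Replaces every "($<digits>US)" with nothing.
    static String strip(String text) {
        return text.replaceAll(PRICE_TAG, "");
    }
}
```

Because every special character is escaped, the closing parenthesis is part of the match and gets removed along with the rest.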

How to find substring with regular expression?

How can I find a substring in a string with a regular expression?
Example:
I have an original string "<tr><th>RecordId: </th><td valign=middle>A4711</td></tr>"
Now I want to extract the value "A4711" from this string with a regular expression. Everything except "A4711" is fixed; the id "A4711" itself is dynamic. How is it possible to get the substring "A4711" of the original string with a regular expression?

I wrote a little method with the info above to get such results:
import java.util.Vector;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Get all substrings of a string that match a regular expression.
 * @param original String to inspect.
 * @param regExp Regular expression as search criteria.
 * @return All matches of regExp or null if one input parameter is null.
 */
public static String[] getSubstrings(String original, String regExp) {
    String[] result = null;
    if (original != null && regExp != null) {
        Pattern pattern = Pattern.compile(regExp);
        Matcher matcher = pattern.matcher(original);
        boolean matchFound = matcher.find();
        Vector matches = new Vector();
        while (matchFound) {
            String match = matcher.group();
            matches.addElement(match);
            matchFound = matcher.find();
        }//next match
        int count = matches.size();
        result = new String[count];
        for (int i = 0; i < count; i++) {
            result[i] = (String) matches.elementAt(i);
        }//next match
    }//else: input unavailable
    return result;
}//getSubstrings()
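
The method above returns whole matches. For the RecordId case in the question, where the surrounding markup is fixed and only the id varies, a capturing group may be a more direct fit. A minimal sketch, with class and method names of my own choosing:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RecordIdExtractor {
    // The fixed markup is matched literally; group 1 captures whatever sits in the cell.
    private static final Pattern RECORD_ID = Pattern.compile(
        "<tr><th>RecordId: </th><td valign=middle>([^<]*)</td></tr>");

    // Returns the id, or null when the row is not present.
    static String extractRecordId(String html) {
        Matcher m = RECORD_ID.matcher(html);
        return m.find() ? m.group(1) : null;
    }
}
```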

VISA Find resource function regular expression

Hi guys,
I've been trying to get which Serial port is a GPS receiver connected to using the VISA Find Resource Function with no luck. The idea is to use a regular expression similar to
ASRL?*INSTR{VI_ATTR_ASRL_BAUD == 9600}
but instead of looking for baud rate, I want to search the value
ASRL3 (COM3 - GNSS Receiver)
as seen in MAX/VISA Test Panel. The attribute name is VI_ATTR_INTF_INST_NAME.
Something like ASRL?*INSTR{VI_ATTR_INTF_INST_NAME == ASRL? (COM? - GNSS Receiver)} should work, but it's not.
How should I write the expression?
Thanks!
Best regards,
Néstor
LabVIEW 2011 + Windows 7 32bits SP1

Hi Dennis,
thanks for answering.
I haven't assigned the name, MAX did it. I just opened MAX and there it was, COM3 (In Devices and interfaces/ASRL3/settings/name). I suppose it gets directly the Windows serial port name, what name do you mean exactly?
Anyway, what I'm interested in getting is the port description, which would tell me the device connected to that port. I could build a small loop and look up the interface description of each serial device, but I found the VISA Find Resource function and it seems a simpler and more direct way to get what I want.
If I can list all serial devices with 9600 baud rate as in the previous example, why not do the same with the instrument name? I clearly see it when I open the VISA Test Panel. I am maybe missing something?
Best regards,
Néstor
LabVIEW 2011 + Windows 7 32bits SP1

Find/Replace Using Regular Expressions

Can someone help me with this...I am using Regular expressions to
FIND:
http.*lid=([^&"]*)[^"]*
REPLACE:
$set(\1,ID_id,code)$
So that in the following it will change this:
a href="http://www.test.com/shc/s/home_10153_12605?lid=Search" rilt="Search"
To this:
a href="$set(Search,ID_id,code)$" rilt="Search"
Those expressions work in Notepad++, but when I use Dreamweaver it just replaces the http... with "$set(\1,ID_id,code)$" and doesn't reference the "Search".
Any help?
Thanks

Let me begin by saying I'm a complete idiot with DW's Reg Ex. I use Search Specific Tag whenever possible. See screenshot below.
Try this on your Current Document to see if it works. Then make a back-up copy of site before attempting it on Entire Local Site as you cannot "Undo" this process.
Good luck,
Nancy O.
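
A note on the \1 part, since nobody picked it up: regex flavors differ in how the replacement text refers to a captured group. Notepad++ accepts \1, while Java and JavaScript-style engines use $1, and as far as I know Dreamweaver's Find and Replace is JavaScript-flavored, so $1 would be the thing to try there. I have not tested this in Dreamweaver. The Java sketch below (class and method names are mine) only shows the $1 form, and how the literal dollar signs in $set(...)$ have to be escaped in an engine where $ is special:

```java
class LidRewriter {
    // Same FIND pattern as in the question.
    private static final String FIND = "http.*lid=([^&\"]*)[^\"]*";

    // In the replacement, $1 is the first captured group and \$ is a literal dollar sign.
    private static final String REPLACE = "\\$set($1,ID_id,code)\\$";

    static String rewrite(String html) {
        return html.replaceAll(FIND, REPLACE);
    }
}
```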

How can I validate entered text against regular expressions?

Thank you for reading my post.
How can I validate entered text, checking its syntax to ensure that it is a domain name?
I think I should use a regular expression, but I do not know how to do this.
Thank you

If you want to validate on the client side, you need to create a JavaScript validation function and add it to the "onBlur" attribute of the TextField component (set it via JavaScript->onBlur in the property sheet). To add the actual JavaScript, you need to edit the JSP page. If you need to do it on the server side, create a custom validator.
http://developers.sun.com/prodtech/javatools/jscreator/learning/tutorials/2/customvalidator.html
- WInston
http://blogs.sun.com/winston
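
For the server-side route, the check inside a custom validator could look roughly like the sketch below. The pattern is a deliberately simplified assumption about what counts as a domain name (ASCII labels of letters, digits and hyphens, at least one dot, an alphabetic top-level label); it does not cover internationalized names, and the class name is illustrative.

```java
import java.util.regex.Pattern;

class DomainNameCheck {
    private static final Pattern DOMAIN = Pattern.compile(
        "^(?=.{1,253}$)"                                         // overall length limit
        + "(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\\.)+" // labels: no leading or trailing hyphen
        + "[A-Za-z]{2,63}$");                                    // alphabetic top-level label

    static boolean isDomainName(String text) {
        return text != null && DOMAIN.matcher(text).matches();
    }
}
```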

Regular expressions in Format Definition add-on

Hello experts,
I have a question about regular expressions. I am a newbie in regular expressions and I could use some help on this one. I tried for some 6 hours, but I can't solve it myself.
Summary of my problem:
In SAP Business One (patch level 42) it is possible to use bank statement processing. A file (full of regular expressions) is to be selected, so it can match certain criteria to the bank statement file. The bank statement file consists of a certain pattern (look at the attached code snippet).
:61:071222D208,00N026
:86:P 12345678BELASTINGDIENST F8R03782497 $GH
$0000009 BETALINGSKENM. 123456789123456
0 1234567891234560
:61:071225C758,70N078
:86:0116664495 REGULA B.V. HELPMESTRAAT 243 B 5371 AM HARDCITY HARD
CITY 48772-54314
:61:071225C425,05N078
:86:0329883585 J. MANSSHOT PATTRIOTISLAND 38 1996 PT HELMEN BIJBETA
LING VOOR RELOOP RMP1 SET ORDERNR* 69866 / SPOEDIG LEVEREN
:61:071225C850,00N078
:86:0105327212 POSE TELEFOONSTRAAT 43 6448 SL S-ROTTERDAM MIJN OR
DERNR. 53846 REF. MAIL 21-02
- I am in search of the right type of regular expression that is used by the Format Definition add-on (javascript, .NET, perl, JAVA, python, etc.)
Besides that I need the regular expressions below, so the Format Definition will match the right lines from my bankfile.
- a regular expression that selects lines starting with :61: and line :86: including next lines (if available), so in fact it has to select everything from :86: till :61: again.
- a regular expression that selects the bank account number (position 5-14) from lines starting with :86:
- a regular expression that selects all other info from lines starting with :86: (and following if any), so all positions that follow after the bank account number
I am looking forward to the right solutions, I can give more info if you need any.

Hello Hendri,
Q1: I am in search of the right type of regular expression that is used by the Format Definition add-on (javascript, .NET, perl, JAVA, python, etc.)
Answer: Format Definition uses .NET regular expressions.
You may refer to the following examples. If necessary, I can send you a guide on how to use regular expressions in Format Definition. Thanks.
Example 6
Description:
To match a field with an optional field in front. For example, “:61:0711211121C216,08N051NONREF” or “:61:071121C216,08N051NONREF”, which consists of a record identification “:61:”, a date in the form of YYMMDD, another optional date MMDD, one or two characters to signify the direction of money flow, a numeric amount value and some other information. The target to be matched is the numeric amount value.
Regular expression:
(?<=:61:\d(\d)?[a-zA-Z]{1,2})((\d(,\d*)?)|(,\d))
Text:
:61:0711211121C216,08N051NONREF
Matches:
1
Tips:
1. All the fields in front of the target field are described in the look-behind assertion enclosed by (?<= and ). In particular, the optional field is enclosed in parentheses followed by a “?” (question mark). The sub expression for amount is copied from example 1. You can compose your own regular expression for such cases in the form of (?<=REGEX_FOR_FIELDS_IN_FRONT)(REGEX_FOR_TARGET_FIELD), in which REGEX_FOR_FIELDS_IN_FRONT and REGEX_FOR_TARGET_FIELD are respectively the regular expression for the fields in front and the target field. Keep the parentheses therein.
Example 7
Description:
Find all numbers in the free text description, which are possibly document identifications, e.g. for invoices
Regular expression:
(?<=\b)(?<!\.)\d+(?=\b)(?!\.)
Text:
:86:GIRO 6890316
ENERGETICA NATURA BENELU
AFRIKAWEG 14
HULST
3187-A1176
TRANSACTIEDATUM* 03-07-2007
Matches:
6
Tips:
1. The regular expression given finds all digits between word boundaries except those with a prior dot or following dot; “.” (dot) is escaped as \.
2. It may find some inaccurate matches, like the date in the text. If you want to exclude “-” (hyphen) as the prior or following character, as in the case of “.” (dot), the regular expression becomes (?<=\b)(?<!\.)(?<!-)\d+(?=\b)(?!\.)(?!-). The matches will be:
:86:GIRO 6890316
ENERGETICA NATURA BENELU
AFRIKAWEG 14
HULST
3187-A1176
TRANSACTIEDATUM* 03-07-2007
You may lose some real values like “3187” before the “-”.
Example 8
Description:
Find BP account number in 9 digits with a prior “P” or “0” in the first position of free text description
Regular expression:
(?<=^(P|0))\d
Text:
0000006681 FORTIS ASR BETALINGSCENTRUM BV
Matches:
1
Tips:
1. Use positive look behind assertion (?<=PRIOR_KEYWORD) to express the prior keyword.
2. “^” means that the match starts from the beginning of the text. If the text includes the record identification, you may also include it in the look-behind assertion. For example,
:86:0000006681 FORTIS ASR BETALINGSCENTRUM BV
The regular expression becomes
(?<=:86:(P|0))\d
Example 9
Description:
Following example 8, to find the possible BP name after BP account number, which is composed of letter, dot or space.
Regular expression:
(?<=^(P|0)\d)[a-zA-Z. ]*
Text:
0000006681 FORTIS ASR BETALINGSCENTRUM BV
Matches:
1
Tips:
1. In this case, put BP account number regular expression into the look behind assertion.
Example 10
Description:
Find the possible document identifications in a sub-record of :86: record. Sub-record is like “?00”, “?10” etc. A possible document identification sub-record is made up of the following parts:
• keyword “RE”, “RG”, “R”, “INV”, “NR”, “NO”, “RECHN” or “RECHNUNG”, and
• an optional group made up of the following:
  - a separator of either a dot, hyphen or slash, and
  - an optional space, and
  - an optional string starting with keyword “NR” or “NO” followed by a separator of either a dot, hyphen or slash, and
  - an optional space
• and finally document identification in digits
Regular expression:
(?<=\?\d(RE|RG|R|INV|NR|NO|RECHN|RECHNUNG)((\.|-|/)\s?((NR|NO)(\.|-|/))?\s?)?)\d+
Kind Regards
-Yatsea
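
To try the look-behind idea from Example 6 outside SAP Business One, here is a hedged Java sketch. The pattern is not copied from the post: I spelled it out from the prose description (record id, six-digit date, optional four-digit date, one or two letters, then the amount), so the quantifiers are my assumption. Java also requires the look-behind to have a bounded length, which this one does; .NET is more permissive. The class and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StatementAmount {
    private static final Pattern AMOUNT = Pattern.compile(
        "(?<=:61:\\d{6}(?:\\d{4})?[a-zA-Z]{1,2})" // fields in front: id, YYMMDD, optional MMDD, direction
        + "\\d+(?:,\\d*)?");                      // target field: amount with optional decimals

    // Returns the amount from a :61: line, or null when the line does not fit.
    static String find(String line) {
        Matcher m = AMOUNT.matcher(line);
        return m.find() ? m.group() : null;
    }
}
```

Only the amount is returned because the look-behind checks the leading fields without consuming them, which is the property the tips describe.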

Regular expressions in ActionScript??

I have been looking at the Adobe publication Programming ActionScript (pdf) and it
specifies the ECMA-262 3rd edition specification. But the specification doesn't seem to
state exactly what type of regular expression engine and version is used.
Is it POSIX, or Perl-compatible regular expressions (or both)?
I have read and used the classic O'Reilly text Mastering Regular Expressions
and coded regular expressions in javascript/php/etc (anywhere regular expressions
could be used: Apache configuration files, other server config files, etc.)
The type of engine used makes a difference, both where performance is
concerned and in the range of syntax valid in a particular implementation.
Thank You
JK

http://www.regular-expressions.info/javascript.html

Carriage Return - Regular Expression

Hi guys,
I'm looking for an effective method to speed up the extreme optimization process in my work (finally to not do it manually).
The particular issue is to find a good regular expression to replace the carriage returns in the source code with nothing.
I searched on the net, and many sources converge on the RegExr tool: http://gskinner.com/RegExr/
I tried to set up an expression to solve my problem but it doesn't work. The expression that was generated by the tool is:
Find: /\r/g Replace: (none)
When I enter the expression in the Dreamweaver Find & Replace panel (with the regular expression option checked and match case, etc. unchecked), it doesn't seem to produce any change in the code.
I'm sure that I'm doing something wrong.
Does anyone have a suggestion?
Thanks all for help.

I don't understand the point of this. There's very little to be gained from removing white space from code. And if you do this to JavaScript, you'll very likely break the code.
Safer method, go to Edit > Preferences > Code Format > click on Tag Libraries and define how you want your code formatted. Then apply with Command > Apply Source Formatting.
Nancy O.
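
On the expression itself: /\r/g is JavaScript literal syntax, where the slashes are delimiters and g is a flag. A find field normally expects only the pattern between the slashes, so entering the slashes and the g would make the tool look for those characters too. I am assuming Dreamweaver behaves this way; I have not verified it. Files may also use \n or \r\n rather than a lone \r. A Java sketch of the same replacement that covers all three line endings (the class name is illustrative):

```java
class LineBreakStripper {
    // Removes \r\n (Windows), a lone \r and a lone \n (Unix).
    static String strip(String source) {
        return source.replaceAll("\\r\\n?|\\n", "");
    }
}
```

The warning above still applies: joining lines can break scripts, for example wherever a // comment is followed by code on the next line.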

Regular Expression Challenge

Hello, everybody!
I'm trying to come up with a regular expression to validate an identifier that starts and ends with double quotes. This is what I have so far:
\"[a-zA-Z_][\w]*\"
This regex is working. It matches identifiers like:
"name"
"name_1"
"_name123"
but the challenge that I couldn't solve is this: I would like to augment this regex to also accept escaped double quotes inside the string itself, like this:
"na""me" (correct, double quote inside escaped by another double quote)
"name""""_1" (correct, two double quotes inside escaped by double quotes)
"""_name123" (correct, double quote inside escaped by another double quote)
and reject strings that miss the escape double quote, like the following:
"na"me"
"name""" "_1"
"" "_name123"
As you can see, besides the start and end quotes, if there's a double quote inside the identifier, it has to be followed immediately by another double quote. It has to be a pair, no matter how many and where. I tried to find such a regular expression but I didn't have success. I couldn't find a way to say that a double quote inside the identifier has to be followed by another double quote.
Any help would be appreciated.
Thank you in advance.
Marcos

Marcos_AntonioPS wrote:
r035198x wrote:
Marcos_AntonioPS wrote:
..If you are not willing/able to learn about them and make an attempt then you will find it difficult to get help here.
You will also find it difficult to get help if you insult regulars when they try to advise you on following the posting guidelines by choosing an appropriate thread title.
r035198x, I didn't insult regulars. But you insulted me. I consider saying that I was trying to make someone here do 'my homework' an insult. I think that the worst insults in life are the ones that we say in a sarcastic manner, like yours.
Marcos
I can't think of one instance where a 'challenge' has been issued in these forums that was not an indirect request for someone's homework to be done. Yours could be the first but even then it stinks of "I can't be bothered to learn about look-ahead so can some kind person solve my problem for me?". Issuing a 'challenge' like this is an insult to us. It assumes we are gullible enough to do your work for you.
You can get somewhere towards a solution using look-ahead but you will have to place restrictions since regex can't count.
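
One way to look at the requirement, offered as a sketch rather than as the thread's answer: if a doubled quote is treated as a single unit that may appear wherever a word character may, the pairing takes care of itself and neither look-ahead nor counting is needed. The class name is illustrative, and the rule that the first unit must be a letter, an underscore or a doubled quote is my reading of the examples in the question.

```java
import java.util.regex.Pattern;

class QuotedIdentifier {
    private static final Pattern IDENTIFIER = Pattern.compile(
        "\""                     // opening quote
        + "(?:[a-zA-Z_]|\"\")"   // first unit: letter, underscore or a doubled quote
        + "(?:\\w|\"\")*"        // further units: word character or a doubled quote
        + "\"");                 // closing quote

    // matches() requires the whole input to fit, so a stray quote cannot be skipped.
    static boolean isValid(String text) {
        return IDENTIFIER.matcher(text).matches();
    }
}
```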

Using Regular Expressions to Find Quoted Text

I have run into a couple problems with the following code.
1) Slash-Star and Slash-Slash commented text must be ignored.
2) It does not detect backslashed quotes, or if that backslash is backslashed.
Can this be accomplished with Regular Expressions, or should I implement this using if/indexOf logic?
Thank You in advance,
Brian
/**
 * Finds position of next quoted string in a line
 * of source code.
 * If no strings exist, then a Pointer position of
 * (0,0) is returned.
 * @param startPos position to start search from
 * @param aString the line of text to search
 * @return next string position
 */
public Pointer getQuotedStringPosition(int startPos, String aString) {
    String argText = new String( aString );
    Pattern p = Pattern.compile("[\"][^\"]+[\"]");
    Matcher m = p.matcher( argText.substring(startPos) );
    if( m.find() )
        return new Pointer( m.start() + startPos, m.end() + startPos );
    else
        return new Pointer( 0, 0 ); // indicates nothing was found
}

YATArchivist was right about the regular expressions.
I think I've got it but somebody test it if you want. Let me know what you find.
I've included a barebones Position class as well...
import java.util.regex.*;
import java.io.*;
import java.util.*;

/**
 * @author Joshua A. Logan, Jr.
 */
public class RegexTest
{
    private static final String SLASH_SLASH = "(//.*)";
    private static final String SLASH_STAR =
        "(/\\*(?:[^\\*]|(?:\\*(?!/)))+(\\*/)?)";
    private static final Pattern COMMENT_PATTERN =
        Pattern.compile( SLASH_SLASH + "|" + SLASH_STAR );
    // The character class excludes both backslash and quote, as described in point 4 below.
    private static final Pattern QUOTED_STRING_PATTERN =
        Pattern.compile( "\" ( (?:(\\\\.) | [^\\\\\"])*+ ) \"",
                         Pattern.COMMENTS );
    // Breaking the above regular expression down, you'd have:
    //   " ( (?: (\\ .) | [^\\ "] ) *+ ) "
    //   ^         ^    ^    ^      ^    ^
    //   |         |    |    |      |    |
    //   1         2    3    4      5    6
    // which matches:
    // 1) The starting quote...
    // Followed by something that is either:
    // 2) some escaped sequence ( e.g. _\n_ or even _\"_ ),
    // 3) ...or...
    // 4) a character that is neither a _\_ nor a _"_ .
    // 5) Keep searching this as much as possible, w/o giving up
    //    any found text at the end.
    //    Note: the text found would be in group(1)
    // 6) Finally, find the ending quote!!

    public static Position [] getQuotedStringPosition( final String text )
    {
        Matcher cm = COMMENT_PATTERN.matcher( text ),
                qm = QUOTED_STRING_PATTERN.matcher( text );
        final int len = text.length();
        int startPos = 0;
        List positions = new ArrayList();
        while ( startPos < len )
        {
            if ( cm.find(startPos) )
            {
                int commStart = cm.start(),
                    commEnd = cm.end();
                // are we starting @ a comment?
                if ( commStart == startPos )
                {
                    startPos = commEnd;
                }
                else if ( qm.find(startPos) )
                {
                    // Search for unescaped strings in here.
                    int stringStart = qm.start(1),
                        stringEnd = qm.end(1);
                    // Is the quote start after comment start?
                    if ( stringStart > commStart )
                    {
                        startPos = commEnd; // restart search after comment end...
                    }
                    else if ( (stringEnd > commEnd) ||
                              (stringEnd < commStart) )
                    {
                        // In this case, the "comment" is actually part of
                        // the quoted string. We found a match.
                        positions.add( new Position(text, qm.group(1),
                                                    stringStart,
                                                    stringEnd) );
                        int quoteEnd = qm.end();
                        startPos = quoteEnd;
                    }
                    else
                    {
                        throw new IllegalStateException( "illegal case" );
                    }
                }
                else
                {
                    startPos = commEnd;
                }
            }
            else
            {
                // no comments were found. Search for unescaped strings.
                int quoteEnd = len;
                if ( qm.find( startPos ) ) {
                    quoteEnd = qm.end();
                    positions.add( new Position(text,
                                                qm.group(1),
                                                qm.start(1),
                                                qm.end(1)) );
                }
                startPos = quoteEnd;
            }
        }
        return positions.isEmpty() ? Position.EMPTY_ARRAY
                                   : (Position[])positions.toArray(
                                         Position.EMPTY_ARRAY);
    }

    public static void main( String [] args )
    {
        try
        {
            BufferedReader br = new BufferedReader(
                new InputStreamReader(System.in) );
            String input = null;
            final String prompt = "\nText (q to quit): ";
            System.out.print( prompt );
            while ( (input = br.readLine()) != null )
            {
                if ( input.equals("q") ) return;
                Position [] matches = getQuotedStringPosition( input );
                // What does it do?
                for ( int i = 0, max = matches.length; i < max; i++ )
                    System.out.println( "-->" + matches[i] );
                System.out.print( prompt );
            }
        }
        catch ( Exception e )
        {
            System.out.println ( "Exception caught: " + e.getMessage () );
        }
    }
}

class Position
{
    public Position( String target,
                     String match,
                     int start,
                     int end )
    {
        this.target = target;
        this.match = match;
        this.start = start;
        this.end = end;
    }

    public String toString()
    {
        return "match==" + match + ",{" + start + "," + end + "}";
    }

    final String target;
    final int start;
    final int end;
    final String match;
    public static final Position [] EMPTY_ARRAY = { };
}

Regular expression - find if string does NOT contain text....

I have a string that I want to tokenize. The string can contain basically anything. I want to produce tokens for each "word" found, and for each "<=" or "," found. There does not need to be whitespace around a "<=" or a "," to consider it a token. So for example:
joe schmoe<=jack, jane
should become
joe
schmoe
<=
jack
jane
As a constraint, I do not want to use StringTokenizer at all, as "its use is discouraged in new code". http://java.sun.com/j2se/1.4.2/docs/api/java/util/StringTokenizer.html
Here's the code I plan on using for this:
public String[] getWords(String input) {
    Matcher matcher = WORD_PATTERN.matcher(input);
    ArrayList<String> words = new ArrayList<String>();
    while (matcher.find()) {
        words.add(matcher.group());
    }
    return (String[]) words.toArray(new String[0]);
}
The trick, though, is coming up with a working regular expression. The closest I've found yet is:
([^\s]|^(,)|^(<=))+|,|<=
but that produces the following:
joe
schmoe<=jack,
jane
I think what I need is to be able to find if a string does not contain the substring "<=" or "," using a regular expression. Anyone know how to do this, or another way to do this using regular expressions?

Try:
/*
 * Tokenizer.java
 * version 1.0
 * 01/06/2005
 */
package samples;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * @author notivago
 */
public class StrangeTokenizer {
    public static void main(String[] args) {
        String text = "joe schmoe<=jack, jane";
        Pattern pattern = Pattern.compile( "((?:<=)|(?:,)|(?:\\w+))");
        Matcher matcher = pattern.matcher(text);
        while( matcher.find() ) {
            System.out.println( "Item: " + matcher.group(1) );
        }
    }
}
May the code be with you.
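
The answer above sidesteps the "does NOT contain" part by limiting words to \w+. If a word should be any run of non-whitespace that stops short of "<=" or ",", a negative look-ahead in front of each character expresses that directly. This is a sketch under that assumption, with a class name of my own; note that it reports the comma as a token, as the question's description asks, even though the sample output in the question leaves it out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OperatorTokenizer {
    // Either an operator token, or a run of non-space characters
    // none of which begins a "<=" or is a ",".
    private static final Pattern TOKEN = Pattern.compile("<=|,|(?:(?!<=|,)\\S)+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }
}
```

Unlike \w+, this keeps something like a<b together as one word, because a "<" that is not followed by "=" does not end the run.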

Find text Between tags with a Regular Expression

I am trying to find specific text (table names) within a <cfquery> tag in all my cfm files. I am using the extended find function in Homesite (I think Dreamweaver has the same functionality). This expression works:
<[Cc][fF][qQ][uU][eE][rR][Yy]
[^>]*>[^>]*(EventName|AttendeeName)[^>]*</[Cc][fF][qQ][uU][eE][rR][Yy]>
for finding the text EventName or AttendeeName. However, if there are other cf tags like <cfif> within the <cfquery>, then the tag/text is not found.
Can anyone help? It is a useful expression to have if you are trying to transfer applications developed on a Windows machine to, say, a Linux machine, and have table name case-sensitivity issues with MySQL.

quote:
Originally posted by: Newsgroup User
Thanks for all the help. Comments below.
> Thanks, but it:
> 1) Captures everything between the first and last query in a script if there
> is more than one cfquery in the script
Oops: sorry. Stick a question mark after the asterisks to stop the matches being greedy.
Used this:
<[Cc][fF][qQ][uU][eE][rR][Yy].*?(EventName|AttendeeName)[^>].*?</[Cc][fF][qQ][uU][eE][rR][Yy]>
and got some finds again with multiple queries and some errors as mentioned below.
> 2) It produces some regular expression errors in Homesite.
Can't help you there. Sounds like HS's regex processor is bung: there's nothing non-standard or tricky about that regex (which might cause compatibility issues; JS vs PERL vs Java, etc).
HS on the whole is bung (IMO). Have you considered using a text editor that is... err... *current*? ;-)
No, can you suggest one. Just use HS for years and it does most of what I want.
What sort of errors is it giving?
Regular expression error No 17. Bad expression format or internal error.
> The reason for this is I am developing on a windows machine with mysql and
> want to use the application online on a linux machine where table names are
> case sensitive. My code was not always faithful to that since in windows you
> can be sloppy!
Have you seen this:
http://dev.mysql.com/doc/refman/5.0/en/identifier-case-sensitivity.html
It might be a better approach anyhow.
Adam
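
For reference, the two problems discussed in this thread (nested tags such as <cfif> blocking [^>]*, and a lazy .*? still running from one query into the next) can both be handled by letting the body match any character as long as it does not start a closing </cfquery>. The sketch below is in Java, not Homesite's regex dialect, so it only shows the idea; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CfQueryFinder {
    private static final Pattern QUERY_WITH_TABLE = Pattern.compile(
        "<cfquery\\b[^>]*>"                  // opening tag with its attributes
        + "(?:(?!</cfquery>).)*?"            // body, never crossing a closing tag
        + "\\b(EventName|AttendeeName)\\b"   // first table name mentioned in this query
        + "(?:(?!</cfquery>).)*"             // rest of the body
        + "</cfquery>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns, for each cfquery block that mentions one of the tables,
    // the table name exactly as it is written in the source.
    static List<String> tablesUsed(String source) {
        List<String> names = new ArrayList<>();
        Matcher m = QUERY_WITH_TABLE.matcher(source);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }
}
```

Because the match is case-insensitive and group 1 returns the text as written, the result also shows which spellings would fail on a case-sensitive server.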
