About Regex

Hi all,
I am working on regular expressions. I need some help matching a string, replacing it, and also parsing the remaining file according to the specification shown below.
The string is like:
|12| 23 |34| 45| 56 abc 78| 65 ABC..................
What I have to do is break the string into columns, with the number inside the pipes becoming the key and the text outside becoming the corresponding value, and also make 'abc' uppercase...
Thanks

> thanks for the reply..
> i have the solution for separating into columns but i need the expression pattern for matching the special character
See my example with the call to split().
> ,replacing it into uppercase and
See my comments on toUpperCase().
> at the same time it should separate into columns..
What do you mean? Strings don't have columns. split() will create an array, where each element of the array is what's between the | characters.
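A minimal Java sketch of the split() and toUpperCase() approach described in the reply, assuming the sample line from the question (without the trailing dots) and assuming every field should simply be trimmed and uppercased. How the fields pair up into keys and values isn't clear from the example, so the sketch stops at producing the columns; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; names are illustrative only.
final class PipeColumns {
    // Splits a pipe-delimited line into trimmed, uppercased columns.
    static List<String> toColumns(String line) {
        List<String> columns = new ArrayList<>();
        // split() takes a regex, and | means "or" there, so it must be escaped.
        for (String field : line.split("\\|")) {
            String trimmed = field.trim();
            // A leading "|" produces an empty first field; skip empty fields.
            if (!trimmed.isEmpty()) {
                columns.add(trimmed.toUpperCase(Locale.ROOT));
            }
        }
        return columns;
    }
}
```

For "|12| 23 |34| 45| 56 abc 78| 65 ABC" this yields the columns 12, 23, 34, 45, 56 ABC 78 and 65 ABC. An unescaped "|" would be an empty alternation and split the line into single characters instead.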


About regex and 'like' query

Dear all,
Can I do a regex query in SAP B1?
And how do I use a 'like' query with a table? I mean:
SELECT * FROM test T0 INNER JOIN test2 T1
WHERE T0 LIKE '%T1.testfield%'
thanks for your help

Wait, something came out funny in the previous posting: the system highlighted the word "field" in blue, and that is NOT what I typed.
Instead, I typed "field" between two square brackets (I cannot type the brackets here because they come out as a different character).
What the heck is going on with the forum here? I am seeing this highlighting in other postings as well.
Hope THIS one comes out correctly.
Edited by: Zal Parchem on Dec 29, 2007 2:47 PM

Help needed about regex usage.

Hi all,
I am kind of new to java.util.regex, and I can't solve this problem:
I have a big string that actually represents the source code of a method. In this method I have an object called, let's say, "visitor", and I call several non-static methods on it. I need to search this string and put into an ArrayList<String> all the occurrences of visitor, the method called, and its parameters.
In case I wasn't very clear, here is an example:
The input string:
public void test_add() {
    TemplateTestMethodVisitor visitor = new TemplateTestMethodVisitor(XMLArrayListTest.class);
    XMLArrayListComparer comparer = new XMLArrayListComparer();
    while(visitor.moveNext()) {
        fillArrayList(visitor);
        Exception exception = null;
        try {
            _testedArray.add(visitor.getParameterAsInt("position"), ValueFactory.getMemberValue(visitor,
                    visitor.getParameterAsString("type"), "ElementValue"));
        } catch(Exception e) {
            exception = e;
        }
        ExceptionValidator.validateException(visitor, exception);
        if(exception != null)
            continue;
        VisitorAssert.assertEquals(visitor, "The content of the array list is not as expected.",
                visitor.getResultAsCollectionData("List"), _testedArray, comparer);
    }
}
And I need to obtain an ArrayList<String> with {"visitor.moveNext()", "visitor.getParameterAsInt("position")", "visitor.getParameterAsString("type")", "visitor.getResultAsCollectionData("List")"}. It would be very helpful for me to do this using regex, but I can't think of a right pattern...
Thank you all.

import java.io.*;
import java.util.regex.*;
public class TestRegexp {
    public static void main(String[] argv) {
        // "visitor." then (lazily) the method name up to "(", then (lazily) the arguments up to ")"
        Pattern p = Pattern.compile("visitor\\..*?\\(.*?\\)");
        try {
            BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
            String line;
            // Matches are searched line by line, so a call must not span lines
            while ((line = in.readLine()) != null) {
                Matcher m = p.matcher(line);
                while (m.find()) {
                    System.out.println(m.group());
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
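If you want the matches collected into an ArrayList<String> instead of printed, as the question asks, one way is to run find() over the whole source string. This is a sketch: the class name VisitorCalls is hypothetical, and the pattern only handles arguments without nested parentheses (a call such as visitor.foo(bar(1)) would not be matched), since a regex cannot balance parentheses in general.

```java
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the name is illustrative only.
final class VisitorCalls {
    // "visitor." + a method name + "(" + anything except parentheses + ")".
    // \b keeps it from matching inside a longer identifier such as "myvisitor".
    private static final Pattern CALL =
            Pattern.compile("\\bvisitor\\.\\w+\\([^()]*\\)");

    static ArrayList<String> findCalls(String source) {
        ArrayList<String> calls = new ArrayList<>();
        Matcher m = CALL.matcher(source);
        while (m.find()) {          // each find() continues after the previous match
            calls.add(m.group());
        }
        return calls;
    }
}
```

Because the whole string is searched at once, a call that continues on the next line of the source (like getParameterAsString above) is still found.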

Tokenize xml maybe using RegEx?

I'm working on a simple search engine for dynamically loaded XML data. I have data of this form (more or less):
<sessions>
<session id=##>
          <title><![CDATA[The Title]]></title>
          <presenter>A person or two goes here with title</presenter>
          <date>2011-2-15-10-00-a</date>
          <webex><![CDATA[https://alink.com]]></webex>
          <audience> <![CDATA[Various types that might be interested]]> </audience>
          <desc><![CDATA[A longish description that might include some simple html tags like bold or some lists]]></desc>
          <resources>
          <resource>
          <name><![CDATA[A slide deck, website, white paper, etc.]]></name>
          <link active="true"><![CDATA[thelink to the resourcepdf]]></link>
          <tip><![CDATA[A description of what is at the site or why the resource is interesting]]></tip>
          </resource>
     <resource>
     </resource>
          </resources>
</session>
<session>
</session>
</sessions>
I need to break apart all the "useful" words and run them through my indexer. Currently I'm using e4x to pull out certain nodes and get the content as a string. Then I'm using something like this to break it all up:
var tokens:Array = [" - ", "?", ",", "." /* ...etc. */];
for (var i:int = tokens.length - 1; i >= 0; i--) {
   str = str.split(tokens[i]).join(" ");
}
Is there a quicker, more efficient, better way to do this? I'm just learning about RegEx and think it could have some use here, but I'm not all that good with it. Part of the problem is that the tokens array needs to account for every character that could signal a division between words, and there are so many of them. It might be simpler to go the other route and list the things we want to keep; that list is much shorter:
- remove any XML tags and any HTML tags
- keep A-Z and a-z (including accented letters, such as those with grave, acute, or umlaut marks)
- keep ' or - when they are in the middle of a word, i.e. surrounded by letters
- everything else goes
So there are really two parts to this:
1. What is the best, fastest, easiest way to extract all the data from the xml.
2. What is the most reliable easiest way to break all that data into just the words.
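For the second part, one way to express the "keep" list as a regex is to match words directly instead of listing every separator. A minimal Java sketch (the class name is hypothetical), assuming the text has already been pulled out of the XML as a string: it strips CDATA markers and tags with two regex passes and then collects words, where \p{L} matches any Unicode letter, accented ones included. It does not decode entities such as &amp;, and for the first part a real XML parser (such as the e4x already in use) is more reliable than regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative only.
final class WordTokenizer {
    // CDATA markers go first; otherwise the tag pattern below would treat
    // "<![CDATA[ ... >" as one giant tag and swallow the text inside it.
    private static final Pattern CDATA_MARKERS =
            Pattern.compile("<!\\[CDATA\\[|\\]\\]>", Pattern.CASE_INSENSITIVE);
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    // A word: letters, optionally joined by single ' or - with letters on both sides.
    private static final Pattern WORD = Pattern.compile("\\p{L}+(?:['-]\\p{L}+)*");

    static List<String> words(String xml) {
        String text = CDATA_MARKERS.matcher(xml).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }
}
```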

What I'm seeing in these RegExps:
re1 in English:
Globally in the current string, any <![cdata[ or ]]> or <*> or http(s)://*\s or /.,(3 chars)"“”!?(up to)@#$%^(and)*(nothing)[](n/a amount)(invalid range)–—(one or more):;<>©®™= or -(invalid range)(one or more than)
The substitutions are in (parens).
Think of RegExp as a language whose sole purpose is to give you a ton of wildcards with programmatic features to "describe" the content you want. Be careful not to assume a character means what it means in AS3, though: an ! (exclamation point) is not a "not" operator in a RegExp. On its own it is just a literal character; negation is written with a negated character class, [^...], or a negative lookahead, (?!...).
So to match a string that has NO lowercase 'a':
/^[^a]*$/
If you explicitly want a literal character, the safest thing you can do is escape it, just like you did with the brackets. To match an exclamation point:
/\!/
It's just like "reserved words" in coding. You'd never name a variable 'for' or 'if' because you know the compiler will balk. Same deal with RegExp: knowing which characters are operators (\, |, [, ], ^, $, {, }, (, ), ., *, +, ?, etc.) will help solidify your meaning. There are tons of reference guides out there, but since Perl was the big proponent of regular expressions, I often just follow the PHP preg_* function syntax reference (click the links at the top for categories: http://php.net/manual/en/reference.pcre.pattern.syntax.php )
Any time you add an "or" with |, you're usually better off making a separate RegExp for it. It's much easier to debug several small RegExps than one big one that does everything. trace() your string after every step to see which RegExps are misbehaving, and medicate as needed.
For your example, I assume all you want to do is remove things. I'd do it like so:
removal of CDATA wrapper:
var str:String = '<![cdata[this is some text]]> moo';
var cdataRe:RegExp = /\<\!\[cdata\[(.*?)\]\]\>/gi;
str = str.replace(cdataRe,"$1");
trace(str);
// trace: this is some text moo
This is a replace that shows off the ability of parentheses to capture text. Captured text is put (in the order of the opening parentheses) into the variables $1, $2, $3, etc. I captured the text between the CDATA markers, and my replacement was only the text inside them.
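The same capture-and-replace idea in Java, for comparison: Matcher.replaceAll() also understands $1 in the replacement string. A sketch with a hypothetical class name; the lazy (.*?) keeps two CDATA sections in one string from being merged into a single match, and DOTALL lets the content span several lines.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the name is illustrative only.
final class CdataStripper {
    // Group 1 captures the text between the CDATA markers.
    private static final Pattern CDATA = Pattern.compile(
            "<!\\[cdata\\[(.*?)\\]\\]>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static String unwrap(String s) {
        // "$1" puts back only the captured text, dropping the markers.
        return CDATA.matcher(s).replaceAll("$1");
    }
}
```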
removal of any HTTP(S), RTMP, FTP links:
// note: the last link has no space after it, but it will still match
// taking out ftp, http, rtmp
var str:String = 'this is some text http://www.moo.com/a/b/c/?ref=123&q=2 and https://www.foo.com/cpanel/?a=login.do links HTTPS:// RTMP://media.someserver.com/moo.flv ftp://woo:...';
var httpsRe:RegExp = /[fhr]t{1,2}m*ps*\:\/\/.*?\s+|[fhr]t{1,2}m*ps*\:\/\/.*$/igm;
str = str.replace(httpsRe,'');
trace(str);
// trace: this is some text and links
You get the idea. I'm describing every bit of the text as I go. I wanted to show a decent use of the | (or) branch for removing 2 different kinds of links: a link at the beginning or middle of a sentence will have a space after it, while a link at the end of the string may have no space after it. However, it's not perfect. If you run tests on it, you'd see that when a link ends a sentence with a period, that period gets eaten too. It's exceptions like "no space after it" or "end of a sentence" that RegExps need a lot of extra conditional logic for. That's why I wouldn't bundle more than one purpose into a single RegExp: when you REALLY field test against data, those seemingly simple one-purpose RegExps end up being huge.
The re2 I see above seems to target some very specific data. It's saying: a string containing a return, newline, or space, followed by a quote, apostrophe, or dash, followed by another quote, apostrophe, or dash, followed by either a return, newline, or space, or just one or more spaces.
That's a pretty weird RegExp. That would match something like this:
var a:String = "
// or
var b:String = ' "" ';
The final 'or' is the only thing I'd condense, because you already have it in your brackets. You're saying at the end [\r\n\s] or \s+, and since \s already includes \r and \n, that is just \s+. So:
var re2:RegExp = /[\r\n\s]["'\-]{2}\s+/gm;
Written like that, it simply states that one or more whitespace characters (returns and newlines included) must follow. You can also see braces used to mark the number of matches I want: {2} means I need 2 of the previously specified characters in a row. Allowing 1 to 5 in a row is just as easy: /[a-z]{1,5}/ means from 1 to 5 lowercase letters from a to z.

Regex or String methods

Hi
I'm wondering how more experienced Java developers would approach this.
I have a parser that receives Strings according to the IRC protocol.
servername PRIVMSG #channel :Hello world
Almost every message is defined by the second word in the String. So it's easy to just do String.split(" "), check the message type, and start working with the other elements. Of course, things get a bit out of hand with compareTo(), indexOf(), substring(), and all the other String methods.
How would you use regex for this if you had to? I see this as an exercise: slower code and more work don't matter, and it's a plus to learn more about regex.
For example, with the line above, how would you use regex to check the message type "PRIVMSG", the channel "#channel", the message part after the ':', etc.?
Many thanks in advance.
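One possible regex answer to the exercise, as a Java sketch: it assumes every line has the simple shape shown above (prefix, command, target, then " :" and the message), which real IRC traffic does not always follow. The class name and the group names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative only.
final class IrcLineRegex {
    // prefix, command and target are runs of non-space characters;
    // everything after " :" is the message text. Named groups need Java 7+.
    private static final Pattern LINE = Pattern.compile(
            "^(?<prefix>\\S+) (?<command>\\S+) (?<target>\\S+) :(?<message>.*)$");

    // Returns {prefix, command, target, message}, or null if the line does not fit.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group("prefix"), m.group("command"),
                m.group("target"), m.group("message")};
    }
}
```

Checking the message type is then just comparing the second element with "PRIVMSG".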

> I wouldn't use regex, but split indeed, since I'd have very easy access to each part of the message.
> Anyway, since you want to learn regex: why don't you grab the Pattern API and read and try a little yourself? Nothing teaches you better than finding out yourself.
Well, yes, regexes are nice, easy, and clean. And they're very useful if you learn them, because they give you almost unlimited power for string manipulation.
But at a high cost.
It is very important to make the right choice (regex or string methods), because it will decide how good your code looks and how well it works.
If the pattern that needs to be parsed is simple and can be handled with string methods (including StringTokenizer), regex is not the way to go.
Normally, when handling simple text patterns like the one above, the string methods can perform about 5 times faster than regexes.
So my recommendation is this:
If you are doing this for learning, do it both ways and practice both regexes and string methods, because in the industry you have to know both well.
If you are doing this for some sort of project and performance is a significant quality factor, you should use string methods in this particular situation.
The pattern above can easily be broken down using a combination of substring(), indexOf(), and StringTokenizer.
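For comparison, the same four fields can be pulled out with only indexOf() and substring(), as the reply suggests. This sketch assumes the same line shape as the example (three space-separated words followed by " :" and the message); the class name is hypothetical, and it makes no attempt to handle every IRC message type.

```java
// Hypothetical helper; the name is illustrative only.
final class IrcLineIndexOf {
    // Returns {prefix, command, target, message}, or null if the line does not fit.
    static String[] parse(String line) {
        int s1 = line.indexOf(' ');                 // end of the prefix
        if (s1 < 0) {
            return null;
        }
        int s2 = line.indexOf(' ', s1 + 1);         // end of the command
        if (s2 < 0) {
            return null;
        }
        int colon = line.indexOf(" :", s2 + 1);     // start of the message part
        if (colon < 0) {
            return null;
        }
        return new String[] {
            line.substring(0, s1),
            line.substring(s1 + 1, s2),
            line.substring(s2 + 1, colon),
            line.substring(colon + 2)
        };
    }
}
```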

SAP IDM : Using regex for password

Hi experts,
I need your help with regex.
We are trying to use it to make sure that the passwords users set in the web interface comply with the company policies:
- The first character cannot be a number
- The first three characters cannot be the same
We finally found the following expression to test what we want (the response is "true" when one of the policy rules is violated):
(.)(\1)|(\d)
But IDM never accepts any password with that expression.
So my questions are:
1) Does IDM need a special regex dialect, or is it the same as Perl's, for example?
2) Does anybody know how to use regex for our policy case?
Thanks a lot for your response.
Regards.
Jérémy.

Thanks a lot for your answers.
Actually, I am already doing my test on another attribute, which is not encrypted. But yes, it's a good point to know about the encryption.
I also tried online regex evaluators, and my expression looks good if I want to catch the cases I don't want in my password...
As a reminder: (.)(\1)|(\d)
So if I write:
"1password" >> result : true
"password" >> result : false
"pppassword" >> result : true
"password01" >> result : false
"passwooord" >> result : false
But when I finally put that expression in the "regex" field of my attribute, IDM never lets me save my new password because it's not a "correct value".
So I'm disappointed, because I don't know how to solve it in IDM, but I feel like I found the right expression...
Any other ideas?
Regards
Jeremy
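One possible explanation, which can't be confirmed from this thread, is that the regex field of an attribute describes what a valid value must look like, whereas the expression above matches the values you want to reject. If that is the case, the policy can be inverted with negative lookaheads. A Java sketch (class name hypothetical), assuming whole-value matching and reading the second rule as "the first three characters are not all the same":

```java
import java.util.regex.Pattern;

// Hypothetical helper; assumes the regex must match ALLOWED values.
final class PasswordPolicy {
    // (?!\d)       the first character is not a digit
    // (?!(.)\1\1)  the first three characters are not all the same
    // .*           then anything (DOTALL so any character is allowed)
    private static final Pattern ALLOWED =
            Pattern.compile("^(?!\\d)(?!(.)\\1\\1).*$", Pattern.DOTALL);

    static boolean isAllowed(String password) {
        return ALLOWED.matcher(password).matches();
    }
}
```

The pattern text to try in the attribute would then be ^(?!\d)(?!(.)\1\1).*$, assuming IDM's regex engine supports lookaheads and backreferences. With it, "1password" and "pppassword" are rejected while "password", "password01" and "passwooord" are accepted.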

Newbie in regex... pb with pattern

Hello,
I want to identify each occurrence of <tr bgcolor="#f7f7f7"> in an HTML source, but this string could also be written like <td class="t2" bgcolor="#f7f7f7">...
I tried the following Java code to parse it:
Pattern p=Pattern.compile("<tr bgcolor=\"#f7f7f7\">"); // CORRECT ;-)
or
Pattern p=Pattern.compile("<td[ class=\"t2\"]? bgcolor=\"#f7f7f7\">"); // FALSE
or
Pattern p=Pattern.compile("<td[ class=\"t2\"]?? bgcolor=\"#f7f7f7\">"); // FALSE
or
Pattern p=Pattern.compile("<td[ class=\"t2\"]?+ bgcolor=\"#f7f7f7\">"); // FALSE
Which one do I have to use??
thx
stuuf

In a regex, square brackets are used to create a character class, which means "match any one of this set of characters". You're using a character class to match a specific sequence of characters, which is incorrect. It can seem to work for a tag without the class attribute, because the optional class then matches nothing, but [ class=\"t2\"]? matches at most one character from that set, so it will never match the whole class="t2" attribute. You should use grouping (parentheses) for that. For instance, if you only wanted to match tags of the form <tr bgcolor="#f7f7f7"> or <tr class="t2" bgcolor="#f7f7f7">, but not, e.g., <tr class="t1" bgcolor="#f7f7f7"> or <tr bgcolor="#f7f7f7" class="t2">, you could put the class="t2" part in a group and make the group optional, like so:
Pattern p = Pattern.compile("<tr (class=\"t2\" )?bgcolor=\"#f7f7f7\">");
For a correct use of character classes, suppose you want to match either tr or td tags with the bgcolor="#f7f7f7" attribute (and you don't care about any other attributes). This slight variation on yawmark's solution would do the trick:
Pattern p = Pattern.compile("<t[dr] .*?bgcolor=\"#f7f7f7\".*?>");
If you want to learn more about regexes, here's a good place to start: http://www.regular-expressions.info/
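A small Java sketch contrasting the optional group with the character class (class and method names are hypothetical). It swaps the .*? in the tr/td pattern for [^>]*, an adjustment not in the original reply: .*? is allowed to run past the end of a tag, so in <td class="t1">y</td><tr bgcolor="#f7f7f7"> it would match from the first <td all the way to the bgcolor, whereas [^>]* keeps each match inside one tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative only.
final class BgcolorTags {
    // Optional GROUP: the exact sequence class="t2" plus a space, or nothing.
    static final Pattern TR_OPTIONAL_CLASS =
            Pattern.compile("<tr (class=\"t2\" )?bgcolor=\"#f7f7f7\">");
    // Character class [dr] picks tr or td; [^>]* cannot cross the end of a tag.
    static final Pattern TR_OR_TD =
            Pattern.compile("<t[dr] [^>]*bgcolor=\"#f7f7f7\"[^>]*>");

    static List<String> findAll(Pattern p, String html) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(html);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }
}
```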

RegEx to ID file path that uses Wildcard appropriately

I am looking for a regex to identify paths with wildcards, where wildcards are only allowed in the file name and extension portion of the path. So both of these should return true:
C:\Path\*.*
\\SRV\Path\*.txt
However, \\*\Path\*.* should return false since a wildcard anywhere but the end is invalid.
My Google Fu has failed me, as all my searches are coming up with discussions of wild cards in the RegEx, rather than * occurring in a specific pattern.
To give some context, I have a Copy function that takes Source and Destination arguments, as well as optional Breadcrumb and Overwrite arguments, and it handles the actual copy differently depending on whether the source is a file, folder, or wildcard and whether the destination is a file or folder. And of course it will log an error if the destination is a wildcard.

I couldn't figure out a way to do it with regex, although I did learn a lot about regex in the process. For anyone else that might find this thread, here is the best link I've found explaining regex:
http://www.freeformatter.com/regex-tester.html
Unfortunately, it wasn't good enough (or I'm just not smart enough) to puzzle this one out. The following code works, but is obviously much less elegant than a straight-up regex. I post it only because I spent the time on it, and if it can ever possibly help anyone, it was worth the effort:
$a = "\\path\folder\job*.*","\\path\fo*lder\star.tar","c:\path\folder\bo*lder\f*","c:\blah\blah.txt","\\blah\blagaa\blahgahah\a\f\*"
Foreach ($target in $a) {
    $flag = $null
    # split the path into its segments on backslashes
    $path = $target -split ("\\")
    $count = $path.count
    # only consider paths whose last segment (file name) contains a wildcard
    If ($path[$path.count -1] -match "\*") {
        # flag any wildcard found in an earlier segment
        Foreach ($index in 0..($path.count -2)) {
            If ($path[$index] -match "\*") {$flag = 1}
        }
        If (!($flag)){$target}
    }
}
PS C:\Users\user> $a = "\\path\folder\job*.*","\\path\fo*lder\star.tar","c:\path\folder\bo*lder\f*","c:\blah\blah.txt","\\blah\blagaa\blahgahah\a\f\*"
PS C:\Users\user> Foreach ($target in $a) {
>> $flag = $null
>> $path = $target -split ("\\")
>> $count = $path.count
>> If ($path[$path.count -1] -match "\*") {
>> Foreach ($index in 0..($path.count -2)) {
>> If ($path[$index] -match "\*") {$flag = 1}
>> }
>> If (!($flag)){$target}
>> }
>> }
>>
\\path\folder\job*.*
\\blah\blagaa\blahgahah\a\f\*
That being said, I'm eager to see the 10 character or less regex expression to do it that someone is sure to post soon and make me feel silly.
I hope this post has helped!
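For what it's worth, here is one regex that seems to cover the cases in this thread, written as a Java sketch (class and method names are hypothetical). The idea: nothing up to the last backslash may contain *, and the final segment must contain at least one *. It assumes backslash-separated paths and rejects a bare file name with no backslash at all. The pattern itself, ^[^*]*\\[^\\]*\*[^\\]*$, uses only basic constructs, so it should carry over to PowerShell's -match as well, though that is an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative only.
final class WildcardPath {
    // ^[^*]*          everything before the last backslash contains no '*'
    // \\              the last backslash (the final segment cannot contain one)
    // [^\\]*\*[^\\]*$ the final segment (file name + extension) contains a '*'
    private static final Pattern VALID =
            Pattern.compile("^[^*]*\\\\[^\\\\]*\\*[^\\\\]*$");

    static boolean isValidWildcardPath(String path) {
        return VALID.matcher(path).matches();
    }
}
```

Of the sample paths used in the script above, it accepts \\path\folder\job*.* and \\blah\blagaa\blahgahah\a\f\*, the same two the script prints.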

Regex related problem

Hi Friends,
I'm trying to create a regex to verify that a given string only has characters a-z or A-Z.
Examples:
1. "abcdef" = true;
2. "a2bdef" = false;
3. "333" = false;
4. "j" = true;
I'm using the below code.
//Pattern Matching
            // Create a pattern
            Pattern p = Pattern.compile("[a-zA-Z]");
        // Create a matcher with an input string
            Matcher m = p.matcher("AB56CD");
            boolean result = m.find();
            if(!result)
                JOptionPane.showMessageDialog(null,"Only characters are            allowed","Alert",JOptionPane.ERROR_MESSAGE);
                check = false;
            //Patern MatchingFor the given string I'm getting true for the result variable. But I'm supposed to get false & display the error message. I'm not getting where is the problem in the regex pattern.
My second query: how can I ensure that my given string is 25 letters long? Would the regex ^[a-zA-Z]{1,25}$ work?
thanks in advance.

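Your pattern says that you want to match a single a-z or A-Z character. With matches(), a string with more than one character would fail, but you are calling find(), which succeeds as soon as it finds one such character anywhere in the input, so "AB56CD" gives true. Use matches() (which requires the entire input to match) with a quantified pattern, something like
Pattern p = Pattern.compile("[a-zA-Z]*"); to include the empty string, or
Pattern p = Pattern.compile("[a-zA-Z]+"); if there must be at least one character, or
Pattern p = Pattern.compile("[a-zA-Z]{3,33}"); if there must be between 3 and 33 characters inclusive.
As for your second question, ^[a-zA-Z]{1,25}$ allows 1 to 25 letters; use {25} for exactly 25.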
Visit http://www.regular-expressions.info/ to learn more about regex.
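A Java sketch of the fix for the original code (class and method names are hypothetical), checking the examples from the question with matches() instead of find():

```java
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative only.
final class LettersOnly {
    // matches() must consume the WHOLE input, unlike find(), which needs one hit.
    private static final Pattern LETTERS = Pattern.compile("[a-zA-Z]+");
    // Exactly 25 ASCII letters.
    private static final Pattern LETTERS_25 = Pattern.compile("[a-zA-Z]{25}");

    static boolean isLettersOnly(String s) {
        return LETTERS.matcher(s).matches();
    }

    static boolean isExactly25Letters(String s) {
        return LETTERS_25.matcher(s).matches();
    }
}
```

With this, "abcdef" and "j" pass, while "a2bdef", "333" and "AB56CD" fail.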

Multiple replacements from an input file with 1.4 Regex

hi,
I'm trying to make multiple replacements in a source file <source>, using a second input file <patterns> to hold the regexes. The problem I'm having is that the output file only contains the last replacement listed in the input file. For example, if the input file is
a#123
b#456
only b is changed to 456, and a remains.
The second debug output shows that all the replacements are in memory, but I'm not sure how to write them all to the file.
The syntax is java MyPatternResolver <source> <patterns>
import java.util.regex.*;
import java.io.*;
import java.util.*;
public class MyPatternResolver {
    private File patternFile;
    private File sourceFile;
    private Vector patterns = new Vector();
    public MyPatternResolver(String sourceFile, String patternFile) {
        this.sourceFile = new File(sourceFile);
        this.patternFile = new File(patternFile);
        loadPatterns();
        resolve();
    }
    private void loadPatterns() {
        // read in each line of the pattern file
        FileReader fileReader = null;
        try {
            fileReader = new FileReader(patternFile);
        } catch(FileNotFoundException fnfe) {
            fnfe.printStackTrace();
        }
        BufferedReader reader = new BufferedReader(fileReader);
        String s = null;
        String[] strArr = new String[2];
        try {
            while((s = reader.readLine()) != null) {
                StringTokenizer tokenizer = new StringTokenizer(s, "#");
                for (int i = 0; i < 2; i++) {
                    strArr[i] = tokenizer.nextToken();
                    //Debugging Info
                    System.out.println("Token Value " + i + " = " + strArr[i]);
                    //End Debugging Info
                }
                patterns.add(new PatternResolver(strArr[0], strArr[1], sourceFile));
            }
        } catch(IOException ioe) {
            ioe.printStackTrace();
        }
    }
    private void resolve() {
        Iterator iterator = patterns.iterator();
        while(iterator.hasNext()) {
            PatternResolver pr = (PatternResolver) iterator.next();
            pr.resolve();
        }
    }
    public static void main(String[] args) {
        MyPatternResolver mpr = new MyPatternResolver(args[0], args[1]);
    }
    public class PatternResolver {
        private String match, replace;
        private File source;
        public PatternResolver(String s1, String s2, File f) {
            this.match = s1;
            this.replace = s2;
            this.source = f;
        }
        public File resolve() {
            File fout = null;
            try {
                //Create a file object with the file name in the argument:
                // NOTE: every PatternResolver reads the ORIGINAL source file and overwrites
                // this same output file, so only the last pattern's replacements survive.
                fout = new File(sourceFile.getName() + "_");
                //Open an input and an output stream
                FileInputStream fis = new FileInputStream(sourceFile);
                FileOutputStream fos = new FileOutputStream(fout);
                BufferedReader in = new BufferedReader(new InputStreamReader(fis));
                BufferedWriter out = new BufferedWriter(new OutputStreamWriter(fos));
                // The find and replace statements
                Pattern p = Pattern.compile(match);
                Matcher m = p.matcher(replace);
                //Debugging Info
                System.out.println("Match value = " + match + " Replace value = " + replace);
                //Debugging Complete
                String aLine = null;
                while((aLine = in.readLine()) != null) {
                    m.reset(aLine);
                    //Make replacements
                    String result = m.replaceAll(replace);
                    out.write(result);
                    out.newLine();
                }
                in.close();
                out.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
            return fout;
        }
    }
}

If your aim is to learn about regex, then it's okay.
Otherwise, you might want to check out the utility "sed" (stream editor), which does something similar to what you are up to. It is a POSIX (i.e. UNIX) utility, but it is available (in several versions) for other platforms, including Windows (cf. Cygwin or MinGW).
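If you do want to stay in Java: only the last replacement survives because every PatternResolver reads the original source file and writes its own result to the same output file, so each run overwrites the previous one. One fix is to read the source once, apply all rules to each line in order, and write once. A sketch under those assumptions (the class, its methods and the Rule type are hypothetical; it splits pattern lines on the first # and uses java.nio.file, which needs Java 7 or later):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative only.
final class MultiReplace {
    // One "regex#replacement" line from the patterns file, compiled once.
    static final class Rule {
        final Pattern pattern;
        final String replacement;

        Rule(String regex, String replacement) {
            this.pattern = Pattern.compile(regex);
            this.replacement = replacement;
        }
    }

    // Splits each line on its first '#', e.g. "a#123" -> regex "a", replacement "123".
    static List<Rule> parseRules(List<String> patternLines) {
        List<Rule> rules = new ArrayList<>();
        for (String s : patternLines) {
            int hash = s.indexOf('#');
            if (hash > 0) {  // skip blank or malformed lines
                rules.add(new Rule(s.substring(0, hash), s.substring(hash + 1)));
            }
        }
        return rules;
    }

    // Applies every rule, in order, to one line.
    static String applyAll(String line, List<Rule> rules) {
        String result = line;
        for (Rule rule : rules) {
            result = rule.pattern.matcher(result).replaceAll(rule.replacement);
        }
        return result;
    }

    // Reads the source ONCE, applies all rules to each line, and writes the output ONCE.
    static void resolve(Path source, Path patternFile, Path output) throws IOException {
        List<Rule> rules = parseRules(Files.readAllLines(patternFile, StandardCharsets.UTF_8));
        List<String> out = new ArrayList<>();
        for (String line : Files.readAllLines(source, StandardCharsets.UTF_8)) {
            out.add(applyAll(line, rules));
        }
        Files.write(output, out, StandardCharsets.UTF_8);
    }
}
```

Rules are applied in order, so a later rule sees the output of the earlier ones. Note that replaceAll() treats $ and \ in the replacement specially; wrap the replacement in Matcher.quoteReplacement() if it should be taken literally.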

Regex convert URL's to link if NOT wrapped in quotes

Hi there,
Regular expressions have always been very difficult for me to understand. I currently have a regular expression that searches a long string and replaces URLs with HTML code that turns them into links. So...
http://java.sun.com would become:
<a href="http://java.sun.com">http://java.sun.com</a>
It works EXCELLENTLY; however, I allow users to enter HTML code for images using the following...
<img src="http://www.url.to.image.com/image.gif">
But my regex replaces that URL with the link code above. To fix this, I want to replace URLs with the HTML a href code only if the URL is NOT surrounded by quotes. As I said above, I'm not good with regular expressions; the one I'm currently using was found on a forum too. Does anyone know how to accomplish this? Thanks very much, I appreciate it!
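One common approach is a negative lookbehind, which rejects a URL whose first character is immediately preceded by a quote. A Java sketch (class name hypothetical; the URL part is a simplified stand-in for whatever expression is in use now):

```java
import java.util.regex.Pattern;

// Hypothetical helper; the URL pattern is a simplified assumption.
final class Linkifier {
    // (?<!["'])   negative lookbehind: not immediately preceded by a quote
    // https?://   the scheme
    // [^\s"'<>]+  the rest of the URL, stopping at whitespace, quotes or angle brackets
    private static final Pattern BARE_URL =
            Pattern.compile("(?<![\"'])https?://[^\\s\"'<>]+");

    static String linkify(String text) {
        // $0 is the whole match
        return BARE_URL.matcher(text).replaceAll("<a href=\"$0\">$0</a>");
    }
}
```

This only looks at the single character before http, so it would still link the second URL in something like "http://a.com/?u=http://b.com", and a URL sitting between > and < (for example, already inside an <a> element's text) would be linked again.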

scoobasteve1982 wrote:
> Can you or anyone else suggest a good place to learn more about regex...something that starts with the VERY BASICS? Thanks again!
Sure.
The website regular-expressions.info has a chapter about Java regex:
http://www.regular-expressions.info/java.html
But I recommend that you have a look at the other sections of that site too.
And Sun has a tutorial about their own regex API, of course:
http://java.sun.com/docs/books/tutorial/essential/regex/
May the dark regex force be with you!
; )

URL paths and regular expressions in ASDM

Some background info: I've recently switched to an ASA 5510 on 8.4(3), coming from a Checkpoint NGX platform (let's say fairly quickly and without much warning). I have a couple of questions, and they're kind of similar, so I'll post them together. I've read docs about regex and creating them both via the command line and ASDM, but the examples always seem to include info I don't need, or honestly something I don't understand yet (mainly related to defining class/inspect maps). If someone could provide a simple example of how to do these in ASDM, that would help a lot in understanding how regular expressions are properly configured. So here we go.
I know this is basic but I need to make sure I understand this properly - I have a single web server (so this won't be a global policy) where I need to allow access to a specific URL path\file and that's it. So we'll call it \test\testfile.doc. Any other access to any other path should be dropped. What's the best way to do this in ASDM (6.4)? I think if I saw a basic example for this I could figure out next few questions but I'll post them as well just in case.
I have another single public web server (again, this won't be a global policy) where I'd like to block specific file types, like .php, .exe, etc. Again, a basic example would be great.
Lastly, and this is kind of related: we have a single office/domain, and sometimes we get spam from forged addresses appearing to be from our domain. On Checkpoint I used its built-in SMTP security server to drop any mail received from *@mydomain.com, because we would never receive mail externally from our own domain name. I saw something similar with ESMTP in ASDM, and it looks kind of like how you set up the URL access mentioned above. Can I configure this in ASDM as well, and if so, how?
TIA for your help,
Jordan


Verifying a String. Is there a better way to do this?

Hi!!!
Maybe for some of you this is obvious...
I need to verify a String object. All I have to do is check whether the String contains a character other than 0 (zero). The code would be something like this:
String test = "00001";
boolean ok = true;
for (int i = 0; i < test.length(); i++) {
    if (test.charAt(i) != '0') {
        ok = false;
        break;
    }
}
I want to know if there is a different, more appropriate solution that doesn't use that for loop. Is there a magic method in Java that can do that verification?
Thank you! ;-)

>> (cue someone posting some regex foo that will do this in one line of something that looks like transmission line noise)
> Yeah, I've thought that. But since I don't know anything about regex, well, I'll do some homework now... thank you!!!
Your for loop is fine and much more efficient than any regex would be.
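For the record, here are both options as a Java sketch (class name hypothetical): the loop from the question wrapped in a method, and the one-line regex version, which is shorter but compiles a pattern on every call.

```java
// Hypothetical helper; the name is illustrative only.
final class ZeroCheck {
    // true if the string contains at least one character other than '0'
    static boolean hasNonZero(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) != '0') {
                return true;  // stop at the first non-zero character
            }
        }
        return false;
    }

    // Regex equivalent: the string is NOT made up entirely of zeros.
    // String.matches() compiles "0*" each time it is called.
    static boolean hasNonZeroRegex(String s) {
        return !s.matches("0*");
    }
}
```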

Find certain areas of text from a text file

Hello all Java gurus! I need your help with something I am trying to do in order to learn Java. I am a complete newbie, so please show mercy if I don't understand every answer you give me. Here is my problem: I have a text file with certain areas I need to "isolate" (to insert the data into a JTable once I solve this first part).
Let's say we have the following data in a text file:
FILE IS "A.txt"
***first***
this is 1 line in first
this is 2 line in first
this is 3 line in first
***second***
this is 1 line in second
this is 2 line in second
this is 3 line in second
***third***
this is 1 line in third
this is 2 line in third
this is 3 line in third
**************
I now need to read the "segment" that starts with " *** blablabab *** " and ends with " ************** " and store the lines of this segment in another text file, "B.txt".
So "B.txt" has let's say this "segment".
***second***
this is 1 line in second
this is 2 line in second
this is 3 line in second
**************
How can I do this? I know how to read from and write to a text file with FileReader and BufferedReader, but I have no idea how to approach this. Can anyone give me a hint or help me get started?
I am very new to java so be gentle :-)
Thank you very much in advance.
Kostas

> Well this is a very good and I think efficient way to do it, but unfortunately I don't have a clue about regex and how they work. This is something I should learn. With the expression you wrote, would I get one text block at a time? Would it then be stored in an ArrayList like above? If you have another idea, please suggest it. I am interested in learning new tricks :-)
> Thanks for your reply notivago!!!
> Kostas
Regular expressions have a somewhat steep learning curve, but they are worth the effort: they are fast and powerful text-finding tools, and the whole idea is that you search the text for some pattern. I will provide you with some sample code. It gets you halfway to solving your problem; you will have to make some minor adjustments to use it in your real application (none in the RE itself, I hope).
To understand it, look at the Pattern class documentation and the Matcher class documentation in the API.
/*
 * SectionExtractor.java
 * version 1.0
 * 25/05/2005
 */
package samples;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
 * @author notivago
 */
public class SectionExtractor {
    /**
     * @param args
     */
    public static void main(String[] args) {
        String text =
                "***first***\r\n" +
                "this is 1 line in first\r\n" +
                "this is 2 line in first\r\n" +
                "this is 3 line in first\r\n" +
                "**************\r\n" +
                "***second***\r\n" +
                "this is 1 line in second\r\n" +
                "this is 2 line in second\r\n" +
                "this is 3 line in second\r\n" +
                "**************\r\n" +
                "***third***\r\n" +
                "this is 1 line in third\r\n" +
                "this is 2 line in third\r\n" +
                "this is 3 line in third\r\n" +
                "**************\r\n";
        // Group 1: the name between the *** markers; $ (MULTILINE) ends the header line.
        // Group 2: the body, lazily up to the first run of 14 asterisks (DOTALL lets . cross lines).
        Pattern pattern = Pattern.compile( "\\*{3}(.+?)\\*{3}$(.*?)\\*{14}", Pattern.DOTALL | Pattern.MULTILINE);
        Matcher matcher = pattern.matcher(text);
        if (matcher.find()) {
            System.out.println( "Header: " + matcher.group(1) );
            System.out.println( "Text Body: \n" + matcher.group(2) );
        }
    }
}
The sample as is runs and gives you output that should make clear how the expression works. Try running it.
May the code be with you.
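To finish the job the question describes (copying only the "second" segment into B.txt), one option is to build the pattern around the section name. A sketch under these assumptions: the class and method names are hypothetical, the footer is a line of exactly 14 asterisks as in the sample, and files are read and written as UTF-8 with java.nio.file (Java 7 or later).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative only.
final class NamedSection {
    // Returns the block from the "***name***" header line through the next line of
    // exactly 14 asterisks, or null if there is no such section.
    // (?m): ^ and $ match at line boundaries; (?s): . also matches line breaks.
    static String extract(String text, String name) {
        Pattern p = Pattern.compile(
                "(?ms)^\\*{3}" + Pattern.quote(name) + "\\*{3}$.*?^\\*{14}$");
        Matcher m = p.matcher(text);
        return m.find() ? m.group() : null;
    }

    // Copies one named section, e.g. from A.txt to B.txt.
    static void copySection(Path from, Path to, String name) throws IOException {
        String text = new String(Files.readAllBytes(from), StandardCharsets.UTF_8);
        String section = extract(text, name);
        if (section != null) {
            Files.write(to, section.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

Pattern.quote() makes sure a section name containing regex characters is matched literally.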

Pattern matching regular expressions

I'm attempting to determine whether a string matches a pattern of up to 100 alphanumeric characters, a-z or 0-9, case insensitive. So my regular expression string looks like:
"^[a-zA-Z0-9]{0,100}$"
And I use something like...
Pattern pattern = Pattern.compile( regexString );
I'd like to modify my regex string to also allow the email 'at' symbol "@". But my understanding of regex is very limited. How do I include an "or at symbol" in my regex expression?
Thanks for your help.
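For the simple case (only adding "@" to the characters already allowed), one way is to put it inside the square brackets: inside a character class "@" has no special meaning, so it needs no escaping. This is a minimal sketch; the class name `AllowAtSign` and the `isAllowed` helper are hypothetical, and `String.repeat` in `main` assumes Java 11 or later.

```java
import java.util.regex.Pattern;

public class AllowAtSign {
    // Same class as the original regex, with '@' added inside the brackets
    static final Pattern ALNUM_OR_AT = Pattern.compile("^[a-zA-Z0-9@]{0,100}$");

    // matches() requires the whole string to fit the pattern
    static boolean isAllowed(String s) {
        return ALNUM_OR_AT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("user@example"));      // true
        System.out.println(isAllowed("user.name@example")); // false: '.' is not in the class
        System.out.println(isAllowed("a".repeat(101)));     // false: longer than 100
    }
}
```

Because `matches()` must consume the entire input, the `^` and `$` anchors are redundant here but harmless. Note that this only widens the allowed character set; it does not check that the string is a well-formed email address.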

import java.util.regex.Pattern;

/**
 * Code by sabre150
 * (the enclosing class name is added only so the fragment compiles on its own)
 */
public class EmailPatternBuilder {
    private static final Pattern emailMatcher;

    static {
        // Build up the regular expression according to RFC821
        // http://www.ietf.org/rfc/rfc0821.txt
        // <x> ::= any one of the 128 ASCII characters (no exceptions)
        String x_ = "\u0000-\u007f";
        // <special> ::= "<" | ">" | "(" | ")" | "[" | "]" | "\" | "."
        //              | "," | ";" | ":" | "@" """ | the control
        //              characters (ASCII codes 0 through 31 inclusive and
        //              127)
        String special_ = "<>()\\[\\]\\\\\\.,;:@\"\u0000-\u001f\u007f";
        // <c> ::= any one of the 128 ASCII characters, but not any
        //             <special> or <SP>
        String c_ = "[" + x_ + "&&" + "[^" + special_ + "]&&[^ ]]";
        // <char> ::= <c> | "\" <x>
        String char_ = "(?:" + c_ + "|\\\\[" + x_ + "])";
        // <string> ::= <char> | <char> <string>
        String string_ = char_ + "+";
        // <dot-string> ::= <string> | <string> "." <dot-string>
        String dot_string_ = string_ + "(?:\\." + string_ + ")*";
        // <q> ::= any one of the 128 ASCII characters except <CR>,
        //               <LF>, quote ("), or backslash (\)
        // '&&' intersects the ASCII range with the negated class
        String q_ = "[" + x_ + "&&[^\r\n\"\\\\]]";
        // <qtext> ::= "\" <x> | "\" <x> <qtext> | <q> | <q> <qtext>
        String qtext_ = "(?:\\\\[" + x_ + "]|" + q_ + ")+";
        // <quoted-string> ::= """ <qtext> """
        String quoted_string_ = "\"" + qtext_ + "\"";
        // <local-part> ::= <dot-string> | <quoted-string>
        String local_part_ = "(?:(?:" + dot_string_ + ")|(?:" + quoted_string_ + "))";
        // <a> ::= any one of the 52 alphabetic characters A through Z
        //              in upper case and a through z in lower case
        String a_ = "[a-zA-Z]";
        // <d> ::= any one of the ten digits 0 through 9
        String d_ = "[0-9]";
        // <let-dig> ::= <a> | <d>
        String let_dig_ = "[" + a_ + d_ + "]";
        // <let-dig-hyp> ::= <a> | <d> | "-"
        String let_dig_hyp_ = "[-" + a_ + d_ + "]";
        // <ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>
        // String ldh_str_ = let_dig_hyp_ + "+";
        // RFC821 looks wrong since the production "<name> ::= <a> <ldh-str> <let-dig>"
        // forces a name to have at least 3 characters, and country codes such as
        // uk, ca, etc. would be illegal! I shall change this to make the
        // second term of <name> optional by making a zero-length ldh-str allowable.
        String ldh_str_ = let_dig_hyp_ + "*";
        // <name> ::= <a> <ldh-str> <let-dig>
        String name_ = "(?:" + a_ + ldh_str_ + let_dig_ + ")";
        // <number> ::= <d> | <d> <number>
        String number_ = d_ + "+";
        // <snum> ::= one, two, or three digits representing a decimal
        //              integer value in the range 0 through 255
        // [01]?[0-9]?[0-9] covers 0 through 199 with one to three digits
        String snum_ = "(?:[01]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5])";
        // <dotnum> ::= <snum> "." <snum> "." <snum> "." <snum>
        String dotnum_ = snum_ + "(?:\\." + snum_ + "){3}"; // dotted quad
        // <element> ::= <name> | "#" <number> | "[" <dotnum> "]"
        String element_ = "(?:" + name_ + "|#" + number_ + "|\\[" + dotnum_ + "\\])";
        // <domain> ::= <element> | <element> "." <domain>
        String domain_ = element_ + "(?:\\." + element_ + ")*";
        // <mailbox> ::= <local-part> "@" <domain>
        String mailbox_ = local_part_ + "@" + domain_;
        emailMatcher = Pattern.compile(mailbox_);
        System.out.println("Email address regex = " + emailMatcher);
    }
}
Wow. Sheesh, sabre150, that's pretty impressive. I like it for two reasons. First, it avoids some false negatives that I would have gotten with the regex I mentioned. For example, [email protected] is a valid email address that my pattern rejects and yours accepts. It's unusual, but it's valid. Second, I like the way you have compartmentalized each rule, so that custom changes, if any are needed, are easier to make; for example, if I want to target a particular domain for whatever reason. And you've commented it so that it is easier to read for someone like me who knows almost nothing about regex.
Thanks, Good stuff!
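The `c_` and `q_` classes in sabre150's code rely on Java's character-class intersection operator `&&`. Here is a minimal sketch (the class name `ClassIntersectionDemo`, its fields and the `test` helper are hypothetical) that builds `c_` the same way and compares `q_` written with `&&` against the same class written with `$$`, an easy slip to make. Inside `[...]`, `$` is just a literal and a nested `[^...]` is a union, so the `$$` version accepts almost any character.

```java
import java.util.regex.Pattern;

public class ClassIntersectionDemo {
    // Regex escapes (\x00 etc.) used here instead of Java unicode escapes
    static final String ASCII = "\\x00-\\x7f";
    static final String SPECIAL = "<>()\\[\\]\\\\.,;:@\"\\x00-\\x1f\\x7f";

    // <c>: ASCII AND not <special> AND not space
    static final Pattern C = Pattern.compile("[" + ASCII + "&&[^" + SPECIAL + "]&&[^ ]]");

    // <q> with '&&': ASCII AND not CR, LF, quote or backslash
    static final Pattern Q_AND = Pattern.compile("[" + ASCII + "&&[^\r\n\"\\\\]]");

    // Same class with "$$": the two '$' are literals and [^...] is unioned in
    static final Pattern Q_DOLLAR = Pattern.compile("[" + ASCII + "$$[^\r\n\"\\\\]]");

    // True if s is exactly one character accepted by the class
    static boolean test(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(test(C, "a"));             // true
        System.out.println(test(C, "@"));             // false: '@' is special
        System.out.println(test(C, " "));             // false: space excluded
        System.out.println(test(Q_AND, "\""));        // false: quote excluded
        System.out.println(test(Q_DOLLAR, "\""));     // true: quote is in the ASCII range
        System.out.println(test(Q_DOLLAR, "\u00e9")); // true: a non-ASCII letter slips through
    }
}
```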
