Regular expressions: find files with exactly 'n' digits in a row

Hi there,
I want to filter files whose names contain a run of exactly a fixed number of digits, and no more (at least, no further digits directly after the run).
For example, I have
01.mp3
02.mp3
test10.txt
test000110101010.txt
04.flac
and for n=2 I want to get all files except 'test000110101010.txt'.
The following is not working, and I'm a total newb regarding regular expressions
ls -l | grep '^-' | awk '{print $9}' | grep '([0-9]\{2\})[^0-9]\{2\}'
Thanks for help.
Regards,
drm
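One way to express "exactly n digits in a row" is to put a lookaround on each side of the digit run, so the run can be neither preceded nor followed by another digit. The sketch below does this in Python instead of the grep pipeline above; the function name `has_exact_digit_run` is an illustrative assumption, and the file list is the one from the post.

```python
import re

def has_exact_digit_run(name, n):
    """Return True if name contains a run of exactly n digits."""
    # (?<!\d) : no digit directly before the run
    # \d{n}   : exactly n digits
    # (?!\d)  : no digit directly after the run
    pattern = r"(?<!\d)\d{%d}(?!\d)" % n
    return re.search(pattern, name) is not None

files = ["01.mp3", "02.mp3", "test10.txt", "test000110101010.txt", "04.flac"]
matching = [f for f in files if has_exact_digit_run(f, 2)]
print(matching)
```

With GNU grep, the same pattern should also work in Perl-compatible mode, e.g. `grep -P '(?<!\d)\d{2}(?!\d)'`, since plain grep syntax has no lookarounds.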

Thanks!
I wrote a python script to scan e.g. a music folder for missing files and needed to extract the file numbers from the files to get the "highest" number.
You can get it from here: http://pastebin.com/Sg9yDHiw (Python3, expires in 1 month)
Regards,
drm
Edit: found a bug
Last edited by drm00 (2011-02-04 13:57:43)

Similar Messages

Regular expressions in file mask in file protocol

Hi,
I wanted to check whether we can use regular expressions in the file mask with the file protocol. As per the documentation, we can use a regular expression in the file mask only in the case of ftp/sftp.
Regards,
Anuj

The documentation is not correct at this point. There is no regex support in file mask.
OSB File Transport Configuration - File Mask

Spotlight does not find files with "_" (underscores), but only for certain file types

Hey Folks,
this is strange, maybe someone has an idea.
I already searched the internet for a while, nothing found so far.
I have a file "calc_mean.m" on my desktop.
When I type "calc" in spotlight, it shows the file.
But when I type "calc_" it suddenly does not show the file anymore. Nor does it find the file when I enter "calc_mean.m" in spotlight.
When I enter "calc mean.m" in spotlight, it finds it (using space instead of the underscore).
Now comes the real surprise:
When I rename the file to "calc_mean.txt", spotlight suddenly DOES find the file when entering "calc_mean.txt".
I recreated this "feature" with other files, copying and renaming ".txt" files to ".m" files, and if there's an underscore in the file name, spotlight won't find it.
Playing around a bit more, it seems spotlight does find files with underscore when they are documents, at least it works for the following extensions:
.pdf
.doc
.txt
.xls
But these extensions for example do not work:
.mp3
.m
.k
.a
.ka
(and other random endings I tried).
I am pretty confused. Sure it's no big deal learning to search for files that include underscores in their name using space instead. But I'm still quite puzzled. Any idea?

All of those have meaning in various database search syntax (not sure if it matters).
_ usually means any character.
% usually means any run of characters.
- is often used to negate what comes next, i.e. "don't include results that have the following text."
I don't see any problem on my Mac, though.
I also don't have any problem finding file names with those accented characters using Spotlight. I would suggest reindexing Spotlight, but if cmd-f finds them, I'm not sure that would help.
Spotlight: How to re-index folders or volumes

Script to find files with the same name within a folder and its subfolders.

Looking for a script to find files with the same name within a folder and its subfolders.

Are you just looking to find if any two files underneath a folder have the same name?
If you just want to know that a file named "whatever" exists in two folders, that's not too difficult, but you probably want to know the full path names.
Here's one attempt:
$ perl -MFile::Find -le 'find(\&w, "."); while (($n,$p)=each %file) {if(@{$p}>1){print join(" ",@{$p})}} sub w{push @{$file{$_}},$File::Find::name;}'
That will print the pathnames on the same line of any files with the same name that appear anywhere underneath your current directory.
It's a bit long for a "one-liner", but functional.
Darren
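For comparison, the same idea as the Perl one-liner (group every path under a directory by its base name, then report names seen more than once) can be sketched in Python. The function name `find_duplicate_names` is hypothetical and chosen only for this illustration.

```python
import os
from collections import defaultdict

def find_duplicate_names(root):
    """Map each file name that occurs more than once under root to its paths."""
    paths_by_name = defaultdict(list)
    # os.walk visits root and every subfolder beneath it.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths_by_name[name].append(os.path.join(dirpath, name))
    # Keep only the names that were seen in more than one place.
    return {name: sorted(paths)
            for name, paths in paths_by_name.items()
            if len(paths) > 1}
```

Calling `find_duplicate_names(".")` and printing `" ".join(paths)` for each entry would give output in the same shape as the one-liner: one line per duplicated name.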

Spotlight fails to find files with search items

I just did a Spotlight search on
Japanese English は
Spotlight found more than 3500 items, but the two sample Word files I looked at didn't have は in them.
This is really frustrating. Is there a way to ensure Spotlight only finds files with the input?

Spotlight is notorious for its weaknesses and erratic behavior. You might try reindexing your Spotlight. Open up Spotlight in Sys Prefs, go to Privacy and drag the HD Folder there. Leave it a minute then remove it by clicking on the minus at the bottom. This will force Spotlight to reindex the drive, which may take some time.
I mostly use EasyFind (free) instead of Spotlight.
http://www.devon-technologies.com/products/freeware/

ALC-FUT-001-007: Syntax error in the regular expression *. from File Utilities - Find in ES2

Has anyone encountered the above error when trying to use the Foundation - File Utilities service Find?
This is occurring in ES2, and I have tried using an asterisk, as well as the exact file name I know exists.
I have tried literals in the Process Properties, as well as passing it in via string variables.
Any help is appreciated.
Thanks
Mark

Thanks for your response.
Here is what we have learned:
We should not put the regular expression value in single quotes - eliminates the error.
The problem is, the Find operation always returns an empty list no matter what criteria we enter - even C:\ with criteria = *.*
ES 2 is running on a Windows 2003 machine.
Thanks Again
Mark

Regular Expression Find and Replace with Wildcards

Hi!
For the life of me, I can't figure out the right way to do this.
I basically have a list of last names, first names. I want the last name to have a different css style than the first name.
So this is what I have now:
AAGAARD, TODD, S. 
AAMOT, KARI, 
AARON, MARJORIE, C. 
and this is what I need to have:
AAGAARD , TODD, S. 
AAMOT , KARI, 
AARON , MARJORIE, C. 
Any ideas?
Thanks!

Make a backup first.
In the Find field use:
(\w+),\s+([^<]+)<\/b>\s* 
In the Replace field use:
$1 $2 
Select Use regular expression. Light the blue touch paper, and click Replace All.

Java – Regular Expressions – Finding any non digit byte in a multiple byte

Hello,
I’m new to JAVA and Regular Expressions; I’m trying to write a regular expression that will find any records that contain a non digit byte in a multiple byte field.
I thought the following was the correct expression but it is only finding records that contain “all” non digit bytes.
\D{1,}
\D = Non Digit
{1,} = at least 1 or more
Below is my sample data. I would like the regular expression to find all of the records that are not all numeric. However when I use the regular expression \D{1,} it is only finding the 2 records that all bytes are non digits. (i.e. “ “ and “A “)
“ 111229”
“2 111229”
“20091229”
“200912c9”
“201#1229”
“20101229”
“20110229”
“20111*29”
“20111029”
“20111229”
“20B11229”
“A “
“A0111229”
Please note I have also tried \D{1,}+ and \D{1,}? And they also do not return my desired results
Any assistance someone can provide would be greatly appreciated.

You don't show the code you are using but I surmise you are using String.matches() which requires that the whole target must match the regular expression not just part of it. Instead you should create a Pattern and then a Matcher and use the Matcher.find() method. Check the Javadoc for Pattern and Matcher and look at the Java regex tutorial - http://docs.oracle.com/javase/tutorial/essential/regex/ .
P.S. You can re-use the Pattern object - you don't have to create it every time you need one.
P.P.S. Java regular expressions work with characters not bytes and characters are not not not bytes.
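The whole-string versus anywhere-in-the-string distinction described above is not specific to Java. As a rough analogue, Python's `re.fullmatch` behaves like `String.matches()` and `re.search` like `Matcher.find()`. The records below are a subset of the sample data; the space padding is an assumption, since the original spacing did not survive in the post.

```python
import re

records = [" 111229", "20091229", "200912c9", "201#1229", "A       ", "A0111229"]

# Compiled once and reused, as the P.S. above suggests for Pattern objects.
non_digit = re.compile(r"\D")

# fullmatch() succeeds only if the entire record consists of non-digits.
all_non_digit = [r for r in records if re.fullmatch(r"\D+", r)]

# search() succeeds if a non-digit occurs anywhere in the record.
any_non_digit = [r for r in records if non_digit.search(r)]

print(all_non_digit)
print(any_non_digit)
```

The first list reproduces the symptom from the question (only the all-non-digit record is found); the second is the result the poster was after.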

Regular expression on words with % wildcard

Hi,
I've got some processing working using regular expression where I need to process words e.g.
regexp_replace('word1 word2','(\w+)','myprefix{\1}') - results in - 'myprefixword1 myprefixword2'
However, if I'm presented with this; '%word0 word1% wo%d2 word3', then I need to treat % as special case and leave the word as is, so result here would be; - '%word0 word1% wo%d2 myprefixword3', is this achievable using regexp ?

And for those who don't know, I guess we should explain why we're having to expand single spaces to double spaces...
(I'll use the "¬" character to represent spaces to make it clearer to see)
If we have a string such as
word1¬word2¬word3
and we want to identify the words in the string (without using any special regexp word identifier) then we are going to use the spaces to identify the start and end of words. To make life easy, we manually put a space at the start and end of the string so we can say that each word in the string will have a space before and after it regardless of where it is in the string...
¬word1¬word2¬word3¬
However, when we specify what we want to search for we are going to say we want a space, followed by a number of characters (not spaces), followed by a space...
¬[^¬]*¬
So, ideally, you'd expect it to look through the string and say
¬word1¬word2¬word3¬
\_____/... found word1
¬word1¬word2¬word3¬
      \_____/... found word2
¬word1¬word2¬word3¬
            \_____/... found word3
Unfortunately, there is a problem. Once the first word has been found the pointer for searching the rest of the string is located on the next character after the match i.e.
¬word1¬word2¬word3¬
       ^
So it won't be able to pick out word2 and will only get to word3. Let's see it in action...
SQL> ed
Wrote file afiedt.buf
1 with t as (select ' word1 word2 word3 ' as txt from dual)
2 --
3 select regexp_replace(txt, ' [^ ]* ', 'xxxxx') as txt
4* from t
SQL> /
TXT
xxxxxword2xxxxx
SQL>
In order to deal with this, if we replace the single spaces with double spaces (not required at the start and end) our string looks like...
¬word1¬¬word2¬¬word3¬
So as it searches it finds word1 as a match and then the pointer in the string is located...
¬word1¬¬word2¬¬word3¬
       ^
... so the next match for the pattern of space-characters-space is word2 and then the pointer is located...
¬word1¬¬word2¬¬word3¬
              ^
... ready to find word 3. Example...
SQL> ed
Wrote file afiedt.buf
1 with t as (select ' word1  word2  word3 ' as txt from dual)
2 --
3 select regexp_replace(txt, ' [^ ]* ', 'xxxxx') as txt
4* from t
SQL> /
TXT
xxxxxxxxxxxxxxx
SQL>
Hopefully that's a little clearer. You just have to remember the "pointer" principle and the fact that once a match is found it is located on the character after the match.
;)
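The "pointer" behaviour is the usual non-overlapping matching rule and can be reproduced outside Oracle. The sketch below uses Python's `re.sub` as a stand-in for `regexp_replace`. The last part is an assumption on my side rather than something from the thread: in an engine that supports lookarounds, the word boundaries can be asserted without being consumed, which also handles the original "leave words containing % alone" requirement. As far as I know Oracle's regexp functions do not offer lookarounds, which would explain the double-space trick.

```python
import re

single = " word1 word2 word3 "
pattern = r" [^ ]* "

# Each match consumes its trailing space, so word2 has no leading space left.
first = re.sub(pattern, "xxxxx", single)
print(first)

# Doubling the inner spaces gives every word its own leading and trailing space.
doubled = " " + single.strip().replace(" ", "  ") + " "
second = re.sub(pattern, "xxxxx", doubled)
print(second)

# Lookarounds check the neighbours without consuming them:
# prefix a word only if it neither touches nor contains a % sign.
text = "%word0 word1% wo%d2 word3"
prefixed = re.sub(r"(?<![\w%])(\w+)(?![\w%])", r"myprefix\1", text)
print(prefixed)
```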

Viewing comments by default; finding files with comments

I searched the various preferences but did not see this: Is there a way to tell Acrobat to open files with the comments list showing?
Also, I have a folder of PDFs that have been indexed with catalog. Only a few of the files have comments. Is there any way to construct a search for "any PDF with a comment"? What I've been doing is using the user ID that Acrobat automatically places in edits, but that won't help for long, as the number of people doing comments will increase, and some future documents will have those IDs in regular text.

LarryHN:
1) your suggestion will not work for the issue I'm describing: if I have both Raw and Jpeg files of the same image on my SD card, the iPad sort of lumps these together, so when I view the contents of my SD card it shows as one image/file, and if I select it and download it, it brings over both the Raw and Jpeg versions of the image, which is what I am trying to avoid and why I started the thread in the first place.
2) I spoke with sales people at two Apple stores and one person with their online sales, and none of them could address my issue. The online sales person said she might have to put me through to their tech people to resolve this or at least get the full, correct info. None of them were janitors.
I did find a workaround which I think will work:
Download the files over to my iPad, then use an app like Photo Manager Pro to be able to view the files by type, then select the Raw files and delete them from my iPad. This will work I hope, but seems like a lot of extra work for a device that's supposed to be user friendly--I think Apple missed the boat on this one, opting for making things 'simple' instead of functional for photographers.

Regular Expression Character Sets with Pattern and Matcher

Hi,
I am a little bit confused about a regular expression I am writing; it works in other languages but not in Java.
The regular expression is to match LaTeX commands from a file, and is as follows:
\\begin{command}([.|\n\r\s]*)\\end{command}
This does not work in Java but does in PHP, C, etc...
The part that is strange is the . character. If placed as .* it works but if placed as [.]* it doesn't. Does this mean that . cannot be placed in a character range in Java?
Any help very much appreciated.
Kind Regards
Paul Bain

In PHP it seems that the "." still works as an any-character operator inside character classes.
The regular expression posted did not work, but it does if I do:
\\begin{command}((.|[\n\r\s])*)?\\end{command}
Basically what I'm trying to match is a block of LaTeX, so the \\begin{command} and \\end{command} in LaTeX, not regex, although the \\ is a single one in LaTeX. I basically want to match any block which starts with one of those and ends in the end command. so really the regular expression that counts is the bit in the middle, ((.|[\n\r\s])*)?
Am I right in saying that the "?" will prevent the engine matching the first and last \\begin and \\end in the following example:
\\begin{command}
some stuff
\\end{command}
\\begin{command}
some stuff
\\end{command}
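Two separate points are in play here, and a small experiment helps to keep them apart. Inside a character class the dot is a literal dot (this holds in Java as well as in Python, which is used for the sketch), and a "?" placed after a group makes the group optional rather than lazy; the lazy quantifier is "*?". The sample text below is illustrative, and DOTALL is used so that "." also matches newlines.

```python
import re

latex = (
    "\\begin{command}\nsome stuff\n\\end{command}\n"
    "\\begin{command}\nother stuff\n\\end{command}\n"
)

# Inside a character class "." is a literal dot, so [.]* matches only dots.
dots_only = re.fullmatch(r"[.]*", "...") is not None
letters = re.fullmatch(r"[.]*", "abc") is not None

# "*?" is lazy: each match stops at the first \end{command} it can reach.
lazy = re.findall(r"\\begin\{command\}(.*?)\\end\{command\}", latex, re.DOTALL)

# "*" is greedy: one match runs from the first \begin to the last \end.
greedy = re.findall(r"\\begin\{command\}(.*)\\end\{command\}", latex, re.DOTALL)

print(dots_only, letters)
print(lazy)
print(len(greedy))
```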

Regular Expressions find and replace

Hi ,
I have a question on using Regular Expressions in Java(java.util.regex).
Problem Description:
I have a string (say for example strHTML) which contains the whole HTML code of a webpage. I want to be able to search for all the image source tags and check whether they are absolute urls to the image source(for eg. <img src="www.google.com/images/logo.gif" >) or relative(for eg. <img src="../images/logo.gif" >).
If they are relative urls to the image path, then I wish to replace them with their absolute urls throughout the webpage (in this case inside string strHTML).
I have to do it inside a servlet and hence have to use java.
This is the code I tried. It doesn't match and replace, and it goes into an infinite loop, i.e. probably the pattern matches everything.
//Change all images to actual http addresses FOR example change src="../images/logo.gif" to src="http://www.google.com/../images/logo.gif"
 String ddurl="http://www.google.com/";
String strHTML=" < img src=\"../images/logo.gif\" alt=\"Google logo\">";
Pattern p = Pattern.compile ("(?i)src[\\s]*=[\\s]*[\"\']([./]*.*)[\"\']");
Matcher m = p.matcher (strHTML);
while(m.find())
m.replaceAll(ddurl+m.group(1));
what is wrong in this?
Thanks,
Rajiv

Right, here's the full monte (whatever that means):
import java.util.regex.*;

public class Test1
{
  public static void main(String[] args)
  {
    String domain = "http://www.google.com/";
    String strHTML =
      " < img src=\"images/logo.gif\" alt=\"Google logo\">\n" +
      " <img alt=\"Google logo\" src=images/logo.gif >\n" +
      " <IMG SRC=\"/images/logo.gif\" alt=\"Google logo\">\n" +
      " <img alt=\"Google logo\" src=../images/logo.gif>\n" +
      " <img src=http://www.yahoo.com/images/logo.gif alt=\"Yahoo logo\">";
    String regex =
      "(<\\s*img.+?src\\s*=\\s*) # Capture preliminaries in $1. \n" +
      "(?: # First look for URL in quotes. \n" +
      " ([\"\']) # Capture open quote in $2. \n" +
      " (?!http:) # If it isn't absolute... \n" +
      " /?(.+?) # ...capture URL in $3 \n" +
      " \\2 # Match the closing quote \n" +
      " | # Look for non-quoted URL. \n" +
      " (?!http:) # If it isn't absolute... \n" +
      " /?([^\\s>]+) # ...capture URL in $4 \n" +
      ")";
    Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.COMMENTS);
    Matcher m = p.matcher(strHTML);
    StringBuffer sbuf = new StringBuffer();
    while (m.find())
    {
      // Exactly one of groups 3 and 4 is set, depending on which alternative matched.
      String relURL = m.group(3) != null ? m.group(3) : m.group(4);
      m.appendReplacement(sbuf, "$1\"" + domain + relURL + "\"");
    }
    // Copy whatever follows the last match.
    m.appendTail(sbuf);
    System.out.println(sbuf.toString());
  }
}
First off, observe that I'm using free-spacing (or "COMMENTS") mode to make the regex easier to read--all the whitespace and comments will be ignored by the Pattern compiler. I also used the CASE_INSENSITIVE flag instead of an embedded (?i), just to remove some clutter. By the way, your second (?i) was redundant; the first one would remain in effect until "turned off" with a (?-i). Another way to localize a flag's effect is by using it within a non-capturing group, e.g., (?i:img).
As jaylogan said, the best way to filter out absolute URL's is by using a negative lookahead, and that's what I've done here. The problem of optional quotes I addressed by trying to match first with quotes, then without. The all-in-one approach might work with URL's, since they can't (AFAIK) contain whitespace anyway, but the alternation method can be used to match any attribute/value pair. It's also, I feel, easier to understand and maintain. Unfortunately, it also means that you can't use replaceAll(), since you have to determine which alternative matched before doing the replacement, but the long version is still pretty simple (especially when you can just copy it from the javadoc for the appendReplacement() method, as I did).

Regular expression - find if string does NOT contain text....

I have a string that I want to tokenize. The string can contain basically anything. I want to produce tokens for each "word" found, and for each "<=" or "," found. There does not need to be whitespace around a "<=" or a "," to consider it a token. So for example:
joe schmoe<=jack, jane
should become
joe
schmoe
<=
jack
jane
As a constraint, I do not want to use StringTokenizer at all, as "its use is discouraged in new code". http://java.sun.com/j2se/1.4.2/docs/api/java/util/StringTokenizer.html
Here's the code I plan on using for this:
 public String[] getWords(String input) {
     Matcher matcher = WORD_PATTERN.matcher(input);
     ArrayList<String> words = new ArrayList<String>();
     while (matcher.find()) {
         words.add(matcher.group());
     }
     return (String[]) words.toArray(new String[0]);
 }
The trick, though, is coming up with a working regular expression. The closest I've found yet is:
([^\s]|^(,)|^(<=))+|,|<=
but that produces the following:
joe
schmoe<=jack,
jane
I think what I need is to be able to find if a string does not contain the substring "<=" or "," using a regular expression. Anyone know how to do this, or another way to do this using regular expressions?

Try:
/*
 * Tokenizer.java
 * version 1.0
 * 01/06/2005
 */
package samples;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * @author notivago
 */
public class StrangeTokenizer {
    public static void main(String[] args) {
        String text = "joe schmoe<=jack, jane";
        // Try the two operators first, then fall back to a run of word characters.
        Pattern pattern = Pattern.compile( "((?:<=)|(?:,)|(?:\\w+))");
        Matcher matcher = pattern.matcher(text);
        while( matcher.find() ) {
            System.out.println( "Item: " + matcher.group(1) );
        }
    }
}
May the code be with you.
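The answer above treats a "word" as a run of \w characters. If a word should instead be any run of non-whitespace that contains neither "," nor "<=" (which is closer to the "does NOT contain" wording of the question), a negative lookahead can be checked before every character. This is a sketch in Python under that assumption; the names `TOKEN` and `get_words` are illustrative.

```python
import re

# Alternatives are tried left to right:
#   <=                    the two-character operator
#   ,                     a comma
#   (?:(?!<=)[^\s,])+     one or more characters that are not whitespace,
#                         not a comma, and do not start a "<="
TOKEN = re.compile(r"<=|,|(?:(?!<=)[^\s,])+")

def get_words(text):
    """Split text into words, '<=' tokens and ',' tokens."""
    return TOKEN.findall(text)

print(get_words("joe schmoe<=jack, jane"))
print(get_words("x<y o'neil<=z"))
```

Note that a lone "<" stays part of its word; only the full "<=" sequence ends one.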

Regular Expression weirdness - problem with $ character

If I use the following KM code in Beanshell Technology - it works correctly and replaces "C$_0MYREMOTETABLE RMTALIAS, MYLOCALTABLE LOCALIAS, " with "C$_0MYREMOTETABLE_000111 RMTALIAS, MYLOCALTABLE LOCALIAS, "
But when I try to use the same exact code in 'Undefined' technology - it does not match anything in the source string - and does not replace anything.
If I change the regular expression to not use the $ it still does not work.
But if I change the source string to remove the $ - then the regular expression works.
If I use the same code in Beanshell technology - it works fine - but then I can't use the value in a later 'Undefined' technology step.
Does anyone know if the java technology does something special with $ characters when ODI parses the KM code?
Does anyone know if there is a way to use the value from a Beanshell variable in a 'Undefined' technology step?
String newSourceTableList = "";
String sessionNum ="<%=odiRef.getSession("SESS_NO") %>";
String sourceTableList = "<%=odiRef.getSrcTablesList("", "[WORK_SCHEMA].[TABLE_NAME] [POP_TAB_ALIAS]" , ",", ",") %>";
String matchExpr = "(C\\$_\\S*)"; (should end with two backslashes followed by 'S*' - this editor mangles it)
String replaceExpr = "$0_"+sessionNum+ " ";
newSourceTableList = sourceTableList.replaceAll(matchExpr,replaceExpr);
---------------------------------------------------

Phases of substitution in ODI:
The way ODI works allows for three separate phases of substitution, and you can use them all. The three phases are:
- First Phase: <% %> You will see these appear in the knowledge modules etc., and they are substituted on generation (when you generate a scenario, or tell ODI to execute an interface directly). This phase is used to generate the column names, table names etc., which are known from the metadata at that point.
- Second Phase: <? ?> This phase is substituted when the scenario is instantiated as an execution (session generation). At this point, ODI has the additional information which allows it to generate the schema names, as it has resolved the Logical/Physical Schemas through the use of the Context (which is provided for the execution to take place). All the substitutions at this point are written to the execution log.
- Third Phase <@ @> This phase is substituted when the execution code is read from the session log for execution. You will note that anything substituted in this phase is NEVER written to the execution log. (see PASSWORDS as a prime example, you don't want those written to the logs, with the security risks associated with that!)
Anything in <@ @> is always interpreted for substitution by the java beanshell, it does not have to be a Java Beanshell step, it can be any kind of step, it will be interpreted at that run-time point.

Regular Expression - find double hyphens only

I am wondering if there's a way to write a regular expression to find double hyphens and change them to single hyphens. The catch is that some of the text I'm searching through have multiple hyphens.
Example:
str1 = "Here is my sample text with double -- and I would like to replace this with one hyphen."
str2 = "Here is another sample with multiple hyphens ----- that I do not want to change but leave as is."
Is there a way to change only str1 to a single hyphen and keep str2 as is?

You are correct. I should have been more explicit. Here are some real examples. I hope this helps.
Helps what?
Have you tried to write the regex yourself, at least?
Adam
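For the two sample strings in the question, one possible approach is to match a pair of hyphens only when no further hyphen sits directly before or after it. The sketch below uses Python's `re.sub` with a lookbehind and a lookahead; the function name `collapse_double_hyphen` is made up for the example, and whether this covers the poster's real data is unknown, since those examples are not shown.

```python
import re

def collapse_double_hyphen(text):
    """Replace '--' with '-' only when it is not part of a longer hyphen run."""
    # (?<!-) : the pair is not preceded by a hyphen
    # (?!-)  : the pair is not followed by a hyphen
    return re.sub(r"(?<!-)--(?!-)", "-", text)

str1 = "Here is my sample text with double -- and I would like to replace this with one hyphen."
str2 = "Here is another sample with multiple hyphens ----- that I do not want to change but leave as is."

print(collapse_double_hyphen(str1))
print(collapse_double_hyphen(str2))
```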
