Fastest way to count the number of occurrences of a string in a file

I have an application that will process a number of records in a plain text file, and the processing takes a long time. Therefore, I'd like to first calculate the number of records in the file so that I can display a progress dialog to the user (e.g. "1234 out of 5678 records processed"). The records are separated by the string "//" followed by a newline, so all I need to do to get the number of records is to count the number of times that "//" occurs in the file. What's the quickest way to do this? On a test file of ~1.5 GB with ~500 000 records, grep manages under 5 seconds, whereas a naive Java approach:
BufferedReader bout = new BufferedReader (new FileReader (sourcefile));
String ffline = null;
int lcnt = 0;
int searchCount = 0;
while ((ffline = bout.readLine()) != null) {
     lcnt++;
     for(int searchIndex=0;searchIndex<ffline.length();) {
          int index=ffline.indexOf(searchFor,searchIndex);
          if(index!=-1) {
               //System.out.println("Line number " + lcnt);
               searchCount++;
               // resume just past this match (the original "+=" skipped too far)
               searchIndex=index+searchLength;
          } else {
               break;
          }
     }
}
takes about 10 times as long:
martin@martin-laptop:~$ time grep -c '//' Desktop/moresequences.gb
544064
real     0m4.449s
user     0m3.880s
sys     0m0.544s
martin@martin-laptop:~$ time java WordCounter Desktop/moresequences.gb
SearchCount = 544064
real     0m42.719s
user     0m40.843s
sys     0m1.232s
I suspect that dealing with the file as a whole, rather than line-by-line, might be quicker, based on previous experience with Perl.

Reading lines is very slow. If your file has a single-byte character encoding then use something like the KMP algorithm on a BufferedInputStream to find the byte sequence "//\n".getBytes(). If the file has a multi-byte encoding then use the KMP algorithm on a BufferedReader to find the chars "//\n".toCharArray().
The basis for this can be found in reply #12 of http://forum.java.sun.com/thread.jspa?threadID=769325&messageID=4386201 .
Edited by: sabre150 on May 2, 2008 2:10 PM
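
The suggestion above can be sketched roughly as follows. This is only an illustration, not the code from the linked forum reply: the class and method names (RecordCounter, buildFailureTable, countOccurrences) are made up here, and it assumes a single-byte encoding and a non-empty pattern. It streams the file in chunks and keeps the KMP match state across chunk boundaries, so no line objects are created.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: counts occurrences of a byte pattern in a stream using KMP.
class RecordCounter {

    // fail[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of it.
    static int[] buildFailureTable(byte[] pattern) {
        int[] fail = new int[pattern.length];
        int k = 0;
        for (int i = 1; i < pattern.length; i++) {
            while (k > 0 && pattern[i] != pattern[k]) {
                k = fail[k - 1];
            }
            if (pattern[i] == pattern[k]) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }

    // Assumes pattern.length > 0. The match state survives across read() calls,
    // so a pattern split over two chunks is still found.
    static long countOccurrences(InputStream in, byte[] pattern) throws IOException {
        int[] fail = buildFailureTable(pattern);
        byte[] buf = new byte[1 << 16];
        long count = 0;
        int matched = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                byte b = buf[i];
                while (matched > 0 && b != pattern[matched]) {
                    matched = fail[matched - 1];
                }
                if (b == pattern[matched]) {
                    matched++;
                }
                if (matched == pattern.length) {
                    count++;
                    matched = fail[matched - 1];
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] pattern = "//\n".getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            System.out.println("SearchCount = " + countOccurrences(in, pattern));
        }
    }
}
```

Note that this counts occurrences of "//" followed by a newline, whereas grep -c '//' counts lines containing "//" anywhere, so the two numbers only agree when "//" appears solely as a record terminator.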

Similar Messages

  • Count the number of occurrences in a table with TopLink

    Hi!
    Is there no way to build a query with ExpressionBuilder or ... to count the number of occurrences in my table?
    I don't want to use the query "select count(*) from table".
    Thanks

    Not sure of the question. Are you looking to get the sql "select count(*) from table" from using the TopLink expression framework or are you getting that SQL already and want something else?
    If you are looking just to get the count from a table/class, you can use a ReportQuery:
    ReportQuery rquery = new ReportQuery(ClassToQueryOn.class);
    rquery.addCount(); //equivalent to count(*);
    session.executeQuery(rquery);
    You can use a report query to return data instead of objects, and use selection criteria just like a normal read query.
    Best Regards,
    Chris

  • Counting the number of occurrences in a table column

    Hi All
    I have a table with a column that contains approx. 5000 6-digit codes. A number of these codes are duplicated in the column, and I want to count the number of occurrences of each code. The column looks a bit like -
    WCID
    940042
    920012
    940652
    940199
    188949
    155146
    155196
    174196
    152148
    151281
    196209
    174015
    182163
    195465
    195318
    182008
    189589
    150675
    There can be multiple instances of each WCID and I need to count the number of instances of each. I also have access to a second table that is identical except that it contains each WCID only once, with no multiple instances.
    I thought I could loop through the table to be counted, from 100000 to 999999, and count each occurrence that way, but it would be very inefficient. The other way I thought of would be to select a WCID from the unique table, count the occurrences of that, select the next WCID from the unique table, count that and so on; however, I'm not sure how to do it in PL/SQL. Perhaps select the WCIDs from the unique table into a cursor, loop through that, compare it with the original table and count the instances?
    I hope this makes some sense, any help would be really appreciated
    Thanks
    Bill

    Hi, Bill,
    That sounds like a job for GROUP BY:
    SELECT    wcid
    ,         COUNT (*)       AS num_found
    FROM      table_x
    GROUP BY  wcid
    ORDER BY  wcid
    I hope that answers your question.
    If not, post CREATE TABLE and INSERT statements for a little sample data, and the results you want from that data.

  • Count the number of occurrences of a pattern in some files

    Hi,
    I am trying to extract some patterns in files using regex. I can easily extract the patterns using my program. I also wanted to count the number of occurrences of the pattern in the files. For that I have written this code:
    int count=0;
              while (m.find())
                 System.out.println("Found Patterns: "+m.group());
                 count++;
           System.out.println(count);  //giving error saying identifier expected
    Unfortunately I am getting an error saying identifier expected at the count line. What should I do?
    Thanks

    Cheapside-Poultry wrote:
    > Move the } down one line so the count is in scope.
    No, it looks like he just forgot the opening { with the while.
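
For reference, a counting loop with both braces in place might look like the sketch below. The class name PatternCounter and the method countMatches are invented for this illustration; only java.util.regex calls from the question (Matcher.find, Matcher.group) are used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts how many times a pattern matches in a text.
class PatternCounter {

    static int countMatches(Pattern pattern, CharSequence text) {
        Matcher m = pattern.matcher(text);
        int count = 0;
        while (m.find()) {   // both statements belong to the loop body
            System.out.println("Found Patterns: " + m.group());
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches(Pattern.compile("ab"), "abcabab"));
    }
}
```

find() resumes after the end of the previous match, so overlapping matches are not counted.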

  • I need to write a program to count the number of occurrences of each letter in a string.

    I need to write a program to count the number of occurrences of each letter in a string. I have tried a lot and searched a lot regarding this problem.
    I'm not the most proficient with Java, but this is all I could come up with, and I would appreciate some help here. I hope you guys can help me find a solution to this.
    e.g String : abcabrty
    Result should be
    a:2
    b:2
    c:1
    r:1
    t:1
    y:1
    public class chkoccurences {
         public static void main(String args[ ]) {
              String user_Data=args[0];
              int counter=0;
              try {
                   for(int i=0;i<user_Data.length( );i++) {
                        for(int j=0;j<user_Data.length( );j++) {
                             if(user_Data.charAt(i) == user_Data.charAt(j))
                             counter++;
                        }
                        System.out.println(user_Data.charAt(i)+" exists "+counter+" time(s) in the word.");
                   }
                   System.out.println(" ");
              } catch(ArrayIndexOutOfBoundsException e) {
                   System.out.println("Check the array size.");
              }
         }
    }
    This is the output I get out of the program:
    a exists 2 time(s) in the word.
    b exists 4 time(s) in the word.
    c exists 5 time(s) in the word.
    a exists 7 time(s) in the word.
    b exists 9 time(s) in the word.
    r exists 10 time(s) in the word.
    t exists 11 time(s) in the word.
    y exists 12 time(s) in the word.
    What I think is that I need an array to store the repeated characters, because the repeated characters are getting counted again in the loop, if you know what I mean.
    Please, i would appreciate some help here.

    Criticism is welcomed
    public class tests {
         final int min = 10;
         final int max = 35;
         final char[] chars = new char[] {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h',
                   'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u',
                   'v', 'w', 'x', 'y', 'z'};
         public static void main(String[] args){
              tests t = new tests();
              String[] strings = new String[] {"aabcze", "att3%a", ""};
              for(String s : strings){
                   System.out.println(t.getAlphaCount(s));
              }
         }
         public String getAlphaCount(String s){
              int[] alphaCount = new int[26];
              int val;
              for(char c : s.toCharArray()){
                   // getNumericValue maps 'a'..'z' (and 'A'..'Z') to 10..35
                   val = Character.getNumericValue(c);
                   if( (val>=min) && (val<=max)){
                        alphaCount[val-min]++;
                   }
              }
              StringBuilder result = new StringBuilder();
              for(int i=0; i<alphaCount.length; i++){
                   if(alphaCount[i] > 0){
                        result.append(chars[i] + ":" + alphaCount[i] + ", ");
                   }
              }
              if(result.length() != 0){
                   result.delete(result.length()-2, result.length());
              }
              return result.toString();
         }
    }
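
Another way to avoid recounting repeated characters, sketched here as an alternative rather than as a correction of either program above, is to keep one counter per character in a sorted map. The names LetterTally and tally are made up for this example; it counts every character, not just letters.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: one counter per distinct character, kept in sorted order.
class LetterTally {

    static Map<Character, Integer> tally(String s) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (char c : s.toCharArray()) {
            // merge() inserts 1 for a new key, otherwise adds 1 to the stored count
            counts.merge(c, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        for (Map.Entry<Character, Integer> e : tally("abcabrty").entrySet()) {
            System.out.println(e.getKey() + ":" + e.getValue());
        }
    }
}
```

For the sample string abcabrty this prints the per-letter result asked for in the question (a:2, b:2, c:1, r:1, t:1, y:1, one per line).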

  • A quick way to count the number of newlines '\n' in a string of 200 chars

    I am trying to establish the number of lines that a string will generate.
    I can do this by counting the number of '\n' in the string. However, my brute force method (shown below) is very slow.
    Normally this would not be a problem: on a 2800 MHz Athlon (standard) PC this takes < 1 second. However, this code resides within a speed-critical loop (not shown). The code shown below is an Achilles heel as far as the performance of this speed-critical loop goes.
    Can anyone suggest a faster way to count the number of '\n' (newlines) within a text string of around 50 - 1000 chars, given that there may be 10 - 100 newline chars? Speed is a very important factor for this part of my program.
    Thanks in advance
    Andrew.
        int lineCount =0;
        String txt = this.getText();
        //loop through text and count the newline characters
        for (int i = 0; i < txt.length(); i++)
        {
          char ch = txt.charAt(i);
          if (ch == '\n')
           lineCount ++;
        }//end for
    Message was edited by:
    scottie_uk

    Well, here is a C version. On my computer the Java version (reply 9 above) is slightly faster than C. YMMV. For stuff like this a compiler can be hard to beat even with assembler, as you need to do manual loop unrolling and method inlining which turn assembly into a maintenance nightmare.
    // gcc -O6 -fomit-frame-pointer -funroll-loops -finline -o newlines.exe newlines.c
    #include <stdio.h>
    #include <string.h>
    #if defined(__GNUC__) || defined(__unix__)
    #include <time.h>
    #include <sys/time.h>
    #else
    #include <windows.h>
    #endif
    #if defined(__GNUC__) || defined(__unix__)
    typedef struct timeval TIMESTAMP;
    void currentTime(struct timeval *time)
    {
        gettimeofday(time, NULL);
    }
    int milliseconds(struct timeval *start, struct timeval *end)
    {
        int usec = (end->tv_sec - start->tv_sec) * 1000000 +
         end->tv_usec - start->tv_usec;
        return (usec + 500) / 1000;
    }
    #else
    typedef FILETIME TIMESTAMP;
    void currentTime(FILETIME *time)
    {
        GetSystemTimeAsFileTime(time);
    }
    int milliseconds(FILETIME *start, FILETIME *end)
    {
        /* FILETIME holds a 64-bit count of 100-nanosecond ticks in two 32-bit halves */
        unsigned long long s = ((unsigned long long)start->dwHighDateTime << 32) |
         start->dwLowDateTime;
        unsigned long long e = ((unsigned long long)end->dwHighDateTime << 32) |
         end->dwLowDateTime;
        return (int)((e - s + 5000) / 10000);
    }
    #endif
    /* count the '\n' characters in a NUL-terminated string */
    static int count(register char *txt)
    {
        register int count = 0;
        register int c;
        while ((c = *txt++))
         if (c == '\n')
             count++;
        return count;
    }
    static void doit(char *str)
    {
        TIMESTAMP start, end;
        long time;
        register int n;
        int total = 0;
        currentTime(&start);
        for (n = 0; n < 1000000; n++)
         total += count(str);
        currentTime(&end);
        time = milliseconds(&start, &end);
        total *= 4;
        printf("time %ld, total %d\n", time, total);
        fflush(stdout);
    }
    int main(int argc, char **argv)
    {
        char buf[1024];
        int n;
        buf[0] = '\0';  /* strcat needs an initialised, NUL-terminated buffer */
        for (n = 0; n < 256 / 4; n++)
         strcat(buf, "abc\n");
        for (n = 0; n < 5; n++)
         doit(buf);
        return 0;
    }
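
Since the original question was about Java, here is one possible variant for comparison. It is not the Java version the reply above refers to; the names NewlineCounter and countNewlines are invented here. It lets String.indexOf jump from one newline to the next instead of inspecting every character through charAt. Whether this is actually faster depends on the JVM and the input, so it would need to be measured.

```java
// Hypothetical helper: counts '\n' characters by hopping between matches.
class NewlineCounter {

    static int countNewlines(String txt) {
        int count = 0;
        int pos = txt.indexOf('\n');
        while (pos != -1) {
            count++;
            pos = txt.indexOf('\n', pos + 1); // continue just past the last hit
        }
        return count;
    }

    public static void main(String[] args) {
        // Same test data as the C program: "abc\n" repeated 64 times.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 64; i++) {
            sb.append("abc\n");
        }
        System.out.println(countNewlines(sb.toString()));
    }
}
```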

  • How to count the number of occurences of a character

    hi
    What command is used to count the number of occurrences of a character in a line?
    I have to count the number of '.' in a line.

    FIND
    Searches for patterns.
    Syntax
    FIND <p> IN [SECTION OFFSET <off> LENGTH <len> OF] <text>
                [IGNORING CASE|RESPECTING CASE]
                [IN BYTE MODE|IN CHARACTER MODE]
                [MATCH OFFSET <o>] [MATCH LENGTH <l>].
    The system searches the field <text> for the pattern <p>. The SECTION OFFSET <off> LENGTH <len> OF addition tells the system to search only from the <off> position in the length <len>. IGNORING CASE or RESPECTING CASE (default) specifies whether the search is to be case-sensitive. In Unicode programs, you must specify whether the statement is a character or byte operation, using the IN BYTE MODE or IN CHARACTER MODE (default) additions. The MATCH OFFSET and MATCH LENGTH additions set the offset of the first occurrence and length of the search string in the fields <o> and <l>.

  • To count the number of occurrences of a character

    Hi All,
    Is there a particular function to count the number of occurrences of a particular character in a string?
    Let me know if you know of one.
    Thanx and regards
    Akshat

    A method with a test driver:
    public class a {
         public static void main(String args[]) {
              System.out.println(howmany(args[0],args[1].charAt(0)));
         }
         public static int howmany(String what,char c) {
              // String.replace treats c literally; with replaceAll a regex
              // metacharacter such as '.' would match every character
              String s=what.replace(String.valueOf(c),"");
              // each removed character shortens the string by one
              return what.length() - s.length();
         }
    }

  • Count the number of comments in a PDF file automatically?

    Hello O Experts,
    My documentation team members use Acrobat and Reader 8, and frequently need to count the number of comments in a PDF file. Is it possible to count the number of comments automatically? We can't find this functionality anywhere, and have to resort to manual counting. Since our PDFs can contain thousands of comments, this is very time-consuming. I've tried searching the Web and these forums, but the words "count" and "comments" are too frequent in other contexts to find anything useful...
    Thanks and best regards,
    --M.T.

    Hi again sypark,
    That is a great idea, and it works!
    I would actually search on a more unique phrase - for instance, "Subject:" - that definitively occurs a single time per comment. The reason is:
    There are many types of comments, not just sticky notes... you have highlights, text deletions, text insertions, replacement text, attachments, callouts, and so on. But each of them has a "Subject:" in the summary.
    It's best to avoid searching for the comment names because they differ by language (an important issue when you work with colleagues in multiple countries). For example, the name of the "Highlight" comment type in French is "Texte surligné" - so you'd have to search for the latter term to get an accurate count.
    In any case, your method is flexible and allows for easy customization of comment counts. Many thanks for your effective help!
    Cheers,
    --Michael

  • Count the number of lines in a txt file

    I need to count the number of lines in a txt file, but I can't do it using readLine(). This is because the txt file is double spaced. readLine() returns null even if it is not the end of the file. thanks for the help

    > I need to count the number of lines in a txt file,
    > but I can't do it using readLine().
    Then just compare each single byte or char to the newline (code 10).
    > This is because the txt file is double spaced. readLine() returns
    > null even if it is not the end of the file.
    Errm what? What do you mean by "double spaced"? Method readLine() should only return null if there's nothing more to read.
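
The byte-by-byte comparison suggested in the reply could be sketched like this. The names LineCounter and countNewlines are made up for the example, and it assumes an encoding in which byte value 10 only ever means a newline (true for ASCII, ISO-8859-1 and UTF-8, but not for UTF-16).

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: counts newline bytes (code 10) in a stream.
class LineCounter {

    static long countNewlines(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long count = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                if (buf[i] == 10) {   // 10 is '\n'
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(countNewlines(in));
        }
    }
}
```

This counts newline characters, so a last line without a trailing newline is not included, and the blank lines of a double-spaced file are counted like any other line.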

  • Is there a way to count the number of times an array moves from positive to negative?

    I have an array of values, and I need to find the number of times that the array changes signs (from positive to negative, or vice versa). In other words from a graphical standpoint, how many times a certain line crosses the x-axis. Counting the number of times the array equals zero does not help however, because the array does not always equal exactly zero when it crosses the axis (ie, the points could move from .1 to -.1).
    Thanks for your help. I only have LV 5.1.1, so if you attach any files, they cannot be version 6.0.

    Attached is a VI showing the # of Pos and Neg numbers in an array, with 0 considered as non-Pos. It is easily modifiable to other parameters - including using the X-axis value as your compare point versus only Zero.
    This is a modified VI from LV (Separate Array.vi)
    Compare this with your other responses to find the best fit.
    Doug
    Attachments:
    arraysizesposneg.vi ‏40 KB
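
The attached VI is not reproduced here. As a text-language illustration of the counting logic asked about in the question (not a description of what the VI does), a sign-change counter might look like the sketch below. The names SignChanges and countSignChanges are invented, and treating exact zeros as "no sign" is an assumption of this sketch.

```java
// Hypothetical helper: counts how often consecutive non-zero samples differ in sign.
class SignChanges {

    static int countSignChanges(double[] values) {
        int changes = 0;
        int lastSign = 0;               // sign of the most recent non-zero sample
        for (double v : values) {
            int sign = (int) Math.signum(v);
            if (sign == 0) {
                continue;               // assumption: zeros (and NaN) are skipped
            }
            if (lastSign != 0 && sign != lastSign) {
                changes++;              // the line crossed the x-axis
            }
            lastSign = sign;
        }
        return changes;
    }

    public static void main(String[] args) {
        double[] samples = {0.3, 0.1, -0.1, -0.4, 0.0, 0.2, 0.5, -0.2};
        System.out.println(countSignChanges(samples));
    }
}
```

Because it compares signs rather than testing for zero, a step from .1 to -.1 is counted as a crossing even though no sample equals zero.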

  • Give me a way to count the number of entries in a  Database Table

    Hello All,
    I am writing a code to determine the number of  entries in a SAP/Custom table.
    Can you please suggest a proper approach and a good query.
    Thanks in advance.

    Hi,
    If you want to do it in a more generic way you can do the following:
    DATA: tblname(50),
          tp_rows TYPE i.
    tblname = 'MARA'.
    SELECT COUNT(*)
    FROM (tblname)
    INTO tp_rows.
    IF sy-subrc <> 0.
      CLEAR tp_rows.
    ENDIF.
    At runtime the table is being determined and in this case it's set to MARA. The value of the number of rows is in the variable tp_rows.
    Best regards,
    Guido Koopmann

  • Is there an easy way to count the number of albums I have in my library?

    Hi, there is probably a very easy way to find out how many albums I have in my iTunes library - but I haven't discovered it yet! (looking for something a bit like word count in MS Word document?)
    Powerbook G4 15 1.67   Mac OS X (10.4.3)   my first Mac - where have I been!?!

    Click on the Library icon in the source list, then go to Edit->Show Browser.
    This will split the library window into subwindows listing each genre, artist and album, at the top of these windows it will show the total for each of these categories.
    Apart from being the easiest way to count your albums, it's also a really useful way to view the library or any playlist. You can turn off the genre subwindow through the general preferences.
    Hope that helps!
    Sara.

  • Is there a way to count the number of chars in a formatted text box?

    I have a formatted text box in my web dynpro for comments pertaining to workflow.
    in the backend, this is mapped to a char200 field.
    is there a way to have a running counter to let the user know how many chars are left? I'm not sure if there's an event to use for that.
    thanks,
    robert.

    Hello Robert,
    There is no way to get a running total of characters typed by the user - if you really need this functionality - consider creating an Adobe Flash Island.
    There was in the last year another thread which covered pretty much the same theme - it could be worth looking at that - although you will find that the eventual solution is the same as I suggest above.

  • How to count the number of times a string occurs in a column.

    I am listing team names in a column and want to have a tally at the bottom. In Excel I can use =SUM(IF(range="text",1,0)), but Numbers will not accept a range in that IF statement. Any suggestions on a formula for this? I know I could create hidden columns, put formulas in all of those and then count them, but that sure seems like a hard way to resolve what should be one formula.
    Thanks!

    Those who took the time to read the iWork Formulas and Functions User Guide carefully are aware of the availability of wildcard characters.
    =COUNTIF(range;"=text")
    will do the trick.
    Yvan KOENIG (VALLAURIS, France) lundi 26 avril 2010 17:36:34
