Word Frequency Counter...

Hello all, I am working on a project that is supposed to read in a text file named on the command line and then break it up into words. As the words are read in by the Scanner, I need a counter that tracks the number of times each word has occurred so far, so that I can access it and display it in the output. This is what I have come up with so far: my driver/main class, and the Count class that I'm trying to use to keep track of the number of times a word has occurred in the text, so that I can add it to a HashMap and display it later... The problem is, whenever I try to run the program with a text file, it just ends up displaying all the words on one line with a number 1 next to it. What I need is output that looks similar to this... For example,
hello 1
world 1
Any help would be appreciated! Thanks.
{code}
import java.io.*;
import java.util.*;

public class Driver{
   public static void main(String[] args){
      HashMap words = new HashMap();
      String nameOfFile = args[0];
      File file = new File(nameOfFile);
      String wordd;
      Count count;
      try{
         Scanner scanner = new Scanner(file).useDelimiter(" \t\n\r\f.,<>\"\'=/");
         while(scanner.hasNext()){
            String word = scanner.next();
            count = (Count) words.get(word);
            if(count==null){
               words.put(word, new Count(word, 1));
            }
            else {
               count.i++;
            }
            System.out.println(word);
         }
      }
      catch(FileNotFoundException e){
      }
      Set set = words.entrySet();
      Iterator iter = set.iterator();
      while(iter.hasNext()) {
         Map.Entry entry = (Map.Entry) iter.next();
         wordd = (String) entry.getKey();
         count = (Count) entry.getValue();
         System.out.println(wordd +
            (wordd.length() < 8 ? "\t\t" : "\t") +
            count.i);
      }
   }
}
{code}
{code}
public class Count{
   String word;
   int i;
   public Count(String inputWord, int increment){
      word = inputWord;
      i = increment;
   }
}
{code}
Edited by: VisualAssassin on Apr 22, 2009 2:45 PM

VisualAssassin wrote:
{code}
Scanner scanner = new Scanner(file).useDelimiter(" \t\n\r\f.,<>\"\'=/");
{code}
According to the documentation for Scanner.useDelimiter(), the String supplied is used as a regular expression. Therefore, for the scanner to split the input into two separate tokens, your input stream would have to contain all of those listed characters in order!
Instead, use this (untested; it also needs import java.util.regex.Pattern):
{code}
Scanner scanner = new Scanner(file).useDelimiter("[" + Pattern.quote(" \t\n\r\f.,<>\"\'=/") + "]+");
{code}
The opening and closing square brackets tell the regular expression engine to match any one of those characters, and the plus means one or more times. Pattern.quote is used to escape the characters that would otherwise get you into trouble because they have a special meaning in regexes, notably ".".
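Putting that delimiter together with the counting loop might look roughly like the sketch below. Treat it as a minimal illustration, not the original poster's program: the class name WordCount and the method count are made up here, a plain Map<String, Integer> stands in for the Count class, and the Scanner reads from a String so it can be tried without a file (pass a File to the Scanner constructor for the real thing).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is not from the original post.
class WordCount {
    // Any run of one or more of these characters separates two words.
    static final Pattern DELIMITERS =
        Pattern.compile("[" + Pattern.quote(" \t\n\r\f.,<>\"'=/") + "]+");

    static Map<String, Integer> count(String text) {
        // LinkedHashMap keeps the words in the order they were first seen.
        Map<String, Integer> counts = new LinkedHashMap<>();
        try (Scanner scanner = new Scanner(text).useDelimiter(DELIMITERS)) {
            while (scanner.hasNext()) {
                // merge() stores 1 for a new word, otherwise adds 1 to the old count.
                counts.merge(scanner.next(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        count("hello world, hello")
            .forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

With the sample string in main, "hello" ends up with a count of 2 and "world" with a count of 1, and the output is printed once after the whole input has been read.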
Edited by: endasil on 22-Apr-2009 11:43 PM

Similar Messages

Best way to implement a word frequency counter (input = textfile)?

i had this for an interview question and basically came up with the solution where you use a hash table...
//create hash table
//bufferedreader
//read file in,
//for each word encountered, create an object that has (String word, int count) and push into hash table
//then loop and read out all the hash table entries
===skip this stuff if you dont feel like reading too much
then the interviewer proceeded to grill me on why i shouldn't use a tree or any other data structure for that matter... i was kinda stumped on that.
also he asked me what happens if the number of words exceeds the capacity of the hash table? i said you can increase the capacity of the hash table, but it doesn't sound too efficient and i'm not sure how you'd know how much to increase it by. i had some ok solutions:
1. read the file thru once, and get the number of words in the file, set the hashtable capacity to that number
2. do #1, but run another algorithm that will figure out distinct # of words
3. separate chaining
===
anyhow what kind of answers/algorithms would you guys have come up with? thanks in advance.

i had this for an interview question and basically came up with the solution where you use a hash table...
//create hash table
//bufferedreader
//read file in,
//for each word encountered, create an object that has (String word, int count) and push into hash table
Well, first you need to check to make sure the word is not already in the hashtable, right? And if it is there, you need to increment the count.
//then loop and read out all the hash table entries
===skip this stuff if you dont feel like reading too much
then the interviewer proceeded to grill me on why i shouldn't use a tree or any other data structure for that matter... i was kinda stumped on that.
A hashtable has amortized O(1) time for insert and search. A balanced binary search tree has O(log n) complexity for the same operations, where n is the number of distinct words. So a hashtable will be faster for a large number of words. The other option is a so-called "trie" (google for more), which has O(m) complexity, where m is the length of the word being inserted or looked up. So if your words aren't too long, a trie may be just as fast as a hashtable. The trie may also use less memory than the hashtable.
also he asked me what happens if the number of words exceeds the capacity of the hash table? i said you can increase the capacity of the hash table, but it doesn't sound too efficient and i'm not sure how you'd know how much to increase it by. i had some ok solutions:
The hashmap implementation that comes with Java grows automatically, so you don't need to worry about it. It may not "sound" efficient to have to copy the entire data structure, but the copy happens quickly and occurs relatively infrequently compared with the number of words you'll be inserting.
1. read the file thru once, and get the number of words in the file, set the hashtable capacity to that number
2. do #1, but run another algorithm that will figure out distinct # of words
3. separate chaining
===
anyhow what kind of answers/algorithms would you guys have come up with? thanks in advance.
I would do anything to avoid making two passes over the data. Assuming you're reading it from disk, most of the time will be spent reading from disk, not inserting into the hashtable. If you really want to size the hashtable a priori, you can make it big enough to hold all the words in the English language, which IIRC is about 20,000.
And relax, you had the right answer. I used to work in this field and this is exactly how we implemented our frequency counter and it worked perfectly well. Don't let these interviewers push you around, just tell them why you thought a hashtable was the best choice; show off your analytical skills!
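For anyone curious what the trie alternative looks like, here is a rough sketch. It is only an illustration under my own assumptions: the class name TrieCounter, the nested Node type and the HashMap-per-node layout are invented for this example and are not from the interview or from any library. Each add or lookup visits one node per character of the word, so the cost depends on the word's length and not on how many words are stored.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class for illustration; not code from the thread.
class TrieCounter {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        int count; // how many times the word ending at this node was added
    }

    private final Node root = new Node();

    // Walks (and creates, if needed) one node per character of the word.
    void add(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.count++;
    }

    // Returns 0 for words that were never added, including bare prefixes.
    int count(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return 0; // no stored word starts with these characters
            }
        }
        return node.count;
    }

    public static void main(String[] args) {
        TrieCounter trie = new TrieCounter();
        for (String w : "the cat and the hat".split(" ")) {
            trie.add(w);
        }
        System.out.println(trie.count("the"));
        System.out.println(trie.count("th"));
    }
}
```

In main, "the" was added twice and "th" was never added as a whole word, so the two lookups report 2 and 0. A HashMap per node is the simplest layout to write down; whether a trie actually saves memory over a single hash table depends heavily on how the child links are stored.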

Where, if any, are the Word Count and Word Frequency tools in AppleWorks or MS Word

5/26/2008. Is there a "word count" and/or "word frequency" tool(s) on Apple Works word document and/or Microsoft Word for Mac word document, and how does one use or activate it, or find such for it on the web?? Many thanks. C. Yopst, Chicago

Hi C,
In an AppleWorks WP document, go Edit > Writing Tools > Word count. This will give you a count of characters, words, lines and paragraphs in the document or, if you have selected a portion of the document, in the selection.
To my knowledge, there's no tool to give a word frequency figure, although that could be done with some fairly easy manipulation involving a spreadsheet.
For MS Word questions, I'd suggest checking the Word section of Microsoft's Mactopia site. Try searching "count words".
Regards,
Barry

Counting word frequency

Hi all,
I'm very new to Java, only a few months into it. I am working on a program that compresses a file, thread, string. First it takes a string of undetermined length as input and, using the delimiter, splits the string and places the pieces into a list. Then from that list I take out the recurring words with another list and return the index of where the words are; this has been done without any problems.
Next I want to get the word frequency and put the most common word at the first index, the next most common at the second, and so on. I am not sure how best to go about this. I did use a map, but that just gives the frequency of the words without any ordering. Can anybody give me any suggestions that I could try, bearing in mind that Java is my first language, so this is also a confidence-building exercise?
Many thanks to anybody who responds.

Okay, so you have a Map with the 'word' as keys and the 'frequency' as values, right?
If so, create a class (WordOccurrence for example) that holds a "word" and an "occurrence" variable. Let that class also implement the Comparable interface. Now you can do something like this:
public static List<WordOccurrence> getWordOccurrances(String text) {
    // get the occurrence of each (unique) word
    Map<String, Integer> frequencyMap = getFrequencyMap(text);
    // create a list to hold your WordOccurrence instances
    List<WordOccurrence> list = new ArrayList<WordOccurrence>();
    // for each word in your map, create a WordOccurrence instance
    for(String word : frequencyMap.keySet()) {
        list.add(new WordOccurrence(word, frequencyMap.get(word)));
    }
    // sort the list of WordOccurrences
    Collections.sort(list);
    // return the sorted list
    return list;
}
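The WordOccurrence class itself is not shown in the reply, so here is one way it might look. This is a sketch under assumptions: the field names, the tie-breaking rule and the toString format are my own choices. The compareTo method puts the highest count first, which is what makes Collections.sort return the most common word at index 0.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of the WordOccurrence class described in the reply; details are assumptions.
class WordOccurrence implements Comparable<WordOccurrence> {
    final String word;
    final int occurrence;

    WordOccurrence(String word, int occurrence) {
        this.word = word;
        this.occurrence = occurrence;
    }

    // Most frequent first; equal counts fall back to alphabetical order.
    @Override
    public int compareTo(WordOccurrence other) {
        int byCount = Integer.compare(other.occurrence, this.occurrence);
        return byCount != 0 ? byCount : this.word.compareTo(other.word);
    }

    @Override
    public String toString() {
        return word + "=" + occurrence;
    }

    public static void main(String[] args) {
        List<WordOccurrence> list = new ArrayList<>();
        list.add(new WordOccurrence("b", 1));
        list.add(new WordOccurrence("a", 3));
        list.add(new WordOccurrence("c", 2));
        Collections.sort(list);
        System.out.println(list); // [a=3, c=2, b=1]
    }
}
```

If you would rather not write a class for this, sorting the map's entries with a Comparator is an alternative, but a small class like this keeps the ordering rule in one obvious place.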

Creating a word-frequency histogram from Oracle Text

I need to build a "word-frequency histogram" from an Oracle Text index: a list of the tokens in the index, where each token carries the list of documents that contain it and its frequency in each of those documents. Does anybody know how to get this data out of an Oracle Text index so that the result can be saved to a table or to a text file?

You can use ctx_report.token_info to decipher the token_info column, but I don't think the report format that it produces is what you want. You can use a query template and specify algorithm=count to obtain the number of times a token appears in the indexed column. You can do that for every token by using the dr$...$i table, as shown below. Formatting is preserved by prefacing the code with pre enclosed in square brackets on the line above all of the code and /pre in square brackets on the line below all of the code.
SCOTT@10gXE> create table otntest
2    (doc_id       number primary key,
3      document varchar2(100))
4 /
Table created.
SCOTT@10gXE> insert all
2 into otntest values (1, 'This is a test for generating a histogram')
3 into otntest values (2, 'Histogram shows the list of documents that contain that token and frequency')
4 into otntest values (3, 'frequency histogram frequency histogram frequency')
5 select * from dual
6 /
3 rows created.
SCOTT@10gXE> create index otntest_ctx_idx
2 on otntest(document)
3 indextype is ctxsys.context
4 /
Index created.
SCOTT@10gXE> column token_text format a30
SCOTT@10gXE> select t.doc_id, i.token_text, score (1) as token_count
2 from   otntest t,
3          (select distinct token_text
4           from   dr$otntest_ctx_idx$i) i
5           where contains
6                 (document,
7                  '<query>
8                  <textquery grammar="CONTEXT">'
9                  || i.token_text ||
10                  '</textquery>
11                  <score datatype="INTEGER" algorithm="COUNT"/>
12                  </query>',
13                  1) > 0
14 order by doc_id, token_text
15 /
    DOC_ID TOKEN_TEXT                     TOKEN_COUNT
         1 GENERATING                               1
         1 HISTOGRAM                                1
         1 TEST                                     1
         2 CONTAIN                                  1
         2 DOCUMENTS                                1
         2 FREQUENCY                                1
         2 HISTOGRAM                                1
         2 LIST                                     1
         2 SHOWS                                    1
         2 TOKEN                                    1
         3 FREQUENCY                                3
         3 HISTOGRAM                                2
12 rows selected.
SCOTT@10gXE>

Is there a way to create a Word Frequency list in 8.2.5?

I would like to be able to create a Word Frequency list for each PDF in a batch of PDFs. I have used Batch Processing to make sure each one is OCR/searchable. Is there a way to do this in Acrobat Pro 8.2.5 / Windows XP?
Thanks!

The SMTP banner is not customizable in 3x.
In 4x and IMS 5x, you can use the configutil service.smtp.banner attribute.

Frequency counter unreliable using rotary vane anemometer

I have set up an anemometer to measure air flow speed, with the signal being acquired by a 9402 module in a cDAQ-9174 chassis (4 slot). I am using Signal Express 2011 to program the instrument.
I have set up a frequency counter task, using a maximum frequency of 1.8 kHz, a minimum frequency of 250 mHz, rising edge and 1-counter (low frequency). The output is scaled using y = 0.0111x + 0. This gives approximately 20 m/s when the frequency is 1.8 kHz.
The readings given by Signal Express give a very noisy signal, at a frequency much higher than the bandwidth of the anemometer. The range is in the order of 20% of the mean. I have attached a PDF of the signal, for two different air flow sources, the seproj file and the tdms file for one of the runs. I have also observed that the counter output is rounded to the nearest multiple of 10, e.g. 490, 470, 480, 480, 470, etc.
Is this consistent with a digital bounce issue? Is this consistent with an earthing issue? Is this consistent with a sample rate issue? Any other ideas for investigating this issue?
thanks
Attachments:
EXPT 20 - ANEMOMETER NOISE.pdf (49 KB)
EXPT 20 Anemometer Noise.seproj (637 KB)
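As a quick sanity check on the numbers in this post, the scaling can be written out as below. This is just the arithmetic from the post in Java form; the class and method names are invented for the example, and it says nothing about how the 9402 or Signal Express compute the frequency internally.

```java
// Hypothetical helper; only the slope 0.0111 comes from the post.
class AnemometerScale {
    static final double METERS_PER_SECOND_PER_HZ = 0.0111;

    // Applies the linear scale y = 0.0111x + 0 from the post.
    static double toMetersPerSecond(double frequencyHz) {
        return METERS_PER_SECOND_PER_HZ * frequencyHz;
    }

    public static void main(String[] args) {
        // Full scale: 1.8 kHz maps to roughly 20 m/s (19.98 before rounding).
        System.out.println(toMetersPerSecond(1800.0));
        // One 10 Hz step in the counter output corresponds to about 0.111 m/s.
        System.out.println(toMetersPerSecond(10.0));
    }
}
```

So if the counter really does report in 10 Hz steps, each step moves the scaled reading by about 0.11 m/s, which is worth comparing against the size of the noise seen in the plots.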

I have logged the voltage signal coming from the anemometer using our 9239; a report of a few cycles is attached as a PDF. The graph clearly shows a small digital bounce. The noise is small compared to the signal. Is there any way it can be filtered when acquired by the 9402?
Other tests that I ran showed background noise on the order of 50 microvolts at a frequency of 50 Hz. This is very likely noise from a power supply (we run at 50 Hz in Aus), but it is many orders of magnitude smaller than the signal, so I assume it would be ignored by the 9402.
Attachments:
EXPT 20 Anemometer Noise Voltage.seproj (261 KB)
EXPT 20 Anemometer Noise Voltage.pdf (19 KB)

Frequency counter with E-Series DAQ Card

I am trying to measure the frequency of a signal at 0.1 sec increments using a DAQCard-6062E and an AT-MIO-16E-10. I am using a VI from the examples (frequency counter.vi, in a while loop) which works well with the DAQCard, but it's giving wrong measurements with the AT-MIO card. Why is that?

Hello Waw,
Take a look at this KnowledgeBase to see if it fixes the problem.
http://digital.ni.com/public.nsf/websearch/862567530005F09E8625681C0074935E?OpenDocument
If not, what version of NI-DAQ do you have? Are you getting incorrect measurements with different ranges of frequencies? Is LabVIEW giving you any errors? Do any examples provide correct results with the AT-MIO-16E-10? Please let me know if you have any questions. Have a great day!
Marni S.
National Instruments

Frequency counter

sir,
our project is about designing an "eddy current based sensor for contactless measurement of breathing". So far we have designed a Colpitts oscillator with a 12 MHz output; this signal is converted to a square wave using a Schmitt trigger. Now we want to know whether it is possible to measure this 12 MHz frequency in LabVIEW. What are the maximum and minimum frequencies that LabVIEW can count?

Not until you tell me exactly which DAQ hardware you are using (model number) and what you mean by "feeding 12MHz to LabVIEW". Are you trying to measure the frequency, pulses, edges, what?
“A child of five could understand this. Send someone to fetch a child of five.”
― Groucho Marx

Frequency counter with start and stop - Need help

When No load, 200VA Unity , 400VA Unity, 600VA Unity, 400VA LAG, 400VA LEAD or Short is active Frequency Counter starts.
When OFF and BENCH PWR is active Frequency Counter stops and is grayed out.
Frequency Counter is grayed out when BENCH PWR is inactive
Does anyone know how to do this?
Attachments:
Buttons.vi (77 KB)

In order to enable or disable a control you need to write a U8 value to the Disabled property of the property node for that control.
The values are:
0 = Enabled
1 = Disabled
2 = Disabled and Grayed Out
Kelly Bersch
Certified LabVIEW Developer
Kudos are always welcome

JTables and frequency count report

Is it possible to use JTable, make the cells un-editable, and have it set up so that clicking on a cell performs a specific action? I'm working on a program that calculates a frequency count report, and I want to be able to click on a cell and have it display, in another window, the observations that contributed to that cell's count. Is this possible using JTable, or is there a better way to get what I want? Thanks for any suggestions.

Yes, it looks to me as if it should be possible.
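To make that a bit more concrete, one common approach is to override isCellEditable in the table model and attach a mouse listener to the table. The sketch below shows the idea; the class name FrequencyTableSupport and both method names are made up for this example, and what happens on a click (opening a dialog with the contributing observations, say) is left to the callback you pass in.

```java
import java.awt.event.MouseAdapter;
import java.awt.event.MouseEvent;
import java.util.function.BiConsumer;
import javax.swing.JTable;
import javax.swing.table.DefaultTableModel;

// Hypothetical helper class; names are illustrative only.
class FrequencyTableSupport {
    // A table model whose cells can be selected and clicked but never edited.
    static DefaultTableModel readOnlyModel(Object[][] rows, Object[] columnNames) {
        return new DefaultTableModel(rows, columnNames) {
            @Override
            public boolean isCellEditable(int row, int column) {
                return false;
            }
        };
    }

    // Calls onClick with the model row and column of the clicked cell.
    static void onCellClick(JTable table, BiConsumer<Integer, Integer> onClick) {
        table.addMouseListener(new MouseAdapter() {
            @Override
            public void mouseClicked(MouseEvent e) {
                int viewRow = table.rowAtPoint(e.getPoint());
                int viewCol = table.columnAtPoint(e.getPoint());
                // Both are -1 when the click lands outside any cell.
                if (viewRow >= 0 && viewCol >= 0) {
                    onClick.accept(table.convertRowIndexToModel(viewRow),
                                   table.convertColumnIndexToModel(viewCol));
                }
            }
        });
    }
}
```

You would build the table with new JTable(FrequencyTableSupport.readOnlyModel(rows, names)) and then call onCellClick with a lambda that looks up the observations for that row and column and shows them in a second window. Converting view indexes to model indexes matters once the user sorts rows or drags columns around.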

NEED HELP WITH WORD & CHAR COUNT USING HASHTABLE

I have to use a hashtable to be able to count all the words and characters (#'s, punctuations, etc) from a file
I have been able to get it to correctly count the words in the file but none of the characters and it also does not display the words alphabetically and just displays it in an odd way
Here's the code:
public static void main (String [] args)
{
      Hashtable table = new Hashtable();
      String input = JOptionPane.showInputDialog("Enter the filename:");
      try{
      BufferedReader br = new BufferedReader(new FileReader(input));
      String s = br.readLine();
      StringTokenizer words = new StringTokenizer( s, " \n\t\r" );
        while ( words.hasMoreTokens() ) {
         String word = words.nextToken().toLowerCase(); // get word
         // if the table contains the word
         if ( table.containsKey( word ) ) {
            Integer count = (Integer) table.get( word ); // get value
            // and increment it
            table.put( word, new Integer( count.intValue() + 1 ) );
         }
         else // otherwise add the word with a value of 1
            table.put( word, new Integer( 1 ) );
       } // end while
         String output = "";
      Enumeration keys = table.keys();
      // iterate through the keys
      while ( keys.hasMoreElements() ) {
         Object currentKey = keys.nextElement();
         output += currentKey + "\t" + table.get( currentKey ) + "\n";
         System.out.println(output.toString());
      }
      }
      catch (IOException e)
      {
        System.out.println(e);
      }
}
The output that I get for a file containing the line " Hi this is my java program" is:
this     1
this     1
program     1
this     1
program     1
hi     1
this     1
program     1
hi     1
java     1
this     1
program     1
hi     1
java     1
is     1
this     1
program     1
hi     1
java     1
is     1
my     1
I'm not sure what I am doing wrong and help would be greatly appreciated.

I have been able to get it to correctly count the
words in the file but none of the characters and it
also does not display the words alphabetically and
just displays it in an odd way
That's because hash tables are not ordered; to maintain order of insertion you could use LinkedHashMap and to maintain alphabetical order TreeMap.
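Here is a rough sketch of how the TreeMap suggestion could be applied, with two other changes that seem to be needed judging from the posted code and output: it reads every line instead of only the first, and it prints once after counting instead of inside the loop. The class name SortedCounts and the decision to skip whitespace when counting characters are my own assumptions, not part of the assignment as stated.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical rewrite for illustration; names are not from the thread.
class SortedCounts {
    final Map<String, Integer> words = new TreeMap<>();    // alphabetical by word
    final Map<Character, Integer> chars = new TreeMap<>(); // ordered by character code

    static SortedCounts read(Reader in) {
        SortedCounts result = new SortedCounts();
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            while ((line = br.readLine()) != null) { // every line, not just the first
                StringTokenizer tokens = new StringTokenizer(line, " \n\t\r");
                while (tokens.hasMoreTokens()) {
                    result.words.merge(tokens.nextToken().toLowerCase(), 1, Integer::sum);
                }
                for (char c : line.toCharArray()) {
                    if (!Character.isWhitespace(c)) { // assumption: whitespace is not counted
                        result.chars.merge(c, 1, Integer::sum);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        SortedCounts counts = read(new StringReader(" Hi this is my java program"));
        // Print once, after all counting is finished.
        counts.words.forEach((w, n) -> System.out.println(w + "\t" + n));
        counts.chars.forEach((c, n) -> System.out.println(c + "\t" + n));
    }
}
```

For the sample line, the words come out as hi, is, java, my, program, this, each with a count of 1. To read a real file, pass a FileReader to read instead of the StringReader.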

Help in word frequency program for .doc files

hi,
can anybody help me...
i have implemented a program to find the word frequency of .txt files,
now i want to use the same program for .doc files,
how can this be done?

Hi,
I'm sure a few seconds on Google and you would have found the answer. But, take a look at Apache POI. This will allow you to extract the text from the doc files.
http://poi.apache.org/
Ben.

What words are counted?

Which words are counted in the word count for Pages? Are words in tables counted?

Takes all the fun out of making someone else do it.
Peter

No reading on the frequency counter

I'm not getting a reading on the frequency counter after wiring the clock signal to it. There's just one input to the counter, so I wired a clock to it and tried to get a reading. All I see are dashes. Is there something else I need to do? Thank you.

Hi,
You can configure some parameters in the pop-up window of the instrument. Trigger Level, sensitivity and AC or DC Coupling. You can try to set these based on your circuit.
If you want us to take a look at it, please post the circuit.
As an alternate, you can also see the freq in the measurement probe. Go to Simulate -> Instruments -> Measurement Probe and click on the net (wire) where you want to check the frequency.
Hope this helps.
Regards,
Tayyab R,
National Instruments.
