Making a phrase counter--word scanning

Hi everybody,
I'm writing a phrase counter, and so far it's working, but I'm trying to optimize it because running over a for loop is really slowing me down.
Here's the problem:
I want to get phrases that repeat more than once, from length 3 words to 10 words. Each time I change the length of the phrase (say, from 10 to 9 to 8, etc down to 3), I have to run over the entire word list all over again, which eventually amounts to 8 total passes over an ArrayList that's almost 45000 elements long right now, and will get longer.
An added complication in this is that there are certain identifiers in the word list that mark the beginnings of chapters (#) and sentences ($); any time they are hit, the iterator moves to the next word.
So, basically, something like...
example example example example example $ # example example example $
...with the repeating loop I have now would give:
0 phrases of length 10, 9, 8, 7, or 6
1 of length 5
2 of length 4
4 of length 3
Could anyone give me advice?
Thanks!
Jezzica85

Yes, you've got that right. Thank you, I think I'll try that. I won't be able to come back on and tell if it worked until sometime tomorrow probably, but in any case, thanks, I think this will get me where I need to go with a little tweaking. Come to think of it, this might be really fast, especially if I delete an index once it runs into an identifier. Very cool.
Thanks again,
Jezzica85
PS--Oh, and by the way I see you'll have been registered here for two years soon. Congrats!

Similar Messages

  • Counting words in a textfile

    Hello, I am trying to make a program that counts the number of words in the text file specified below; later I will make it so you can choose the file. When I try to compile it I get these errors:
    java:23: char cannot be dereferenced
                   if(inWord && character.isWhiteSpace(character))
    ^
    java:29: char cannot be dereferenced
                   else if(!inWord && character.isLetterOrDigit(character))
    Please could someone tell me how to get this working? I know it's something to do with the format of the data; maybe I should change it to a String or something?
    Cheers, anyone
    import java.io.*;
    public class NumberOfWords
    {
         public static void main(String[] argStrings)
         {
                final int EOF = -1;
                int count = 0;
                boolean inWord = false;
                FileReader file = new FileReader("TestFile.txt");
                for(int i = file.read(); i != EOF; i = file.read())
                {
                   char character = (char)i;
                   if(inWord && character.isWhiteSpace(character))
                   {
                     // we've come to the end of a word, so count it
                     count++;
                     inWord = false;
                   }
                   else if(!inWord && character.isLetterOrDigit(character))
                   {
                     // we've just started a word or number
                     inWord = true;
                   }
                }
                if(inWord)  // count the last word in the file
                   count++;
                System.out.println(count);
         }
    }
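For reference, here is a minimal sketch of the same counting loop with the checks written as static calls on the `Character` class (`Character.isWhitespace`, `Character.isLetterOrDigit`) instead of calls on the `char` variable, which is what the "char cannot be dereferenced" errors point at. The class name `CharScanSketch` and the method `countWords` are made up for illustration, and the sketch scans a String rather than a file so that it stays self-contained; characters read from a FileReader could go through the same logic.

```java
class CharScanSketch {
    // Counts words by watching for transitions into and out of a word.
    static int countWords(String text) {
        int count = 0;
        boolean inWord = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // char is a primitive, so the checks are static calls on Character
            if (inWord && Character.isWhitespace(c)) {
                count++;          // we've come to the end of a word
                inWord = false;
            } else if (!inWord && Character.isLetterOrDigit(c)) {
                inWord = true;    // we've just started a word or number
            }
        }
        if (inWord) {
            count++;              // last word, not followed by whitespace
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("one two  three")); // prints 3
    }
}
```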

    Hello again, I am making great progress as I now have a program where you can specify the file and it does the required function. I would like to modify it to count the number of words on each line. Please could I have some tips on how to do this.
    cheers
    /*
     * Write an application which displays the number of words on each
     * line of a text file. Assume one space between words, and no spaces
     * at the start and end of the lines. Test the application with a
     * suitable input file.
     */
    import java.io.*;
    import java.util.Scanner;
    public class NumberOfWords
    {
         public static void main(String[] argStrings)throws Exception
         {
                final int EOF = -1;
                int count = 0;
                boolean inWord = false;
                System.out.println("Enter file for word counting");
                Scanner scan = new Scanner(System.in);
                String inputFile = scan.nextLine();
                FileReader file = new FileReader(inputFile);
                for(int i = file.read(); i != EOF; i = file.read())
                {
                   char myChar = (char)i;
                   if(inWord && Character.isWhitespace(myChar))
                   {
                     // we've come to the end of a word, so count it
                     count++;
                     inWord = false;
                   }
                   else if(!inWord && Character.isLetterOrDigit(myChar))
                   {
                     // we've just started a word or number
                     inWord = true;
                   }
                }
                if(inWord)  // count the last word in the file
                   count++;
                System.out.println();
                System.out.println("Number of words: " + count);
         }
    }
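One possible way to get a count per line, sketched under the assumption that the whole text is already available as a String: split it into lines, then split each line on whitespace. The names `WordsPerLineSketch` and `wordsPerLine` are hypothetical. The `\R` line-break pattern needs Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;

class WordsPerLineSketch {
    // Returns one word count per line of the given text.
    static List<Integer> wordsPerLine(String text) {
        List<Integer> counts = new ArrayList<>();
        for (String line : text.split("\\R")) {      // \R matches any line break
            String trimmed = line.trim();
            // An empty line has no words; otherwise count whitespace-separated tokens
            counts.add(trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordsPerLine("a b\nc d e\n\nf"));
    }
}
```

When reading from a file, the same per-line step could be applied to each String returned by `BufferedReader.readLine()`. Under the assignment's assumption of exactly one space between words, counting the spaces on a line and adding one would give the same number for non-empty lines.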

  • ArrayList indices (Phrase counter follow-up)

    Hi everybody--
    I finally figured out how to get code onto this computer I'm working with, so here's what I'm trying to do for anyone that doesn't remember. I'm using TuringPest's suggestion to go through my word list once, and then delete any index that fails on an expansion.
    This is my original post:
    I'm writing a phrase counter, and so far it's working, but I'm trying to optimize it because running over a for loop is really slowing me down.
    Here's the problem:
    I want to get phrases that repeat more than once, from length 3 words to 10 words. Each time I change the length of the phrase (say, from 10 to 9 to 8, etc down to 3), I have to run over the entire word list all over again, which eventually amounts to 8 total passes over an ArrayList that's almost 45000 elements long right now, and will get longer.
    An added complication in this is that there are certain identifiers in the word list that mark the beginnings of chapters (#) and sentences ($); any time they are hit, the iterator moves to the next word.
    So, basically, something like...
    example example example example example $ # example example example $
    ...with the repeating loop I have now would give:
    0 phrases of length 10, 9, 8, 7, or 6
    1 of length 5
    2 of length 4
    4 of length 3
    and this is the reply I'm trying to emulate:
    Find the occurrences of the smaller phrases first.
    Index their locations.
    Do all subsequent longer phrase matches only from the saved locations.
    I wrote a self-contained test program, and the problem with it seems to be that the correct indices aren't being saved. The first iteration where we go through every possible index for phrases 3 words long works fine, but every subsequent one doesn't and I'm not sure why. Here's the code:
    EDIT: Sorry about the bump but I just realized I might have left something out for clarity--the newIndexes list is being created each time to hold the places where phrases can still be built, and then replacing the old indexes list (at least, that's what it's supposed to do).
    import java.util.ArrayList;
    import java.util.TreeMap;
    public class Test {
         public static void main(String[] args) {
              ArrayList<String> words = new ArrayList<String>();
              // Two different arrays to test with
              //String[] raw = {"#", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten", "$", "#", "eleven", "twelve", "thirteen", "fourteen", "$"};
              String[] raw = {"#", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten", "$"};
              for( int i = 0; i < raw.length; i++ ) {
                   words.add( raw[i] );
              }
              TreeMap<String, ArrayList<Integer>> phraseList = new TreeMap<String, ArrayList<Integer>>();
              ArrayList<Integer> indexes = new ArrayList<Integer>();
              int lengthOfPhrase = 3;
              for( int i = 0; i < words.size(); i++ ) {
                   indexes.add( i );
              }
              do {
                   TreeMap<String, ArrayList<Integer>> phraseRaw = new TreeMap<String, ArrayList<Integer>>();
                   ArrayList<Integer> newIndexes = new ArrayList<Integer>();
                   int chapter = 0;
                   for( int i = 0; i < indexes.size(); i++ ) {
                        String ready = words.get( indexes.get( i ) );
                        boolean okay = true;
                        if( ready.equalsIgnoreCase( "#" ) ) {
                             okay = false;
                        } else if( !ready.equalsIgnoreCase( "$" ) ) {
                             okay = true;
                             for( int j = 0; j < lengthOfPhrase; j++ ) {
                                  if( j != 0 ) {
                                       ready = ready.concat( " " );
                                       String next = words.get( i + j );
                                       if( next.equalsIgnoreCase( "#" ) || next.equalsIgnoreCase( "$" ) ) {
                                            okay = false;
                                            break;
                                       } else {
                                            ready = ready.concat( next );
                                       }
                                  }
                             }
                             if( okay ) {
                                  newIndexes.add( indexes.get( i ) );
                                  if( phraseList.containsKey( ready ) ) {
                                       ArrayList<Integer> values = phraseList.get( ready );
                                       values.add( chapter );
                                       phraseList.put( ready, values );
                                  } else {
                                       if( phraseRaw.containsKey( ready ) ) {
                                            ArrayList<Integer> values = phraseRaw.get( ready );
                                            values.add( chapter );
                                            phraseList.put( ready, values );
                                       } else {
                                            ArrayList<Integer> newValues = new ArrayList<Integer>();
                                            newValues.add( chapter );
                                            phraseRaw.put( ready, newValues );
                                       }
                                  }
                             }
                        }
                   }
                   lengthOfPhrase++;
                   indexes.clear();
                   indexes = newIndexes;
              } while( lengthOfPhrase <= 10 );
         }
    }
    Does anyone see where I'm going wrong?
    Thanks,
    Jezzica85
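One thing that stands out in the posted code is `words.get( i + j )`: `i` is a position in the `indexes` list, not a word position, so once `indexes` has been pruned the lookup drifts away from `indexes.get( i ) + j`. That may be why only the first pass behaves. Below is a hedged sketch of the "save the locations, extend only from them" idea, not the poster's program: the class `PhraseIndexSketch` and its methods are hypothetical names, it stores start positions rather than chapter numbers, and it prunes to starts whose phrase actually repeated (a start whose phrase is unique cannot begin a longer repeated phrase).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PhraseIndexSketch {
    // True for the chapter (#) and sentence ($) markers from the post.
    static boolean isMarker(String word) {
        return word.equals("#") || word.equals("$");
    }

    // True if words[start, end) would run past the list or cross a marker.
    static boolean isBroken(List<String> words, int start, int end) {
        if (end > words.size()) {
            return true;
        }
        for (int k = start; k < end; k++) {
            if (isMarker(words.get(k))) {
                return true;
            }
        }
        return false;
    }

    // Maps each phrase of minLen..maxLen words that occurs more than once
    // to the word positions where it starts.
    static Map<String, List<Integer>> repeatedPhrases(List<String> words, int minLen, int maxLen) {
        Map<String, List<Integer>> result = new TreeMap<>();
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            starts.add(i);
        }
        for (int len = minLen; len <= maxLen && !starts.isEmpty(); len++) {
            Map<String, List<Integer>> seen = new TreeMap<>();
            for (int start : starts) {
                // Offsets come from the saved word position, not from the
                // loop counter over the shrinking list of starts.
                if (isBroken(words, start, start + len)) {
                    continue;
                }
                String phrase = String.join(" ", words.subList(start, start + len));
                seen.computeIfAbsent(phrase, p -> new ArrayList<>()).add(start);
            }
            // Keep only the starts whose phrase repeated at this length.
            List<Integer> survivors = new ArrayList<>();
            for (Map.Entry<String, List<Integer>> e : seen.entrySet()) {
                if (e.getValue().size() > 1) {
                    result.put(e.getKey(), e.getValue());
                    survivors.addAll(e.getValue());
                }
            }
            starts = survivors;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
            "example", "example", "example", "example", "example", "$",
            "#", "example", "example", "example", "$");
        System.out.println(repeatedPhrases(words, 3, 10));
    }
}
```

On the example from the original post this finds four starts for the three-word phrase and two for the four-word phrase; the single five-word phrase is left out because it occurs only once.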

    OK, I guess I'll try again to explain what's going on and try to be more clear.
    Inside the map, there is a list of phrases, ranging from 3 to 20 words in length. For those phrases that appear in the document more than once (that is, the length of their arrayList value is 2 or more), I need to know if they are contained within any other phrase keys in the map.
    So, if part of the map was like this:
    This is an example=[2,4,6]
    This is an=[2,4,6]
    is another example of this=[1,3,5]
    is another example=[1,3,5]
    The revised map after the substrings were taken out would be:
    This is an example=[2,4,6]
    is another example of this=[1,3,5]
    The second entry of the original map would be taken out because it was contained by and occurred in the same positions as the first, and the fourth entry of the original map would be taken out because it was contained by and occurred in the same positions as the third.
    Is this any clearer?
    Thanks for looking and trying to help,
    Jezzica85
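A rough sketch of the pruning described above, assuming the map holds phrases as space-separated words and that "occurred in the same positions" means the two value lists are equal. The names `ContainedPhraseSketch`, `dropContained`, and `isCovered` are invented for illustration. The comparison is quadratic in the number of keys, which may or may not be acceptable for a large map.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ContainedPhraseSketch {
    // Returns a copy of the map without repeated phrases that are contained
    // in a longer phrase with an identical list of values.
    static Map<String, List<Integer>> dropContained(Map<String, List<Integer>> phrases) {
        Map<String, List<Integer>> kept = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> inner : phrases.entrySet()) {
            if (!isCovered(inner, phrases)) {
                kept.put(inner.getKey(), inner.getValue());
            }
        }
        return kept;
    }

    static boolean isCovered(Map.Entry<String, List<Integer>> inner,
                             Map<String, List<Integer>> phrases) {
        if (inner.getValue().size() < 2) {
            return false; // only phrases that appear more than once are candidates
        }
        // Padding with spaces makes contains() match whole words only.
        String padded = " " + inner.getKey() + " ";
        for (Map.Entry<String, List<Integer>> outer : phrases.entrySet()) {
            boolean longer = outer.getKey().length() > inner.getKey().length();
            boolean contains = (" " + outer.getKey() + " ").contains(padded);
            if (longer && contains && outer.getValue().equals(inner.getValue())) {
                return true;
            }
        }
        return false;
    }
}
```

Applied to the four-entry example map above, this keeps "This is an example" and "is another example of this" and drops the two shorter entries.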

  • Counting words in a single cell in Numbers'09

    Hi there,
    I'm relatively new to the Mac world, but I have years of computer experience on a PC and also worked with Macs back in the days of the first eMacs. I have finally decided to switch to the brighter side of life (hopefully ;)).
    But here is my question: I need to count words in a cell in Numbers'09.
    Is there a specific function combination for achieving this? My idea was: strip excess spaces, count the occurrences of the space character in the cell, add 1, and voila! The problem is that I cannot achieve this using formulas in Numbers'09. I have found some help for Excel, but the formulas are a little different. And, well, I would like to leave the past behind and stick to Apple programs if I can. I don't like the idea of installing Excel on a Windows Boot Camp partition only for this purpose.
    Any help would be greatly appreciated. Thanks.
    Aleksander

    Badunit wrote:
    Yvan once had a list of all the different localizations. He may still have it.
    I'm late, but I was very busy.
    The table with every localized function name is (and will remain) available on my iDisk:
    <http://public.me.com/koenigyvan>
    Download :
    For_iWork:iWork '09:functionsNames.numbers.zip
    An easy solution for foreign users (like me) is to duplicate Numbers.app and remove its language resources except English.
    Running that copy, you will have Numbers in English (except for the decimal and parameter separators, the date and time formats, and the default currency).
    It is then easy to enter the formulas given in this forum.
    Once the document is saved, we may open it in the 'standard' Numbers and the formulas will be automatically localized.
    Yvan KOENIG (VALLAURIS, France) mardi 2 mars 2010 18:30:45

  • The numbering format keeps changing when making PDF's from Word 2007 ? Using Acrobat 9 Pro Extended

    The numbering format in the table of contents keeps changing when making PDFs from Word 2007, using Acrobat 9 Pro Extended.

    The issue is that I have made up a contract in Word.
    The second page has a list of all contents of the contract.
    gghhjhhbhbhhbhbjbhj....1
    bv v vghvjvjnnnnnnnnn....2
    When we convert to PDF, some of the numbers change. For example, 20 becomes 201.
    Your help is appreciated.
    Cheers Ocean designs.

  • How to count words in a PDF file?

    Is there any way I can count words in a PDF file without resorting to Acrobat Reader (which apparently has that feature)?
    That's a massive program, which I actually don't like.
    I need to count words in the PDF file because I write my papers with LaTeX, and they're full of my extensive comments.
    Do you know of any alternative?

    IIRC, that utility can no longer be found in xpdf (the official Arch Linux package); it is now part of poppler.
    Edit: it's pdftotext, by the way.
    Last edited by dolby (2008-05-11 13:35:05)

  • Counting words in a textArea

    Help !!
    I am trying to count the number of words contained in a text area - Is it possible to do this ?
    I am new to Java and I'm stuck because I only know how to write programs that use FileReader to count the number of words in a file.
    Is it possible to count the number of words in a string directly - or will I have to save the string to file and then apply another program to count the number of words ?

    Insert this code in your source file (TextArea1 is the name of the TextArea whose words you want to count):
    String texte=TextArea1.getText();
    String rt=String.valueOf((char)13) + String.valueOf((char)10);
    StringTokenizer mots=new StringTokenizer(texte," ,.:;!?\t"+rt);
    int nbremots=mots.countTokens();
    and place this at the top of the file:
    import java.util.StringTokenizer;
    nbremots is the number of words. The delimiters between words are given in the second argument of the StringTokenizer constructor. I have chosen the main punctuation signs, such as space, comma, and full stop. The String rt represents a line break on Windows (carriage return plus line feed). If you work on Unix, prefer this code:
    String texte=TextArea1.getText();
    StringTokenizer mots=new StringTokenizer(texte," ,.:;!?\t\n");
    int nbremots=mots.countTokens();
    With StringTokenizer you can also read the words one by one with this kind of loop:
    while (mots.hasMoreTokens()) {
        String mot=mots.nextToken();
        // use mot here
    }
    It is a useful class. See the API documentation for StringTokenizer in the package java.util.
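The fragments in the reply can be wrapped into a small self-contained method, sketched here with hypothetical names (`TextWordCount`, `countWords`, `DELIMITERS`). The delimiter string includes both `\r` and `\n`, so the same code should cover Windows and Unix line breaks; in a GUI the argument would be whatever `getText()` returns from the text area.

```java
import java.util.StringTokenizer;

class TextWordCount {
    // Same punctuation set as in the reply, plus both line-break characters.
    static final String DELIMITERS = " ,.:;!?\t\r\n";

    // Counts the tokens left after splitting on the delimiter characters.
    static int countWords(String text) {
        return new StringTokenizer(text, DELIMITERS).countTokens();
    }

    public static void main(String[] args) {
        System.out.println(countWords("Hello, world!\r\nHow are you?")); // prints 5
    }
}
```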

  • Count words in each sentece in a file

    Hi all
    I want to ask how I can count the words in each sentence in a file.
    For example, if I have the following sentences:
    i ate the cake.
    today is Sunday.
    i went to school 5 day a week.
    I want to get the numbers:
    4
    3
    8
    any ideas??

    "You could read the file line by line, put each line in a string, and use StringTokenizer to split it into words and count them. Or you could read the file char by char, increasing the word counter every time you find a blank char; when the char read is a newline, you save the old counter and start a new word count for the new line."
    That's an option, but a sentence is not ended by a newline. A sentence ends with a full stop.
    Kaj
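Following Kaj's point, a minimal sketch that splits on sentence-ending punctuation rather than on newlines and then counts the whitespace-separated words in each piece. The names `SentenceWordCount` and `wordsPerSentence` are made up for illustration, and the text is assumed to be already loaded into a String. Treating every `.`, `!`, or `?` as a sentence end is a simplification; abbreviations and decimal numbers would be split too.

```java
import java.util.ArrayList;
import java.util.List;

class SentenceWordCount {
    // Returns the word count of each sentence, in order of appearance.
    static List<Integer> wordsPerSentence(String text) {
        List<Integer> counts = new ArrayList<>();
        for (String sentence : text.split("[.!?]+")) {
            String trimmed = sentence.trim();   // drops line breaks between sentences
            if (!trimmed.isEmpty()) {
                counts.add(trimmed.split("\\s+").length);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "i ate the cake.\ntoday is Sunday.\ni went to school 5 day a week.";
        System.out.println(wordsPerSentence(text)); // prints [4, 3, 8]
    }
}
```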

  • Is there a method for counting words?

    Hi!
    Is there a method for counting words?
    How do I read specific data ( row, column ) out of an array?
    Thx
    Lebite

    There could be a better way, but this is how I would do it:
            // requires: import java.util.StringTokenizer;
            String[][] myArray = { {"Blah Blah Blah"},
                                   {"Blah Blah Blah"},
                                   {"Blah Blah Blah"} };
            int tokens = 0;
            for(int i = 0; i < myArray.length; i++) {
                // the inner bound is the length of row i, not the number of rows
                for(int j = 0; j < myArray[i].length; j++) {
                    StringTokenizer st = new StringTokenizer(myArray[i][j]);
                    tokens += st.countTokens();
                }
            }
            System.out.println(tokens);

  • HELP ! Making a retry counter

    I'm making a retry counter for logins, and I'm using j_security_check from Tomcat.
    I made a little class that must check whether there is a session object.
    If not, it creates one and puts the value 1 in it, for the first try.
    If there is a session object, it increases the value and puts it back in the session object.
    My problem is that I get a NullPointerException and I don't know how to fix it.
    This is the code:
    package beans;
    import java.io.Serializable;
    import javax.servlet.http.*;
    import javax.servlet.jsp.PageContext;
    /**
     * @author Sander Stad
     */
    public class LoginCountBean implements Serializable{
         private int count;
         private String loginTries = null;
         private Integer logins;
         HttpSession session;
         HttpServletRequest request;
         PageContext pageContext;
         public LoginCountBean(){
              count = 1;
         }
         public boolean checkSessieObject(){
              boolean retry = true;
              loginTries = (String) pageContext.findAttribute("logins");
              if(loginTries == null){
                   count = 1;
                   logins = new Integer(count);
                   setSessieObject(logins);
              } else{
                   count = Integer.parseInt((String) pageContext.findAttribute("logins"));
                   if(count >= 3){
                        retry = false;
                   } else{
                        count++;
                        logins = new Integer(count);
                        pageContext.setAttribute("logins", logins, PageContext.SESSION_SCOPE);
                   }
              }
              return retry;
         }
         public void setSessieObject(Integer logins){
              request.getSession();
              pageContext.setAttribute("logins", logins, PageContext.SESSION_SCOPE);
         }
    }

    How does the bean get the request and pageContext objects?
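In the posted class, `request` and `pageContext` are never assigned, which would explain a NullPointerException on the first `pageContext.findAttribute` call. A second thing to watch: the value is stored as an `Integer` but read back with a `(String)` cast. The sketch below shows only the counting logic, with a plain `Map` standing in for the session attributes so that it runs without a servlet container; `LoginRetrySketch`, `registerAttempt`, and `MAX_TRIES` are hypothetical names, and in a real bean the map would be replaced by whatever session object the page passes in.

```java
import java.util.HashMap;
import java.util.Map;

class LoginRetrySketch {
    static final String KEY = "logins";
    static final int MAX_TRIES = 3;

    // Records one login attempt in the given attribute store (a stand-in for
    // the session) and returns true while another try is still allowed.
    static boolean registerAttempt(Map<String, Object> sessionAttributes) {
        Object stored = sessionAttributes.get(KEY);
        if (!(stored instanceof Integer)) {
            sessionAttributes.put(KEY, 1);      // first try: start the counter
            return true;
        }
        int count = (Integer) stored;           // read back with the type it was stored as
        if (count >= MAX_TRIES) {
            return false;                       // limit reached, no more retries
        }
        sessionAttributes.put(KEY, count + 1);
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> session = new HashMap<>();
        for (int attempt = 1; attempt <= 4; attempt++) {
            System.out.println("attempt " + attempt + ": " + registerAttempt(session));
        }
    }
}
```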

  • Count word without space in C#

    Dear sir,
    I would like to count the words in a sentence in C# without counting the spaces, but I could not get this code to work. Please help me solve it. The code is below:
                int i = 0,b=0;
                int Count2 = 1;
                for (i = 0; i < Paragraph.Length; i++)
                    if (Paragraph[i] == ' ')
                        for (b = i; b < Paragraph.Length; b++)
                            if (Paragraph[b] != ' ' && Paragraph[b] != '\t')
                            {
                                Count2++;
                                break;
                            }
                Console.WriteLine("Total Word = {0} .", Count2);

    Dear Sir,
    I want to count the words in a sentence without counting the spaces, but when I run the program the spaces are counted along with the words. I want to count only the words. Please guide me. My C# program is as follows:
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using System.Text.RegularExpressions;
    using System.IO;
    namespace Static_Method_Count_Word
    {
        class Program
        {
            static void Main(string[] args)
            {
                Console.WriteLine("Please Enter Any Paragraph .");
                string Paragraph = Console.ReadLine();
                Count.Coun(Paragraph);
                Console.ReadLine();
            }
        }
        class Count
        {
            public static void Coun(string Paragraph )
            {
                int i = 0,b=0;
                int Count2 = 1;
                for (i = 0; i < Paragraph.Length; i++)
                    if (Paragraph[i] == ' ')
                        for (b = i; b < Paragraph.Length; b++)
                            if (Paragraph[b] != ' ' && Paragraph[b] != '\t')
                            {
                                Count2++;
                                break;
                            }
                Console.WriteLine("Total Word = {0} .", Count2);
            }
        }
    }

  • How to count words?

    Hello, nameless hero~
    Just like the title says: if you have any idea, please reply to this post. Thanks!!

    There are two basic ways I can think of to do this. Either get the text and run your own word count algorithm on it, or use the word boundary methods in ParagraphElement. In the first instance, you have more control over what is considered a word, and in the second you let the Player decide.
    Option 1:
    You can get the text of the TextFlow by exporting using the plain text filter:
         var text:String = TextFilter.export(textFlow, PLAIN_TEXT_FORMAT, ConversionType.STRING_TYPE) as String;
    Once you have the text, just scan through it to find the words. This is the best way if you want to define for yourself what constitutes a word.
    If there is going to be a lot of text, you may prefer to look at it paragraph by paragraph. So instead you would loop through the paragraphs in the flow, call getText() on each one, and then run your algorithm on the String returned by getText().
    Option 2:
    Otherwise, you could iterate through the paragraphs of the flow, looking for word boundaries. This will get you all possible word boundaries, including between spaces. To do this, it would look something like this:
         var paragraph:ParagraphElement = textFlow.getFirstLeaf().getParagraph();
         do {
              var relativePosition:int = paragraph.findWordBoundary(0);
              while (relativePosition < paragraph.textLength) {
                   trace("Word boundary at", paragraph.getAbsoluteStart() + relativePosition);
                   relativePosition = paragraph.findWordBoundary(relativePosition);
              }
              paragraph = paragraph.getNextParagraph();
         } while (paragraph != null);
    I haven't compiled this or debugged it, and I coded it from memory, but hopefully this will serve as a guideline for what you could do.

  • How to count words in a text ?

    Hi all,
    Can anyone show me how to count the number of words in a text?
    Thanks in advance,
    Toan.

    Hi,
    Are you reading the text from a file or is it stored in a string buffer?
    The best way would be to use a StringTokenizer. Assuming that all your words are separated by a space ' ', you could use that as your delimiter.
    Something like:
    String text = "This is just a bunch of text stored as a string.";
    StringTokenizer words = new StringTokenizer( text, " ", false );
    int numberOfWords = words.countTokens();
    I hope that helps,
    .kim

  • IBooks Author seems to have stopped counting words at a bit under 20,000.  Page count is wrong too.  Anyone else seen this or know what's wrong?

    I believe I'm updated to the latest version of iBooks Author, and have crossed over 20,000 words on my book, but it seems to be stuck at 19,267 words. My page count is up to 91 but iBooks Author says 85 (that one is easier to verify), and the word count simply isn't moving at this point. I browsed a bit but didn't find the issue reported elsewhere. Is there perhaps an option I'm missing or something? It seemed to be just fine, and when I got things transferred from Pages the word count was at least similar to what it was, so I think it's been working.

    Fabe, thanks for asking clarifying questions. Inspector in iBooks Author tells me there are 85 pages, and the page count next to the thumbnails tells me a different number, saying the last page is 91. That is not including front matter so, yes, there are more pages. In fact, many more, since there is quite a bit more white space in the iBooks version. As for word count and being able to enter more text, I really didn't notice that the word count was not changing until I had checked back several times and noticed it hadn't crossed over 20,000 words, and then noticed that the number looked awfully familiar. In my case, it certainly seems to be stuck at 19,267 words, and I don't see any other negative effects.
    Steve

  • Counting words in a text widget

    Hi,
    Is there a way I can count the number of words in a text box widget?  I realise that means counting the number of words in the attached variable - but can I do this?
    I have tried playing with Javascript (about which I know nothing)
    I put the script below into the script window for a button, but nothing seems to happen!
    Test=Q5WidgetAnswer.split(" ").length-1;
    document.write(Test);
    I want to check for three words in an answer and if there are only two, or four, warn that there are Three answers.
    Would be grateful for any help!

    After much heartache I solved the issue. 
    The code needed to be as follows:
    var objCP = document.Captivate;
    var Answer = objCP.cpEIGetValue('WidgetAnswer');
    var Words;
    if(Answer ==  ''){   /* tests for empty answer */
      Words=0;
      alert("Fred");
    }
    else {
      Words = Answer.replace(/[^ ]/g,'').length+1;    /* number of spaces in the answer, plus one */
    }
    Here objCP can be any name you want, and cpEISetValue and cpEIGetValue are special Captivate functions.
    /* and */ enclose a comment.
