Word frequency data :( ???

I am stuck again... maybe someone can help me again.
I tried writing the public class WFreqRecord, but I don't think I understand it at all. I'm really stuck now; I just about managed the first part. If anyone has time, could someone get me started and maybe explain a little how I am meant to do this next step? I'll post what I have so far, and the page where the instructions are. Thanks, everyone!
regards
newbie
import java.io.*;
import java.util.*;
import javax.swing.*;

public class TextHandlerImpl
{
    public static void main (String[] args) throws IOException
    {
        JFileChooser chooser = new JFileChooser();
        int status = chooser.showOpenDialog(null);
        if (status != JFileChooser.APPROVE_OPTION)
        {
            System.out.println("No File Chosen");
        }
        else
        {
            String fileName = chooser.getSelectedFile().getAbsolutePath();
            String[] lines = getTextLines(new File(fileName));
            // debug
            for (int i = 0; i < lines.length; i++)
                System.out.println("Line number " + i + " : " + lines[i]);
        }
    }

    public static String[] getTextLines(File f)
    {
        BufferedReader br = null;
        List lines = new ArrayList();
        try
        {
            br = new BufferedReader(new InputStreamReader(new FileInputStream(f)));
            String line = br.readLine();
            while (line != null)
            {
                lines.add(line);
                line = br.readLine();
            }
        }
        catch (Exception e)
        {
            System.out.println("Got Exception = " + e);
        }
        finally
        {
            try
            {
                if (br != null) br.close();   // also closes the underlying stream
            }
            catch (Exception e) {}
        }
        return (String[]) lines.toArray(new String[0]);
    }

    public static void listText(PrintWriter pwrtr, String[] text)
    {
        for (int i = 0; i < text.length; i++)
            pwrtr.println(text[i]);
    }
}
http://www.vtr.net/~mwb/java/SYS-1A...2%20and%203.htm
source files:
http://www.vtr.net/~mwb/java/source/WFreqRecord.java
http://www.vtr.net/~mwb/java/source/WFreqSeqAccess.java
http://www.vtr.net/~mwb/java/source/WFreqSeqBuilder.java

Sorry, bad link... here it is.
Thanks!
http://www.vtr.net/~mwb/java/SYS-1A4Y%20Spring%202003%20Coursework%202%20--%20Details%20for%20Stages%202%20and%203.htm
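To get you started on the WFreqRecord question: the actual class is defined in the WFreqRecord.java source file linked above, so treat this only as a hedged guess at its general shape, not the coursework's specification. A word-frequency record typically just pairs a word with a count that gets bumped each time the word reappears; all names and methods below are assumptions.

```java
// Hypothetical sketch of a word-frequency record. The real spec is in
// the linked WFreqRecord.java, so method names here are guesses.
public class WFreqRecord {
    private final String word;   // the word being counted
    private int freq;            // how many times it has been seen

    public WFreqRecord(String word) {
        this.word = word;
        this.freq = 1;           // the first sighting counts as one
    }

    public String getWord() { return word; }

    public int getFreq() { return freq; }

    public void incrementFreq() { freq++; }

    public String toString() {
        return word + " : " + freq;
    }
}
```

The idea would then be: as your next class walks the words of each line returned by getTextLines, it looks each word up in its collection of records; if a record already exists it calls incrementFreq(), otherwise it creates a new WFreqRecord for that word.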

Similar Messages

  • Creating of the word-frequency histogram from the Oracle Text

I need to build a word-frequency histogram from an Oracle Text index: a list of the tokens in the index, where each token carries the list of documents that contain it and the token's frequency in each of those documents. Does anybody know how to get this data out of an Oracle Text index so that the result can be saved to a table or a text file?

You can use ctx_report.token_info to decipher the token_info column, but I don't think the report format it produces is what you want. Instead, you can use a query template and specify algorithm=COUNT to obtain the number of times a token appears in the indexed column. You can do that for every token by using the dr$...$i table, as shown below.
    SCOTT@10gXE> create table otntest
      2    (doc_id       number primary key,
      3       document  varchar2(100))
      4  /
    Table created.
    SCOTT@10gXE> insert all
      2  into otntest values (1, 'This is a test for generating a histogram')
      3  into otntest values (2, 'Histogram shows the list of documents that contain that token and frequency')
      4  into otntest values (3, 'frequency histogram frequency histogram frequency')
      5  select * from dual
      6  /
    3 rows created.
    SCOTT@10gXE> create index otntest_ctx_idx
      2  on otntest(document)
      3  indextype is ctxsys.context
      4  /
    Index created.
    SCOTT@10gXE> column token_text format a30
    SCOTT@10gXE> select t.doc_id, i.token_text, score (1) as token_count
      2  from   otntest t,
      3           (select distinct token_text
      4            from   dr$otntest_ctx_idx$i) i
      5            where  contains
      6                  (document,
      7                   '<query>
      8                   <textquery grammar="CONTEXT">'
      9                   || i.token_text ||
    10                   '</textquery>
    11                   <score datatype="INTEGER" algorithm="COUNT"/>
    12                   </query>',
    13                   1) > 0
    14  order  by doc_id, token_text
    15  /
        DOC_ID TOKEN_TEXT                     TOKEN_COUNT
             1 GENERATING                               1
             1 HISTOGRAM                                1
             1 TEST                                     1
             2 CONTAIN                                  1
             2 DOCUMENTS                                1
             2 FREQUENCY                                1
             2 HISTOGRAM                                1
             2 LIST                                     1
             2 SHOWS                                    1
             2 TOKEN                                    1
             3 FREQUENCY                                3
             3 HISTOGRAM                                2
    12 rows selected.
    SCOTT@10gXE>

  • Is there a way to create a Word Frequency list in 8.2.5?

    I would like to be able to create a Word Frequency list for each PDF in a batch of PDFs. I have used Batch Processing to make sure each one is OCR/searchable. Is there a way to do this in Acrobat Pro 8.2.5 / Windows XP?
    Thanks!


I have Microsoft Word on my MacBook Pro. When I open saved letters/documents in Word, the date automatically changes to the current date. How do I stop this happening? Many thanks.


I suggest you post your question on the Microsoft Mac forums, as it's their software you're having issues with and that's where the MS experts hang out:
    http://answers.microsoft.com/en-us/mac

Where, if any, are the Word Count and Word Frequency tools in AppleWorks or MS Word?

5/26/2008. Is there a "word count" and/or "word frequency" tool in an AppleWorks word-processing document and/or a Microsoft Word for Mac document? How does one activate it, or find such a tool on the web? Many thanks. C. Yopst, Chicago

    Hi C,
    In an AppleWorks WP document, go Edit > Writing Tools > Word count. This will give you a count of characters, words, lines and paragraphs in the document or, if you have selected a portion of the document, in the selection.
    To my knowledge, there's no tool to give a word frequency figure, although that could be done with some fairly easy manipulation involving a spreadsheet.
    For MS Word questions, I'd suggest checking the Word section of Microsoft's Mactopia site. Try searching "count words".
    Regards,
    Barry

  • Best method for collecting low frequency data

    Hello everyone,
I'm looking for suggestions on the best way to collect relatively low-frequency data (about 1 Hz). I know there are a few different ways to do so in LabVIEW, such as the DAQ Assistant, or using DAQmx and making your own virtual channel. There is also an abundance of different settings to choose from. I'm using an NI 9215 DAQ card and am collecting voltages. I'd be interested to hear any opinions on a method for doing so, and maybe the settings you would use.
The reason I'm asking is that I'm currently just using the DAQ Assistant, but I'm really not sure that's what I want to be using. I feel like there is a better way.
    Thank you all!

    winterfresh11 wrote:
    Is this different from triggering? Because this particular DAQ card can't be triggered.
    There is a big difference between triggering and sample clock.  The trigger tells the DAQ to start acquiring data.  The sample clock tells the DAQ when to take a sample.  You trigger once per acquisition.  The sample clock just keeps on going until the acquisition is complete (either aborted or desired number of samples is acquired).

  • Help in word frequency program for .doc files

Hi,
can anybody help me?
I have implemented a program to find the word frequency of .txt files.
Now I want to use the same program for .doc files.
How can this be done?

    Hi,
I'm sure a few seconds on Google would have found the answer, but take a look at Apache POI. It will allow you to extract the text from .doc files.
    http://poi.apache.org/
    Ben.

How to load MS Word document data into Oracle?

    Hello,
Does anybody have any idea how to load data from a Microsoft Word document, field by field, into an Oracle database, instead of going through mail merge? We have Word documents containing descriptions, titles, and financial data for different years (say, 1 to 10 years of data). In this case, how can I load directly from the document into the Oracle database? I appreciate your inputs.
    Thanks so much....

    Hi,
Although I use client_ole2 from the webutil package to go the other way, i.e. to load Oracle data into a Word document, I'm sure there will be a client_ole2.get_* command to get the value of a field code from a document.

  • Counting word frequency

    Hi all,
I'm very new to Java, only a few months into it. I am working on a program that compresses a file, thread, or string. First it takes a string of undetermined length as input, splits the string using a delimiter, then places the words into a list. From that list I take out the recurring words with another list and return the index of where the words are; this has been done without any problems.
Next I want to get the word frequency and return the most common word at the first index, the next most common after it, and so on. I am not sure how best to go about this. I did use a map, but that just gives the frequency of the words without any ordering. Can anybody give me any suggestions I could try, bearing in mind that Java is my first language, so this is also a confidence-building exercise?
    Many thanks to anybody who responds.

littlejim4 wrote:
I did use a map but that just gives the frequency of the words without any ordering.
Okay, so you have a Map with the 'word' as keys and the 'frequency' as values, right?
If so, create a class (WordOccurrence, for example) that holds a "word" and an "occurrence" variable, and let that class also implement the Comparable interface. Now you can do something like this:
public static List<WordOccurrence> getWordOccurrences(String text) {
    // get the occurrence of each (unique) word
    Map<String, Integer> frequencyMap = getFrequencyMap(text);
    // create a list to hold your WordOccurrence instances
    List<WordOccurrence> list = new ArrayList<WordOccurrence>();
    // for each word in the map, create a WordOccurrence instance
    for (String word : frequencyMap.keySet()) {
        list.add(new WordOccurrence(word, frequencyMap.get(word)));
    }
    // sort the list of WordOccurrences
    Collections.sort(list);
    // return the sorted list
    return list;
}
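The snippet above leaves WordOccurrence and getFrequencyMap undefined. A minimal sketch of both could look like the following; the class and method names come from the snippet, but the bodies are my own assumptions (most-frequent-first ordering, whitespace splitting):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the helper class the snippet above assumes: it holds a word
// and its occurrence count, and sorts most-frequent first.
public class WordOccurrence implements Comparable<WordOccurrence> {
    final String word;
    final int occurrence;

    WordOccurrence(String word, int occurrence) {
        this.word = word;
        this.occurrence = occurrence;
    }

    public int compareTo(WordOccurrence other) {
        // higher occurrence sorts first; ties broken alphabetically
        if (occurrence != other.occurrence) {
            return Integer.compare(other.occurrence, occurrence);
        }
        return word.compareTo(other.word);
    }

    // One possible getFrequencyMap: split on whitespace and count
    // each word with a map, as the answer describes.
    static Map<String, Integer> getFrequencyMap(String text) {
        Map<String, Integer> map = new HashMap<String, Integer>();
        for (String word : text.split("\\s+")) {
            Integer count = map.get(word);
            map.put(word, count == null ? 1 : count + 1);
        }
        return map;
    }
}
```

With this in place, Collections.sort(list) in the snippet above orders the list so that the most common word lands at index 0, which is exactly the ordering the question asks for.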

  • Best way to implement a word frequency counter (input = textfile)?

I had this as an interview question and basically came up with the solution where you use a hash table:
//create hash table
//bufferedreader
//read file in
//for each word encountered, create an object that has (String word, int count) and push it into the hash table
//then loop and read out all the hash table entries
=== skip this part if you don't feel like reading too much
Then the interviewer proceeded to grill me on why I shouldn't use a tree or any other data structure for that matter... I was kinda stumped on that.
He also asked what happens if the number of words exceeds the capacity of the hash table. I said you can increase the capacity of the hash table, but it doesn't sound too efficient, and I'm not sure by how much you would increase it. I had some OK solutions:
1. read the file through once, get the number of words in the file, and set the hashtable capacity to that number
2. do #1, but run another algorithm to figure out the number of distinct words
3. separate chaining
===
Anyhow, what kind of answers/algorithms would you have come up with? Thanks in advance.

On the basic approach: well, first you need to check to make sure the word is not already in the hashtable, right? And if it is there, you need to increment the count.
On the tree question: a hashtable has amortized O(1) time for insert and search, while a balanced binary search tree has O(log n) complexity for the same operations, so a hashtable will be faster for a large number of words. The other option is a so-called "trie" (Google for more), which has O(m) complexity, where m is the length of the word. So if your words aren't too long, a trie may be just as fast as a hashtable, and it may also use less memory.
On capacity: the hashmap implementation that comes with Java grows automatically; you don't need to worry about it. It may not "sound" efficient to have to copy the entire data structure, but the copy happens quickly and occurs relatively infrequently compared with the number of words you'll be inserting.
On pre-sizing: I would do anything to avoid making two passes over the data. Assuming you're reading it from disk, most of the time will be spent reading from disk, not inserting into the hashtable. If you really want to size the hashtable a priori, you can make it big enough to hold all the words in the English language, which IIRC is about 20,000.
And relax, you had the right answer. I used to work in this field and this is exactly how we implemented our frequency counter, and it worked perfectly well. Don't let these interviewers push you around; just tell them why you thought a hashtable was the best choice and show off your analytical skills!
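The check-then-increment loop described in that reply can be sketched with Java's HashMap directly; no capacity tuning is needed because the table resizes itself. The class and method names here are mine, not from the thread:

```java
import java.util.HashMap;
import java.util.Map;

public class FreqCounter {
    // Count word frequencies the way the reply describes: for each
    // word, check whether it is already in the hashtable, then either
    // insert it with count 1 or increment the existing count.
    public static Map<String, Integer> count(Iterable<String> words) {
        Map<String, Integer> freq = new HashMap<String, Integer>();
        for (String w : words) {
            Integer c = freq.get(w);           // amortized O(1) lookup
            freq.put(w, c == null ? 1 : c + 1);
        }
        return freq;
    }
}
```

In a real run the words would come from a BufferedReader over the file, split line by line; the map never needs pre-sizing because HashMap grows automatically once its load factor is exceeded, which is exactly the point made above.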

  • Way to save MS Word file as a PDF with original MS Word file date?

    For archiving purposes, I'm betting PDFs will be readable (backwards compatible) long after my current file saved in Microsoft Word becomes out of date. Anyone know of a way I can save my Word file as a PDF while retaining the original Word file last-modified date (for patent protection reasons to prove date of authorship)? (or Word to text file or ASCII file would also be equivalent as long as original date is kept)

I doubt that a computer date would suffice for legal proof of date. It is easily changed by those who know their way around a computer; I would have to play a bit, but could probably do it without too much trouble. About the only way to satisfy the need for a patent is to properly insert the pages into a patent notebook and get the information notarized. From what I have seen, the legal profession generally requires hard copy, not a computer file. A loose-leaf form is also typically not acceptable; what is wanted is a bound notebook, properly annotated. The notarized aspect is a way around some of this, I think. In any case, it sounds like you need to consult a patent attorney.

  • Importing word documents -  data fields

Hi,
When trying to import a Word document with bookmarks and data fields, the bookmarks came through correctly but the data fields did not.
The data fields were created in the Word document using an executable that picks up the fields from an Excel sheet and fills them into the Word document (the descriptions are in the relevant fields table in the attached document). When they are imported into RoboHelp, the bookmarks are missing. Let me know how I should correct this.

    I'm not surprised about the data fields. That data isn't actually in the Word document, only a reference, so there isn't anything to import to RH. This can't be fixed from RoboHelp; the answer (if there is one) lies in Word, or possibly the executable that gets the data from Excel.
    If you find an answer, please post it - I am having a similar Word problem. I am trying to convert cross-references to regular text - a little bit like "Paste Special > Values" converts formulas to their resulting values in Excel.

  • Manipulate Frequency Data

I'm reading in sound data, getting the FFT, and displaying the data as frequency vs. magnitude. I wish to set certain frequency bands to zero.
    The output data type following the FFT (PS/PSD VI) is a cluster of 3 elements, fo, df and magnitude (1-D array). I've unbundled the output of the PS/PSD VI by name.
    I'm then setting certain elements in the 1D magnitude array to zero.
    The problem ...
    I can't seem to re-create the cluster of 3 elements, fo, df and magnitude such that I can graph the result.
    Help is much appreciated !

    In newer versions of LabVIEW, you can also use the in place element structure. Same difference.
    Attachments:
ChangeY2.png (3 KB)

  • Word and Data Programs for G5

Just updated the PowerMac G5 to 10.5.8 (Apple sold me the update). Now the Classic Environment (ClarisWorks) won't work, and none of my word-processing documents and database files (I have them backed up) will open. What should I buy for word processing and data handling that will also read the old documents?

Hello. AppleWorks 6 still runs in 10.5/10.6, if you can find a copy, and it opens most if not all .cwk files.
    Also see this...
    https://discussions.apple.com/thread/3789280?start=0&tstart=0
    http://www.fileinfo.com/extension/cwk

LabVIEW function to acquire high-frequency data

    Hi,
Currently, in a program created with LV 4.1, we use "AI Sample Channel" in a while loop to acquire data from 3 channels on a PCI-6025E card, but the sampling interval is limited to about 0.08 s. Now we are trying to port the program to LV 6.1. My question is which LabVIEW function we should use to perform acquisitions at the highest rate. We found "AI Acquire Waveforms" in LV's standard library, but it implies a lot of modifications and we don't know if it's the fastest method.
    Thanks.

I would advise you to study the data acquisition examples and the LV manual.
By using another way of doing DAQ, together with a slight modification of your program, you should be able to get at least a 10- to 100-fold increase in throughput.
The trick is that you don't have to re-initialise all channels, sample rates, gains, etc. on every read. That is what happens when you use some of the basic DAQ input VIs, but those are not intended for measuring data in a fast, repeating way.
Just let your LabVIEW application sample the 3 channels as a background job.
Reading out the buffers will then give you the acquired values, which is one of the powers of NI-DAQ.
    Patrick de Boevere
