Best way to implement a word frequency counter (input = textfile)?

i had this for an interview question and basically came up with the solution where you use a hash table...
//create hash table
//bufferedreader
//read file in,
//for each word encountered, create an object that has (String word, int count) and push into hash table
//then loop and read out all the hash table entries
===skip this stuff if you dont feel like reading too much
then the interviewer proceeded to grill me on why i shouldn't use a tree or any other data structure for that matter... i was kinda stumped on that.
also he asked me what happens if the number of words exceeds the capacity of the hash table? i said you can increase the capacity of the hash table, but it doesn't sound too efficient and i'm not sure how you'd know how much to increase it by. i had some ok solutions:
1. read the file thru once, and get the number of words in the file, set the hashtable capacity to that number
2. do #1, but run another algorithm that will figure out distinct # of words
3. separate chaining
===
anyhow what kind of answers/algorithms would you guys have come up with? thanks in advance.

i had this for an interview question and basically
came up with the solution where you use a hash
table...
//create hash table
//bufferedreader
//read file in,
//for each word encountered, create an object that
has (String word, int count) and push into hash
table
//then loop and read out all the hash table entries
Well, first you need to check to make sure the word is not already in the hashtable, right? And if it is there, you need to increment the count.
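
For what it's worth, the check-then-increment step can be folded into a single map call. Here is a minimal sketch in Java, assuming a "word" is a run of ASCII letters or apostrophes and that case doesn't matter; the class name `WordFrequency` is made up for illustration and isn't from the original post.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class WordFrequency {
    // Counts words from any character source in a single pass.
    static Map<String, Integer> count(Reader source) {
        Map<String, Integer> counts = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Assumption: anything that is not a-z or an apostrophe separates words.
                for (String word : line.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
                    if (!word.isEmpty()) {
                        // Stores 1 for a new word, otherwise adds 1 to the existing count.
                        counts.merge(word, 1, Integer::sum);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }
}
```

You could call it as `WordFrequency.count(new FileReader("input.txt"))` (the file name is just a placeholder) and then loop over `entrySet()` to print each word with its count. Storing a plain `Integer` per word also avoids the separate (word, count) object.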
===skip this stuff if you dont feel like reading too
much
then the interviewer proceeded to grill me on why i
shouldn't use a tree or any other data structure for
that matter... i was kinda stumped on that.
A hashtable has amortized O(1) time for insert and search. A balanced binary search tree has O(log n) complexity for the same operations, where n is the number of distinct words. So a hashtable will be faster for a large number of words. The other option is a so-called "trie" (google for more), where insert and search take O(m) time, with m the length of the word being looked up, regardless of how many words are stored. So if your words aren't too long, a trie may be just as fast as a hashtable. The trie may also use less memory than the hashtable, since words with a common prefix share nodes.
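
To make the trie option concrete, here is a rough sketch in Java. The names `TrieCounter` and `Node` are invented for this example, and keeping each node's children in a `HashMap` is just one convenient choice (an array indexed by letter would also work for plain a-z input).

```java
import java.util.HashMap;
import java.util.Map;

final class TrieCounter {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        int count; // how many times a word ending at this node was added
    }

    private final Node root = new Node();

    // Walks one node per character, so the work is proportional to the word length.
    void add(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
        }
        node.count++;
    }

    // Returns 0 for words that were never added, including mere prefixes.
    int count(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.get(word.charAt(i));
            if (node == null) {
                return 0;
            }
        }
        return node.count;
    }
}
```

Whether this actually beats a hashtable on speed or memory depends on the data, so it's worth measuring before committing to it.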
also he asked me what happens if the number of words
exceeds the capacity of the hash table? i said you can
increase the capacity of the hash table, but it
doesn't sound too efficient and i'm not sure how
you'd know how much to increase it by. i had some ok
solutions:
The hashmap implementation that comes with Java grows automatically, so you don't need to worry about it. It may not "sound" efficient to have to copy the entire data structure, but the copy happens quickly and occurs relatively infrequently compared with the number of words you'll be inserting. Because the capacity doubles on each resize, the total copying work stays proportional to the number of insertions.
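
If you want to see why the copying stays cheap, you can count the copies in a simplified model. The sketch below assumes a table that starts at capacity 16 and doubles whenever it is full; that is a toy model for illustration, not the exact load-factor policy of `java.util.HashMap`, and `DoublingCost` is a made-up name.

```java
final class DoublingCost {
    // Returns how many element copies a doubling table performs while n items are added.
    static long copiesFor(int n) {
        long copies = 0;
        int capacity = 16; // assumed starting capacity for this toy model
        for (int size = 0; size < n; size++) {
            if (size == capacity) {
                copies += size; // every existing element moves to the new, larger table
                capacity *= 2;
            }
        }
        return copies;
    }
}
```

In this model a million insertions cause 1,048,560 copies in total, so on average each insertion pays for roughly one extra copy. If you do have a rough idea of the number of distinct words, passing an initial capacity to the `HashMap` constructor avoids most of those resizes.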
1. read the file thru once, and get the number of
words in the file, set the hashtable capacity to that
number
2. do #1, but run another algorithm that will figure
out distinct # of words
3. separate chaining
===
anyhow what kind of answers/algorithms would you
guys have come up with? thanks in advance.
I would do anything to avoid making two passes over the data. Assuming you're reading it from disk, most of the time will be spent reading from disk, not inserting into the hashtable. If you really want to size the hashtable a priori, you can make it big enough to hold a large English vocabulary. IIRC about 20,000 words covers most ordinary text, though large dictionaries list well over 100,000.
And relax, you had the right answer. I used to work in this field and this is exactly how we implemented our frequency counter and it worked perfectly well. Don't let these interviewers push you around, just tell them why you thought a hashtable was the best choice; show off your analytical skills!

Similar Messages

  • What is the best way to assign time stamps to counter input data?

    Compact DAQ
    NI 9411
      Hi Everyone,
      I am creating a chart recorder for collecting various engine data.  I need to plot engine speed, crank angle, and various analog data on an XY graph.  I am using a counter and a mag pickup to continuously measure the frequency of the flywheel teeth.  From this data I create an array of timestamps based on accumulated periods for each frequency measurement.  My problem is, how do I determine the absolute timestamp for the first frequency of the buffered data?  I need to sync the frequency data with the analog data so I can plot it all vs time on an XY graph.  The flywheel has 201 teeth and the engine runs at 600 RPM.
      I am using a second mag pickup and counter to measure a single index pulse on the flywheel.  This will be used to determine the crank angle.  I also need to create a timestamp for each index pulse.  What is the best way to do this?
      Is it possible to treat the two mag pickups as an encoder and not use the B phase (direction) pulse? 
      I am fairly new to LabVIEW and any help would be greatly appreciated.
      Thanks in advance,
      Kris

    Hi Kris,
    With analog and digital input tasks in DAQmx, it is possible to acquire data of the type 'waveform'. This includes timestamp information for the acquired data. This example is a good one to reference to understand how this is done: http://decibel.ni.com/content/docs/DOC-3749 . 
    With counter tasks in DAQmx, however, your best bet would be to use the 'Get Date/Time in Seconds' VI to obtain the absolute time. You can set this up so that the absolute time value is obtained right before the DAQmx read function is called.
    Are you trying to use the 9411 to read from the mag pickup? What type of data does it output?
    For information on programming with NI-DAQmx, please refer to the following webpage: http://zone.ni.com/devzone/cda/tut/p/id/5438 . It is a very useful resource to get started on DAQmx applications! I hope this helps. 
    Vivek Nath
    National Instruments
    Applications Engineer
    Machine Vision
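
The accumulated-period idea from the question can be written out as plain arithmetic. This is only a sketch in Java rather than LabVIEW; it assumes each frequency reading spans exactly one tooth period and that `startSeconds` is the absolute time captured just before the read, so any software latency in that capture carries over into every timestamp. The class and method names are hypothetical.

```java
final class ToothTimestamps {
    // Converts consecutive frequency readings (Hz) into absolute timestamps (seconds).
    // Each timestamp marks the end of the period that the reading measured.
    static double[] fromFrequencies(double startSeconds, double[] frequenciesHz) {
        double[] stamps = new double[frequenciesHz.length];
        double t = startSeconds;
        for (int i = 0; i < frequenciesHz.length; i++) {
            t += 1.0 / frequenciesHz[i]; // accumulate one period per reading
            stamps[i] = t;
        }
        return stamps;
    }

    // Expected tooth frequency: teeth per revolution times revolutions per second.
    static double toothFrequencyHz(int teeth, double rpm) {
        return teeth * rpm / 60.0;
    }
}
```

For the numbers in the question, 201 teeth at 600 RPM gives 201 × 10 = 2010 Hz, so consecutive timestamps should be about 0.5 ms apart.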

  • What is the best way to work with Word documents in The InDesign CS4???

    I work in Microsoft Word 2007 and all my documents have *.doc format.
    What is the best way to work with Word documents in InDesign CS4???
    David Blatner says to avoid copying and pasting text from Word instead of placing it (Ctrl+D).
    How about pasting RTF or Text Document???
    I want to make a book layout in ID CS4, and its main feature is that the left page has text and the right page has graphics.
    So, as I understand it, to place the text on each page I must create, for example, 70 Word documents and place each one on the 70 left pages???
    It looks like a waste of time. Is there another way of making such a layout??? What kind????

    It's best to place any text.
    You can have all of your text in one file and use auto-flow to add threaded text frames and pages as required (Hold down the Shift key when you click the loaded text cursor), but it's a little non-standard to have the thread only on one side of the spread from the auto-flow perspective, so you'll have to set up properly.
    This is one case where a master text frame will work to your advantage. On your master page spread, add a text frame to the left page, but not to the right (or at least not threaded to one on the right -- for some other project you might actually want two independent text threads). Hold the loaded cursor over a frame on the left side of a document page and auto-flow. ID will add new spreads as necessary, but only put the text on the left side.
    Peter

  • Best way to implement m to n relation?

    Could you please give me some advice on the best way to implement m-n relations in apex?
    Example: I want to store server and database info. A server can have multiple database of course. But in case of a RAC database, the database can be running on multiple servers. So I have tables:
    create table SERVERS (id number primary key, name varchar2(30));
    create table DATABASES (id number primary key, name varchar2(30));
    and an m-to-n table
    create table SERV_DB (serv_id number references SERVERS(id), db_id number references DATABASES(id), instance_name varchar2(30));
    So the table SERV_DB can tell me e.g. that database PROD is running on server 'prdsrv1' with instance PROD1 and on server 'prdsrv2' with instance name PROD2
    How would you design an apex page to maintain this information (adding relations, updating instance names, etc)? I have a solution using checkboxes and 2 for-loops over htmldb_application.g_f40 (to process checkboxes) and g_f41 (to process text fields with instance names) and some delete statements, but the logic behind it looks too complex to me. I am convinced that this can be done more simply. This seems like a common problem to me, so I wonder whether there is an out-of-the-box solution in apex for this?
    Could you please show me or create a small demo application with the solution that looks most elegant to you?
    Thanks,
    Geert

    Thanks for your reply. You modified the question slightly, but conceptually it is still the same. What you call the instances table corresponds with my serv_db table. I understand the solution you propose using the tabular report. If I see it correctly, you would have an insert button above the tabular report for each new relation (instance - server) you would want to add. This is ok for the case i used (databases, servers, instances) because there are relatively few relations. However I would not like this solution for other cases. E.g.:
    case: you have a list of persons and a list of tasks. You want to assign tasks to persons in a way that each person has multiple tasks to do and each task can be assigned to multiple persons.
    Suppose you add a new person, and you want to assign 15 tasks to him. The solution above (with the tabular report) would be quite some work because you would have to click 15 times on the insert button and on each click select a task from the select list. In this case it would be more appropriate to, after selecting the new person, see a list of all tasks (e.g. 30 tasks) with a checkbox in front, so you can mark 15 out of the 30 checkboxes and press submit. It gets more complex when you want to assign also an attribute to the relation, i.e. showing the list with all tasks, a checkbox in front and a select list next to each task where you can choose from e.g. "priority high" or "priority low" to indicate that this task is high or low priority for that person. Is there an easy way to implement that?
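
Whatever the page looks like, the checkbox approach boils down to comparing two sets of IDs on submit: the relations already stored and the boxes that are checked. A language-neutral sketch of that step, written here in Java rather than PL/SQL, with hypothetical names (`RelationSync`, `toInsert`, `toDelete`):

```java
import java.util.HashSet;
import java.util.Set;

final class RelationSync {
    // IDs to insert: checked on the form but not yet stored.
    static Set<Long> toInsert(Set<Long> stored, Set<Long> checked) {
        Set<Long> result = new HashSet<>(checked);
        result.removeAll(stored);
        return result;
    }

    // IDs to delete: stored but no longer checked.
    static Set<Long> toDelete(Set<Long> stored, Set<Long> checked) {
        Set<Long> result = new HashSet<>(stored);
        result.removeAll(checked);
        return result;
    }
}
```

IDs that appear in both sets are left alone, apart from updating attributes such as the priority or the instance name. That avoids the delete-everything-and-reinsert logic and keeps the processing to one insert, one delete and one update statement.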

  • Displaying Multiple Values on GUI components - best way to implement

    Hi,
    my program needs to implement a basic function that most commercial programs use very widely: If the program requires that a GUI component (say a JTextField) needs to display multiple values, it either goes <blank> or says something more meaningful like "multiple values". What is the best way of implementing it?
    In particular:
    My data is a class called "Student" that among other things has a field for the student name, like: protected String name; and the usual accessor methods (getName, setName) for it.
    Assuming that the above data (i.e. Student objects) is stored in a ListModel and the user can select multiple "Students", if a JTextField is required to display the user selection (blank for multiple selections, or the student "name" for a single selection), what is the best (OO) way of implementing it? Is there any design pattern (best practice) for this basic piece of functionality? A crude way is to have the JTextField check and compare all the time the user selections one by one, but I'm sure there must be a more OO/better approach.
    Any ideas much appreciated.
    Kyri.
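
One possible way to keep the comparison out of the text field itself is a small helper that reduces the current selection to a single display string. This is only a sketch; `SelectionText` and `describe` are invented names, and showing the shared value when several selected items agree is an assumption that goes slightly beyond what the question asks for.

```java
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

final class SelectionText {
    // Returns "" for an empty selection, the common value if all selected items agree,
    // and the placeholder as soon as two items differ.
    static <T> String describe(List<T> selected, Function<T, String> field, String placeholder) {
        if (selected.isEmpty()) {
            return "";
        }
        String first = field.apply(selected.get(0));
        for (T item : selected) {
            if (!Objects.equals(first, field.apply(item))) {
                return placeholder;
            }
        }
        return first;
    }
}
```

A selection listener could then call something like `textField.setText(SelectionText.describe(list.getSelectedValuesList(), Student::getName, "multiple values"))`, and the same helper works for any other field of `Student`.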

    Ok, I will focus on building a solution on 12c.
    Right now I have used a USER_DATASTORE with a procedure to glue all the fields together in one document.
    This works fine for the search.
    I have created a dummy table on which the index is created and also has an extra field which contains the key related to all the tables.
    So, I have the following tables:
    dummy_search
    contracts
    contract_ref
    person_data
    nac_data
    and some other tables...
    the current design is:
    the index is on dummy_search.
    When we update contracts table a trigger will update dummy_search.
    same configuration for the other tables.
    Now we see locking issues when having a lot of updates on these tables at the same time.
    What is your advice for this situation?
    Thanks,
    Edward

  • Best way to import Microsoft Word text?

    Newbie question. I've been importing Microsoft Word text just using the clipboard. That is, I'll open Microsoft Word, open my document, select all, copy, and then go to InDesign and paste. InDesign gives me a text block with my text and I just re-size the columns. The text comes in all weird and I have to stylize it, of course. Sometimes I get some junk like nonbreaking characters and other hidden characters which screw me up until I figure out what's going on. My question is: what's the best way to import Microsoft Word text that will introduce the fewest errors? Thanks.

    JoJo Jenkins wrote:
    Whooops...posted in the wrong forum...sorry!
    Don't worry, it happens all the time. 

  • What is the best way to implement a cluster-wide object ID generator?

    What is the best way to implement a cluster-wide object ID generator?

    What is the best way to implement a cluster-wide
    object ID generator?
    Always use 3 because it is prime.
    Alternatively more information about the system and the needs of the system might prompt alternative ideas some of which are likely to be better than others for your particular implementation and system constraints.

  • Best way to implement a shared Blocking Queue?

    What's the best way to implement a shared Blocking Queue that multiple JVMs can enqueue objects in and multiple JVM's can dequeue from simultaneously?
    Also, I see references on the web to com.tangosol.coherence.component.util.queue.ConcurrentQueue but I don't see it in the current API docs...
    Thanks

    Hi snidely_whiplash,
    snidely_whiplash wrote:
    What's the best way to implement a shared Blocking Queue that multiple JVMs can enqueue objects in and multiple JVM's can dequeue from simultaneously?
    Also, I see references on the web to com.tangosol.coherence.component.util.queue.ConcurrentQueue but I don't see it in the current API docs...
    Thanks
    That class is an internal class, AFAIK.
    As for implementing a queue, you might want to look at Ashish Srivastava's ezMQ component for some ideas:
    http://ezsaid.blogspot.com/2009/01/implementing-jms-queue-on-top-of-oracle.html
    Best regards,
    Robert

  • Best way to implement app wide process

    I am working on an application that has a list of art work in an sql report.. I want to be able to add each piece to a collection of items (which is stored in a table), so I added another column to the sql report to store the id of the art work. I am just wondering what is the best way to implement the process to insert the item to the collection of items. Since I want to do this from multiple pages - link from art work sql report; link from the edit page for a particular art work; and on the artists page where there is also an sql report, I thought it may be best to have an application process, to save duplicating the process on multiple pages.. however I don't think there's a way to pass the id of the item to the application process? as I was thinking best to have the application process conditional based on request, and then have the link as a doSubmit().
    So is it best just to have the link redirect to the same page, passing the id of the item, and then having a process that runs before header that inserts the item, then assigning the item to NULL?
    Thanks in advance,
    Trent

    Daniel:
    Thanks for the info. I have found that the IFRAME works nicely for some applications,
    however, some applications I want to re-write the front end to look better in
    the portal rather than screen scrape.
    Thanks for the help
    Ryan
    "Daniel Selman" <[email protected]> wrote:
    Ryan,
    Check out the thread (a few days old) titled "Different ways of creating
    portlets".
    Sincerely,
    Daniel Selman
    "Ryan Richards" <[email protected]> wrote in message
    news:3d2eda12$[email protected]..
    I have a few existing applications that I need to port over to portlets in Weblogic Portal 4.0. One application is a servlet based web application with a few html front-end screens. I am trying to determine how to do this in the best way. I have noticed that portlets behave differently inside the portal than do stand-alone web apps.
    Any help would be appreciated.
    The other web application is the Microsoft Outlook Web Access. (This one is going to be difficult because it is actually an ASP app. I dont know if this is possible without building some proxy code between bea and iis.)
    Thanks
    Ryan

  • Best way to implement this code in labview

    Hi
    What is the best way to implement this code in labview programming.
    I have an analogue input which triggers a boolean light when it reaches a certain voltage. but at the same time i would like it to enable two other outputs one for a set period of time and the other stay on until another statement becomes true.
    For example
    case 1:
    Set output high
    Delay(2000ms)
    Set output low
    Case 2:
    Set output high
    If statement 2 is true
    then set output low
    if not then repeat until statement is true
    Thanks for your help

    Hi David,
    The code you posted will work, although note that the front panel becomes 'unresponsive' - as changes in the controls are only read once per iteration.  The wait function is an example of an execution timing VI, however if we want to do software timing (like a 2 hour wait) - we should use software timing VIs.
    Check out the following example (note we can stop execution during run-time):
    Regards,
    Peter D
    Attachments:
    SoftwareTiming.vi ‏26 KB

  • Best way to implement FREE purchase?

    Hi,
    I have purchases that are free for logged in users. What's the best way to implement a no-payment solution? Would having the payment fields hidden and COD auto-selected if the amount field is 0.00 a good idea? Is there a better way anyone has implemented this? And how about security flaws?

    Hi
    Here is an article about "no cost" orders.
    The COD payment method will not work for zero-value orders; you will need to use the Free payment method described in this article: http://kb.worldsecuresystems.com/893/bc_893.html
    Hope this helps!

  • What is the best way to implement Carousel i.e. web part in a site page on office 365 site?

    We can implement the Carousel web part in many ways like content By Query Web Part , jquery (nivo) plugin or content search web part etc. But among these which one is the best way to implement to get best performance of the page?
    Thanks

    The content search web part generally provides the best performance because it uses search. The only consideration is that it does not display changes immediately; you need to wait for the incremental crawl to happen.
    My Blog - http://www.sharepoint-journey.com

  • Best way to implement service in 3-tier webarchitecture

    Hi,
    What would be the best way to implement a service in the following 3-tier TopLink architecture: no ejb and a webclient (jsp/servlets)?
    Currently I have a server session type and a service pojo using a singleton SessionFactory. The service pojo is used by the jsp/servlets.
    Should the service pojo be a singleton? Should a SessionManager be used?
    In the examples on OTN a clientsession is directly acquired in the jsp/servlets.
    Thanks,
    Ronald

    Ronald,
    There are numerous ways to do this. I would recommend using the SessionFactory. The SessionFactory makes use of the SessionManager, which holds onto the singleton Server session. You can then acquire client sessions using the oracle.toplink.sessions.Session interface from the SessionFactory. There is no need to hold the SessionFactory in a singleton yourself and you do not need to reference the ClientSession directly either.
    We have introduced this latest approach to simplify coding. It will work in both the web architecture with JSP/Servlets accessing the sessions as well as within an EJB tier (Session/Message Driven beans).
    Doug

  • What is the best way to implement writable many-to-many relations

    As everyone knows, the many-to-many associations provided by BC4J are quite good to read many-to-many associations, but they are not able to write such associations.
    There are other solutions using composite associations or cascading-delete foreign keys, but as I have now been taught, they are also quite problematic. See the thread "Internal error: Entity.afterRollback.status_dead   -- What does this mean?" for more details.
    So I ask you: What is the best way to implement writable many to many associations? Do you really have to manage them "by hand"?
    Thanks for your ideas
    Frank

    I'd appreciate any hint
    Thanks

  • Event Structures - best way to implement this UI?

    I am trying to write a VI to control & read data from 4 different "channels" (each measuring a DUT) at once.  I have written all the VI's for initializing instruments, communicating with the devices (VISA, GPIB), setting bias, reading data, etc...that has all completed. I just need to write the overall program with the user interface to allow the user to control these 4 channels & display the measured data.....as it turns out, this is the tricky part! My head is spinning from trying to figure out how to handle all the possible events.
    Basically for each channel, I want the user to be able to
    -enable/disable it  for measurement (e.g. if  there is no device loaded in Ch.3, we don't want to measure Ch.3..maybe disable/grey everything)
    -set bias conditions (only if channel enabled). Allow user to change bias "in real-time" by increment/decrementing (e.g. incrementing from 5.00 V to 5.01 V, for example).
    -turn biasing on/off (again, only if channel is enabled)
    Also,  I want each channel to display its measured data (e.g current, temperature reading)..every second or so. No graphs or anything fancy (for now! ), just numeric indicator. 
    Honestly, this all sounds so simple, but I'm having trouble figuring out the best way to implement it, because 1) there are multiple channels that need to be monitored for events and 2) a large number of user events could occur (at least 4 per channel, it seems: enabling/disabling, turning bias on/off, incrementing/decrementing bias values, etc.). Also, if a channel IS enabled, I want to be continuously reading/displaying the data. What is the best way to handle this? Should I have 4 separate while loops, each with an event structure to handle events for that particular channel, or will that give me grief somewhere?
    Also, I have another nagging question. Pretty much all the examples I see re: Event Structures and booleans involve latched booleans, e.g. buttons that are just pressed once and pop back up, such as buttons you press to tell it to complete a task (e.g. "Acquire Data" or "Stop"); once pressed, it's over and reset. In my case, some of the booleans would not be latched, e.g. the "Enable Ch.2" button would be "TRUE" as long as I want Ch. 2 to be read. Does that make sense? Then, say hours later, if I did want to disable that channel, I would change it to "FALSE", and while that would be a "value change", the new value would not be "TRUE". So I'm not sure whether that would be dealt with the same way in an Event Structure.
    Hope this all makes sense and many thanks in advance for any help!!!

    You're halfway there. I'd say the best solution is a producer/consumer structure, in which the event structure is used to generate queued commands for the consumer loop.
    All data is handled in the consumer loop, where you, among other things, have an array of clusters of channel/instrument settings. (I usually have several clusters: one for test data, one for instrument settings, one for general settings and so on.)
    The event structure can have a 1 sec timeout, in which you queue up a Measure command.
    In the consumer, in the measure state you loop through your instruments and if enabled you measure them and update their indicators.
    The general (smart) way to setup the queue is with a cluster containing 2 elements, a typedef'd Command and a variant.
    This way you can send data with the command in any form which is then interpreted in the consumer.
    If, e.g. you press the Enable button on a channel, you can enqueue Enable and the channel number.
    /Y
    LabVIEW 8.2 - 2014
    "Only dead fish swim downstream" - "My life for Kudos!" - "Dumb people repeat old mistakes - smart ones create new ones."
    G# - Free award winning reference based OOP for LV
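
For readers who think in text languages, the producer/consumer layout described above maps roughly onto a blocking queue of (command, payload) messages. The Java sketch below is an analogy only, not LabVIEW code: the names `ChannelConsumer`, `Command` and `Message` are made up, the payload is reduced to a channel number instead of a variant, and the "measurement" is a placeholder log entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

final class ChannelConsumer {
    enum Command { ENABLE, DISABLE, MEASURE, STOP }

    // One queued message: a command plus a payload (here just a channel number).
    static final class Message {
        final Command command;
        final int channel; // ignored for MEASURE and STOP
        Message(Command command, int channel) {
            this.command = command;
            this.channel = channel;
        }
    }

    // Consumer loop: owns all channel state and handles one message at a time until STOP.
    static List<String> run(BlockingQueue<Message> queue, int channelCount) {
        boolean[] enabled = new boolean[channelCount];
        List<String> log = new ArrayList<>();
        try {
            while (true) {
                Message m = queue.take(); // blocks until the producer enqueues something
                switch (m.command) {
                    case ENABLE:
                        enabled[m.channel] = true;
                        break;
                    case DISABLE:
                        enabled[m.channel] = false;
                        break;
                    case MEASURE:
                        for (int ch = 0; ch < channelCount; ch++) {
                            if (enabled[ch]) {
                                log.add("measure ch" + ch); // placeholder for a real instrument read
                            }
                        }
                        break;
                    case STOP:
                        return log;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return log;
        }
    }
}
```

The producer side (the event structure in LabVIEW) would enqueue ENABLE or DISABLE when a button changes value, whether it changes to true or to false, and enqueue MEASURE from a one-second timer. Because only the consumer touches the channel state, four separate loops aren't needed.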
