Remove words from file

Hello, I have created a class that is supposed to hold an array of stopwords (words that I want to remove from a file).
Then I want to create another class which reads in a file and looks through the array of stopwords. If it encounters a word in the file that is the same as a word in the array, it replaces it with nothing.
I want this class to be separate from the main class so that in the main class I can specify any file, and the stopwords will be removed. Here is some code, thanks:
import java.io.*;
import java.util.*;

public class StopWordArray {
    //This class represents an array of stopwords
    //The stopwords are read from a text file into an array
    private String [] tokens;

    //constructor
    public StopWordArray(){
        List<String> words = new ArrayList<String>();
        try{
            //open input stream
            File inputFile = new File("/home/myrmen/workspace/Chisquare/extract/stopWordList.txt");
            InputStream inStream = new FileInputStream(inputFile);
            InputStreamReader inreader = new InputStreamReader(inStream, "8859_1");
            BufferedReader reader = new BufferedReader(inreader);
            String line;
            //Collect the tokens of every line (not just the last line read)
            while ((line = reader.readLine()) != null) {
                StringTokenizer st1 = new StringTokenizer(line);
                while( st1.hasMoreTokens() ) {
                    words.add(st1.nextToken());
                }
            }
            //Close the reader once, after the whole file has been read
            reader.close();
        }
        catch (Exception ex) { ex.printStackTrace(); }
        //Fills array
        tokens = words.toArray(new String[0]);
    }

    //Prints every stopword, one per line (called from Main)
    public void printArray(){
        for (String word : tokens) {
            System.out.println(word);
        }
    }
}
import java.io.*;
import java.util.StringTokenizer;

public class RemoveStop {
    //Fields
    private StopWordArray myArray = new StopWordArray();
    private String myString;

    //Constructor
    public RemoveStop(String myString){
        this.myString = myString;
    }
}
import java.io.*;

public class Main {
    public static void main(String[] args) {
        StopWordArray theWords = new StopWordArray();
        theWords.printArray();
        String file = "/home/myrmen/workspace/Chisquare/extract/Turing";
        RemoveStop remove = new RemoveStop(file);
    }
}
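A minimal sketch of the separation described in the question, assuming the stopword file holds whitespace-separated words (as the StopWordArray constructor implies). StopWordFilter, filterText and filterFile are hypothetical names, not part of the code above. The stopwords go into a HashSet, so checking a token is one contains() call instead of a pass over an array, and the caller can pass in any file path.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical helper: loads the stopwords once, then filters any text or file.
class StopWordFilter {
    private final Set<String> stopWords = new HashSet<>();

    // Reads every whitespace-separated word from every line of the stopword file.
    StopWordFilter(Path stopWordFile) throws IOException {
        List<String> lines = Files.readAllLines(stopWordFile, StandardCharsets.ISO_8859_1);
        for (String line : lines) {
            addWords(line);
        }
    }

    // Alternative constructor for stopwords that are already in memory.
    StopWordFilter(String... words) {
        for (String w : words) {
            addWords(w);
        }
    }

    private void addWords(String text) {
        for (String w : text.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                stopWords.add(w);
            }
        }
    }

    // Returns the text with every stopword token dropped; kept tokens are joined by single spaces.
    String filterText(String text) {
        StringJoiner kept = new StringJoiner(" ");
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept.toString();
    }

    // Filters the contents of any file, read as ISO-8859-1 like the original code.
    String filterFile(Path file) throws IOException {
        String content = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
        return filterText(content);
    }
}
```

Main could then call something like new StopWordFilter(Paths.get("/home/myrmen/workspace/Chisquare/extract/stopWordList.txt")).filterFile(Paths.get(file)). Matching is case-sensitive, and the result is rejoined with single spaces, so the original line breaks are not preserved.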

OK, here is an update:
private String [] tokens;

//Method which returns each element in the array as a string
public String word(){
    String word = null;
    for(int i = 0; i < tokens.length; i++){
        word = tokens[i];
    }
    return word;
}

public void readFile(){
    String output = null;
    StringTokenizer st1 = new StringTokenizer(myString);
    while(st1.hasMoreTokens()){
        String element = st1.nextToken();
        //System.out.println(myArray.word());
        if(element.equals(myArray.word())){
            output = element.replaceAll(myArray.word(), "WAY");
        }
        //else{System.out.println("ERROR");
        System.out.println(output);
    }
}
But with the readFile() method, I only ever access the LAST element of the array. How can I iterate through the whole array and check the input file for every word stored in the array?
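The loop in readFile() compares each token with the single String returned by word(), so only one stopword is ever checked. A minimal sketch of the fix, assuming the stopwords are available as a String[] (StopWordArray would need a getter for that; RemoveStopSketch, isStopWord and removeStopWords are hypothetical names): for every token, scan the whole array and keep the token only if nothing matches. Note also that RemoveStop is constructed with the file path from Main, so the file's contents have to be read before they are tokenized.

```java
import java.util.StringTokenizer;

// Hypothetical helper showing how to check each token against the whole array.
class RemoveStopSketch {
    // Returns true if word equals any entry in stopWords (every entry is checked).
    static boolean isStopWord(String word, String[] stopWords) {
        for (String stop : stopWords) {
            if (word.equals(stop)) {
                return true;
            }
        }
        return false;
    }

    // Rebuilds the text, keeping only tokens that are not stopwords.
    static String removeStopWords(String text, String[] stopWords) {
        StringBuilder output = new StringBuilder();
        StringTokenizer st = new StringTokenizer(text);
        while (st.hasMoreTokens()) {
            String element = st.nextToken();
            if (!isStopWord(element, stopWords)) {
                if (output.length() > 0) {
                    output.append(' ');
                }
                output.append(element);
            }
        }
        return output.toString();
    }
}
```

For a long stopword list, loading the words into a HashSet and calling contains() avoids rescanning the array for every token.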

Similar Messages

  • [solved] Remove words from file using a list variable?????

    Is it possible with bash, to create a script that will scan a text file and remove words that are set in a list?
    I have a file with words like so:
    word1
    word2
    word3
    I would like to be able to set script to scan file and remove only the words (and spaces left) that are in a list,
    that is inside script. Is this possible???
    Last edited by orphius1970 (2010-04-22 12:05:45)

    Sorry, I am new to scripting and have NEVER used sed.
    Can you explain a little better?
    OK, I tried it and understand a little better. However,
    it left spaces behind. How do I fix that?
    Also, how do I get it to save the file with the changes?
    Sorry if I seem like such a noob.
    Last edited by orphius1970 (2010-04-22 11:27:18)
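The same idea can be sketched in Java, the language of the main question on this page. This is a sketch under stated assumptions, not a sed recipe: words are taken to be whitespace-separated (so "word1," with punctuation attached would not match), the remaining words on a line are rejoined with single spaces (which also removes the spaces left behind), lines emptied by the removal are dropped, and the result is saved over the original file. ListWordRemover and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical helper: removes a fixed set of words from a text file.
class ListWordRemover {
    // Drops every listed word from each line and rejoins the remaining words
    // with single spaces. Lines that were blank to begin with are kept; lines
    // that become empty because all their words were removed are dropped.
    static List<String> removeWords(List<String> lines, Set<String> words) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                result.add(line);
                continue;
            }
            StringJoiner kept = new StringJoiner(" ");
            for (String token : trimmed.split("\\s+")) {
                if (!words.contains(token)) {
                    kept.add(token);
                }
            }
            if (kept.length() > 0) {
                result.add(kept.toString());
            }
        }
        return result;
    }

    // Reads the file as UTF-8, removes the words, and saves the result over the original.
    static void removeWordsInPlace(Path file, Set<String> words) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        Files.write(file, removeWords(lines, words), StandardCharsets.UTF_8);
    }
}
```

Rejoining also discards leading indentation and repeated spaces on every line that contains words, so back up the file before rewriting it in place.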

  • Removing date from file name

    Hello,
    Is there any way to move all the files from one location to another and remove the date part from the file name?
    For example, if the file name is abc_20150411.xls, change it to abc.xls. If the file name does not contain a date part, then ignore it.
    Can anyone suggest whether this is possible?
    Thank You

    Hi ,
    I have created a package based on your question and described it in detail in the post below:
    https://msbitips.wordpress.com/2015/04/12/ssis-removing-datestamp-from-file-name-when-moving-from-one-location-to-other/
    If you need the package code, just add a comment in this thread.
    Thanks
    Prasad
    Mark this as Answer if it helps you to proceed on further.
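For comparison, the rename rule can be sketched in plain Java rather than SSIS. The sketch assumes the date part is always an underscore followed by eight digits (yyyyMMdd) directly before the extension, and it reads "ignore it" as moving undated files with their names unchanged; DateStripper, stripDate and moveAll are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: moves files and drops an "_yyyyMMdd" suffix from their names.
class DateStripper {
    // Matches "_" plus 8 digits immediately before the extension, e.g. "_20150411" in "abc_20150411.xls".
    private static final Pattern DATE_PART = Pattern.compile("_\\d{8}(?=\\.[^.]+$)");

    // Returns the name without its date part, or unchanged if it has none.
    static String stripDate(String fileName) {
        return DATE_PART.matcher(fileName).replaceFirst("");
    }

    // Moves every regular file from sourceDir to targetDir under its date-free name.
    static void moveAll(Path sourceDir, Path targetDir) throws IOException {
        // Collect first so the directory is not modified while it is being listed.
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(sourceDir)) {
            for (Path f : stream) {
                if (Files.isRegularFile(f)) {
                    files.add(f);
                }
            }
        }
        for (Path f : files) {
            Files.move(f, targetDir.resolve(stripDate(f.getFileName().toString())));
        }
    }
}
```

Files.move without REPLACE_EXISTING throws FileAlreadyExistsException if two dated files (say abc_20150411.xls and abc_20150412.xls) map to the same target name, so such a collision stops the loop with an error rather than silently overwriting a file.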

  • How to remove DRM from files

    I thought iTunes Match was supposed to remove DRM from files

    royfrommanhattan wrote:
    I thought iTunes Match was supposed to remove DRM from files
    It doesn't actually  "remove" anything.  But it does allow you to delete the DRM versions and download non-DRM versions, if available.
    Go to this document, and read the section on "Can I upgrade my previously purchased music to iTunes Plus?":
    http://support.apple.com/kb/HT1711

  • Remove word from spell_checker

    The built-in default spell checker (OS X 10.7.2) seems to think the word "Adrain" is correctly spelled. How can I remove this word from its dictionary? Note this is NOT a question about personal dictionary additions or learned words. I do not see an "Unlearn Spelling" option for this word. Also, I am UNIX savvy, so directions that involve manually editing files would be fine.

    I am running 10.6.8 and this screen shot shows where my LocalDictionary is located.
    It is Homefolder\Library\Spelling\LocalDictionary.
    Back when I was running Tiger, that file was called something else, and I used TextEdit to change it.
    I do not know if Snow Leopard works the same way.
    I just tested it, and I'd say that it works the same; that is, I edited it with TextEdit.
    Message was edited by: db24401

  • Remove words from Swype?

    Hello,
    Somehow I managed to get some words into Swype, and I want to remove them.
    How do I remove them from my phone's list of words?
    Thanks,
    VeePee
    Nokia 3395
    Nokia 6600
    Nokia N95 8GB
    Nokia N8 (Anna)
    Nokia Lumia 930

    Refer this ..
    ...and Some more tips here ..

  • Remove word from firefox dictionary

    In the dictionary I had installed, there are misspelled words. Originally, I thought I have added them accidentally, however, after going to my profile (%APPDATA%\Mozilla\Firefox\Profiles\as6zjsxj.default) and editing persdict.dat I found out that they come from the original dictionary, not my modifications.
    I just don't understand why there's no "Remove from dictionary" context menu item.

    You can use Add to Dictionary to add your own modifications to the spell checker.
    To modify entries in the built-in or in an installed dictionary you would have to edit the used dictionary file directly on disk, either in the dictionaries folder in the Firefox program folder or in the extensions folder in the Firefox profile folder.
    *http://kb.mozillazine.org/Dictionaries
    *http://kb.mozillazine.org/Spell_checking

  • [solved]remove backslashes from file names?

    Hi!
    I have a lot of files that contain backslashes in file name, spread over many many folders.
    How do I remove that character from all file names? The approaches / scripts / programs I usually work with can't handle the invalid file names (e.g. they lose the '\' and then 'can't stat()' -.-").
    Thanks!
    edit: Argh... now I've got copies of half of the files without the "\" in the same folders, because copying did work on most but deleting only on some... also multiple \\\ because I messed up one sed line. and "echo" failed to warn me about the result, just dropping a few \ while mv didn't... or something. So confused.
    Last edited by whoops (2011-02-22 00:22:58)

    sisco311 wrote:
    #!/usr/bin/env bash
    while IFS= read -rd '' file; do
    dirname="${file%/*}"
    basename="${file##*/}"
    mv --backup=numbered -- "${file}" "${dirname}/${basename//\\/_}"
    done < <(find /full/path/to/dir -depth -name '*\\*' -print0)
    Thanks, that did it!
    Don't know what exactly I've been doing wrong, but it seems not to happen any more since I try to do stuff with "while, read, ${}"  instead of "for, $(), echo, sed"
    pulce wrote: Not sure that's the point, but backslash is an escape
    Yes, I know, that was the problem. That's why those stupid things always keep either disappearing or multiplying - I guess I just suck at keeping track of what program handles them which way and depending on what.
    Last edited by whoops (2011-02-22 00:15:30)
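A Java counterpart of the accepted bash answer, sketched under the assumption that it runs on Linux or another Unix-like system where a backslash is an ordinary file-name character. As in the script, every backslash becomes an underscore and the deepest paths are renamed first (the reversed Files.walk order plays the role of find -depth). One deliberate difference: where mv --backup=numbered would keep a numbered backup of an existing target, this sketch leaves such files alone and returns them. BackslashRenamer, cleanName and renameAll are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper mirroring the bash answer above.
class BackslashRenamer {
    // Same mapping as ${basename//\\/_}: every backslash becomes an underscore.
    static String cleanName(String name) {
        return name.replace('\\', '_');
    }

    // Renames every file and directory under root whose name contains a backslash.
    // Returns the paths that were skipped because the cleaned name already existed.
    static List<Path> renameAll(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.collect(Collectors.toList());
        }
        // Files.walk lists a directory before its contents; reversing puts
        // every descendant before its ancestors, so renames never break pending paths.
        Collections.reverse(paths);
        List<Path> skipped = new ArrayList<>();
        for (Path p : paths) {
            Path name = p.getFileName();
            if (name == null || name.toString().indexOf('\\') < 0) {
                continue;
            }
            Path target = p.resolveSibling(cleanName(name.toString()));
            if (Files.exists(target)) {
                skipped.add(p);
            } else {
                Files.move(p, target);
            }
        }
        return skipped;
    }
}
```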

  • How to remove word from personal dictionary?

    I've searched online for this and it almost sounds like it can't be done in DW CS5, but that just seems crazy. I added a word to my dictionary that can be spelled two ways, and now, for consistency's sake, I want to remove one version. Can anyone tell me if this is possible and how?
    Thank you.

    Thanks, Nancy, but that's for older versions.   According to Adobe, "Dreamweaver does not provide a way of deleting entries that have been added to personal dictionaries." 
    Maybe I'm just stubborn, but I found it hard to believe that if you accidentally add a word, you can never remove it. What if you accidentally clicked to add something that wasn't spelled correctly? As I was replying to you, I poked around some more and found, in C:\Users\my name\AppData\LocalLow\Adobe\Linguistics\Dictionaries\Adobe Custom Dictionary\eng (I'm running Windows 7 64-bit, so your location may vary), a file called added.clam. There is also an added.txt and an exceptions.txt, but there's nothing in them.
    I copied added.clam and renamed it added.clam_Copy.txt. I opened it with Notepad and, sure enough, my word was in there! But there are other markings in there that I don't understand. I edited my word out, and when I ran spell check in DW it stopped on a word that I had left in my personal dictionary. I didn't have that many entries, so I went back, cleaned them all out, and saved a blank added.clam. Then I went back to DW, ran it again, and added the words I want.
    There might be a cleaner way to fix the problem, but that worked for me.  It seems crazy that Adobe doesn't allow you to edit the dictionary but if someone using CS5 is really stuck, this will work!

  • Regex: Remove Date from File Names Sitewide

    I'm trying to avoid hundreds of 301 redirects across my site. I have two kinds of links that need a regex search string which will simply remove numbers that precede text within HTML files.
    The existing .html files are named with text only (e.g., name.html), whereas the .jpg files have the date and text (e.g., 130415name.jpg). But the new html engine I am using creates files named like the .jpg files (e.g., 130415name.html).
    I can strip the numbers out of file names in Name Munger (Strip Characters > Characters: 0123456789).
    Now I need a regex search string that will strip the same characters within the html files.
    The changes would look like this:
    1.
    Before:
    <ul>
    <li class="previous"> <a class="paginationLinks detailText" href="../content/090914name.html">Previous</a> </li>
    <li class="index"> <a href="../index_4.html" class="detailLinks detailText">Index</a> </li>
    <li class="next"> <a class="paginationLinks detailText" href="../content/090125name.html">Next</a> </li>
    </ul>
    After:
    <ul>
    <li class="previous"> <a class="paginationLinks detailText" href="../content/name.html">Previous</a> </li>
    <li class="index"> <a href="../index_4.html" class="detailLinks detailText">Index</a> </li>
    <li class="next"> <a class="paginationLinks detailText" href="../content/name.html">Next</a> </li>
    </ul>
    2.
    Before:
    onclick="window.location.href='content/020922name.html'"
    After:
    onclick="window.location.href='content/name.html'"
    What is the regex search string for these examples (any numbers preceding any letters preceding .html)?

    Set the Search option to Source Code
    In the Find field:
    (content/)\d{6}(\w+\.html)
    In the Replace field:
    $1$2
    Select the "Use regular expression" checkbox.
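The same search-and-replace can be applied outside Dreamweaver, for example with java.util.regex; Java's replaceAll also uses $1 and $2 for the captured groups. A minimal sketch (LinkDateStripper and stripDates are hypothetical names) that, like the pattern above, assumes exactly six digits between content/ and the page name:

```java
import java.util.regex.Pattern;

// Hypothetical helper applying the Find/Replace pair from the answer above.
class LinkDateStripper {
    // Find: "content/" + six digits + word characters + ".html"
    private static final Pattern DATED_LINK = Pattern.compile("(content/)\\d{6}(\\w+\\.html)");

    // Replace: "$1$2" keeps "content/" and the page name, dropping the digits.
    static String stripDates(String html) {
        return DATED_LINK.matcher(html).replaceAll("$1$2");
    }
}
```

Links that do not start with content/ (such as ../index_4.html) are left untouched.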

  • HT2496 How do I remove words from my personal dictionary?

    I added a misspelled word to my dictionary using the "Add to Dictionary" menu option. I need to remove it now and can't figure out how.

    Stored in /Users/"username"/Library/Spelling/LocalDictionary
    Open it in TextEdit and delete the erroneous entry. Getting to the stupidly hidden Library folder is left as an exercise. I've permanently unhid it using this Terminal command:
    chflags nohidden ~/Library
    It might take a restart to take effect.
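If you prefer to script the edit instead of using TextEdit, here is a minimal Java sketch. It assumes LocalDictionary is a UTF-8 plain-text file with one word per line, which matches the advice above to open it in TextEdit and delete the entry; DictionaryEditor, withoutEntry and removeEntry are hypothetical names. Back up the file before rewriting it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for a one-word-per-line dictionary file.
class DictionaryEditor {
    // Returns the entries with every line exactly equal to word removed.
    static List<String> withoutEntry(List<String> entries, String word) {
        return entries.stream()
                .filter(e -> !e.equals(word))
                .collect(Collectors.toList());
    }

    // Rewrites the dictionary file without the given word.
    static void removeEntry(Path dictionary, String word) throws IOException {
        List<String> entries = Files.readAllLines(dictionary, StandardCharsets.UTF_8);
        Files.write(dictionary, withoutEntry(entries, word), StandardCharsets.UTF_8);
    }
}
```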

  • Delimited_hdr=no in Oracle 9i removes the header from the file

    I am converting the output of my reports into an Excel file. I am using desformat=delimiter and delimited_hdr=no. If I use this on my command line, I get output without a header, and if I use delimited_hdr=yes, a repeated header is displayed. How can I overcome this problem? I am using Oracle 9i.

    Hello All,
    I tried the resize procedure, but the corruption wasn't cleared.
    SQL> ALTER DATABASE DATAFILE 'E:\SC\SC12.1\DATABASES\ORACLECONFIG\SYS1NM45.ORA' RESIZE 246 M;
    Database altered.
    SQL> ALTER DATABASE DATAFILE 'E:\SC\SC12.1\DATABASES\ORACLECONFIG\SYS1NM45.ORA' RESIZE 245 M;
    Database altered.
    SQL> ALTER DATABASE DATAFILE 'E:\SC\SC12.1\DATABASES\ORACLECONFIG\SYS1NM45.ORA' RESIZE 244 M;
    ALTER DATABASE DATAFILE 'E:\SC\SC12.1\DATABASES\ORACLECONFIG\SYS1NM45.ORA' RESIZE 244 M
    ERROR at line 1: ORA-03297: file contains used data beyond requested RESIZE value
    SQL> ALTER DATABASE DATAFILE 'E:\SC\SC12.1\DATABAES\ORACLECONFIG\SYS1NM45.ORA' RESIZE 400 M;
    Database altered.
    Can you please let me know how to recover the corrupted block, or how to recover only the corrupted file?
    Thanks
    With Regards
    Hemant Joshi.

  • [SOLVED] Removing brackets from file - python/sed

    I wrote a small Python script to do some manipulation of columns. I'm quite inexperienced, and I stumbled upon a problem I didn't manage to solve:
    When the script is finished with the data manipulation, everything is stored as a list, which gives the following form when I output the data with print: ['2321321', '321321321', '55555']. I don't want the brackets and quotes (since I want to import the data into a spreadsheet). I've tried to make a loop to print out the list contents, but then I get everything in one column, which I don't want. (Something like: for l in line: print l )
    Is there a way to print lists without the brackets and quotes? I tried to solve it with sed, but my knowledge of that program is zero, and I had no success. My solution was to open the output in a text editor and use search and replace, which is very tedious work for 50 files.

    I think the quicker way is using the join() method:
    [andyr@roo ~]$ python
    Python 2.4.1 (#1, Apr 5 2005, 11:00:51)
    [GCC 3.4.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> items = ["one", "two", "three"]
    >>> allitems = " ".join(items)
    >>> print allitems
    one two three
    Of course, you could use "," or "\t" (tab), etc., to delimit the columns instead of the space used in the above example.
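The same approach in Java, the language of the main question on this page: String.join plays the role of Python's str.join, and joining each row with "\t" and ending it with a newline gives tab-separated output that a spreadsheet can import one value per column. RowPrinter and toTsv are hypothetical names.

```java
import java.util.List;

// Hypothetical helper: formats rows of values as tab-separated lines.
class RowPrinter {
    // Each row becomes one line; values are separated by tabs, with no brackets or quotes.
    static String toTsv(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            sb.append(String.join("\t", row)).append('\n');
        }
        return sb.toString();
    }
}
```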

  • THE MENU BAR CHANGED FROM THE WORD FIREFOX TO THE TRADITIONAL LIST OF WORDS FROM FILE TO HELP. CAN THIS BE CORRECTED?

    EVERYTHING ABOVE THE TABS DISAPPEARED, AND WHEN I CLICKED ON THE MENU BAR I GOT THE OLDER STYLE. VERSION 29.0.1

    1. Right-click an empty area of the tab bar and choose Customize.
    2. If you want to get rid of the menu bar (File…Help), click the Show / Hide Toolbars button and uncheck "Menu Bar".
    3. If you want to display the title bar that shows the web page title followed by the application title, click the Title Bar button in the lower left.
    4. Click Exit Customize when done.
    There is no Firefox button in Firefox 29. Use the ≡ Menu Button on the right side of the window.
    * Learn more about the design of the new Firefox

  • Searching for a word in a text file

    Hi there,
    I have written a search application which searches for words in simple text files.
    My file contains a list of words separated by "\n" (new line).
    I am using java.io.BufferedReader to read the file.
    I want to find a word in the file within a few milliseconds, but when my file contains more than about 2 lakh words (200,000), reading takes more than 5 seconds per search.
    Please suggest an efficient way to search for a word in the file so I can make the search fast.
    I have to run the search on a "TEXT VALUE CHANGED" event, so if the process takes even more than one second, it is not feasible for me.
    Thanks in advance.
    Timir Patel.

    Try this:
    import java.io.*;
    import java.util.*;

    public class Searcher {
        // Byte offsets of the lines in the file, ordered by the text of each line
        private static long[] indexes;

        private static class TempData {
            final String text;
            final long startsAt;

            TempData(String text, long startsAt) {
                this.text = text;
                this.startsAt = startsAt;
            }
        }

        /** Creates the index table. You should do this once, and then store the
            index table in a file. This method has high peak memory usage, but it
            is easy to optimize. */
        private static void buildIndex(RandomAccessFile file) throws IOException {
            List<TempData> temp = new ArrayList<>();
            String st;
            long p = file.getFilePointer();
            while ((st = file.readLine()) != null) {
                temp.add(new TempData(st, p));
                p = file.getFilePointer();
            }
            temp.sort(Comparator.comparing((TempData d) -> d.text));
            indexes = new long[temp.size()];
            for (int i = 0; i < indexes.length; i++) {
                indexes[i] = temp.get(i).startsAt;
            }
        }

        /** Returns the position at which text starts, or -1 if it is not found.
            Uses a standard binary search over the sorted offsets. */
        public static long find(String text, RandomAccessFile file) throws IOException {
            int low = 0;
            int high = indexes.length - 1;
            while (low <= high) {
                int mid = (low + high) >>> 1;
                file.seek(indexes[mid]);
                int cmpr = text.compareTo(file.readLine());
                if (cmpr == 0) {
                    return indexes[mid];
                } else if (cmpr > 0) {
                    low = mid + 1;
                } else {
                    high = mid - 1;
                }
            }
            return -1;
        }

        public static void main(String[] args) throws IOException {
            try (RandomAccessFile f = new RandomAccessFile(args[0], "r")) {
                buildIndex(f);
                for (int i = 1; i < args.length; i++) {
                    System.out.println("searching for \"" + args[i] + "\"");
                    System.out.println("found at: " + find(args[i], f));
                }
            }
        }
    }
    It should work, however I gave it less than five minutes testing.
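An alternative to the on-disk index, sketched under the assumption that the whole word list (about 200,000 words in the question) fits comfortably in memory: read the file once at startup into a TreeSet, then answer each text-changed event from memory. contains() gives exact matches, and subSet() returns every word that starts with what has been typed so far. WordIndex, load and startingWith are hypothetical names, and no timing figures are claimed here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;
import java.util.NavigableSet;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical in-memory word index for per-keystroke lookups.
class WordIndex {
    private final NavigableSet<String> words = new TreeSet<>();

    WordIndex(Collection<String> entries) {
        words.addAll(entries);
    }

    // Loads a newline-separated word file once; later lookups never touch the disk.
    static WordIndex load(Path file) throws IOException {
        return new WordIndex(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    // Exact match, O(log n).
    boolean contains(String word) {
        return words.contains(word);
    }

    // All words starting with prefix, in sorted order (a range view of the set).
    SortedSet<String> startingWith(String prefix) {
        return words.subSet(prefix, prefix + Character.MAX_VALUE);
    }
}
```

startingWith() returns a live view of the set; copy it (for example, into a new ArrayList) if the set may change while the result is in use.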
