Thesaurus

Dear All,
I have a program that does the following:
1. The user enters keyword(s).
2. The program crawls the website(s) to fetch the most relevant documents.
3. Another program extracts all the important words from those documents and calculates the term frequency and the document frequency.
I need to do the following:
From those extracted words I need to create a thesaurus table in the following form:
Index : Term
0 : Computer
0 : Computers
1 : Data
2 : Database
2 : Databases
Note: Similar terms should have the same index.
Can I use the term frequency/document frequency to come up with the index value?
Looking forward to your smart ideas!
Cheers,
--Vj

Here's a lightweight class to give you a basic thesaurus:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class Thesaurus {
  // index -> all words that share that index
  private final Map<Integer, List<String>> indexWords = new HashMap<>();
  // word -> its index
  private final Map<String, Integer> wordIndexes = new HashMap<>();
  public void addWord(int index, String word) {
    // Make sure word is not already in Thesaurus
    if (!wordIndexes.containsKey(word)) {
      // Place word in word list
      wordIndexes.put(word, index);
      // Maintain index of matching words
      List<String> words = indexWords.get(index);
      if (words == null) {
        words = new ArrayList<>();
        indexWords.put(index, words);
      }
      words.add(word);
    }
  }
  public List<String> getWordMatches(String word) {
    // Check if word is in Thesaurus
    if (wordIndexes.containsKey(word)) {
      Integer indexObject = wordIndexes.get(word);
      // Copy the list so that removing the query word leaves the Thesaurus intact
      List<String> result = new ArrayList<>(indexWords.get(indexObject));
      result.remove(word);
      return result;
    }
    return new ArrayList<>();
  }
  public static void main(String[] args) {
    Thesaurus t = new Thesaurus();
    t.addWord(0, "Computer");
    t.addWord(0, "Computers");
    t.addWord(1, "Data");
    t.addWord(2, "Database");
    t.addWord(2, "Databases");
    System.out.println(t.indexWords);
    System.out.println(t.wordIndexes);
    List<String> list = t.getWordMatches("Computer");
    System.out.println(list);
  }
}
Hope that helps...
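On the original question of where the index values come from: term frequency and document frequency describe how often a word occurs, not whether two words are similar, so on their own they are unlikely to put "Computer" and "Computers" under the same index. A minimal sketch of one alternative, assuming a deliberately naive normalization (lower-casing and dropping one trailing "s"), gives the same index to words that share a normalized form. The class name ThesaurusIndexer and the normalize rule are hypothetical and not from the posts above; a real stemmer would be a better choice than this rule.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ThesaurusIndexer {
  // Hypothetical, deliberately naive normalization: lower-case the word and
  // drop one trailing "s". A real stemmer would replace this method.
  public static String normalize(String word) {
    String w = word.toLowerCase();
    if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")) {
      return w.substring(0, w.length() - 1);
    }
    return w;
  }

  // Assigns indexes in order of first appearance; words whose normalized
  // forms are equal receive the same index.
  public static Map<String, Integer> assignIndexes(List<String> words) {
    Map<String, Integer> stemToIndex = new LinkedHashMap<>();
    Map<String, Integer> wordToIndex = new LinkedHashMap<>();
    for (String word : words) {
      String stem = normalize(word);
      Integer index = stemToIndex.get(stem);
      if (index == null) {
        index = stemToIndex.size();
        stemToIndex.put(stem, index);
      }
      wordToIndex.put(word, index);
    }
    return wordToIndex;
  }

  public static void main(String[] args) {
    List<String> words = new ArrayList<>();
    words.add("Computer");
    words.add("Computers");
    words.add("Data");
    words.add("Database");
    words.add("Databases");
    // Print in the "Index : Term" form used in the question
    for (Map.Entry<String, Integer> e : assignIndexes(words).entrySet()) {
      System.out.println(e.getValue() + " : " + e.getKey());
    }
  }
}
```

The resulting map could be fed straight into addWord(index, word) of the Thesaurus class. Term frequency and document frequency would still be useful for deciding which extracted words are worth adding at all.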

Similar Messages

  • Oracle Thesaurus and its usage

    Hi,
    I am new to Oracle Text indexes and the Oracle thesaurus, so my question may be naive.
    Can someone please explain how to create a thesaurus, add phrases to it, and link it to an existing Oracle Text index? How can the thesaurus then be used in a query?
    Thanks,
    VijayK

    Hi,
    Suppose I create a thesaurus with the following statements:
    BEGIN
    ctx_thes.create_thesaurus('test_col_1', FALSE);
    END;
    And then create phrases using
    BEGIN
         CTX_THES.CREATE_RELATION('test_col_1','corp','SYN','Corporation');
    EXCEPTION
         WHEN OTHERS THEN
              DBMS_OUTPUT.PUT_LINE('SQLERRM:'||SQLERRM||'~'||'SQLCODE:'||SQLCODE);
    END;
    And then use the following query,
    SELECT SCORE(1) score,id,col_1
    FROM tbl
    WHERE CONTAINS(col_1, 'SYN(corp)', 1) > 0
    ORDER BY SCORE(1) DESC;
    Although the column col_1 contains the value "Corporation", the query above still returns no rows after I create the relation. Am I missing a step between creating the relation and running the select query to "link" the thesaurus with the index?
    Or should I change my query?
    Thanks,
    VijayK

  • Writing Tools Dictionary Thesaurus Doesn't Work or Pop Up

    This has been driving me crazy for months. If I want to look up a word or get thesaurus recommendations, I right-click a word in Pages and the dictionary pop-up should appear. Except it doesn't, at least not all the time.
    I can't put my finger on when or why, but it's quirky and annoying. The beauty of the pop-up contextual menu is that it lets you get work done quickly. Sometimes I want to get a definition or perhaps choose a word from the thesaurus, but I can't.
    Even choosing the item from the drop-down menu does nothing. The computer just sits there.
    I could log on tomorrow and perhaps start a new document and it will work. This is so simple that it should work all the time.
    Does anyone else have this issue? What's the fix?

    Dictionary is an application which you can find in the Application folder.
    If you can't make it work the way I have described it you are doing something wrong.

  • How do I create an accessible PDF for Thesaurus with many chapters, from InDesign CS 5.5 and Acrobat

    Hi folks,
    I have redesigned a Thesaurus (controlled vocabulary for an Agency's archives) in InDesign CS 5.5. I am now preparing an accessible PDF from the many files (using a Book created in InDesign). The front cover, front matter and back cover are not part of the Book, to keep the page numbering simple.
    The Book includes two main sections, an Alphabetical display and a Hierarchical display of terms and their relations. I created chapters per alphabet listings, i.e. Alphabetical Display A, B, C, etc. So there are over 50 chapters, including cover, front matter, etc.
    I've successfully made the front cover and front matter PDFs after viewing videos here: http://tv.adobe.com/watch/accessibility-adobe/preparing-indesign-files-for-accessibility/ and downloading and using this recommended Action for Acrobat: InDesign CS5_5 Accessibility Touchup.sequ
    Several questions specific to this project don't seem to be addressed in the videos, however.
    First, I'd like to know if I can create an accessible PDF using the Book function > Export Book to PDF. Or do I need to make a PDF per chapter? The book has over 50 chapters (by alphabet, twice), so creating them one by one will take a lot more time, but I'll do it if that's the best practice.
    After creating the PDFs, if I use (in Acrobat): Create > Combine Files into PDF to make one full PDF (over 600 pages BTW), will the final PDF retain accessibility settings? Do I need to run the Accessibility Report again for the combined PDF?
    I used InTools.com Power Headers plugin to add a page header that automatically shows the new first term used per page. So, one chapter (with Chapter Title as H1) will have a different page header (which will be H2) per page, however the text flows through the whole chapter. I don't see where to add the page headers to the Article Window in InDesign. Do I add in this order: H1, H2, text (for whole chapter), H2, H2, H2, etc. Will I need to work on the PDF in Acrobat, where pages will be shown, in order to get the correct H2 with the correct text on the page? Am I missing something?
    Will I have any issues with Bookmarks that requires a specific workflow?
    I think that's about it, though I might run into more questions as I progress through the project.
    Thanks, Marilyn

    I understand why you need updated running headers in your book. To a sighted reader these serve as a guide to where you are and help you find things quickly.  In addition, if you are exporting your data to XML or HTML from the tagged PDF it would also be important to have these in the proper location. 
    But for accessibility purposes, it doesn't have to be there because the screen reader reads everything in linear order, line by line.  No one is looking at the page.  A user listening to the screen reader read the page is going to hear this heading, just before the actual word itself. So they will hear the first word on the page twice.  It's not the end of the world if it's there, but such headings are not necessary for accessibility unless they are not repetitive and contain information that is not otherwise available.
    So I would say, fine if you need them or want them there, it's just one word. 
    I think you should try exporting your book to PDF (or even just a chapter of the book) and look at the tags panel in Acrobat to see if you are getting the result you want.  I can't tell you exactly what you should do to get those results, you are using a plug-in I don't have. 
    I can tell you I didn't have to add the headers to any article at all, they just automatically export if the other articles in the file are added and you don't select the header style option "not for export as XML."
    You may not experience the same results with your plug-in, but I think it will probably work the same way. 
    Give it a try and best of luck.

  • How can I make the thesaurus work for spanish words?

    I have been using Pages for about 4 months now, and I usually use it in English. I am currently writing a document in Spanish, but I can't get the thesaurus, dictionary and Wikipedia lookup to work for Spanish words. I really need this to find synonyms.
    Any idea what I need to do? I don't want to switch all of Pages to Spanish, nor do I want the Spanish spell check (I know how to do that); I only need the thesaurus.

    Dear me, doesn't Español work?

  • What's the best thesaurus app for iPhone 4S?

    I'm looking for the best thesaurus app for my new iPhone 4S.
    The best I've seen listed so far is the Oxford English version for $24.99 Canadian. All the rest are Roget II or those A-Z thesauri, which I've never liked.
    I prefer Roget's International Thesaurus (remember that yellow paperback thesaurus from school with the kangaroo on the front?), which has all the words divided by subject in the front and all the words listed alphabetically in the back, subdivided by meaning, each with a number showing where it is in the front (e.g. GREEN noun greenness 44.1 verdigris 44.3 lawn 310.7 adjective inexperienced 414.17 new 841.7 etc.).
    I would like to find an app version of that (320,000 words and phrases in 1072 categories according to the back of the book) or the closest version.
    Is there such a thing or is the Oxford English version the closest I will come to it? I consider this one of the most important apps I will put on my phone. Even more important than Facebook. haha

    depends
    some people dock to
    sync with their computer
    some to charge
    some to use audio out
    the answer to your question really comes down to what you are trying to do

  • Unable to use the thesaurus in a relaxation template

    I am trying to get a query relaxation template to use the thesaurus but I can't get the syntax correct. Is it possible? If so, please can someone tell me where I'm going wrong?
    create table test_table(company_name varchar2(100));
    insert into test_table values ('Test Limited');
    insert into test_table values ('Test Ltd');
    create index idx_test on test_table(company_name) indextype is ctxsys.context;
    If my query looks like this:
    select company_name, score(1)
    from test_table
    where CONTAINS (company_NAME,
    '<query>
    <textquery lang="ENGLISH" grammar="CONTEXT">test ltd
    <progression>
    <seq><rewrite>transform((TOKENS, “{”, “}”, “ ”))</rewrite></seq>
    <seq><rewrite>transform((TOKENS, “!”, “%”, “ ”))</rewrite></seq>
    <seq><rewrite>transform((TOKENS, “${”, “}”, “ ”))</rewrite></seq>
    <seq><rewrite>transform((TOKENS, “SYN(”, “,legal_form)”, “ ”))</rewrite></seq>
    </progression>
    </textquery>
    <score datatype="INTEGER" algorithm="COUNT"/>
    </query>',1)>0;
    I get the matching record back
    COMPANY_NAME SCORE(1)
    Test Ltd 75
    But if I move the SYN line to the top like this:
    select company_name, score(1)
    from test_table
    where CONTAINS (company_NAME,
    '<query>
    <textquery lang="ENGLISH" grammar="CONTEXT">test ltd
    <progression>
    <seq><rewrite>transform((TOKENS, “SYN(”, “,legal_form)”, “ ”))</rewrite></seq>
    <seq><rewrite>transform((TOKENS, “{”, “}”, “ ”))</rewrite></seq>
    <seq><rewrite>transform((TOKENS, “!”, “%”, “ ”))</rewrite></seq>
    <seq><rewrite>transform((TOKENS, “${”, “}”, “ ”))</rewrite></seq>
    </progression>
    </textquery>
    <score datatype="INTEGER" algorithm="COUNT"/>
    </query>',1)>0;
    I get an error which I think means that the XML line is not valid:
    ORA-29902:error in executing ODCIIndexStart() routine
    ORA-20000: Oracle Text error:
    DRG-50901: text query parser syntax error on line 1, column 35
    What is the correct format for the line that will apply the thesaurus synonym between Limited and Ltd?

    There are a lot of things that work well individually, but not in combination with one another. It looks like something goes wrong when you try to combine transform with syn. One possible workaround is to use replace to do your own transformation. Please see the reproduction and solution below.
    SCOTT@10gXE> -- test environment:
    SCOTT@10gXE> create table test_table(company_name varchar2(100));
    Table created.
    SCOTT@10gXE> insert into test_table values ('Test Limited');
    1 row created.
    SCOTT@10gXE> insert into test_table values ('Test Ltd');
    1 row created.
    SCOTT@10gXE> create index idx_test on test_table(company_name) indextype is ctxsys.context;
    Index created.
    SCOTT@10gXE> EXEC CTX_THES.CREATE_THESAURUS ('legal_form')
    PL/SQL procedure successfully completed.
    SCOTT@10gXE> EXEC CTX_THES.CREATE_RELATION ('legal_form', 'Limited', 'SYN', 'Ltd')
    PL/SQL procedure successfully completed.
    SCOTT@10gXE> COLUMN company_name FORMAT A30
    SCOTT@10gXE> -- reproduction of problem:
    SCOTT@10gXE> select company_name, score(1)
      2  from test_table
      3  where CONTAINS (company_NAME,
      4  '<query>
      5  <textquery lang="ENGLISH" grammar="CONTEXT">test ltd
      6  <progression>
      7  <seq><rewrite>transform((TOKENS, “SYN(”, “,legal_form)”, “ ”))</rewrite></seq>
      8  <seq><rewrite>transform((TOKENS, “{”, “}”, “ ”))</rewrite></seq>
      9  <seq><rewrite>transform((TOKENS, “!”, “%”, “ ”))</rewrite></seq>
    10  <seq><rewrite>transform((TOKENS, “${”, “}”, “ ”))</rewrite></seq>
    11  </progression>
    12  </textquery>
    13  <score datatype="INTEGER" algorithm="COUNT"/>
    14  </query>',1)>0
    15  /
    select company_name, score(1)
    ERROR at line 1:
    ORA-29902: error in executing ODCIIndexStart() routine
    ORA-20000: Oracle Text error:
    DRG-50901: text query parser syntax error on line 1, column 7
    SCOTT@10gXE> -- possible workaround:
    SCOTT@10gXE> VARIABLE search_string VARCHAR2(30)
    SCOTT@10gXE> EXEC :search_string := 'test ltd'
    PL/SQL procedure successfully completed.
    SCOTT@10gXE> select company_name, score(1)
      2  from test_table
      3  where CONTAINS (company_NAME,
      4  '<query>
      5  <textquery lang="ENGLISH" grammar="CONTEXT">
      6  <progression>
      7  <seq>' || 'SYN(' || REPLACE(:search_string, ' ', ',legal_form) AND SYN(') || ',legal_form)' || '</seq>
      8  <seq>' || '{'    || REPLACE(:search_string, ' ', '} {')                 || '}'           || '</seq>
      9  <seq>' || '!'    || REPLACE(:search_string, ' ', '% !')                 || '%'           || '</seq>
    10  <seq>' || '${'   || REPLACE(:search_string, ' ', '} ${')                 || '}'           || '</seq>
    11  </progression>
    12  </textquery>
    13  <score datatype="INTEGER" algorithm="COUNT"/>
    14  </query>',1)>0
    15  /
    COMPANY_NAME                     SCORE(1)
    Test Limited                           75
    Test Ltd                               75
    SCOTT@10gXE>
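The same token wrapping that the REPLACE workaround does in SQL could also be done on the client before the query is sent. A minimal sketch in Java, assuming the search string holds plain whitespace-separated tokens with no characters that need escaping in XML or Oracle Text; the class name RelaxationQueryBuilder and its methods are hypothetical, and only the thesaurus name legal_form and the template layout come from the session above. It just builds the string that would be bound as the second argument of CONTAINS.

```java
public class RelaxationQueryBuilder {
  // Wraps every whitespace-separated token as prefix + token + suffix and
  // joins the wrapped tokens with the given separator.
  public static String wrapTokens(String search, String prefix, String suffix, String separator) {
    String[] tokens = search.trim().split("\\s+");
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < tokens.length; i++) {
      if (i > 0) {
        sb.append(separator);
      }
      sb.append(prefix).append(tokens[i]).append(suffix);
    }
    return sb.toString();
  }

  // Builds the query template with the same four <seq> steps as the
  // SQL workaround: thesaurus synonyms, exact tokens, soundex + wildcard, stems.
  public static String build(String search, String thesaurus) {
    String nl = "\n";
    return "<query>" + nl
        + "<textquery lang=\"ENGLISH\" grammar=\"CONTEXT\">" + nl
        + "<progression>" + nl
        + "<seq>" + wrapTokens(search, "SYN(", "," + thesaurus + ")", " AND ") + "</seq>" + nl
        + "<seq>" + wrapTokens(search, "{", "}", " ") + "</seq>" + nl
        + "<seq>" + wrapTokens(search, "!", "%", " ") + "</seq>" + nl
        + "<seq>" + wrapTokens(search, "${", "}", " ") + "</seq>" + nl
        + "</progression>" + nl
        + "</textquery>" + nl
        + "<score datatype=\"INTEGER\" algorithm=\"COUNT\"/>" + nl
        + "</query>";
  }

  public static void main(String[] args) {
    System.out.println(build("test ltd", "legal_form"));
  }
}
```

Unlike REPLACE on a single space, the split on "\\s+" collapses runs of whitespace, so a doubled space does not produce an empty token. The built string should still be passed to CONTAINS as a bind variable rather than concatenated into the SQL text.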

  • Special Characters in Thesaurus

    I have a requirement to be able to search on all variations of a name. I tried using the RT tag in a thesaurus, which worked for names without special characters.
    The names I need to search on include quotes and dashes. I have not been able to come up with the right combination of escaping the search and populating the thesaurus.
    Example: 'AB-AC 'AA
    How would I escape this in the query?
    How do I enter this in the thesaurus? Do I include the special characters?
    Thanks

    Check this
    http://download-east.oracle.com/docs/cd/B10501_01/text.920/a96518/cqspcl.htm#1360

  • Microsoft Word 2008 - Tool icons such as Dictionary/Thesaurus

    Hi guys, I have a small problem with this: the only way I can display these tools (temporarily) is by clicking the double right arrow on the screen, the kind that appears when there isn't enough space, I think. There are gaps between centre, justify, font size, etc., so how would I go about inserting the dictionary and thesaurus icons in the toolbar? View - Toolbar doesn't seem to help! Incidentally, I personally think the 2004 version was better. Thank you all.

  • Thesaurus Entries are not working 3.1.2

    Hi,
    I am getting the error below while promoting content:
    22.01.2014 12:51:04.920 *ERROR* [127.0.0.1 [1390423864892] POST /ifcr/system/endeca/mdexPublisher HTTP/1.1] com.endeca.ifcr.publish.impl.MdexPublishServlet Failed to publish for site Discover com.endeca.ifcr.publish.impl.PublishException: Expected node /sites/Discover/content to exist
        at com.endeca.ifcr.publish.impl.PublishService.getContentOfType(PublishService.java:413)
        at com.endeca.ifcr.publish.impl.PublishService.publishRulesFromRepo(PublishService.java:448)
        at com.endeca.ifcr.publish.impl.PublishService.publishAllRules(PublishService.java:326)
        at com.endeca.ifcr.publish.impl.MdexPublishServlet.doPost(MdexPublishServlet.java:96)
    1. Added a multiway thesaurus entry in Workbench
    2. Ran promote-content.sh
    3. Verified the content through http://localhost:8006/ifcr/system/endeca/promotionStatus?site=Discover
    4. Got the error: {"ERROR":"The site \"Discover\" does not exist or has no connected clients"}
    5. Verified that the reference app is not returning the expected results (sd card = memory card or sd = memory)
    Please help me to resolve this issue. :)

    There are some under the hood metadata changes that aren't backward compatible (i.e. they don't exist or use a different format), so you may need to resync the catalog with the files

  • Thesaurus in advanced search

    We are writing a custom search using the advanced search but want to use the thesaurus with it. I know it only works with the banner search, but is there a way to include it in advanced search?
    Ray

    Yes. Add the URL parameter in_hi_req_thesaurus=1 to the end of each advanced search URL that should use the thesaurus (you should be able to try this in your browser to verify that it works). Or, if you want every advanced search query to use the thesaurus automatically, write an advanced search PEI that sets the value of PT_SEARCHSETTING_THESAURUS to Boolean.TRUE.
    Craig
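If the advanced search URLs are generated in code, appending the parameter can be scripted. A minimal sketch in Java, assuming the URL is available as a plain string; the class name ThesaurusUrl, the method withThesaurus and the example host are hypothetical, and only the parameter name in_hi_req_thesaurus comes from the reply above.

```java
public class ThesaurusUrl {
  private static final String PARAM_NAME = "in_hi_req_thesaurus";

  // Returns the URL with in_hi_req_thesaurus=1 added to its query string.
  public static String withThesaurus(String url) {
    // Leave the URL alone if the parameter is already present.
    if (url.contains(PARAM_NAME + "=")) {
      return url;
    }
    // Keep any fragment ("#...") at the very end of the URL.
    String fragment = "";
    int hash = url.indexOf('#');
    if (hash >= 0) {
      fragment = url.substring(hash);
      url = url.substring(0, hash);
    }
    // Pick the separator that keeps the query string well-formed.
    String separator;
    if (!url.contains("?")) {
      separator = "?";
    } else if (url.endsWith("?") || url.endsWith("&")) {
      separator = "";
    } else {
      separator = "&";
    }
    return url + separator + PARAM_NAME + "=1" + fragment;
  }

  public static void main(String[] args) {
    // Hypothetical advanced search URL, for illustration only
    System.out.println(withThesaurus("http://portal.example.com/search?q=limited"));
  }
}
```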

  • How to use multiple thesauri in an application

    We want to work with two thesauri, one in Spanish and one in Valencian, and search both of them from an application developed with Developer and Oracle 8.1.7.
    How should the nls_lang parameter be configured?

    Please post in English so other readers can learn. The NLS_LANG parameter has three components: language, territory, and character set. For English (from a UNIX shell) it would be:
    $ setenv NLS_LANG American_America.WE8ISO8859P1
    Please refer to the Oracle National Language Support Guide, where you will find the settings for Spanish.

  • Thesaurus w/ terms in table and selecting hierarchically

    Hello all,
    I am developing a web application where users must be able to search within a thesaurus and select from a list of thesaurus terms. I tried to set up a prototype similar to the one found in:
    Oracle Text - Knowlegde base - Use of ABOUT
    1. I use a hierarchical query to select the terms I want. Using LIKE I can also obtain the terms which match the search criteria only partially. But there is a problem: The hierarchical query which uses the CONNECT BY and START WITH clause returns the same terms more than just once. Should I resolve this using DISTINCT or is there a more elegant way?
    2. Another question: Is this structure in the above thread suitable for using multiple relationships (NT, BT, SYN, etc.)?
    Thanks in advance,
    Martin

    Hi,
    We have encountered these sorts of issues too.
    200000 records isn't a lot; collecting stats should be possible.
    Which version of Oracle are you on?
    I'm going to investigate using dynamic sampling; the default sampling level in 10gR2 is 2, but I'm looking at using one of the higher levels (= sample more blocks, as I understand it).
    Cheers,
    Colin

  • Can't Download English Dictionary and Thesaurus After Payment

    I paid for this application, English Dictionary and Thesaurus, and $4.50 was deducted from my credit card. Immediately after payment, App World displayed a message that it was unable to authenticate my credit card. However, I was amazed when I received an alert from my bank confirming payment of $4.50 to APP WORLD ORDERFIND.COM, and I logged into my internet bank account and confirmed that.
    I have been trying to download the application, all to no avail. I would like the support team to help me sort this out so that I can download the application or have my money refunded.
    I have documents to back up my claim.
    Best Regards
    Utibe
    EDIT: Removed personal information to comply with Community Guidelines and Terms and Conditions of Use.

    See
    After Effects CS6 and Premiere Pro CS6 not available for installation for 32-bit systems
    http://forums.adobe.com/thread/1002746
    http://www.adobe.com/au/products/creativesuite/faq.html#64bit-support

  • How to list phrases defined in thesaurus

    Hi !
    I'm currently writing a front application (Windows), so that the
    user is able to maintain a thesaurus. Up to now I only give the
    user the option to create BT/NT, SYN and TR phrases.
    I want to display all currently defined phrases in the
    thesaurus, so that no longer needed or wrong phrases can be
    deleted again. There is no predefined view to display this
    information.
    I could use CTXSYS.DR$THS*-tables and CTX_THES.HasRelation to
    get this working. I'm just asking if there is a better way.
    Thanks for any hints,
    Stefan.

    I found the answer myself. I'm using a select on
    CTX_USER_THES_PHRASES and CTX_THES.HAS_RELATION() (this function
    is not documented) and BT, NT, TR and SYN() functions.
    I don't know how to create a view that would have the columns
    PHRASE, RELATION, TERM.
    Stefan.
    P.S.: You could also use interMedia Text Manager which can be
    found in the group "Extended Administration". But this requires
    a full client installation on all clients. This is a nono in my
    case.

  • Endeca Thesaurus entries not working

    I am using IAP Workbench 2.1.1 in a Unix environment. The thesaurus entries went empty after a baseline update. I tried the emgr_update commands to push the backed-up configuration back into the Workbench, but that didn't work. I then replaced the THESAURUS.thesaurus.xml file in "/apps/endeca/Workbench/workspace/state/emanager/instances/Prodo/resources/" with the backup copy I had. All the existing one-way and two-way synonyms now work properly, but any new synonym I define from the IAP Workbench does not work. Deleting an entry is not reflected either. Please advise.

    The best way to verify that it's running is to go to the endeca-cmd directory and run the "endeca-cmd version" command. The endeca-cmd commands make calls to the control web service, so if the version command works, then everything is ok.
    By the way, you are using the correct syntax to get the control wsdl (assuming you are using the default 7770 port).
