Clustering only a subset of documents

I am using unsupervised classification with CTX_CLS.CLUSTERING. Our document table has several million rows, and the user might only want to cluster a few thousand of them. Because this method of clustering uses the text index on the documents table, is it possible to cluster only those few thousand rows without copying the text to another table and creating a separate text index for the clustering? When the user wants to cluster 10-20K of content, copying and re-indexing takes too much time and too many resources.
Here is the example given in the documentation:
/* set the preference */
exec ctx_ddl.drop_preference('my_cluster');
exec ctx_ddl.create_preference('my_cluster','KMEAN_CLUSTERING');
exec ctx_ddl.set_attribute('my_cluster','CLUSTER_NUM','3');
/* do the clustering */
exec ctx_output.start_log('my_log');
exec ctx_cls.clustering('collectionx','id','restab','clusters','my_cluster');
exec ctx_output.end_log;
Thanks

Sorry, I missed this first time round.
There is no direct way to create a cluster on a subset of the table.
However, although you have to create an index to base the clustering on, you do NOT need to populate that index. As in the example at
http://docs.oracle.com/cd/B28359_01/text.111/b28303/classify.htm#i1007174
create index collectionx on collection(text)
   indextype is ctxsys.context parameters('nopopulate');
So this is effectively instant. But you still need to copy the subset of the data. Can we avoid that?
Yes, it seems we can. The CONTEXT index is really there just to tell clustering how to get and process the data. If we use a USER_DATASTORE procedure for the index, we can specify which rows should be processed and which should not. I've adapted the example from the doc to add a column to the table called USE_THIS_ROW. If set to 1, then the row is used. If set to 0, then it isn't. And you can see that the clustering results do not include any rows with USE_THIS_ROW = 0.
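On a real table, the flag has to be set for whichever documents the user picked before the clustering call runs. A minimal sketch of that step, assuming a hypothetical table user_selection(id) that holds the chosen document ids (that table is not part of the original example):

```sql
-- Assumption: user_selection(id) is a hypothetical table listing the ids
-- of the documents the user wants to cluster.

-- Clear the flag left over from any previous clustering run.
update collection
   set use_this_row = 0
 where use_this_row = 1;

-- Flag only the rows in the current selection.
update collection
   set use_this_row = 1
 where id in (select id from user_selection);

commit;
```

Rows whose flag is NULL are treated like 0 by the datastore procedure below, because only the value 1 passes its test. Note that a single flag column is shared by everyone, so this sketch assumes only one clustering request runs at a time.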
-- Clustering example from the docs, adapted to use a user_datastore to decide which rows to process
/* collect document into a table */
drop table collection;
create table collection (id number primary key, text varchar2(4000), use_this_row number);
insert into collection values (1, 'Oracle Text can index any document or textual content.', 1);
insert into collection values (2, 'Ultra Search uses a crawler to access documents.', 0);
insert into collection values (3, 'XML is a tag-based markup language.', 1);
insert into collection values (4, 'Oracle Database 11g XML DB treats XML as a native datatype in the database.', 1);
insert into collection values (5, 'There are three Text index types to cover all text search needs.', 0);
insert into collection values (6, 'Ultra Search also provides API for content management solutions.', 1);
create or replace procedure my_proc
     (rid in rowid, tlob in out nocopy clob) is
begin
     -- this "for loop" will only execute once but it's easier this way than declaring a
     -- separate cursor
     for c in ( select text, use_this_row from collection
                where rowid = rid ) loop
          if c.use_this_row = 1 then
                tlob := c.text;
          else
                tlob := '';
          end if;
     end loop;
end;
/
show errors
exec ctx_ddl.drop_preference('my_datastore')
exec ctx_ddl.create_preference('my_datastore', 'user_datastore')
exec ctx_ddl.set_attribute('my_datastore', 'procedure', 'my_proc')
create index collectionx on collection(text)
   indextype is ctxsys.context parameters('datastore my_datastore nopopulate');
drop table restab;
/* prepare result tables, if you omit this step, procedure will create table automatically */
create table restab (      
       docid NUMBER,
       clusterid NUMBER,
       score NUMBER);
drop table clusters;
create table clusters (
       clusterid NUMBER,
       descript varchar2(4000),
       label varchar2(200),
       sze   number,
       quality_score number,
       parent number);
/* set the preference */
exec ctx_ddl.drop_preference('my_cluster');
exec ctx_ddl.create_preference('my_cluster','KMEAN_CLUSTERING');
exec ctx_ddl.set_attribute('my_cluster','CLUSTER_NUM','3');
/* do the clustering */
exec ctx_output.start_log('my_log');
exec ctx_cls.clustering('collectionx','id','restab','clusters','my_cluster');
exec ctx_output.end_log;
select docid, clusterid, score from restab order by clusterid, docid;
You could probably get the same results by using a FORMAT column (the "format column" index parameter), with some rows set to IGNORE. I've not tried that.
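For anyone who wants to try that alternative, here is a rough sketch of what it might look like. It assumes a new, hypothetical column named fmt, and it relies on the documented "format column" index parameter, whose values include TEXT and IGNORE. Whether CTX_CLS.CLUSTERING actually skips IGNORE rows is untested here, so treat this as a starting point rather than a verified solution.

```sql
-- Assumption: fmt is a hypothetical column added for this experiment.
alter table collection add (fmt varchar2(10));

-- Rows to cluster are marked TEXT; all others are marked IGNORE.
update collection
   set fmt = case when use_this_row = 1 then 'TEXT' else 'IGNORE' end;
commit;

-- Recreate the unpopulated index, this time pointing at the format column
-- instead of the user datastore.
drop index collectionx;
create index collectionx on collection(text)
   indextype is ctxsys.context
   parameters('format column fmt nopopulate');
```

After this, the same ctx_cls.clustering call as above can be rerun and restab checked for any docid whose row is marked IGNORE.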
