Document Classification application

I am trying to build a document classification application and have read through Chapter 6 of the Oracle Text Application Developer’s Guide 11g Release 1. I currently have a document retrieval application that uses Oracle Text to search documents. Our documents are stored on a file system, not in the database. We created and maintain a CONTEXT index on a table column that contains the full path to each document. We use queries with the CONTAINS clause to do our searches, and everything works fine.
I have not had much success following the example in Chapter 6 while trying to build the classification application, so I have two questions:
1. The example in Chapter 6 under rule-based classification creates a table to store the documents to be classified. Is it possible to classify documents that are on a file system?
2. Must the documents be text documents? I have a mixture of pdf, doc, txt and xls files.
I would appreciate any help you can provide.

1. Yes, it is possible to classify documents that are on a file system. In your classification procedure, you will need to load each file into a temporary BLOB using dbms_lob.loadfromfile.
2. No, the documents do not have to be text documents. For any supported format, you can use ctx_doc.ifilter to convert the temporary BLOB to a temporary CLOB containing a plain-text version of the PDF (or other format), and then pass that CLOB to MATCHES.
Please see the example below that uses two pdf files. Both contain the phrase "fruit of the month". One has recipes that use bananas and the other has recipes that use cranberries. As you can see, this is a modification of the example in the online documentation. Note the significant changes to the classifier.this procedure.
SCOTT@orcl_11g> create table news_table (
  2           tk           number primary key not null,
  3           title     varchar2(1000),
  4           file_name varchar2(100));
Table created.
SCOTT@orcl_11g>
SCOTT@orcl_11g> insert into news_table values
  2    (1, 'test1', 'banana.pdf');
1 row created.
SCOTT@orcl_11g> insert into news_table values
  2    (2, 'test2', 'cranberry.pdf');
1 row created.
SCOTT@orcl_11g>
SCOTT@orcl_11g> create table news_categories (
  2            queryid  number primary key not null,
  3            category varchar2(100),
  4            query    varchar2(2000));
Table created.
SCOTT@orcl_11g>
SCOTT@orcl_11g> insert into news_categories values
  2    (10,'banana','banana');
1 row created.
SCOTT@orcl_11g> insert into news_categories values
  2    (20,'cranberry','cranberry');
1 row created.
SCOTT@orcl_11g> insert into news_categories values
  2    (30,'fruit','banana or cranberry');
1 row created.
SCOTT@orcl_11g> insert into news_categories values
  2    (40,'fruit of the month','fruit of the month');
1 row created.
SCOTT@orcl_11g>
SCOTT@orcl_11g>
SCOTT@orcl_11g> create table news_id_cat (
  2            tk number,
  3            category_id number);
Table created.
SCOTT@orcl_11g>
SCOTT@orcl_11g> create index news_cat_idx on news_categories(query)
  2  indextype is ctxsys.ctxrule;
Index created.
SCOTT@orcl_11g>
SCOTT@orcl_11g> create or replace directory my_dir
  2  as 'c:\oracle11g'
  3  /
Directory created.
SCOTT@orcl_11g>
SCOTT@orcl_11g> create or replace package classifier as
  2    procedure this;
  3  end classifier;
  4  /
Package created.
SCOTT@orcl_11g> show errors
No errors.
SCOTT@orcl_11g> create or replace package body classifier as
  2    procedure this
  3    is
  4        v_bfile       bfile;
  5        v_docblob     blob;
  6        v_document    clob;
  7    begin
  8        for doc in (select tk, file_name from news_table)
  9        loop
10          v_bfile := bfilename ('MY_DIR', doc.file_name);
11          dbms_lob.open (v_bfile);
12          dbms_lob.createtemporary (v_docblob, true, dbms_lob.session);
13          dbms_lob.loadfromfile
14            (v_docblob,
15             v_bfile,
16             dbms_lob.getlength (v_bfile));
17          dbms_lob.createtemporary (v_document, true, dbms_lob.session);
18          ctx_doc.ifilter (v_docblob, v_document);
19          for c in
20            (select queryid from news_categories
21             where  matches (query, v_document) > 0 )
22          loop
23            insert into news_id_cat values (doc.tk, c.queryid);
24          end loop;
25          dbms_lob.freetemporary (v_document);
26          dbms_lob.freetemporary (v_docblob);
27          dbms_lob.fileclose (v_bfile);
28        end loop;
29    end this;
30  end classifier;
31  /
Package body created.
SCOTT@orcl_11g> show errors
No errors.
SCOTT@orcl_11g>
SCOTT@orcl_11g> exec classifier.this
PL/SQL procedure successfully completed.
SCOTT@orcl_11g>
SCOTT@orcl_11g> select * from news_id_cat
  2  order  by 1, 2
  3  /
        TK CATEGORY_ID
         1          10
         1          30
         1          40
         2          20
         2          30
         2          40
6 rows selected.
SCOTT@orcl_11g>
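
If one of the files is missing or cannot be filtered, the procedure above stops partway and leaves the temporary LOBs and the open BFILE behind. The following is only a sketch of one way to guard against that, not a tested replacement. It assumes the tables, the news_cat_idx index and the MY_DIR directory object created in the session above; the procedure name classify_files and the helper release_lobs are hypothetical. The query at the end shows category names instead of ids.

```sql
-- Sketch only: standalone variant of classifier.this with cleanup on failure.
-- classify_files and release_lobs are hypothetical names, not part of the
-- original example.
create or replace procedure classify_files
is
  v_bfile    bfile;
  v_docblob  blob;
  v_document clob;

  -- Free whatever the current iteration still holds, then reset the locators.
  procedure release_lobs is
  begin
    if v_document is not null and dbms_lob.istemporary (v_document) = 1 then
      dbms_lob.freetemporary (v_document);
    end if;
    if v_docblob is not null and dbms_lob.istemporary (v_docblob) = 1 then
      dbms_lob.freetemporary (v_docblob);
    end if;
    if v_bfile is not null and dbms_lob.fileisopen (v_bfile) = 1 then
      dbms_lob.fileclose (v_bfile);
    end if;
    v_document := null;
    v_docblob  := null;
    v_bfile    := null;
  end release_lobs;
begin
  for doc in (select tk, file_name from news_table)
  loop
    begin
      -- Load the file from the directory object into a temporary BLOB.
      v_bfile := bfilename ('MY_DIR', doc.file_name);
      dbms_lob.fileopen (v_bfile, dbms_lob.file_readonly);
      dbms_lob.createtemporary (v_docblob, true, dbms_lob.session);
      dbms_lob.loadfromfile (v_docblob, v_bfile, dbms_lob.getlength (v_bfile));
      -- Convert the binary document to plain text for MATCHES.
      dbms_lob.createtemporary (v_document, true, dbms_lob.session);
      ctx_doc.ifilter (v_docblob, v_document);
      for c in
        (select queryid from news_categories
         where  matches (query, v_document) > 0)
      loop
        insert into news_id_cat values (doc.tk, c.queryid);
      end loop;
      release_lobs;
    exception
      when others then
        release_lobs;  -- clean up, then let the caller see the error
        raise;
    end;
  end loop;
end classify_files;
/

-- Empty news_id_cat first if classifier.this has already populated it,
-- otherwise every row appears twice.
delete from news_id_cat;
exec classify_files

-- Report file names and category names rather than ids.
select n.file_name, c.category
from   news_id_cat ic
join   news_table n      on n.tk = ic.tk
join   news_categories c on c.queryid = ic.category_id
order  by n.file_name, c.category;
```

Re-raising the error is a design choice; if you would rather skip bad files and carry on, log the file name in the exception handler instead of calling raise.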

Similar Messages

  • DOCUMENT CLASSIFICATION AND CTXRULE INDEX TYPES IN ORACLE9I TEXT

    Product: ORACLE SERVER
    Date written: 2004-05-27
    DOCUMENT CLASSIFICATION AND CTXRULE INDEX TYPES IN ORACLE9I TEXT
    ================================================================
    PURPOSE
    This note introduces document classification in Oracle Text, a new feature
    added in Oracle9i.
    Explanation
    In Oracle9i Text, you can classify documents by using the CTXRULE index type.
    A CTXRULE index is built from the rules (queries) defined for each category,
    and the MATCHES operator is provided to match a document against those rules.
    Oracle9i Text supports this document classification feature for plain text,
    XML, and HTML documents.
    1. Create a Table of Queries
    First, create a table to store the categories used for classification, as follows.
    CREATE TABLE myqueries (
         queryid NUMBER PRIMARY KEY,
         category VARCHAR2(30),
         query VARCHAR2(2000) );
    Store each category name together with the query expression used to match documents, as shown below.
    INSERT INTO myqueries VALUES(1, 'US Politics', 'democrat or republican');
    INSERT INTO myqueries VALUES(2, 'Music', 'ABOUT(music)');
    INSERT INTO myqueries VALUES(3, 'Soccer', 'ABOUT(soccer)');
    2. Create the CTXRULE Index
    Create a Text index on this table using the CTXRULE index type.
    CREATE INDEX myqueries_idx ON myqueries(query)
    INDEXTYPE IS CTXRULE
    PARAMETERS('lexer lexer_pref
    storage storage_pref');
    The filter, memory, datastore, and stoplist parameters cannot be specified for CTXRULE.
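
    The CREATE INDEX statement above names lexer_pref and storage_pref, but the note does not show how they were defined. As a rough sketch, assuming default BASIC_LEXER and BASIC_STORAGE preferences are acceptable (the preference types here are an assumption, not taken from the note), they could be created before the index like this:

    ```sql
    -- Sketch only: the note does not define these preferences, so the
    -- preference types below are assumptions. Requires execute privilege
    -- on CTX_DDL (for example through the CTXAPP role).
    BEGIN
      ctx_ddl.create_preference('lexer_pref', 'BASIC_LEXER');
      ctx_ddl.create_preference('storage_pref', 'BASIC_STORAGE');
    END;
    /
    ```

    If you have no special lexer or storage requirements, the PARAMETERS clause can be left out and the defaults are used.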
    3. Classifying a Document
    Create the table that stores the actual data, as follows.
    CREATE TABLE news (
         newsid NUMBER,
         author VARCHAR2(30),
         source VARCHAR2(30),
         article CLOB);
    Before data is inserted into this table, classify each document with the
    MATCHES operator. You can create a BEFORE INSERT trigger with MATCHES to route
    each document to another table, news_route, based on its classification:
    CREATE TRIGGER NEWS_MATCH_TRG
    BEFORE INSERT ON NEWS
    FOR EACH ROW
    BEGIN
      -- find matching queries
      FOR c1 IN (SELECT category
                 FROM myqueries
                 WHERE MATCHES(query, :new.article) > 0)
      LOOP
        -- record one row per matching category
        INSERT INTO news_route(newsid, category)
        VALUES (:new.newsid, c1.category);
      END LOOP;
    END;
    /
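
    The trigger writes to news_route, which the note never defines. A minimal sketch of that table, with a sample row to exercise the trigger, might look like the following. The column types are assumptions chosen to line up with news.newsid and myqueries.category, and the sample values are made up for illustration. The table has to exist before the trigger is created, or the trigger will not compile.

    ```sql
    -- Sketch only: news_route is referenced by the trigger but not defined in
    -- the note; these column types are assumptions.
    CREATE TABLE news_route (
         newsid   NUMBER,
         category VARCHAR2(30) );

    -- Hypothetical sample row; the trigger fires before the insert and
    -- writes one news_route row per matching category.
    INSERT INTO news (newsid, author, source, article)
    VALUES (1, 'J. Smith', 'wire', 'The democrat and republican candidates debated.');

    -- Inspect the routing result.
    SELECT newsid, category FROM news_route ORDER BY newsid, category;
    ```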
    RELATED DOCUMENTS
    Oracle9i Text Application Developer's Guide Release

    Hi,
    Please post Oracle Text questions in the Oracle Text forum: Text
    That forum is monitored by Oracle Text experts; this forum is not always.
    Herald ten Dam
    Superconsult.nl

  • Error message when I try to print: unsupported document-format "application/vnd.cups-command". I recently had my hard drive replaced in my early 2010 21" iMac. Can anyone help?

    Error message when I try to print: unsupported document-format"application/vnd.cups-command". I recently had my hard drive replaced in  my early 2010 21" iMac. Anyone know anything about this? Thanks in advance.

    What system are you running?  Have you tried going to the Canon website to download and install the latest drivers for that model?
    Are you running Time Machine?

  • Bookmarks not working in multi-document PDF application

    Why are bookmarks not working in a multi-document PDF application when it is uploaded to the web? We created a multi-document PDF application. All bookmarks and links work on our systems. All work on the customer's system. They do not work when uploaded to the customer's website. Their IT department says it's an issue of relative vs. absolute addressing. We've created applications like this for years without running into this issue. Can't find anything about it in Acrobat Help. Can anyone help or lead me to help? Thx much, Sandy

    Have you tried opening the file in your browser to see if the bookmarks are working?  To open the file in your browser you need to do this:
    1) Launch your favorite browser;
    2) File >> Open >> Select your pdf file
    See this picture to see what I am talking about.  You might need to click on it to magnify it:
    Good luck.

  • Initially I had downloaded Document 2 (Free) application to view my doc, ppt and xls files. I was not able to edit the files so there was an option for upgrade the Document-2 free to paid version. I have upgraded the Document 2 application.

    Initially I had downloaded the Document 2 (Free) application to view my doc, ppt and xls files. I was not able to edit the files, so I used the option to upgrade Document 2 from the free to the paid version. I have upgraded the Document 2 application, but on my iPad there are now two applications present: Document-2 (Free) and Document-2. I am not able to open any existing document using the upgraded version of the application. How do I connect all the existing txt, PPT and XLS documents to the new upgraded Document-2 application and then edit them on my iPad?

    As suggested, I deleted the free application and did a hard restart of the iPad. I have again copied the documents using iTunes, but I am not able to edit any document using this app. Document 2 (paid version) supports editing of txt/xls/ppt files. Was there a problem while loading the Document 2 app? If I reload it, do I need to purchase it again?

  • Looking for documents on application forms

    Hi all,
    I'm looking for some documents on application forms (transaction EFRM),
    ideally step-by-step documentation for changing a form.
    Thanks.

    Hi,
    1. Do you have a 1-to-1 relationship between a form class and an application form?
    No, it is not a 1-to-1 relationship.
    So if I have to make a new smartform/sapscript, do I have to make a new application form/application class?
    To create a smartform or script, you don't need any application form or application class.
    2.
    Where in the system can I assign a form to the class?
    If it is an Adobe form (Tcode SFP), you have to create an interface and a form, and assign the interface to the form. When you create the form, you have to provide the implementation name; check Tcode SFP.
    3.
    Does someone have a step-by-step tutorial on how to make changes?
    Regards
    Jana
    Are you going to change a smartform or a script? If it is a smartform or script, there is no need to assign it to an application form.

  • ArrayController in non-document based Applications

    I just tried to use the ArrayController in an app that is not document-based. I did everything as you are supposed to: I made a class called AppControll, set it as the File's Owner of the XIB file, connected the content array, and so on. There were no warnings or errors, the compiler said No Issues, and the build was successful. But before the window opened, the application crashed. This log was on the console:
    2011-05-17 16:28:45.652 MyApp[9637:903] An uncaught exception was raised
    2011-05-17 16:28:45.656 MyApp[9637:903] [<NSApplication 0x10011b240> valueForUndefinedKey:]: this class is not key value coding-compliant for the key boxArray.
    2011-05-17 16:28:45.792 MyApp[9637:903] *** Terminating app due to uncaught exception 'NSUnknownKeyException', reason: '[<NSApplication 0x10011b240> valueForUndefinedKey:]: this class is not key value coding-compliant for the key boxArray.'
    *** Call stack at first throw:
              0   CoreFoundation                      0x00007fff8059a7b4 __exceptionPreprocess + 180
              1   libobjc.A.dylib                     0x00007fff864c30f3 objc_exception_throw + 45
              2   CoreFoundation                      0x00007fff805f2969 -[NSException raise] + 9
              3   Foundation                          0x00007fff867abb8e -[NSObject(NSKeyValueCoding) valueForUndefinedKey:] + 245
              4   Foundation                          0x00007fff866db488 -[NSObject(NSKeyValueCoding) valueForKey:] + 420
              5   AppKit                              0x00007fff8810a384 -[NSApplication(NSScripting) valueForKey:] + 492
              6   Foundation                          0x00007fff866dedcc -[NSObject(NSKeyValueCoding) valueForKeyPath:] + 226
              7   AppKit                              0x00007fff87e9fb6f -[NSBinder valueForBinding:resolveMarkersToPlaceholders:] + 171
              8   AppKit                              0x00007fff88065f80 -[NSArrayDetailBinder _refreshDetailContentInBackground:] + 368
              9   AppKit                              0x00007fff87e92a33 -[NSObject(NSKeyValueBindingCreation) bind:toObject:withKeyPath:options:] + 557
              10  AppKit                              0x00007fff87e6f546 -[NSIBObjectData nibInstantiateWithOwner:topLevelObjects:] + 1172
              11  AppKit                              0x00007fff87e6d88d loadNib + 226
              12  AppKit                              0x00007fff87e6cd9a +[NSBundle(NSNibLoading) _loadNibFile:nameTable:withZone:ownerBundle:] + 248
              13  AppKit                              0x00007fff87e6cbd2 +[NSBundle(NSNibLoading) loadNibNamed:owner:] + 326
              14  AppKit                              0x00007fff87e6a153 NSApplicationMain + 279
              15  MyApp                            0x0000000100001302 main + 34
              16  MyApp                            0x00000001000012d4 start + 52
    terminate called after throwing an instance of 'NSException'
    The debugger tells me "Program received signal SIGABRT".
    If I remove the binding to the content array, the application starts as usual, but it doesn't work, because there is no content array.
    Then I tried something else. I made a new project, this time a document-based one, and did exactly the same as before. It worked the way it should, and the ArrayController worked fine. I don't think I made a mistake writing the name of the array, because I always copied and pasted the name into IB. So I would like to know whether it is possible to use an ArrayController in a non-document-based application. What do I have to do differently? The ArrayController reference does not mention this. It would be great if you could help me soon, thank you.

    No, but actually this was not my problem. It did help me figure out what the problem is in detail, though. If you use a document-based application, you can use the NSDocument subclass NSPersistentDocument. This automatically reads the model (if you use Core Data; I don't like Core Data because it's sometimes a bit strange and it's hard to hunt down bugs there, but I think the problem is basically the same) and creates a managedObjectContext, which is needed to run the NSArrayController. The big question now is how to use something similar to NSPersistentDocument that is a subclass of NSObject rather than of NSDocument.

  • My documents and application folders are no longer on the dock, where did they go?

    My documents and applications are no longer on the dock. Where did they go, and how do I put them back?

    Items in your Dock are actually aliases. The original items should be where they always were. The Applications folder is at the top level of your hard drive, and your Documents folder is in your home folder.
    To recreate those items in your Dock, open a Finder window (Finder > File menu > New Finder window). For your Documents folder, click on the icon on the left of the Finder window that has your username on it next to a house icon. Look for your Documents folder. Drag this folder into the Dock. That places an alias of it in the Dock (the original item remains where it was).
    To get so you can see the Applications folder, click it on the left side of the Finder window, then go to the Go menu and select Enclosing Folder. Now drag the Applications folder from here into the Dock.
    That should do it. Best of luck.

  • Sales Document Classification in SAP

    Hi,
    I want to Classify Sales Documents in SAP. For example, the Inquiries will get classified into the following:
    · A_Type-Project: Business Unit/ Divisional level risk review required.
    · B-Type-Project: Large, medium project, product, service business.
    · C-Type-Project: Standard product business, Product oriented project business (clear scope, predefined process steps, no risk)
    The further processing of the Inquiry will depend upon where it falls in this classification. Can any one help me with how to achieve this in standard SAP?
    SAP Menu > Cross application components > Classification system gives the option to create classification type, characteristics etc. I am trying to explore further on this. How would you attach a classification to a business object, say a Sales Order or Quotation?
    Thanks in advance,
    Karthika.

    SPRO->SAP Web Application Server->Business Management->SAP Business Workflow->Basic Settings->Maintain task classes.
    After that create substitute profile and assign it to classification.
    Thanks
    Arghadip

  • How to share documents among applications created in UCM

    Hi all,
    We have 5 different applications in UCM, each with its own set of values for fields like dDocType, DocumentType1, DocumentType2, etc.
    Now we have a requirement that these documents can be shared among applications. So if a document currently belongs to Application1 with dDocType: app1 and DocType1: Reports, it should also be available to Application2 with dDocType: app2 and DocType1: App2Report.
    1. One approach is to have application-specific metadata like App1DocType1, App1DocType2, App2DocType1, App2DocType2, ..., and each application can fill these
    fields, but this seems to result in a long list of metadata. Not a very likable solution.
    2. Another approach might be to use multiple values in fields like DocType1, but this will break the dependency list, I guess.
    ----- Help me out here, guys: how do you suggest I make documents available across multiple applications without any major impact on the app code?
    Thanks,
    Syed

    You could try to send the large audio attachment using Mail Drop.
    See this help page:   Mail (Yosemite): Add attachments
    Send large attachments using Mail Drop
    You can use Mail Drop to send files that exceed the maximum size allowed by the provider of your email account. Mail Drop uploads the large attachments to iCloud, where they’re encrypted and stored for up to 30 days.
    If you have an iCloud account and you’re signed in to iCloud when you click Send, Mail automatically sends the attachments using Mail Drop. Mail Drop attachments don’t count against your iCloud storage.
    If you don’t have an iCloud account, or if you’re not signed in, Mail asks you whether to use Mail Drop (select “Don’t ask again for this account” to always use Mail Drop).
    If a recipient uses Mail in OS X Yosemite, the attachments are automatically downloaded and included in your message just like any other attachment. For other recipients, your message includes links for downloading the attachments and their expiration date.
    You can turn Mail Drop on or off for an account. Choose Mail > Preferences, click Accounts, select your account, click Advanced, then select or deselect “Send large attachments with Mail Drop.”
    It seems not yet to work always reliably, but perhaps you are lucky.

  • Attach document from application server using BAPI

    Hi,
    I am trying to attach files to a Document Info Record using the BAPI
    BAPI_DOCUMENT_CHECKIN_REPLACE2. When I execute the Z program that calls this BAPI on the
    presentation server in the foreground, giving the path of documents located on the local PC, it works
    fine and checks the file in to the document info record.
    But when I run the program in the background and the file path is on the application server, it gives the error:
    Program no longer started via RFC. No return possible.
    I want to know the cause of this error.
    Please help.

    i think u need to pass Storage category  = FILESYSTEM.
    check the documentation of data element -
    DE CV_STORAGE_CAT
    but i'm not 100% sure abt it.
    regards
    Prabhu

  • How to include the file name in CV02N-change document in application field

    Hi
    Can anyone tell me what settings need to be made in the Document Management System so that I am able to change the application name? (Right now, an Excel file shows as an Excel file; is it possible to change that and give some other name or description instead?)
    Suitable points will be rewarded.
    Thanks & regards
    Karthik.

    Hi Karthikraj,
    Do you want to change only in CV02N?
    i.e. only the description
    In Originals area below what you see are:
    <b>Appl</b>    <b>Application</b>      <b>Storage Cat</b>   then <b>Lock symbol</b>
    If you want to change "Application", just select it and right-click; you will see Details. Click on it and change the description to whatever you want. That's it!
    If you want to change "Appl.", then you should follow the procedure I told you previously.
    Try both and you will get an idea (I have done both, and it works fine).
    Regards
    Rehman

  • Uploading and view an document in application server from abap

    Dear SDN users,
    I have a similar requirement:
    I need to upload a document into SAP under a particular system-generated unique number.
    My Basis team has given me a file path on the application server.
    So I need to upload the document and be able to view (not download) it at any time in the future.
    Note: each system-generated number has different documents.
    Thanks in advance.
    Regards
    RAJ
    Moderator Message: Do not dump your requirement. Get back to the forums in case you've any specific issues.
    Edited by: Suhas Saha on Jan 14, 2012 3:50 PM

    Dear Prakash,
    As I said, I have to upload and just view the documents.
    It's an urgent requirement.
    I want to upload multiple documents and I have to read them by file name.
    Note: currently it downloads only the last uploaded one.
    following is the code:
    DATA: V_DSN(40) VALUE '\usr\reports\fico\',
          V_STR(1673) TYPE C.
    FORM UPLOAD.
      " Read the file from the presentation server into ITAB.
      CALL FUNCTION 'GUI_UPLOAD'
        EXPORTING
          FILENAME   = L_FNAME
          FILETYPE   = 'BIN'
        IMPORTING
          FILELENGTH = LENGTH
        TABLES
          DATA_TAB   = ITAB.
      " Write ITAB to the application server path held in V_DSN.
      OPEN DATASET V_DSN FOR OUTPUT IN BINARY MODE.
      LOOP AT ITAB INTO V_STR.
        TRANSFER V_STR TO V_DSN LENGTH 1673.
      ENDLOOP.
      CLOSE DATASET V_DSN.
      IF SY-SUBRC EQ 0.
        MESSAGE S001(ZSD) WITH 'Success'.
      ENDIF.
    ENDFORM.
    FORM DOWMLOAD.
      CLEAR WA_DEMO.
      " Read the application server file back into ITAB.
      OPEN DATASET V_DSN FOR INPUT IN BINARY MODE.
      DO.
        READ DATASET V_DSN INTO ITAB-FIELD MAXIMUM LENGTH 1673.
        IF SY-SUBRC = 0.
          APPEND ITAB.
        ELSE.
          EXIT.
        ENDIF.
      ENDDO.
      CLOSE DATASET V_DSN.
      " Save ITAB to the presentation server as DWN_FILE.
      CALL FUNCTION 'GUI_DOWNLOAD'
        EXPORTING
          FILENAME     = DWN_FILE
          FILETYPE     = 'BIN'
          BIN_FILESIZE = LENGTH
        IMPORTING
          FILELENGTH   = LENGTHN
        TABLES
          DATA_TAB     = ITAB.
    ENDFORM.
    Regards
    MNR
    Edited by: mnr4sap on Jan 14, 2012 1:54 PM

  • How to open word document from application server?

    hello ,
    I want to open a word document that is placed on the application server.
    The function module 'ws_execute' works fine for displaying documents placed on your local file system. But if I want to open one from the application server, what needs to be done?
    I am referring to the following demo program: 'SAPRDEMOOFFICEINTEGRATION'.
    is this the correct reference or there is some other way to do the same?
    kindly help.
    Regards,
    Roshani

    Hi Roshani,
              The solution is like this.
    1> Execute - this is the solution to your program. The file is generated and put in the Application Server for the data to be sent by some means to the intended Place.
    2> To Display/View - This is for the person (who runs the program) to view if the output generated is correct or not.
             So to accomplish this use the same internal table for both Open Dataset and Gui download.
    Open dataset puts data onto the Application server and Gui_download puts the data onto the Desktop or Presentation server for the user to see the data thats fetched.
    So use them as directed to achieve the desired output.
    Reward Points.
    Thanks,
    Tej..

  • Accidentally put documents in Applications folder, can't get them out

    In Finder I accidentally moved a folder with two Word documents from the Desktop to the Applications folder. I can't move them back out. When I try, I get pointers to the documents in the Applications folder. I can delete the folder, but then the documents are gone.
    Suggestions for undoing this welcome. (I should have tried "Undo" early on.)

    Hold down the Command key when you drop it. You should see the badge disappear when you hold down the key. That indicates it will move as opposed to making an alias (default) and making a copy (Option key).
