Indexing and Searching .MHT files

Will Plumtree index the contents of an .MHT file the way it does a .DOC? I haven't figured out a way to get it to do so, but I would think it should be able to. All I can index is the title.
Thanks
Dana

Similar Messages

  • Indexing and Searching PDF Files

    Hi All,
    I am trying to store and search PDF files in the oracle database.
    I can insert and index the PDF files just fine but cannot get any result. I always get No Rows.
    Here's what I am doing and the issues I am facing.
    I created a Table with fields
    ID (VARCHAR)
    NAME (VARCHAR)
    DOC (BLOB)
    I inserted the PDF file in the BLOB field through a Java program, and the insert worked fine, as I verified by retrieving the PDF and writing it to a file.
    I created index using following SQL:
    create index my_index on PDF_TABLE(PDF_FLD) indextype is ctxsys.context
    parameters ('datastore ctxsys.default_datastore
    filter ctxsys.inso_filter');
    The index was created successfully without any problems.
    I ran query as follows and got no rows although the searched text is in PDF
    SELECT SCORE(1), PDF_FLD from PDF_TABLE WHERE CONTAINS (PDF_FLD, 'Table of Contents',
    1) > 0;
    I tried alternate queries as well with no luck.
    Any ideas?
    Thanks

    After creating the index, you need to perform the following checks.
    First, check that your index tables contain indexed terms. Execute:
    select token_text from dr$YOUR_INDEX$i;
    Second, check the index errors table CTX_INDEX_ERRORS. This is owned by the user CTXSYS, and most users do NOT have SELECT privilege on it by default.
    If both are OK, then check that your PDF documents are supported by the INSO filter.
    Citation:
    "PDF - Portable Document Format
    Acrobat Versions 2.1, 3.0, 4.0, and 5.0 including Japanese PDF"
    (Appendix B. Supported Document Formats in Oracle Text Reference 9.2)
    For Oracle 9i, you could install the 9.2.0.4 patchset (it includes INSO filter 7.5).
    P.S.
    To begin with, you can find answers to your questions about Oracle Text here:
    http://otn.oracle.com/products/text
    Sorry for my English.
    Best regards, Victor Zogin.
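The two checks in the answer above can be scripted. This is a minimal sketch, assuming a DB-API 2.0 connection whose driver accepts `:name` binds (python-oracledb does) and an index in your own schema. Instead of CTX_INDEX_ERRORS it reads CTX_USER_INDEX_ERRORS, the per-user counterpart that an index owner can normally query without extra grants. The function names are hypothetical.

```python
import re


def token_table(index_name: str) -> str:
    """Quoted name of the token table Oracle Text keeps for an index."""
    name = index_name.upper()
    # Keep to plain identifiers so the name is safe to splice into SQL.
    if not re.fullmatch(r"[A-Z][A-Z0-9_]*", name):
        raise ValueError(f"unexpected index name: {index_name!r}")
    return f'"DR${name}$I"'


def check_index(conn, index_name: str, limit: int = 20):
    """Return (sample of indexed tokens, indexing errors) for one index."""
    cur = conn.cursor()
    # Check 1: an empty token table means nothing was extracted from the documents.
    cur.execute(f"SELECT token_text FROM {token_table(index_name)}")
    tokens = [row[0] for row in cur.fetchmany(limit)]
    # Check 2: per-document errors recorded while the index was built.
    cur.execute(
        "SELECT err_textkey, err_text FROM ctx_user_index_errors "
        "WHERE err_index_name = :idx",
        {"idx": index_name.upper()},
    )
    errors = cur.fetchall()
    cur.close()
    return tokens, errors
```

With python-oracledb you would pass the connection returned by `oracledb.connect(...)`. An empty token list together with an empty error list suggests looking at the filter and the document format, as the answer says.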

  • Indexing and searching excel file

    hai friends,
    i need to index and search the records from the excel file using lucene java
    if u ve any code for that plz give me
    thank you in advance

    gimbal2 wrote:
    I'm not even going to try and tell you just how wrong your post is.
    But I will! ;-)
    OK, checking the items from How To Ask Questions The Smart Way (http://www.catb.org/~esr/faqs/smart-questions.html):
    - Write in clear, grammatical, correctly-spelled language (http://www.catb.org/~esr/faqs/smart-questions.html#writewell)
    - Be precise and informative about your problem (http://www.catb.org/~esr/faqs/smart-questions.html#beprecise) (especially the third item)
    - Be explicit about your question (http://www.catb.org/~esr/faqs/smart-questions.html#explicit)

  • Indexing and Searching pdf files which are used as attachments in an Announcement list item

    Hi all,
    I am using a SharePoint 2013 Online environment and trying to search for and find PDF files that are attached to an announcement list item. However, it does not find anything when I search for the name of the PDF file or the content of the PDF file.
    When I attach a Word document to the list item, it gets indexed and search finds the file.
    Thanks; I appreciate any kind of advice.

    Are you able to search for pdfs in other locations? SharePoint 2013 comes with an iFilter out of the box unlike 2010 which needed configuration.

  • InterMedia indexing and searching of zipped files

    Hello, I have interMedia successfully configured to index and query a repository of files (MS Word, Excel, PPT, PDFs, txt files) which are located on a file system. My issue is with zip files. I cannot successfully index and search zip files. I've tried zips that contain both ASCII (text) and formatted files (doc, ppt), but interMedia seems not to recognize this particular MIME type. Is there a way to have interMedia index and search zip files? Thanks in advance for any assistance.

    You will have more luck with this question if you post it in the Oracle Text forum. This forum is for interMedia (image, audio video).

  • Webinar: Understanding TREX Indexing and Search Options

    SAP NetWeaver Know-How Network Webinar:
    Understanding TREX Indexing and Search Options
    Wednesday 25 August 2004
    11 a.m. EDT
    On Wednesday 25 August, Larry Brambrut, an EP RIG Consultant, hosts the webinar titled Understanding TREX Indexing and Search Options as part of the ongoing SAP NetWeaver Know-How Network Webinar Series.
    Here’s how Larry describes his webinar presentation:
    “This session will describe the enhancements to "Search and Classification"(TREX) in NetWeaver '04 and EP 6.0 SP2 Patch 6. The session will include a discussion of the CM enhancements such as new crawlers, new search UI options and plug-ins, and TREX enhancements such as the new TREX architecture, delta indexing, and new TREX Admin Tool.”
    SDN invites you to post your questions to the presenter prior to the webinar and continue the online discussion afterward.
    How to Participate
    (Please go to the SDN webinar schedule page to find more information)
    Dial-in Information:
    Date: Wednesday 25 August 2004
    Time: 11 a.m. EDT
    Within the U.S., call: +1.888.428.4473
    Outside the U.S., call: +1.651.291.0618
    Password: NetWeaver04
    WebEx Information:
    Topic: SAP NetWeaver Know-How Network
    Date: Wednesday 25 August 2004
    Time: 11 a.m. EDT
    Meeting Number: 742391500
    Meeting Password: netweaver04 (lowercase)
    WebEx Link: sap.webex.com
    Replay Information:
    A recorded replay of this call will be available for approximately three months after the webinar. Access this recording by dialing the appropriate number and using the replay access code 720155.
    Toll-free: +1.800.475.6701
    International: +1.320.365.3844
    About the SAP NetWeaver Know-How Webinar Series
    The SAP NetWeaver Know-How Webinar Series is driven by the SAP NetWeaver Regional Implementation Group (RIG), part of the SAP Development organization. The mission of the SAP NetWeaver RIG is to enable customers, employees, and partners to successfully implement the SAP NetWeaver solution. This SAP RIG has expertise in BI, EP, XI, and WebAS. They contribute their implementation expertise to the SDN implementation forums as well as to the SAP NetWeaver Know-How Webinar Series.
    Disclaimer
    SDN is not responsible for any changes to the webinar schedule. The webinar schedule may be changed or cancelled without prior notice.

    Hi there,
    I just read this thread, and maybe someone here can answer my current trex question:
    I have created an ordinary CM repository, and created an index with this repository as source. Now the problem: I would like to exclude files in the repository with specific mimetypes from the TREX indexing process.
    I have verified that TrexValidMimetypes.ini does not contain any reference to the mimetypes I'm creating, but nevertheless, the document titles are searchable and are returned when searching.
    How do I get around this issue?
    Is it possible in NW04 or EP6.0 SP3 PXXX?
    Regards,
    Hco

  • Sun Java Indexing and Search Service - services not starting (maintenance)

    I installed Comm Suite 7 on a single Solaris host and installed JISS as described in the wiki*. The installation was OK, but the JISS index and search services won't start up (they go into maintenance).
    --------------------------------- /var/iss/logs/iss-indexsvc.log.0---------------------------
    Wed Nov 04 16:16:16 IST 2009 com.sun.comms.iss.indexapi.IndexService startService WARNING: Starting index service.
    Wed Nov 04 16:16:17 IST 2009 com.sun.comms.iss.indexapi.IndexService startService SEVERE: JMS Exception: com.sun.messaging.jms.JMSSecurityException: [C4060]: Login failed: user=jmquser, broker=webmail.example.com:7676(39599)
    -----------------------/var/svc/log/application-jiss-indexSvc:default.log-----------------
    webmail.example.com:389 (tcp) => Active
    webmail.example.com:7676 (tcp) => Active
    Nov 4, 2009 4:16:16 PM com.sun.comms.iss.indexapi.IndexService main
    INFO: Begin checking write.lock files.
    Nov 4, 2009 4:16:16 PM com.sun.comms.iss.indexapi.IndexService startService
    WARNING: Starting index service.
    Nov 4, 2009 4:16:17 PM com.sun.messaging.jmq.jmsclient.ExceptionHandler logCaughtException
    WARNING: [I500]: Caught JVM Exception: java.io.EOFException
    Nov 4, 2009 4:16:17 PM com.sun.comms.iss.indexapi.IndexService startService
    SEVERE: JMS Exception: com.sun.messaging.jms.JMSSecurityException: [C4060]: Login failed: user=jmquser, broker=webmail.example.com:7676(39599)
    Error getting IndexService instance: com.sun.comms.iss.common.IssException: JMS Exception:
    Service startup failed
    [ Nov  4 16:16:26 Method "start" exited with status 1 ]
    I ran the setup.sh script several times with different values, but the problem remains. I checked the troubleshooting page too.
    Any help appreciated.
    wiki:
    (http://wikis.sun.com/display/CommSuite7/Communications+Suite+7+Installation+Scenario+-Indexingand+Search+Service)
    Thusith.

    Thusith.M wrote:
    =============================
    # ./imqusermgr list
    User repository for broker instance: imqbroker
    User Name Group Active State
    admin admin true
    guest anonymous true
    jmquser user true
    ============================
    The instance name above should be changed, I guess? Am I correct?
    Given that the Application Server JMQ instance runs on port 7676 by default, you were most likely changing the wrong instance.
    Try adding the jmquser to the Application Server JMQ instance and perform the login test again e.g.
    /opt/SUNWappserver/imq/bin/imqusermgr add -u jmquser -p adminpass -g user
    /opt/SUNWappserver/imq/bin/imqcmd -b webmail.example.com:7676 list dst
    => log in with user "jmquser" and password "adminpass"
    If you see the following message, it means the "jmquser" user exists and the password is correct (the jmquser doesn't have enough rights to see the destinations by default):
    com.sun.messaging.jms.JMSSecurityException: [C4084]: User authentication failed:  user=jmquser, broker=webmail.example.com:7676(38692)
    Please check your security configurations.
    Listing destinations failed.
    Once that is verified, try starting the indexSvc again and see if the original error persists.
    Regards,
    Shane.

  • Spotlight disabled - and Indexing and searching disabled - solution

    I have tried most of the recommendations here, and last night I even reinstalled and rolled back my entire PB from Time Machine, but Spotlight did not work. I got the message "Indexing and searching disabled" in Terminal.
    Here is a solution I found on the web; it has two parts:
    1) Remove files that block indexing
    2) Turn on indexing (thanks to Patrick Kinsella)
    1) Check your root directory for a file called .metadata_never_index
    If it's there, delete it.
    You can only find it after making invisible files visible (see below).
    2) These great hints won’t work if Spotlight is completely disabled (which some people have tried and don’t know how to reverse). If all else fails, follow this procedure:
    Make hidden files visible (copy and paste the next line into Terminal):
defaults write com.apple.finder AppleShowAllFiles -bool YES
    Note: the visible/invisible switch only takes effect after you relaunch Finder (via Command-Alt-Esc).
    Now, in the (previously hidden) /etc folder on your root hard drive, find hostconfig and open it with any text editor.
Does it include this line:
SPOTLIGHT=-YES-
(note the two dashes astride YES)?
If not, type this line at the bottom. Save the file as hostconfig in your root folder (the /etc folder won’t accept a save from a text editor). Now drag and drop this new hostconfig into /etc. This requires your admin password to replace the existing hostconfig.
    Reboot, and your spotlight is working. This may be a long way around, but it worked for me.

    Now make invisibles invisible again with this line in Terminal:
defaults write com.apple.finder AppleShowAllFiles -bool NO
    To fire up Spotlight, you may need to type this into Terminal:
sudo mdutil -i on /
(the slash is important)
    If Mail is still not searching inside Entire Message, type this into Terminal to index your old messages:
sudo mdimport /Users/YOURNAME/Library/Mail (or whatever the path to your mailbox message folders is)
    The combination of these two did the trick for me.

    This tip is useful, but it's a far too complicated way to do this. A better and simpler solution is as follows:
    1. Log in to your server as an administrator.
    2. Open the Terminal application (in your /Applications/Utilities folder)
    3. Type the following exactly as written in to the Terminal:
    sudo rm /.metadata_never_index
    It will ask you for your password. I have never seen this file, so you may get a "file not found" error - in which case move on to step 4 anyway.
    4. Type the following in to the Terminal:
    sudo mdutil -i on /
    5. The Spotlight icon in the top-right corner of your server session should indicate that it is indexing, and before long it will start working.
    I don't believe editing of hostconfig is required at all, and you don't need to do all this stuff with making invisible files visible since you are already in the Terminal anyway.
    Hope that helps.
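A read-only sketch of the two checks discussed in this thread: whether `/.metadata_never_index` exists and, on systems that still have `/etc/hostconfig`, whether it contains `SPOTLIGHT=-YES-`. It deletes and edits nothing; you would still run `sudo rm` and `sudo mdutil -i on /` yourself. The function name is hypothetical, and the `root` parameter exists only so the check can be tried on a scratch directory.

```python
from pathlib import Path


def spotlight_blockers(root="/"):
    """Return the problems from this thread that are present under `root`.

    Read-only: nothing is deleted or edited.
    """
    root = Path(root)
    problems = []
    # 1) A marker file at the volume root tells Spotlight not to index it.
    if (root / ".metadata_never_index").exists():
        problems.append("remove /.metadata_never_index")
    # 2) As described in the first post, some systems also read
    #    SPOTLIGHT=-YES- from /etc/hostconfig; skip if the file is absent.
    hostconfig = root / "etc" / "hostconfig"
    if hostconfig.is_file():
        lines = [line.strip() for line in
                 hostconfig.read_text(errors="replace").splitlines()]
        if "SPOTLIGHT=-YES-" not in lines:
            problems.append("add SPOTLIGHT=-YES- to /etc/hostconfig")
    return problems
```

Calling `spotlight_blockers()` with no argument checks the real root volume; an empty list means neither blocker from this thread is present.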

  • Spotlight searching no longer working - indexing and search disabled.

    I've been searching the web and tried everything:
    Server 10.5.8
    In Server Admin - the attached drive is a SharePoint with Spotlight search on.
    I've used mdutil to enable Spotlight.
    I've checked permissions.
    I can search the Boot Drive. I can't search the attached drive.
    mdutil returns indexing and search disabled when used to turn it on.
    Very frustrating.
    Anyone out there have a clue?
    Thanks,
    Mark

    Hi James,
    Open System Preferences > Spotlight and click the Privacy tab. Delete any locations listed there, quit System Preferences, restart your Mac, and see if you can use Spotlight.
    Spotlight Tips
    Spotlight: How to re-index folders or volumes
    Carolyn

  • Need some advice with indexing and search server in multihomed environment.

    Hi,
    I want to introduce JISS (Java Indexing and Search Service). We have a multihomed environment with two front ends for Convergence and the IMAP/POP proxy service, and two mail stores in a cluster HA environment
    (Sun Cluster 3.2, Messaging Server 7u3-15.01, Convergence 1.0-12.01 running on GlassFish 2.1.1). The directory servers (multimaster) are on separate servers.
    I viewed the JISS deployment pages in the wiki (http://wikis.sun.com/display/CommSuite/Indexing+and+Search+Service+Deployment+Planning), but they are more confusing than helpful.
    My questions are as follows:
    Can I put the JISS web service on the Convergence server (to share the same GlassFish server)?
    Is it better to put the indexing part of JISS on a separate server, on the Convergence server, or on the mail store servers?
    Can I run the JMQ broker in an HA environment on the cluster? Is it possible to run JMQ together with Messaging Server in the same cluster group?
    Can JISS index two mail stores (I didn't find anything in the config guide)?
    Best Regards,
    Ruediger

    ruediger_kunze wrote:
    I want to introduce the JISS (java indexing and search server). We have a multihomed environment with two frontends for convergence and imap/pop proxy service and two mail stores in a cluster HA environment
    I would recommend holding off until the next release (Communications Suite 7 Update 1), as ISS Update 1 provides a large number of useful enhancements.
    I viewed the jiss deployment pages in the wiki (http://wikis.sun.com/display/CommSuite/Indexing+and+Search+Service+Deployment+Planning), but they are more confusing than helpful.
    My questions are as follows:
    Can I put the jiss web service on the convergence server (to share the same glassfish server)?
    Yes. This is the scenario used in the single-host install guide:
    http://wikis.sun.com/display/CommSuite7/Sun+Java+Communications+Suite+7+on+a+Single+Host
    Is it better to put the indexing part of JISS on a separate server or on the convergence server or better on the mail store servers?
    This is answered in the Deployment Planning guide:
    "Indexing requires significant CPU resources, thus, it is best to install the indexing service on a separate host dedicated to an ISS single server installation. If this is not an option, then install ISS on the back-end host as a single server installation, and install GlassFish Server as well for ISS."
    Can I run the JMQ broker in an HA environment on the cluster? Is it possible to run JMQ together with messaging server in the same cluster group?
    This article may help:
    http://wikis.sun.com/display/CommSuite/Deploying+GlassFish+Message+Queue+in+a+Highly+Available+Environment
    Can JISS index two mail stores (I didn't find anything in the config guide)?
    When you bootstrap the account, you point it at the mailhost that the account resides on:
    http://wikis.sun.com/display/CommSuite/Administering+Indexing+and+Search+Service
    "Creating New ISS Accounts"
    Regards,
    Shane.

  • Clustering MSG 7.0 u3 + Indexing and Search Service

    Messaging Server 7 Update 3
    Indexing and Search Service 1
    Indexing and Search service requires IMQ broker configured.
    In clustered msg srv deployments with ISS enabled, what is the correct way to deploy IMQ:
    - as a clustered service with affinity configured between msg and imq
    or
    - imq instance running on each node of the cluster
    Thanks,
    D.

    JMQ should be installed on each cluster node with something like this:
    Grab mq4_1-installer-SunOS.zip (SPARC) or mq4_1-installer-SunOS_X86.zip (x86).
    Slap the JMQ installer somewhere like /usr/local/src and unzip it.
    Run mq4_1-installer/installer (or mq4_1-installer/installer -nodisplay).
    Hammer through the install and you should be ready to go.
    Now the actual setup of JMQ:
    vi /etc/imq/imqbrokerd.conf
    Change AUTOSTART=NO to AUTOSTART=YES.
    If your host has (or will have) Sun Java Application Server (SJAS) or GlassFish on it, set ARGS=-port 7777.
    Note: SJAS 9.x and GlassFish come with their own install of JMQ, and if you are mixing Messaging Server with SJAS/GlassFish you will want the Messaging Server JMQ to have its own port.
    Start JMQ: /etc/init.d/imq start
    Reset the admin password: imqusermgr update -u admin -p <password>
    Disable the guest account: imqusermgr update -u guest -a false
    Set up a user account (jesuser in this example) to use for Messaging Server integration: imqusermgr add -u jesuser -g user -p jesuser
    cd /opt/sun/comms/messaging/sbin (or messaging64 if running 64-bit Messaging Server)
    ./configutil -o local.store.notifyplugin.jmqnotify.NewMsg.enable -v 1
    ./configutil -o local.store.notifyplugin.jmqnotify.UpdateMsg.enable -v 1
    ./configutil -o local.store.notifyplugin.jmqnotify.DeleteMsg.enable -v 1
    ./configutil -o local.store.notifyplugin.jmqnotify.maxHeaderSize -v 1024
    ./configutil -o local.store.notifyplugin.jmqnotify.jmqHost -v "127.0.0.1"
    (this could also be the IP address of your cluster node: not the service address, but the actual IP address of the node)
    ./configutil -o local.store.notifyplugin.jmqnotify.jmqPort -v "7777" (if you have changed from the default port of 7676, enter the new port here)
    ./configutil -o local.store.notifyplugin.jmqnotify.jmqUser -v "jesuser" (substitute the JMQ user that you created earlier)
    ./configutil -o local.store.notifyplugin.jmqnotify.jmqPwd -v "jesuser" (substitute the password for the JMQ user that you specified earlier)
    ./configutil -o local.store.notifyplugin.jmqnotify.DestinationType -v "queue"
    ./configutil -o local.store.notifyplugin.jmqnotify.jmqQueue -v "jesms"
    ./configutil -o local.store.notifyplugin.jmqnotify.Priority -v 3
    ./configutil -o local.store.notifyplugin.jmqnotify.ttl -v 1000
    ./configutil -o local.store.notifyplugin.jmqnotify.Persistent -v 1
    ./configutil -o local.store.notifyplugin.jmqnotify -v '/opt/sun/comms/messaging/lib/libjmqnotify$jmqnotify'
    Repeat on each node that will master HA Messaging Server.
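Because the configutil settings in the post above are repetitive and must be repeated on every node, a small generator can make them easier to review first. This is a sketch only: it builds command strings (quoting each value with `shlex.quote`) and does not run anything. The function name is hypothetical, and the defaults are simply the example values from the post (127.0.0.1, port 7777, user and password `jesuser`, queue `jesms`).

```python
import shlex

PREFIX = "local.store.notifyplugin.jmqnotify"
PLUGIN_LIB = "/opt/sun/comms/messaging/lib/libjmqnotify$jmqnotify"


def jmqnotify_commands(host="127.0.0.1", port=7777, user="jesuser",
                       password="jesuser", queue="jesms"):
    """Build (but do not run) the configutil commands from the post."""
    settings = [
        ("NewMsg.enable", "1"),
        ("UpdateMsg.enable", "1"),
        ("DeleteMsg.enable", "1"),
        ("maxHeaderSize", "1024"),
        ("jmqHost", host),         # the node's own IP, not the service address
        ("jmqPort", str(port)),    # 7777 in the post's example; JMQ defaults to 7676
        ("jmqUser", user),
        ("jmqPwd", password),
        ("DestinationType", "queue"),
        ("jmqQueue", queue),
        ("Priority", "3"),
        ("ttl", "1000"),
        ("Persistent", "1"),
    ]
    commands = [
        f"./configutil -o {PREFIX}.{key} -v {shlex.quote(value)}"
        for key, value in settings
    ]
    # Registering the plugin library comes last, as in the post; the $ is
    # part of the value, so shlex.quote keeps it single-quoted for the shell.
    commands.append(f"./configutil -o {PREFIX} -v {shlex.quote(PLUGIN_LIB)}")
    return commands
```

For example, `print("\n".join(jmqnotify_commands()))` prints the list so it can be checked and then pasted in the Messaging Server sbin directory on each node.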

  • Multilingual indexing and searching!

    Hi,
    I've installed a MULTI_LEXER for German, Swedish, French and English, as described in the interMedia documentation. Creating an index works, and searching works, too.
    But I still have a problem. Is there a way to combine ALTERNATE_SPELLING and BASE_LETTER, so that I can search for the word 'Hütte' with the expressions 'hütte', 'huette' or 'hutte'?
    The way I installed the lexer, only one of these possibilities works. Must I create a table with three columns and three indexes with the same contents?
    Who has a solution?

    Right now, at query time "Hütte" can only be converted to one form; currently it's the alternate spelling version.
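To see why a single query form cannot cover all three spellings, here is an illustration of the two normalizations: alternate spelling rewrites ü as ue, while base-letter conversion strips the accent. This is not Oracle's lexer code, just a Python sketch with a hypothetical, German-only mapping table.

```python
import unicodedata

# Hypothetical German-only alternate-spelling table (not Oracle's own).
ALTERNATE = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def alternate_spelling(word: str) -> str:
    """Replace umlauts and ß with their two-letter spellings: Hütte -> huette."""
    word = unicodedata.normalize("NFC", word.lower())
    return "".join(ALTERNATE.get(ch, ch) for ch in word)


def base_letter(word: str) -> str:
    """Strip diacritics, keeping the base letter: Hütte -> hutte."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def spelling_variants(word: str) -> set:
    """All three forms a user might type for the same word."""
    return {unicodedata.normalize("NFC", word.lower()),
            alternate_spelling(word), base_letter(word)}
```

`spelling_variants("Hütte")` yields `hütte`, `huette` and `hutte`: three different strings, whereas a query is converted to only one of them.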

  • Indexing and searching files on linux

    I have been using this program called Everything on Windows,
    which is the most awesome desktop search program I have ever encountered, btw. It's just clean, simple and efficient.
    I can never remember where any of my files are so I really need something similar for linux.
    You guys know of anything like this?

    thestinger wrote:
    If you just want to search using filenames, there's nothing wrong with locate and find.
    If you want to search using metadata from files, content in the files, etc. then you'll have to use something like tracker, strigi or beagle.
    find is fine if you don't have many files or know approximately where the file you're looking for is located, but when searching through 3TB of stuff with a partial filename it's gonna be slow, since there's no search index.
    Also it would be better if I had a simple gui.
    There's mainly 2 things I like about Everything and would like to find a linux search program that works in a similar way
    - A search index that is built extremely fast. It also monitors changes and adds/removes entries on the fly (search results appear instantly because of this).
    - It searches the way you search the media library in most music players (e.g. iTunes, Rhythmbox, foobar): it starts by showing every file and folder on the computer, then narrows down the results with each letter you type.
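The two behaviours listed above (one prebuilt index, then narrowing the result set with each letter typed) can be sketched in a few lines. This is a minimal sketch, assuming a plain in-memory list of file names built with `os.walk`; it is not how Everything works internally, and it does not watch for file changes (on Linux that would need something like inotify). The class and function names are hypothetical.

```python
import os


class FilenameIndex:
    """Hypothetical in-memory index of (lowercased name, full path) pairs."""

    def __init__(self, root):
        self.entries = []
        for dirpath, dirnames, filenames in os.walk(root):
            for name in dirnames + filenames:
                self.entries.append((name.lower(), os.path.join(dirpath, name)))

    def search(self, query, candidates=None):
        """Return entries whose name contains every space-separated term."""
        terms = query.lower().split()
        pool = self.entries if candidates is None else candidates
        return [(name, path) for name, path in pool
                if all(term in name for term in terms)]


def type_ahead(index, typed):
    """Simulate typing `typed` one character at a time.

    Returns the final matching paths and the number of hits after each key.
    """
    results = index.entries  # an empty query shows everything
    counts = []
    for i in range(1, len(typed) + 1):
        # Narrow the previous results instead of rescanning the whole index.
        results = index.search(typed[:i], candidates=results)
        counts.append(len(results))
    return [path for _, path in results], counts
```

Filtering only the previous results is safe while characters are appended, because any name matching the longer query also matches the shorter one; after a backspace you would search the full index again.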

  • Reading shape file and searching by file type

    I want to write an application for working with ESRI .shp files, which I think shouldn't be too hard. Looking for resources on this forum, I didn't find any at all, which is unusual for me.
    It made me wonder: why can't I search for resources on ni.com by file type? I see a lot of applications that interact in some way with existing file types (beyond ordinary xls, doc and txt), but information (be it articles, links or forum questions) is spread out all over the web (if the information exists at all). I would love to see resources grouped by file type, so I can see in a list of extensions which types people program for (in LabVIEW). Maybe there's already something like this that I am not aware of?

    We don't have a way to search content by file type. And since most attachments are zipped up by NI or by customers, searching for a filetype probably wouldn't yield the results you were looking for.
    If an article (or customer question) makes a reference to that filetype in the text itself, or includes a link to such a file, the file extension will be indexed.
    I don't see any content relating to ".shp" files, so I would recommend simply posting your request to the appropriate product category in the Forums.
    Best of luck with the application.
    -Carrie Hooper
    National Instruments
    Web Support & Operations

  • Where to store the url of a webpage for indexing and searching?

    Dear Java gurus,
    We have a set of HTML files stored in a file system. We can use Lucene to index those files with two fields, "path" and "content". Then, using Lucene, we can search, and the result will be the relevant content and its path (path in the file system).
    As each of these HTML files is a real web page, we also know its URL on the Internet. However, I don’t know where to store this real URL and let Lucene index not only the path and content but also this URL. If this is possible, then the search result will display the URL as well.
    Do you have any idea about this ?
    This is the last obstacle for us to develop a small Google like search engine. We have already a crawler that works well.
    Thanks for any suggestions.
    Pengyou

    pengyou wrote:
    jschell wrote:
    pengyou wrote:
    However, if I just want to store the html file in a file system for quick test purposes, how can I store the url and keep a link to the related file?
    Instead of storing the content in the database, you instead store a file system location (either a path or a file url).
    That file system location is where you store the content.
    Indeed, the file system location is where I store the content which I crawled from the Internet. However, it is not the initial url from which I crawled this content. I would like to store the initial url of this content too. This is still a problem.
    No it isn't.
    You have two pieces of data: Url and content.
    If you want to store the content on the file system then your database table would have two columns: url and file_location.
    You then do the following
    1. Save the content to the file system. Derive a file path from that process.
    2. Write a record to the database consisting of two data items: url and that file path.
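The two steps in the last reply can be sketched with Python's built-in sqlite3 module. Assumptions: a `pages` table with `url` and `file_location` columns, files named by a SHA-1 hash of the URL, and hypothetical function names. When Lucene returns a hit, its stored "path" field can then be mapped back to the original URL.

```python
import hashlib
import sqlite3
from pathlib import Path

# Hypothetical schema: one row per crawled page.
SCHEMA = """CREATE TABLE IF NOT EXISTS pages (
    url TEXT PRIMARY KEY,
    file_location TEXT NOT NULL
)"""


def store_page(conn: sqlite3.Connection, content_dir, url: str, html: str) -> Path:
    """Save crawled HTML to disk and remember which URL it came from."""
    # 1. Save the content to the file system; derive the file name from the URL.
    name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".html"
    path = Path(content_dir) / name
    path.write_text(html, encoding="utf-8")
    # 2. Write a record consisting of the URL and that file path.
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, file_location) VALUES (?, ?)",
        (url, str(path)),
    )
    conn.commit()
    return path


def url_for_file(conn: sqlite3.Connection, path):
    """Map a file path (e.g. a Lucene hit's "path" field) back to its URL."""
    row = conn.execute(
        "SELECT url FROM pages WHERE file_location = ?", (str(path),)
    ).fetchone()
    return row[0] if row else None
```

Hashing the URL gives each page a stable, file-system-safe name, so re-crawling the same URL overwrites the same file and row instead of creating duplicates.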
