Search/Index documents

Teaming 3.0 beta 4:
I've added a .txt file and a Word document and re-indexed through the "admin pages".
The search function doesn't find words inside the documents. What's wrong?
Surely it should be able to search the content of documents in file folders?

Solved it myself.
After installing OpenOffice (which, strangely enough, isn't listed as a prerequisite), you need to restart your (Windows) server.
To make sure that Teaming can interact with OpenOffice, you can Telnet to localhost on port 8100. This is also a good way of making sure that OpenOffice is alive and kicking; in my experience this is not always the case, even when it shows up in Task Manager.
To test that OpenOffice, Teaming and Lucene can communicate, upload a document into a file folder. While doing so, monitor the Tomcat console and make sure that no "conversion exceptions" show up.
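The Telnet check can also be scripted. The following is a minimal sketch that assumes Python 3 is available on the server. `port_is_open` is a hypothetical helper name. It only attempts a TCP connection, so a True result shows that something is listening on port 8100. It does not prove that the listener is OpenOffice or that conversions will succeed.

```python
import socket


def port_is_open(host="localhost", port=8100, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Equivalent in spirit to "telnet localhost 8100": it only proves that
    something accepts connections on the port, not what that something is.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, name lookup failure
        return False
```

Calling `port_is_open()` with no arguments checks localhost:8100, the port mentioned above. If it returns False while OpenOffice still shows up in Task Manager, that matches the "not always alive" situation described in the post.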

Similar Messages

Embedded Search Index AND Document Security?

I'm using Adobe Acrobat Standard 8.1.7.
It appears that I cannot have both an embedded search index and restricted security (e.g., password required to change document) on the same document.
Why is that?
If I start with security ON and then attempt to embed a search index, I get the error message below:
A search index can not be embedded in this document because this document has restricted security permissions.
If I start with security OFF, successfully embed a search index, and then secure the document, Acrobat "strips off" the previously embedded search index. No warning message, no feedback to the end user; it just kills it!
Why are those two functions mutually exclusive? Does anyone know of a workaround?
Thank you in advance!

Hi,
As to "why", that might be floating out there in Adobe's devnet space or in one of the blogs maintained by Adobe's devnet crew.
Also good to know about embedded indexes: if one is used, you cannot apply Fast Web View to the PDF. It is one or the other, but not both.
Workaround? I've not come across one, but that does not mean something isn't "out there" <g>.
Be well...

Ultra Search Indexer: Adding 'alien' document types.

The way the Ultra Search indexer finds source material will not work in my situation. While I may be able to give it databases to crawl, it cannot crawl our content. So the mechanism for telling the indexer about 'alien' document types (adding custom code that returns lists of URLs so the indexer can read the source documents) won't work in my scenario.
I want to know what the Ultra Search application does specially when indexing documents.
Is there a description, so that I can reproduce it using Oracle Text and perhaps point the Ultra Search query component at my manufactured repository and have it work?
Thanks.

Is there a way to set up a Finder search with additional criteria so that it isolates the file extensions .docx, .pdf and .txt, all in one single search?
Currently, "Kind is Document" also brings up .jpg and .wav files, which I don't want (or consider documents).
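The following does not use any Finder setting. It is a rough alternative outside Finder and assumes Python 3 is available. A short script can list only files whose extension is in an explicit set, which avoids the broad "Kind is Document" category. `find_documents` and `WANTED` are hypothetical names.

```python
from pathlib import Path

# Extensions treated as "documents" (compared case-insensitively).
WANTED = {".docx", ".pdf", ".txt"}


def find_documents(root, extensions=WANTED):
    """Yield files under root (recursively) whose extension is in extensions."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            yield path
```

For example, `for p in find_documents(Path.home() / "Documents"): print(p)` prints matching files and skips .jpg and .wav files.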

[Oracle Text] How to register additional data when indexing documents?

Hello,
For the moment, we index documents (Word, Excel, PDF, PPT, HTML, XML...) from the file system, and it works well.
Now we need to attach some information to each document, and we must be able to search on these attributes. For instance,
we index a Word document and would like some additional index information such as:
YEAR
SIZE
NUMBER
This information is stored in a table, which also contains the path to each document on the file system.
We are able to query the text index combined with a filter on the columns above.
We thought of storing this information directly in the index, but we don't know whether that is a good solution (in terms of speed, structure...).
So, is there any way to index the documents on the file system with extra information at index time?
Is it possible? How can we do that?
What do you think?
Thanks in advance

1. If you're using 12c, you can use ctx_doc.policy_languages. (https://docs.oracle.com/database/121/CCREF/cdocpkg011.htm#CCREF24102)
2. If you want multiple stoplists based on each document's language, you have to use the multi-lexer. For world_lexer, there is one stoplist; since the stoplists are somewhat dynamic (you can add but not remove them), the most accurate way to fetch the list is using ctx_report.describe_index or ctx_report.create_index_script and parse the report.

Problem in searching the document

Hi Experts,
We've uploaded a PPT file into our KM repository, along with some other files (txt, doc, etc.) in the same location. The problem is that when we search for the PPT document using the search option, it does not appear in the results. We are able to find the other documents (txt, doc, etc.), which reside in the same folder.
I've checked the index information of the PPT file and found the following: for the service type Classification, the resource status is "OK index operation". But for the service type Search, the resource status is "No information available".
No schedule was defined for the index, and we are able to fetch all other documents from different folders under the same repository.
What could be the problem?
Best Regards,
Satya

Hi,
Check the Crawler Parameters if a Resource Filter is assigned to it:
http://help.sap.com/saphelp_nw70/helpdata/EN/46/5d5040b48a6913e10000000a1550b0/frameset.htm
Here is how the Resource Filter can exclude files from crawling:
http://help.sap.com/saphelp_nw70/helpdata/EN/c0/6f5040b48a6913e10000000a1550b0/frameset.htm
Regards,
Praveen Gudapati

Search Index hangs on large files

I am using the 6.1 SP4 Web Server on Windows XP. I have the default search application working. My problem is adding MS Word documents to the collection. Word documents of around 25 KB work, but when I add a large Word document (6934 KB), it just hangs with no CPU activity.
Is there a workaround, or a parameter that needs to be changed, to allow larger files to be indexed in the collection?

I'm not sure what's going wrong, but you may want to try increasing the amount of memory the search indexer is allowed to use. To do this, you'll need to modify the searchadmin.bat batch file. You can find it in your Web Server's bin\https\bin subdirectory.
First, make a backup copy of searchadmin.bat. Next, open searchadmin.bat in a text editor and replace "-Xmx128m" with "-Xmx1024m". This raises the indexer's maximum Java heap from 128 MB to 1024 MB, eight times as much memory.
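If you need to make this change on several servers, the two manual steps (back up, then swap the heap flag) can be scripted. The following is a minimal sketch that assumes Python 3. `raise_indexer_heap` is a hypothetical helper name. It works on raw bytes so the batch file's Windows line endings are left untouched, and it makes no backup and no change if the old flag is not found.

```python
import shutil
from pathlib import Path


def raise_indexer_heap(bat_path, old_flag="-Xmx128m", new_flag="-Xmx1024m"):
    """Back up a batch file, then replace one JVM heap flag with another.

    Returns True if old_flag was found and replaced, False otherwise
    (in which case nothing is written and no backup is made).
    """
    path = Path(bat_path)
    data = path.read_bytes()
    old, new = old_flag.encode("ascii"), new_flag.encode("ascii")
    if old not in data:
        return False
    # Step 1: keep a copy next to the original, e.g. searchadmin.bat.bak
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    # Step 2: replace the heap setting wherever it appears
    path.write_bytes(data.replace(old, new))
    return True
```

Running it a second time returns False, because "-Xmx1024m" does not contain "-Xmx128m". The .bak copy from the first run is therefore not overwritten.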
Please let us know whether this helps.

Creating and Searching index files

Hi,
This is my first posting, so apologies in advance if it is difficult to understand.
Firstly, I come from a development background of some six years (C#, ASP.NET, VB.NET, SQL), so I have a solid grounding and am happy to give any new development a try.
I have a client who has tens of thousands of scanned PDF documents but no real way of retrieving and searching them. In fact, another person on my team is making these PDFs searchable by doing some OCR and rescanning work. I have been asked to come up with a way to let the client retrieve (open) and search the PDF files.
Here is what I am proposing, based on what I have learned about Acrobat Pro 9 and Adobe Reader 8/9.
I actually want to do a lot more than the above, but I think that is enough to get me started. In a nutshell, the client would like a web-based solution to search and retrieve (open) their scanned PDF documents. I have read in other posts on the forum and understand that it can be quite difficult to search PDFs (PDX files) from Internet Explorer.
As a start, I had the following tasks in mind as needing to be completed:
Task 1 - Create a directory structure on the file server for all the scanned PDFs. Create catalog(s) for these documents and then build an index, which creates a .pdx file. Given the number of documents, it looks like I will have X number of catalogs and X number of PDX files.
Task 2 - Create a web application that displays a tree view replicating the directory structure above, so a user can open a PDF from the browser.
Task 3 - I know users can use Adobe Reader's advanced search functionality and select the indexes I have created. However, I would like to create a plugin for Adobe Reader that loads all of the indexes into the selectable index list and selects them all by default. Given the volume of documents, users will likely not know exactly what they are looking for, so they will need to search across all indexes.
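For Task 2, the web application first needs a model of the folder hierarchy that it can render as a tree. The following is a minimal sketch in Python, which I'm using only for illustration rather than the C#/ASP.NET stack mentioned above. `pdf_tree` and its dictionary layout are hypothetical choices. It mirrors the folders under a root, lists the PDFs in each one, and prunes folders that contain no PDFs anywhere beneath them. It does not touch the catalogs or .pdx indexes from Task 1.

```python
from pathlib import Path


def pdf_tree(root):
    """Return a nested dict mirroring the folders under root.

    Each node is {"folders": {name: node, ...}, "pdfs": [file names]}.
    PDF detection is by extension, case-insensitively. Folders with no
    PDFs anywhere beneath them are left out of the tree.
    """
    root = Path(root)
    node = {"folders": {}, "pdfs": []}
    for entry in sorted(root.iterdir()):
        if entry.is_dir():
            child = pdf_tree(entry)
            if child["folders"] or child["pdfs"]:
                node["folders"][entry.name] = child
        elif entry.suffix.lower() == ".pdf":
            node["pdfs"].append(entry.name)
    return node
```

The result is plain dicts and lists, so `json.dumps(pdf_tree(root))` could feed a client-side tree control. Each leaf would then link to the PDF's location on the file server.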
Thanks in advance for any help anyone can provide me in getting started with these tasks.

Some links that may be helpful.
Acrobat Developer Center:
http://www.adobe.com/devnet/acrobat.html
Forums:
Acrobat SDK
http://forums.adobe.com/community/acrobat/acrobat_sdk
Acrobat Scripting
http://forums.adobe.com/community/acrobat/acrobat_scripting
Various topic-specific forums at the AUC
http://acrobatusers.com/forum
Be well...

Index document with Oracle Text from an ECM without saving the content

Hi,
I have documents in an ECM (Alfresco, UCM and more), and I would like Oracle Text to index them without storing their content. I want to save space and avoid redundant information. I would use Oracle Text to search for a document's identifier (ID) and then fetch the document from the ECM using that ID.
Is it possible?
Do I have to use Secure Enterprise Search?
Thanks
Simon

"I want to save space and not have redundant information."
Do you mean database space or disk space (in the OS)?
If you mean database space, it is not possible to index/search without storing the file contents.
Using FILE_DATASTORE, you can keep the files on disk (in the OS) and index them.
When you remove a file, you need to re-index.
I don't see any other way.
"Do I have to use Secure Enterprise Search?"
SES also uses Oracle Text as its base, and it also uses FILE_DATASTORE, but the re-indexing is automated using crawlers.

Oracle text performance with context search indexes

Search performance using context index.
We intend to move our search engine to a new one based on Oracle Text, but we are running into bad performance issues when searching.
Our application allows the user to search stored documents by name, object identifier and annotations (previously set on objects).
For example, suppose I want to find a document named ImportSax2.c. Depending on the user-set parameters, our search engine builds one of the following queries:
1) If the user explicitly asks for a search by document name, the query is:
     select objid FROM ADSOBJ WHERE CONTAINS( OBJFIELDURL , 'ImportSax2.c WITHIN objname' , 1 ) > 0;
2) If the user doesn't specify any extra parameters, the query is:
     select objid FROM ADSOBJ WHERE CONTAINS( OBJFIELDURL , 'ImportSax2.c' , 1 ) > 0;
Oracle Text needs only around 7 seconds to answer the second query, whereas it needs around 50 seconds for the first.
Here is part of the SQL script used to create the Oracle Text index on the column OBJFIELDURL
(this column stores a path to an XML file containing the properties to be indexed for each object):
begin
Ctx_Ddl.Create_Preference('wildcard_pref', 'BASIC_WORDLIST');
ctx_ddl.set_attribute('wildcard_pref', 'wildcard_maxterms', 200) ;
ctx_ddl.set_attribute('wildcard_pref','prefix_min_length',3);
ctx_ddl.set_attribute('wildcard_pref','prefix_max_length',6);
ctx_ddl.set_attribute('wildcard_pref','STEMMER','AUTO');
ctx_ddl.set_attribute('wildcard_pref','fuzzy_match','AUTO');
ctx_ddl.set_attribute('wildcard_pref','prefix_index','TRUE');
ctx_ddl.set_attribute('wildcard_pref','substring_index','TRUE');
end;
/
begin
ctx_ddl.create_preference('doc_lexer_perigee', 'BASIC_LEXER');
ctx_ddl.set_attribute('doc_lexer_perigee', 'printjoins', '_-');
ctx_ddl.set_attribute('doc_lexer_perigee', 'BASE_LETTER', 'YES');
ctx_ddl.set_attribute('doc_lexer_perigee','index_themes','yes');
ctx_ddl.create_preference('english_lexer','basic_lexer');
ctx_ddl.set_attribute('english_lexer','index_themes','yes');
ctx_ddl.set_attribute('english_lexer','theme_language','english');
ctx_ddl.set_attribute('english_lexer', 'printjoins', '_-');
ctx_ddl.set_attribute('english_lexer', 'BASE_LETTER', 'YES');
ctx_ddl.create_preference('german_lexer','basic_lexer');
ctx_ddl.set_attribute('german_lexer','composite','german');
ctx_ddl.set_attribute('german_lexer','alternate_spelling','GERMAN');
ctx_ddl.set_attribute('german_lexer','printjoins', '_-');
ctx_ddl.set_attribute('german_lexer', 'BASE_LETTER', 'YES');
ctx_ddl.set_attribute('german_lexer','NEW_GERMAN_SPELLING','YES');
ctx_ddl.set_attribute('german_lexer','OVERRIDE_BASE_LETTER','TRUE');
ctx_ddl.create_preference('japanese_lexer','JAPANESE_LEXER');
ctx_ddl.create_preference('global_lexer', 'multi_lexer');
ctx_ddl.add_sub_lexer('global_lexer','default','doc_lexer_perigee');
ctx_ddl.add_sub_lexer('global_lexer','german','german_lexer','ger');
ctx_ddl.add_sub_lexer('global_lexer','japanese','japanese_lexer','jpn');
ctx_ddl.add_sub_lexer('global_lexer','english','english_lexer','en');
end;
/
begin
     ctx_ddl.create_section_group('axmlgroup', 'AUTO_SECTION_GROUP');
end;
/
drop index ADSOBJ_XOBJFIELDURL force;
create index ADSOBJ_XOBJFIELDURL on ADSOBJ(OBJFIELDURL) indextype is ctxsys.context
parameters
('datastore ctxsys.file_datastore
filter ctxsys.inso_filter
sync (on commit)
lexer global_lexer
language column OBJFIELDURLLANG
charset column OBJFIELDURLCHARSET
format column OBJFIELDURLFORMAT
section group axmlgroup
Wordlist wildcard_pref');
Oracle created a table named DR$ADSOBJ_XOBJFIELDURL$I, which now contains around 25 million rows.
ADSOBJ is the table containing information about our documents; OBJFIELDURL is the column that contains the path to the XML file holding the data to index. That file looks like this:
<?xml version="1.0" encoding="UTF-8" ?>
<fields>
<OBJNAME><![CDATA[NomLnk_177527o.jpgp]]></OBJNAME>
<OBJREM><![CDATA[Z_CARACT_141]]></OBJREM>
<OBJID>295926o.jpgp</OBJID>
</fields>
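With AUTO_SECTION_GROUP, Oracle Text creates a zone section for each start-tag/end-tag pair in the XML. So `WITHIN objname` limits the match to the text of the OBJNAME element, while the unrestricted query can match any element. The following is a minimal sketch in Python, used only for illustration and not part of the original setup, that lists the sections a file of this shape exposes and the text each one holds. `SAMPLE` and `sections` are hypothetical names. The sketch does not emulate Oracle Text tokenization or query semantics.

```python
import xml.etree.ElementTree as ET

# The sample file from the post, verbatim.
SAMPLE = """<?xml version="1.0" encoding="UTF-8" ?>
<fields>
<OBJNAME><![CDATA[NomLnk_177527o.jpgp]]></OBJNAME>
<OBJREM><![CDATA[Z_CARACT_141]]></OBJREM>
<OBJID>295926o.jpgp</OBJID>
</fields>"""


def sections(xml_text):
    """Map each child element of the root to its (stripped) text content."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return {child.tag: (child.text or "").strip() for child in root}
```

For the sample, the OBJNAME section holds only "NomLnk_177527o.jpgp". That is the text a `WITHIN objname` query is evaluated against.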
Can someone tell me how I can make this kind of request
"select objid FROM ADSOBJ WHERE CONTAINS( OBJFIELDURL , 'ImportSax2.c WITHIN objname' , 1 ) > 0;"
run faster?

Below are the execution plans for both queries:
select objid FROM ADSOBJ WHERE CONTAINS( OBJFIELDURL , 'ImportSax2.c WITHIN objname' , 1 ) > 0

PLAN_TABLE_OUTPUT
| Id | Operation                    | Name                | Rows | Bytes | Cost (%CPU) |
|  0 | SELECT STATEMENT             |                     | 1272 |  119K |     4   (0) |
|  1 |  TABLE ACCESS BY INDEX ROWID | ADSOBJ              | 1272 |  119K |     4   (0) |
|  2 |   DOMAIN INDEX               | ADSOBJ_XOBJFIELDURL |      |       |     4   (0) |
Note
- 'PLAN_TABLE' is old version
Executed in 2 seconds

select objid FROM ADSOBJ WHERE CONTAINS( OBJFIELDURL , 'ImportSax2.c' , 1 ) > 0

PLAN_TABLE_OUTPUT
| Id | Operation                    | Name                | Rows | Bytes | Cost (%CPU) |
|  0 | SELECT STATEMENT             |                     | 1272 |  119K |     4   (0) |
|  1 |  TABLE ACCESS BY INDEX ROWID | ADSOBJ              | 1272 |  119K |     4   (0) |
|  2 |   DOMAIN INDEX               | ADSOBJ_XOBJFIELDURL |      |       |     4   (0) |
Sorry for the result formatting, I can't get it "easily" readable :(

Batch option to automatically place the search index path

Hi
My company has thousands of PDFs that we deliver to customers with search capabilities across all documents. We have created the catalog index for these documents, and in the past we used a plugin called Options to batch-set the Search Index path in Document Properties. This was a lot easier and less time-consuming than opening every single PDF and manually entering the path. Before Options, it took me well over a day to do this for every document we deliver; the plugin reduced that to less than an hour. Very valuable when in a delivery crunch!!!
But since upgrading to Acrobat X Pro, which we need for our Office 2010 upgrade, we have been unable to use the plugin. I am looking for a new solution that can run a process to set this option across all documents. We are not people who can go and write scripts to do this, so something off-the-shelf would be great.
Any recommendations?

"... manually inputting the path."
That is one method.
While not on your scale I maintain a large "eLibrary" of PDF document collections.
Also provide OSM for distribution of some of the collections.
Each topic has a cataloged index. Each sub-topic has a cataloged index.
For a topic / sub-topic a PDF that the user will land on has the path to its PDX.
The PDF opens, the PDX is mounted, advanced search is available for the respective collection.
The PDX stays 'mounted' until the end-user moves on.
Also, PDXs can be selected from the Search dialog/pane.
The PDX and its associated folder of index files is not in the folder that holds PDF(s) which are periodically updated.
So, update a collection, rebuild the index.
Good to go.
Not as quick as the plug-in, but less mind-numbing than setting "each PDF manually".
Be well...

Search & index problem! (A setting missing I think!)

Hi All
I have created a file system repository to display the content of our HR policy documents. I then created an index to search and classify the documents, and I have moved them into their area in the taxonomy structure. However, when I search, I seem to get results from more than one index, and I have no idea why.
Can anyone help?
Thanks Phil

Hello Phil
In order to limit a Search iView to a specific index(es), create a Search Options set and assign it to your iView.
To create a Search Options Set, go to System Administration > System Configuration > Knowledge Management > Content Management > User Interface > Search (advanced options) > Search Options Set. The parameter to pay attention to in a Search Options Set for this specific purpose is the Search Index IDs (or Index Groups, depending on how you want to limit the search).
Then, to assign the new Search Options Set to your Search iView, open the iView for editing in the Portal Content Directory, and enter the name of your Search Options Set in the corresponding iView parameter.
Hope this helps!
Cheers,
Fallon

Text Search Index

Hi Gurus,
In a text search index, is there any way to know which documents are indexed and which are not?
thanks in advance
saby

Actually, I was thinking of using autonomous_transaction at the time in a proc that is called by the trigger as follows:
CREATE OR REPLACE PROCEDURE z_sync_idx
IS
PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
ctx_ddl.sync_index('Z_TEST_IDX');
END;
/
CREATE OR REPLACE TRIGGER z_test_trig
AFTER INSERT OR UPDATE OR DELETE ON z_test
FOR EACH ROW
BEGIN
z_sync_idx;
END z_test_trig;
/
My logic was flawed, however. This does sync the items that are already marked as pending when the insert happens. The problem is that the commit for the current insert hasn't happened yet, so that row is not yet marked as pending. Inserting into the table does fire the trigger, but the current record is not synced as a result.
Looks like we're back to the job...

TREX not indexing documents > 10 KB

hi...
TREX search works fine for documents < 10 KB, but if I try to index documents > 10 KB, I get a "preparation failed" error. Please let me know what has to be done so that it indexes documents > 10 KB.
Please help me with this, as it is very urgent.
Thanks in Advance,
Shanshank

I installed EP7/TREX SP10 two years ago. Everything worked fine until I decided to apply Windows SP2 and some security patches. My machine runs Windows 2003 x64.
Now EP7 works fine, but TREX is not able to re-index documents > 10 KB.
The TREX preprocessor trace says:
Preprocessor.cpp(03550) : HANDLE: DISPATCH - Processing Document with key '/documents/Segreterie/Documenti PCTP/Doc. in Arrivo/Anno 2009/2009_03123.pdf' failed, returning PREPROCESSOR_ACTIVITY_ERROR (Code 6401)
In the portal security log file I found the corresponding error:
System/Security/Authentication#sap.com/irj#com.sap.engine.services.security.authentication.logincontext#Guest#2####764602d0407311dea83600188b77747b#SAPEngine_Application_Threadimpl:3_17##0#0#Info#1#com.sap.engine.services.security.authentication.logincontext#Plain###LOGIN.FAILED
User: N/A
Authentication Stack: ticket
It seems to me that the portal is not able to pass the user name to TREX (User: N/A), so TREX is not authorized to retrieve the documents.
I tried changing the user of the indexmanager service ("index_service") and setting the alternative host in the URL generator service. Nothing changed.
Any suggestion?
Giorgio Peressin

Indexing document failed. HTTP-Proxy: ServiceUnavailable (Errorcode 13503)

Hi friends,
I am able to create an index, but it shows red in TREX Monitor > Display Queue. It says "Preparation failed: 6" and "To be transmitted: 5". When I look at the log file, it says "Indexing document failed. HTTP-Proxy: ServiceUnavailable (Errorcode 13503)".
I had created an index previously; it appeared in the search results, but its queue status was also red. Now no index results show up in the search results at all. I cannot understand why this is happening, as I have performed all the post-installation steps, including setting the proxy bypass address in the portal services.
Thanks

Hi,
Yes, indexing has worked for a Word document. But when I use a folder containing a number of documents as the data source, the Display Queue in TREX Monitor shows a red status with "processing failed" (a count equal to the number of documents in the folder). In the log I see the error message "Indexing document failed. HTTP-Proxy: ServiceUnavailable (Errorcode 13503)". After some hours, I stopped the index in TREX Monitor, as nothing was happening. When I search for the folder by name (in the search box), the folder shows up, containing the documents. But when I search for the documents by name, with * at the end of the string, there are no results.
I think that because the documents were not indexed, they could not be found. But how come I see the folder containing the documents when I put the folder name in the search request?
thanks

Searching hashtag documents

Hi,
I am searching for documents/pages based on hashtags, but they do not show up in the search results.
Please help.
Thanks,
Rohidas

The managed property Tags was no longer mapped to the crawled property ows_taxId_MetadataAllTagsInfo.
Once this mapping is done, you need to perform a full crawl.
Make sure the option to always index all ASPX page content is checked in the search settings.
http://webcache.googleusercontent.com/search?q=cache:6uoci4PHcZkJ:sp2013.blogspot.com/2014/01/sharepoint-2013-hashtags.html+&cd=2&hl=en&ct=clnk&gl=in&client=firefox-beta
