Indexing PDFs with legacy Indexing Service

I have a .NET application that uses the legacy Indexing Service on Windows Server 2003 SP2 (32-bit). What do I need to install to make it index PDFs? I've installed Adobe Reader XI (I understood that the iFilter is now built in), restarted Indexing Service, and done a full re-sync, but it still isn't finding any PDFs (a reboot didn't help either).
Thanks

It appears that the answer was to uninstall XI and install 9.5. I believe Adobe made changes to the iFilter from 10.x onwards so that it no longer supports an old method that is used by Indexing Service. This thread was useful: http://forums.adobe.com/message/5115337#5115337

Similar Messages

Exporting Large Pdfs with Link Indexes - not working

I have a large PDF of the Early Church Fathers (1080 pages) with indexes to about 200 chapters. Acrobat will not export past the index pages: it stops after about 40 to 50 pages and saves the file. It will not export past 50 pages in DOC, HTML, or RTF.
Am I doing something wrong? I need to export to HTML to convert to Palm Plucker output.

Yes, both functions work the same: Export under File gives the same window as
Save As...
Thanks for the help. I would send the PDF, but it is 230K over the limit.

Convert dotx or docx to pdf with Word Automation Service failed

Hello everybody,
After searching the internet, I'm still looking for a solution to this issue.
I wrote this code for a document conversion in a Visual Studio 2010 workflow:
string wordAutomationServiceName = "Word Automation Service";
ConversionJobSettings jobSettings = new ConversionJobSettings();
jobSettings.OutputFormat = SaveFormat.PDF;
ConversionJob job = new ConversionJob(wordAutomationServiceName, jobSettings);
job.UserToken = workflowProperties.Site.UserToken;
job.AddFile(workflowProperties.WebUrl + "/" + file.Url,
workflowProperties.WebUrl + "/" + file.Url.Replace(".docx", ".pdf"));
job.Start();
The URLs are correct and the Word document exists.
The problem is that when the job is executed, I get errors in the SharePoint logs:
11/18/2011 09:24:15.87     w3wp.exe (0x1BC4)                           0x1CA0    Word Automation Services
    Office Viewing Architecture       9rte    Medium     Request received for document 00000001-0001-10e2-80af-d08c970b9892, format: , numberInQueue: 0, request id ba03fb58-55b2-4c6c-b1ca-20fad3b11585
00000001-0001-10e2-80af-d08c970b9892
11/18/2011 09:24:15.87     w3wp.exe (0x1BC4)                           0x1CA0    Word Automation Services
    Office Viewing Architecture       c7ld    Medium     AppManager.BeginProcessRequest adding request to queue    00000001-0001-10e2-80af-d08c970b9892
11/18/2011 09:24:15.88     w3wp.exe (0x1BC4)                           0x1CA0    Word Automation Services
    Timer Job                         g27p    Medium     Local Controller '71cf62b9-c34c-46c4-9828-55de2d5f5ac0':
In Progress: <http://site/Contracts/docsettest/contracttest.dotx> downloaded and queued locally    00000001-0001-10e2-80af-d08c970b9892
11/18/2011 09:24:15.88     w3wp.exe (0x1BC4)                           0x17C0    Word Automation Services
    Configuration                     g6xc    Medium     Item 00000001-0001-10e2-80af-d08c970b9892: Assigned to
local worker process: 1D64 (7524; worker id = cce33245-48b9-4b0d-afcd-e3218845d81a)    00000001-0001-10e2-80af-d08c970b9892
11/18/2011 09:24:15.88     w3wp.exe (0x1BC4)                           0x1CA0    SharePoint Foundation
    Monitoring                        b4ly    Medium     Leaving Monitored Scope (ExecuteWcfServerOperation).
Execution Time=23.6994391735768    2fd2393d-f36d-49a1-bfdf-737aefc8659a
11/18/2011 09:24:15.88     w3wp.exe (0x1BC4)                           0x211C    Word Automation Services
    Office Viewing Architecture       vipp    Medium     AppWorker:cce33245-48b9-4b0d-afcd-e3218845d81a initializing for request ba03fb58-55b2-4c6c-b1ca-20fad3b11585
00000001-0001-10e2-80af-d08c970b9892
11/18/2011 09:24:15.88     w3wp.exe (0x1BC4)                           0x211C    Word Automation Services
    Office Viewing Architecture       vipr    Monitorable    AppWorker:cce33245-48b9-4b0d-afcd-e3218845d81a worker call failed System.ServiceModel.CommunicationObjectAbortedException: The
communication object, System.ServiceModel.Channels.ServiceChannel, cannot be used for communication because it has been Aborted.    Server stack trace:      at System.ServiceModel.Channels.CommunicationObject.ThrowIfDisposedOrNotOpen()
at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs, TimeSpan timeout)     at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(IMethodCallMessage
methodCall, ProxyOperationRuntime operation)     at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)    Exception rethrown at [0]:      at System.Runtime.Re...
00000001-0001-10e2-80af-d08c970b9892
11/18/2011 09:24:15.88*    w3wp.exe (0x1BC4)                           0x211C    Word Automation Services
    Office Viewing Architecture       vipr    Monitorable    ...moting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)     at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData&
msgData, Int32 type)     at Microsoft.Office.Web.Conversion.Framework.Remoting.IAppChannelCallback.Initialize(WorkerRequest request, FileItem fileItem)     at Microsoft.Office.Web.Conversion.Framework.AppWorker.ProcessRequest(ConversionRequest
request). Worker name WordAutomationServices, Document 00000001-0001-10e2-80af-d08c970b9892    00000001-0001-10e2-80af-d08c970b9892
11/18/2011 09:24:15.88     w3wp.exe (0x1BC4)                           0x211C    Word Automation Services
    Service                           g281    Medium     Local Controller '71cf62b9-c34c-46c4-9828-55de2d5f5ac0':
Failure: <http://site/Contracts/docsettest/contracttest.dotx> not uploaded to <http://site/Contracts/docsettest/contracttest.pdf> (65543)    00000001-0001-10e2-80af-d08c970b9892
11/18/2011 09:24:15.90     w3wp.exe (0x1BC4)                           0x211C    Word Automation Services
    Office Viewing Architecture       c78j    Unexpected    AppWorker:cce33245-48b9-4b0d-afcd-e3218845d81a ProcessRequestDone() received error response WorkerException, restarting the worker
00000001-0001-10e2-80af-d08c970b9892
11/18/2011 09:24:15.90     w3wp.exe (0x1BC4)                           0x211C    Word Automation Services
    Office Viewing Architecture       b1qa    Medium     Shutting down process with force processId: 7524 belonging to AppWorker cce33245-48b9-4b0d-afcd-e3218845d81a
00000001-0001-10e2-80af-d08c970b9892
11/18/2011 09:24:15.91     w3wp.exe (0x1BC4)                           0x1CA0    Word Automation Services
    Configuration                     g6xb    Medium     Local Controller '71cf62b9-c34c-46c4-9828-55de2d5f5ac0':
Local worker process exited: 1D64 (7524); exit time = 11/18/2011 09:24:15
11/18/2011 09:24:15.91     w3wp.exe (0x1BC4)                           0x1CA0    Word Automation Services
    Configuration                     d0md    Medium     App 'Word Automation Service': Deleting temp directory
'C:\Windows\TEMP\wdsrv\21659d2e-c634-46a2-9585-b4cd1398f64c\odsibdmm.cmv\1D64'
11/18/2011 09:24:15.92     w3wp.exe (0x1BC4)                           0x211C    Word Automation Services
    Office Viewing Architecture       xpre    Medium     Removing worker cce33245-48b9-4b0d-afcd-e3218845d81a, thread: 216    00000001-0001-10e2-80af-d08c970b9892
11/18/2011 09:24:15.92     w3wp.exe (0x1BC4)                           0x211C    Word Automation Services
    Office Viewing Architecture       f2yg    Medium     CreateSandBoxedProcessWorker() is called    00000001-0001-10e2-80af-d08c970b9892
11/18/2011 09:24:15.93     w3wp.exe (0x1BC4)                           0x211C    Word Automation Services
    Office Viewing Architecture       b10e    Medium     Created desktop: Service-0x0-3eaf55d$\Microsoft Office Isolated Environment     00000001-0001-10e2-80af-d08c970b9892
11/18/2011 09:24:15.93     w3wp.exe (0x1BC4)                           0x211C    Word Automation Services
    Office Viewing Architecture       2brt    Medium     AppWorker:89d80fff-43ec-459e-9d95-5ed8b67f20bb worker process is started Exe: WordServerWorker.exe Args: /id 89d80fff-43ec-459e-9d95-5ed8b67f20bb
/convertingService net.pipe://127.0.0.1/WordServer71cf62b9-c34c-46c4-9828-55de2d5f5ac0 /assembly WdsrvWorker.dll /type WACWS /IsBatchedTracing True /LogQuota 100 WorkerType: WorkerType1 Directory: c:\windows\system32\inetsrv, pid : 3700, IsSandBoxed: True,
UniqueSandBoxSid: S-1-5-26473-19571-45394-48    00000001-0001-10e2-80af-d08c970b9892
11/18/2011 09:24:15.93     w3wp.exe (0x1BC4)                           0x211C    Word Automation Services
    Office Viewing Architecture       vioz    Medium     RemoveWorker isRemoved: True session id : uuid:c9cce13b-5285-47d6-a666-29da19e57c67;id=47, Guid: cce33245-48b9-4b0d-afcd-e3218845d81a
00000001-0001-10e2-80af-d08c970b9892
11/18/2011 09:24:15.93     w3wp.exe (0x1BC4)                           0x211C    Word Automation Services
    Office Viewing Architecture       b4em    Monitorable    AppWorker:cce33245-48b9-4b0d-afcd-e3218845d81a recycle worker process because the conversion failed with result WorkerException.
Worker is WordAutomationServices    00000001-0001-10e2-80af-d08c970b9892
11/18/2011 09:24:15.93     w3wp.exe (0x1BC4)                           0x211C    Word Automation Services
    Office Viewing Architecture       xpre    Medium     Removing worker cce33245-48b9-4b0d-afcd-e3218845d81a, thread: 216    00000001-0001-10e2-80af-d08c970b9892
11/18/2011 09:24:15.93     w3wp.exe (0x1BC4)                           0x211C    Word Automation Services
    Office Viewing Architecture       vioz    Medium     RemoveWorker isRemoved: False session id : uuid:c9cce13b-5285-47d6-a666-29da19e57c67;id=47, Guid: cce33245-48b9-4b0d-afcd-e3218845d81a
00000001-0001-10e2-80af-d08c970b9892
11/18/2011 09:24:15.93     w3wp.exe (0x1BC4)                           0x211C    Word Automation Services
    Office Viewing Architecture       a2oj    Medium     PreProcessTime = 0; InConversionQueueTime = 0.0019142; ResponseTime = 0.0066997; TotalConversionTime = 0.0535976; AvgPreProcessTime
= 0; AvgInConversionQueueTime = 0; AvgResponseTime = 0; AvgTotalConversionTime = 0; historyCount = 0; result = WorkerException; format = n/a    00000001-0001-10e2-80af-d08c970b9892
11/18/2011 09:24:15.93     w3wp.exe (0x1BC4)                           0x144C    Word Automation Services
    Office Viewing Architecture       4sig    Medium     ChildProcess WordServerWorker.exe is launched inside worker 89d80fff-43ec-459e-9d95-5ed8b67f20bb. Pid 3700
11/18/2011 09:24:15.93     w3wp.exe (0x1BC4)                           0x144C    Word Automation Services
    Office Viewing Architecture       d9hn    Medium     NotifyNewChildProcessInWorker has seen WordServerWorker.exe in worker 89d80fff-43ec-459e-9d95-5ed8b67f20bb
11/18/2011 09:24:16.45     w3wp.exe (0x1BC4)                           0x18CC    Word Automation Services
    Office Viewing Architecture       viou    Medium     ... registering worker 89d80fff-43ec-459e-9d95-5ed8b67f20bb
11/18/2011 09:24:16.48     w3wp.exe (0x1BC4)                           0x18CC    Word Automation Services
    Office Viewing Architecture       viox    Medium     Worker 89d80fff-43ec-459e-9d95-5ed8b67f20bb is now initialized.
11/18/2011 09:24:16.55     w3wp.exe (0x1BC4)                           0x18CC    Word Automation Services
    Office Viewing Architecture       vipx    Monitorable    AppWorker:89d80fff-43ec-459e-9d95-5ed8b67f20bb application server host exited unexpectedly (thread: 6)
11/18/2011 09:24:16.55     w3wp.exe (0x1BC4)                           0x18CC    Word Automation Services
    Office Viewing Architecture       c78j    Unexpected    AppWorker:89d80fff-43ec-459e-9d95-5ed8b67f20bb ProcessRequestDone() received error response WorkerCrashed, restarting the worker
11/18/2011 09:24:16.57     w3wp.exe (0x1BC4)                           0x18CC    Word Automation Services
    Office Viewing Architecture       xpre    Medium     Removing worker 89d80fff-43ec-459e-9d95-5ed8b67f20bb, thread: 6
11/18/2011 09:24:16.57     w3wp.exe (0x1BC4)                           0x18CC    Word Automation Services
    Office Viewing Architecture       f2yg    Medium     CreateSandBoxedProcessWorker() is called
11/18/2011 09:24:16.57     w3wp.exe (0x1BC4)                           0x18CC    Word Automation Services
    Office Viewing Architecture       b10e    Medium     Created desktop: Service-0x0-3eb1722$\Microsoft Office Isolated Environment
11/18/2011 09:24:16.57     w3wp.exe (0x1BC4)                           0x18CC    Word Automation Services
    Office Viewing Architecture       2brt    Medium     AppWorker:59168d75-7086-4318-8d12-633affa7b783 worker process is started Exe: WordServerWorker.exe Args: /id 59168d75-7086-4318-8d12-633affa7b783
/convertingService net.pipe://127.0.0.1/WordServer71cf62b9-c34c-46c4-9828-55de2d5f5ac0 /assembly WdsrvWorker.dll /type WACWS /IsBatchedTracing True /LogQuota 100 WorkerType: WorkerType1 Directory: c:\windows\system32\inetsrv, pid : 6752, IsSandBoxed: True,
UniqueSandBoxSid: S-1-5-26473-19571-45394-49
11/18/2011 09:24:16.57     w3wp.exe (0x1BC4)                           0x18CC    Word Automation Services
    Office Viewing Architecture       vioz    Medium     RemoveWorker isRemoved: True session id : uuid:c9cce13b-5285-47d6-a666-29da19e57c67;id=48, Guid: 89d80fff-43ec-459e-9d95-5ed8b67f20bb
11/18/2011 09:24:16.57     w3wp.exe (0x1BC4)                           0x144C    Word Automation Services
    Office Viewing Architecture       4sig    Medium     ChildProcess WordServerWorker.exe is launched inside worker 59168d75-7086-4318-8d12-633affa7b783. Pid 6752
11/18/2011 09:24:16.57     w3wp.exe (0x1BC4)                           0x144C    Word Automation Services
    Office Viewing Architecture       d9hn    Medium     NotifyNewChildProcessInWorker has seen WordServerWorker.exe in worker 59168d75-7086-4318-8d12-633affa7b783
11/18/2011 09:24:17.10     w3wp.exe (0x1BC4)                           0x1CA0    Word Automation Services
    Office Viewing Architecture       viou    Medium     ... registering worker 59168d75-7086-4318-8d12-633affa7b783
11/18/2011 09:24:17.13     w3wp.exe (0x1BC4)                           0x1CA0    Word Automation Services
    Office Viewing Architecture       viox    Medium     Worker 59168d75-7086-4318-8d12-633affa7b783 is now initialized.
Thank you for your help.

Hi Jean,
Were you able to resolve this? I am coming across the exact same error on a SharePoint 2010 development machine. I don't see any other posts on the web about it. Here is the entry from my ULS logs:
Local Controller 'fc8b8704-f0f1-4e85-a69a-dc5686c27e39': Failure: <http://ip-0a6ee272/Shared%20Documents/Word/hello.docx> not uploaded to <http://ip-0a6ee272/Shared%20Documents/PDF/hello.pdf>
(65543)
Do we share any of the following configuration points? I'm trying to narrow down the potential root cause ...
MSDN subscriber EXE install media "SharePoint Server 2010 with Service Pack 1 (x64) - (English)"
SP1 slipstream patch level. No cumulative updates.
http://autospinstaller.codeplex.com/ PowerShell scripted install
SQL 2008 R2 installed on same box as SharePoint
Active Directory domain controller on same box as SharePoint
c:\Windows\System32\drivers\etc\HOSTS file 127.0.0.1 entry for both machine and domain name
Thanks in advance for the research.
I've actually tried re-installing SharePoint several times on brand new virtual machines. That did not resolve the issue. Strangely enough, the RTM version of SharePoint appears to work just fine. With all other configuration points the
same, I loaded RTM ... ran a Word Automation PowerShell script ... and received the expected PDF output. Then when I apply the SP1 patch ... it stops working and I get error 65543.
Best,
@SPJeff

PDF Generation with LiveCycle Data Services

Hi everybody!
I am using LiveCycle Data Services to generate a dynamic PDF. I worked through this tutorial: livedocs.adobe.com/livecycle/es/sdkHelp/programmer/lcds/pdfgen_1.html and adapted the code to my own example.
I created a PDF template with LiveCycle Designer and succeeded in generating the PDF with LiveCycle Data Services.
Here is my problem:
In LiveCycle Designer I created a table, bound my data connection (from an XML source) to this table, and bound subforms to the repeating data.
It works when I preview the PDF in LiveCycle Designer.
But when I generate the PDF with LiveCycle Data Services, my data is not repeated. The number of items is always the minimum repeat count I set in the LiveCycle Designer binding window.
Is it possible to generate repeating data with LiveCycle DS?
An example of my XML source:
<item id="1">
<data>blabla</data>
</item>
<item id="2">
<data>blabla</data>
</item>
In LiveCycle Designer, if I set the minimum repeat count to 1, LiveCycle DS generates a PDF with only one item.
If I set the minimum repeat count to 2, LiveCycle DS generates a PDF with only 2 items, and so on.
I don't know how to generate an indeterminate number of items.
Thanks in advance for your help.
Bye
Guillaume

Hi Guillaume,
There is no limitation. Dynamic PDF files can be generated with LiveCycle Data Services.
You should have a look at the XML file generated by your Flex code. Try to save it and see how the XML file behaves when you generate a PDF preview with Designer. You can go to the menu: File > Properties > Preview > Use XML test data...
With the XFAHelper class, you can load either a PDF or an XDP file. Have you tried with an XDP?
I've attached a dynamic PDF file that I've created for a customer. I generate a dynamic PDF file using LiveCycle Data Services. Maybe you'll find some clues within the file.
Michael

TREX indexing problem with PDF files

Hi all,
I use KM to access DMS with the "DMS Connector for KM".
I created an index on my DMS repository.
I have more than 8000 documents. Most of them are PDF files.
Only Word documents are indexed.
I have read and applied OSS Notes 1008299 and 1031193.
I see this error message in the TrexPreprocessor trc file:
[4648] 2009-08-27 17:12:21.969 e preprocessor Preprocessor.cpp(00963) : HTTPGET failed for URL http://rixsapfps.sbbio.be:52400/irj/go/km/docs/DS/EDIPUBLICROOTFOLDER%23ZFL%23000%2300/DMS_030%23ZFL%23000%2300/DMS_030_SOP%23ZFL%23000%2300/DMS_030_SOP_50%23ZFL%23000%2300/0000000000000009000005363%23SOP%23000%230249871722A5F41746E1000000C14A8425.pdf with Httpstatus 500
[4648] 2009-08-27 17:12:21.969 e preprocessor Preprocessor.cpp(03553) : HANDLE: DISPATCH - Processing Document with key '/DS/EDIPUBLICROOTFOLDER#ZFL#000#00/DMS_030#ZFL#000#00/DMS_030_SOP#ZFL#000#00/DMS_030_SOP_50#ZFL#000#00/0000000000000009000005363#SOP#000#0249871722A5F41746E1000000C14A8425.pdf' failed, returning PREPROCESSOR_ACTIVITY_ERROR (Code 6500)
Any help is welcome.
Pascal
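
When a trace file holds many of these entries, it can help to pull out the failing URL and HTTP status from each line before testing the URLs by hand. The following is a minimal Python sketch, assuming the line format shown in the post above; the function name `parse_failed_get`, the host `portal.example`, and the shortened sample line are hypothetical and only for illustration.

```python
import re
from urllib.parse import unquote, urlsplit

# Pattern modelled on the "HTTPGET failed" trace lines quoted in the post.
FAILED_GET = re.compile(r"HTTPGET failed for URL (\S+) with Httpstatus (\d+)")


def parse_failed_get(line):
    """Return (decoded_path, http_status) for a failed-GET trace line, else None."""
    match = FAILED_GET.search(line)
    if match is None:
        return None
    url, status = match.group(1), int(match.group(2))
    # %23 in the URL is an encoded '#'; decoding it gives the form used
    # in the document key of the following DISPATCH line.
    return unquote(urlsplit(url).path), status


# Hypothetical, shortened trace line in the same format as the post.
sample = (
    "[4648] 2009-08-27 17:12:21.969 e preprocessor Preprocessor.cpp(00963) : "
    "HTTPGET failed for URL http://portal.example:52400/irj/go/km/docs/DS/"
    "FOLDER%23ZFL%23000%2300/doc.pdf with Httpstatus 500"
)

print(parse_failed_get(sample))
```

Grouping the results by status code or file extension would show whether every PDF fails with status 500 or only some of them.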

Dear
Please refer to:
https://forums.sdn.sap.com/thread.jspa?threadID=1058626
https://forums.sdn.sap.com/thread.jspa?threadID=403393&messageID=3429730#3429730
Regards,
Tushar

Create PDF document from Word with hyperlink index entries

Hello,
I have an MS Word 2010 document with a table of contents and an index. Both were created with Word's built-in functions, and their page numbers are updated automatically. If I convert this document to a PDF file with Acrobat 9 Pro, the entries in the table of contents become hyperlinks (clicking a chapter opens the corresponding page).
But this doesn't work for the index at the end of the document. Where can I activate the hyperlink functionality for index entries?
Thanks for your help,
Devid

Hi,
thanks for this info.
On another computer I have Acrobat X Pro installed, but the result is the same. Or did I miss something?

Plz help: how can i index multiple directories including pdfs with oracle text??

problem:
I have several subdirectories with PDF files that must be indexed by a full-text index.
.../dir/
    sub_dir1/
        1.pdf
        2.pdf
    sub_dir2/
        3.pdf
        4.pdf
It's possible that other users will create new subdirectories.
Try #1:
I tried to update the FILE_DATASTORE PATH parameter with the concatenated directory list,
i.e. (.../dir/subdir1:.../dir/subdir2:...), and then updated the index.
That fails because the directory string is too long (1637 characters).
Try #2:
I set the FILE_DATASTORE PATH parameter to the base directory,
i.e. ('.../dir').
Then I generated a list of all PDFs, including the subdirectory, and stored it in
a new table,
i.e.: '12345', 'subdir1/1.pdf'
'23456', 'subdir1/2.pdf'
This one fails because the database seems to apply some kind of basename() function to
reduce the table entry 'subdir1/1.pdf' to its filename-only part, '1.pdf'.
So the database fails to open (and therefore index) the file.
How can I solve this problem?
Thanks in advance!
Best regards,
/achim

If you need to use multiple directories, you'll need to put the full directory and filename into the table, and not use the PATH attribute at all. PATH only works where all files are in the same directory (though you MAY find you can use more than one directory on certain OS's).
- Roger
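
The full-path approach described above can be sketched as a small script that walks the base directory and emits one row per PDF. This is only an illustration in Python, not part of Oracle Text: the function names, the table name `pdf_docs`, and the column names `id` and `docs` are assumptions made for the example.

```python
import os


def collect_pdf_rows(base_dir, start_id=1):
    """Return (id, absolute_path) rows for every .pdf file under base_dir."""
    rows = []
    next_id = start_id
    for dirpath, dirnames, filenames in os.walk(base_dir):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            if name.lower().endswith(".pdf"):
                full_path = os.path.abspath(os.path.join(dirpath, name))
                rows.append((next_id, full_path))
                next_id += 1
    return rows


def to_insert_sql(rows, table="pdf_docs"):
    """Render rows as INSERT statements; table and column names are hypothetical."""
    statements = []
    for row_id, path in rows:
        escaped = path.replace("'", "''")  # double single quotes for SQL literals
        statements.append(
            f"INSERT INTO {table} (id, docs) VALUES ({row_id}, '{escaped}');"
        )
    return statements


if __name__ == "__main__":
    # "/data/dir" is a placeholder for the real base directory.
    for statement in to_insert_sql(collect_pdf_rows("/data/dir")):
        print(statement)
```

Because each row carries the complete path, new subdirectories created by other users only require re-running the script and syncing the index; the PATH attribute is left unset.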

Indexing pdf documents with indextype ctxsys.context

I have an application that stores the contents of uploaded documents in BLOB data fields. We provide web pages which search through the uploaded documents based on text entered by the user. We currently upload both MS Word .doc and HTML documents. For the HTML documents, which are made available to the public, we index the table with the following procedure:
CREATE OR REPLACE procedure WEBADMIN.index_redacted_docs is
begin
  declare
    cur       PLS_INTEGER;
    exec_int  PLS_INTEGER;
    counter   number;
  begin
    select count(*) into counter
      from user_indexes
     where index_name = 'DOCS_CTX_REDACTED_IDX';
    if (counter = 1) then
      ctx_ddl.sync_index (idx_name => 'docs_ctx_redacted_idx');
    else
      cur := DBMS_SQL.OPEN_CURSOR;
      DBMS_SQL.PARSE (cur, 'create index docs_ctx_redacted_idx on documents_ctx_redacted (blob_content) ' ||
        'indextype is ctxsys.context parameters (''filter ctxsys.null_filter'')', DBMS_SQL.NATIVE);
      exec_int := DBMS_SQL.EXECUTE (cur);
      DBMS_SQL.CLOSE_CURSOR (cur);
    end if;
  exception
    when others then
      DBMS_SQL.CLOSE_CURSOR (cur);
      raise;
  end;
end;
We run this process after every uploaded HTML file and are able to locate documents which contain any text entered by the user. The portion of the command we use to query the documents_ctx_redacted table (blob_content is the BLOB field in this table) is (using "corn" as a sample query text):
WHERE (contains (BLOB_CONTENT, 'corn', 10) > 0)
Our customer is now asking that PDF files be uploaded as well and searched in the same manner. After the PDF files are uploaded (into the same table as the HTML files) and the index updated, with the above command ctx_ddl.sync_index (idx_name => 'docs_ctx_redacted_idx'), since the index already exists, we cannot get any rows returned with the above WHERE (contains .... ) clause. We know the text we're looking for (such as "corn") is contained in the PDF files, but the search does not find them, although it finds the HTML documents just fine. I've also tried dropping the index entirely and recreating it, but that also only finds the HTML documents but not the PDF's.
What are we doing incorrectly with the PDF files? Thanks.

We are using Oracle version 10.2 . I looked at the relevant Oracle Text documentation for that version, and the best I could glean was that PDF files are supported by the filter ctxsys.auto_filter (rather than null_filter) when creating the index. I dropped the existing null_filter index and created a new index with the auto_filter parameter, but the end result was the same. I still get no PDF records found when issuing the command (using "corn" as the text query)
WHERE (contains (BLOB_CONTENT, 'corn', 10) > 0)
although the HTML records show up fine again.

Indexing Problem with FILE_DATASTORE and .pdf files

Hello all,
Do any of you have an example showing how to index .pdf files through FILE_DATASTORE? I am able to successfully index text and .doc files but not a .pdf file. Below is the script that I use to index my files:
create index myindex on mytable(docs)
indextype is ctxsys.context
parameters ('datastore COMMON_DIR filter ctxsys.null_filter');
I am using Oracle 8.1.6
Thank you!
-garrett

I don't think you can index anything other than plain ASCII text, because you are not using the INSO filter.
Use preferences like these:
exec ctx_ddl.drop_preference('NO_PATH');
exec ctx_ddl.create_preference('NO_PATH','FILE_DATASTORE');
exec ctx_ddl.drop_preference('MY_LEXER');
exec ctx_ddl.create_preference('MY_LEXER','BASIC_LEXER');
exec ctx_ddl.set_attribute('MY_LEXER','MIXED_CASE', 'NO');
exec ctx_ddl.set_attribute('MY_LEXER','INDEX_THEMES','NO');
exec ctx_ddl.set_attribute('MY_LEXER','INDEX_TEXT', 'YES');
exec ctx_ddl.drop_Preference ('MY_FILTER');
exec ctx_ddl.create_Preference ('MY_FILTER','INSO_FILTER');
exec ctx_ddl.drop_section_group ('MY_SECTION');
exec ctx_ddl.create_section_group ('MY_SECTION','NULL_SECTION_GROUP');
drop index i_filenames;
create index i_filenames on filenames (filename)
indextype is ctxsys.context
parameters ('datastore NO_PATH
section group MY_SECTION
lexer MY_LEXER
filter MY_FILTER
memory 10M');
The important part is the INSO_FILTER preference.
Thomas

Install 3rd party PDF iFilter for index PDF file as attachment in e-mail (msg)

I called Microsoft Premium Support. According to their reply, SharePoint 2013 cannot index a PDF file attached to an e-mail (.msg) unless a third-party iFilter is installed. They eventually told me how to edit the Windows Registry to install the Adobe iFilter.
However, the Adobe iFilter is too weak to handle large PDF files, so I would like to try the Foxit PDF iFilter instead, but I cannot find an installation guide for this third-party iFilter with SharePoint 2013.
Does anyone here have experience with the Foxit PDF iFilter on SharePoint 2013 and can help me?
I am not sure whether this is a bug or a feature in SharePoint 2013, but either way I still have to install a third-party iFilter to index these PDF files. I have no idea what the out-of-the-box PDF indexing support is meant to cover.

You can plan to use Foxit.
The steps are nearly the same as the ones we use in SharePoint 2013:
1. Update the registry for PDF. The registry value is {987f8d1a-26e6-4554-b007-6b20e2680632}.
2. Install the Foxit iFilter.
The steps are described here:
http://support.microsoft.com/kb/2293357
3. Run the commands below:
net stop spsearch4
net start spsearch4
net stop osearch14
net start osearch14
Check below:
http://bjarnegram.wordpress.com/2011/07/13/installing-foxit-pdf-ifilter-on-sharepoint-server-2010/

Creation of rules index failing with ORA-01652 exception

I am trying to create a rules index in the following way,
BEGIN
     SEM_APIS.CREATE_RULES_INDEX(
     'APPS_RDF_IDX',
     SEM_Models('SEMANTIC_SEARCH_MODEL'),
     SEM_Rulebases('OWLPRIME','SEMANTIC_SEARCH_RULEBASE'));
END;
The semantic_search_rulebase has about 5 rules, and the model contains 28839 triples.
When I run the index creation, it fails after a long time with the exception
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP
even though TEMP has been allocated 5 GB.
Please help me with the following questions:
1. How much TEMP space should be allocated if the number of triples is going to be in the millions, with about 10 to 100 rules? And why does indexing take so much TEMP space with such a small number of triples?
2. How long would creating a rules index normally take with thousands to millions of triples?
3. How can I make the creation of the rules index run faster?
Thanks,
Phani

First of all, please start using create_entailment API instead of that create_rules_index API.
Regarding 1), 5GB temp space is not a whole lot.
It is hard to say exactly how much you need because you have user defined rules.
Regarding 2) and 3), please check out the following inference best practice paper.
http://www.oracle.com/technology/tech/semantic_technologies/pdf/semantic_infer_bestprac_wp.pdf
Also, if you like, please post your rules and I may be able to help you model
some of your rules using native OWL constructs.

BizTalk 2006 Event Log Warnings - Cannot insert duplicate key row in object 'dta_MessageFieldValues' with unique index 'IX_MessageFieldValues'.

We have been seeing the following 'warnings' in the event log of our BizTalk machine since upgrading to BTS 2006. They seem to occur randomly 6 or 8 times per day.
Does anyone know what this means and what needs to be done to clear it up? We have only one BizTalk server, running on a single machine.
I am new to BizTalk, so I don't know how to find out how many tracking host instances are running for the BizTalk server. Also, can you please tell me whether only one instance can be configured per server/machine?
Source: BAM EventBus Service
Event: 5
Warning Details: Execute batch error. Exception information: TDDS failed to batch execution of streams. SQLServer: bizprod, Database: BizTalkDTADb.Cannot insert duplicate key row in object 'dta_MessageFieldValues'
with unique index 'IX_MessageFieldValues'. The statement has been terminated..

Other than ensuring that there is a single, separate tracking host instance, note that you're getting an error about duplicate keys, which implies that you're trying to create a BAM Activity twice with the same data.
I suggest an in-depth examination of the BAM (TPE or API) associated with the orchestration. In TPE, ensure that the first binding you select is the "Instance Id" or "Message Id" before going ahead to map the ports or others.
Regards.

Oracle XE 10.2.0.1.0 - Problem indexing PDF

I am using Oracle XE 10.2.0.1.0 with Czech national settings.
I need to get PDFs with Czech national characters working. Indexing TXT, HTML, and DOC (Word 2003) documents with the same content works fine.
Below is my configuration.
National Language Support
NLS_CALENDAR GREGORIAN
NLS_CHARACTERSET AL32UTF8
NLS_COMP BINARY
NLS_CURRENCY Kč
NLS_DATE_FORMAT DD.MM.RR
NLS_DATE_LANGUAGE CZECH
NLS_DUAL_CURRENCY Kč
NLS_ISO_CURRENCY CZECH REPUBLIC
NLS_LANGUAGE CZECH
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_NCHAR_CONV_EXCP FALSE
NLS_NUMERIC_CHARACTERS ,.
NLS_SORT CZECH
NLS_TERRITORY CZECH REPUBLIC
NLS_TIME_FORMAT HH24:MI:SSXFF
NLS_TIMESTAMP_FORMAT DD.MM.RR HH24:MI:SSXFF
NLS_TIMESTAMP_TZ_FORMAT DD.MM.RR HH24:MI:SSXFF TZR
NLS_TIME_TZ_FORMAT HH24:MI:SSXFF TZR
Datastore
PDF url: http://www.mpsv.cz/files/clanky/6981/tiskove_avizo_CJ.pdf
I renamed the pdf to SummitKZamestnanosti_PDF-40.
CREATE TABLE file_datastore (
id NUMBER PRIMARY KEY,
fmt varchar2(10),
docs VARCHAR2(2000)
);
-- INSERT data INTO File Datastore TABLE
INSERT INTO file_datastore VALUES
(111560,'binary','C:\Docs\SummitKZamestnanosti_PDF-40');
-- Configure DATASTORE
EXEC ctx_ddl.drop_preference('NO_PATH');
EXEC ctx_ddl.create_preference('NO_PATH','FILE_DATASTORE');
-- Configure LEXER
EXEC ctx_ddl.drop_preference('LEXER');
EXEC ctx_ddl.create_preference('LEXER','BASIC_LEXER');
-- CREATE CONTEXT INDEX
DROP INDEX idx_file_datastore_text FORCE;
CREATE INDEX idx_file_datastore_text ON file_datastore
( docs )
indextype IS ctxsys.context parameters
( 'format column fmt Datastore NO_PATH filter ctxsys.AUTO_FILTER lexer LEXER' );
-- QUERIES that should work.
-- SELECT id, docs, score(1) FROM file_datastore WHERE contains ( docs,'Bližší',1 ) > 0;
SELECT id, docs, score(1) FROM file_datastore WHERE contains ( docs,'dopadů ',1 ) > 0;
-- SELECT id, docs, score(1) FROM file_datastore WHERE contains ( docs,'sociálních',1 ) > 0;
SELECT id, docs, score(1) FROM file_datastore WHERE contains ( docs,'Španělsko',1 ) > 0;
-- SELECT id, docs, score(1) FROM file_datastore WHERE contains ( docs,'Švédska',1 ) > 0;
SELECT id, docs, score(1) FROM file_datastore WHERE contains ( docs,'Summit',1 ) > 0;
My Output
ID DOCS
0 rows selected
ID DOCS
0 rows selected
ID DOCS
111560 C:\Docs\SummitKZamestnanosti_PDF-40
1 rows selected
Regards

I used exactly the same configuration as above, but this time on Oracle Database 11g R1 11.1.0.7.0 – Production instead of Oracle 10g XE.
The result is that AUTO_FILTER in Oracle 11g parses the Czech characters from the sample PDF file without any problems.
My guess is that the problem with Oracle Text 10g R2 lies in one of the following:
1. The embedded fonts, as mentioned in the documentation: http://download-west.oracle.com/docs/cd/B12037_01/text.101/b10730/afilsupt.htm (I tried embedding all fonts and the whole character set, but it did not help).
2. The character encoding of the text within the PDF documents.
I would add that other third-party PDF-to-text converters have similar issues with the Czech characters in these PDF documents: after text extraction, the Czech national characters were displayed incorrectly.
If you have any other remarks, ideas, or conclusions, please reply :-)

How to configure one TREX host with multiple index servers ?

Hi All,
Does anyone know how to configure TREX on the one host,
with multiple index servers ?
Reason for this is to make better use of resources available on the host server(4 Gig, 4 Processor, Windows2003), to improve the search performance of
our KM content for portal users.
I am using TREX 7 and have not been able to do this,
despite reading the Single and Distributed install
documentation.
Any help would be appreciated.
Regards,
Andres

Hi Andres,
To make use of the RAM a server provides, you have to run two index server processes (each can then consume 2 GB).
Proceed like this:
1. Open TREXDaemon.ini and check whether the section [indexserver2] is there (it is already provided, but not active in a standard installation).
2. In TREXDaemon.ini go to
[daemon]
references sections below
programs=nameserver,preprocessor1,indexserver1,queueserver,alertserver
and add indexserver2 to the programs list. Restart TREX; the second process is then started. You can check this in the TREX monitor in the portal as well.
3. To distribute existing indexes to the new process, start TREXadmintool and go to Index: Landscape.
Go to the last two columns and move the indexes (move master here / secondary mouse click).
If you don't distribute the indexes, the new index server process will only be considered when a new index is created.
Hope this helps!
cheers
Bettina
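
Step 2 above amounts to appending indexserver2 to the comma-separated programs list once the [indexserver2] section is confirmed to exist. A minimal Python sketch of that check and edit follows. It assumes the file parses as a plain INI; `SAMPLE_INI` is a cut-down, hypothetical stand-in for the real file, and `enable_program` is an illustrative helper, not a TREX tool.

```python
import configparser

# Hypothetical, reduced stand-in for the real configuration file.
SAMPLE_INI = """\
[daemon]
programs=nameserver,preprocessor1,indexserver1,queueserver,alertserver

[indexserver2]
"""


def enable_program(ini_text, program):
    """Return the new 'programs' value with `program` appended if missing."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(ini_text)
    # Step 1 of the answer: the section for the new process must already exist.
    if not parser.has_section(program):
        raise ValueError(f"section [{program}] not found")
    current = parser["daemon"]["programs"]
    programs = [p.strip() for p in current.split(",") if p.strip()]
    if program not in programs:
        programs.append(program)
    return ",".join(programs)


print(enable_program(SAMPLE_INI, "indexserver2"))
```

The sketch only computes the new value and leaves the file untouched. Editing the real file by hand, as described in the answer, avoids any risk of a parser dropping comments or reordering sections.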

Index usage with nls_sort and nls_comp

Hi,
I have created a logon trigger
CREATE OR REPLACE TRIGGER "SYS"."ON_LOGON_SET__SCHEMA" AFTER
LOGON ON DATABASE BEGIN
EXECUTE IMMEDIATE 'alter session set NLS_SORT=BINARY_CI';
EXECUTE IMMEDIATE 'alter session set NLS_COMP=LINGUISTIC';
EXCEPTION
WHEN OTHERS THEN NULL;
END;
because the user does not want case sensitive searches in the database.
However, when using this, Indexes on text fields are no longer used. What should I do with those indexes?
Regards

Possible answers are explained in the MOS notes GENERIC_BASELETTER Linguistic Definition [ID 109118.1] and Linguistic Sorting - Frequently Asked Questions [ID 227335.1], depending on what your users want to happen.
