Word to PDF issues

Hi all,
I have installed Microsoft Office 2007 and Acrobat Professional 8 on my LiveCycle server.
While installing LiveCycle, I switched on "native application" conversion.
I configured "ConvertAllFileTypesToPDF".
But it moves the Word files to the "failure" folder.
Here is the log report. It asks me to check whether Microsoft Word is installed on the server.
I have opened each application (Acrobat Professional and Word) individually to check for pop-ups, and everything is fine.
What could the issue be?
Please advise.
Failure Time----Thu Oct 20 11:37:57 IST 2011
source location ---- Reason of failure is-----Invocation error.
Invocation error.
ALC-PDG-001-024-An error occurred while launching Microsoft Word. Please check whether Microsoft word is installed on the server.
ALC-DSC-003-000: com.adobe.idp.dsc.DSCInvocationException: Invocation error.
at com.adobe.idp.dsc.component.impl.DefaultPOJOInvokerImpl.invoke(DefaultPOJOInvokerImpl.java:152)
at com.adobe.idp.dsc.interceptor.impl.InvocationInterceptor.intercept(InvocationInterceptor.java:140)
at com.adobe.idp.dsc.interceptor.impl.RequestInterceptorChainImpl.proceed(RequestInterceptorChainImpl.java:60)
at com.adobe.idp.dsc.transaction.interceptor.TransactionInterceptor$1.doInTransaction(TransactionInterceptor.java:74)
at com.adobe.idp.dsc.transaction.impl.ejb.adapter.EjbTransactionBMTAdapterBean.doBMT(EjbTransactionBMTAdapterBean.java:197)
at sun.reflect.GeneratedMethodAccessor306.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:592)
at org.jboss.invocation.Invocation.performCall(Invocation.java:345)
at org.jboss.ejb.StatelessSessionContainer$ContainerInterceptor.invoke(StatelessSessionContainer.java:214)
at org.jboss.resource.connectionmanager.CachedConnectionInterceptor.invoke(CachedConnectionInterceptor.java:149)
at org.jboss.webservice.server.ServiceEndpointInterceptor.invoke(ServiceEndpointInterceptor.java:54)
at org.jboss.ejb.plugins.CallValidationInterceptor.invoke(CallValidationInterceptor.java:48)
at org.jboss.ejb.plugins.AbstractTxInterceptor.invokeNext(AbstractTxInterceptor.java:106)
at org.jboss.ejb.plugins.AbstractTxInterceptorBMT.invokeNext(AbstractTxInterceptorBMT.java:158)
at org.jboss.ejb.plugins.TxInterceptorBMT.invoke(TxInterceptorBMT.java:62)
at org.jboss.ejb.plugins.StatelessSessionInstanceInterceptor.invoke(StatelessSessionInstanceInterceptor.java:154)
at org.jboss.ejb.plugins.SecurityInterceptor.invoke(SecurityInterceptor.java:153)
at org.jboss.ejb.plugins.LogInterceptor.invoke(LogInterceptor.java:192)
at org.jboss.ejb.plugins.ProxyFactoryFinderInterceptor.invoke(ProxyFactoryFinderInterceptor.java:122)
at org.jboss.ejb.SessionContainer.internalInvoke(SessionContainer.java:624)
at org.jboss.ejb.Container.invoke(Container.java:873)
at org.jboss.ejb.plugins.local.BaseLocalProxyFactory.invoke(BaseLocalProxyFactory.java:415)
at org.jboss.ejb.plugins.local.StatelessSessionProxy.invoke(StatelessSessionProxy.java:88)
at $Proxy160.doBMT(Unknown Source)

Thanks for the response.
My LiveCycle version is 8.2, and I am able to create PDFs from Word on this server,
since I have installed "Adobe LiveCycle PDF Generator ES" and "Adobe PDF".
Adobe PDF is set as the default printer.
It still reports an invocation error.
Please advise.
Here is the other type of error I get:
Failure Time----Fri Oct 21 16:30:53 IST 2011
source location ---- Reason of failure is-----Failure to invoke the job ConvertAllFileTypesToPDF
Failure to invoke the job ConvertAllFileTypesToPDF
Invocation error.
javax.jms.JMSException: Could not create a session: javax.resource.spi.CommException: javax.naming.CommunicationException: Could not obtain connection to any of these urls: 0.0.0.0:1100 [Root exception is javax.naming.CommunicationException: Failed to connect to server 0.0.0.0:1100 [Root exception is javax.naming.ServiceUnavailableException: Failed to connect to server 0.0.0.0:1100 [Root exception is java.net.ConnectException: Connection refused: connect]]]
Could not create a session: javax.resource.spi.CommException: javax.naming.CommunicationException: Could not obtain connection to any of these urls: 0.0.0.0:1100 [Root exception is javax.naming.CommunicationException: Failed to connect to server 0.0.0.0:1100 [Root exception is javax.naming.ServiceUnavailableException: Failed to connect to server 0.0.0.0:1100 [Root exception is java.net.ConnectException: Connection refused: connect]]]
ALC-DSC-600-000: com.adobe.idp.dsc.provider.service.scheduler.impl.SchedulerRuntimeException: Failure to invoke the job ConvertAllFileTypesToPDF
at com.adobe.idp.dsc.provider.service.file.scan.impl.FileScanJobImpl.invokeJob(FileScanJobImpl.java:302)
at com.adobe.idp.dsc.provider.service.file.scan.impl.FileScanJobImpl.processInputs(FileScanJobImpl.java:119)
at com.adobe.idp.dsc.provider.service.scheduler.scan.impl.AbstractScanJob.execute(AbstractScanJob.java:56)
at sun.reflect.GeneratedMethodAccessor308.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:592)
at com.adobe.idp.dsc.component.impl.DefaultPOJOInvokerImpl.invoke(DefaultPOJOInvokerImpl.java:118)
at com.adobe.idp.dsc.interceptor.impl.InvocationInterceptor.intercept(InvocationInterceptor.java:140)
at com.adobe.idp.dsc.interceptor.impl.RequestInterceptorChainImpl.proceed(RequestInterceptorChainImpl.java:60)
at com.adobe.idp.dsc.transaction.interceptor.TransactionInterceptor$1.doInTransaction(TransactionInterceptor.java:74)
at com.adobe.idp.dsc.transaction.impl.ejb.adapter.EjbTransactionCMTAdapterBean.execute(EjbTransactionCMTAdapterBean.java:342)
at com.adobe.idp.dsc.transaction.impl.ejb.adapter.EjbTransactionCMTAdapterBean.doSupports(EjbTransactionCMTAdapterBean.java:212)
at sun.reflect.GeneratedMethodAccessor307.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:592)
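The second log is a network-level failure rather than a Word problem. "Connection refused" means nothing accepted a TCP connection at 0.0.0.0:1100 when the file-scan job (FileScanJobImpl in the trace) tried to create its JMS session. Port 1100 is typically the HA-JNDI port in a clustered JBoss configuration. A quick way to check from the server itself whether anything is listening is a minimal Python sketch like the one below. The function name is hypothetical, and the hosts and ports you try are assumptions to adapt to your setup.

```python
import socket

def port_accepts_connections(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and tries each address it returns.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and unresolvable/unreachable hosts alike.
        return False
```

For example, `port_accepts_connections("127.0.0.1", 1100)` returning False on the LiveCycle server would be consistent with the error above. True would suggest the listener is up and the problem lies in the address the client is configured to use.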

Similar Messages

Font Conversion issues from Word to PDF

I'm using Acrobat Pro 9 and Word 2007. My Word document has Heading styles with numbers, and a table of contents is built upon the styles. Every font is Arial. When I convert the Word document to PDF, the numbering all turns to Times New Roman, but the text remains Arial. Help. How can I fix this?
Linda

You were right, it was a Word numbering style issue. I have finally figured it out and it appears to be working correctly now.
Thanks,
Linda

HT1338 My Mac is becoming too slow

My Mac is becoming too slow. It takes a long time to open Word documents, PDF files, Excel documents, or even Safari. Can anybody suggest something? I have tried reducing the number of open applications, but that does not seem to help.

Hi ...
Have you checked how much free space there is on the startup disk lately?
Right-click or Control-click the Macintosh HD icon and click Get Info. In the Get Info window you will see Capacity and Available. Make sure there's a minimum of 15% free disk space.
Freeing Up Hard Disk Space - Mac Guides
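The 15% rule of thumb above is easy to check in a script. This is a minimal Python sketch using the standard library's `shutil.disk_usage`. The function names are hypothetical, and the threshold is simply the 15% suggested in the reply.

```python
import shutil

def free_space_ratio(path: str = "/") -> float:
    """Fraction of the volume containing `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total

def has_enough_free_space(total_bytes: int, free_bytes: int, minimum: float = 0.15) -> bool:
    """True if at least `minimum` (15% by default) of the disk is free."""
    return free_bytes / total_bytes >= minimum
```

On the startup disk, `usage = shutil.disk_usage("/")` followed by `has_enough_free_space(usage.total, usage.free)` gives the same answer as comparing Capacity and Available in Get Info.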
If disk space is not the issue, booting in Safe Mode (hold down the Shift key at startup) deletes system caches, which may help.
A Safe Mode boot takes longer than a normal boot, so be patient.
Once you see the Desktop, click the Apple menu icon in the top left corner of the screen.
From the drop-down menu, click Restart.
See if that makes a difference ...

How to upload documents (like Word, Excel, PDF, etc.) into an R/3 system

Hi All,
Does anyone have information on uploading and downloading documents such as Word, Excel, and PDF files into an R/3 system? Are there any function modules, classes, programs, etc. to do this?
To give an example of what I mean by uploading documents, take the process of attaching documents in support of an SLM issue in Solution Manager.

Hi Syed,
Use the function module GUI_UPLOAD to upload a file from the presentation server and GUI_DOWNLOAD to download a file to the presentation server. See the code below, and also read the documentation of the respective function modules.
* Declarations shared by the upload and download examples
data:
    lw_file  type string,                   " File Path
    lw_file1 type IBIPPARMS-PATH,           " File Path returned by F4_FILENAME
    t_kna1   type standard table of kna1.   " Data table (customer master, as an example)
* For binary documents such as Word or PDF files, use FILETYPE = 'BIN'
* with a table of raw lines instead of 'ASC'.

* Upload: let the user pick a file on the presentation server
CALL FUNCTION 'F4_FILENAME'
   IMPORTING
     FILE_NAME            = lw_file1.
lw_file = lw_file1.
CALL FUNCTION 'GUI_UPLOAD'
    EXPORTING
      filename                      = lw_file
      FILETYPE                      = 'ASC'
      HAS_FIELD_SEPARATOR           = 'X'
    tables
      data_tab                      = t_kna1
   EXCEPTIONS
     FILE_OPEN_ERROR               = 1
     OTHERS                        = 2.
IF sy-subrc <> 0.
    MESSAGE ID SY-MSGID TYPE SY-MSGTY NUMBER SY-MSGNO
          WITH SY-MSGV1 SY-MSGV2 SY-MSGV3 SY-MSGV4.
ENDIF.                               " IF SY-SUBRC <> 0

* Download: let the user pick a target file and write t_kna1 to it
CALL FUNCTION 'F4_FILENAME'
   IMPORTING
     FILE_NAME            = lw_file1.
lw_file = lw_file1.
CALL FUNCTION 'GUI_DOWNLOAD'
    EXPORTING
*     BIN_FILESIZE                  =
      FILENAME                      = lw_file
      FILETYPE                      = 'ASC'
*     APPEND                        = ' '
      WRITE_FIELD_SEPARATOR         = 'X'
*     HEADER                        = '00'
    TABLES
      DATA_TAB                      = t_KNA1
    EXCEPTIONS
      FILE_WRITE_ERROR              = 1
      OTHERS                        = 2.
IF SY-SUBRC <> 0.
    MESSAGE ID SY-MSGID TYPE SY-MSGTY NUMBER SY-MSGNO
          WITH SY-MSGV1 SY-MSGV2 SY-MSGV3 SY-MSGV4.
ELSE.
    message 'File downloaded successfully' type 'S'.
ENDIF.                               " IF SY-SUBRC <> 0
With luck,
Pritam.

Trying to convert Word to PDF and getting an error message using version 7

I was sent a Word document by a correspondent of mine, which I tried to convert to PDF. I right-clicked the document on my desktop and chose "Convert to PDF". It looked the same as all the other times, but after it had opened Word, it displayed an error message: "Missing PDFMaker files. Do you want to run the installer in repair mode?" with Yes and No buttons. If I click No, it closes everything. If I click Yes, it acts like it's installing something, and after a while it tells me the computer needs to be restarted for the changes to take effect. I have run it twice now and it doesn't change anything.
Can someone give me any tips on this?
D

You did not indicate the version of Word. If it is Word 2007, then you cannot use this function with AA7. Try opening the file in Word and printing to the Adobe PDF printer. If that works, then look for the PDFMaker options in Word (Create PDF). If you find them, try that. If they do not show up but printing worked, then you do have an issue with PDFMaker: probably either the macro is not activated in Word or the version of Word is 2007.

Page numbers are not generated correctly when printing from Word to PDF

Whenever I print my Word document to PDF, the page numbers do not match those shown in Print Preview.
For example, I have a 10-page Word document, and after generating the PDF, the page numbers show Page 1 of 1; Page 2 of 2; Page 3 of 3; ... Page 10 of 10.
Please advise what may be causing this and how I can rectify the problem.
Thank you.

Thank you for your reply.
I do have some section breaks on the pages; however, the pages are all shown in running sequence, even in Print Preview.
It is only the final generated document that has this problem.
I emailed my Word document to colleagues and asked them to generate the PDF. They had no issues printing the Word document to PDF, and every page was in running order.
Any other advice?

Advantages of using Adobe Acrobat over the Microsoft Word 2007 PDF add-in

Hi,
I would like to know the advantages of using Adobe Acrobat 9 Standard or Professional over the Microsoft Word 2007 PDF add-in. Microsoft will ship the PDF converter with SP2 later this month. In that case, what would be the reason to use Adobe Acrobat to convert a document to PDF rather than Word's built-in feature?
This is just a question that arose, and I thought I would ask here, since we are looking to control the number of Adobe Acrobat licenses used in our environment.
Rgds
Rahul Goel

I have found that some of the links and other items just do not work well with the add-in from MS. However, with Office 2007, it appears that MS has put some hooks in the system to screw up Acrobat. Typical pages produced with Acrobat get cut into sections (a graphic will be chopped into multiple graphics and pasted together). I attribute this to Office only because we tried it with Office 2007 and both AA7 and AA8 and got the same result. When we used the same file in Office 2003 with AA7, the PDF was fine. So when you complain about PDFs created with Office 2007 and Acrobat, you may be seeing a result of Office, not Acrobat. Generally they still look good; it is only in some details that you will see problems.
A major advantage we found with Acrobat is that it did not mess up links the way the MS add-in did. You can try some of this yourself, but that is a summary of our conclusions. Bill

Word to PDF conversion does not retain left margin (for thesis submission to a university)

Hi everyone,
I've searched through the more recent margin-issue posts and haven't found anything that exactly matches my problem. I'm currently trying to convert my thesis from Word to PDF (I'm using Word 2003 on Windows XP and Adobe Acrobat 8 Professional), as required by my university, and the left margin changes from 1.5 inches to 1.25 inches during the conversion. Since this is for submission to a university, the margin has to be exactly 1.5 inches. I've already tried changing the paper and page layout settings in case it was a scaling issue (if only, at this point), and I've reconverted the file in case it was a fluke (obviously it wasn't, since I found my way here). I also tried shifting the margin in Word by an additional 0.25 inches to see whether that would put it at 1.5 instead of 1.25 in Acrobat, but somehow it was still at 1.25. It doesn't seem to matter what the margin is set to in Word; it ends up at 1.25 in Acrobat.
I would very much appreciate any input. Now that I've finished all the work on my thesis, it would be nice to actually get it submitted properly so that I can get my diploma!

What method are you using to generate your PDF from Word? What version of Word are you using? (Edit: sorry, I see now you mentioned Word 2003.)
Are you using Word's PDF convert icon, or are you using Print > Adobe PDF?

Word to PDF conversion - margin difference

Hi,
I am in the process of converting some Word documents to PDF, but the top margin is not consistent between Word and PDF, i.e., the PDF has a larger margin than the source Word document. Could anyone help me get a PDF with the same margins as the Word document after converting?
I'm using Acrobat Distiller 7.0.

I am not sure why you are talking about using Acrobat Distiller 7, but let me get back to the issue.
1. Select the Adobe PDF printer in the Print dialog. Then go back and check your document (Word will reflow a document to best fit the selected printer).
2. Check the paper size of the Word document versus the PDF document. I suspect that you did not select the proper paper size in the Adobe PDF printer properties. Maybe you are trying to use A4 and the paper in the printer is Letter.
3. You should either print to the Adobe PDF printer or use PDFMaker (the Adobe icon in Word) to create the PDF. Do not print to file; just print to the printer, and the conversion process should start automatically.
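Point 2 can be checked with a little arithmetic. The Python sketch below assumes the PDF driver centers a page of one size on paper of another size, optionally scaling it to fit. Real drivers and their settings may behave differently, and the function and table names are hypothetical. Under that model, a Letter document on A4 paper turns a 1 in top margin into about 1.35 in when centered, or about 1.47 in when scaled to fit, which matches the "larger top margin" symptom.

```python
MM_PER_INCH = 25.4

# Paper sizes as (width, height) in inches.
PAPER_IN = {
    "Letter": (8.5, 11.0),
    "A4": (210 / MM_PER_INCH, 297 / MM_PER_INCH),
}

def placed_margins(doc: str, paper: str, top_in: float, left_in: float,
                   fit_to_page: bool = False) -> tuple:
    """Return the (top, left) margins in inches after a `doc`-sized page is
    centered on `paper`, optionally scaled to fit (an assumed driver model)."""
    dw, dh = PAPER_IN[doc]
    pw, ph = PAPER_IN[paper]
    scale = min(pw / dw, ph / dh) if fit_to_page else 1.0
    offset_x = (pw - dw * scale) / 2   # extra space per side (negative = overhang)
    offset_y = (ph - dh * scale) / 2
    return offset_y + top_in * scale, offset_x + left_in * scale

print(placed_margins("Letter", "A4", 1.0, 1.0))                    # approx. (1.346, 0.884)
print(placed_margins("Letter", "A4", 1.0, 1.0, fit_to_page=True))  # approx. (1.469, 0.973)
```

If the numbers from a real conversion line up with one of these cases, fixing the paper size in the Adobe PDF printer properties is the likely cure.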

Converting RoboHelp content to Word and PDF documents

I am in the process of creating an online help manual with several JPG images. The images are clear in RoboHelp, but when I convert the content to Word or PDF, the content and images are fuzzy and blurry, especially in the PDF. Any thoughts or suggestions as to why this is occurring? I did not have this issue with my last project; the content, formatting, and JPG images were all very clear.
Thanks,
Wendy

Hi Wendy,
There is almost no conversion of images when generating any output from RoboHelp. Can you please specify how the topics with images were created?
If possible, please share a sample image and the corresponding code snippet from RoboHelp.
You can also try removing any resizing applied to those images and then generating the output.
Ashish
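The suggestion to remove any resizing comes down to pixel density. An image has a fixed number of pixels, so stretching it, or placing it wider on a Word or PDF page, spreads those pixels more thinly and it looks soft. A minimal Python sketch of the two numbers worth checking is below. The function names are hypothetical, and how much resolution is "enough" depends on the output.

```python
def effective_ppi(pixel_width: int, placed_width_in: float) -> float:
    """Pixels per inch an image provides when placed `placed_width_in` inches wide."""
    return pixel_width / placed_width_in

def upscale_factor(pixel_width: int, displayed_width_px: int) -> float:
    """Values above 1.0 mean the image is stretched beyond its native pixels."""
    return displayed_width_px / pixel_width

print(effective_ppi(600, 6.0))    # 100.0
print(upscale_factor(400, 600))   # 1.5
```

An upscale factor above 1.0 for a topic image is a good candidate for the blurriness described above.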

Problems viewing dashboards in Word and PDF

I use Xcelsius 2008 Service Pack 3, and when I export dashboards to Word and PDF, others cannot open and view them. I send them the files via email; some people can open and use them, but many cannot. I have Windows XP SP3 and Office 2007. This is a problem because only some of our clients can open and use the dashboards.

Hi,
    Craig is on the right track; it is most likely an Adobe Flash Player version issue.
    Read the Supported Platforms guide for Xcelsius 2008 SP3 for more details.
              http://www.sdn.sap.com/irj/scn/index?rid=/library/uuid/50fdb3d2-50cc-2c10-e392-a2e481f71694
    Make sure they have Adobe Flash Player 9.0.151.0 or above installed.
    Another quick test: see whether they can open the SWF file directly in Internet Explorer. I doubt the issue is related to Word or PDF at all.
Cheers,
Ken
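The 9.0.151.0 minimum is a dotted version number, and comparing such numbers as plain strings goes wrong: "10.0.45.2" sorts before "9.0.151.0" as text. If you collect Flash Player versions from clients, a small Python sketch like this compares them numerically. The function names are hypothetical, and how you obtain each client's version string is outside the sketch.

```python
from typing import Tuple

def parse_version(text: str) -> Tuple[int, ...]:
    """Turn "9.0.151.0" into (9, 0, 151, 0)."""
    return tuple(int(part) for part in text.strip().split("."))

def meets_minimum(installed: str, minimum: str = "9.0.151.0") -> bool:
    """Compare dotted versions numerically, padding missing parts with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

For example, `meets_minimum("10.0.45.2")` is True and `meets_minimum("9.0.124.0")` is False.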

Word 2007 = PDF

Hi,
I'm French, sorry for my English.
I have Adobe Acrobat Pro Extended 9.
When I generate a PDF file from Word, I have a problem.
In Word 2007:
In PDF:
Thank you for your help

Converting Word (table) to PDF - lines screwed up - googled as far back as 2004.
BUG STILL EXISTS. HELP/FIX PLEASE?
http://www.pcreview.co.uk/forums/missing-table-lines-conversion-pdf-t878406.html
http://forums.adobe.com/thread/305508
Trying to convert any Word doc with tables (& shading) to PDF:
- basic table, black borders throughout
- shaded headings, black outline border
- shaded subheadings, black outline border
However, when converted to PDF:
- NO top cell border is displayed for some/all shaded rows
- lines show different thicknesses
- with each conversion, different lines are missing/incorrectly sized
- however, the converted PDF prints perfectly fine
Adobe knows about the bug, per PRMW's (Paul's) post on 2009-07-15 15:44:34, but only offered a painful, time-consuming workaround using non-freeware Adobe Pro:
http://acrobatusers.com/forum/pdf-creation/word-pdf-table-lines-missing-or-faded#comment-78139
- "It is not feasible to edit 200+ tables in the PDF every time the PDF is generated, as we maintain the original in Word."
- "This complete issue seems to have been passed off by Adobe as no problem, since there is a workaround. I consider this an unsatisfactory response from a major product supplier."
Microsoft TechNet and Nitro PDF said it's an Adobe issue and to contact Adobe to fix the bug.
Tried the following, but the problem still exists:
* Word 2010 > File > Save & Send > Create PDF/XPS Document
* Word 2010 > Save As > Pdf
* Word 2010 > Print > PrimoPdf (even tried Properties > Advanced > DPI 300/600/2400) > Custom
* Word 2010 > Print > doPDF v7 (even tried 'high quality images')
* Word 2010 > Print > PDFCreator
* Word 2010 > Print > CutePdf Writer (even worse)
* Nitro Pdf Reader > Convert From File > (even worse)
* www.pdfonline.com > Word to Pdf (even worse)
* www.wordtopdf.com > email: Sorry, an unexpected conversion failure occurred when converting your file.
Software:
* Word 2010 - tried with .docx & .doc (97 to 2003)
* Adobe Reader 8.2.6 (freeware), then upgraded to Adobe Reader X 10.0.1 (freeware)
* GhostScript 9.01 w32 (freeware)
* CutePdf Writer (freeware)
* PrimoPdf (freeware)
* Nitro Pdf Reader 1.4.0.11 (freeware)
* doPDF 7.2.361 (freeware)
* PDFCreator 1.2.0 (opensource - www.pdfforge.org)
It seems to display better at 300%, but the lines are still not right (even at 2400%), and who views PDFs at that zoom?
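One plausible explanation for "prints fine but displays wrong, better at 300%" is screen resolution rather than the conversion itself. This is an assumption, not something confirmed in the threads cited above. A 0.5 pt line (a common Word table border width) is narrower than one pixel on a typical 96 dpi screen at 100% zoom, so the viewer has to round or anti-alias it, and neighbouring borders can come out faint, missing, or of different thicknesses. A printer has several dots to work with for the same line. The arithmetic as a Python sketch (the function name and the 96 dpi screen figure are assumptions):

```python
def line_width_px(width_pt: float, zoom_percent: float = 100.0, dpi: float = 96.0) -> float:
    """Device pixels (or printer dots) covered by a line `width_pt` points wide."""
    return width_pt / 72.0 * dpi * zoom_percent / 100.0

for zoom in (100, 150, 300):
    print(f"0.5 pt border at {zoom}% on a 96 dpi screen: {line_width_px(0.5, zoom):.2f} px")
print(f"0.5 pt border on a 600 dpi printer: {line_width_px(0.5, dpi=600):.2f} dots")
```

At 100% the border covers about two-thirds of a pixel; at 300% it covers two full pixels, which fits the observation that the display improves at higher zoom.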

Word to PDF - Performance

Hello,
We have written a custom application in .NET that converts a set of Word documents to PDF. The PDF generation step takes about 10 to 15 seconds per document, and it also causes performance issues: high CPU usage, and we cannot open Word or PDF files on the desktop PC while the application is running.
Below is the code for the Word-to-PDF generation logic. Please advise whether there are any issues in the way the API is being used, and recommend any alternative way to generate the PDFs in less time.
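        ' Assumes Imports System.IO and System.Configuration, plus a reference to the Acrobat type library
        Dim AcroApp As Acrobat.AcroApp
        Dim AcroAVDoc As Acrobat.CAcroAVDoc
        Dim AcroPDoc As Acrobat.CAcroPDDoc
        Dim openDoc As Boolean
            AcroApp = CreateObject("AcroExch.App")
            AcroAVDoc = CreateObject("AcroExch.AVDoc")
            AcroPDoc = CreateObject("AcroExch.PDDoc")
            For Each docFile As String In Directory.GetFiles(ConfigurationSettings.AppSettings("mailmergeletterspath"))
                ' Acrobat converts the Word file to PDF when it opens it here
                openDoc = AcroAVDoc.Open(docFile, "")
                If Not openDoc Then Continue For   ' skip files Acrobat could not open/convert
                AcroPDoc = AcroAVDoc.GetPDDoc
                ' 1 = PDSaveFull: write a complete new file
                AcroPDoc.Save(1, Path.Combine(ConfigurationSettings.AppSettings("doc2pdfpath"), Path.GetFileNameWithoutExtension(docFile) & ".pdf"))
                AcroPDoc.Close()
                AcroAVDoc.Close(True)              ' True = close without saving again
            Next
            AcroApp.Exit()                         ' shut down the Acrobat instance when done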
Thanks,
Ashok

Hi, thanks for that clarification. Could you please help me out or provide some guidance on the right way to accomplish this? Any references to documentation would be appreciated.
Thanks,
Ashok
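This thread does not settle whether Acrobat's conversion itself can be made faster, but a batch job like the one above can at least avoid reconverting letters that have not changed since the last run. Below is a minimal Python sketch. It assumes the same naming rule as the VB code (`<base name>.pdf` in the output folder) and that file modification times are reliable. The function name is hypothetical, and the same check could be written in VB.NET with `File.GetLastWriteTime`.

```python
from pathlib import Path
from typing import List

WORD_SUFFIXES = {".doc", ".docx"}

def documents_needing_conversion(src_dir: Path, out_dir: Path) -> List[Path]:
    """Return Word files in src_dir whose matching PDF in out_dir is missing or stale."""
    pending = []
    for doc in sorted(src_dir.iterdir()):
        if not doc.is_file() or doc.suffix.lower() not in WORD_SUFFIXES:
            continue
        # Same naming rule as the VB code: <base name>.pdf in the output folder.
        pdf = out_dir / (doc.stem + ".pdf")
        if not pdf.exists() or pdf.stat().st_mtime < doc.stat().st_mtime:
            pending.append(doc)
    return pending
```

Only the returned files would then be passed to the Acrobat conversion loop.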

Word to PDF export - seeing dots within underlines

When I export our company financial statements from Word to PDF, dots appear in the PDF wherever there are double underlines in the document.
The weird thing is that these dots do not appear on a printout of the PDF, nor when I magnify the PDF. However, since users log onto our site to download the financial statements we have filed with the SEC, they will see this formatting issue.
Does anyone know how to fix this? I will gladly send an image to anyone who needs to know more; for some reason this forum is not letting me upload an example.
Much appreciated,
James

Correct, the Adobe Forums do not permit attachments (attachments became a spam/security issue back in the day, and the option was turned off).
To share a file, place it at an online service such as acrobat.com or Dropbox.
Then share the link in a post.
Be well...

Problem indexing hyphenated words in PDFs

Hello everyone on this forum,
In the new site we are building, I am using Oracle Text to implement the search functionality.
I have problems indexing hyphenated words in PDFs.
The code I used to create the content table and the Oracle Text index is as follows:
CREATE TABLE JMMC_TST_OracleText( article_id NUMBER PRIMARY KEY
, descr VARCHAR2(30)   -- DESC is a reserved word in Oracle and cannot be a column name
, doc BLOB DEFAULT empty_blob()
);
COMMIT ;
I populated the doc column from a database column in our CMS containing a PDF document. Just for testing, I also populated it from a PDF file using TOAD for Oracle 8.6.
EXEC CTX_DDL.create_preference( 'jmmc_BSJC_lexer2', 'BASIC_LEXER' );
EXEC CTX_DDL.SET_ATTRIBUTE( 'jmmc_BSJC_lexer2', 'SKIPJOINS', '-' );
EXEC CTX_DDL.SET_ATTRIBUTE( 'jmmc_BSJC_lexer2', 'CONTINUATION', '-' );
CREATE INDEX JMMC_TST_INDEX
ON JMMC_TST_OracleText( doc )
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS ( 'LEXER jmmc_BSJC_lexer2
STOPLIST CTXSYS.EMPTY_STOPLIST' );
COMMIT ;
The following SQL:
select ctx_report.describe_index('JMMC_TST_INDEX') from dual ;
SELECT err_timestamp, err_text
FROM ctx_user_index_errors
ORDER BY err_timestamp DESC;
shows that indexing went without errors and the index was correctly created.
The word: processo
(which in the PDF is visually hyphenated as
........... pro-
cesso .....
) is indexed as two tokens instead of one, as the following SQL shows:
select token_text
from dr$JMMC_TST_INDEX$i
where UPPER(token_text) = UPPER('CESSO')
or UPPER(token_text) = UPPER('PRO') ;
The following query returns 1 result:
SELECT SCORE(1), article_id , doc
FROM JMMC_TST_OracleText
WHERE CONTAINS( doc, 'pro cesso', 1) > 0 ;
The following query returns 0 results:
SELECT SCORE(1), article_id , doc
FROM JMMC_TST_OracleText
WHERE CONTAINS( doc, 'processo', 1) > 0 ;
The strange thing is that several months ago I ran this test with the same PDF, and everything worked without any problem.
The tests were done on different machines, and on both occasions I used Oracle 10.1.0.5.0.
It looks like I'm overlooking something, or maybe some obscure setting (of the DB, server, or system) is causing the problem.
Suddenly, hyphenated words in PDFs stopped being indexed correctly.
I searched the manuals and this forum and could not find a solution. Can anyone in this forum help?
Thanks in advance.

Hello everybody on this forum,
As the initiator of this thread, I am glad that after some months someone else is looking at this issue.
To add to (or clear up) the confusion, I have followed Roger Ford's suggestion.
Here's the test I ran:
1) Created a minimal test file (using Windows Notepad) with the following content:
ABC-
DEF
Hex view of the above file:
41 42 43 2D 0D 0A 44 45 46 00
A B C - . . D E F .
2) Created test table
CREATE TABLE JMMC_TST_OracleText(
article_id NUMBER PRIMARY KEY
, fmt VARCHAR2(30)
, doc BLOB DEFAULT empty_blob()
);
The main difference from Roger Ford's test case is that my content column is a BLOB instead of a VARCHAR2.
My doc column is a BLOB because, in the site I'm building, content comes from our CMS and has different types, both text and binary (e.g., Word, PDF), which I need to index together.
So I use a mixed-content column in a materialized view to prepare, consolidate, and hold all the content I index.
3) I inserted one row into the above table (using TOAD for Oracle 8.6), putting my minimal test file in the doc column.
4) Created the preferences and the index:
EXEC CTX_DDL.create_preference( 'jmmc_BSJC_lexer2', 'BASIC_LEXER' );
EXEC CTX_DDL.SET_ATTRIBUTE( 'jmmc_BSJC_lexer2', 'SKIPJOINS', chr(45) );
EXEC CTX_DDL.SET_ATTRIBUTE( 'jmmc_BSJC_lexer2', 'CONTINUATION', chr(45) );
COMMIT;
CREATE INDEX JMMC_TST_INDEX
ON JMMC_TST_OracleText( doc )
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS ( 'LEXER jmmc_BSJC_lexer2
FILTER CTXSYS.AUTO_FILTER
STOPLIST CTXSYS.EMPTY_STOPLIST
FORMAT COLUMN fmt' );
COMMIT;
Note: the BASIC_LEXER SKIPJOINS and CONTINUATION characters were set to the same hyphen character used in the test file.
5) Tokens indexed:
select token_text from dr$JMMC_TST_INDEX$i;
Shows:
ABC
DEF
6) Filtered the indexed content and generated a plaintext version:
create table JMMC_filtertab (
query_id number
, document clob
);
commit ;
begin
ctx_doc.filter( 'JMMC_TST_INDEX', '1', 'JMMC_filtertab', '11', TRUE);
end;
Hex view of the plaintext version:
41 42 43 2D 20 20 44 45 46 00
A B C - D E F .
Note that the original end-of-line characters (0D 0A) were replaced by two spaces.
It looks like the filter replaces end-of-line characters with spaces and feeds the lexer something like:
ABC-  DEF (instead of: ABC-DEF);
So the poor lexer sees two tokens and has no clue that they were originally one hyphenated token.
This is consistent with what MetaLink Note 124624.1 - Intermedia Text & Continuation Character ('-') in PDF says.
7) Just for comparison, the result of Roger Ford's test (using a VARCHAR2 column instead of a BLOB) is:
Hex view of the filtered plaintext version:
61 62 63 2D 0D 0A 64 65 66 00
a b c - . . d e f .
So the main difference seems to be the filtering behaviour for BLOB versus VARCHAR2 columns when dealing with end-of-line characters.
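The behaviour described in steps 6 and 7 can be modeled outside the database. The Python sketch below is purely illustrative. `simulate_filter` and `simulate_lexer` are hypothetical stand-ins that mimic what the hex dumps show (CR LF turned into two spaces, then the skipjoin hyphen dropped); they are not Oracle Text APIs. `join_hyphenated_lines` shows one possible workaround, assuming you can get at the extracted text: rejoin line-end hyphenation before the text reaches the index (for example, in the materialized view). It would also wrongly join compounds that genuinely contain a hyphen at a line break.

```python
import re
from typing import List

def simulate_filter(raw: bytes) -> str:
    # Hypothetical stand-in for the observed AUTO_FILTER output on a BLOB:
    # each CR LF pair comes out as two spaces (see the hex dump in step 6).
    return raw.decode("ascii").replace("\r\n", "  ")

def simulate_lexer(text: str, skipjoin: str = "-") -> List[str]:
    # Rough stand-in for BASIC_LEXER with SKIPJOINS '-': the hyphen is dropped
    # and whitespace separates tokens.
    return text.replace(skipjoin, "").split()

def join_hyphenated_lines(text: str) -> str:
    # Rejoin "pro-<line break>cesso" into "processo" before indexing.
    return re.sub(r"(\w)-[ \t]*\r?\n[ \t]*(\w)", r"\1\2", text)

raw = b"ABC-\r\nDEF"
print(raw.hex(" ").upper())                                 # 41 42 43 2D 0D 0A 44 45 46
print(simulate_lexer(simulate_filter(raw)))                 # ['ABC', 'DEF']
print(simulate_lexer(join_hyphenated_lines(raw.decode())))  # ['ABCDEF']
```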
I have also tried other combinations of index/lexer preferences (i.e., SKIPJOINS/CONTINUATION/FILTER/NEWLINE, etc.) and different file types (Word, PDF), which means I also tested with "true binary content" and different end-of-line characters.
No matter what I tried, the results were the same: if I index a BLOB column, I can't get hyphenated lines indexed correctly.
According to the manuals, CTXSYS.AUTO_FILTER is supposed to deal correctly with mixed-content columns if given the correct information (i.e., FORMAT COLUMN).
Hope this triggers a response from someone.
Thanks to all the people who took the time to look at this problem.
