Problem with indexing a PDF file

Hi all,
We can successfully index PDF files most of the time, but sometimes indexing simply fails. Nothing is wrong with the code, but it looks to me as if some PDF files are not accepted. Is there a list from Oracle that says which PDF versions are accepted, or how the files should be made so that Oracle can index the content?
Nitai

Hi,
Reproduced.
I indexed five PDFs downloaded from the internet plus yours. Yours is the only one that was not indexed. All are version 1.4 (right-click | properties | pdf), which is fully supported (see http://www.verity.com/cms/groups/public/documents/collateral/mk0459.pdf for a list of supported formats for the keyview filter, referred to as auto_filter in 10g).
The differences between the files are:
Works:
============
Application: PScript5.dll Version 5.2
PDF Producer: Acrobat Distiller 5.0.5 (Windows)
Fast Web View: No
Doesn't work:
============
Application: Adobe InDesign CS2 (4.0)
PDF Producer: Adobe PDF Library 7.0
Fast Web View: Yes
I checked to see that the pdf was not simply a scanned image and it wasn't. No errors in CTX_USER_INDEX_ERRORS and the pending record disappears post index sync. No records in the $I table.
A few possible things to check with support:
1) If custom fonts are used check with Oracle support on the filter's ability to extract using the custom font.
2) Are there any known issues with PDF docs generated using InDesign CS2 or with docs created for Fast Web View?
See bug 3814696 and reference it with your service request. The outcome is not published so I don't know how/if they resolved it. You may want to mention this with your TAR since the scenario is similar, and toss in the differences between the "works" and "doesn't" scenarios that I mention above.
Hope it helps,
-Ron
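
Two of the properties compared above (PDF version and Fast Web View) can also be read straight from the file, which helps when triaging many documents before opening a service request. The following is only a rough sketch: the class and method names are made up for illustration, it assumes the version is taken from the `%PDF-x.y` header (a document can declare a different version elsewhere in the file), and it treats a `/Linearized` entry in the first 1024 bytes as the sign of Fast Web View. It says nothing about which application produced the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Oracle Text or any Adobe tool.
public class PdfHeaderProbe {

    /** Reads up to the first 1024 bytes and decodes them one char per byte. */
    public static String readHead(InputStream in) throws IOException {
        byte[] head = new byte[1024];
        int total = 0;
        int n;
        while (total < head.length
                && (n = in.read(head, total, head.length - total)) != -1) {
            total += n;
        }
        return new String(head, 0, total, StandardCharsets.ISO_8859_1);
    }

    /** Returns the version from the "%PDF-x.y" header, or null if none is found. */
    public static String pdfVersion(String head) {
        int i = head.indexOf("%PDF-");
        if (i < 0 || i + 8 > head.length()) {
            return null;
        }
        return head.substring(i + 5, i + 8);
    }

    /** True if a linearization dictionary (Fast Web View) appears in the head. */
    public static boolean isLinearized(String head) {
        return head.contains("/Linearized");
    }

    public static void main(String[] args) throws IOException {
        // Each argument is the path of a PDF file to inspect.
        for (String name : args) {
            try (InputStream in = Files.newInputStream(Paths.get(name))) {
                String head = readHead(in);
                System.out.println(name + ": version=" + pdfVersion(head)
                        + ", fastWebView=" + isLinearized(head));
            }
        }
    }
}
```

Running it over the "works" and "doesn't" files gives a quick table of differences to attach to the service request.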

Similar Messages

  • Problem with saving a pdf file to computer. Continually get an error message "This document could not be saved. There was a problem reading this document (21)."

    Need advice on a file-saving issue. I'm having a problem saving a .pdf file to my computer. I continually get the error message "This document could not be saved. There was a problem reading this document (21)." This is new; the error message only recently started to pop up.

    More information about this issue can be found here:
    https://forums.adobe.com/thread/1672655
    A "quick" fix that worked for me was to uninstall Adobe... then download the base install for Adobe Reader 11.0.
    Then download each of the individual updates and run them sequentially. 
    I've installed back up to the last security update which is version 08 and have been able to do normal Save As operations.
    You will have to disable automatic updates in order to stay at version 08 until Adobe resolves this issue in a later release.
    http://www.adobe.com/support/downloads/product.jsp?product=10&platform=Windows
    Adobe Reader 11.0 - Multilingual (MUI) installer    AdbeRdr11000_mui_Std
    Adobe Reader 11.0.01 update - Multilingual (MUI) installer    AdbeRdrUpd11001_MUI.msp
    Adobe Reader 11.0.02 update - All languages    AdbeRdrSecUpd11002.msp
    Adobe Reader 11.0.03 update - Multilingual (MUI) installer    AdbeRdrUpd11003_MUI.msp
    Adobe Reader 11.0.04 update - Multilingual (MUI) installer    AdbeRdrUpd11004_MUI.msp
    Adobe Reader 11.0.05 security update - All languages    AdbeRdrSecUpd11005.msp
    Adobe Reader 11.0.06 update - Multilingual (MUI) installer    AdbeRdrUpd11006_MUI.msp
    Adobe Reader 11.0.07 update - Multilingual (MUI) installer    AdbeRdrUpd11007_MUI.msp
    Adobe Reader 11.0.08 security update - All languages    AdbeRdrSecUpd11008.msp

  • Indexing Problem with FILE_DATASTORE and .pdf files

    Hello all,
    Do any of you have an example showing how to index .pdf files through FILE_DATASTORE? I am able to successfully index text and .doc files but not a .pdf file. Below is the script that I use to index my files:
    create index myindex on mytable(docs)
    indextype is ctxsys.context
    parameters ('datastore COMMON_DIR filter ctxsys.null_filter');
    I am using Oracle 8.1.6
    Thank you!!!
    -garrett

    I don't think you are able to index anything other than plain ASCII text, because you are not using the INSO filter.
    Use preferences like this:
    exec ctx_ddl.drop_preference('NO_PATH');
    exec ctx_ddl.create_preference('NO_PATH','FILE_DATASTORE');
    exec ctx_ddl.drop_preference('MY_LEXER');
    exec ctx_ddl.create_preference('MY_LEXER','BASIC_LEXER');
    exec ctx_ddl.set_attribute('MY_LEXER','MIXED_CASE', 'NO');
    exec ctx_ddl.set_attribute('MY_LEXER','INDEX_THEMES','NO');
    exec ctx_ddl.set_attribute('MY_LEXER','INDEX_TEXT', 'YES');
    exec ctx_ddl.drop_Preference ('MY_FILTER');
    exec ctx_ddl.create_Preference ('MY_FILTER','INSO_FILTER');
    exec ctx_ddl.drop_section_group ('MY_SECTION');
    exec ctx_ddl.create_section_group ('MY_SECTION','NULL_SECTION_GROUP');
    drop index i_filenames;
    create index i_filenames on filenames (filename)
    indextype is ctxsys.context
    parameters ('datastore NO_PATH
    section group MY_SECTION
    lexer MY_LEXER
    filter MY_FILTER
    memory 10M');
    The important part is the INSO_FILTER preference.
    Thomas

  • Problem with color in pdf files

    My documents contain color logo.
    Whenever converted into pdf file, I always see color logo in the pdf files.
    Today, the logo came out to be in black and white.
    I repeated many times even with Adobe PDF (printer driver).
    But logo came out in B & W.
    I am wondering how to fix this problem.
    I currently use WinXP Pro SP2-Acrobat Pro 7.10.
    Note: Last week, v. 7.09 was updated to 7.10.
    I wonder whether this update contains a bug that produces B&W images.

    Please ignore this post.
    I would like to ask the moderator to delete this post.
    (Note: I found that the problem has nothing to do with PDF but with the original xls invoice documents.)

  • Problems with Contribute3/Dreamweaver8 & PDF files

    Does anyone know of problems downloading PDF files that have
    been uploaded with either Contribute 3 or Dreamweaver 8 to a
    server. I seem to be able to read these files in FireFox but not
    IE6 - don't know what I am doing wrong - HELP !!

    Maybe the Acrobat Reader association with IE has become corrupted. Why
    not download an updated reader and install it?
    Paul Whitham
    Certified Dreamweaver MX2004 Professional
    Adobe Community Expert - Dreamweaver
    Valleybiz Internet Design
    www.valleybiz.net

  • Problem with downloading a PDF file from a webserver run on Linux

    Hi,
    I've written a simple functionality that manages file attachments.
    Everything works fine (attaching, downloading, deleting) when the webserver runs on Windows.
    However when I deployed the code to the Resin webserver run on Linux and use the Win browser to connect to the app, the downloading of PDF file doesn't work (uploading and downloading of txt, doc, xls, jpg files is OK).
    The downloaded PDF file is almost twice as big as the original (~28KB when the original is ~12KB) and it can't be opened.
    I guess the problem is in writing to the output stream of HttpServletResponse, but I can't locate it.
    Here is the code I use for downloading file:
    private boolean downloadFile(HttpServletResponse response, String filePath,
                   String originalFilename) {
         File file = new File(filePath);
         String contentType = URLConnection
                   .guessContentTypeFromName(originalFilename);
         // If the content type is unknown set the default value.
         if (contentType == null) {
              contentType = "application/octet-stream";
         }
         BufferedInputStream input = null;
         BufferedOutputStream output = null;
         try {
              input = new BufferedInputStream(new FileInputStream(file));
              int contentLength = input.available();
              response.reset();
              response.setContentLength(contentLength);
              response.setContentType(contentType);
              response.setHeader("Content-disposition", "attachment; filename=\""
                        + originalFilename + "\"");
              output = new BufferedOutputStream(response.getOutputStream());
              int bufSize = 10000;
              byte[] buf = new byte[bufSize];
              int bytesNo = 0;
              while ((bytesNo = input.read(buf, 0, bufSize)) != -1) {
                   output.write(buf, 0, bytesNo);
              }
              output.flush();
              input.close();
              output.close();
         } catch (IOException e) {
              log.debug(e.getMessage());
              e.printStackTrace();
         }
    }
    Can you point out any problem?
    Thanks in advance,
    Ala

    matali wrote:
              int bufSize = 10000;
              byte[] buf = new byte[bufSize];
              int bytesNo = 0;               
              while ((bytesNo = input.read(buf, 0, bufSize)) != -1) {
                   output.write(buf, 0, bytesNo);
    This piece is completely wrong and doesn't work for files bigger than 10000 bytes. Replace it by
    byte[] buffer = new byte[10240]; // 10KB exactly.
    int length = 0;
    while ((length = input.read(buffer)) > 0) { // Read next 10KB of input to buffer.
        output.write(buffer, 0, length); // Write specified length of buffer to output.
    }
    Or just use output.write(input.read()) in a loop of the contentLength as you're already using BufferedInputStream/BufferedOutputStream.
    Another thing, the method is declared to return a boolean, but it actually doesn't? I would just let it throw an exception in case of failure instead of returning a boolean and let the calling method handle the exception.
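
    A minimal, self-contained sketch of the copy loop suggested in the reply, written against plain java.io streams so it runs without a servlet container. The class name StreamCopy is made up for illustration. It lets IOException propagate to the caller instead of returning a boolean, as the reply recommends.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

// Hypothetical helper class; in the servlet, "out" would be response.getOutputStream().
public class StreamCopy {

    /** Copies all bytes from in to out and returns the number of bytes copied. */
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[10240];
        long total = 0;
        int length;
        // read() returns -1 at end of stream; otherwise the number of bytes read.
        while ((length = in.read(buffer)) != -1) {
            out.write(buffer, 0, length);
            total += length;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // 25000 bytes: deliberately larger than one buffer.
        byte[] original = new byte[25000];
        for (int i = 0; i < original.length; i++) {
            original[i] = (byte) i;
        }
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(original), sink);
        System.out.println(copied + " bytes copied, identical="
                + Arrays.equals(original, sink.toByteArray()));
    }
}
```

    In the servlet itself, one option (an assumption, not something confirmed in this thread) is to set the Content-Length from file.length() rather than input.available(), since available() is documented only as an estimate of the bytes readable without blocking.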

  • OBIEE 11g having problems with password protected PDF files.

    I have been able to get an analysis in OBIEE 11g to display PDF files.
    However, some of these documents contain sensitive information and must be secured. Since anyone with access to the file name
    could simply type in the proper path in the browser window, this is unacceptable. In order to try and prevent this, I created a pdf file
    that is protected with a password.
    Opening the file by itself, produces the desired results. The password is requested before the file will open.
    When I open the file through my analysis in OBIEE, Adobe reader activates, but the password is not requested and the file does not open.
    It is as if OBIEE is somehow not sensing that Adobe Reader is asking for a password.
    Does anyone have any experience with this?

    FYI, in case anyone is interested, I found out what is going on.
    I created the original password protected PDF using Microsoft Word. I did this because I do not have a full version of Adobe Acrobat that allows me to create files.
    On a hunch, I found someone that has a full version of Acrobat, and had them create a password protected PDF file. This file worked perfectly.
    Apparently, Word is not strictly adhering to PDF guidelines, and OBIEE senses the differences, resulting in the file not opening properly.
    Something to keep in mind for anyone linking to password protected PDF files in OBIEE.

  • Problems with reading a pdf-file using Safari/Adobe Acrobat Pro

    Using Adobe Acrobat Pro 9.5.5 and Safari on Snow Leopard, I open the PDF file at http://dare.uva.nl/document/505726 and small boxes appear in the text, e.g. on pages 70 and 74, on the last page, etc.
    I don't know why those small boxes appear only on my MacBook Pro with the installed software, while others don't see corrupt text. If I right-click in the PDF file and choose "open with Adobe Acrobat Pro", the PDF file opens correctly (without small boxes)!

    As you discovered, Adobe Acrobat/Pro/Reader v.9 will crash for all managed user accounts that store their home folders on the server because it has "difficulty" reading and writing to a folder across a network, which is why it crashes shortly after it launches.
    HOWEVER, there is hope. All you need to do is tell Adobe to write all of its files to the local drive rather than the home folder on the sharepoint, as follows:
    While logged into your network-user account:
    1. Go to the "Shared" folder on the local drive. (Local HD > Users > Shared)
    2. Create a new folder named "9.0_x86" if you have an Intel or "9.0_ppc" if you have a PPC (without the quotes for either case). You will have to authenticate as an Administrator.
    3. Go to the Acrobat folder within the user's Application Support and delete the folder in it.
    (Home > Library > Application Support > Adobe > Acrobat > 9.0_x86 or 9.0_ppc)
    4. Open/launch the Terminal application found within the Utilities folder on your Mac.
    (Applications > Utilities > Terminal.app)
    5. In Terminal, enter the 1st command for Intel or the 2nd for PPC:
    ln -s /Users/Shared/9.0_x86 ~/Library/Application\ Support/Adobe/Acrobat/9.0_x86
    OR
    ln -s /Users/Shared/9.0_ppc ~/Library/Application\ Support/Adobe/Acrobat/9.0_ppc
    Adobe Acrobat 9 should now work properly. You may get the "quit unexpectedly" error message the first time you quit the application, but should only happen the one time.
    On a side note, the print spool is slow when printing very large multi-paged PDFs.
    This is a fix I found in 2009 by Dennis, can't remember where though.
    Message was edited by: Gabriel Prime

  • Problem with opening of PDF file, generated through smartform

    Recently I developed a new program in which the output of a smartform is generated as a PDF file, and that PDF file is then mailed to the persons concerned.
    I have done this successfully on the development server, and for further testing I transported it to Quality. In Quality I am able to generate the PDF file and send it to the mailbox of the person concerned, but when that person tries to open the PDF file, it gives the error 'There was an error in opening the document, The file is damaged and could not be repaired'.
    But I am able to read the same file from the development server.
    When I asked the Basis team, they were not aware of the issue, and I have made sure that all requests have been transported to Quality and nothing is pending in development.
    Can you please give me some ideas for sorting out this issue?
    Thanks in advance
    Rani

    Hi,
    I have same requirement.
    I need to convert smartform->pdf->send mail.
    The mail is sent but the attachment is corrupted.
    Can you tell me what code you have written so that it works in development?

  • Problem with opening of PDF in my webmail application

    I have a problem with opening a pdf file in my webmail application of my internet provider in apple safari since yesterday. When I wanted to open up a pdf file I got a pop up message to allow a connection between adobe and the webmail application which I did. Since then I can not open up a pdf file. If I go to another browser or if I use email from another internet provider I use too it works and I can open up. It seems to be a blockage of the specific combination of this webmail application and safari.
    What can I do ?


  • Adobe PDF iFilter 9 for 64-bit platforms does not index my PDF files with Digital Sign

    Adobe PDF iFilter 9 for 64-bit platforms does not index my PDF files with Digital Sign, why?

    hi  Phillip
    i am not sure what you mean
    I downloaded the ifilter and installed it
    then configured everything as shown in the pdf file
    I tried indexing from scratch exactly as i did successfully in the other computer
    and got some errors in the log file
    i checked the sql server log and the event viewer logs and got :
    Error '0x80004005' occurred during full-text index population for table or indexed view '[Pirsumim_ext_ck].[dbo].[T_PUBLICATIONS]' (table or indexed view ID '2073058421', database ID '14'), full-text key value 0x0000027A. Attempt will be made to reindex it.    
    The component 'PDFFilter.dll' reported error while indexing. Component path 'C:\Program Files\Adobe\Adobe PDF iFilter 9 for 64-bit platforms\bin\PDFFilter.dll'.   
    Informational: Full-text retry pass of Full population completed for table or indexed view '[Pirsumim_ext_ck].[dbo].[T_PUBLICATIONS]' (table or indexed view ID '2073058421', database ID '14'). Number of retry documents processed: 1. Number of documents failed: 1.
    Changing the status to MERGE for full-text catalog "Pirsumim_ext_catalog_ck" (5) in database "Pirsumim_ext_ck" (14). This is an informational message only. No user action is required.
    Informational: Full-text Auto population initialized for table or indexed view '[Pirsumim_ext_ck].[dbo].[T_PUBLICATIONS]' (table or indexed view ID '2073058421', database ID '14'). Population sub-tasks: 1
    the same dll worked fine in another computer...
    how can i see more details what is wrong with this dll  ?
    meidad

  • Adobe PDF iFilter 9 for 64-bit platforms does not index my PDF files in SQL server database

    hi all
    I need your help
    i have this asp.net site which works with sql server
    it searches a specific word in the database with full text search
    in my database i have a column of type image that holds a PDF file
    i want my SP to search for given word in my file
    so i installed the Adobe Ifilter
    configured it and every thing worked fine
    but after i moved to production Server and installed every thing the same way ...it did not create the index for the pdf files
    i can search for doc and docx and even for xml but not pdf
    i know i installed it correctly since it is working great in the testing environment
    i used this link for the configuration
    http://www.adobe.com/special/acrobat/configuring_pdf_ifilter_for_ms_sql_server_2005.pdf
    the only difference i see between my Testing environment and my Production environment  is that
    my testing has
    windows server 2003 sp2 with Sql server 2005 64 bit std sp2   -- works fine
    my production  has
    windows server 2003 R2 sp2 with Sql server 2005 64 bit ent sp3  -- does not work, does not index my pdf files
    can you tell me if there are problems with some environments ?
    are there any known problems with this Ifilter ?
    what should i do ..

    Thank you
    I opened it in the Acrobat Windows forum
    http://forums.adobe.com/message/2557155#2557155
    meidad Evyoni

  • Embedding Full-text Index into PDF File

    Hello Everyone,
    I've tried to create and embed a full-text index into a PDF file, but with no luck. I followed the steps described at http://help.adobe.com/en_US/Acrobat/9.0/Standard/WSC28D4DBB-6A78-4027-9E04-F50FE411CFB9.w.html - the progress of data collection is shown, and at the end the Update index button is enabled. I take this as a sign that the index was created. After clicking the Ok button, saving the document as a new one and then reopening the newly created document, the Manage Embedded Index dialog says that no index is embedded. Is any other step necessary? Or is it a bug? Adobe Acrobat Pro 9.1 on Windows Vista 32bit is used.
    Jan
    PS: Interesting is also comment at the bottom of above mentioned help page...

    Thanks for the response. It is true that if I make changes and look at the embedded index status, it shows that it needs updating
    However the problem I can't get around after extensive testing is that sometimes for no apparent reason the index is dropped on save. This can happen if I check the status of the index to make sure it is valid, save the file, and reopen it.
    I've concluded that this must be a bug and am using other indexing options for the time being.

  • Problem with index in merged projects

    Dear Sir,
    I have a problem with the index in the master project: I can only
    view the master project index keywords under the index tab. My
    master project contains 2 topics besides the other sub projects. I
    have created an index for each sub project individually and at the end
    I created the master project index.
    I made sure that the 'Binary Index' flag is ticked before
    compilation and still the problem persists. Does somebody know what
    the problem could be?

    Hi,
    Two points here.
    Firstly, you can duplicate SSL's. (Right click on an SSL in
    the projects sub-directory and select the duplicate layout option.
    This will give you a copy of the original SSL).
    I tend to have two SSL's - one is local and I use that every
    time I test compile the project when I'm working on it.
    The other is set to output to the master project. I use that
    once I'm happy with the sub-project and start working with the
    master project.
    Second point is to be aware of how RoboHelp treats the
    imported/merged .chm files from the sub projects.
    It imports them into the main project master directory - then
    when the master project is compiled it puts a copy of them into the
    master project SSL. However, if you re-compile it doesn't overwrite
    the files that are already in the master project SSL directory.
    This is why you have to be aware of where the sub project
    .chm files are going - you need to make sure that if you revise the
    sub projects, a fresh copy of the .chm files ends up in the output
    SSL directory of the master project.

  • Problem with creating/viewing PDF's from InDesign CS3

    I have a problem with our PDF workflow and just cannot seem to resolve it.
    The problem is as follows: My coworker and I (both designers running CS3 on iMac's running 10.5.6 Leopard) work daily on producing documents and graphic layouts.
    Internally we can view and print the PDF documents we create with no trouble, with the exception of our supervisor, who is running a Mac with the Tiger operating system. Our office environment is both Mac and PC. On many occasions he cannot print PDFs we create. Many times his prints will contain garbled characters, drop italics and formatting, replace fonts, or just print slowly.
    This problem is also happening to our editor who is offsite. This is a fairly serious problem for her, considering her job relies heavily on being able to view and open PDF files we create. She was able to send a PDF file which shows the garbled mess her printer spit out when she printed. Apparently there were pages upon pages of messy garbled text. When documents do print from her, they are usually very slow in printing, taking up to a minute or more to print each page.
    The sample of what she sent me is attached, and can also be found on my MobileMe iDisk at: http://public.me.com/rlcollier (document entitled Print Results.PDF)
    My question to the community is, obviously, what might be causing these problems. It's very frustrating not being able to determine whether something we're doing ourselves is causing some incompatibility or corruption in these files, or whether it's the users' systems themselves. I can say that Debra, our editor, can get a garbled mess from a 4-page file from us, and then turn around and print a graphic-heavy 90-page PDF from Boeing with ease. Our PDFs seem to be the only ones she struggles with. That being said, my inclination is that it's something on our end.
    Any ideas of where to start looking? Any help at all would be greatly appreciated and welcomed. Thanks!

    I currently had our editor test printing of some of our files using both Foxit and Adobe Reader (as was suggested) in order to see if either made a difference in her printing ability and here is what she came back with:
    I tried to print out both these pdfs (David's is the one you reworked and Lisa's HESSM-3, both sent yesterday).
    With Adobe:  David's first page printed quickly, but it had errors (part of his pants didn't print, and there's an arbitrary shaded box in the text).  Page 2 didn't print--every time I tried it had a different "offending command" code.  Printing Lisa's HESSM  made it up to page 7 before problems showed up (stock photo only partially printed), and it stopped on page 8 (with the random "offending command" code).
    With Foxit:  Both David's and the HESSM pdfs printed completely and without error...but it took a long time.   David's 2 pages took about 3 to 4 minutes, and HESSM's 16 pages took close to 20 minutes.  The time is in the transfer of data to the printer; the physical printing  goes pretty quickly.
    I can't say that I believe email is the problem, although I can't rule it out. I've tested emailing vs. passing files through our workgroup with my supervisor, and it does not make any difference in his ability (or lack of ability) to print our files. He was able to print to a different printer (an HP 4650 as opposed to a 4100) without trouble. He refuses to believe it's a printer problem, however, because PDF files originating from our office are the only ones he has trouble with. Never has he had any trouble with a single PDF file produced from any other source. This is also the case for our editor, who only has trouble with PDF files originating from either my system or my coworker's.
    PS: I've attached both files that were referenced by our editor above for viewing/testing.
