Full text PDF indexing for website search?

Hi.  We run a couple of websites on CQ5.5 and are trying to get the PDF files we refer to in the DAM to show in search results that users conduct on our sites.  I've seen a number of references that imply that full text searching of PDFs is possible.  For example:
http://dev.day.com/docs/en/crx/current/developing/searching_in_crx.html#Full-Text%20Extraction
But thus far I've not been able to figure out what I must do to get it working.  I had expected that if this were possible to do, then it would have worked with the Geometrixx demo site.  It did not.
Am I chasing my tail here, or is there actually a way to get this done?  If it's possible, links to documentation on how to configure indexing_config.xml and any other required files would be greatly appreciated.
Thanks.

Laurent,
We never got a definitive answer, but we have suspicions that it was due to having upgraded from CQ 5.4 to 5.5.  It seems that the libraries used for the indexing changed during that version upgrade.  When I took our application and installed it on a pristine 5.5 installation, the PDF indexing worked.  It was only our existing installations (two staging, two production) that did not work.  So at least we know it's not our application or CQ in general.
Sadly, we don't have the resources to rebuild our servers, and we also ran into a separate problem that would prevent us from using the indexing anyway.  It seems that there is no way to prevent cross-site results if you have multiple sites on the same CQ install and they each have their own sections in the DAM where the PDF files are stored.  Would take some custom code to get around the issue, it seems.
For example, you have site A and site B.
/content/a  <- Main site A content for pages
/content/b
/content/dam/a <- Site A's files in the DAM
/content/dam/b
There is no stock way, that I am aware of, to keep searches on site A from turning up PDF results from /content/dam/b (for site B), and vice versa.  That's enough to keep us from using it - a total deal breaker.
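
The custom code mentioned above could be as small as a path filter applied to the search hits before they are rendered. What follows is a minimal sketch in Python, not CQ code. It assumes each hit exposes its repository path as a string and that each site's pages and DAM files live under the roots from the example; `SITE_ROOTS`, `is_under` and `filter_hits_for_site` are hypothetical names.

```python
# Hypothetical mapping of site name -> repository roots that belong to it.
SITE_ROOTS = {
    "a": ("/content/a", "/content/dam/a"),
    "b": ("/content/b", "/content/dam/b"),
}


def is_under(path, root):
    """True if path is the root itself or a descendant of it."""
    return path == root or path.startswith(root.rstrip("/") + "/")


def filter_hits_for_site(hit_paths, site):
    """Keep only the hits stored under the given site's own roots."""
    roots = SITE_ROOTS[site]
    return [p for p in hit_paths if any(is_under(p, r) for r in roots)]


hits = [
    "/content/a/en/products",
    "/content/dam/a/brochure.pdf",
    "/content/dam/b/pricelist.pdf",
    "/content/dam/abc/other.pdf",
]
print(filter_hits_for_site(hits, "a"))
```

Comparing against the root plus a trailing slash keeps a folder such as /content/dam/abc from being mistaken for part of /content/dam/a.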

Similar Messages

  • ESH_ADM_INDEX_ALL_SC cannot perform initial indexing for all search connectors

    Dear SAP Gurus,
    We are implementing TREX version 7.10.50 for Talent Management ECC 6.0 - EHP 5.
    I'd like to ask you a question regarding the ESH_ADM_INDEX_ALL_SC program,
    which is used to create search connectors for TREX and perform initial indexing for all search connectors.
    As we know we can perform indexing using  ESH_COCKPIT transaction code or use ESH_ADM_INDEX_ALL_SC.
    If I try to perform indexing using ESH_COCKPIT, all search connectors can be indexed ("searchable" column are "checked" and status are changed to "Active" for all search connectors).
    However, if I try to perform indexing using ESH_ADM_INDEX_ALL_SC, not all search connectors are indexed.
    I've traced the program ESH_ADM_INDEX_ALL_SC using the ST01 transaction code and found these errors:
    - rscpe__error 32 at rscpu86r.c(6;742) "dest buffer overflow" (,)
    - rscpe__error 32 at rscpc   (20;12129) "convert output buffer overflow"
    - rscpe__error 128 at rstss01 (1;178) "Object not found"
    Please kindly help me to solve this issue,
    Thank you very much
    Regards,
    Bobbi

    Hi Luke,
    Please find below connectors and the status after running ESH_ADM_INDEX_ALL_SC:
    HRTMC AES Documents: Prepared
    HRTMC AES Elements: Prepared
    HRTMC AES Templates: Prepared
    HRTMC Central Person: Prepared
    HRTMC Functional Area: Prepared
    HRTMC Job: Prepared
    HRTMC Job Family: Prepared
    HRTMC Org Unit: Prepared
    HRTMC Person: Active
    HRTMC Position: Prepared
    HRTMC Qualification: Active
    HRTMC Relation C JF 450: Active
    HRTMC Relation C Q 031: Active
    HRTMC Relation CP JF 744: Active
    HRTMC Relation CP P 209: Active
    HRTMC Relation CP Q 032: Active
    HRTMC Relation CP TB 743: Active
    HRTMC Relation FN Q 031: Active
    HRTMC Relation JF FN 450: Active
    HRTMC Relation JF Q 031: Active
    HRTMC Relation P Q 032: Active
    HRTMC Relation S C 007: Active
    HRTMC Relation S CP 740: Active
    HRTMC Relation S JF 450: Active
    HRTMC Relation S O 003: Active
    HRTMC Relation S O Area of Responsibility: Active
    HRTMC Relation S P 008: Active
    HRTMC Relation S Q 031: Active
    HRTMC Relation S S Manager: Active
    HRTMC Relation SC JF FN: Active
    HRTMC Structural authority: Active
    HRTMC Talent Group: Prepared
    As suggested by OSS, we implemented SAP Note 1058533.
    Kindly need your help.
    Thank you very much
    Regards
    Bobbi

  • DEFAULT Heading, Title, Main Text...for google search result??

    Hi,
    Every page we add in iWeb comes with some default text boxes, shown as
    *"Type a heading for your webpage here", "Type the main text for your page here", "Type the title for the page" ...*
    Do these default text boxes help with Google search results?
    Any other useful purpose for that?

    mactreouser wrote:
    What about adding a text box? Isn't it the same thing as the default title box? Or must it be placed at the top? Or is it already preset for search engines?
    You could find out if it's the same thing by doing the following:
    Add a text box to your home page, publish your iWeb site to a folder, click on "Visit Site Now" and then in Safari do View > View Source. Then look for the title tag that the article talks about:
    <title>Your title here</title>

  • Adobe PDF IFilter for document searches does not work

    I am new to full text indexing of documents but I know enough to get the files into the database and apply indexing and searches because I got it to work for Word (.doc) files.  
    I'm trying to get Adobe's Ifilter version 11 to work in Windows 7 x64.  I'm using Sql Server 2012 Express with Advanced Services sp1. I have included the full path to the /bin folder for the PDF dll in my PATH environment variable per the instructions.
    Register ifilters (after install)
    EXEC sys.sp_fulltext_service 'load_os_resources', 1;
    Verify that the .pdf filter is installed:
    EXEC sys.sp_help_fulltext_system_components 'filter';
    This is the row I get for PDF which I delimited with ';'. The underline portion is what I have in PATH env variable.
    filter; .pdf; E8978DA6-047F-4E3D-9C78-CDBE46041603; C:\Program Files\Adobe\Adobe PDF iFilter 11 for 64-bit platforms\bin\PDFFilter.dll; 11.0.1.36;  Adobe Systems, Inc.
    The file content column is
    content VARBINARY(MAX) NOT NULL
    I insert the file with
    INSERT INTO dbo.Documents (filename, doctype, content)
    SELECT
     N'MyFile',
     N'pdf',
     bulkcolumn
    FROM OPENROWSET(BULK 'C:\MyFile.pdf', SINGLE_BLOB) AS doc;
    I reboot the machine and rebuild the Full Text Catalog after installing the PDF iFilter.
    Then I search with one of these.  There are Word and PDF files that contain 'apple'.
     SELECT id, filename, doctype FROM dbo.Documents WHERE FREETEXT(content, N'apple');
     SELECT id, filename, doctype FROM dbo.Documents WHERE CONTAINS(content, N'apple');
    Now this all works well for .doc files but .PDF files never show up in searches.  I have tried both version 9 and version 11 to no avail.
        

    Hello,
    We believe we have figured this out.  It looks like it has to do with the length of the default folder location for the Adobe iFilter.
    I was able to reproduce the issue and the following resolved it for me.  See if this resolves it for you all as well.
    Here is how to get Adobe Version 11 PDF filter to work.
     1. If you haven't already, run the following in SQL Server:
    EXEC sp_fulltext_service 'load_os_resources', 1;
    GO
    -- You might also need to run:
    EXEC sp_fulltext_service 'verify_signature', 0;  -- This is used to validate trusted iFilters. 0 disables it, so use with caution.
    GO
    2. Stop SQL Server.  (Make sure FDHost.exe stops)
    3. Uninstall the Adobe iFilter (because it defaulted to having spaces or the folder name is too long).
    4. Reinstall the Adobe iFilter and when it prompts for where to install it, change it to: C:\Program Files\Adobe\PDFiFilter
    5. Once the installation finishes, go to the computer's Environment variables. Add the following to the PATH.
    C:\Program Files\Adobe\PDFiFilter\BIN
    NOTE: it must include the BIN folder
    NOTE: If you had the OLD location that included spaces, remove it from the path environment variable.
    6. Start SQL Server
    7.  IF you had an existing Full-text index on PDFs, drop the full-text index and recreate it.
    8. You should now get results when you run SELECT * FROM sys.dm_fts_index_keywords(DB_ID('db'), OBJECT_ID('tblname'))  --Note: Change db to be the actual database name and tblname to be the actual table name.
     Give this a try and see if this fixes yours. 
    Sincerely,
    Rob Beene, MSFT
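
Steps 5 and the two NOTEs above hinge on what the PATH variable actually contains, which is easy to get wrong by eye. The following is a minimal sketch in Python for checking a PATH value. It assumes the new and old install locations quoted in this thread; `normalize`, `path_entries` and `check_ifilter_path` are hypothetical helper names, and the sample PATH is made up.

```python
import ntpath


def normalize(directory):
    """Normalize a Windows directory for comparison (case, slashes, trailing separator)."""
    return ntpath.normcase(ntpath.normpath(directory.strip().strip('"')))


def path_entries(path_value):
    """Split a Windows PATH value into normalized, non-empty entries."""
    return [normalize(e) for e in path_value.split(";") if e.strip()]


def check_ifilter_path(path_value, new_bin, old_bin):
    """Return (new location present, old location absent)."""
    entries = path_entries(path_value)
    return normalize(new_bin) in entries, normalize(old_bin) not in entries


# Locations taken from the posts above; adjust them to the actual install.
NEW_BIN = r"C:\Program Files\Adobe\PDFiFilter\BIN"
OLD_BIN = r"C:\Program Files\Adobe\Adobe PDF iFilter 11 for 64-bit platforms\bin"

# Made-up PATH value with different casing and a trailing backslash.
sample_path = r"C:\Windows\system32;C:\Windows;c:\program files\adobe\pdfifilter\bin" + "\\"
print(check_ifilter_path(sample_path, NEW_BIN, OLD_BIN))
```

On the server itself the value to check would come from the machine-level PATH that the SQL Server service sees, which can differ from the PATH of an interactive session.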

  • Configuring Browsing Indexes for Service Search Descriptor Filters

    I am running DSEE 6.1 on Solaris 10.
    I restrict access to the LDAP clients (Solaris 8, 9, and 10) for various users in the Directory by configuring the service search descriptors to use a filter based on specific roles. Each server's profile names a role depending on the type of server, and users are assigned roles which are nested within the server-type roles:
    NS_LDAP_SERVICE_SEARCH_DESC= passwd:ou=People,dc=example,dc=com?one?nsrole=cn=serverRole,ou=profile,dc=example,dc=com
    NS_LDAP_SERVICE_SEARCH_DESC= group:ou=group,dc=example,dc=com?one
    NS_LDAP_SERVICE_SEARCH_DESC= audit_user:ou=People,dc=example,dc=com?one?nsrole=cn=serverRole,ou=profile,dc=example,dc=com
    NS_LDAP_SERVICE_SEARCH_DESC= shadow:ou=People,dc=example,dc=com?one?nsrole=cn=serverRole,ou=profile,dc=example,dc=com
    NS_LDAP_SERVICE_SEARCH_DESC= user_attr:ou=People,dc=example,dc=com?one?nsrole=cn=serverRole,ou=profile,dc=example,dc=com
    I have noticed in my error logs on the Directory servers messages regarding these filters not being indexed:
    WARNING<20805> - Backend Database - conn=949139 op=1 msgId=2 - search is not indexed base='ou=people,dc=example,dc=com' filter='(nsRole=cn=serverRole,ou=profile,dc=example,dc=com)' scope='one'
    I have also had a few instances where the naming services seem to have stopped altogether. This seems to coincide with my clients refreshing the LDAP cache, which is when I see the not-indexed messages in the error log.
    I guess that I need to set up Browsing Indexes for these filters.
    Can anyone give examples how to do this?
    I guess I will need a vlvBase of ou=people,dc=example,dc=com
    vlvScope of 1
    vlvFilter of nsRole=cn=serverRole,ou=profile,dc=example,dc=com
    I am not sure what I would do for vlvsort attributes though??

    The access logs shows that the attributes to be sorted are uid and cn:
    [25/Apr/2008:09:58:21 +1200] conn=171835 op=1 msgId=2 - SRCH base="ou=people,dc=example,dc=com" scope=1 filter="(nsRole=cn=serverRole,ou=profile,dc=example,dc=com)" attrs="cn uid uidNumber gidNumber gecos description homeDirectory loginShell"
    [25/Apr/2008:09:58:21 +1200] conn=171835 op=1 msgId=2 - SORT cn uid (1426)
    [25/Apr/2008:09:58:21 +1200] conn=171835 op=1 msgId=2 - VLV 0:999:0:0 1:1426 (0)
    [25/Apr/2008:09:58:26 +1200] conn=171835 op=1 msgId=2 - RESULT err=0 tag=101 nentries=999 etime=5 notes=U
    So the vlvsort attributes should be cn and uid.
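
Reading the sort attributes off the access log, as done above, can be scripted when there are many clients or filters to check. The following is a minimal sketch in Python, assuming SORT records look like the excerpt quoted above; `SORT_RE` and `sort_attributes` are hypothetical names.

```python
import re

# Matches the SORT record of an access-log line, e.g. "... - SORT cn uid (1426)".
SORT_RE = re.compile(r" - SORT (?P<attrs>.+?) \(\d+\)\s*$")


def sort_attributes(log_lines):
    """Return the distinct sort-attribute lists seen in the log, in order of first appearance."""
    seen = []
    for line in log_lines:
        match = SORT_RE.search(line)
        if match:
            attrs = match.group("attrs").split()
            if attrs not in seen:
                seen.append(attrs)
    return seen


log = [
    '[25/Apr/2008:09:58:21 +1200] conn=171835 op=1 msgId=2 - SORT cn uid (1426)',
    '[25/Apr/2008:09:58:21 +1200] conn=171835 op=1 msgId=2 - VLV 0:999:0:0 1:1426 (0)',
]
print(sort_attributes(log))
```

Each distinct list it returns is a candidate value for the sort attributes of one browsing index.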

  • PDF indexing and multiple searches.

    Dear members:
    Please forgive me if my question is rather basic but I haven't been able to find the exact answers I am looking for in order to address my project needs.
    I have a folder where I keep all of my PDF files. These are all articles from medical journals that I keep organized using a browser application specific for these types of articles. The application allows me to search these articles but it only looks for specific keywords (title, author name, date, journal name and keyword just to name a few). However, it doesn't look at the content of the PDF file to find words that are contained in the body of the article itself.
    I would like to be able to use Acrobat to search these articles and try to find words I am looking for in the entire article instead of being restricted only to keywords. These are the questions I have:
    1. What is the best way to index these PDF files so that they can become searchable ?
    2. Is there a way to find out if they have already been indexed by the publishing company so that I avoid wasting time by doing it again ?
    3. My library now contains approximately 15,000 articles and I expect it to grow to at least 30,000. How can I handle these searches so that performance doesn't become an issue ? Is there a way to ensure that Acrobat can search these number of files without taking a long time ?
    4. I understand from the help files that Acrobat can search an entire folder so I don't have to run my search one article or file at a time. Is this correct ? What is the best way to run my search so that Acrobat looks at all files in one folder ? In this folder I have subfolders (subdirectories) ? Will Acrobat look at all files when searching including those in subdirectories within the specified directory ?
    Thank you in advance for your help and replies.
    Best regards,
    Joseph Chamberlaini

    After creating the index, you need to carry out the following checks.
    First, check that your index tables contain indexed terms. Execute
    select token_text from dr$YOUR_INDEX$i;
    Second, you will need to check the index errors table CTX_INDEX_ERRORS. This is owned by the user CTXSYS, and most users do NOT have SELECT privilege on it by default.
    If that is OK, then check that your PDF documents are supported by the INSO filter.
    Citation:
    "PDF - Portable Document Format
    Acrobat Versions 2.1, 3.0, 4.0, and 5.0 including Japanese PDF"
    (Appendix B. Supported Document Formats in Oracle Text Reference 9.2)
    For Oracle 9i you could install 9.2.0.4 patchset (it included INSO FILTER 7.5)
    P.S.
    To begin with, you can find answers to your questions about Oracle Text here
    http://otn.oracle.com/products/text
    Sorry for my English.
    Best regards, Victor Zogin.

  • "Filter/partition key" for full-text searching

    Hi there,
    We have a challenge whereby we have a table of products by store, each store having say 200,000 products.  Basically, for each store, we want to allow searching by product name.  The best solution for this is to have full-text searching, but there is no way to have a "filter" or "partition" key on the store ID.
    So in essence what happens, the full-text search scans the entire full-text catalog for the products, then it uses the primary key to match to the table and then filters out the other stores.  Considering we have hundreds of stores in the table, this is not a good solution.
    We contemplated adding separate indexed views and full-text catalogs for every store, but this would be a nightmare to manage.
    I was expecting to see some sort of a "partition by Column" in the full-text indexes, but it doesn't exist.  This basically means we have to scrap full-text and look for a third party solution.
    Does anyone have any idea how we could achieve this with just standard SQL full-text searching?

    Hi Adam,
    Thank you for your question.  I am trying to involve someone more familiar with this topic for a further look at this issue. Some delay might be expected while the question is transferred. Your patience is greatly appreciated.
    Thank you for your understanding and support.
    Elvis Long
    TechNet Community Support

  • Acrobat - Convert Office documents to PDF so that it is crawled/indexed by SharePoint search

    Hi there,
    This is a hybrid question between Acrobat and SharePoint and I'll post on both forums....
    Background:
    In a fairly complex application we have a publishing server that utilizes Acrobat to convert Office documents to PDF using the Convert to PDF functionality.
    We then publish that PDF to a library in SharePoint.  We would like to have those published PDFs searchable by SharePoint search.  Unfortunately there is something about these PDFs where SharePoint cannot crawl the content.
    Note:  I do realize that PDFs are not indexable by SharePoint out of the box and I have installed and configured the iFilter utility.  I have been able to index and search for other PDFs, so I know the mechanism works.  It just seems to be these particular PDFs.
    I have also manually "Saved as PDF" directly from Word/Excel and those PDFs are crawled by SharePoint....it just seems to be when Acrobat does its conversion.  I'm sure it's just a simple configuration somewhere... I just don't know what I'm looking for.
    Another note:  When I open the published PDFs, I am able to use Acrobat's search to find the text.... and the text is selectable; so it's not as if the conversion changed it to an image.
    So....would anyone happen to have encountered this issue?  Or does anyone know what makes a PDF indexable by SharePoint search?
    Thanks in advance

    Hi  ,
    According to your description, my understanding is that the PDFs which are converted from Office documents by Acrobat cannot be crawled in your SharePoint 2010.
    For your issue, please make sure the PDF version of these files is 1.5 (Acrobat 6.x) or above.
    You can take steps as below for verifying:
    Open your PDF using Adobe Reader.
    Go to File -> Properties.
    Check the PDF Version under Advanced section.
    Best Regards,
    Eric
    Eric Tao
    TechNet Community Support
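
When many published files need checking, the version can also be read from the file itself, since a PDF starts with a `%PDF-x.y` header. The following is a minimal sketch in Python; `pdf_header_version` and `meets_minimum` are hypothetical names and the sample bytes are made up. It reads only the header, so it may disagree with the Properties dialog for files whose document catalog carries a newer Version entry.

```python
import re

# The header looks like "%PDF-1.5" and sits at or near the start of the file.
HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_header_version(data):
    """Return (major, minor) from the %PDF-x.y header, or None if no header is found.

    Only the first 1024 bytes are inspected.
    """
    match = HEADER_RE.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_minimum(data, minimum=(1, 5)):
    """True if a header was found and its version is at least the minimum."""
    version = pdf_header_version(data)
    return version is not None and version >= minimum


sample = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n1 0 obj\n"
print(pdf_header_version(sample), meets_minimum(sample))
```

For a real file, the bytes would come from opening it in binary mode and reading the first kilobyte.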

  • Convert to PDF from Excel so that it is indexable by SharePoint search

    Hi there,
    This is a hybrid question between Acrobat and SharePoint and I'll post on both forums....
    Background:
    In a fairly complex application we have a publishing server that utilizes Acrobat to convert Office documents to PDF using the Convert to PDF functionality.
    We then publish that PDF to a library in SharePoint.  We would like to have those published PDFs searchable by SharePoint search.  Unfortunately there is something about these PDFs where SharePoint cannot crawl the content.
    Note:  I do realize that PDFs are not indexable by SharePoint out of the box and I have installed and configured the iFilter utility.  I have been able to index and search for other PDFs, so I know the mechanism works.  It just seems to be these particular PDFs.
    I have also manually "Saved as PDF" directly from Word/Excel and those PDFs are crawled by SharePoint....it just seems to be when Acrobat does its conversion.  I'm sure it's just a simple configuration somewhere... I just don't know what I'm looking for.
    Another note:  When I open the published PDFs, I am able to use Acrobat's search to find the text.... and the text is selectable; so it's not as if the conversion changed it to an image.
    So....would anyone happen to have encountered this issue?  Or does anyone know what makes a PDF indexable by SharePoint search?
    Thanks in advance

    This cannot be done on a Mac. If you need to continue this discussion, please post in the Acrobat Macintosh forum.

  • Error returned in Acrobat X Pro when attempting to output pdf full text from Proquest database.

    Contacted Proquest and they said to contact Adobe. Recently installed CS5, so I now have newer Acrobat version from before. This newer version isn't playing well. Can anyone help?
    This is my original plea to Proquest:
    Description:
    Recently had Adobe Creative Suite (CS5) installed on my machine. Now when I attempt to download full text pdf I receive an error message when trying to open the downloaded file. "There was an error opening this document. The file is damaged and could not be repaired."
    I can view pdf within Proquest window, but problems occur when I try to open the download - defaults to Acrobat X Pro, not standard Reader. I also get an error message when trying to email it. The only outputting method that seems to work is the Export/Save option.
    I tried on another machine that uses Acrobat Reader XI and all worked fine.
    BTW - I am working in Firefox.
    Thanks in advance for any assistance.

  • How does full-text search for pdf files work?

    Hi there,
    Basically I can see my pdf file in the content server.. inside the pdf there's a piece of text that says: "Test's Sample" but when I do a search with that string the file gets filtered from the results.
    I think it has to do with the ' (single quote) being there because other text in the pdf works fine.. so I was wondering how does VDK store this full text? where? I'd like to see how it gets translated IF that's how it works with pdf files....
    Following advice from Re: Parse error with search query I tried doing the search by:
    Test\'s Sample
    Test`s Sample
    "Test's Sample"
    The database is db2 if that helps.. how can I fix this problem?

    Nevermind, I fixed it by changing the VDK filters (in case someone is looking for a solution too).
    Cheers,

  • Full-Text search is not working with PDF files - SQL Server 2012 64 bit

    Hi,
    We are in the process of storing PDF files in SQL Server 2012 with Full-Text search capability.
    I followed the steps below and it works fine with Word documents but not for PDF files. I tried with PDF iFilter 11 and 9 and both were unsuccessful.
    Server/DB Level Settings:
    1) Enable FileStream
    2) Install Full-Text then restart
    3) Use [specific db]
       alter database [db name] add filegroup Files contains filestream;
       alter database [db name] add file (name = N'Files', filename = N'D:\SQL\DATA') to filegroup [Files];
    3) Database level Settings:
       FileStream:
       FileStream Directory name: [Set the name]
       FileStream non-transacted Access: [set Appropriate]
    3a) Add a datafile to DB with filestreamdata filetype.
    4) Share D:\SQL\DATA directory and add specific accounts with read/write access
    5) Give bulkadmin access to those specific accounts at server level
    6) From the page (link) download and install the *.pdf IFilter for FTS. Link: http://www.adobe.com/support/downloads/detail.jsp?ftpID=5542
    7) To the PATH global system variable add path to the catalog, where you installed the plugin. Default for this version is: C:\Program Files\Adobe\Adobe PDF iFilter 9 for 64-bit platforms\bin
    8) From the page (link) download a FilterPackx64.exe and install it. Link: http://www.microsoft.com/en-us/download/confirmation.aspx?id=20109
    9) Now from SSMS execute the following procedures:
       EXEC sp_fulltext_service 'load_os_resources', 1;
       EXEC sp_fulltext_service 'verify_signature', 0;
       EXEC sp_fulltext_service 'update_languages'; -- update language list
       EXEC sp_fulltext_service 'restart_all_fdhosts'; -- restart daemon
       reconfigure with override;
    10) Restart the server
    11) select document_type, path from sys.fulltext_document_types where document_type = '.pdf'
        select document_type, path from sys.fulltext_document_types where document_type = '.docx'
    12) Results are OK.
    Following is my Table /Index/ catalog script:
    CREATE TABLE dbo.DocumentFilesTest
    (
        DocumentId    INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
        AddDate       datetime NOT NULL,
        Name          nvarchar(50) NOT NULL,
        Extension     nvarchar(10) NOT NULL,
        Description   nvarchar(1000) NULL,
        FileStream_Id UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWSEQUENTIALID(),
        FileSource    varbinary(MAX) FILESTREAM DEFAULT(0x)
    )
    go
    --Add default add date for document
    ALTER TABLE dbo.DocumentFilesTest
        ADD CONSTRAINT DF_DocumentFilesTest_AddDate DEFAULT sysdatetime() FOR AddDate
    EXEC sp_fulltext_database 'enable'
    GO
    IF NOT EXISTS (SELECT TOP 1 1 FROM sys.fulltext_catalogs WHERE name = 'Ducuments_Catalog_test')
    BEGIN
        EXEC sp_fulltext_catalog 'Ducuments_Catalog_test', 'create', 'D:\SQL\PDFBlob';
    END
    --EXEC sp_fulltext_catalog 'Ducuments_Catalog_test', 'drop'
    -- Look up the clustered (primary key) index to use as the full-text key index
    DECLARE @indexName nvarchar(255) = (SELECT TOP 1 i.Name
        FROM sys.indexes i
        JOIN sys.tables t ON i.object_id = t.object_id
        WHERE t.Name = 'DocumentFilesTest' AND i.type_desc = 'CLUSTERED')
    PRINT @indexName
    EXEC sp_fulltext_table 'DocumentFilesTest', 'create', 'Ducuments_Catalog_test', @indexName
    EXEC sp_fulltext_column 'DocumentFilesTest', 'FileSource', 'add', 0, 'Extension'
    EXEC sp_fulltext_table 'DocumentFilesTest', 'activate'
    EXEC sp_fulltext_catalog 'Ducuments_Catalog_test', 'start_full'
    ALTER FULLTEXT INDEX ON [dbo].[DocumentFilesTest] ENABLE
    ALTER FULLTEXT INDEX ON [dbo].[DocumentFilesTest] SET CHANGE_TRACKING = AUTO
    ALTER FULLTEXT CATALOG Ducuments_Catalog_test REBUILD WITH ACCENT_SENSITIVITY=OFF;
    INSERT INTO DocumentFilesTest(Extension, Name, FileSource)
    SELECT 'pdf', 'BOL12006553.pdf', *
    FROM OPENROWSET(BULK 'd:\SQL\PDFBlob\BOL12006553.pdf', SINGLE_BLOB) AS BLOB;
    GO
    INSERT INTO DocumentFilesTest(Extension, Name, FileSource)
    SELECT 'docx', 'test.docx', *
    FROM OPENROWSET(BULK 'd:\SQL\PDFBlob\test.docx', SINGLE_BLOB) AS Document;
    GO
    SELECT d.* FROM dbo.DocumentFilesTest d WHERE CONTAINS(d.FileSource, 'BILL')
    Returns nothing. It should return the PDF file.
    SELECT d.* FROM dbo.DocumentFilesTest d WHERE CONTAINS(d.FileSource, 'TEST')
    Returns the Word document as follows:
    2    2014-06-04 10:11:41.393    test.docx    docx    NULL    [BINARY Value]    [Binary Value]
    Any help is appreciated. It's been a long wait.
    Thanks,
    Vel
    Vel Thavasi

    Hello,
    Did you check the full-text log files for more details about the errors? If the filter isn't working, there should be errors in the error log file.
    The following thread is about similar issue, please refer to:
    http://social.msdn.microsoft.com/forums/sqlserver/en-US/69535dbc-c7ef-402d-a347-d3d3e4860d72/sql-server-2008-64bit-fulltext-indexing-pdf-not-working-cant-find-ifilter
    Regards,
    Fanny Liu
    TechNet Community Support

  • Full Text Search in PDF file Not Working in SQL Server 2012

    OS: Windows Server 2012 @ Azure
    DB: SQL Server 2012 SP 1 with Cum Update 6
    Filter: OfficeFilter installed, PDFFilter64 11 installed (actually I tried 9 too)
    I have done the following steps:-
    1. Configure the SQL Server instance to enable FILESTREAM for Transact-SQL access (I/O access and Allow Remote Client Access to FILESTREAM data) and restart the instance service.
    2. Set Stream Access Level to Full Access and  
    3. Create a database with a FILESTREAM folder and set the created database's Properties > Options: FILESTREAM Directory Name = fileContainer and FILESTREAM Non-Transacted Access = Full.
    4. Create a FileTable with a file directory.
    5. Execute the following scripts to ensure all installed components are working. PDF is listed as one of the supported filters.
    EXEC sp_fulltext_service @action='load_os_resources', @value=1;
    EXEC sp_fulltext_service 'verify_signature', 0 -- don't verify signatures
    EXEC sp_fulltext_service 'update_languages'; -- update language list
    EXEC sp_fulltext_service 'restart_all_fdhosts';
    EXEC sp_help_fulltext_system_components 'filter'
    reconfigure with override
    6. Copy a few PPTX, DOCX and PDF files into the file directory.
    7. Search the data with the following command. PPTX and DOCX files return the right result, but PDF files are not returned although they contain the search term.
    SELECT *
    FROM dbo.Course
    WHERE CONTAINS(file_stream, 'Counsellor');
    Any expert advice?
    Ant in SG

    Are you seeing any errors in the SQL Server Error Log, the Windows Application or System logs?  How about in the Full-text crawl logging?
    Troubleshooting Errors in a Full-Text Population (Crawl)
    If your server has a mix of multi-threaded iFilters and single-threaded iFilters, this can cause serious problems with building the full text index.  (How do I know this?  Well, let's just say that I have suffered as well. And I was shocked!) 
    The efficiency was greatly increased by this article: 
    Troubleshooting: Slow Full-Text Indexing Performance Due to Filtering Process
    This means changing the threading model for the multi-threaded (e.g. Microsoft Office) filters to be Apartment Threaded.  Or perhaps if you are full text indexing PDF files, abandoning the free single-threaded Adobe IFilter and purchasing the FoxIt (or some other) multi-threaded PDF iFilter would benefit you.
    RLF

  • How to link a full text index with catalog in a PDF file ?

    Good morning and thank you for your help.
    I have already created some PDF files in a folder (with hypertext links between them) and I used the command "Tools\Document processing\Full Text Index with Catalog" to create an index; so far everything works properly.
    Now I want to link this index to my first PDF file so that an advanced search in this file uses the index automatically.
    I hope that someone may answer me!
    Thank you.

    Now I want to link this index to my first PDF file so that an advanced search in this file uses the index automatically.
    In the properties of the document:

  • Full text index searching in large document sets

    I have been placed in charge of a digital PDF document library for a small biotech company. The library consists of about 1000 100-300 page .pdf documents which have been scanned and OCRed. In order to facilitate the full text searching of the documents a PDX catalog has been created. In theory, the PDX catalog would seem to be an excellent means of quickly accessing the data, but due the sheer volume of text that is contained in the documents this does not seem to be the case.
    Any given search may take hours to complete and many computers in the department have been known to lock up due to the load of running a search. Obviously, this has made using the PDX search more of a hassle than it is worth.
    I do not know exactly how the index searches work, but from what I gather they somehow search within each document in turn and return to you all the instances in all the documents that contain a certain term. If this is the case, then it would make sense that the searches take a long time, because the search would have to go through each of the 1000 documents in sequence.
    The thing is: we really do not need to know the context and placement of every instance that a word appears in a document. All we need to know is IF it appears, and perhaps how many times. Is there a way to make an index that will simply give us this information without having to search the actual document?
    Here's an example of what I am trying to achieve:
    Note: I know almost nothing about full text indexes so please forgive me if any of this sounds insane
    Let's say we have a document called "word count.pdf" which contained the following text:
    "blah blah yadda yadda text Recombinant human insulin more text still texting and so on"
    And another called "word count 2.pdf" with the following text
    "Recombinant human insulin and la la la dee do"
    The indexes for these files could be condensed and stored like this:
    "Word count.pdf"
    Blah 2
    yadda 2
    recombinant 1
    human 1
    insulin 1
    text 2
    texting 1
    and 1
    so 1
    on 1
    "Word count 2.pdf"
    recombinant 1
    human 1
    insulin 1
    and 1
    la 3
    dee 1
    do 1
    In this example, if we were to run a search on "text" the index would return "word count.pdf, 3 instances (2 of text and 1 of texting)" whereas if we were to search for "recombinant" it would return both "word count.pdf, 1 instance" and "word count 2.pdf, 1 instance".
    This way, I could quickly weed out all documents that do not have the word that I am looking for and get an idea about which documents should be searched more in depth without scanning every single instance of the term in every document.
    Is there any way to accomplish something similar to this using acrobat? (Or anything else, for that matter)
    My specifications: (similar to specs of all computers searching the pdx):
    Windows XP,
    intel celeron CPU 2.6GHz, 1G of ram
    Adobe Acrobat 8 Professional
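
The per-document word-count index described in this post can be sketched in a few lines. This is a minimal illustration in Python, not something Acrobat or a PDX catalog provides. It assumes the text of each PDF has already been extracted to a plain string; `build_index` and `search` are hypothetical names. A search term matches any word that starts with it, which is a simplification chosen so that "text" also counts "texting" as in the example above.

```python
import re
from collections import Counter


def build_index(documents):
    """Map each document name to a Counter of its lowercased words."""
    return {
        name: Counter(re.findall(r"[a-z0-9]+", text.lower()))
        for name, text in documents.items()
    }


def search(index, term):
    """Return {document: count} for words starting with term; documents without a hit are omitted."""
    term = term.lower()
    hits = {}
    for name, counts in index.items():
        total = sum(n for word, n in counts.items() if word.startswith(term))
        if total:
            hits[name] = total
    return hits


documents = {
    "word count.pdf": "blah blah yadda yadda text Recombinant human insulin "
                      "more text still texting and so on",
    "word count 2.pdf": "Recombinant human insulin and la la la dee do",
}
index = build_index(documents)
print(search(index, "text"))         # {'word count.pdf': 3}
print(search(index, "recombinant"))  # {'word count.pdf': 1, 'word count 2.pdf': 1}
```

A search only touches the stored counts, never the documents themselves, so documents without the term are weeded out without being opened.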

    Look at dtSearch. We used the publisher version for a CD with large file sets (with hundreds of pages per file/thousands of PDF pages of multicolumn index data - text heavy), and it does a great job. The desktop version would provide the type of searching you are looking for. Indexing is also very fast. Our customer complained, like yourself, about the speed of searches in Acrobat 6 and higher - most of the delay is due to the population of the results window.
    http://www.dtsearch.com/
