Solr indexing

Has anyone used Solr to index a directory or directories of .doc, .xls, .pdf files? The result of a cfindex on that collection puts all the document properties together in the summary field. That is to say, the title, author, etc. are not broken out as they are in a Verity collection. Does anyone know how to edit the schema.xml or solrconfig.xml in a Solr collection to index the files so that I can retrieve this information? I am trying to capture all the document properties in their respective fields and would like to add the created and modified dates. I would like to be able to search on these fields.

Hi Guys
Has there been any progress on this? I'm experiencing the same issue: Solr is not indexing the body of PDF files at all.
Opening the collection directly and inspecting it shows no content, just the reference to the filename.
I set up the collection in CF admin and have tried to index it both there and with cfindex; both fail.
The same collection is happy to index .xls and .doc with no issue.
<cfindex collection="ti_docs_collection" action="update" extensions=".pdf, .xls, .doc" key="C:\Inetpub\wwwroot\ti\dsdocs" status="sreturn" type="file">
Have you had any joy with this? I'm about to start pulling my own teeth out.
Thanks
Mark

Similar Messages

  • CF9 and SOLR indexing

    We are using CF9 64-bit and setting up a SOLR collection for an HR application. The database contains several million records and includes resumes that we want to do full text searches on.
    We started out by using cfindex to create the index, but it would bomb out after just a few thousand records with an error about "warming threads" (I don't have the exact error handy but can get it later), and the indexing would have to be manually restarted. This wasn't a good solution for a multi-million record operation.
    Next, we created a custom Data Import Handler (DIH) outside of CF using the instructions in the SOLR wiki. This index worked great and was very fast. However, the ColdFusion tags (cfsearch, etc.) would not work with this index. We even made sure to duplicate the required nodes (<custom1> <custom2>, etc.) that the cfindex tag would have created. Still cannot search that index.
    We'd really rather not reinvent the wheel and have to write custom search code. Obviously, we like using CF and it would be great if we can use the built-in indexing and searching capability.
    Any ideas on how we can either 1) make the <cfindex> work without stopping OR 2) go ahead and use the custom DIH and be able to make the <cfsearch> work properly?
    Dana

    I have just over 500 records to index, some of which are large documents. When I loop through them using cfindex, I also get this error:
    Error_opening_new_searcher_exceeded_limit_of_maxWarmingSearchers4_try_again_later
    I found that if I put this in my loop
    <cfscript>
        thread = CreateObject("java", "java.lang.Thread");
        thread.sleep(1000);
    </cfscript>
    then I no longer get the error, but indexing takes a very long time. I would also like a better solution.
    The ColdFusion debugger shows that it is erroring out on the custom4 field. I don't know if the custom fields are struggling more than the main body field. Anyway, I am continuing to research my options.
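
    The error text suggests that searchers are being opened faster than they can warm, which is what tends to happen when every document in a loop triggers its own commit. A commonly suggested alternative to pausing is to send documents in batches and commit only once at the end. The following is a minimal Python sketch of that idea, outside ColdFusion, assuming documents are posted directly to Solr's XML update handler. The field names (key, body) and the batch size are placeholders, and nothing in this thread confirms that cfsearch can read documents added this way.

```python
from xml.etree import ElementTree as ET


def build_add_xml(docs):
    """Serialize a list of {field: value} dicts as one Solr <add> message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # Field names are whatever the collection's schema defines (assumed here)
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def plan_updates(docs, batch_size=100):
    """Split docs into batches; only the final batch is flagged for commit."""
    batches = [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]
    last = len(batches) - 1
    return [(build_add_xml(batch), index == last)
            for index, batch in enumerate(batches)]
```

    Each returned body would be POSTed to the collection's update URL, adding commit=true only for the pair whose flag is True, so a single new searcher is opened for the whole run.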

  • Can Verity/Solr index files AND columns from db into one resultset?

    Hi,
    I have a form where users upload a document and also add some meta data that goes with the document.  I currently store the meta data in the db and the actual document on the file system.
    How can I use Verity/Solr to index files and columns from a database table so it will return one recordset?
    -ws

    <cfindex type="file"> should do the trick, shouldn't it?
    http://help.adobe.com/en_US/ColdFusion/9.0/CFMLRef/WSc3ff6d0ea77859461172e0811cbec22c24-7d04.html
    Adam

  • CF 9 and Solr indexing

    I am having a problem doing a looped cfindex to a Solr collection on a large group of documents (html, pdf, txt, etc.). If I run the looped cfindex on my XP dev machine, indexing, for example, 100 records of the same type (html), the routine runs flawlessly. If I run the same on my Windows 2003 production server, I get the following errors after 4 or so documents:
    Error opening new_searcher exceeded limit of maxWarmingSearchers4_try_again_later Error_opening_new_searcher_exceeded_limit_of_maxWarmingSearchers4_try_again_later request: http://localhost:8983/solr/casecollection/update?commit=true&waitFlush=false&waitSearcher=false&wt=javabin&version=1
    If I reduce the number to 4 records at a time, no errors. If I increase it to 5 records, I start to generate the errors.
    I need to index 60k+ files, so this is a bit of a concern. Any suggestions?
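
    Since the message itself says "try again later", one stopgap is to retry a failed update with an increasing delay rather than failing the whole run. The following is a minimal Python sketch of that pattern, not a ColdFusion or Solr API: action stands for whatever call sends one update, and RuntimeError stands in for whatever exception your client actually raises. Both are assumptions made for illustration.

```python
import time


def call_with_backoff(action, attempts=5, first_delay=1.0, sleep=time.sleep):
    """Run action(); on a maxWarmingSearchers error, wait and retry.

    The delay doubles after every failed attempt. Any other error, or a
    failure on the final attempt, is re-raised unchanged.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except RuntimeError as err:  # placeholder exception type (assumed)
            if "maxWarmingSearchers" not in str(err) or attempt == attempts:
                raise
            sleep(delay)
            delay *= 2
```

    This only smooths over the symptom. With 60k+ files, reducing the number of commits is likely to matter more than how long each retry waits.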

  • ColdFusion 11 and Solr

    I just installed ColdFusion 11. I am pretty sure I selected the option to install the addons like Solr, but when I am in the coldfusion administrator under Data & Services, I click ColdFusion Collections and I get nothing. It won't go to the page at all. If I click on Solr Services a page will come up. If I click on ColdFusion collections and then restart the coldfusion addons I get a page that comes up saying
    "Unable to retrieve collections from the Search Services.Ensure that you have installed ColdFusion Search Service and it is running."
    I am assuming it means it isn't installed.
    So I went to Adobe - ColdFusion Support Center : More Downloads and downloaded and installed the Windows Add-on Services Standalone Installer. I didn't change any of the settings or folders. I restarted the server and logged back into the ColdFusion Administrator, and I see the same thing. Nothing changed. When I view the file folders I have c:\coldfusion11 and c:\coldfusionAdd-onServices. Should the coldfusionAdd-onServices folder have been within the coldfusion11 folder?
    I read you can create your collection through the administrator or through coding a page. I thought maybe I need to try it this way. So I created a page to create the collection and it did not work either.
    What am I missing? Did I miss a step or something to make this work?
    Any help I can get, I would appreciate.
    I have a windows 2008 server.

    Here are just a few of the solr files for you to look at. They all appear to be SUCCESSFUL.
    Install File:             C:\ColdFusion11\cfusion\jetty\solr\bin\abc
                              Status: SUCCESSFUL
    Install File:             C:\ColdFusion11\cfusion\jetty\solr\bin\abo
                              Status: SUCCESSFUL
    Install File:             C:\ColdFusion11\cfusion\jetty\solr\bin\backup
                              Status: SUCCESSFUL
    Install File:             C:\ColdFusion11\cfusion\jetty\solr\bin\backupcleaner
                              Status: SUCCESSFUL
    Install File:             C:\ColdFusion11\cfusion\jetty\solr\bin\commit
                              Status: SUCCESSFUL
    Install File:             C:\ColdFusion11\cfusion\jetty\solr\bin\optimize
                              Status: SUCCESSFUL
    Install File:             C:\ColdFusion11\cfusion\jetty\solr\bin\readercycle
                              Status: SUCCESSFUL
    Install File:             C:\ColdFusion11\cfusion\jetty\solr\bin\rsyncd-disable
                              Status: SUCCESSFUL
    Install File:             C:\ColdFusion11\cfusion\jetty\solr\bin\rsyncd-enable
                              Status: SUCCESSFUL
    Install File:             C:\ColdFusion11\cfusion\jetty\solr\bin\rsyncd-start
                              Status: SUCCESSFUL
    Install File:             C:\ColdFusion11\cfusion\jetty\solr\bin\rsyncd-stop
                              Status: SUCCESSFUL
    Install File:             C:\ColdFusion11\cfusion\jetty\solr\bin\scripts-util
                              Status: SUCCESSFUL
    Install File:             C:\ColdFusion11\cfusion\jetty\solr\bin\snapcleaner
                              Status: SUCCESSFUL
    Install File:             C:\ColdFusion11\cfusion\jetty\solr\bin\snapinstaller
                              Status: SUCCESSFUL
    Install File:             C:\ColdFusion11\cfusion\jetty\solr\bin\snappuller
                              Status: SUCCESSFUL
    Install File:             C:\ColdFusion11\cfusion\jetty\solr\bin\snappuller-disable
                              Status: SUCCESSFUL
    Install File:             C:\ColdFusion11\cfusion\jetty\solr\bin\snappuller-enable
                              Status: SUCCESSFUL
    Install File:             C:\ColdFusion11\cfusion\jetty\solr\bin\snapshooter
                              Status: SUCCESSFUL
    Install File:             C:\ColdFusion11\cfusion\jetty\solr\conf\admin-extra.html
                              Status: SUCCESSFUL
    Install File:             C:\ColdFusion11\cfusion\jetty\solr\conf\elevate.xml
                              Status: SUCCESSFUL
    Install File:             C:\ColdFusion11\cfusion\jetty\solr\conf\mapping-ISOLatin1Accent.txt
                              Status: SUCCESSFUL
    Install File:             C:\ColdFusion11\cfusion\jetty\solr\conf\protwords.txt
                              Status: SUCCESSFUL
    Install File:             C:\ColdFusion11\cfusion\jetty\solr\conf\schema.xml
                              Status: SUCCESSFUL
    On the coldfusion-out.log file it all appears ok as well.
    I can see things like this that shows solr is starting:
    Apr 6, 2015 15:16:54 PM Information [localhost-startStop-1] - Starting jaxrs...
    Apr 6, 2015 15:16:54 PM Information [localhost-startStop-1] - Starting graphing...
    Apr 6, 2015 15:16:55 PM Information [localhost-startStop-1] - Starting solr...
    Apr 6, 2015 15:16:55 PM Information [localhost-startStop-1] - Starting archive...
    Apr 6, 2015 15:16:55 PM Information [localhost-startStop-1] - Starting document...
    Apr 6, 2015 15:16:55 PM Information [localhost-startStop-1] - Starting eventgateway...
    Apr 6, 2015 15:16:55 PM Information [localhost-startStop-1] - Event Gateway Disabled.
    In the same log, when I click ColdFusion Collections in the ColdFusion Administrator, I see this:
    Apr 24, 2015 10:12:21 AM Error [ajp-bio-8014-exec-6] - The request has exceeded the allowable time limit Tag: cfoutput The specific sequence of files included or processed is: C:\ColdFusion11\cfusion\wwwroot\CFIDE\administrator\solr\index.cfm, line: 331

  • Solr CF9 working example

    Good evening everyone,
    I have a very data-intensive Flex/CF application. I am no guru by any means, but I have done all that I can with db optimization and coding to improve the speed of my app. We have finally concluded that we might need some sort of advanced indexing technique to help speed it up. I was wondering if anyone has a good example of, or pointer to, Solr indexing and searching using a query.
    I would like to create multiple indexes from MySQL tables and then perform searches against them. I would like to be able to perform field1:value1, field2:value2 type searches against the Lucene index created using Solr. I did the following:
    <cfindex action="update" query="v_test" collection="lib_30" type="custom"
                 key="sequenceId" title="Lib 30 index" body="#v_test.columnList#" category="#v_test.columnList#"
                 status="report">
    but I can't seem to perform a search against a particular field. Any pointers? I can't get this to work as I would like, and can't get much out of the live docs.
    Thanks.
    Jay
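
    For reference, a field1:value1 style search against Solr's own HTTP interface looks roughly like the following. This is a minimal Python sketch, assuming the collection is reachable on the default local Solr URL and that the data landed in fields such as custom1 and custom2 (the field names cfindex is described as creating elsewhere on this page). Which columns end up in which field depends on the cfindex attributes, so treat the names as placeholders.

```python
from urllib.parse import urlencode


def fielded_query_url(base_url, collection, terms, rows=10):
    """Build a Solr select URL that ANDs together field:"value" clauses."""
    clauses = []
    for field, value in terms.items():
        # Escape backslashes and double quotes so the value stays one phrase
        safe = str(value).replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'{field}:"{safe}"')
    query = " AND ".join(clauses)
    return f"{base_url}/{collection}/select?" + urlencode({"q": query, "rows": rows})


url = fielded_query_url(
    "http://localhost:8983/solr",  # assumed local Solr address
    "lib_30",
    {"custom1": "abc", "custom2": "x y"},
)
print(url)
```

    Opening the printed URL in a browser is a quick way to see whether the values were indexed into separate fields at all, before working out the equivalent cfsearch criteria.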

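    // Assumes: import java.io.File;
    try {
        File file = new File("myfile.txt");
        // Report whether the path is a regular file, then try to delete it
        System.out.println("isFile: " + file.isFile());
        System.out.println("Deleted: " + file.delete());
    } catch (SecurityException ex) {
        // Thrown when a security manager denies access to the file
        ex.printStackTrace();
    } catch (Exception ex) {
        ex.printStackTrace();
    }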

  • Searching for single words in Solr

    I have a Win2k8 Standard 64-bit install of CF 9.0.1. I have a simple PDF document containing two words, "Seattle" and "Seahawks". If I search for "Seattle", I get 0 results. If I search for "Seattle Seahawks", I get the one result I expected.
    What can I do to add better support for single word searches?
    NOTE: This does also occur with .doc and .txt files.
    Thanks,
    Merritt Chapman

    Seattle should give you a hit.
    The default query mode in the Solr distributed with ColdFusion is OR (it can be changed in solrconfig).
    I suspect the actual search query is Seattle OR Seahawks.
    Do you still get one hit searching for Seattle AND Seahawks?
    So for some reason Seattle does not appear to have been put into the index.
    That can happen if it is in the stop word list for the collection (but it should not be) or if the synonyms file is badly configured.
    I would analyze how Solr indexes these words (http://localhost:8983/solr/[your collection]/admin/analysis.jsp):
    Select the fieldname where you store the data [summary ?]
    check verbose output
    and type  Seattle Seahawks in Field value
    Check how Solr applies filters etc

  • SOLR RuntimeException: Can't find resource 'solrconfig.xml' in classpath or ...

    I'm setting up 64-bit CF with SOLR on a client's Windows Server 2008 64-bit box.
    Installation was not an issue. I was able to produce a SOLR index a number of times since the install. But the nightly rebuilds of the index began failing after about a week. I've tried rebuilding a number of times at this point via the CF Admin, but with no success. What I note is that attempting to recreate the collection results in the sub-directory I've named (HRindex) being created and nothing else.
    Looking in the SOLR stderr log file I find this error each time I've tried re-creating this collection:
      java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in classpath or 'X:\SOLRcollections\HRindex/conf/', cwd=C:\ColdFusionSolr
    I've found references to this problem elsewhere on the net indicating that this may result from a permissions problem with this directory (HRindex) or with the file solrconfig.xml inside that directory...but a) I've subsequently given full control to all users; b) the file solrconfig.xml is not being created at all. On most attempts, no subdirectories are created below X:\SOLRcollections\HRindex\ .
    Anyone here have any experience with this?
    Also worth noting. Not long after this problem began, I decided to uninstall SOLR and tried reinstalling it. So most of the original versions of solr.xml are gone.
    Here is a copy of the file solr.xml that was under the original install in c:\coldfusion9\solr\multicore
    <?xml version='1.0' encoding='UTF-8'?><solr persistent='true'> <cores adminPath='/admin/cores'>
      <core name='core0' instanceDir='core0/'/>
      <core name='HRindex' instanceDir='X:/SOLRcollections/HRindex/HRindex/'
    dataDir='X:/SOLRcollections/HRindex/HRindex/data/'/>
    </cores>
    </solr>
    When I reinstalled SOLR from the CF9 DVD, I installed it in a different directory. But there are NO copies of the solr.xml file under the new installation.
    Not knowing how SOLR works under the covers, where should this file be found? When does SOLR build it?

    XIntelligence wrote:
    When you create a new Solr Collection files like schema.xml & solrconfig.xml are copied from X:\ColdFusion9\solr\multicore\template\conf
    If for some reason these files are locked / access restricted or corrupt the collection would not work.
    You can manually copy X:\ColdFusion9\solr\multicore\template\conf to X:\SOLRcollections\HRindex/conf/ and restart the CF search service though.
    If that is the case, you first have to manually create the directory conf in X:\SOLRcollections\HRindex\. Then copy the XML files into that directory. However, note the repetition '/HRindex/HRindex/' in your settings in solr.xml. It looks like a mistake.
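
    A quick way to check every core at once is to read solr.xml and look for conf/solrconfig.xml under each instanceDir. The following is a minimal Python sketch of that check, assuming the solr.xml layout quoted above (core elements carrying name and instanceDir attributes). The function name and the idea of resolving relative paths against a base directory are choices made for this illustration.

```python
import os
from xml.etree import ElementTree as ET


def cores_missing_config(solr_xml_text, base_dir="."):
    """Return (core name, expected path) for cores lacking solrconfig.xml."""
    root = ET.fromstring(solr_xml_text)
    missing = []
    for core in root.iter("core"):
        instance_dir = core.get("instanceDir", "")
        # A relative instanceDir is resolved against base_dir;
        # an absolute one is used as-is by os.path.join
        path = os.path.join(base_dir, instance_dir, "conf", "solrconfig.xml")
        if not os.path.isfile(path):
            missing.append((core.get("name"), os.path.normpath(path)))
    return missing
```

    Run against the file quoted above, the expected path for HRindex would contain the doubled HRindex/HRindex segment, which makes that kind of mismatch easy to spot.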

  • Host Solr/Lucene on Microsoft Azure Cloud storage

    I want to host a Solr index on Azure cloud storage. I have searched Google for this, and all the results point me to this link (http://www.interoperabilitybridges.com/Azure/Getting_Started_Guide_Solr_Lucene.asp).
    That link suggests downloading an Azure-Solr project (https://github.com/Microsoft-Interop/Windows-Azure-Solr/tags) from GitHub, but it has already been removed from GitHub. I have found an alternative project (https://github.com/MSOpenTech/Windows-Azure-Solr)
    on GitHub, but it is old and needs many references to be downloaded that are no longer available. Please advise me on how I can proceed.

    Hello Krutal,
      Can you check this Link, which is about how to create Lucene indexes via a Lucene Directory Object using Azure BlobStorage for Persistent Storage.
      Azure Library for Lucene.Net (Full Text Indexing for Azure)
      and let us know if this helps.
    Regards,
    Nithin Rathnakar

  • What Do You Use in Place of VSpider When Using Solr?

    Since Verity is deprecated according to CF Documentation, what crawler do you use if you want to index dynamic pages (like vSpider would)?  Can you use Solr with vSpider or is there something better out there or bundled with CF9?

    To continue this thread, I'm looking for a replacement for vspider too. It's funny that I'm only one of two people who used vspider!
    I'm migrating my collections to Solr and I need to use a crawler to index my sites. vspider worked really well because it was simple to set up a recurring job to update a Verity index each night.
    I built a CFC that crawls a site, which I might use to build Solr indexes, but the problem is that it is subject to the server timing out because a site might take a while to crawl, let alone index. I get around it by using <cfsetting requesttimeout="some ungodly number">, but it's still possible that the timeout value is not long enough and the request to index a site will timeout before the index is finished.
    Crawling a site and building an index seems like a lot of work for a single request and I wonder what this will do to the JVM. I'm guessing it will spike and CF will be very slow.
    It seems like crawling and indexing a site should be done outside of CF, and since Solr is built on Java, maybe the indexing should be done in Java?
    Any ideas?

  • Fatal Error in JRE 6.0_24-b07

    I have found others facing a similar problem and yet mine is different and I'll be thankful if somebody could point to the possible source of this error and how to fix/diagnose it. I am running a Nutch crawler, writing to a Solr index on an EC2 instance with JRE version 6.0_24-b07 (though I have tried it with other versions as well). The crawler works fine on my local machine but very often the instance gets stuck and eventually creates an error file with contents similar to the following:
    # A fatal error has been detected by the Java Runtime Environment:
    # Internal Error (safepoint.cpp:247), pid=16632, tid=140208175748864
    # guarantee(PageArmed == 0) failed: invariant
    # JRE version: 6.0_24-b07
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (19.1-b02 mixed mode linux-amd64 compressed oops)
    # If you would like to submit a bug report, please visit:
    # http://java.sun.com/webapps/bugreport/crash.jsp
    --------------- T H R E A D ---------------
    Current thread (0x0000000040e30000): VMThread [stack: 0x00007f84c2729000,0x00007f84c282a000] [id=16636]
    Stack: [0x00007f84c2729000,0x00007f84c282a000], sp=0x00007f84c2828a80, free space=1022k
    Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
    V [libjvm.so+0x790f75]
    V [libjvm.so+0x324f06]
    V [libjvm.so+0x6b3ed6]
    V [libjvm.so+0x79f2a7]
    V [libjvm.so+0x79edbe]
    V [libjvm.so+0x64314f]
    VM_Operation (0x00007f84c072fb80): RevokeBias, mode: safepoint, requested by thread 0x0000000041d03000
    --------------- P R O C E S S ---------------
    Java Threads: ( => current thread )
    0x0000000041a41000 JavaThread "FetcherThread" daemon [_thread_new, id=18355, stack(0x0000000000000000,0x0000000000000000)]
    0x0000000041305800 JavaThread "FetcherThread" daemon [_thread_blocked, id=18354, stack(0x00007f84c032d000,0x00007f84c042e000)]
    0x00000000411f1000 JavaThread "FetcherThread" daemon [_thread_blocked, id=18353, stack(0x00007f84c052f000,0x00007f84c0630000)]
    0x00000000414b5000 JavaThread "FetcherThread" daemon [_thread_blocked, id=18352, stack(0x00007f84bbbfc000,0x00007f84bbcfd000)]
    0x0000000041645800 JavaThread "FetcherThread" daemon [_thread_blocked, id=18351, stack(0x00007f84bb3f4000,0x00007f84bb4f5000)]
    0x0000000041304800 JavaThread "FetcherThread" daemon [_thread_blocked, id=18350, stack(0x00007f84c002a000,0x00007f84c012b000)]
    0x0000000041d03000 JavaThread "QueueFeeder" daemon [_thread_blocked, id=18349, stack(0x00007f84c0630000,0x00007f84c0731000)]
    0x0000000041a1d000 JavaThread "SpillThread" daemon [_thread_blocked, id=18348, stack(0x00007f84c0d4c000,0x00007f84c0e4d000)]
    0x000000004118a800 JavaThread "communication thread" daemon [_thread_blocked, id=18347, stack(0x00007f84c0a40000,0x00007f84c0b41000)]
    0x00007f84bc4ea800 JavaThread "Thread-1105" [_thread_in_vm, id=18346, stack(0x00007f84c093f000,0x00007f84c0a40000)]
    0x000000004195d800 JavaThread "pool-2-thread-1" [_thread_blocked, id=16654, stack(0x00007f84c0c4b000,0x00007f84c0d4c000)]
    0x00007f84bc029800 JavaThread "ajp-bio-9009-AsyncTimeout" daemon [_thread_blocked, id=16652, stack(0x00007f84c0f45000,0x00007f84c1046000)]
    0x00007f84bc28b000 JavaThread "ajp-bio-9009-Acceptor-0" daemon [_thread_in_native, id=16651, stack(0x00007f84c1046000,0x00007f84c1147000)]
    0x00007f84bc005000 JavaThread "http-bio-9000-AsyncTimeout" daemon [_thread_blocked, id=16650, stack(0x00007f84c198d000,0x00007f84c1a8e000)]
    0x00007f84bc061000 JavaThread "http-bio-9000-Acceptor-0" daemon [_thread_in_native, id=16649, stack(0x00007f84c1147000,0x00007f84c1248000)]
    0x00007f84bc325000 JavaThread "ContainerBackgroundProcessor[StandardEngine[Catalina]]" daemon [_thread_blocked, id=16648, stack(0x00007f84c1b55000,0x00007f84c1c56000)]
    0x00007f84bc328000 JavaThread "timerFactory" [_thread_blocked, id=16647, stack(0x00007f84c1248000,0x00007f84c1349000)]
    0x00000000410c2000 JavaThread "GC Daemon" daemon [_thread_blocked, id=16644, stack(0x00007f84c1dd8000,0x00007f84c1ed9000)]
    0x0000000040e64800 JavaThread "Low Memory Detector" daemon [_thread_blocked, id=16642, stack(0x00007f84c2123000,0x00007f84c2224000)]
    0x0000000040e5f800 JavaThread "CompilerThread1" daemon [_thread_blocked, id=16641, stack(0x00007f84c2224000,0x00007f84c2325000)]
    0x0000000040e5c800 JavaThread "CompilerThread0" daemon [_thread_blocked, id=16640, stack(0x00007f84c2325000,0x00007f84c2426000)]
    0x0000000040e5a800 JavaThread "Signal Dispatcher" daemon [_thread_blocked, id=16639, stack(0x00007f84c2426000,0x00007f84c2527000)]
    0x0000000040e38800 JavaThread "Finalizer" daemon [_thread_blocked, id=16638, stack(0x00007f84c2527000,0x00007f84c2628000)]
    0x0000000040e36800 JavaThread "Reference Handler" daemon [_thread_blocked, id=16637, stack(0x00007f84c2628000,0x00007f84c2729000)]
    0x0000000040dd6800 JavaThread "main" [_thread_in_native, id=16633, stack(0x00007f84c7458000,0x00007f84c7559000)]
    Other Threads:
    =>0x0000000040e30000 VMThread [stack: 0x00007f84c2729000,0x00007f84c282a000] [id=16636]
    0x0000000040e6f000 WatcherThread [stack: 0x00007f84c2022000,0x00007f84c2123000] [id=16643]
    VM state:synchronizing (normal execution)
    VM Mutex/Monitor currently owned by a thread: ([mutex/lock_event])
    [0x0000000040dd31a0] Safepoint_lock - owner thread: 0x0000000040e30000
    [0x0000000040dd3220] Threads_lock - owner thread: 0x0000000040e30000
    Heap
    PSYoungGen total 647744K, used 599873K [0x00000000d7f60000, 0x00000000ffee0000, 0x0000000100000000)
    eden space 641792K, 93% used [0x00000000d7f60000,0x00000000fc8e0700,0x00000000ff220000)
    from space 5952K, 5% used [0x00000000ff910000,0x00000000ff960000,0x00000000ffee0000)
    to space 6528K, 0% used [0x00000000ff220000,0x00000000ff220000,0x00000000ff880000)
    PSOldGen total 224704K, used 113284K [0x0000000087e00000, 0x0000000095970000, 0x00000000d7f60000)
    object space 224704K, 50% used [0x0000000087e00000,0x000000008eca10e8,0x0000000095970000)
    PSPermGen total 66176K, used 36389K [0x0000000082c00000, 0x0000000086ca0000, 0x0000000087e00000)
    object space 66176K, 54% used [0x0000000082c00000,0x0000000084f894b0,0x0000000086ca0000)
    Dynamic libraries:
    40000000-40009000 r-xp 00000000 08:01 25426 /usr/lib/jvm/java-6-sun-1.6.0.24/jre/bin/java
    40108000-4010a000 rwxp 00008000 08:01 25426 /usr/lib/jvm/java-6-sun-1.6.0.24/jre/bin/java
    40dcd000-43d79000 rwxp 00000000 00:00 0 [heap]
    82c00000-86ca0000 rwxp 00000000 00:00 0
    86ca0000-86db0000 ---p 00000000 00:00 0
    86db0000-87e00000 rwxp 00000000 00:00 0
    87e00000-95970000 rwxp 00000000 00:00 0
    95970000-d7f60000 ---p 00000000 00:00 0
    d7f60000-ffee0000 rwxp 00000000 00:00 0
    ffee0000-100000000 ---p 00000000 00:00 0
    7f84bb17b000-7f84bb1b8000 r-xs 0024f000 08:10 24445502 /something1/packages/thecrawler/webapps/crawlers/WEB-INF/classes/conf/plugins/parse-tika/xmlbeans-2.3.0.jar
    7f84bb1b8000-7f84bb1bf000 r-xs 00049000 08:10 24445501 /something1/packages/thecrawler/webapps/crawlers/WEB-INF/classes/conf/plugins/parse-tika/tika-parsers-0.9.jar
    7f84bb1bf000-7f84bb1c0000 r-xs 00015000 08:10 24445500 /something1/packages/thecrawler/webapps/crawlers/WEB-INF/classes/conf/plugins/parse-tika/tagsoup-1.2.jar
    7f84bb1c0000-7f84bb1c2000 r-xs 00004000 08:10 24445499 /something1/packages/thecrawler/webapps/crawlers/WEB-INF/classes/conf/plugins/parse-tika/slf4j-api-1.5.6.jar
    7f84bb1c2000-7f84bb1c6000 r-xs 0002f000 08:10 24445498 /something1/packages/thecrawler/webapps/crawlers/WEB-INF/classes/conf/plugins/parse-tika/rome-0.9.jar
    7f84bb1c6000-7f84bb1d3000 r-xs 000c1000 08:10 24445497 /something1/packages/thecrawler/webapps/crawlers/WEB-INF/classes/conf/plugins/parse-tika/poi-scratchpad-3.7.jar
    7f84bb1d3000-7f84bb252000 r-xs 0034a000 08:10 24445496 /something1/packages/thecrawler/webapps/crawlers/WEB-INF/classes/conf/plugins/parse-tika/poi-ooxml-schemas-3.7.jar
    7f84bb252000-7f84bb259000 r-xs 00073000 08:10 24445495 /something1/packages/thecrawler/webapps/crawlers/WEB-INF/classes/conf/plugins/parse-tika/poi-ooxml-3.7.jar
    7f84bb259000-7f84bb276000 r-xs 0017c000 08:10 24445494 /something1/packages/thecrawler/webapps/crawlers/WEB-INF/classes/conf/plugins/parse-tika/poi-3.7.jar
    7f84bb276000-7f84bb28a000 r-xs 00339000 08:10 24445492 /something1/packages/thecrawler/webapps/crawlers/WEB-INF/classes/conf/plugins/parse-tika/pdfbox-1.4.0.jar
    7f84bb28a000-7f84bb2bb000 r-xs 003f0000 08:10 24445490 /something1/packages/thecrawler/webapps/crawlers/WEB-INF/classes/conf/plugins/parse-tika/netcdf-4.2-min.jar
    7f84bb2bb000-7f84bb2be000 r-xs 00014000 08:10 24445489 /something1/packages/thecrawler/webapps/crawlers/WEB-INF/classes/conf/plugins/parse-tika/metadata-extractor-2.4.0-beta-1.jar
    7f84bb2be000-7f84bb2c0000 r-xs 0000b000 08:10 24445488 /something1/packages/thecrawler/webapps/crawlers/WEB-INF/classes/conf/plugins/parse-tika/jempbox-1.4.0.jar
    7f84bb2c0000-7f84bb2c3000 r-xs 00023000 08:10 24445487 /something1/packages/thecrawler/webapps/crawlers/WEB-INF/classes/conf/plugins/parse-tika/jdom-1.0.jar
    7f84bb2c3000-7f84bb2c5000 r-xs 00006000 08:10 24445486 /something1/packages/thecrawler/webapps/crawlers/WEB-INF/classes/conf/plugins/parse-tika/geronimo-stax-api_1.0_spec-1.0.1.jar
    7f84bb2c5000-7f84bb2c9000 r-xs 00027000 08:10 24445485 /something1/packages/thecrawler/webapps/crawlers/WEB-INF/classes/conf/plugins/parse-tika/fontbox-1.4.0.jar
    7f84bb2c9000-7f84bb2ce000 r-xs 00048000 08:10 24445484 /something1/packages/thecrawler/webapps/crawlers/WEB-INF/classes/conf/plugins/parse-tika/dom4j-1.6.1.jar
    7f84bb2ce000-7f84bb2f3000 r-xs 00172000 08:10 24445478 /something1/packages/thecrawler/webapps/crawlers/WEB-INF/classes/conf/plugins/parse-tika/bcprov-jdk15-1.45.jar
    7f84bb2f3000-7f84bb2f6000 ---p 00000000 00:00 0
    7f84bb2f6000-7f84bb3f4000 rwxp 00000000 00:00 0
    7f84bb3f4000-7f84bb3f7000 ---p 00000000 00:00 0
    7f84bb3f7000-7f84bb4f5000 rwxp 00000000 00:00 0
    7f84bb4f5000-7f84bb4f8000 ---p 00000000 00:00 0
    7f84bb4f8000-7f84bb5f6000 rwxp 00000000 00:00 0
    7f84bb5f6000-7f84bb5f9000 ---p 00000000 00:00 0
    7f84bb5f9000-7f84bb6f7000 rwxp 00000000 00:00 0
    7f84bb6f7000-7f84bb6fa000 ---p 00000000 00:00 0
    7f84bb6fa000-7f84bb7f8000 rwxp 00000000 00:00 0
    7f84bb7f8000-7f84bb7fb000 ---p 00000000 00:00 0
    7f84bb7fb000-7f84bb8f9000 rwxp 00000000 00:00 0
    7f84bb8f9000-7f84bb8fc000 ---p 00000000 00:00 0
    7f84bb8fc000-7f84bb9fa000 rwxp 00000000 00:00 0
    7f84bb9fa000-7f84bb9fd000 ---p 00000000 00:00 0
    7f84bb9fd000-7f84bbafb000 rwxp 00000000 00:00 0
    7f84bbafb000-7f84bbafe000 ---p 00000000 00:00 0
    7f84bbafe000-7f84bbbfc000 rwxp 00000000 00:00 0
    7f84bbbfc000-7f84bbbff000 ---p 00000000 00:00 0
    7f84bbbff000-7f84bbcfd000 rwxp 00000000 00:00 0
    7f84bbcfd000-7f84bbd00000 ---p 00000000 00:00 0
    7f84bbd00000-7f84bbdfe000 rwxp 00000000 00:00 0
    7f84bbdfe000-7f84bbe01000 ---p 00000000 00:00 0
    7f84bbe01000-7f84bbeff000 rwxp 00000000 00:00 0
    7f84bbeff000-7f84bbf02000 ---p 00000000 00:00 0
    7f84bbf02000-7f84bc000000 rwxp 00000000 00:00 0
    7f84bc000000-7f84bd5a9000 rwxp 00000000 00:00 0
    7f84bd5a9000-7f84c0000000 ---p 00000000 00:00 0
    7f84c0000000-7f84c0002000 r-xs 00001000 08:01 25328 /usr/lib/jvm/java-6-sun-1.6.0.24/jre/lib/ext/dnsns.jar
    7f84c0002000-7f84c0004000 r-xs 0000d000 08:10 24445483 /something1/packages/thecrawler/webapps/crawlers/WEB-INF/classes/conf/plugins/parse-tika/commons-logging-1.1.1.jar
    7f84c0004000-7f84c000a000 r-xs 00045000 08:10 24445482 /something1/packages/thecrawler/webapps/crawlers/WEB-INF/classes/conf/plugins/parse-tika/commons-httpclient-3.1.jar
    7f84c000a000-7f84c000e000 r-xs 00024000 08:10 24445481 /something1/packages/thecrawler/webapps/crawlers/WEB-INF/classes/conf/plugins/parse-tika/commons-compress-1.1.jar
    7f84c000e000-7f84c0018000 r-xs 00088000 08:10 24445643 /something1/packages/thecrawler/webapps/crawlers/WEB-INF/lib/zookeeper-3.3.1.jar
    7f84c0018000-7f84c0023000 r-xs 0005f000 08:10 24445642 /something1/packages/thecrawler/webapps/crawlers/WEB-INF/lib/xstream-1.3.1.jar
    7f84c0023000-7f84c0025000 r-xs 00018000 08:10 24445451 /something1/packages/thecrawler/webapps/crawlers/WEB-INF/classes/conf/plugins/lib-nekohtml/nekohtml-0.9.5.jar
    7f84c0025000-7f84c0026000 r-xs 00015000 08:10 24445469 /something1/packages/thecrawler/webapps/crawlers/WEB-INF/classes/conf/plugins/parse-html/tagsoup-1.2.jar
    7f84c0026000-7f84c0027000 r-xs 00000000 08:10 24445544 /something1/packages/thecrawler/webapps/crawlers/WEB-INF/classes/conf/plugins/somethingparser/somethingparser.jar
    7f84c0027000-7f84c0028000 r-xs 00004000 08:10 24445467 /something1/packages/thecrawler/webapps/crawlers/WEB-INF/classes/conf/plugins/parse-html/parse-html.jar
    7f84c0028000-7f84c0029000 r-xs 00003000 08:10 24445449 /something1/packages/thecrawler/webapps/crawlers/WEB-INF/classes/conf/plugins/lib-http/lib-http.jar
    7f84c0029000-7f84c002a000 r-xs 00001000 08:10 24445511 /something1/packages/thecrawler/webapps/crawlers/WEB-INF/classes/conf/plugins/protocol-http/protocol-http.jar
    7f84c002a000-7f84c002d000 ---p 00000000 00:00 0
    7f84c002d000-7f84c012b000 rwxp 00000000 00:00 0
    7f84c012b000-7f84c012e000 ---p 00000000 00:00 0
    7f84c012e000-7f84c022c000 rwxp 00000000 00:00 0
    7f84c022c000-7f84c022f000 ---p 00000000 00:00 0
    7f84c022f000-7f84c032d000 rwxp 00000000 00:00 0
    7f84c032d000-7f84c0330000 ---p 00000000 00:00 0
    7f84c0330000-7f84c042e000 rwxp 00000000 00:00 0
    7f84c042e000-7f84c0431000 ---p 00000000 00:00 0
    VM Arguments:
    vm_args: -Djava.util.logging.config.file=/something1/packages/thecrawler/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.endorsed.dirs=/something1/packages/apache-tomcat-7.0.20/endorsed -Dcatalina.base=/something1/packages/thecrawler -Dcatalina.home=/something1/packages/apache-tomcat-7.0.20 -Djava.io.tmpdir=/something1/packages/thecrawler/temp
    java_command: org.apache.catalina.startup.Bootstrap start
    Launcher Type: SUN_STANDARD
    Environment Variables:
    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
    USERNAME=root
    LD_LIBRARY_PATH=/usr/lib/jvm/java-6-sun-1.6.0.24/jre/lib/amd64/server:/usr/lib/jvm/java-6-sun-1.6.0.24/jre/lib/amd64:/usr/lib/jvm/java-6-sun-1.6.0.24/jre/../lib/amd64
    SHELL=/bin/bash
    Signal Handlers:
    SIGSEGV: [libjvm.so+0x791b30], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004
    SIGBUS: [libjvm.so+0x791b30], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004
    SIGFPE: [libjvm.so+0x640ba0], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004
    SIGPIPE: [libjvm.so+0x640ba0], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004
    SIGXFSZ: [libjvm.so+0x640ba0], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004
    SIGILL: [libjvm.so+0x640ba0], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004
    SIGUSR1: SIG_DFL, sa_mask[0]=0x00000000, sa_flags=0x00000000
    SIGUSR2: [libjvm.so+0x643780], sa_mask[0]=0x00000000, sa_flags=0x10000004
    SIGHUP: [libjvm.so+0x643380], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004
    SIGINT: SIG_IGN, sa_mask[0]=0x00000000, sa_flags=0x00000000
    SIGTERM: [libjvm.so+0x643380], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004
    SIGQUIT: [libjvm.so+0x643380], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004
    --------------- S Y S T E M ---------------
    OS:squeeze/sid
    uname:Linux 2.6.32-309-ec2 #18-Ubuntu SMP Mon Oct 18 21:00:50 UTC 2010 x86_64
    libc:glibc 2.11.1 NPTL 2.11.1
    rlimit: STACK 8192k, CORE 0k, NPROC infinity, NOFILE 1024, AS infinity
    load average:12.49 12.12 11.91
    /proc/meminfo:
    MemTotal: 7864548 kB
    MemFree: 562736 kB
    Buffers: 500660 kB
    Cached: 1560628 kB
    SwapCached: 0 kB
    Active: 4917260 kB
    Inactive: 1955200 kB
    Active(anon): 4198384 kB
    Inactive(anon): 612968 kB
    Active(file): 718876 kB
    Inactive(file): 1342232 kB
    Unevictable: 0 kB
    Mlocked: 0 kB
    SwapTotal: 0 kB
    SwapFree: 0 kB
    Dirty: 104 kB
    Writeback: 0 kB
    AnonPages: 4811024 kB
    Mapped: 666396 kB
    Shmem: 180 kB
    Slab: 157276 kB
    SReclaimable: 144948 kB
    SUnreclaim: 12328 kB
    KernelStack: 3032 kB
    PageTables: 0 kB
    NFS_Unstable: 0 kB
    Bounce: 0 kB
    WritebackTmp: 0 kB
    CommitLimit: 3932272 kB
    Committed_AS: 5022592 kB
    VmallocTotal: 34359738367 kB
    VmallocUsed: 6256 kB
    VmallocChunk: 34359731576 kB
    DirectMap4k: 7864320 kB
    DirectMap2M: 0 kB
    CPU:total 2 (4 cores per cpu, 1 threads per core) family 6 model 26 stepping 5, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt
    Memory: 4k page, physical 7864548k(562736k free), swap 0k(0k free)
    vm_info: Java HotSpot(TM) 64-Bit Server VM (19.1-b02) for linux-amd64 JRE (1.6.0_24-b07), built on Feb 2 2011 16:55:54 by "java_re" with gcc 3.2.2 (SuSE Linux)
    time: Thu Oct 13 08:54:34 2011
    elapsed time: 9913 seconds

    If you are using JNI, or a third-party library that uses JNI, then that is the likely cause.
    If not, then it is a VM bug.

  • Adjusting a URL and reloading causes it to go to Google - any way to just let it reload the page?

    I was testing some calls to a Solr index with different URL query parameters. But each time I adjust a parameter and reload the page, instead of making the call again I'm dropped into Google search.
    However, if I (tediously) manually edit the URL to specifically add back in "http://" at the beginning of the line I can reload the page.
    This doesn't happen in Safari or Chrome. It seems to be a problem unique to Firefox.
    It makes testing cumbersome and I was wondering if there was a solution for this.
    Thanks,
    doug

    Could you describe how you are making the calls? For example, are you setting window.location.href, using window.open(), etc.? Submitting a form?
    One reason for getting a search could be a space in the URL. Is there any way to avoid spaces in the URL, or can you confirm that they are being properly encoded to %20?
    Do you want to try disabling address bar search (keyword search) as a workaround? https://support.mozilla.org/en-US/kb/search-web-address-bar#w_turning-off-the-internet-keyword-search
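
    The point about spaces can be checked outside the browser by building the test URL in code so that every parameter value is percent-encoded. This is only a minimal sketch: the host, port, path and the field name `title` are assumptions for illustration and are not taken from the original post.

```python
from urllib.parse import urlencode, quote


def build_solr_url(base_url, params):
    """Return base_url plus a query string in which spaces become %20."""
    # quote_via=quote encodes a space as %20 instead of urlencode's default '+'
    query = urlencode(params, quote_via=quote)
    return f"{base_url}?{query}"


# Hypothetical Solr endpoint and query, used only to illustrate the encoding
url = build_solr_url(
    "http://localhost:8983/solr/select",
    {"q": "title:annual report", "rows": 10},
)
print(url)
```

    Pasting a URL built this way into the address bar keeps the "http://" scheme and contains no raw spaces, which are the two things most likely to make the browser treat the input as a search.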

  • Solr errors when indexing custom file extensions.

    Greetings!
    I am working on my company's public website and need to be able to index the web pages. The site is configured to read .ak files as .cfm files, but Solr errors when trying to read them. While testing I found that if I remove the <head> tags from the documents there are no errors. I've looked through the Solr config files for a setting that tells Solr to parse .ak files as HTML, but have been unable to find one. Does such a setting exist?
    Thanks for your help,
    Dave

    Hi,
    Are you able to manually add a new extension to the list?
    Try the troubleshooter under "Advanced Options", or rebuild the index as a test.
    If that doesn't work, let's dig deeper into this issue.
    Open the registry and expand HKEY_CLASSES_ROOT.
    Under this entry there should be a list of file types. I ran a test: if I delete a file type from it, for example .txt, then when I open the indexing options again (Advanced Options, File Types) the .txt extension is no longer in the list. So you may want to check these keys in case the registry is broken.
    Regards
    Yolanda
    TechNet Community Support

  • Create index in solr & cf9 from query data

    Hey guys,
       Does anyone have a working example of cfindex where the input data comes from a query and where you can search the resulting index for a given value in a specified field?
        I create the index as below.
        <cfindex action="update" query="v_test" collection="lib_30" type="custom"
                 key="sequenceId" title="Lib 30 index" body="#v_test.columnList#" category="#v_test.columnList#"
                 status="report">
        This creates an index, but the field tags in the index are empty, and all the data from each column is concatenated together into one long string.
         I have googled and tried to make heads or tails of the LiveDocs, but I haven't been successful.
         Anyone, please help.
    Jay

    I can't see any evidence that CF supports individual search fields with Solr.  The <cfindex> implementation for Solr seems to just replicate what it did for Verity: bung all the data from the various columns specified in the BODY attribute into one long string.
    I hasten to add that my comment is not based on code-based investigation, but just my reading of the docs coupled with your findings.  And tangential experience with CF's Solr integration implementation which I have found to be a bit... basic.
    Adam

  • Sandbox security denying CFINDEX from indexing a collection (Solr/CF9)

    Hello, everyone.
    I did fix the last Sandbox security related issue with Solr collections - it was in the "Files/Dir" section, I had to put everything under C:\ColdFusion9\wwwroot.
    Now, I'm facing yet another Sandbox related issue with collections.
    I have one reindex script that has NO ISSUES when pulling data from a database and indexing a collection from that.
    I have another reindex script that will not index a collection from a query, unless Sandbox is disabled.  I will try to give some pseudo code.
    <cfquery name="search_results" datasource="documents">
      SELECT DOC_ID, DOC_NAME, DOC_DESCRIPTION
      FROM DOC
      WHERE DOC_ID in (<cfqueryparam value="#thisList#" cfsqltype="CF_SQL_VARCHAR" list="yes">)
    </cfquery>
    <cftry>
      <cfindex action="refresh" collection="collection_name" key="DOC_DESCRIPTION" type="custom" title="DOC_DESCRIPTION" query="search_results" body="DOC_ID, DOC_NAME, DOC_DESCRIPTION" status="results" />
      <cfcatch><cfdump var="#cfcatch#"></cfcatch>
    </cftry>
    The query will retrieve 15 records, with or without Sandbox.  But the CFINDEX will not work if Sandbox is enabled.
    The other reindex script is not affected, either way.
    What could be causing the CFINDEX to fail?
    Thank you,
    ^_^

    I had a similar issue with CFSEARCH on CF10 with sandbox security that I resolved by adding the following to the neo-security.xml file:
    <struct><var name='CLASS'><string>java.net.SocketPermission</string></var><var name='TARGET'><string>127.0.0.1:0</string></var><var name='ACTION'><string>listen,resolve</string></var></struct></array></var></struct></var> </struct>
    That gives permission to listen on dynamic ports (the colon-zero part).

Maybe you are looking for