Indexing crawled content

The latest Endeca-ATG integration module does not use Developer Studio for configuring the record adapter.
I would like to know how to configure and index a third-party data source via a JDBC connection, a CSV file, or even crawled content.

Thanks, Kristen.
But doesn't that amount to doing the configuration twice, once in the repository and again in the Developer Studio pipeline?
Is there any plan to unify this into one place, maybe in a future release?
I would also like to know whether any future release is planned to support Endeca SEO out of the box (OOTB).
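For the CSV case, here is a minimal sketch of the kind of thing I mean (purely illustrative: the file and column names are invented, and the RECORDS/RECORD/PROP/PVAL layout is just the generic record XML shape that Forge record adapters are typically fed; verify against your Endeca version):

  # Illustrative sketch: turn a CSV extract into an Endeca-style XML record
  # file. "products.csv" and its columns are hypothetical placeholders.
  import csv
  from xml.sax.saxutils import escape

  def csv_to_records(csv_path, xml_path):
      with open(csv_path, newline="", encoding="utf-8") as src, \
           open(xml_path, "w", encoding="utf-8") as dst:
          dst.write('<?xml version="1.0" encoding="UTF-8"?>\n<RECORDS>\n')
          for row in csv.DictReader(src):
              dst.write('  <RECORD>\n')
              for name, value in row.items():
                  # Escape XML special characters in both names and values.
                  dst.write('    <PROP NAME="%s"><PVAL>%s</PVAL></PROP>\n'
                            % (escape(name, {'"': '&quot;'}), escape(value or "")))
              dst.write('  </RECORD>\n')
          dst.write('</RECORDS>\n')

  csv_to_records("products.csv", "records.xml")

For the JDBC case, the loop would simply iterate over database cursor rows instead of csv.DictReader.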

Similar Messages

  • Best way to index portal content

    Hi all,
    I am searching for a nice way to index the content of the portal.
    Using the standard functionality (http://help.sap.com/saphelp_nw70ehp1/helpdata/en/46/9d1405fa743ef0e10000000a1553f7/frameset.htm) does not lead to sufficient results, since all roles located in the PCD are indexed and can be found by the end user.
    This can lead to the situation that the end user opens a page which produces an error or a blank page.
    The other possibility is to use the PCD roles search (http://help.sap.com/saphelp_nw70ehp1/helpdata/en/b2/d59a4271c80a31e10000000a1550b0/frameset.htm). This only returns roles which are assigned to the user.
    However, the content of iViews (e.g. URL iViews) is not indexed, so the user can only search for the ID/description of a page (which is okay, but content search is also important).
    Is there no other possibility to search for portal content?
    Thanks for any hints...
    Best Regards
    Philipp Kölsch

    Hi,
    it seems that the current ways of indexing portal content are not really sufficient... I hope that there are better solutions in the future...
    Best Regards
    Philipp Kölsch

  • SharePoint 2013 - SQL Server BCS Model Incremental Crawl content doesn't show up in Search results

    SharePoint 2013 - SQL Server BCS model incremental crawl content doesn't show up in search results. The incremental crawl is working fine, i.e., it is picking up newly added table records, but the newly added content is not available on the search results page.
    However, when I do a full crawl, the search results show up with the appropriate content.
    What could be the issue here?
    Suresh Kumar Udatha.

    This time on the full crawl I got only 62 warnings, 12 errors, and ~537,000 successes. The warnings were about truncating crawled documents because their content length exceeded the configured crawl limit. The 12 errors were "Processing this item failed because of a timeout when parsing its contents." and "The content processing pipeline failed to process the item.". I think 12 errors is not enough to justify re-executing a full crawl. The site collection has one SP site group (with the Read permission level). In this site group I have only one AD group added, so a permission change is not a possible reason for a re-crawl, plus nobody changed anything in this AD group. All documents are stored in 2 document libraries and there are no sub-sites. I want to access these documents through search (a custom managed-property restriction in KQL), but this way I have no mechanism to quickly re-crawl only the error documents from the first full crawl (those 12). This is very strange and makes SP 2013 Search almost unusable for my scenario.
    Thanks,
    Darko
    Darko Milevski http://mkdot.net/blogs/darko/

  • TREX ABAP Client: How to index file content?

    Hello Colleagues,
    We have installed the TREX search engine and are writing our own solution using the ABAP client. Everything is OK except for file content such as XLS, DOC, and so on. When we try to post binary content to the function module TREX_EXT_INDEX, it is not processed by TREX and only attribute search is available. I think something is wrong with the data types.
    Test example:
      data lt_data type table of char100.
      data l_size  type i.
      data lt_doc  type trext_index_docs.
      data ls_doc  type trexs_index_doc.
      " Read the file from the frontend as raw binary data.
      call function 'GUI_UPLOAD'
        exporting
          filename   = 'D:\test.xls'
          filetype   = 'BIN'
        importing
          filelength = l_size
        tables
          data_tab   = lt_data.
      ls_doc-doc_key   = '00001'.
      ls_doc-doc_langu = 'EN'.
      ls_doc-doc_type  = 'B'.                 " binary document
      ls_doc-mime_type = 'application/excel'.
      " Convert the binary table into the string passed in ls_doc-content.
      call function 'SCMS_BINARY_TO_STRING'
        exporting
          input_length = l_size
        importing
          text_buffer  = ls_doc-content
        tables
          binary_tab   = lt_data.
      append ls_doc to lt_doc.
      call function 'TREX_EXT_INDEX'
        exporting
          i_index_id            = me->index_id
          i_rfc_destination     = me->rfc_dest
          i_index_document_list = lt_doc.
    Document is indexed without content. Why?

    Hi Evgeni,
    I realise this is a little late, but just in case you are still interested - or anyone else out there is:
    Basically somewhere internally the FM 'TREX_EXT_INDEX' does the following with your ls_doc-content:
    l_xstring = p_content_in. "(p_content_in == ls_doc-content)
    If you look at the conversion rules in ABAP, an xstring target expects the source string to contain only the characters '0-9A-F', which your string does not contain after calling the FM SCMS_BINARY_TO_STRING.
    Thus you have to format your ls_doc-content differently. Basically you need to turn the hex bytes of your content into 'real characters', which expands the string considerably (each byte becomes two characters).
    We solved the problem using the following Form:
    form conv_content using value(raw_content) type string
                   changing content            type string.
      data: lv_char   type c,
            lv_string type string.
      field-symbols: <lv_x> type x.
      clear content. " init output
      while raw_content is not initial.
        lv_char = raw_content(1).
        " Reinterpret the character's bytes as type x ...
        assign lv_char to <lv_x> casting.
        " ... and convert to the two-character hex representation.
        lv_string = <lv_x>.
        concatenate content lv_string into content in character mode respecting blanks.
        shift raw_content left.
      endwhile.
    endform.
    Not exactly efficient, but you can call it just as you would call the FM SCMS_BINARY_TO_STRING (except you don't need the file size). TREX will then index your MS Office documents, as long as they are not Office 2007 or newer. In that case there is another bug: the mime_type field in the interface is only 50 characters long, which is too short to hold, for example, the full docx MIME type (application/vnd.openxmlformats-officedocument.wordprocessingml.document).
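    In case it helps to see the idea outside ABAP: the whole form boils down to hex-encoding the raw bytes, which (as a purely illustrative sketch, not part of the ABAP fix) looks like this in Python:
      # What conv_content effectively produces: the uppercase hex
      # representation of the raw bytes, doubling the content length.
      def conv_content(raw: bytes) -> str:
          return raw.hex().upper()

      # conv_content(b"\x01\xab") -> "01AB"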
    Regards,
    Robin

  • Unable to full text index the contents in Oracle 11g UCM

    Hi,
    I am new to Oracle UCM 11g.
    I am unable to full-text index the content files that are checked in to Oracle UCM.
    I have added the below entries in config.cfg file:
    SearchIndexerEngineName=OracleTextSearch
    IndexerDatabaseProviderName=SystemDatabase
    AdditionalEscapeChars=-;#
    When performing the indexing operation using Repository Manager, only the metadata of the content files is indexed; the full text is not.
    What is missing here in Oracle UCM that prevents full-text indexing of the content? What configuration do I need so that I can perform a full-text search on the content in Oracle UCM?
    Thanks in Advance
    Dipesh

    Hi Srinath,
    The collection rebuild cycle runs perfectly fine. After enabling tracing for indexer and systemdatabase, I got the following info in the log:
    "Finished rebuilding the search index with a total of 123 files successfully indexed. A total of 0 files had a full text index."
    The below is the details of the activeindex.hda:
    <?hda version="11gR1-11.1.1.3.0-idcprod1-100505T121221" jcharset=UTF8 encoding=utf-8?>
    @Properties LocalData
    UseImplicitZonedSecurityField=true
    blFieldTypes=
    ActiveIndex=index1
    blDateFormat=M/d{yy}{ h:mm[:ss]{ a}}!mAM,PM!tGMT+05:30
    @end
    @ResultSet SearchCollections
    7
    sCollectionID
    sDescription
    sVerityLocale
    sProfile
    sLocation
    sFlag
    sUrlScript
    TestHost
    !csSearchDefaultSearchCollection
    English-US
    local
    index1
    enabled
    <$URL$>
    @end
    Is it possible that the OracleTextSearch component is missing in Oracle UCM?
    Thanks
    Dipesh

  • Indexing portal content by external search engine

    Hi All,
    I have a question about how to enable an external search engine to index portal content.
    As I found in SAP Help [Indexing Portal Content|http://help.sap.com/saphelp_nw70ehp1/helpdata/en/46/9d1405fa743ef0e10000000a1553f7/frameset.htm]:
    To enable public search engines, such as Google, to index portal content, provide the search engine with the following URL:
    irj/servlet/prt/portal/prtroot/com.sap.portal.utilities.tools.portalspider.SpiderWebInterface?
    How should I provide the above URL to an external search engine?
    Many thanks for all reply.
    Cheers,
    Kanok-on K.
    P.S. We use the default KM.
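    P.P.S. My assumption is that the absolute form would be something like http://<portal_host>:<port>/irj/servlet/prt/portal/prtroot/com.sap.portal.utilities.tools.portalspider.SpiderWebInterface (host and port being placeholders for your own portal), submitted to the engine directly or referenced from a robots.txt/sitemap, but I would like confirmation.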

    Hi,
    Did you get any solution? Is it working now?
    I need the same information. Please share if you have solved the issue and your site is being indexed by the search engine.
    Thanks,
    PradeeP

  • Re-Indexing BEA Content - SecurityServiceManager not yet initialized

    We've recently upgraded from Weblogic 9.2 to 10.2. The upgrade was successful and everything is running fine, except indexing search content. We followed the instructions here:
    http://edocs.bea.com/wlp/docs100/search/searchProduction.html
    After running index_cm_data.sh, we keep getting the following error:
    <ERROR>: Indexing failed. Please use -help for more info.
    weblogic.security.service.NotYetInitializedException: [Security:090392]SecurityServiceManager not yet initialized.
            at weblogic.security.service.CommonSecurityServiceManagerDelegateImpl.getSecurityService(Unknown Source)
            at weblogic.security.service.SecurityServiceManager.getSecurityService(Unknown Source)
            at weblogic.security.service.SecurityServiceManager.getPrincipalAuthenticator(Unknown Source)
            at weblogic.security.services.Authentication.login(Authentication.java:69)
            at weblogic.security.services.Authentication.login(Authentication.java:51)
            at com.bea.p13n.security.Authentication.authenticate(Authentication.java:237)
            at com.bea.content.indexer.IndexerRunner.index(IndexerRunner.java:311)
            at com.bea.content.indexer.IndexerRunner.main(IndexerRunner.java:931)
    We'd be very grateful if anyone could direct us to a possible solution.

    We removed p13n_app.jar from the classpath and presto, it works.

  • While crawling content from MOSS 2007 to SP2013, an error is thrown

    Hi,
    When I try to crawl the MOSS 2007 site content in SP2013 search, I get the errors below in the crawl error log.
    I have given the user account used by SP2013 search owner access to the MOSS 2007 content DB, and also read access to MOSS 2007 itself. But I am still unable to crawl.
    1. The URL of the item could not be resolved. The repository might be unavailable, or the crawler proxy settings are not configured. To configure the crawler proxy settings, use the Search Administration page.
    2. This item could not be crawled because the repository did not respond within the specified timeout period. Try to crawl the repository at a later time, or increase the timeout value on the Proxy and Timeout page in Search Administration. You might also want to crawl this repository during off-peak usage times.
    3. The crawler could not communicate with the server. Check that the server is available and that the firewall access is configured correctly. If the repository was temporarily unavailable, an incremental crawl will fix this error.
    4. Access is denied. Verify that either the Default Content Access Account has access to this repository, or add a crawl rule to crawl this repository. If the repository being crawled is a SharePoint repository, verify that the account you are using has "Full Read" permissions on the SharePoint Web Application being crawled.
    Thanks & Regards, Krishna

    Try using the website as a content source to crawl it.
    Create the DisableLoopbackCheck registry entry on all SharePoint servers.
    Are you able to browse the SharePoint sites from the crawl server?
    Check:
    http://blogs.technet.com/b/josebda/archive/2007/03/19/crawling-sharepoint-sites-using-the-sps3-protocol-handler.aspx
    http://sharepoint.stackexchange.com/questions/93691/using-sharepoint-2013-to-crawl-sharepoint-2007
    If you are still struggling, you can watch the requests by using Fiddler as a proxy for the crawl process. Then you can kick off the crawl and watch the requests and the responses from the 2007 server.
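    For the DisableLoopbackCheck step, a minimal sketch using Python's winreg module (run it elevated on each SharePoint server; the key path and DWORD value are the standard loopback-check override, but treat this as a sketch and confirm it is acceptable in your environment):
      import winreg

      # Set HKLM\SYSTEM\CurrentControlSet\Control\Lsa\DisableLoopbackCheck = 1
      with winreg.CreateKeyEx(
              winreg.HKEY_LOCAL_MACHINE,
              r"SYSTEM\CurrentControlSet\Control\Lsa",
              0,
              winreg.KEY_SET_VALUE) as key:
          winreg.SetValueEx(key, "DisableLoopbackCheck", 0,
                            winreg.REG_DWORD, 1)
    A reboot (or at least an IISRESET) is typically needed before it takes effect.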

  • TREX indexing on content server performance

    Hi guys,
    Our Portal is integrated with SAP CRM (using WebDAV), which manages documents stored in SAP Content Server. We use TREX to index these documents so that Portal users can search for them. We are currently evaluating the performance of indexing and searching: if we have a heavy load of documents to index, would it affect the SAP CRM/Content Server that hosts the document repository (memory consumption, performance, etc.)?
    Thanks,
    ZM

    Hi Chris,
    do you use the ContentServer in the DMS application? If yes, you need to index documents stored in the DMS_PCD1 docu category.
    Regards,
    Mikhail

  • TREX indexing from content server (MaxDB)

    Hello.
    I've installed standalone TREX on Linux and configured the connection to the R/3 system and to the Portal. On the R/3 side I have configured the connection to the SAP Content Server (via ArchiveLink), where we store some documents (PDF and DOC files). The part I don't understand is: how can I configure the content server repository as a source for the TREX index?
    Any hint would be useful. Thanks in advance.
    Vit

    Hi,
    the question is: what exactly do you want to do?
    Integrate an ArchiveLink repository in the Portal, or index the documents via ERP?
    In the first case you must create a web repository and define an index.
    In the other case it depends on the SAP solution, but mostly you have to customize via transactions SM59, SRMO, and SKPR06.
    Best regards
    Frank

  • TOC, Index, Search content not loading in Firefox 3.0+

    Hi,
    I have created from RH 8 a merged WebHelp system (1 master, 2 children, based on Peter Grainge's posted procedure). It works fine from IE, and on my system from Firefox 3.0 and up.
    The problem is that when deployed at the customer's site, the TOC, Index, and Search content does not load in Firefox. All frames display (navigation, contents, and the top frame), but the content for the TOC, Index, and Search does not load.
    This only happens at the customer site. The WebHelp system works okay in IE.
    I have checked that we are running the same JavaScript version and have the same Firefox security settings.
    I looked on Peter's site and implemented the redirect fix for Firefox that he has posted, but that hasn't seemed to fix the problem.
    They aren't running any special toolbars (like the Google toolbar). I am checking on add-ons.
    Does anyone have any ideas on what else I might check or better yet what might be causing the problem?
    Thank you,
    Tannis

    William
    A merge will not work properly unless all the projects have been generated once. After that it should not matter that a project is missing; indeed, simply not publishing one of the projects to the server is one of the features of merging, making it easy to supply different customers with different content.
    I wonder if the problem you saw was because that project had not been generated.
    Tannis
    Sorry, the reference to AIR Help was because you mentioned Snippet 141, which is AIR-specific. I didn't look further back in the thread.
    You said that the start page of a child project opens the tripane window with the default topic, but you do not get the TOC etc. That rules out the merge being the problem. I am inclined to the view that the server is the issue, so get your client to load the published output from my merge demo, as that is a known quantity. Ask them to open each child project in turn and then finally the merge.
    See www.grainge.org for RoboHelp and Authoring tips
    @petergrainge

  • Prevent Windows Search from indexing PST contents in Outlook

    I am running Windows 8.1 with Windows Search, and Office 2013 with more than 20 GB across more than 10 PSTs and OSTs. Searching becomes very slow after everything is indexed, and reindexing changes nothing. If I exclude Outlook, it's very fast.
    There is an option in the Outlook search ribbon -> Search Tools -> Locations to search. I removed most of the PSTs (accounting for 75% of the contents), but still ALL PSTs and OSTs get indexed (although they won't be searched).
    This makes using Windows Search nearly impossible (e.g. searching via the Search charm takes 30 seconds).
    How can I have Windows Search index ONLY the PSTs and OSTs that I specifically want?
    Is there any configuration, registry setting, or programmatic workaround to exclude them from indexing (not just searching)?
    Thank you.
    Andy.PT

    Hi,
    Still, as mentioned above, I don't have a method to choose which files are included in indexing and which are excluded.
    The only thing I found that is close to your request is this:
    If the method above isn't helpful in your scenario (as you mentioned), you can try some 3rd-party tools/add-ins to implement a new search function which may meet your needs.
    Regards,
    Melon Chen
    Forum Support
    Come back and mark the replies as answers if they help and unmark them if they provide no help.

  • Search Service stopped crawling content after previously working

    The search service stopped crawling content. This was working previously. I tried to stop the full crawl and restart it. The ULS logs reveal the following error repeatedly when filtered on search errors:
    Microsoft.Ceres.ContentEngine.SubmitterComponent.ContentSubmissionDecorator : CreateSession() failed with unexpected exception System.ServiceModel.FaultException`1[System.String]: The creator of this fault did not specify a Reason. (Fault Detail is equal to Specified flow: Microsoft.CrawlerFlow could not be started. Unable to start a flow instance on any of the available nodes.).
    Before we drop the index and/or rebuild the Search Service Application: has anyone else worked on a similar issue?
    Thank you.

    Hi PSCoan,
    From the error message, it seems that the service account running the SharePoint Search Host Controller and SharePoint Server Search 15 services is missing the following four local policies:
    SeAssignPrimaryTokenPrivilege (Replace a process-level token )
    SeImpersonatePrivilege (Impersonate a client after authentication)
    SeServiceLogonRight (Log on as a service)
    SeIncreaseQuotaPrivilege (Adjust memory quotas for a process)
    Please go to the Local Group Policy Editor to assign the policies to the service account.
    Click Start, type gpedit.msc in the Start Search box, and then press ENTER.
    Expand Local Computer Policy -> Computer Configuration -> Windows Settings -> Security Settings -> Local Policies -> User Rights Assignment.
    Right-click "Replace a process-level token" -> Properties -> Add User or Group.
    Right-click "Impersonate a client after authentication" -> Properties -> Add User or Group.
    Right-click "Log on as a service" -> Properties -> Add User or Group.
    Right-click "Adjust memory quotas for a process" -> Properties -> Add User or Group.
    Thanks,
    Victoria
    Forum Support
    Please remember to mark the replies as answers if they help and unmark them if they provide no help. If you have feedback for TechNet Subscriber Support, contact
    [email protected]
    Victoria Xia
    TechNet Community Support

  • Can folders' index.html files avoid containing those folders' names within their URLs?

    Can folders in Dreamweaver's Files panel (on the right in Dreamweaver CS5.5, etc.) contain index.html files that do NOT include those folders' actual names in their URLs?
    The reason I ask is that I would like to create various index.html sub-directories that will all be linked from yet another index.html subdirectory, for web surfers' convenience. It would really help me keep them all organized (for occasional updates) if I could keep them in a folder within Dreamweaver's Files panel whose name at least resembles that other index.html subdirectory's name. However, it seems that moving index.html files into such an "alpha" folder inserts the alpha folder's name into the contained URLs, not just its own. Can this be avoided or worked around somehow? In case it helps, I have CS5.5 and will have CS6 upon its release. Any thoughts?

    Thanks for the preliminary response. It would really help me keep index.html files organized (for occasional updates) if I could keep them in a folder within Dreamweaver's Files panel whose name does NOT appear in those individual URLs.
    To give you an example of why I'd like to do this: let's say one wants a folder named California, and various index.html files will be linked from the actual California page. If one wants to update all pages that belong in the California directory, without having to sort through pages for (for example) Oregon or Arizona that one ALSO maintains, it would be greatly facilitated if such index.html files could be organized into a California folder, or something whose name is close to it. If a page is named "Sacramento", then it need not have California in the URL; indeed, longer URLs are less appealing for the purposes of this endeavor. So with this concrete example as background, perhaps my initial question makes more sense now? At any rate, thanks for trying.

  • Can someone please share configFile.xml for crawling Content Server Source

    Hi All
    I am trying to create a Content Server source in SES. I am unable to generate the configFile.xml from UCM as described in the documentation.
    Can someone please share the config file.
    Thanks & Regards,
    Amit

    Hi,
    Thanks a lot for replying.
    I am using the same versions. There is one XML file being generated, but its name is not configFile.xml.
    In my test it generated one named df2009-04-21-20-16-22-833434567.xml.
    Please tell me, is this the correct file?
    Some of its contents are:
    Line: -----
    <?xml version="1.0" encoding="UTF-8"?>
    <rss xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    version="2.0"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xsi:schemaLocation="http://xmlns.oracle.com/orarss C:\project_drive\SES Application Search\SES Format Schma\orarss.xsd">
    <channel>
    <title>Content</title>
    <link>df2009-04-21-20-16-22-833434567.xml</link>
    <description>The channel contains a feed for content</description>
    <lastBuildDate>2009-04-21T14:46:22.000Z</lastBuildDate>
    <channelDesc xmlns="http://xmlns.oracle.com/orarss" >
    <feedType>full</feedType>
    <sourceName>content_feed</sourceName>
    </channelDesc>
    <item>
    Line: -----
    Thanks a lot... :)

Maybe you are looking for

  • Loops "missing" but still in library

    Upon update, many loops went "Missing" although they are still in the loop library. I have tried several times to re-index the loops, to no avail. For instance, "Live Edgy Drums 07" remains, but "Live Edgy Drums 29.2" does not and has no sound. in sound
  • Trying to install 64-bit 10.2.0.2 agent on 64-bit OS

    but it fails with some RPMs missing - it's looking for older RPMs. Help please. Checking operating system package requirements ... Checking for make-3.79; found make-1:3.80-5.     Passed Checking for binutils-2.15.92.0.2-13; found binutils-2.15.92.0.2-1

  • Front End (GUI) for Java Web Based Appl.

    Hi Gurus, I am new to Java. Can anybody tell me what front end (GUI) is used in Java web-based applications? I was told that Swing/applets are rarely used. Is there a 3rd-party tool to design GUIs for Java-based applications? Chooti Baba

  • Can't Install Boot Camp from Snow Leopard DVD in Windows 7

    I have an iMac "Core 2 Duo" 3.06 24-inch (08) running 10.6.2 and Windows 7. I was able to successfully use the Boot Camp Assistant to partition and install Windows 7 on my iMac. After the installation, I was able to download updates from Windows Updat

  • Autoscrolling in the DropTarget (Drag and Drop)

    I have implemented Drag and Drop using DropTargets and a DragSource And DragGestureListener. The DropTarget I have is a customized JComponent. How can I make the DropTarget autoscroll for Drag and Drop events??