Crawler behavior

I'm having some trouble getting TREX to crawl a Confluence-based wiki on our intranet. When I look at the HTTP logs on the Confluence server, I see the crawler's visit, but only a 'HEAD' request for the home page, with no subsequent 'GET' request.
The start page for the Confluence web site is configured to hit the login page with a hardcoded userid and password in the query string; I can see this information in the HTTP log. The login page should then redirect (302) to the actual starting page for the crawl.
Could the information returned in the HTTP headers keep the crawler from visiting?  Following are the headers from the Confluence server:
HTTP/1.1 302 Moved Temporarily
Date: Tue, 03 Jun 2008 15:43:49 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Cache-Control: no-cache, no-store,  must-revalidate
Pragma: no-cache
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Set-Cookie: JSESSIONID=7EAFED82BFE05768BA89241BF1562705; Path=/confluence
Location: http://rohpedia.rohmhaas.com/confluence/display/ROHPedia/ROHPedia Home
Content-Type: text/html;charset=UTF-8
Content-Length: 0
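To rule out the redirect itself, a quick way to reproduce the crawler's HEAD request and follow the 302 by hand is a short script (a minimal sketch using Python's requests library; the login path and credentials below are placeholders, not our real values):
# Minimal sketch: reproduce the crawler's HEAD request and inspect the redirect.
# The login URL, userid and password below are placeholders.
import requests

login_url = "http://rohpedia.rohmhaas.com/confluence/login.action"   # placeholder path
params = {"os_username": "crawler_user", "os_password": "secret"}    # placeholder credentials

# Issue the same HEAD request the crawler sends, without following redirects,
# so the raw 302 response can be inspected.
head = requests.head(login_url, params=params, allow_redirects=False)
print(head.status_code, head.headers.get("Location"))

# Note the unencoded space in "ROHPedia Home" in the Location header above;
# a strict client may refuse to follow it. Try the redirect target explicitly,
# percent-encoding the space, and see whether a GET then succeeds.
target = head.headers.get("Location", "").replace(" ", "%20")
if target:
    follow = requests.get(target, cookies=head.cookies)
    print(follow.status_code, len(follow.text))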
Thanks in advance!
-jon

Similar Messages

  • How to specify inclusion and exclusion rules for File data sources

    This is the seed URL for a file data source: file://localhost/c:/myDir/
    I want to exclude indexing and searching of files under: file://localhost/c:/myDir/obsolete/
    What is the exact format for the exclusion URL?
    I have tried both file://localhost/c:/myDir/obsolete/ and /myDir/obsolete/
    but neither of them seems to work; it still indexes everything under /myDir/
    Should I just put /obsolete/ as the exclusion URL?
    Also, after the initial crawl, if I change the inclusion and/or exclusion rules and then run the crawler again, it should update the indexes accordingly. Is that right?
    The version of UltraSearch I am using is 1.0.3.
    Thanks for any help on this.

    Try "/c:/myDir/obsolete/"
    Changing inclusion/exclusion rules does not affect files that have already been crawled; it only affects the next crawl.
    To do any DML on the existing data set, use SQL directly on the wk$url table under the instance owner.
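    For what it's worth, here is a minimal sketch of that last suggestion using python-oracledb; the connection details are placeholders, and the column holding the crawled URL in wk$url is assumed to be named URL, so verify the table layout before running any DELETE:
    # Minimal sketch (assumptions: python-oracledb installed, instance owner credentials,
    # and that wk$url exposes the crawled URL in a column named URL -- verify first).
    import oracledb

    conn = oracledb.connect(user="wk_inst_owner", password="secret",
                            dsn="dbhost:1521/orcl")          # placeholder connection details
    cur = conn.cursor()

    # Inspect the table layout before touching anything.
    cur.execute("SELECT column_name FROM user_tab_columns WHERE table_name = 'WK$URL'")
    print([row[0] for row in cur.fetchall()])

    # Remove already-crawled entries under the excluded directory (adjust column name as needed).
    cur.execute("DELETE FROM wk$url WHERE url LIKE :pattern", pattern="%/myDir/obsolete/%")
    print(cur.rowcount, "rows deleted")
    conn.commit()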

  • What is the default behavior of the crawler when upgrading from 5.0.5 to 6.1

    Hi ,
    Will it re-crawl all the cards when changing from the old file server to a new file server?
    Please reply if anybody knows the answer.
    Thanks
    Dheeraj

    Are you referring to cards created by the Windows File Crawler?
    If so, and you are moving the source files to a new server, then the crawler will delete the objects and then recreate them as new cards with new card ids by default. With customizations you can work around this to keep the same card ids and prevent recrawling.
    What are you upgrading, the portal suite?
    If so, the portal will recrawl and reindex all items during a portal upgrade.

  • SharePoint 2013 site having NTLM crawl issue while crawling SharePoint 2010 sites having FBA authentication

    Hi,
    We have a SharePoint 2013 search center site which is claims-based with NTLM authentication set. We also have a SharePoint 2010 farm running whose sites are FBA authenticated.
    While crawling the FBA-authenticated SharePoint 2010 sites from the SP 2013 search center (which uses NTLM auth), we do not get proper results.
    Can you please help me what can be done here?
    Thanks,
    Prashant

    Hi Prashant,
    According to your description, my understanding is that search cannot work correctly when crawling a SharePoint site that uses forms-based authentication.
    Per my knowledge, the crawl component requires NTLM to access content. At least one zone must be configured to use NTLM authentication. If NTLM authentication is not configured on the default zone, the crawl component can use a different zone that is configured to use NTLM authentication.
    However, if crawling a non-default zone of the web application, URLs of results will always be relative to the non-default zone that was crawled, for queries from any zone, and this can cause unexpected or problematic behavior.
    I recommend making sure that the default zone of the SharePoint 2010 web application uses NTLM authentication.
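    As a quick sanity check, you could verify from the crawl server that the zone actually answers NTLM for the crawl account before re-running the crawl (a minimal sketch assuming the requests-ntlm Python package; the URL and account below are placeholders):
    # Minimal sketch: confirm the SharePoint 2010 zone accepts NTLM for the crawl account.
    # Assumptions: requests and requests-ntlm are installed; URL/credentials are placeholders.
    import requests
    from requests_ntlm import HttpNtlmAuth

    zone_url = "http://sp2010.contoso.local/sites/portal"          # placeholder zone URL
    auth = HttpNtlmAuth("CONTOSO\\svc_crawl", "placeholder-password")

    resp = requests.get(zone_url, auth=auth)
    print(resp.status_code)   # 200 means NTLM works for this zone; 401 means it does not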
    More references:
    http://technet.microsoft.com/en-us/library/dn535606(v=office.15).aspx
    http://technet.microsoft.com/en-us/library/cc262350.aspx#planzone
    Best regards.
    Thanks
    Victoria Xia
    TechNet Community Support

  • Crawler time issue

    Dear Experts,
    I have an issue regarding the custom crawler which I had created for the web repository.
    System Landscape - EP 2004s SP 14.
    I have created a custom crawler (System Admin -> System Config -> Knowledge Management -> Global Services -> Crawler Parameters) by duplicating a standard crawler.
    This custom crawler has following parameters -
    No. of retrieving threads = 4.
    No. of providing threads = 4.
    Follow Redirects = ON.
    Now when I try to re-index the web repository it takes very long (about 20-25 hours) to deliver about 300,000 documents.
    Even for the incremental index it takes 19 hours to deliver about 100 documents.
    I don't think this is normal behavior for the crawler.
    Everything looks fine except that it is taking a lot of time to deliver the documents.
    Note - 1. I don't find anything in the default trace regarding the crawler.
           2. The web repository is a 'static' web repository with start page information.
           3. As per the customer, there is no change on the web server.
    Any advice on how to speed up the crawler?
    Thanks & Regards,
    Amit

    Hello Amit,
    You can speed up the crawler by adjusting the queue parameters; refer to this thread:
    https://www.sdn.sap.com/irj/sdn/thread?threadID=526552&messageID=3973244#3973244

  • The crawler could not communicate with the server on SharePoint 2010

    Dear Friends,
    We have a two-tier farm: the application and web roles run on one server, and the other server is the database server. Search crawling is not working, and my Search service application log is giving the errors below. Can you please suggest how to fix these errors and how to make search crawling work without errors? This is very urgent, because I have these issues on a production server. Kindly help me fix these issues as soon as possible.
    The crawler could not communicate with the server. Check that the server is available and that the firewall access is configured correctly. If the repository was temporarily unavailable, an incremental crawl will fix this error.
    An unrecognized HTTP response was received when attempting to crawl this item. Verify whether the item can be accessed using your browser.
    Please see the recent crawl history below; it lists 124,062 errors and 24 top-level errors.
    Crawl started: 10/19/2014 11:55 PM | Crawl completed: 10/20/2014 12:10 AM | Duration: 00:15:27 | Type: Incremental | Successes: 0 | Warnings: 0 | Errors: 124,062 | Top Level Errors: 24 | Deletes: 0 | Not Modified: 1 | Security Updates: 0
    Crawl started: 10/19/2014 10:55 PM | Crawl completed: 10/19/2014 11:11 PM | Duration: 00:16:26 | Type: Full | Successes: 1 | Warnings: 0 | Errors: 124,062 | Top Level Errors: 24
    valmiki

    Thanks again noddy, I found the actual problem now. It is crawling the site collection, but it is not crawling the sub-site. How do I include the sub-site in the crawl? I found a solution on Google (below); someone suggested doing this, but it did not work out.
    Can you please let me know how to include the sub-site so its contents get crawled.
    The solution to this is frustratingly simple...
    At the root of the site, go to Site Actions >> Site Settings and choose Search and Offline Availability under Site Administration.
    Set the page indexing behavior to “Always index all Web Parts on this site”.
    Save and run a full crawl.
    valmiki

  • Using Motion 3 to create SW crawl

    I'm trying to achieve the Star Wars text crawl (text scrolling up and off into infinity).
    I've done google searches and have seen plenty of tutorials on doing it in After Effects, but I think Motion 3 ought to be able to handle this too.
    From what I understand from the AE tutorials, the easiest way to do this is to have the text scroll bottom-to-top on the Y axis, then tilt the camera so that it is looking at the text layer from something like a 70-degree angle.
    I can use a throw behavior/keyframe to have the text move on the y-axis only - no problem there. But I'm lost as to what to do after that. Should I make the text layer a 3-D layer, or just create a new camera in Motion, or what?

    I did this as a test several months ago. My goal was to get it as close to the original crawl as possible. The details are important as far as making it look as good as possible. The right font type and justification, the right color, the right angle, the right speed.
    Anyway, I used an Illustrator file for the "Star Wars" title so that it would be sharp close up, since it fills the screen at the beginning. For the timing and angle, I used a reference movie off YouTube. I also found a Wikipedia article that covers the font type, color, and other particulars.
    Here's my project if you want to dissect it. Instead of animating the text, I animated the camera.
    [Star Wars Crawl|http://homepages.roadrunner.com/cuttingroom/images/starwarscrawl.zip]
    Andy
    Message was edited by: Andy Neil

  • Flaky Time Capsule Wireless Behavior

    Must say I am not impressed with the set-up and flaky behavior of the new TS. It seems to be working now, after fiddling with the flakiness for the past several nights, but I can't help but wonder if what I experienced is normal, or if I might have an issue.
    First, when transferring a large file from my TS to my Tivo, which is on an 802.11b adapter, it slows my n network down to a crawl. We're talking something like 1/30th of the speed.
    Second, even though I have my security set up with WEP (because my Tivo b wireless adapter is not compatible with WPA), when I connect to my network with any of my other computers, it only accepts the password if I select WPA-personal in the security setting, even though the TS is set to WEP. Very weird.
    Next, my laptop will not connect to the network if I set the network as a private network (not broadcasting the SSID), which I prefer to do if I have to use a WEP security. It won't automatically connect, and when I select to "join other networks", and enter the information, I get either a time out error, or a connection failed error. As soon as I take the network off the private setting, no problems - it connects, remembers the network, and connects automatically.
    When I am running a guest network, my laptop refuses to connect to my main network - it automatically connects to the guest network, and gives errors when I try to connect to the main network manually.
    I also noticed that when running a guest network, it slows my main n network down to about 1/3 of the speed.
    I have a set-up that works right now. It took quite a lot of debugging and fiddling to get it to work. I have set up a 5 GHz channel, and have the devices I want to share movies with connecting only to that SSID. I have made my network non-private so it is broadcasting the SSID, and finally, I am not running a guest network.
    In that configuration, everything works the way it should. But shoot, what a pain to get there, and what a cut on the functionality of the TS (no private network, no guest network). Is this normal? Seems like a lot of PITA to get this thing to work right to me, and wondering if there might be something wrong with the unit. What defect would cause this behavior?

    Robert Czachorski1 wrote:
    First, when transferring a large file from my TS to my Tivo, which is on an 802.11b adapter, it slows my n network down to a crawl. We're talking something like 1/30th of the speed.
    Second, even though I have my security set up with WEP (because my Tivo b wireless adapter is not compatible with WPA), when I connect to my network with any of my other computers, it only accepts the password if I select WPA-personal in the security setting, even though the TS is set to WEP. Very weird.
    I won't try to address all your issues, but is there any way that you can connect your base station to your TiVo unit with an Ethernet cable?

  • Web Content Source- Ability to Crawl Links

    We are attempting to crawl a web-based content source where the first two levels of the site have actual anchors that point to other .aspx pages. These crawl and index just fine. The third level contains pages where all anchors are actually JavaScript function calls that ultimately submit the page after a number of JavaScript-based calculations occur.
    Our problem is that when SES crawler encounters this page, it does not seem to be capable of navigating the JavaScript-based links. Is this expected behavior, or have we not configured something correctly? If this is expected behavior, what are our options? I am confident we are not the first to run into this.

    SES can follow simple Javascript-based links, but if they're too complicated it may not be able to.
    Where is the actual content stored? Is it in a database, or a file system? You may be better off crawling those instead of using a web source.
    Is there any way you can produce a "map" page that points to all the other pages? If so, you just need to crawl this.
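    If you go the map-page route, generating it can be as simple as the sketch below (the URL list and output path are placeholders); the crawler only needs plain anchors it can follow:
    # Minimal sketch: generate a static "map" page of plain <a href> links for the crawler,
    # bypassing the JavaScript navigation. URLs and output path are placeholders.
    page_urls = [
        "http://intranet.example.com/app/page1.aspx",
        "http://intranet.example.com/app/page2.aspx",
    ]

    links = "\n".join('<li><a href="{0}">{0}</a></li>'.format(u) for u in page_urls)
    html = "<html><body><ul>\n{0}\n</ul></body></html>".format(links)

    with open("crawler_map.html", "w", encoding="utf-8") as f:
        f.write(html)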

  • Creating Crawler impact rule through Powershell

    Hi All,
    I want to create a "Crawler Impact Rule" for one web application. I found the script below, but it's not working, or it may not be the full script. As I am new to PowerShell scripting, please help me write the full PS script, or please suggest any
    other script which can create a "Crawler Impact Rule". Thank you.
    New-SPEnterpriseSearchSiteHitRule -Name $DefaultWebApplicationName -Behavior 'SimultaneousRequests' -HitRate 64

    Hi Aditya,
    According to your description, my understanding is that you want to create a crawler impact rule via PowerShell.
    For New-SPEnterpriseSearchSiteHitRule, there is no Name parameter, so please make sure the cmdlet is correct.
    For more information about New-SPEnterpriseSearchSiteHitRule, you can refer to this link:
    http://technet.microsoft.com/en-us/library/ff608048(v=office.14).aspx
    Best Regards,
    Wendy
    Wendy Li
    TechNet Community Support

  • Need Help on using CAS Incremental Crawl with JDBC data source

    Hi,
    As part of one of the e-commerce implementations, we are implementing a delta pipeline which reads the data from the database views. We plan to crawl the data with the help of CAS JDBC data source. In order to optimize the data reads, we want to use the CAS capabilities of incremental crawl. We rely on CAS to be able to identify the updates available in the subsequent incremental crawl and only read the updated data for any given crawl unless we force a full crawl.
    We have tried implementing the above setup using a JDBC data source. CAS reads from the database and stores the data in the record store. The full crawl works fine; however, we observed some unexpected behavior during the incremental crawl run. Even when there is no change in the database, the crawl metrics show that a certain number of records have been updated in the record store, and the number of updates differs in every subsequent run.
    Any pointers on what the issue could be? Does CAS have incremental crawl capability using a JDBC data source?
    Regards,
    Nitin Malhotra

    Hi Ravi,
    Generic extraction is used to extract data from COPA tables, and the delta method used to extract delta records (records created after initialization) is timestamp.
    What is this timestamp?
    Assume that you have posted 10 records one after the other to the COPA tables, and we do an initialization to move all these 10 records to the BW system. Later, another 5 records are added to the COPA tables. How do you think the system identifies these new 5 records (delta records)?
    It identifies them based on a timestamp field (e.g. "document created on", a 16-digit decimal field).
    Assume that in our previous initialization the "document created on" field for the last (or latest) record is 14/11/2006, 18.00 hrs, and the timestamp is set to 14/11/2006, 18.00 hrs. Then, when you do a delta update, the system treats all records whose "document created on" field is greater than 14/11/2006, 18.00 hrs as delta records. This is how the new 5 records are extracted to the BW system, and the timestamp is again set to a new value based on that field in the records (say 14/11/2006, 20.00 hrs).
    Now assume that you have two records with "document created on" values of 14/11/2006, 17.55 hrs and 14/11/2006, 17.57 hrs, and that for some reason they were updated to the COPA table after 14/11/2006, 20.00 hrs (when the last delta load was done). How can you bring in these two records? For this purpose we can reset the timestamp in KEB5. In this example, we can reset it to 14/11/2006, 17.50 hrs and do the delta again, so the system picks up all the records posted after 14/11/2006, 17.50 hrs. But remember that by doing this you are sending some of the records again (duplicate records), so make sure you are loading into an ODS object; otherwise you end up with inconsistent data due to duplicate records.
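    To picture the timestamp logic above, here is a small illustrative sketch (illustration only; the field names are made up, not the actual COPA structures):
    # Illustration of the timestamp-based delta logic described above (field names are made up).
    from datetime import datetime

    # Records in the COPA table: doc_no 3 was loaded in the previous delta run,
    # doc_no 1 and 2 have earlier "created on" values but arrived late.
    records = [
        {"doc_no": 1, "created_on": datetime(2006, 11, 14, 17, 55)},
        {"doc_no": 2, "created_on": datetime(2006, 11, 14, 17, 57)},
        {"doc_no": 3, "created_on": datetime(2006, 11, 14, 19, 30)},
    ]

    last_timestamp = datetime(2006, 11, 14, 20, 0)   # pointer after the last delta load

    # Normal delta: nothing is newer than the pointer, so the late records are missed.
    delta = [r for r in records if r["created_on"] > last_timestamp]
    print([r["doc_no"] for r in delta])              # []

    # Reset the pointer (as in KEB5) to 17:50 and run the delta again:
    # the missed records come through, but doc_no 3 is re-sent as a duplicate,
    # which is why the target should be an ODS object that overwrites.
    last_timestamp = datetime(2006, 11, 14, 17, 50)
    delta = [r for r in records if r["created_on"] > last_timestamp]
    print([r["doc_no"] for r in delta])              # [1, 2, 3]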
    Hope this helps!!
    Vj

  • Not able to crawl SP 2003 website in SP 2013

    Hi,
    I am not able to crawl a SP 2003 website in SP 2013; however, it is getting crawled in SP 2010 without any issue. While crawling in SP 2013 I am getting the error below:
    The SharePoint item being crawled returned an error when requesting data from the web service.
    I have looked into the event logs as well and am getting event ID 1314.
    I have also tried to get the client context of the SP 2003 site by passing the network credential, but I always get a 500 internal server error.
    This behavior is totally different, as the site is getting crawled in SP 2010 without any issue.
    Ashish Baranwal To know what you know and what you do not know, that is true knowledge

    Hi Ashish,
    The connector named "Sts2 Windows SharePoint Services 2.0 and SharePoint Portal Server 2003 sites" and prior versions are not supported.
    Sts2 is not listed as a supported connector in SharePoint 2013. You can refer to the URL below:
    http://technet.microsoft.com/en-us/library/jj219746%28v=office.15%29.aspx
    You can read the versions supported by the respective content source in SP 2013 under the heading "Plan to crawl different kinds of content" in the URL below:
    http://technet.microsoft.com/en-us/library/jj219577%28v=office.15%29.aspx
    It works in SP 2010 because the connector is listed as supported there. You can refer to the URL below:
    http://technet.microsoft.com/en-in/library/gg153530%28v=office.14%29.aspx
    Please remember to mark your question as answered and vote helpful if this solves/helps your problem.
    s p kumar

  • iPhoto hangs on opening preferences, slows to a crawl after multiple edits

    While I really like most of the new features and clean-ups in iPhoto, I keep running into spinning beach balls, usually for 5 minutes or more at a time. It's not just iPhoto, but other parts of iLife as well, though iPhoto is where it hits me hardest, and it seems to be iTunes related. Yes, I have a large library, but WHY does it take iPhoto so long to read the same library file that iTunes reads in a few seconds? And WHY does it happen every time I open preferences after logging in? 5 minutes is way too long to wait to use a program I routinely use each day.
    Also, after making a number of edits on a number of pictures, iPhoto slows down to a crawl. It beach-balls between images for minutes, then at last never comes back, and I have to do a force quit.
    What's up with this strange behavior? Is it going to be fixed REAL SOON? And it sounds like I can't even go back to 06 since the libraries have changed. Any advice?
    d

    Dale:
    Welcome to the Apple Discussions. I believe TD's suggestion was directed primarily towards the editing issue. The preferences and accessing-iTunes-from-iPhoto issues we'll have to live with until a fix is issued via an update.
    Boot into Safe Mode and try the editing again. If the problem goes away there, there's an issue with your account. If it persists, then it's either the library or the application.
    If the edits work OK in the Safe Mode then create a test library, import a couple of photos and see if the editing issue continues. If not, import some more and try again. You can use the paid version of iPhoto Library Manager to copy events from the problem library to the new one and keep your keywords, comments and titles intact. If the new library continues to be able to edit OK as you continue importing via iPLM then you may want to continue until the new library is complete and trash the old one.
    If the editing problem continues in the test library then a reinstall of iPhoto might be warranted. To do so you'll have to delete the current application and all files with "iPhoto" in the file name that reside in the HD/Library/Receipts folder. Then reinstall from the disks that iPhoto came on.
    Do you Twango?
    TIP: For insurance against the iPhoto database corruption that many users have experienced I recommend making a backup copy of the Library6.iPhoto database file and keep it current. If problems crop up where iPhoto suddenly can't see any photos or thinks there are no photos in the library, replacing the working Library6.iPhoto file with the backup will often get the library back. By keeping it current I mean backup after each import and/or any serious editing or work on books, slideshows, calendars, cards, etc. That insures that if a problem pops up and you do need to replace the database file, you'll retain all those efforts. It doesn't take long to make the backup and it's good insurance.
    I've created an Automator workflow application (requires Tiger), iPhoto dB File Backup, that will copy the selected Library6.iPhoto file from your iPhoto Library folder to the Pictures folder, replacing any previous version of it. It's compatible with iPhoto 08 libraries. iPhoto does not have to be closed to run the application, just idle. You can download it at Toad's Cellar. Be sure to read the Read Me pdf file.

  • My Firefox has slowed down to a crawl for no apparent reason. It starts out okay but gradually slows until it's inoperable. Nothing changed. Everything I can find to do has been checked. What happened?

    Oh, good, space. I was using ver 8.x; it was working fine. I left for a couple of hours. When I returned it had slowed to a crawl and was unusable. I'd been getting nagged to update, so I did that first. Cleared cache, cookies, history. Disabled hardware acceleration. Restarted Win (XP SP3), ran a virus check, etc., all the usual stuff. Don't have a proxy. The behavior looks like very, very bad lags. Often the page refreshes itself over and over as it loads. This happens on EVERY page. Chrome and Safari are working fine. I would love my Firefox back so any help appreciated.

    Hi mpblonestarlady,
    Have you looked at our [https://support.mozilla.org/en-US/kb/firefox-slow-or-takes-too-long-start performance troubleshooting section]? There is a lot of good information in there that should help.
    Hopefully this helps!

  • Random mail behavior

    Hi folks,
    I have been glued to this forum for weeks now, and I am reading so many strange issues with Mail.
    Mail has been playing nicely for me lately, but when I send mail with attachments (for me this is sheet music as .tiff files), I get strange behavior from Mail. It does send the mail, but upon my next restart I get the mail with attachments in the DRAFTS folder; then Mail takes an age to find (?) the mail, and I have to force quit, as it is "not responding". I send via my .mac account.
    The only way I have found around this is to sign in to my .mac account online and delete the mail in question. Though this works, it seems counterproductive, and makes using the Mail app feel like a waste of time.
    Is there a way to stop this behavior from the Mail app? This is driving me mad, and I am going to crawl up into the fetal position soon and find my "happy place" if it happens one more time. I may have to seek therapy on account of my mail client, and start a whole 2-part show on Oprah: "Apple Mail drove me mad, and now I'm in Recovery".
    Thanks,
    NoteFarm
    (I have tried rebuilding mail boxes)

    Thank you Ernie,
    You are very kind to respond, and I thank you for your help.
    I am not sure how to go about setting up .mac as a POP account, but I have been using it as a regular .mac account for a long time (years?) without problem until recently.
    I realized, after being forced so many times to sign in to my .mac account on the web to delete mail manually, that the only folder that had mail in it was my "Sent" folder. As a last-ditch effort I just decided to delete that folder; I bit the bullet and pressed delete, as it seemed that that was the folder causing problems. I deleted it, and all the mail I had replied to from my father before he passed away is gone. I was so beaten up by this I just had to do something.
    For the time being, it seems that as long as I store nothing on my .mac account (at $99 a year), the Mail app behaves. Hmmmmm, I am very, very disappointed in Apple over this. These are words I thought I would never say, but honesty compels me. If I wanted to work this hard I would use a cheap Windows machine, but no, I spend the extra $$$ and this is what it gets me. Shame on you Apple, shame on you.
    If there were a web site complaining about me as a musician (like these message boards do about Apple Mail), I would be e-mailing everyone an apology, begging forgiveness and trying to help, not expecting other people to fix my lack of care. I don't expect everyone to like what I play, but it should at least be in tune and work, and this program just is not in tune or working.
    Ahhhhhhhh!!!!!!!!!
    NoteFarm

Maybe you are looking for

  • EASY DMS

    Dear all, In Easy DMS, on the find or search document screen some of the fields are greyed out, e.g. full text search. What is the reason and how can I remove it? Kindly help

  • Where is the list of pending driver updates?

    I saw mentioned somewhere on these discussions that HP had a "list of pending updates". Anybody know where that might be? Like others I subscribed to HP Alerts, and all I get is SPAM.

  • After Data Link

    Good Morning Everyone, We have done the tutorial "Building a Two-Query Group Report" and we are doing a similar report. We could do a two-query data model with a data link like in the tutorial, but we could not display the columns that we intended to show. Any

  • Sort full-screen Browser in Browser mode?

    Hi. I'm missing something -- the ability to sort the current container while in full-screen browser mode. In full-screen, there are only two viewing modes, Browser (shows contents of selected containers) and Viewer (shows selected images). In Viewer

  • VA01: VBAP-TAXM1 field should not be blank when saving a sales order

    Hello all, I am working on sales orders. In VA01/VA02, if the VBAP-TAXM1 field is kept blank while saving the sales order, it should display a MESSAGE. Can anyone please help me with this? Thanks in advance.