Content sources and Crawling rules

Hi,
I tried using "http://www.microsoft.com/downloads" as a content source. For some reason, directories like "http://www.microsoft.com/documentation" get crawled too, but I just want to crawl the subdirectories of the "downloads"
directory.
I think there should be a better way than excluding each wrong path with a crawl rule.
What am I doing wrong?
Looking forward to your reply.

Hi Severin,
If you set the content source to http://www.microsoft.com/downloads, SharePoint starts crawling at that URL. However, if a page under that URL links to other directories, the crawler follows those links and crawls the other directories as well.
To avoid this, limit the page depth for the content source. You can set the page depth under Content Sources -> [your content source] -> Crawl Settings.
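The same depth limit can also be set from PowerShell. A minimal sketch; the content source name and the depth values here are illustrative, not from the original post:

```powershell
# Run in the SharePoint Management Shell on a farm server
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$ssa = Get-SPEnterpriseSearchServiceApplication "Search Service Application"

# For web content sources, MaxPageEnumerationDepth limits how many links
# deep the crawler follows from each start address, and
# MaxSiteEnumerationDepth limits server hops.
Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity "Microsoft Downloads" |
    Set-SPEnterpriseSearchCrawlContentSource -MaxPageEnumerationDepth 2 -MaxSiteEnumerationDepth 0
```

With a shallow page depth the crawler stays near the start address instead of wandering into sibling directories it discovers through links.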
Best Regards,
Wendy
Wendy Li
TechNet Community Support

Similar Messages

  • Table for Data source and Transfer rule mapping

    Hi all,
Is there a table where we can see the different transfer rules (for different source systems) assigned to the same DataSource?
    thanks and regards,
    Rk.

    thank you all,
the table RSDS does not have 3.x DataSources, only 7.0 DataSources, and it does not have the field mapping.
RSOSFIELDMAP and RSTSRULES have the field mapping of the 3.x DataSources; do we have any similar tables for 7.0 DataSources?
    regards,
    RK

  • Web Content Source - Not Working

Created a new web content source and ran a full crawl.
When I search for text that appears in the content, no results are returned.

Hi,
According to your description, my understanding is that searching a new web site content source in SharePoint 2013 did not work.
Please check whether the web site allows anonymous access to its content. If not, we need to configure a crawl rule for the URL that lets us enter credentials to access the site's content.
For more information, please refer to this link:
    http://mohitvash.wordpress.com/2012/03/07/configure-external-site-as-content-sources-in-sharepoint-search/
    I hope this helps.
    Thanks,
    Wendy
    Wendy Li
    TechNet Community Support

  • Setting permissions in File shares content sources

    Hello,
we want to crawl file shares and make them available to search. Setting up the crawler and the indexing is not a problem and works just fine.
The problem is that we would like to set permissions on the results coming from this content source, ideally mapping the permissions already on the files to the result scope. In simple words: only show the results (files) on which the AD user has
at least read privilege.
Is this possible? And if not, what would be a way of setting permissions on this content source?
Thanks in advance!
    Ioannis

    That is the default behavior of SharePoint search. It will automatically read and use the file ACL.
One thing to note is that users can often see files on the file system but not read them, in which case they will be able to find the files in the search engine but not open them or see their contents.

  • SSL Web Applications in Content Sources

    Hi there,
    I'm a bit stumped.  I have created two web applications - an Intranet and My Sites Host.  When I created them I selected the option on both for an SSL site.
    I have now set up the Search Service Application and added my site addresses to the "Local SharePoint Sites" content source.  When I start a full crawl, I get the following error:-
    This item could not be crawled because the repository did not respond within the specified timeout period. Try to crawl the repository at a later time, or increase the timeout value on the Proxy and Timeout page in search administration. You might
    also want to crawl this repository during off-peak usage times.
    Found some answers to that problem on the internet but none of them worked for me.
    So, I then created a new content source to crawl the "SharePoint - 80" web app which is an http app and the crawl was successful.
    Is there a little trick I am missing because my web apps are SSL?

1. Open SharePoint Central Administration.
2. Select Manage service applications from the Application Management section.
3. Select the Search Service Application, e.g. Search Service 1.
4. Select Farm Search Administration.
5. Toggle Ignore SSL warnings to Yes.
    http://www.infotext.com/help/sharepoint-could-not-estabilish-trust-relationship-for-the-ssltls-secure-channel-when-crawling-ssl-enabled-websites/
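The same toggle can be scripted. A hedged sketch using the search object model from PowerShell; the SetProperty("IgnoreSSLWarnings", 1) call is the commonly cited approach, so verify it against your farm version before relying on it:

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# "Search Service 1" is the example name from the steps above
$ssa = Get-SPEnterpriseSearchServiceApplication "Search Service 1"

# Tell the crawler to ignore SSL certificate name/trust warnings
$ssa.SetProperty("IgnoreSSLWarnings", 1)
$ssa.Update()
```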

  • Getting error while editing content source of search service application

    Hi,
When I edit the content source of the SharePoint Search Service Application and click the OK button, I get the following error.
Error: An item with the same key has already been added.
When I check the ULS log, it shows this:
    Application error when access /_admin/search/editcontentsource.aspx, Error=An
    item with the same key has already been added.  Server stack trace:    
     at System.ServiceModel.Channels.ServiceChannel.ThrowIfFaultUnderstood(Message reply, MessageFault fault, String action, MessageVersion version,
    FaultConverter faultConverter)   
     at System.ServiceModel.Channels.ServiceChannel.HandleReply(ProxyOperationRuntime operation, ProxyRpc& rpc)   
     at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs,
    TimeSpan timeout)   
     at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(IMethodCallMessage methodCall, ProxyOperationRuntime operation)   
     at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)    Exception rethrown
    at [0]:    
     at Microsoft.Office.Server.Search.Internal.UI.SearchCentralAdminPageBase.ErrorHandler(Object sender, EventArgs e)   
     at Microsoft.Office.Server.Search.Internal.UI.SearchCentralAdminPageBase.OnError(EventArgs e)   
     at System.Web.UI.Page.HandleError(Exception e)   
     at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint)   
     at System.Web.UI.Page.ProcessRequest(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint)   
     at System.Web.UI.Page.ProcessRequest()   
     at System.Web.UI.Page.ProcessRequest(HttpContext context)   
     at System.Web.HttpApplication.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()   
     at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)
    System.ServiceModel.FaultException`1[[System.ServiceModel.ExceptionDetail, System.ServiceModel, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]:
    An item with the same key has already been added.   Server stack trace:    
     at System.ServiceModel.Channels.ServiceChannel.ThrowIfFaultUnderstood(Message reply, MessageFault fault, String action, MessageVersion version,
    FaultConverter faultConverter)   
     at System.ServiceModel.Channels.ServiceChannel.HandleReply(ProxyOperationRuntime operation, ProxyRpc& rpc)   
     at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs,
    TimeSpan timeout)   
     at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(IMethodCallMessage methodCall, ProxyOperationRuntime operation)   
     at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)    Exception rethrown
    at [0]:    
     at Microsoft.Office.Server.Search.Internal.UI.SearchCentralAdminPageBase.ErrorHandler(Object sender, EventArgs e)   
     at Microsoft.Office.Server.Search.Internal.UI.SearchCentralAdminPageBase.OnError(EventArgs e)   
     at System.Web.UI.Page.HandleError(Exception e)   
     at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint)   
     at System.Web.UI.Page.ProcessRequest(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint)   
     at System.Web.UI.Page.ProcessRequest()   
     at System.Web.UI.Page.ProcessRequest(HttpContext context)   
     at System.Web.HttpApplication.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()   
     at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)
    Getting Error Message for Exception System.ServiceModel.FaultException`1[System.ServiceModel.ExceptionDetail]: An
    item with the same key has already been added. (Fault Detail is equal to An ExceptionDetail, likely created by IncludeExceptionDetailInFaults=true, whose value is: System.ArgumentException: An
    item with the same key has already been added.  
     at Microsoft.Office.Server.Search.Administration.SearchApi.WriteAndReturnVersion(CodeToRun`1 remoteCode, VoidCodeToRun localCode, Int32 versionIn)   
     at SyncInvokeEditContentSource(Object , Object[] , Object[] )   
     at System.ServiceModel.Dispatcher.SyncMethodInvoker.Invoke(Object instance, Object[] inputs, Object[]& outputs)   
     at System.ServiceModel.Dispatcher.DispatchOperationRuntime.InvokeBegin(MessageRpc& rpc)   
     at System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage5(MessageRpc& rpc)   
     at System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage31(MessageRpc&
    rpc)   
     at System.ServiceModel.Dispatcher.MessageRpc.Process(Boolean isOperationContextSet)).
Can anyone please help me resolve this issue? I am stuck :(
Thanks in advance
    Muhammad Luqman

I also tried this code, but without success.
    Add-PSSnapin "Microsoft.SharePoint.PowerShell"
    $searchapp = Get-SPEnterpriseSearchServiceApplication "Search Service Application"
    $StartAddresses = "http://insite"
    $cs = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $searchapp -identity "Local SharePoint sites"
$cs | Set-SPEnterpriseSearchCrawlContentSource -StartAddresses $StartAddresses
    I have now only single content source and all others are removed now but still getting same error.
    Muhammad Luqman

  • Search Refinement Panel webpart - filter on Content Source

    Hi,
I have 6 content sources and would like the search refinement panel to be able to filter by Content Source.
I have the refinement panel filtering on various other managed properties, but I'm struggling to get it to work for Content Source.
So, for example, if my content sources were called 'Content Source 1', 'Content Source 2', etc., the Search Refinement Panel web part would show:
    Content Source
    All Content Sources
    Content Source 1
    Content Source 2
    And clicking on the filter link would refresh the page and show results only from that particular content source.
    The content sources are a mixture of SharePoint and BCS databases and I have no access to modify either.
    I have tried making my own 'MyContentSource' Managed Property in Search administration, but I don't know what the mappings would be.
    Can anyone help?
    Thanks.

    Thanks Lisa, but scopes won't work here.
We have scopes defined already: one for each content source, one covering all the content sources, and other combinations of multiple content sources.
    The scenario is this:
    When I first visit the search results page, I want to see results from all the content sources so I use the scope containing all 6.
    Then I want to use the refinement panel webpart to filter down to a single scope, all from the search results page.
This is something that is very easy to do using the Google Search Appliance, and our company is currently choosing between it and SharePoint search going forward. I think this will be the deal-breaker if I can't get this to work under SharePoint search.
    At the moment I'm thinking that I might be able to do it using the SiteUrl or ContentType or something, along with a custom filter in the xml of the Refinement panel.  Haven't tried it yet.
    So along the lines of this:
<CustomFilters MappingType="ValueMapping" DataType="String" ValueReference="Absolute" ShowAllInMore="False">
      <CustomFilter CustomValue="[name of my content source]">
        <OriginalValue>[specific site url or content type maybe?]</OriginalValue>
      </CustomFilter>
</CustomFilters>
    Unless anyone has any better ideas?

  • Url will use to include in crawl rules settings and in content source

    HI
I have an internet application and I want to add some URLs to crawl rules so they do not display in search results.
Below are the alternate access mappings settings, which have one URL for the Default zone and one for the Intranet zone.
Which URL should I use in the crawl rules settings and in the content source?
    adil

    Hi,
As I understand it, you want to know which URL of the application should be added to the crawl rules settings and the content source.
I recommend you use the Default zone URL.
If you crawl a non-Default zone URL, it will impact query time, because the query processor attempts to translate the URL to the Default zone's public URL before processing the query.
The article below explains Alternate Access Mappings (AAMs):
    http://blogs.msdn.com/b/sharepoint_strategery/archive/2013/05/27/alternate-access-mappings-explained.aspx
    Best regards
    Sara Fan
    TechNet Community Support

  • Crawl Rule for Crawling Specific Page across all the site collection under one content source

I have a MOSS 2007 web application added to a SharePoint 2010 Search Service Application content source. It has 50+ site collections that follow the same template, and every site collection has one CustomPages library containing CustomPage.aspx.
If the search service should crawl only CustomPage.aspx from all the site collections under the web application, what would the path or regular expression be when creating the crawl rule?
I have set the path to http://webapplication/sites/*/CustomPages/CustomPage.aspx, but this is not working. Can anyone help me out with the correct path or regular expression in this case?
    Thanks..

To crawl that page you'd also need to crawl the pages that link to it; otherwise SharePoint never reaches the page to check whether it matches a rule. I assume you have some other rules that are blocking the rest of the content of the site?
Try adding a rule that allows http://webapplication/sites/*, then have your include rule beneath that, and another exclude rule for
http://webapplication/sites/*/* beneath that. That should eliminate nearly all the other content and give you your CustomPage.aspx results.
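The three rules described above can also be created from PowerShell. A sketch, with the URLs taken from the example in this thread; note that the crawler applies the first matching rule, so after creating the rules, check and adjust their order on the Crawl Rules page in Central Administration (or via each rule's Priority property):

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$ssa = Get-SPEnterpriseSearchServiceApplication

# Allow the site collections so the crawler can reach the target pages
New-SPEnterpriseSearchCrawlRule -SearchApplication $ssa -Type InclusionRule `
    -Path "http://webapplication/sites/*"

# Include the target page in every site collection
New-SPEnterpriseSearchCrawlRule -SearchApplication $ssa -Type InclusionRule `
    -Path "http://webapplication/sites/*/CustomPages/CustomPage.aspx"

# Exclude everything else beneath the site collections
New-SPEnterpriseSearchCrawlRule -SearchApplication $ssa -Type ExclusionRule `
    -Path "http://webapplication/sites/*/*"
```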

  • Access is denied. Verify that either the Default Content Access Account has access to this repository, or add a crawl rule to crawl this repository. If the repository being crawled is a SharePoint repository, verify that the account you are using has "Ful

I am trying to resolve this after setting up my new farm. I have 2 WFEs, 1 app server, 1 server dedicated to crawling, and 1 for search and index in my farm. I guess the dedicated crawl server is the root cause of the issue. I also applied the
DisableLoopbackCheck setting but am still facing the same issue. Any solution?
    Please Mark it as answer if this reply helps you in resolving the issue,It will help other users facing similar problem

    Hi Aditya,
    Please refer to the links below and try if they help:
Add full read rights for the Default Content Access Account via the web application's user policy:
http://sharepoint.stackexchange.com/questions/88696/access-is-denied-verify-that-either-the-default-content-access-account-has-acce
Grant the Default Content Access Account permission in the User Profile Service Application:
http://www.sysadminsblog.com/microsoft/sharepoint-search-service-access-is-denied/
Modify your crawl rule:
http://wingleungchan.blogspot.com/2011/11/access-is-denied-when-crawling-despite.html
Add the crawl server's IP to the local hosts file:
http://wellytonian.com/2012/04/sharepoint-search-crawl-errors-and-fixing-them/
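For the loopback check mentioned earlier in the thread, the registry change documented in KB 896861 can be scripted. A sketch; run it on the crawl server and then reboot (or at least restart IIS):

```powershell
# Disable the loopback check so the crawler can authenticate against a
# host name that resolves back to the local server (see KB 896861)
New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\Lsa" `
    -Name "DisableLoopbackCheck" -Value 1 -PropertyType DWord -Force
```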
    Regards,
    Rebecca Tu
    TechNet Community Support

  • Web Content Source- Ability to Crawl Links

    We are attempting to crawl a web-based content source where the first two levels of the site have actual anchors that point to other .aspx pages. These crawl and index just fine. The third level contains pages where all anchors are actually JavaScript function calls that ultimately submit the page after a number of JavaScript-based calculations occur.
    Our problem is that when SES crawler encounters this page, it does not seem to be capable of navigating the JavaScript-based links. Is this expected behavior, or have we not configured something correctly? If this is expected behavior, what are our options? I am confident we are not the first to run into this.

    SES can follow simple Javascript-based links, but if they're too complicated it may not be able to.
Where is the actual content stored? Is it in a database, or a file system? You may be better off crawling those directly instead of using a web source.
    Is there any way you can produce a "map" page that points to all the other pages? If so, you just need to crawl this.

  • Modify Content Source Crawl Settings Programmatically

    Hi,
I am working on a SharePoint 2010 Search Administration project; we need to create content sources from a custom SharePoint web part. I am using the SearchContext.ContentSources.Create() function to create the content source, but the issue is
that I want to set the Crawl Settings values (Limit Page Depth and Limit Server Hops) during or after creation programmatically.
I know it can be done from Central Administration and PowerShell commands, but how can I do it programmatically? Is there a way to do that?
    Regards,
    Ehab
    Ehab

You don't run PowerShell commands from the web part. (Technically you might be able to, but it's not a good idea.)
First make the commands work from PowerShell, or find the alternatives for C#/VB.NET. The syntax is not identical, but it's very similar and the object model is the same.
Once you've got that working, port it to C# if it isn't already so it runs as a console app. Then bolt on the custom web part at the end.
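As a starting point before porting to C#, the same settings can be exercised from PowerShell: the content source object returned by the cmdlets exposes the same object model a web part would use after ContentSources.Create(). A sketch; the content source name and depth values are examples:

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$ssa = Get-SPEnterpriseSearchServiceApplication
$cs  = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity "My Web Source"

# These properties on web content sources correspond to the members of
# Microsoft.Office.Server.Search.Administration.WebContentSource that
# C# code would set after creating the content source
$cs.MaxPageEnumerationDepth = 2   # "Limit Page Depth"
$cs.MaxSiteEnumerationDepth = 0   # "Limit Server Hops"
$cs.Update()
```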

  • To transfer contents of one MBP to another do they both have to have the same version of the OS? I have one which is SL (source) and the other is ML (target) .. thanks

    To transfer contents of one MBP to another do they both have to have the same version of the OS? I have one which is SL (source) and the other is ML (target) .. thanks

    No, they can be running different versions of OS X, but beware that any old PowerPC apps will not run with Mountain Lion.

  • Content objects and datasources for different sources...

    Dear Experts,
Could you tell me which content cubes, DSO objects, and DataSources should be used for the following modules:
    HR
    MM
    SD
    CRM
    SCM
    FI
    CO
    CO-PA
    Thank you very much in advance!

    Hi Joe,
SAP provides a content help document in which you can find all the BI Content provided by SAP for these modules.
Check the following link for details.
[BI Content|http://help.sap.com/saphelp_nw70/helpdata/en/3d/5fb13cd0500255e10000000a114084/frameset.htm]
Go to the respective functional area and select a functional substream (in the left navigation panel) on which you wish to start reporting. Once you select the functional substream, you will see all the BW objects under it; you can then choose the correct InfoObject for more details.
    [HR|http://help.sap.com/saphelp_nw70/helpdata/en/26/77eb3cad744026e10000000a11405a/content.htm]
    [SD|http://help.sap.com/saphelp_nw70/helpdata/en/17/cd5e407aa4c44ce10000000a1550b0/frameset.htm]
    [CRM|http://help.sap.com/saphelp_nw70/helpdata/en/04/47a46e4e81ab4281bfb3bbd14825ca/frameset.htm]
    [SCM|http://help.sap.com/saphelp_nw70/helpdata/en/29/79eb3cad744026e10000000a11405a/frameset.htm]
    SCM contains parts of MM.
    [FI|http://help.sap.com/saphelp_nw70/helpdata/en/65/7beb3cad744026e10000000a11405a/frameset.htm]
    [CO|http://help.sap.com/saphelp_nw70/helpdata/en/65/7beb3cad744026e10000000a11405a/frameset.htm]
[CO-PA|http://help.sap.com/saphelp_nw70/helpdata/en/53/c1143c26b8bc00e10000000a114084/frameset.htm] - This is generally a custom DataSource and is generated; not sure if the content would be useful.
    Hope it helps,
    Best regards,
    Sunmit.

  • The account password was not specified. Error after setting up a second Crawl Database and Crawler on different server.

    We have a three server farm. (SPWebTest, SPAppTest, SPDBTest)
    We have added an additional database server (SPDBRMTest) and an additional application server (SPAPPRMTest).
Today I created a new crawl database on SPDBRMTest and a new crawler on SPAPPRMTest.
I created a distribution rule to route all crawling activity for one web application to the new crawl database on SPDBRMTest.
This web application was part of the original crawl and had no errors or issues. We are trying to scale our search to improve performance, but when a full crawl is executed against this content source I get the following crawl error:
"The account password was not specified. Specify the password."
I have tried re-entering the "Default Content Access Account", but the issue continues.

    Hi Brian,
when you add the crawl rules, does the account that is provided have permission to read the content of the web application?
http://technet.microsoft.com/en-us/library/jj219686(v=office.15).aspx
Please disable the loopback check; perhaps it may help:
    http://support.microsoft.com/kb/896861
    Regards,
    Aries
    Microsoft Online Community Support
    Please remember to click “Mark as Answer” on the post that helps you, and to click “Unmark as Answer” if a marked post does not actually answer your question. This can be beneficial to other community members reading the thread.
