Crawling/Indexing external links in CM-Repository

Hi all,
I would like to do the following:
- Create an external link, or an HTML page containing external links, in a CM-Repository.
- Create an index with a crawler that crawls the created links and the HTML page, and follows links on the external sites to a certain depth.
I used the standard crawler for this, but it did not have the desired effect: only the HTML pages directly in the CM-Repository and directly behind the external links are crawled; links inside these pages are not recognized by the crawler.
I adjusted the crawler parameters as follows:
- Maximum Depth: 10
- Follow Links (ticked)
- Follow Redirects on Web-Sites (ticked)
but it did not help!
I know it sounds like you could just say: "Use a Web-Repository for this...", but I cannot do this in this context. Or is this the only repository manager that is capable of following links?
Any help will be appreciated!
Best regards!
Mirko
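For readers unsure what "Maximum Depth" together with "Follow Links" is expected to do, the intended behaviour can be sketched as a depth-limited, breadth-first traversal. This is only an illustration, not the KM crawler's implementation: the class name, the in-memory link map, and the example URLs are all assumptions made up for the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DepthLimitedCrawl {

    /**
     * Returns the start page plus every page reachable from it
     * in at most maxDepth link hops, in the order they were found.
     * The links map (page -> outgoing links) stands in for real HTTP fetches.
     */
    static Set<String> crawl(Map<String, List<String>> links, String start, int maxDepth) {
        Set<String> visited = new LinkedHashSet<>();
        visited.add(start);
        List<String> frontier = List.of(start);
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String page : frontier) {
                for (String target : links.getOrDefault(page, List.of())) {
                    // visited.add returns false for pages already seen, which avoids loops
                    if (visited.add(target)) {
                        next.add(target);
                    }
                }
            }
            frontier = next;
        }
        return visited;
    }

    public static void main(String[] args) {
        // Hypothetical link graph: a repository page pointing to external sites.
        Map<String, List<String>> links = Map.of(
            "cm:/links.html", List.of("http://a.example/"),
            "http://a.example/", List.of("http://b.example/"),
            "http://b.example/", List.of("http://c.example/"));
        System.out.println(crawl(links, "cm:/links.html", 2));
    }
}
```

With a depth of 1 only the pages directly behind the links are reached, which matches the behaviour described above; a larger depth would be expected to pull in the pages those pages link to.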

Hi Matthias,
thank you for your reply!
The reason I did not want to use Web-Repositories for this was that I could not find any documentation on how to create the necessary objects in KM programmatically.
(And I think it is not obvious from the documentation that you specifically need a Web-Repository for this!)
Is there a documented API to create an HTTP-System, WebSites and Web-Repositories?
I could not find one on SDN, but perhaps you or somebody else knows of some resources?
Best regards!
Mirko Heger

Similar Messages

  • External Links in KM Repository

    Hi,
    How to create external links in KM repository in Web Dynpro Java application?
    Thanks
    Sundar

    Hi,
    Creating Links
    When you create a link, you can specify whether it is internal or external. You create a link as follows:
    IResource link = parent.createLink("link", "/documents/file", LinkType.INTERNAL, null);
    Identifying the Link Type
    It is useful to know the link type because it gives an indication of the operations that are possible on the link target. The repository framework distinguishes between internal and external links. Internal links refer to resources that are integrated in the repository framework, whereas external links refer to objects that are not integrated in the repository framework, for example, documents stored on a web site. If a link is internal, you can access the target as a resource that offers the operations associated with repository framework resources. If a link is external, you can only access the target as a URL and not as a resource object. As a consequence, most of the operations associated with resources are not available.
    Determining the link type is therefore often a prerequisite for handling a link target effectively.
    The code extract shows how you can find out what type of link is involved.
    if( LinkType.INTERNAL.equals(resource.getLinkType()) ) {
        // resource is an internal link to another resource
        IResource target = resource.getTargetResource();
        if( target == null ) {
            // link is broken, because target does not exist anymore as a resource,
            // so use the RID of the internal link's target instead.
            URL targetURL = resource.getTargetURL();
        }
    } else if( LinkType.EXTERNAL.equals(resource.getLinkType()) ) {
        // resource is an external link to a URL
        URL target = resource.getTargetURL();
    } else { // if( LinkType.NONE.equals(resource.getLinkType()) )
        // resource is not a link
    }
    The repository framework also distinguishes between internal links that are static and dynamic. Dynamic links follow their target after the execution of operations like copy and move, whereas static links do not. When you create a link, the repository framework decides whether it will be static or dynamic.
    For more information: SAP NETWEAVER DEVELOPER'S GUIDE 2004S. Page .20
    Patricio.
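The branching described above can also be tried out without a portal installation. The following is a minimal sketch using stand-in types: the LinkType enum and the Resource class here are hypothetical and are NOT the SAP KM classes, they only mirror the three cases (internal, external, not a link) and the broken-link fallback from the extract.

```java
final class LinkTypeSketch {

    // Stand-in for the framework's link types (hypothetical, not the KM class).
    enum LinkType { NONE, INTERNAL, EXTERNAL }

    // Hypothetical minimal resource: a link type, a target URL,
    // and the resolved target (null when an internal target no longer exists).
    static final class Resource {
        final LinkType linkType;
        final String targetUrl;
        final Resource targetResource;

        Resource(LinkType linkType, String targetUrl, Resource targetResource) {
            this.linkType = linkType;
            this.targetUrl = targetUrl;
            this.targetResource = targetResource;
        }
    }

    /** Describes how a caller would have to treat the given resource. */
    static String describe(Resource r) {
        switch (r.linkType) {
            case INTERNAL:
                if (r.targetResource == null) {
                    // broken internal link: only the stored target location is left
                    return "broken internal link, falling back to " + r.targetUrl;
                }
                return "internal link, target available as a resource";
            case EXTERNAL:
                // external targets are only reachable as a URL
                return "external link to " + r.targetUrl;
            default:
                return "not a link";
        }
    }

    public static void main(String[] args) {
        Resource doc = new Resource(LinkType.NONE, null, null);
        System.out.println(describe(new Resource(LinkType.INTERNAL, "/documents/file", doc)));
        System.out.println(describe(new Resource(LinkType.INTERNAL, "/documents/gone", null)));
        System.out.println(describe(new Resource(LinkType.EXTERNAL, "http://www.example.com", null)));
        System.out.println(describe(doc));
    }
}
```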

  • TREX - Indexing external links' metadata, but not content

    Hi All,
    I have a number of external links in the CM repository and would like to include them into the search results.
    However during indexing, all external links fail pre-processing because I believe it is trying to get the content of the link but it does not have authorisation.
    Is there a way to just index the metadata and ignore the content?
    Cheers,
    Vic

    Hi Achim,
    I saw that a host was specified as
    http://<%hostname%>, so I thought it should work. I tried editing this value to give the host name.
    For example, my portal address is
    http://portal.company.com:50000
    so I entered the address
    http://portal in my host parameters.
    This is still not working. Should I change the parameter to something else?
    Thanks,
    Kiran.

  • Trex failing to index word docs in FS repository

    Hi,
    I am having errors while indexing a FS repository.
    The errors in TREX monitor are
    return message:Content-Length -vs- Actual Read mismatch
    return code:8030
    Document Status: Preparation Failed
    I tried reindexing these failed entries, but that did not work.
    All of these are Word docs. Moreover, crawler errors for
    folders are also visible in the errors list.
    Looking into the TREX trace, I was able to find the following:
    HTTP-GET failed for URL <............>with Errorcode -30 , but HTTP-HEAD worked, trying again,
    Mimetype application/msword is not based on TEXT, but was detected as text type; content might be corrupted and will be ignored
    When I tried opening some of these docs from Windows,
    I did not have access to them myself and got an error. Would this also be the case for the index_service user?
    Also, not all docs that are visible on the Windows side are visible from the portal in the KM navigation iView.
    Platform :EP 7.0 SP13, TREX 7.0
    Any help would be appreciated
    Rgds

    Hello,
    A problem related to the search for this FS repository is that search takes a very long time to display results.
    The datasource is quite big, with nearly 90,000 docs, and the index is supposed to index external links too, with the
    indexContentOfExternalLink, indexContentOfExternalLink properties also set. The search scope is based on indexes in the search iView.
    When a normal user who has the relevant role for the search iView runs a search, it takes a very long time, nearly 30 minutes!
    But if a super admin runs the same search, it comes up with results immediately.
    Is this some kind of authorisation issue? The index gives Everyone full control, and I was not able to see much in the default trace either.
    Is there any particular trace/log file to be checked for this? Has something been missed in the index creation process?
    Hope someone can comment on this
    Rgds

  • Indexing of external links in repositories

    Hello,
      I have a question about the concept of indexing external links in FS repositories. When we set indexContentOfExternalLink and showWithoutDatasource
    to true, how does this actually work? The help documentation does not provide much information.
    I have an FS repository where I have links to some external pages.
    a) Will search pick up terms in the targets of these links (i.e. in these pages)?
    b) Also, if the target is an HTML page which has further hyperlinks on it, will a search for those hyperlinks work?
    This feature is also not returning any results for links at the moment.
    I don't want to use a Web repository for this.
    Hope someone can clear this up.
    Rgds

    I had seen somewhere that if the target has JavaScript it is not picked up. But the target of the link is an HTML page which has only href tags, and none of these are picked up by search either.
    Only the link is picked up by TREX, just like it displays normal files. Also, would TREX pick up links to docs on other FS repositories?
    There are no crawler errors.
    Inviting any comments on this, because I am quite lonely on this thread.
    Rgds
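For question b), it may help to see what "picking up" the hyperlinks of a plain HTML page involves at a minimum. The sketch below is an assumption-laden illustration, not how TREX or the KM crawler processes pages: it uses a simple regular expression, handles only double-quoted href values on anchor tags, and the class name and sample markup are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HrefExtractor {

    // Matches <a ... href="..."> and captures the quoted value (case-insensitive).
    private static final Pattern HREF = Pattern.compile(
        "<a\\s[^>]*?href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Returns the href values of all anchor tags in the given HTML, in document order. */
    static List<String> extractHrefs(String html) {
        List<String> hrefs = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            hrefs.add(m.group(1));
        }
        return hrefs;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"http://a.example/x.html\">A</a> and "
                    + "<A class=\"k\" HREF = \"doc.pdf\">B</A></p>";
        System.out.println(extractHrefs(html));
    }
}
```

A crawler that indexes only the link target itself never runs a step like this on the fetched page, which would be consistent with the hyperlinks on the target page not showing up in search.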

  • IWeb 08 external links correct on podcast entries, however wrong on index

    Hope the title makes sense; Basically the podcast entries we create have an external link to some referenced sites and they work fine when looking at that podcast entry, HOWEVER the auto generated index page (listing all the podcast entries) shows a preview of the text of the Podcast and we've noticed that the SAME external links are altered, forward slashes are replaced with %253 and spaces with %25220 (I can understand spaces being replaced with %20! but NOT this)
    This only happens with EXTERNAL links. You can see it on the Index page BEFORE you even upload the site. So clearly something is odd in iWeb 08 and not specific to using .MAC or an external host.
    Is anyone else seeing this? We recreated the Podcast from scratch to ensure no iWeb 06 bad info was present.
    This just tops it off... The LACK OF NOTIFICATION on sites URL's being moved due to folder name changes and now this. iWeb 08 needs some attention ASAP. Is there a clean way to back rev to iWeb06 (can you keep both installed at the same time?)

    I'm having the exact same issue, not necessarily with podcasts, just that any external link works fine except in the rss feed, where characters are replaced with %xxx and return 404 errors.
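    Since %25 is the percent-encoded form of the % character itself, sequences starting with %25 may point to the links being URL-encoded twice. The sketch below only demonstrates that general effect in Java; it makes no claim about what iWeb does internally, and the class name is made up.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class DoubleEncodingDemo {

    /** Percent-encodes a string once using UTF-8. */
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String once = encode("/");    // the slash becomes a %XX escape
        String twice = encode(once);  // the % of that escape is itself escaped as %25
        System.out.println(once + " -> " + twice);

        // An already-encoded space (%20) that is encoded again.
        System.out.println("%20 -> " + encode("%20"));
    }
}
```

    A link that has been encoded twice no longer decodes to the original URL in a single pass, which would explain 404 errors on the altered links.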

  • Is there a limit for external links in TOC using RoboHelp 9.2?

    Hi,
    This is my first time posting to the Adobe forums. Here is my problem.
    I have been maintaining help projects using RoboHelp 9.2 for quite some time now to generate WebHelp. I now have four large projects that are linked through a URL in each project's TOC.  Example: I have projects A B C D. In A's TOC, I include a link to B C D default topics. In B's TOC, I include a link to A C D default topics. And so on ... The fourth project (D) is new.
    The problem I've encountered is with the image appearing in the TOC next to the external link entry for project D in the generated WebHelp.  The image showing up for project D is different from the other two, even though the properties for that page are exactly the same as the other two.  This happens in all three remaining projects (A B C). Even when I change the image (from the image index), the same image (i.e. the standard image for a topic page) appears regardless.
    What should appear in all cases is image 13.gif.
    (I'll have to figure out later why 19.gif is selected in the properties dialog, whereas 13.gif is actually showing up in WebHelp. Doh.)
    Any ideas?
    thanks,
    denise

    karl219 wrote:
    1-Time Machine seems to be coming up often in the discussions. If my backup is on my external hard drive, why would space be taken up on my MacBook hard drive?
    Read the first extract for the definition of backup snapshots and how they are handled by the OS.
    2-Should I delete older backups? If, for example, all I need from Time Machine is the latest backup done today, can I get rid of the 2+ years of backups, or do they not take up that much space?
    No.  Time Machine should be on a separate HDD and will take care of deleting backups when necessary on that drive.
    3-Back to the issue: considering I don't have access to an Apple store for the next few months, what is my best plan of action?
    I would consider installing a larger internal HDD.  Capacities to 1 TB are now available.
    4-I am considering buying a new macbook 15inch, and get more space, however, I'm scared that the same problem will happen again. Is this common on all macs? Will using Time Machine to "Restore" my backups bring along the issues as well?
    All Macs using Time Machine will experience the same conditions.  The problem is that many users do not understand what 'Other' is, and that leads to confusion.  There are many essential OS files in that category that should not be touched, as well as some user files.  The issue is not to reduce 'Other' as such, but to reduce the total amount of data on the HDD if space is at a premium.
    Please read these two extracts:
    http://pondini.org/OSX/LionStorage.html
    http://pondini.org/OSX/DiskSpace.html
    Ciao.

  • KM Search and External Link Question

    Q1: I have created metadata properties for a folder, so any document created in this folder requires this data to be entered. I want to put a filter option on this folder's iView, where the user can filter data on the basis of a metadata property, or search for documents based on multiple metadata properties using TREX or any other search.
    Q2: If I create an external link to a document in a KM folder, will TREX search in that document?
    Thanks in Advance
    Jagraj Dhillon

    Hi Jagraj,
    Q1: You have to define the metadata property as indexable; then you can use TREX to search for documents with a specific value of this property. Of course, you can also filter the documents when displaying the content of the folder. In this case you have to implement a resource list filter.
    Q2: TREX is able to index links as well.
    The question is whether you really mean external links for referencing documents in KM folders, because normally you do this with an internal link. An external link is in most cases a reference outside the portal. Nevertheless, see http://help.sap.com/saphelp_nw70/helpdata/en/73/66c090acf611d5993700508b6b8b11/frameset.htm where you can see that you can define a parameter indexContentOfExternalLink and a parameter IndexInternalLinks. In this case the index will also contain the content of the links.
    Best Regards
    Frank

  • External link for documentation in Query

    Hi,
    In a query I have enabled the document display link for Date and master data. When I select the link, I can view the content of the document created in BW.
    What I need instead is that, when I call the document link from the query, it leads directly or indirectly (via a double jump) to an external link where the documentation resides (for example, the document contains the link http://www.lg.com/). Is this possible?
    Thanks.

    Hi Kams,
    Perhaps I did not explain myself well. In the query properties I can enable the display of the associated documentation for every InfoObject. Clicking the "leaflet" icon opens, in the web, all documents associated with the InfoObject. My problem is that the documentation resides at a link external to BW, so I want to point not to the SAP repository but to a hyperlink that opens the documentation for that object. Is this possible?
    If I use the RRI, the context menu opens only on the characteristics in the drill-down and not on all those contained in the query, as happens when the documentation display is enabled.
    Thanks.
    Best Regards.
    Charly

  • Search Error: This item could not be crawled because the crawler could not connect to the repository.

    Our Search Service got hosed after install of SP1/June CU (didn't upgrade), so I re-created it, but when setting up rules, content sources, I am getting the following error:
    This item could not be crawled because the crawler could not connect to the repository.  I tried crawling using admin account, so I don't think it is permission related.  The DisableLoopbackCheck in the registry is already set...
    We previously had crawling working on this farm, so it shouldn't be because we are using host headers and the loopback issue.  Does anybody have any ideas?  Another error that may be related...
    SharePoint Web Services Round Robin Service Load Balancer Event: EndpointFailure
    Process Name: OWSTIMER
    Process ID: 1612
    AppDomain Name: DefaultDomain
    AppDomain ID: 1
    Service Application Uri: urn:schemas-microsoft-com:sharepoint:service:edd515218442481abce89e54e3c63a64#authority=urn:uuid:a297b0fb965f465ca42051f55fa3f4bd&authority=https://spapp-wc-2p:32844/Topology/topology.svc
    Active Endpoints: 4
    Failed Endpoints:3
    Affected Endpoint:
    http://[ourservername]:32843/edd515218442481abce89e54e3c63a64/ProfileService.svc
    Any suggestions are appreciated, this is a tricky one...
    Jonathan Herschel

    Make sure the hosts file on each server in your farm has a 127.0.0.1  webappname entry for each SPSite you're trying to crawl.
    Look in %systemroot%\system32\drivers\etc and edit the hosts file so that it has an entry for the web app.
    Make sure the search account listed as the access account is also a service admin for the search service.
    Check the user policy on the web app and make sure that account has Full Read.
    Check that the SharePoint Web Services Root app pool is started on the server that is indexing the farm.
    If you still can't crawl, give the search account db_owner on the content DB using SQL Server Management Studio.
    Set BackConnectionHostNames for the site collection.
    Then, in a non-prod environment, reset the index and perform a full crawl; repeat in prod.
    Stacy Anothersharepointblog.blogspot.com

  • Related Content links in the repository tables

    We have 150 FR reports and many of them are using Related Content links. I am trying to find the easiest way to view all the Reports that have Related Content links and which reports they are linking to. I was trying to see about writing a sql against the tables but need to know which tables to join and what fields to select.
    If anyone has a sql they have built to do similar selects or a feature in EPM that I am missing let me know.
    Thanks!
    Robert

    Indeed, we have the same issue with our Solaris 10 containers.
    None of the network interface are visible since we upgraded to Solaris 11.1.
    Highly likely that we will have to fallback to 11.0
    root%A-infmagt004[8] zoneadm -z udsr02 boot
    Dec 7 09:55:41 gvas-infmagt004 mac: [ID 469746 kern.info] NOTICE: net0 registered
    Dec 7 09:55:41 gvas-infmagt004 mac: [ID 469746 kern.info] NOTICE: dmz0 registered
    Dec 7 09:55:41 gvas-infmagt004 mac: [ID 469746 kern.info] NOTICE: vd155 registered
    Dec 7 09:55:41 gvas-infmagt004 mac: [ID 469746 kern.info] NOTICE: vd157 registered
    Dec 7 09:55:41 gvas-infmagt004 mac: [ID 469746 kern.info] NOTICE: vd153 registered
    Dec 7 09:55:41 gvas-infmagt004 mac: [ID 469746 kern.info] NOTICE: vd301 registered
    Dec 7 09:55:43 gvas-infmagt004 dlmgmtd[65]: [ID 183745 daemon.warning] Duplicate links in the repository: net0
    Dec 7 09:55:43 gvas-infmagt004 dlmgmtd[65]: [ID 183745 daemon.warning] Duplicate links in the repository: dmz0
    Dec 7 09:55:43 gvas-infmagt004 dlmgmtd[65]: [ID 183745 daemon.warning] Duplicate links in the repository: vd155
    Dec 7 09:55:43 gvas-infmagt004 dlmgmtd[65]: [ID 183745 daemon.warning] Duplicate links in the repository: vd157
    Dec 7 09:55:43 gvas-infmagt004 dlmgmtd[65]: [ID 183745 daemon.warning] Duplicate links in the repository: vd153
    Dec 7 09:55:43 gvas-infmagt004 dlmgmtd[65]: [ID 183745 daemon.warning] Duplicate links in the repository: vd301
    root%A-infmagt004[4] zlogin udst02 ifconfig -a
    lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
    inet 127.0.0.1 netmask ff000000
    useless.

  • External links to websites in IBA

    Hi. Does Apple permit using external links to my own and my partners' websites? Or is it considered advertising/marketing (and not allowed)?  Thanks in advance!

    I think a lot  of your questions can be assisted by checking out the  help files under Media.
    There are widgets available  dealing  with video and adding sound... the help files will  give you the formats.
    If you own the copyright to the words and music, no problems from Apple. Add a copyright notice to your page in the smallest font!
    Spotify, not sure of Apple's policy on that... insert a link, test it, and hope it gets past the review.
    Answers?    Long ago, as I crawled home in a drunken state caring not a jot for anything.... I accidentally stood on an Elf's toes, and he cried out in pain. I carried on walking as he shouted me back..... sorry.. I am too busy, I am already booked.
    Suddenly he was in front of me hopping up and down and said BOOKED, I'll give you BOOKS then..
    and you can sit and help people with their problems and learn about Elfs n Safety...... and he stuck me on this group.

  • Strange problem with changing external links site-wide

    Hi, I'm having a problem that I just can't work out; I'll describe what's happening and would greatly appreciate any help anyone can give.
    When I type an external/absolute link into Dreamweaver's site-wide Change Link function, I keep getting a message saying it can't find any links with that name ("no links to.....found").
    I've tried creating a simple link in my index.html like http://google.com and then searching for that, so I know it's there, and Dreamweaver still tells me that http://google.com, www.google.com, /google.com and every other variation I can think of doesn't exist. In addition, when I search for broken links and click External Links to try changing them there, the search function finds the links I'm looking for and lets me type in the new link I want, but when I hit Enter it reverts to the old link.
    At first I thought it was a problem with my copy of Dreamweaver, but I downloaded my site to a hard drive and tried it on my uni computers' copies of DW CS3 and CS4, with the same outcome.
    I have about 1000 instances of this particular link in my site that I need to change and really don't want to do this manually; if anyone could help I'd be extremely grateful.
    thanks, Andy

    Note: although it seems straightforward, I also followed YouTube guides on defining my site and changing the links to make sure I hadn't missed something obvious.
    It may still be something obvious that I just can't see. I've been googling to find out if other people have had this problem but couldn't find anything.

  • External links suddenly open in a new window instead of a tab

    I am using Nightly (20.0a1) on Windows 8. I use the program TeamSpeak 3, and it used to be that whenever I clicked on a link it would just open in a new tab in the Nightly window that was open, but now it opens a completely new Nightly window. I have the box ticked in settings, and I've gone through all of the about:config things I've seen online, and nothing works; I really have no idea what to do. I just tried something with environment variables, which didn't help (it meant that a whole new instance of Nightly opened whenever I clicked on an external link), so I deleted the variable. Anyone got any suggestions? If it is relevant, before this problem started I dragged a tab from one window of Nightly to another (I think it might have been the cause of this), and yes, I've tried dragging tabs around from window to window to fix this and it didn't work.

    Thanks, but I already had it set to that. I've heard of many people with this problem and tried literally all of the about:config fixes that I could find, but none worked. I think it is some sort of association problem between hyperlinked text and Windows, but I really don't know. Anyone who can help me, please do, this is getting annoying. Also keep in mind that I have Windows 8 when making suggestions.

  • How to force external links to open in new browser window/tab?

    Greetings. I'm here because searching elsewhere kept coming up with references to javascript. I've created an interactive PDF with both internal and external links. It works just fine when viewed in either Acrobat or Reader. However, it is being downloaded to users' browsers and when they access the external URLs, the new sites replace the existing page in the same window.
    Is there a simple way for me to include an action in the PDF prior to distribution that will force it to open external URLs in a new browser window or tab?
    And for the record, I barely know how to spell Javascript, much less how to write or implement it. So if there is a simple solution, I hope you can also help walk me through the steps to implement it.
    Thanks a ton.

    There are two issues here:
    - How to do it using JS
    - How to do it in a non-Adobe browser plugin
    The first is pretty straight forward. You use a code like this, replacing the dummy URL with your own:
    app.launchURL("http://www.example.com", true);
    The second issue is more problematic, and in fact there isn't really a solution to it. If the plugin used supports this method then it will work, if not, then you're out of luck. Unfortunately, outside of the Adobe software I don't think that any other plugin supports it.
