Robots.txt file

How do I create a robots.txt file for my Muse site?

You can follow the guidelines from Google to create a robots.txt file and place it at the root of your remote site.
https://support.google.com/webmasters/answer/156449?hl=en
Thanks,
Vinayak
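A minimal robots.txt, as a sketch (the /private/ folder and the sitemap URL below are placeholders, not something Muse generates for you), looks like this:
User-agent: *
Disallow: /private/
Sitemap: http://www.example.com/sitemap.xml
Upload it with your exported site files so that it is reachable at http://yoursite.com/robots.txt.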

Similar Messages

  • Placement of robots.txt file

    Hi all,
    I want to disallow search robots from indexing certain directories on a MacOS X Server.
    Where would I put the robots.txt file?
    According to the web robots pages at http://www.robotstxt.org/wc/exclusion-admin.html it needs to go in the "top-level of your URL space", which depends on the server and software configuration.
    Quote: "So, you need to provide the "/robots.txt" in the top-level of your URL space. How to do this depends on your particular server software and configuration."
    Quote: "For most servers it means creating a file in your top-level server directory. On a UNIX machine this might be /usr/local/etc/httpd/htdocs/robots.txt".
    On a MacOS X Server would the robots.txt go into the "Library" or "WebServer" directory or somewhere else?
    Thanxx
    monica
    G5   Mac OS X (10.4.8)  

    The default document root for Apache is /Library/WebServer/Documents so your robots.txt file should be at /Library/WebServer/Documents/robots.txt
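    For example, to keep crawlers out of a couple of directories (the directory names here are only placeholders), you could create /Library/WebServer/Documents/robots.txt containing:
    User-agent: *
    Disallow: /drafts/
    Disallow: /internal/
    Crawlers will then fetch it as http://yourserver.example/robots.txt.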

  • Use of robots.txt to disallow system/secure domain names?

    I've got a client whose system and secure domains are ranking very high on Google. My SEO advisor has mentioned that a key way to eliminate these URLs from Google is to disallow the content through robots.txt. Given BC's unique handling of system and secure domains I'm not sure this is even possible, as any disallow rules I've seen or used before have been for directories rather than absolute URLs, nor have I seen any mention of this possibility around. Any help or advice would be great!

    Hi Mike
    Under Site Manager > Pages, when accessing a specific page, you can open the SEO Metadata section and tick “Hide this page for search engines”
    Aside from this, using the robots.txt file is indeed an efficient way of instructing search engine robots which pages are not to be indexed.
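    Assuming you can get a file served from the root of the system and secure domains themselves (whether BC allows that on your plan is the open question here), the robots.txt that blocks all crawling of such a domain is simply:
    User-agent: *
    Disallow: /
    It only takes effect for the host it is served from, e.g. https://secure.yourdomain.com/robots.txt (domain name is a placeholder).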

  • Robots.txt and Host Named Site Collections (SEO)

    When attempting to exclude ALL SharePoint sites from external indexing, when you have multiple web apps and multiple host-named site collections, should I add the robots.txt file to the root of each web app, as well as each HNSC? I assume so, but thought I would check with the gurus...
    - Rick

    I think you need one for each site collection, since each site collection has a different host name and is treated as a separate web site.
    "The location of robots.txt is very important. It must be in the main directory, because otherwise user agents (search engines) will not be able to find it. Search engines look first in the main directory (i.e. http://www.sitename.com/robots.txt) and if they don't find it there, they simply assume that this site does not have a robots.txt file."
    http://www.slideshare.net/ahmedmadany/block-searchenginesfromindexingyourshare-pointsite
    Please remember to mark your question as answered and vote helpful if this solves your problem. Thanks - WS, MCITP (SharePoint 2010, 2013). Blog: http://wscheema.com/blog
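    As a sketch, the file you would place at the root of each host-named site collection to block all external indexing is just:
    User-agent: *
    Disallow: /
    Each host serves its own copy, so crawlers fetch it per host name, e.g. http://teamsite1.yourdomain.com/robots.txt and http://teamsite2.yourdomain.com/robots.txt (host names here are placeholders).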

  • Robots.txt -- how do I do this?

    I'm not using iWeb, unfortunately, but I wanted to protect part of a site I've set up. How do I set up a hidden directory under my domain name? I need it to be invisible except to people who have been notified of its existence. I was told, "In order to make it invisible you would need to not have any links associated with it on your site, make sure you have altered a robots.txt file in your /var/www/html directory so bots cannot spider it. A way to avoid spiders crawling certain directories is to place a robots.txt file in your web root directory that has parameters on which files or folders you do not want indexed."
    But, how do I get/find/alter this robots.txt file? I unfortunately don't know how to do this sort (hardly any sort) of programming. Thank you so much.

    Muse does not generate a robots.txt file.
    If your site has one, it was generated by your hosting provider or some other admin on your website. If you'd like Google or other 'robots' to crawl your site, you'll need to edit this file or delete it.
    Also note that you can set your page description in Muse using the Page Properties dialog, but it won't show up immediately in Google search results - you have to wait until Google crawls your site and updates its index, which might take several days. You can request that Google crawl it sooner, though:
    https://support.google.com/webmasters/answer/1352276?hl=en
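    If the goal is only to keep well-behaved crawlers out of one folder, a robots.txt at your web root along these lines is enough (the folder name is a placeholder); note that robots.txt is purely advisory and does not password-protect anything:
    User-agent: *
    Disallow: /hidden-directory/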

  • Question about robots.txt

    This isn't something I've usually bothered with, as I always thought you didn't really need one unless you wanted to disallow access to pages / folders on a site.
    However, a client has been reading up on SEO and mentioned that some analytics thing (possibly Google) was reporting that "one came back that the robot.txt file was invalid or missing. I understand this can stop the search engines linking in to the site".
    So I had a rummage, and uploaded what I thought was a standard enough robots.txt file :
    # robots.txt
    User-agent: *
    Disallow:
    Disallow: /cgi-bin/
    But apparently this is reporting:
    "The following block of code contains some errors. You specified both a generic path ("/" or empty disallow) and specific paths for this block of code; this could be misinterpreted. Please, remove all the reported errors and check again this robots.txt file."
    Line 1: # robots.txt
    Line 2: User-agent: *
    Line 3: Disallow:
    (You specified both a generic path ("/" or empty disallow) and specific paths for this block of code; this could be misinterpreted.)
    Line 4: Disallow: /cgi-bin/
    (You specified both a generic path ("/" or empty disallow) and specific paths for this block of code; this could be misinterpreted.)
    If anyone could set me straight on what a standard / default robots.txt file should look like, that would be much appreciated.
    Thanks.

    Remove the blank disallow line so it looks like this:
    User-agent: *
    Disallow: /cgi-bin/
    E. Michael Brandt
    www.divahtml.com
    www.divahtml.com/products/scripts_dreamweaver_extensions.php
    Standards-compliant scripts and Dreamweaver Extensions
    www.valleywebdesigns.com/vwd_Vdw.asp
    JustSo PictureWindow
    JustSo PhotoAlbum, et alia

  • Robots.txt question?

    I am kind of new to web hosting, but learning.
    I am hosting with Just Host, and I have a couple of sites (addons). I am trying to publish my main site now and there is a whole bunch of stuff in the site root folder that I have no idea about. I don't want to delete anything and I am probably not going to, lol. But should I block a lot of the stuff in there in my robots.txt file?
    Here is some of the stuff in there:
    .htaccess
    404.shtml
    cgi-bin
    css
    img
    index.php
    justhost.swf
    sifr-addons.js
    sIFR-print.cs
    sIFR-screen.css
    sifr.js
    Should I just disallow all of this stuff in my robots.txt? Any recommendations would be appreciated. Thanks

    Seaside333 wrote:
    public_html for the main site, the other addons are public_html/othersitesname.com
    is this good?
    thanks for quick response
    You probably don't need the following files unless you're using text image-replacement techniques: sifr-addons.js, sIFR-print.cs, sIFR-screen.css, sifr.js.
    It's good to keep .htaccess (you can insert special instructions in this file), 404.shtml (if a page can't be found on your remote server, visitors are sent to this page) and cgi-bin (some processing scripts are placed in this folder).
    You will probably have your own 'css' folder. The 'img' folder is not needed. 'index.php' is the homepage of the site and what the browser looks for initially; you can replace it with your own homepage.
    You don't need justhost.swf.
    Download the files/folders to your local machine and keep them in case you need them.
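    If you do decide to block some of it from crawlers, a sketch of a robots.txt for the main domain might be (the addon folder name is a placeholder, and whether you block these at all is your call):
    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /othersitesname.com/
    Blocking the addon-site folder this way only stops it from being crawled under the main domain; the addon domain itself, with its own robots.txt at its own root, is unaffected.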

  • No robots.txt?

    Hello,
    just a short question: Why does Muse not create a robots.txt?
    A couple of months ago I had a client who didn't show up in any search results even though the site had been online for more than a year.
    We investigated and found out that the client had no robots.txt on his server. Google mentions (sorry, I cannot find the source right now) that it will not index a page if there is no robots file.
    I think it is important to know this. It would be cool if there were a feature in the export dialog (a checkbox "create robots.txt", and maybe a settings panel: follow, nofollow, excluded directories...).
    Regards
    Andreas

    Here's one example of the text Google is posting:
    http://webcache.googleusercontent.com/search?rlz=1T4GGLR_enUS261US323&hl=en&q=cache:SSb_hvtcb_EJ:http://www.inmamaskitchen.com/RECIPES/RECIPES/poultry/chicken_cuban.html+cuban+chicken+with+okra&ct=clnk (Robots.txt File, May 31, 2011)
    http://webcache.googleusercontent.com/search?q=cache:yJThMXEy-ZIJ:www.inmamaskitchen.com/Nutrition/ (Robots.txt File, May 31, 2011)
    Then there are things relating to Facebook????
    http://www.facebook.com/plugins/like.php?channel_url=http%3A%2F%2Fwww.inmamaskitchen.com%2FNutrition%2FBlueberries.html%3Ffb_xd_fragment%23%3F%3D%26cb%3Df2bfa6d78d5ebc8%26relation%3Dparent.parent%26transport%3Dfragment&href=http%3A%2F%2Fwww.facebook.com%2Fritzcrackers%3Fsk%3Dapp_205395202823189&layout=standard&locale=en_US&node_type=1&sdk=joey&send=false&show_faces=false&width=225
    THANK YOU!
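    For the record, a robots.txt that allows everything to be crawled (you can upload one manually, since Muse does not write it for you) is just:
    User-agent: *
    Disallow:
    An empty Disallow line means nothing is blocked.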

  • Robots.txt

    Hi,
    Has anyone created a robots.txt file for an external plumtree portal??
    The company I work for is currently using PT 4.5 SP2 and I'm just wondering what directories I should disallow to prevent spiders etc. from crawling certain parts of the web site. This will help improve search results on search engines.
    See http://support.microsoft.com/default.aspx?scid=kb;en-us;217103

    The robots.txt file lives at the root level of the server where your web pages are. What is the URL of your website?

  • Problems with robots.txt Disallow

    Hi
    I have a problem with the robots.txt and google.
    I have this robots.txt file:
    User-agent: *
    Disallow: page1.html
    Disallow: dir_1/sub_dir_1/
    Disallow: /data/
    When I enter 'site:www.MySite.com' into the Google search box, Google returns content from the 'data' directory as well. Google should not have indexed the content of the data directory.
    So why is Google returning results from the 'data' directory even though I have disallowed it?
    How can I restrict everyone from accessing the data
    directory?
    Thanks

    I found a workaround. To have the sitemap URL linked to the pub page, the pub page needs to be in the Internet zone. If you need the sitemap URL linked to the real internet address (e.g. www.company.example.com), you need to put the auth page in the default zone and the pub page in the intranet zone, and create an AAM entry for http://company.example.com in the internet zone.
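    For what it's worth, Disallow paths should start with a slash, so a corrected sketch of that file would be:
    User-agent: *
    Disallow: /page1.html
    Disallow: /dir_1/sub_dir_1/
    Disallow: /data/
    Also note that robots.txt only asks compliant crawlers not to crawl; it does not remove pages that are already indexed, and it does not restrict access to the directory.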

  • I need autocomplete  for search for words in a txt. file

    I am not so good at Java.
    I have running code that searches text in a .txt file (from user bluefox815).
    But I need a solution with autocomplete for searching for words in a .txt file.
    test_file.txt (part of the contents):
    Roboter robots
    Mechatronik mechatronics
    and so on
    Can you help me, please?
    Here is the code:
    import javax.swing.*;
    import java.awt.*;
    import java.awt.event.*;
    import java.io.*;

    /*
     * this program searches for a string in a text file and
     * says which line it found the string on
     */
    public class SearchText implements ActionListener {

        private String filename = "test_file.txt";
        private JFrame frame;
        private JTextField searchField;
        private JButton searchButton;
        private JLabel lineLabel;
        private String searchFor;
        private BufferedReader in;

        public SearchText() {
            frame = new JFrame("SearchText");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            searchField = new JTextField(80);
            searchButton = new JButton("Search");
            // this is used later in our actionPerformed method
            searchButton.setActionCommand("search");
            // this sets the action listener for searchButton, which is the current class
            // because this class implements ActionListener
            searchButton.addActionListener(this);
            lineLabel = new JLabel("nach dem Fachbegriff suchen");
        }

        public void createGUI() {
            JPanel topPanel = new JPanel();
            topPanel.setLayout(new BoxLayout(topPanel, BoxLayout.X_AXIS));
            JPanel bottomPanel = new JPanel();
            JPanel mainPanel = new JPanel();
            mainPanel.setLayout(new BoxLayout(mainPanel, BoxLayout.Y_AXIS));
            topPanel.add(searchField);
            topPanel.add(searchButton);
            bottomPanel.add(lineLabel);
            mainPanel.add(topPanel);
            mainPanel.add(bottomPanel);
            frame.getContentPane().add(mainPanel);
            frame.pack();
            frame.setVisible(true);
        }

        public void actionPerformed(ActionEvent e) {
            // now we get the action command and if it is search, then it is the button
            if ("search".equals(e.getActionCommand())) {
                searchFor = searchField.getText();
                searchTheText();
            }
        }

        private void searchTheText() {
            // the buffered reader is initialized here so that every search
            // starts at the beginning of the file, instead of where the last search left off
            try {
                in = new BufferedReader(new FileReader(new File(filename)));
            } catch (IOException e) {
                return; // give up if the file cannot be opened
            }
            String lineContent = null;
            int currentLine = 0;
            // this will be set to true if the string was found
            boolean foundString = false;
            while (true) {
                currentLine++;
                // get a line of text from the file
                try {
                    lineContent = in.readLine();
                } catch (IOException e) {
                    break;
                }
                // check whether the file ended (in.readLine() returns null if the end is reached)
                if (lineContent == null) {
                    break;
                }
                if (lineContent.indexOf(searchFor) == -1) {
                    continue;
                } else {
                    lineLabel.setText(String.valueOf(lineContent));
                    foundString = true;
                    break;
                }
            }
            if (!foundString) {
                lineLabel.setText("Es kann kein Fachbegriff gefunden werden.");
            }
            try {
                in.close();
            } catch (IOException ioe) {
                // nothing useful to do if closing fails
            }
        }

        public static void main(String[] args) {
            SwingUtilities.invokeLater(new Runnable() {
                public void run() {
                    new SearchText().createGUI();
                }
            });
        }
    }

    Markus1 wrote:
    But I need a solution with autocomplete for search for words in a txt. file.
    What is your question? What have you tried so far? What are you having difficulty with?
    Mel
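    Since the thread never shows one, here is a minimal autocomplete sketch (an assumption about what is wanted: suggest every line of test_file.txt that starts with the text typed so far; wiring it into the Swing text field, e.g. via a DocumentListener, is left out to keep it short):
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.*;
    import java.util.*;

    public class AutoCompleteSketch {

        private final List<String> entries;

        public AutoCompleteSketch(String filename) throws IOException {
            // load the whole dictionary file once
            entries = Files.readAllLines(Paths.get(filename), StandardCharsets.UTF_8);
        }

        // returns all dictionary lines that start with the given prefix (case-insensitive)
        public List<String> suggest(String prefix) {
            String p = prefix.toLowerCase();
            List<String> matches = new ArrayList<>();
            for (String line : entries) {
                if (line.toLowerCase().startsWith(p)) {
                    matches.add(line);
                }
            }
            return matches;
        }

        public static void main(String[] args) throws IOException {
            AutoCompleteSketch ac = new AutoCompleteSketch("test_file.txt");
            System.out.println(ac.suggest("Robo")); // e.g. [Roboter robots]
        }
    }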

  • How to check on-thy-fly dynamic txt-files

    hello.
    I want to check a dynamic txt file (dynamic because it is a running log from a chat) for specific words.
    The program should start checking the incoming content at the current end of the file (previous content is uninteresting in this case) and shouldn't stop doing that (unless I want it to).
    My problem is that the program stops after it reaches the end of the file.
    Any idea?
    Many thanks.
    try {
        String zeile;
        // we read from "C:/Log.txt"
        File eingabeDatei = new File(new URI("file:///C:/Log.txt"));
        FileReader eingabeStrom = new FileReader(eingabeDatei);
        BufferedReader eingabe = new BufferedReader(eingabeStrom);
        while ((zeile = eingabe.readLine()) != null) {
            Pattern p = Pattern.compile("TESTWORD");
            // pattern: digit, at least one letter, digit
            Matcher m = p.matcher(zeile);
            if (m.find()) {
                Robot rob = new Robot();
                rob.keyPress('1');
                System.out.println(m.group());
                System.out.println("Pattern at Pos. " + m.start());
                System.out.println("Pattern is: " + m.group());
            } else {
                System.out.println("***No pattern found***");
            }
            //System.out.println(zeile);
        }
    } catch (Exception e) { // new URI(...) and new Robot() also throw checked exceptions
        e.printStackTrace();
    }

    hello.
    i want to check out a dynamic txt-file (dynamic, because it is a running log from a chat) for specific words. the program should start to check the incoming content at the actual end of the file (previous content is in this case uninteresting) and shouldn't stop doing that (unless i want it). my problem is, that the program stops after it reaches the end of the file. any idea? many thx.
    Please use the code tag next time you post code. Thanks.
    The line
         while ((zeile = eingabe.readLine()) != null)
    is why your app quits: the while loop ends on null, which happens when there is nothing left to read from the reader.
    From the API:
    Returns:
    A String containing the contents of the line, not including any line-termination characters, or null if the end of the stream has been reached.
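    The usual fix (a minimal sketch, not code from this thread; the file name, keyword, and poll interval are placeholders) is to treat null from readLine() as "end of file for now", wait briefly, and try again instead of leaving the loop:
    import java.io.*;

    public class FollowLog {
        public static void main(String[] args) throws Exception {
            BufferedReader in = new BufferedReader(new FileReader("C:/Log.txt"));

            // skip everything already in the file; only newly appended lines are interesting
            while (in.readLine() != null) { /* discard existing content */ }

            while (true) {
                String zeile = in.readLine();
                if (zeile == null) {
                    Thread.sleep(500); // end of file reached for now - wait for new lines
                    continue;
                }
                if (zeile.contains("TESTWORD")) {
                    System.out.println("Pattern found: " + zeile);
                }
            }
        }
    }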

  • Error 404 - /_vti_bin/owssvr.dll  and robots.txt

    Hi
    My webstats tell me that I have had various Error 404s and
    this is because of files being "required but not found":
    specifically /_vti_bin/owssvr.dll and robots.txt.
    Can someone tell me what these are?
    Also, there are various other status code pages coming up
    such as
    302 Moved temporarily (redirect) 6 27.2 % 2.79 KB
    401 Unauthorized 5 22.7 % 9.32 KB
    403 Forbidden 3 13.6 % 5.06 KB
    206 Partial Content
    Why are these arising and how can I rid myself of them?
    Many thanks : )

    Example of an HttpModule that hooks PreRequestHandlerExecute and returns early when the request is for owssvr.dll:
    // requires: using System; using System.Web; using System.Web.SessionState;
    class MyHttpModule : IHttpModule, IRequiresSessionState
    {
        public void Init(HttpApplication context)
        {
            context.PreRequestHandlerExecute += new EventHandler(context_PreRequestHandlerExecute);
        }

        void context_PreRequestHandlerExecute(object sender, EventArgs e)
        {
            HttpApplication app = (HttpApplication)sender; // the original snippet used "app" without declaring it
            if (app.Context.Request.Url.AbsolutePath.ToLower().Contains("owssvr.dll"))
                return;
        }

        public void Dispose() { }
    }

  • [solved]Wget: ignore "disallow wget" +comply to the rest of robots.txt

    Hello!
    I need to wget a few (maybe 20 -.- ) html files that are linked on one html page (same domain) recursively, but the robots.txt there disallows wget. Now I could just ignore the robots.txt... but then my wget would also ignore the info on forbidden links to dynamic sites which are forbidden in the very same robots.txt for good reasons. And I don't want my wget pressing random buttons on that site. Which is what the robots.txt is for. But I can't use the robots.txt with wget.
    Any hints on how to do this (with wget)?
    Last edited by whoops (2014-02-23 17:52:31)

    HalosGhost wrote:Have you tried using it? Or, is there a specific reason you must use wget?
    Only stubborness
    Stupid website -.- what do they even think they achieve by disallowing wget? I should just use the ignore option and let wget "click" on every single button in their php interface. But nooo, instead I waste time trying to figure out a way to exclude those GUI links from being followed even though wget would be perfectly set up to comply to that automatically if it weren't for that one entry to "ban" it. *grml*
    Will definitely try curl next time though - thanks for the suggestion!
    And now, I present...
    THE ULTIMATE SOLUTION**:
    sudo sed -i 's/wget/wgot/' /usr/bin/wget
    YAY.
    ./solved!
    ** stubborn version.
    Last edited by whoops (2014-02-23 17:51:19)

  • Is ROBOT.TXT supported

    The Robots Exclusion Protocol uses the robots.txt configuration file to give
    instructions to Web crawlers (robots) about how to index your pages.
    Similar functionality is available through specific META tags in HTML
    documents.
    Is robots.txt supported with WLS?
    Bernard DEVILLE

