Robots.txt -- how do I do this?

I'm not using iWeb, unfortunately, but I wanted to protect part of a site I've set up. How do I set up a hidden directory under my domain name? I need it to be invisible except to people who have been notified of its existence. I was told, "In order to make it invisible you would need to not have any links associated with it on your site, make sure you have altered a robots.txt file in your /var/www/html directory so bots cannot spider it. A way to avoid spiders crawling certain directories is to place a robots.txt file in your web root directory that has parameters on which files or folders you do not want indexed."
But, how do I get/find/alter this robots.txt file? I unfortunately don't know how to do this sort (hardly any sort) of programming. Thank you so much.

Muse does not generate a robots.txt file.
If your site has one, it was generated by your hosting provider or by some other admin on your website. If you'd like Google or other 'robots' to crawl your site, you'll need to edit this file or delete it.
Also note that you can set your page description in Muse using the Page Properties dialog, but it won't show up immediately in Google search results: you have to wait until Google crawls your site and updates its index, which might take several days. You can request that Google crawl it sooner, though:
https://support.google.com/webmasters/answer/1352276?hl=en
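For reference, a minimal robots.txt along the lines of the advice quoted above might look like this ("private-stuff" is a placeholder for the directory you want hidden). Keep in mind that robots.txt only asks well-behaved crawlers to stay away, and the file itself is publicly readable, so it is not real access control:
# robots.txt -- placed in the web root, e.g. /var/www/html/robots.txt
User-agent: *
Disallow: /private-stuff/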

Similar Messages

  • We have 2 iPhones which recently, perhaps due to a software upgrade, started receiving each other's txts. How do we stop this?

    We have 2 iPhones with different phone numbers which recently started receiving each other's txts. How do we stop this?

    Hello sealecliff,
    This is typically caused by both devices sharing an Apple ID for iMessage and FaceTime.  I would recommend signing out of iMessage on both devices.  Go to Settings > Messages > Send & Receive, tap your Apple ID, and select Sign Out. Then sign in with the desired Apple ID.
    iOS: Troubleshooting FaceTime and iMessage activation
    http://support.apple.com/kb/TS4268
    Cheers,
    Allen

  • Does anyone know why my iPhone keeps switching all of the alert tones and ringtones assigned to contacts? I restored it and it still keeps doing it, and now whenever I get a txt or a phone call the tones just sound terrible. How do I fix this?

    Your only chance is setting it up as a new device without using the latest backup afterwards, which you have already done.
    If this does not work, you should get it serviced:
    Apple - Support - Service Answer Center
    How to back up your data and set up as a new device

  • After the latest update my phone shows I have a new TXT Message and there are NO TXT messages. How do I fix this?

    After the latest update my phone shows I have a new TXT message, but there are NO TXT messages. How do I fix this? I have tried clearing the cache, I have tried a factory reset, I have no 3rd-party messaging apps, and I have done nothing different with my phone over the last 6 months.

    Hi Johnnybdamned!
    I regret to learn of any issues after the software update. Let's check some things out on your phone! Does the text message say who it came from? Is it a contact you message on a regular basis? Are you using the messaging application that came with the device, or a 3rd party application?
    I recommend deleting any unneeded messaging threads and then powering the device off/on. Keep me posted!
    ChristinaB_VZW
    VZW Support
    Follow us on Twitter @VZWSupport

  • I'm having problems with the 7.1 update: my flash now "flashes" when I receive txts and notifications! And I'm also having problems with freezing and Wi-Fi! How do I solve this?

    Doh! Rectified the flash!
    But when FaceTiming, the Wi-Fi disconnects 2 seconds after it connects? Any thoughts?

  • I have a new iPhone 6 and I am not getting iMessages, only txt messages.  How do I change this to receive iMessages, too?  My friends are sending me iMessages that I am not receiving.

    If you didn't have iMessage on and now do, it sometimes takes a while to get synced up.

  • Robots.txt and duplicate content: I need help

    Hello guys, I'm new to BC and I have 2 questions.
    1. My start page is available as xxxx.de, xxxx.de/index.html, and xxx.de/index.aspx. How can I fix this duplicate content?
    2. Where do I have to upload the robots.txt?
    Thanks

    As long as you do not link to the other versions and stay consistent, you do not need to worry about your start page.

  • Error 404 - /_vti_bin/owssvr.dll  and robots.txt

    Hi
    My webstats tell me that I have had various Error 404s, and this is because of files being "required but not found": specifically /_vti_bin/owssvr.dll and robots.txt.
    Can someone tell me what these are?
    Also, various other status codes are coming up, such as:
    302 Moved temporarily (redirect)   6   27.2 %   2.79 KB
    401 Unauthorized                   5   22.7 %   9.32 KB
    403 Forbidden                      3   13.6 %   5.06 KB
    206 Partial Content
    Why are these arising and how can I rid myself of them?
    Many thanks : )

    Example of an HttpModule that uses PreRequestHandlerExecute and returns early when it encounters owssvr.dll:
    using System;
    using System.Web;
    using System.Web.SessionState;

    class MyHttpModule : IHttpModule, IRequiresSessionState
    {
        public void Init(HttpApplication context)
        {
            context.PreRequestHandlerExecute += new EventHandler(context_PreRequestHandlerExecute);
        }
        void context_PreRequestHandlerExecute(object sender, EventArgs e)
        {
            // The sender is the HttpApplication that raised the event.
            HttpApplication app = (HttpApplication)sender;
            // Skip any further handling when the request targets owssvr.dll.
            if (app.Context.Request.Url.AbsolutePath.ToLower().Contains("owssvr.dll"))
                return;
        }
        public void Dispose() { }
    }
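    Such a module would then be registered in web.config; a minimal sketch for the classic ASP.NET pipeline (IIS 7+ integrated mode uses the system.webServer/modules section instead):
    <configuration>
      <system.web>
        <httpModules>
          <!-- wire MyHttpModule into every request -->
          <add name="MyHttpModule" type="MyHttpModule"/>
        </httpModules>
      </system.web>
    </configuration>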

  • Question about robots.txt

    This isn't something I've usually bothered with, as I always thought you didn't really need one unless you wanted to disallow access to pages / folders on a site.
    However, a client has been reading up on SEO and mentioned that some analytics thing (possibly Google) was reporting that "one came back that the robot.txt file was invalid or missing. I understand this can stop the search engines linking in to the site".
    So I had a rummage, and uploaded what I thought was a standard enough robots.txt file:
    # robots.txt
    User-agent: *
    Disallow:
    Disallow: /cgi-bin/
    But apparently this is reporting:
    The following block of code contains some errors. You specified both a generic path ("/" or empty disallow) and specific paths for this block of code; this could be misinterpreted. Please, remove all the reported errors and check again this robots.txt file.
    Line 1
    # robots.txt
    Line 2
    User-agent: *
    Line 3
    Disallow:
    You specified both a generic path ("/" or empty disallow) and specific paths for this block of code; this could be misinterpreted.
    Line 4
    Disallow: /cgi-bin/
    You specified both a generic path ("/" or empty disallow) and specific paths for this block of code; this could be misinterpreted.
    If anyone could set me straight on what a standard / default robots.txt file should look like, that would be much appreciated.
    Thanks.

    Remove the blank disallow line so it looks like this:
    User-agent: *
    Disallow: /cgi-bin/
    E. Michael Brandt
    www.divahtml.com
    www.divahtml.com/products/scripts_dreamweaver_extensions.php
    Standards-compliant scripts and Dreamweaver Extensions
    www.valleywebdesigns.com/vwd_Vdw.asp
    JustSo PictureWindow
    JustSo PhotoAlbum, et alia

  • [solved] Wget: ignore "disallow wget" + comply with the rest of robots.txt

    Hello!
    I need to wget a few (maybe 20 -.- ) HTML files that are linked on one HTML page (same domain) recursively, but the robots.txt there disallows wget. Now I could just ignore the robots.txt... but then my wget would also ignore the rules for the forbidden links to dynamic pages, which are disallowed in the very same robots.txt for good reason. And I don't want my wget pressing random buttons on that site, which is what the robots.txt is for. But I can't use the robots.txt with wget.
    Any hints on how to do this (with wget)?
    Last edited by whoops (2014-02-23 17:52:31)

    HalosGhost wrote: Have you tried using it? Or is there a specific reason you must use wget?
    Only stubbornness.
    Stupid website -.- what do they even think they achieve by disallowing wget? I should just use the ignore option and let wget "click" on every single button in their PHP interface. But nooo, instead I waste time trying to figure out a way to exclude those GUI links from being followed, even though wget would be perfectly set up to comply with that automatically if it weren't for that one entry that "bans" it. *grml*
    Will definitely try curl next time though - thanks for the suggestion!
    And now, I present...
    THE ULTIMATE SOLUTION**:
    sudo sed -i 's/wget/wgot/' /usr/bin/wget
    YAY.
    ./solved!
    ** stubborn version.
    Last edited by whoops (2014-02-23 17:51:19)
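    For context, a sketch of the kind of robots.txt the poster describes (the paths are made up): a robot is supposed to obey the most specific User-agent group matching its name and ignore the rest, so a group naming wget overrides the general rules. Renaming the string inside the binary with the sed one-liner makes wget fall through to the * group, which still keeps it away from the dynamic pages:
    # hypothetical robots.txt on the target site
    User-agent: wget
    Disallow: /            # bans wget outright

    User-agent: *
    Disallow: /buttons/    # the dynamic GUI links no bot should follow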

  • Web Repository Manager and robots.txt

    Hello,
    I would like to search an intranet site and therefore set up a crawler according to the guide "How to set up a Web Repository and Crawl It for Indexing".
    Everything works fine.
    Now this web site uses a robots.txt as follows:
    User-agent: googlebot
    Disallow: /folder_a/folder_b/
    User-agent: *
    Disallow: /
    So obviously, only google is allowed to crawl (parts of) that web site.
    My question: If I'd like to add the TRex crawler to the robots.txt what's the name of the "User-agent" I have to specify here?
    Maybe the name I defined in the SystemConfiguration > ... > Global Services > Crawler Parameters > Index Management Crawler?
    Thanks in advance,
    Stefan

    Hi Stefan,
    I'm sorry, but this is hard-coded. I found it in the class com.sapportals.wcm.repository.manager.web.cache.WebCache:
    private HttpRequest createRequest(IResourceContext context, IUriReference ref)
    {
        HttpRequest request = new HttpRequest(ref);
        String userAgent = "SAP-KM/WebRepository 1.2";
        if (sessionWatcher != null)
        {
            String ua = sessionWatcher.getUserAgent();
            if (ua != null)
                userAgent = ua;
        }
        request.setHeader("User-Agent", userAgent);
        Locale locale = context.getLocale();
        if (locale != null)
            request.setHeader("Accept-Language", locale.getLanguage());
        return request;
    }
    So you would have to recompile the component or change the filter... I would prefer to change the robots.txt.
    hope this helps,
    Axel
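    Going by the hard-coded string above, the addition to the robots.txt would presumably look like the following (assuming the crawler matches its user-agent string, or a prefix of it, the way other robots do):
    User-agent: SAP-KM/WebRepository 1.2
    Disallow:

    User-agent: googlebot
    Disallow: /folder_a/folder_b/

    User-agent: *
    Disallow: /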

  • URLs disallowed by robots.txt still appear in Google search results

    Can you expand on your problem? Are you being indexed despite not wanting to be indexed?
    You are almost certainly in the wrong forum as this relates to SharePoint search, not how Google indexes your content.
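    For what it's worth, this behavior is normal for robots.txt: a disallowed URL can still show up in Google's results (usually as a bare URL or title) when other pages link to it, because Google is never allowed to fetch the page and see a noindex signal. To keep a page out of the index it has to be crawlable and carry a robots meta tag, for example:
    <!-- in the page head: ask all crawlers not to index this page -->
    <meta name="robots" content="noindex">
    (or the equivalent X-Robots-Tag HTTP header).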

  • Placement of robots.txt file

    Hi all,
    I want to disallow search robots from indexing certain directories on a MacOS X Server.
    Where would I put the robots.txt file?
    According to the web robots pages at http://www.robotstxt.org/wc/exclusion-admin.html it needs to go in the "top-level of your URL space", which depends on the server and software configuration.
    Quote: "So, you need to provide the "/robots.txt" in the top-level of your URL space. How to do this depends on your particular server software and configuration."
    Quote: "For most servers it means creating a file in your top-level server directory. On a UNIX machine this might be /usr/local/etc/httpd/htdocs/robots.txt".
    On a MacOS X Server would the robots.txt go into the "Library" or "WebServer" directory or somewhere else?
    Thanxx
    monica
    G5   Mac OS X (10.4.8)  

    The default document root for Apache is /Library/WebServer/Documents so your robots.txt file should be at /Library/WebServer/Documents/robots.txt
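    A quick way to sanity-check the placement from Terminal (assuming Apache is running with that default document root):
    # create or edit the file
    sudo pico /Library/WebServer/Documents/robots.txt
    # confirm the server actually serves it at the top of the URL space
    curl http://localhost/robots.txt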

  • Problems with robots.txt Disallow

    Hi
    I have a problem with the robots.txt and google.
    I have this robots.txt file:
    User-agent: *
    Disallow: page1.html
    Disallow: dir_1/sub_dir_1/
    Disallow: /data/
    When I enter 'site:www.MySite.com' into the Google search box, Google returns content from the 'data' directory as well. Google should not have indexed the content of the data directory.
    So why is Google getting results from the 'data' directory when I have disallowed it?
    How can I restrict everyone from accessing the data directory?
    Thanks

    I found a workaround. To have the sitemap URL linked to the pub page, the pub page needs to be in the Internet zone. If you need the sitemap URL linked to the real internet address (e.g. www.company.example.com), you need to put the auth page in the default zone and the pub page in the intranet zone, and create an AAM http://company.example.com in the internet zone.
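    A side note on the file as posted: Disallow paths must start with a slash, so the first two rules as written probably match nothing. A corrected sketch:
    User-agent: *
    Disallow: /page1.html
    Disallow: /dir_1/sub_dir_1/
    Disallow: /data/
    Also, robots.txt only keeps compliant crawlers from fetching pages; to actually restrict everyone from accessing the data directory you would need server-side protection such as HTTP authentication.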

  • I just installed Yahoo Messenger. Every time I do that, my extensions go missing. How do you fix this instead of having to keep reinstalling Firefox?

    hello, this sounds like the files in your [[Profiles|profile folder]] that store the information about your extensions might have become corrupted. please go to ''firefox > help > troubleshooting information > profile - show folder...'' then windows explorer should open up your profile folder. in there look for the files named extensions.ini, extensions.sqlite, extensions.sqlite-journal (and in some cases also extensions.txt, extensions.rdf). delete or rename those files and quit firefox, they will be regenerated the next time you launch firefox.
    also see [https://support.mozilla.org/en-US/kb/Unable%20to%20install%20add-ons#w_corrupt-extension-files Corrupt extension files]
