Placement of robots.txt file

Hi all,
I want to disallow search robots from indexing certain directories on Mac OS X Server.
Where would I put the robots.txt file?
According to the web robots pages at http://www.robotstxt.org/wc/exclusion-admin.html it needs to go in the "top-level of your URL space", which depends on the server and software configuration.
Quote: "So, you need to provide the "/robots.txt" in the top-level of your URL space. How to do this depends on your particular server software and configuration."
Quote: "For most servers it means creating a file in your top-level server directory. On a UNIX machine this might be /usr/local/etc/httpd/htdocs/robots.txt".
On Mac OS X Server, would the robots.txt go into the "Library" or "WebServer" directory, or somewhere else?
Thanxx
monica
G5   Mac OS X (10.4.8)  

The default document root for the built-in Apache web server is /Library/WebServer/Documents, so your robots.txt file should be saved as /Library/WebServer/Documents/robots.txt.
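For example, a minimal robots.txt saved at that path, assuming the directories you want to keep robots out of are hypothetical ones named /private/ and /drafts/ under the document root, could look like this:

User-agent: *
Disallow: /private/
Disallow: /drafts/

Once the file is in place it should be reachable in a browser at http://your.server.example/robots.txt (the hostname here is just a placeholder), which is the URL crawlers actually request.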

Similar Messages

  • Robots.txt file

    How do I create a robots.txt file for my Muse site?

    You can follow the guidelines from Google to create a robots.txt file and place it at the root of your remote site.
    https://support.google.com/webmasters/answer/156449?hl=en
    Thanks,
    Vinayak
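    If all you need is a valid file in place, the simplest robots.txt, which allows everything to be crawled, is just the two lines below; anything more specific can follow the Google guidelines linked above:
    User-agent: *
    Disallow: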

  • Robots.txt -- how do I do this?

    I'm not using iWeb, unfortunately, but I wanted to protect part of a site I've set up. How do I set up a hidden directory under my domain name? I need it to be invisible except to people who have been notified of its existence. I was told, "In order to make it invisible you would need to not have any links associated with it on your site, make sure you have altered a robots.txt file in your /var/www/html directory so bots cannot spider it. A way to avoid spiders crawling certain directories is to place a robots.txt file in your web root directory that has parameters on which files or folders you do not want indexed."
    But, how do I get/find/alter this robots.txt file? I unfortunately don't know how to do this sort (hardly any sort) of programming. Thank you so much.

    Muse does not generate a robots.txt file.
    If your site has one, it's been generated by your hosting provider or some other admin on your website. If you'd like Google or other 'robots' to crawl your site, you'll need to edit this file or delete it.
    Also note that you can set your page description in Muse using the page properties dialog, but it won't show up immediately in Google search results; you have to wait until Google crawls your site and updates its index, which might take several days. You can request Google to crawl it sooner, though:
    https://support.google.com/webmasters/answer/1352276?hl=en
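    As a sketch of the robots.txt part of the advice above, a file that asks crawlers to skip a hypothetical directory named /hidden/ would be:
    User-agent: *
    Disallow: /hidden/
    Keep in mind that robots.txt is only advisory and is itself publicly readable, so it reveals the directory name rather than truly hiding it; real protection needs a password on the directory.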

  • Use of robots.txt to disallow system/secure domain names?

    I've got a client whose system and secure domains are ranking very high on Google. My SEO advisor has mentioned that a key way to eliminate these URLs from Google is to disallow the content through robots.txt. Given BC's unique way of dealing with system and secure domains, I'm not sure this is even possible, as any disallow rules I've seen or used before have been for directories and not absolute URLs, nor have I seen any mention of this possibility around. Any help or advice would be great!

    Hi Mike
    Under Site Manager > Pages, when accessing a specific page, you can open the SEO Metadata section and tick “Hide this page for search engines”.
    Aside from this, using the robots.txt file is indeed an efficient way of instructing search engine robots which pages are not to be indexed.
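    One detail worth noting, since the system and secure domains are separate hostnames: crawlers request robots.txt per hostname, so the file has to be served from that domain's own root (for example https://secure.example.com/robots.txt, a hypothetical hostname) and can then disallow the whole host:
    User-agent: *
    Disallow: /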

  • Robots.txt and Host Named Site Collections (SEO)

    When attempting to exclude ALL SharePoint sites from external indexing, when you have multiple web apps and multiple Host Named Site Collections, should I add the robots.txt file to the root of each web app as well as each HNSC? I assume so, but thought I would check with the gurus...
    - Rick

    I think one for each site collection, as each site collection has a different host name and is treated as a separate web site.
    "The location of robots.txt is very important. It must be in the main directory because otherwise user agents (search engines) will not be able to find it. Search engines look first in the main directory (i.e. http://www.sitename.com/robots.txt) and if they don't find it there, they simply assume that this site does not have a robots.txt file"
    http://www.slideshare.net/ahmedmadany/block-searchenginesfromindexingyourshare-pointsite
    Please remember to mark your question as answered and vote helpful if this solves your problem.
    Thanks -WS MCITP (SharePoint 2010, 2013) Blog: http://wscheema.com/blog
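    For illustration, with two hypothetical host named site collections the crawlers request each file separately, so each host needs its own copy at its own root:
    http://teams.contoso.example/robots.txt
    http://projects.contoso.example/robots.txt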

  • Question about robots.txt

    This isn't something I've usually bothered with, as I always thought you didn't really need one unless you wanted to disallow access to pages / folders on a site.
    However, a client has been reading up on SEO and mentioned that some analytics thing (possibly Google) was reporting that "one came back that the robot.txt file was invalid or missing. I understand this can stop the search engines linking in to the site".
    So I had a rummage, and uploaded what I thought was a standard enough robots.txt file:
    # robots.txt
    User-agent: *
    Disallow:
    Disallow: /cgi-bin/
    But apparently this is reporting :
    The following block of code contains some errors. You specified both a generic path ("/" or empty disallow) and specific paths for this block of code; this could be misinterpreted. Please, remove all the reported errors and check again this robots.txt file.
    Line 1
    # robots.txt
    Line 2
    User-agent: *
    Line 3
    Disallow:
    You specified both a generic path ("/" or empty disallow) and specific paths for this block of code; this could be misinterpreted.
    Line 4
    Disallow: /cgi-bin/
    You specified both a generic path ("/" or empty disallow) and specific paths for this block of code; this could be misinterpreted.
    If anyone could set me straight on what a standard / default robots.txt file should look like, that would be much appreciated.
    Thanks.

    Remove the blank Disallow line (an empty Disallow: means "allow everything", which is why the validator flags mixing it with a specific path in the same block) so it looks like this:
    User-agent: *
    Disallow: /cgi-bin/
    E. Michael Brandt
    www.divahtml.com
    www.divahtml.com/products/scripts_dreamweaver_extensions.php
    Standards-compliant scripts and Dreamweaver Extensions
    www.valleywebdesigns.com/vwd_Vdw.asp
    JustSo PictureWindow
    JustSo PhotoAlbum, et alia

  • Robots.txt question?

    I am kind of new to web hosting, but learning.
    I am hosting with Just Host and I have a couple of sites (addons). I am trying to publish my main site now and there is a whole bunch of stuff in the site root folder that I have no idea about. I don't want to delete anything, and I am probably not going to lol. But should I block a lot of the stuff in there in my robots.txt file?
    Here is some of the stuff in there:
    .htaccess
    404.shtml
    cgi-bin
    css
    img
    index.php
    justhost.swf
    sifr-addons.js
    sIFR-print.cs
    sIFR-screen.css
    sifr.js
    Should I just disallow all of this stuff in my robots.txt? Any recommendations would be appreciated. Thanks

    Seaside333 wrote:
    public_html for the main site, the other addons are public_html/othersitesname.com
    is this good?
    thanks for quick response
    Probably don't need the following files unless you're using text image-replacement techniques: sifr-addons.js, sIFR-print.cs, sIFR-screen.css, sifr.js.
    Good to keep .htaccess (you can insert special instructions in this file), 404.shtml (if a page can't be found on your remote server it goes to this page) and cgi-bin (some processing scripts are placed in this folder).
    Probably you will have your own 'css' folder. The 'img' folder is not needed. 'index.php' is the homepage of the site and what the browser looks for initially; you can replace it with your own homepage.
    You don't need justhost.swf.
    Download the files/folders to your local machine and keep them in case you need them.
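    If you do decide to keep crawlers out of the helper files, a minimal sketch that blocks only the cgi-bin folder would be the two lines below; blocking CSS and JavaScript files is generally not recommended, since search engines use them to render your pages:
    User-agent: *
    Disallow: /cgi-bin/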

  • No robots.txt?

    Hello,
    just a short question: Why does Muse not create a robots.txt?
    A couple of months ago I had a client who didn't show up in any search results even though the site had been online for more than a year.
    We investigated and found out that the client had no robots.txt on his server. Google mentions (sorry, I cannot find the source right now) that it will not index a page if there is no robots file.
    I think it is important to know this. It would be cool if there were a feature in the export dialog (a "create robots.txt" checkbox) and maybe a settings panel (follow, nofollow, no directories...).
    Regards
    Andreas

    Here's one example of the text Google is posting:
    http://webcache.googleusercontent.com/search?rlz=1T4GGLR_enUS261US323&hl=en&q=cache:SSb_hvtcb_EJ:http://www.inmamaskitchen.com/RECIPES/RECIPES/poultry/chicken_cuban.html+cuban+chicken+with+okra&ct=clnk (Robots.txt File, May 31, 2011)
    http://webcache.googleusercontent.com/search?q=cache:yJThMXEy-ZIJ:www.inmamaskitchen.com/Nutrition/ (Robots.txt File, May 31, 2011)
    Then there are things relating to Facebook????
    http://www.facebook.com/plugins/like.php?channel_url=http%3A%2F%2Fwww.inmamaskitchen.com%2FNutrition%2FBlueberries.html%3Ffb_xd_fragment%23%3F%3D%26cb%3Df2bfa6d78d5ebc8%26relation%3Dparent.parent%26transport%3Dfragment&href=http%3A%2F%2Fwww.facebook.com%2Fritzcrackers%3Fsk%3Dapp_205395202823189&layout=standard&locale=en_US&node_type=1&sdk=joey&send=false&show_faces=false&width=225
    THANK YOU!

  • Robots.txt

    Hi,
    Has anyone created a robots.txt file for an external Plumtree portal?
    The company I work for is currently using PT 4.5 SP2 and I'm just wondering what directories I should disallow to prevent spiders etc. from crawling certain parts of the web site. This will help improve search results on search engines.
    See http://support.microsoft.com/default.aspx?scid=kb;en-us;217103

    The robots.txt file lives at the root level of the server where your web pages are. What is the URL of your website?

  • Problems with robots.txt Disallow

    Hi
    I have a problem with the robots.txt and google.
    I have this robots.txt file:
    User-agent: *
    Disallow: page1.html
    Disallow: dir_1/sub_dir_1/
    Disallow: /data/
    When I enter 'site:www.MySite.com' into the Google search box, Google returns content from the 'data' directory as well. Google should not have indexed the content of the data directory.
    So why is Google returning results from the 'data' directory when I have disallowed it?
    How can I restrict everyone from accessing the data
    directory?
    Thanks

    I found a workaround. To have the sitemap URL linked to the pub page, the pub page needs to be in the Internet zone. If you need the sitemap URL linked to the real internet address (e.g. www.company.example.com), you need to put the auth page in the default zone, the pub page in the intranet zone, and create an AAM entry for http://company.example.com in the internet zone.
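    As a side note on the file posted in the question (separate from the reply above): Disallow values are matched as URL path prefixes, so they should start with "/"; as written, "Disallow: page1.html" and "Disallow: dir_1/sub_dir_1/" match nothing. A corrected sketch of the same file would be:
    User-agent: *
    Disallow: /page1.html
    Disallow: /dir_1/sub_dir_1/
    Disallow: /data/
    Even then, robots.txt only asks well-behaved crawlers not to fetch pages; it does not remove URLs that are already indexed and it does not restrict access. Truly keeping everyone out of the data directory needs server-side protection such as password authentication.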

  • Accessing txt file in executable vi

    Hi,
    I have built a user interface for a project I'm working on. The data entered by the user on the interface is saved in a text file upon closing of the program. This is currently working fine, but when I create an executable of my code, data is not written to the txt file upon closing. Is there some special way to build these types of VIs?
    The code for writing to the file is in a subVI and I use the "Write to Spreadsheet File" function; I pass in the path "C:\Documents and Settings\asha264\Desktop\Single Line\Single Line (9.0)\Data.txt".
    Thanking You,
    Adnan Sharief
    Solved!
    Go to Solution.

    Make sure that your means for signalling the other VIs to exit doesn't have an inadvertent race condition. Remember that LabVIEW is a data flow language; to make sure that certain events occur in a certain order, there has to be a data dependency to force the order of execution. I got bit by that recently when we used a functional global to signal all of the various programmatic loops in the various VIs when it was time to shut down. There was at least one place where, under certain conditions, the loop stop terminal was getting the signal before the "stuff" in the loop had executed. Putting an error in/out on the FG, then putting it in the error line after the functions (actually file writing) guaranteed that the file writes would occur before the "exit" condition was evaluated.
    Putnam
    Certified LabVIEW Developer
    Senior Test Engineer
    Currently using LV 6.1-LabVIEW 2012, RT8.5
    LabVIEW Champion

  • When I try to run a FindChangeByList.jsx in InDesignCS4, it will not let me select a txt file

    Hi all,
    I have a 100 page document that uses two javascript FindChange queries to cleanup an imported XML file. I've been using the same cleanup files for years and they have worked just fine. (I still only open it in CS4 because I've had some issues when trying to use this XML/Find-Change combo in CS5 and 6.) The only difference is my computer has been upgraded, the apps reinstalled, and I am now running 10.9.5
    I can import the XML just fine into my template, but when I double click on FindChangeByList.jsx, select Document, hit OK, nothing happens. I do not get a second dialog box asking me to choose my txt cleanup file.
    I only run this job once a year, so it's possible I am doing something wrong. I've placed copies of my txt files in as many scripts folders as I can find on my computer, but they are greyed out in the FindChange Support folder within InDesign.
    Please help, this is the only way to fix this XML!

    Hi,
    Two things:
    1. Assuming your script is the original one: it will not ask for a TXT file as long as one is found in the expected location. You can choose between two solutions:
         remove FindChangeList.txt from the FindChangeSupport folder ==> the script will ask for another file
         overwrite this file with your query ==> the script will not ask but will execute your query
    2. TXT files appear greyed out in the Scripts panel because that panel only shows executable files (script formats).
    Jarek

  • How can i read only .txt file and skip other files in Mail Sender Adapter ?

    Hi Friends,
    I am working on a scenario where I have to read a mail attachment and send the data to R3.
    It is working fine if only the .txt file comes.
    Sometimes HTML files also come along with the .txt files, and then my Mail adapter fails to read the .txt file.
    I am using PayLoadSwapBean and MessageTransformBean to swap and send the attachment as the payload.
    Michal has suggested writing an adapter module to skip the unwanted files, but I am not aware of how adapter modules work. If there are any blogs for this kind of scenario, please give me the link.
    Otherwise, please tell me how to write an adapter module for the Mail Sender Adapter.
    Also, how do I download the newest patch of XI ADAPTER FRAMEWORK CORE 3.0 from SAP Service Marketplace, open the file with WinZip and extract the following SDAs:
    - aii_af_lib.sda, aii_af_svc.sda
    - aii_af_cpa_svc.sda
    I have searched in Service Marketplace but I couldn't find it. Can you please provide me the link to download the above?
    If there are any other suggestions, please let me know.
    Regards,
    V.Rangarajan

    =P
    Dude, netiquette. Messages like "I need this now! Do it!" are really offensive, and no one here is being paid to answer anyone's questions. We're here because we like to contribute to the community.
    Anyway, in your case, just do some searching on how you could filter the files that are attached to the message. The sample module is just an example; you'll have to implement your own. A tip would be to query the filename of the attachments (or maybe the content type) and remove the ones which are not text.
    Regards,
    Henrique.
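    As a rough illustration of the filename check suggested above (this is plain JavaMail, not the XI adapter-module API, and the class and method names are made up for the example):
    import java.util.ArrayList;
    import java.util.List;
    import javax.mail.BodyPart;
    import javax.mail.MessagingException;
    import javax.mail.Multipart;

    public class TxtAttachmentFilter {
        // Returns only the attachments whose file name ends with ".txt",
        // so HTML or other parts attached to the mail are ignored.
        public static List<BodyPart> keepTxtParts(Multipart multipart) throws MessagingException {
            List<BodyPart> txtParts = new ArrayList<BodyPart>();
            for (int i = 0; i < multipart.getCount(); i++) {
                BodyPart part = multipart.getBodyPart(i);
                String fileName = part.getFileName();
                if (fileName != null && fileName.toLowerCase().endsWith(".txt")) {
                    txtParts.add(part);
                }
            }
            return txtParts;
        }
    }
    In a real adapter module the same idea would be applied to the XI message's attachments rather than to a JavaMail Multipart.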

  • Loading *.txt files in web service

    I have a running web service on Axis, and can communicate with it from a j2se client and a browser test client.
    I'm attempting to load a txt file of data in the web service (it stores a list of usernames), but the code can't locate the text file. Where are external resources placed if they're needed in a web service? I was assuming I could just place the file next to my *.java files.
    Thanks

    I should add that I'm using Eclipse, and as such my web service seems to be self-contained in my Eclipse workspace. The structure is
    EclipseWorkspace -> myProject -> projectNamespace -> *.java
    I thought I could place my *.txt files inside the namespace directory with my Java files. I have also tried placing the *.txt file in the project directory, still no luck.
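    One common fix, sketched below under the assumption that the file is named users.txt and ends up on the web application's classpath (for example in WEB-INF/classes): load it through the classloader instead of a relative file path, so it is found regardless of the servlet container's working directory.
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.util.ArrayList;
    import java.util.List;

    public class UserListLoader {
        // Reads one username per line from a users.txt packaged on the classpath.
        public static List<String> loadUsernames() throws IOException {
            InputStream in = UserListLoader.class.getResourceAsStream("/users.txt");
            if (in == null) {
                throw new IOException("users.txt was not found on the classpath");
            }
            BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
            List<String> usernames = new ArrayList<String>();
            String line;
            while ((line = reader.readLine()) != null) {
                usernames.add(line.trim());
            }
            reader.close();
            return usernames;
        }
    }
    In Eclipse, putting users.txt in a source folder makes the builder copy it next to the compiled classes, so getResourceAsStream can find it at runtime.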

  • Read txt files in DoJa?

    I don't know if this is the right place to post this, but I haven't found any DoJa topic, nor have I found good DoJa forums, so I thought I'd post it here.
    My problem is: I'm trying to read and write .txt files in DoJa. I wrote this code in JCreator (with the other imports, of course) and it worked perfectly. But as I already expected, it won't work in javaaplitool (2.5):
    // DoJa imports from the original attempt (available in javaaplitool, not needed for plain file I/O)
    import com.nttdocomo.io.*;
    import com.nttdocomo.ui.*;
    // Standard file I/O classes this code actually uses (available in J2SE/JCreator)
    import java.io.*;

    class ReadFile {
        void readMyFile() {
            DataInputStream data_input_stream = null;
            String input = null;
            try {
                File file = new File("testfile.txt");
                FileInputStream input_stream = new FileInputStream(file);
                BufferedInputStream buffered_input_stream = new BufferedInputStream(input_stream);
                data_input_stream = new DataInputStream(buffered_input_stream);
                // Print the file line by line
                while ((input = data_input_stream.readLine()) != null) {
                    System.out.println(input);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    I've tried other things but it won't do and I'm not an expert in this :-(. Anyone know how to do it or have a small sample script or anything that can help?
    Any help is greatly appreciated!

    You're not doing it correctly. Use getClass().getResourceAsStream("yourtext.txt") to read the data. Note that you can't just read files on phones; Java runs in a sandbox and only has access to files included in the MIDlet JAR.
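    A minimal sketch of that approach, assuming the file is packaged as /testfile.txt inside the application JAR (only classes available in CLDC-style profiles are used; exact packaging steps depend on the DoJa toolchain):
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;

    public class ResourceTextReader {
        // Reads a text file that is packaged inside the application JAR,
        // which is the only place a sandboxed phone application can read from.
        public String readTextResource(String name) throws IOException {
            InputStream in = getClass().getResourceAsStream(name); // e.g. "/testfile.txt"
            if (in == null) {
                throw new IOException(name + " is not packaged in the JAR");
            }
            InputStreamReader reader = new InputStreamReader(in);
            StringBuffer text = new StringBuffer();
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
            reader.close();
            return text.toString();
        }
    }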
