Question about robots.txt

This isn't something I've usually bothered with, as I always thought you didn't really need one unless you wanted to disallow access to pages / folders on a site.
However, a client has been reading up on SEO and mentioned that some analytics thing (possibly Google) was reporting that "one came back that the robot.txt file was invalid or missing. I understand this can stop the search engines linking in to the site".
So I had a rummage, and uploaded what I thought was a standard enough robots.txt file:
# robots.txt
User-agent: *
Disallow:
Disallow: /cgi-bin/
But apparently this is reporting:
The following block of code contains some errors. You specified both a generic path ("/" or empty disallow) and specific paths for this block of code; this could be misinterpreted. Please remove all the reported errors and check this robots.txt file again.
Line 1: # robots.txt
Line 2: User-agent: *
Line 3: Disallow:
  You specified both a generic path ("/" or empty disallow) and specific paths for this block of code; this could be misinterpreted.
Line 4: Disallow: /cgi-bin/
  You specified both a generic path ("/" or empty disallow) and specific paths for this block of code; this could be misinterpreted.
If anyone could set me straight on what a standard/default robots.txt file should look like, that would be much appreciated.
Thanks.

Remove the blank disallow line so it looks like this:
User-agent: *
Disallow: /cgi-bin/
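If you don't need to block anything at all, the allow-everything form is also valid; an empty Disallow on its own means "allow all", and the validator only objects when it is mixed with specific paths in the same block:
User-agent: *
Disallow: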
E. Michael Brandt
www.divahtml.com
www.divahtml.com/products/scripts_dreamweaver_extensions.php
Standards-compliant scripts and Dreamweaver Extensions
www.valleywebdesigns.com/vwd_Vdw.asp
JustSo PictureWindow
JustSo PhotoAlbum, et alia

Similar Messages

  • Robots.txt question?

    I am kind of new to web hosting, but learning.
    I am hosting with Just Host and I have a couple of sites (addons). I am trying to publish my main site now, and there is a whole bunch of stuff in the site root folder that I have no idea about. I don't want to delete anything, and I am probably not going to, lol. But should I block a lot of the stuff in there in my robots.txt file?
    Here is some of the stuff in there:
    .htaccess
    404.shtml
    cgi-bin
    css
    img
    index.php
    justhost.swf
    sifr-addons.js
    sIFR-print.cs
    sIFR-screen.css
    sifr.js
    Should I just disallow all of this stuff in my robots.txt? Any recommendations would be appreciated. Thanks

    Seaside333 wrote:
    public_html for the main site, the other addons are public_html/othersitesname.com
    Is this good?
    Thanks for the quick response.
    You probably don't need the following files unless you're using text image-replacement techniques: sifr-addons.js, sIFR-print.cs, sIFR-screen.css, sifr.js.
    It's good to keep .htaccess (you can insert special instructions in this file), 404.shtml (if a page can't be found on your remote server, visitors are sent to this page), and cgi-bin (some processing scripts are placed in this folder).
    You will probably have your own 'css' folder. The 'img' folder is not needed. 'index.php' is the homepage of the site and what the browser looks for initially; you can replace it with your own homepage.
    You don't need justhost.swf.
    Download the files/folders to your local machine and keep them in case you need them.
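    Putting that together, a minimal robots.txt for this layout might look like the sketch below (a suggestion based on the file names in the poster's listing, not an official Just Host recommendation):
    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /justhost.swf
    Disallow: /sifr-addons.js
    Disallow: /sIFR-print.cs
    Disallow: /sIFR-screen.css
    Disallow: /sifr.js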

  • Robots.txt and duplicate content: I need help

    Hello guys, I'm new in BC and I have 2 questions.
    1. My start page is available as xxxx.de, xxxx.de/index.html, and xxx.de/index.aspx. How can I fix this duplicate content?
    2. Where do I have to upload the robots.txt?
    THX

    As long as you do not link to the other versions inconsistently, you do not need to worry about your start page.
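    If the duplicate URLs do become a concern, one common remedy (an addition here, not something mentioned in the reply) is a canonical link in the head of each variant, for example:
    <link rel="canonical" href="http://xxxx.de/" />
    Search engines then treat xxxx.de/index.html and the other variants as the same page.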

  • Robots.txt - default setup

    Hey!
    Since I'm using iWeb for creating my websites, I know that I have to set up a robots.txt for SEO.
    I have made several sites: one for a restaurant, one about photography, one personal, etc...
    There is nothing I want to "hide" from Google's robots on those websites.
    So my question is:
    When we create a website and publish it, is there at least a default setup for robots.txt?
    For example:
    Website is parked in folder: public_html/mywebsitefolder
    Inside mywebsitefolder folder i have:
    /nameofthewebsite
    /cgi-bin
    /index.html
    The structure is the same for all websites created with iWeb, so what should we put in robots.txt by default?
    Of course, in case you don't want to hide any of the pages or content.
    Azz.

    If you don't want to stop the bots crawling any folder - don't bother with one at all.
    The robots.txt should go in the root folder since the crawler looks for....
    http://www.domain-name.com/robots.txt
    If your site files are in a sub folder the robots.txt would be like...
    User-agent: *
    Disallow: /mywebsitefolder/folder-name
    Disallow: /mywebsitefolder/file.file-extension
    To allow all access...
    User-agent: *
    Disallow:
    I suppose you may want to use robots.txt if you want to allow/disallow one particular bot.

  • Questions About JSP?

    Hi,
    I am a PHP developer and I am learning Java now.
    I am using NetBeans 6.1, Tomcat 6.0, Java EE 5.
    I have some questions about developing web applications with JSP.
    1. What are the following files that are generated by NetBeans? Are they important or optional? Are they auto-generated, or may I edit them?
    /WEB-INF/web.xml
    /META-INF/context.xml
    /META-INF/MANIFEST.MF
    2. Can I distribute my web application in a non-text format? How?
    3. If I want to add some files to my application while it is running in Tomcat, how can I do that without rebuilding everything?
    Thank you

    Thank you all for helping.
    I will write more information about my questions; maybe this will help others:
    /WEB-INF/web.xml - The Web Application Deployment Descriptor for your application. This is an XML file describing the servlets and other components that make up your application, along with any initialization parameters and container-managed security constraints that you want the server to enforce for you. This file is discussed in more detail in the following subsection.
    As mentioned above, the /WEB-INF/web.xml file contains the Web Application Deployment Descriptor for your application. As the filename extension implies, this file is an XML document, and defines everything about your application that a server needs to know (except the context path, which is assigned by the system administrator when the application is deployed).
    The complete syntax and semantics for the deployment descriptor is defined in Chapter 13 of the Servlet API Specification, version 2.3. Over time, it is expected that development tools will be provided that create and edit the deployment descriptor for you. In the meantime, to provide a starting point, a basic web.xml file (http://localhost:8080/docs/appdev/web.xml.txt) is provided. This file includes comments that describe the purpose of each included element.
    NOTE - The Servlet Specification includes a Document Type Descriptor (DTD) for the web application deployment descriptor, and Tomcat 6 enforces the rules defined here when processing your application's /WEB-INF/web.xml file. In particular, you must enter your descriptor elements (such as <filter>, <servlet>, and <servlet-mapping>) in the order defined by the DTD (see Section 13.3).
    (from the Tomcat documentation)
    Tomcat Context Descriptor:
    A /META-INF/context.xml file can be used to define Tomcat-specific configuration options, such as loggers, data sources, session manager configuration and more. This XML file must contain one Context element, which will be considered as if it was the child of the Host element corresponding to the Host to which the web application is deployed. The Tomcat configuration documentation contains information on the Context element.
    (from the Tomcat documentation)
    But I still want more information about this question:
    Q3: I want to distribute (sell to another organization) without giving away the source code in the JSP files, so I want to precompile them into class files or jar files.
    I want to use the Ant that comes with NetBeans 6.1. Can anyone give me information about how to do that?
    Thank you again
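    For Q3, precompiling JSPs is typically done with the JspC task that ships with Tomcat; the sketch below is based on the Tomcat 6 Jasper how-to, and the ${tomcat.home} and ${webapp.path} properties are assumptions you would point at your own installation:
    <project name="jsp-precompile" default="jspc" basedir=".">
      <!-- catalina-tasks.xml defines the <jasper> task -->
      <import file="${tomcat.home}/bin/catalina-tasks.xml"/>
      <target name="jspc">
        <!-- translate each JSP under the webapp into Java servlet source -->
        <jasper uriroot="${webapp.path}"
                webXmlFragment="${webapp.path}/WEB-INF/generated_web.xml"
                outputDir="${webapp.path}/WEB-INF/src"/>
        <!-- compile the generated sources; you then ship only the .class files -->
        <javac srcdir="${webapp.path}/WEB-INF/src"
               destdir="${webapp.path}/WEB-INF/classes">
          <classpath>
            <fileset dir="${tomcat.home}/lib" includes="*.jar"/>
          </classpath>
        </javac>
      </target>
    </project>
    The servlet mappings that JspC writes to generated_web.xml then have to be merged into the application's web.xml so the container serves the precompiled classes instead of the .jsp sources.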

  • SQL Developer: question about exporting data

    Hi,
    we've recently started working with SQL Developer. I've got a question about how we can export query results to txt/csv files for use in other applications.
    First, a problem: if we start a query that looks like this:
    select * from
      (select * from A where start_date = &date) a,
      (select * from B where start_date = &date) b
    where a.name = b.name
    SQL Developer asks twice for a value for the variable 'date', although it's the same variable and it's supposed to have the same value.
    We solve this by making a script:
    first we define the variable, then we put the query.
    When we start the script, the query runs ok and sql developer asks to input the value for the variable once.
    But now the result of the query is shown in the script output. The script output seems to be limited in number of lines and difficult to export.
    So my question is: what's the best way to export query results to txt/csv files, avoiding the problem mentioned above?
    i hope there is a solution where we can use a single query or script.
    Thanks in advance!

    Using bind variables like ":date" should solve the problem of being asked twice for the same thing.
    Executing the query normally (F9) gives you the export options you require through the context menu inside the Results grid.
    Regards,
    K.
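    A sketch of that bind-variable approach, applied to the query from the question (the VARCHAR2 binding and the date format are assumptions; adjust them to your schema):
    -- declare and fill the bind variable once
    VARIABLE dt VARCHAR2(10)
    EXEC :dt := '2009-01-31'
    SELECT *
    FROM   (SELECT * FROM A WHERE start_date = TO_DATE(:dt, 'YYYY-MM-DD')) a,
           (SELECT * FROM B WHERE start_date = TO_DATE(:dt, 'YYYY-MM-DD')) b
    WHERE  a.name = b.name;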

  • Robots.txt and Host Named Site Collections (SEO)

    When attempting to exclude ALL SharePoint sites from external indexing, when you have multiple web apps and multiple host-named site collections, should I add the robots.txt file to the root of each web app, as well as each HNSC? I assume so, but thought I would check with the gurus...
    - Rick

    I think one for each site collection, as each site collection has a different name and is treated as a separate web site.
    "The location of robots.txt is very important. It must be in the main directory, because otherwise user agents (search engines) will not be able to find it. Search engines look first in the main directory (i.e. http://www.sitename.com/robots.txt) and if they don't find it there, they simply assume that this site does not have a robots.txt file"
    http://www.slideshare.net/ahmedmadany/block-searchenginesfromindexingyourshare-pointsite
    Thanks -WS MCITP (SharePoint 2010, 2013). Blog: http://wscheema.com/blog
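    For the stated goal of blocking all external indexing, the file content itself is just the standard disallow-all block; the SharePoint-specific part is placing one copy at the root of each host-named site collection, as described above:
    User-agent: *
    Disallow: /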

  • Question about dependent projects (and their libraries) in 11g-Oracle team?

    Hello everyone,
    I have a question about dependent projects. An example:
    In JDeveloper 10.1.3.x if you had for instance 2 projects (in a workspace): project 1 has one project library (for instance a log4j library) and project 2 is a very simple webapplication which is dependent on project 1. Project 2 has one class which makes use of log4j.
    This compiles fine, you can run project 2 in oc4j, and the libraries of project 1 (log4j) are added on the classpath and everything works fine. This is great for rapid testing as well as keeping management of libraries to a minimum (only one project where you would update a library e.g.)
    However in 11g this approach seems not to work at all anymore now that weblogic is used, not even when 'export library' is checked in project 1. The library is simply never exported at all - with a noclassdeffound error as result. Is this approach still possible (without having to define multiple deployment profiles), or is this a bug?
    Thanks!
    Martijn

    Hi Ron,
    I've tried what you said; indeed, in that .beabuild.txt, when 'deploy by default' is checked, it adds a line like: C:/JDeveloper/mywork/test2/lib/log4j-1.2.14.jar = test2-view-webapp/WEB-INF/lib/log4j-1.2.14.jar
    Which looks fine, except that /WEB-INF/lib/ is empty. I presume it's a sort of mapping to say: load it as if it were in WEB-INF/lib? This line is not there when 'deploy by default' is not checked.
    I modified the TestBean as follows (the method now references Log4j only through a Class.forName() call):
    public String getHelloWorld() {
        try {
            Class clazz = Class.forName("org.apache.log4j.Logger");
            System.out.println(clazz.getName());
        } catch (Exception e) {
            e.printStackTrace();
        }
        return "Hello World";
    }
    In both cases, with or without that line, it throws:
    java.lang.ClassNotFoundException: org.apache.log4j.Logger
         at weblogic.utils.classloaders.GenericClassLoader.findLocalClass(GenericClassLoader.java:283)
         at weblogic.utils.classloaders.GenericClassLoader.findClass(GenericClassLoader.java:256)
         at weblogic.utils.classloaders.ChangeAwareClassLoader.findClass(ChangeAwareClassLoader.java:54)
         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
         at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
         at weblogic.utils.classloaders.GenericClassLoader.loadClass(GenericClassLoader.java:176)
         at weblogic.utils.classloaders.ChangeAwareClassLoader.loadClass(ChangeAwareClassLoader.java:42)
         at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
         at java.lang.Class.forName0(Native Method)
         at java.lang.Class.forName(Class.java:169)
         at nl.test.TestBean.getHelloWorld(TestBean.java:15)
    Secondly, I added weblogic.xml with your suggested code; in the exploded war this results in a weblogic.xml that looks like:
    <?xml version = '1.0' encoding = 'windows-1252'?>
    <weblogic-web-app xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.bea.com/ns/weblogic/weblogic-web-app.xsd" xmlns="http://www.bea.com/ns/weblogic/weblogic-web-app">
      <container-descriptor>
        <prefer-web-inf-classes>true</prefer-web-inf-classes>
      </container-descriptor>
      <jsp-descriptor>
        <debug>true</debug>
        <working-dir>/C:/JDeveloper/mywork/test2/view/classes/.jsps</working-dir>
        <keepgenerated>true</keepgenerated>
      </jsp-descriptor>
      <library-ref>
        <library-name>jstl</library-name>
        <specification-version>1.2</specification-version>
      </library-ref>
      <library-ref>
        <library-name>jsf</library-name>
        <specification-version>1.2</specification-version>
      </library-ref>
    </weblogic-web-app>
    The only part that came from me is the container-descriptor tag; the rest is added during deployment. Unfortunately, it still produces the same error. :/ Any clue?

  • A question about the SPOOL command in sqlplus

    Dear all,
    I have a question about the SPOOL command, and I would appreciate it if you could kindly give me a hand. Consider the following SQL script:
    SPOOL result.txt
    SELECT * FROM mytable;
    SPOOL OFF
    This works pretty well, and the whole content of the table "mytable" is exported to the text file "result.txt". However, SQL*Plus also prints the number of lines returned after each query. As a result, after running this script, at the end of the file I always have a line like "20541 lines returned". How can I avoid this line (the number of returned lines) in my result file?
    Thanks in advance,
    Dariyoosh

    Peter Gjelstrup wrote:
    Hi Dariyoosh,
    As you are about to find out, SQL*Plus is a really powerful tool once the wonders of it are discovered.
    You really should study the reference
    http://download.oracle.com/docs/cd/E11882_01/server.112/e10823/toc.htm
    In your current case especially the SET command
    http://download.oracle.com/docs/cd/E11882_01/server.112/e10823/ch_twelve040.htm#BACGAJIC
    Regards
    Peter

    Hello there,
    Thank you very much for your attention to my problem and in particular the interesting links.
    Kind Regards,
    Dariyoosh
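    Following Peter's pointer to the SET command: the trailer comes from the feedback setting, and SET FEEDBACK OFF suppresses it, so a minimal version of the original script would be:
    SET FEEDBACK OFF
    SPOOL result.txt
    SELECT * FROM mytable;
    SPOOL OFF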

  • Question about Lego Mindstorms Capabilities

    Hello, thank you for reading my question!
    I am currently working on the creation of a teleoperated robotics lab for an introductory robotics course. I wish to allow students to control a labyrinth-solving robot via LabVIEW, with the Lego NXT toolbox. A computer server will be connected to the Lego via Bluetooth, and the student communicates with the server via a web browser VI (a web-published VI). I am interested in the Lego Mindstorms platform, but I wanted to ask some questions about its software capabilities:
    Is it possible, via a web VI interface, for the student to load a program onto the Lego Mindstorms? My idea is to allow the student to load a VI (or the equivalent of a LEGO NXT ".exe") onto the server, and the server to load the program onto the Lego. The web VI already has some parameters the student can change for a simple program execution (see [1]), but I also wanted to give the student the chance to load their own program for more complicated algorithms.
    Thank you in advanced for your time and patience!
    [1] Maze Solving Algorithms
    http://en.wikipedia.org/wiki/Maze_solving_algorithm

    That's the exact error. Could it be something to do with the automatic update of the driver once the joystick was connected through the USB?

  • Web Repository Manager and robots.txt

    Hello,
    I would like to search an intranet site and therefore set up a crawler according to the guide "How to set up a Web Repository and Crawl It for Indexing".
    Everything works fine.
    Now this web site uses a robots.txt as follows:
    User-agent: googlebot
    Disallow: /folder_a/folder_b/
    User-agent: *
    Disallow: /
    So obviously, only google is allowed to crawl (parts of) that web site.
    My question: If I'd like to add the TRex crawler to the robots.txt what's the name of the "User-agent" I have to specify here?
    Maybe the name I defined in the SystemConfiguration > ... > Global Services > Crawler Parameters > Index Management Crawler?
    Thanks in advance,
    Stefan

    Hi Stefan,
    I'm sorry, but this is hard-coded. I found it in the class com.sapportals.wcm.repository.manager.web.cache.WebCache:
    private HttpRequest createRequest(IResourceContext context, IUriReference ref) {
        HttpRequest request = new HttpRequest(ref);
        String userAgent = "SAP-KM/WebRepository 1.2";
        if (sessionWatcher != null) {
            String ua = sessionWatcher.getUserAgent();
            if (ua != null) {
                userAgent = ua;
            }
        }
        request.setHeader("User-Agent", userAgent);
        Locale locale = context.getLocale();
        if (locale != null) {
            request.setHeader("Accept-Language", locale.getLanguage());
        }
        return request;
    }
    So recompile the component or change the filter... I would prefer to change the robots.txt.
    hope this helps,
    Axel
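    Based on the hard-coded string above, the robots.txt could admit the KM crawler explicitly alongside googlebot; whether the crawler matches on the full "SAP-KM/WebRepository 1.2" header or just a token of it is an assumption worth testing:
    User-agent: googlebot
    Disallow: /folder_a/folder_b/

    User-agent: SAP-KM/WebRepository
    Disallow:

    User-agent: *
    Disallow: /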

  • Question about HTML files using TextEdit

    First off, I wanted to ask if this is the proper place to post a question about HTML/XHTML. I couldn't really find anywhere else that seemed to fit better, but please point me in the right direction if this is not the place. Thanks.
    Moving on -- here's my question:
    I'm having trouble with working with HTML files in TextEdit. I'm on a Mac, using TextEdit as my HTML editor (I can not afford one of these other HTML editors, and I like using a simple text editor like TextEdit for HTML). Here's my problem: I open a new rich text document in TextEdit and write my HTML and then choose File>Save As and choose "HTML" under the File Format drop-down box. Having saved this file as an HTML file, I then open Safari and choose File>Open File and select my HTML file; however, when I do this, my web page's text does not appear in the browser window. Instead, the HTML code itself appears in the browser window, as if the browser was not interpreting it as HTML and converting it. The same problem happens when saving the file as a .htm file using Microsoft Word for Mac.
    So, as another solution (at the suggestion of a helpful poster in a previous thread), I tried creating a plain text file in TextEdit (instead of a rich text file like before). Now, in TextEdit there is no option for saving plain text files as HTML files, so I simply save it in UTF-8 format and then find the file in Finder and change the extension to .htm myself (I've tried .html as well). This, fortunately enough, actually works! When I open the file in Safari I get to see my web page as expected. However, the first time I quit the application TextEdit and then try to reopen my .htm file in TextEdit, I no longer see my HTML code. Instead, TextEdit shows me the actual web page text that I would expect to see when I open the file in a browser, and my HTML code is lost.
    Can someone please help me here? There has to be a way to edit HTML in TextEdit without the code disappearing every time you quit out of the application and reopen the HTML file. Any help is greatly, greatly appreciated. Thank you.

    That's the problem -- once I convert it to plain text there is no longer an HTML option under the Save As drop-down menu. The drop-down menu's title is Plain Text Encoding instead of File Format, and the only options I get are UTF-8, UTF-16, Western, Chinese, and so forth. I can save it as an HTML file, but only if the file is rich text, which doesn't work for HTML.
    So, I can save it as plain text (.txt) and then go and change the extension myself, and like I said, it works if I do that. I can edit HTML in TextEdit and open the file properly in Safari to view my web page. The problem is (I went over all this in my first post) that, after that, if I exit TextEdit and reopen it, my document is no longer HTML code -- it is now simply the text of the web site as if I had opened it in a browser.
    What do you suggest I do?
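    For testing the plain-text workflow, a minimal page like this is enough to confirm whether Safari renders the markup or shows it as raw code:
    <!DOCTYPE html>
    <html>
    <head>
    <title>TextEdit test</title>
    </head>
    <body>
    <p>Hello from TextEdit</p>
    </body>
    </html>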

  • Question about the custom panel language

    I have a question about the custom panel language...
    The document you provide seems to lack details on some features, namely the icon and picture widgets. I see from looking at the examples and other vendors' web pages that these features exist, but I don't find any detailed descriptions of them in the documentation. Is there a more complete document describing these and other features?
    http://www.adobe.com/products/xmp/custompanel.html
    Alternatively, can someone fill me in on the syntax and options for at least the icon and picture widgets? For instance, how do you load external icons or pictures?
    Tom

    Gunar,
    It could be interesting to have something like
    icon(url: 'http://www.adobe.com/Images/logo.gif', width: 20, height: 20);
    or better
    picture(url: 'http://www.adobe.com/Images/logo.gif', width: 20, height: 20);
    for the pictures and
    include(url: 'http://www.adobe.com/xml/custompanel/camera1.txt');
    for including the custom panel's dynamic portions
    Juan Pablo

  • Is ROBOT.TXT supported

    The Robots Exclusion Protocol uses the robots.txt configuration file to give
    instructions to Web crawlers (robots) about how to index your pages.
    Similar functionality is available through specific META tags in HTML
    documents.
    Is robots.txt supported with WLS?
    Bernard DEVILLE
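    For reference, the META-tag form of the protocol mentioned above is plain HTML and works on any server, WLS included; a standard example:
    <meta name="robots" content="noindex, nofollow">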

    According to Wikipedia (https://en.wikipedia.org/wiki/Snapdragon_%28system_on_chip%29) it is a Snapdragon 2, which would make it a compatible device.
    If you search the Play Store for Firefox, do you see a Firefox app to install?

  • No robots.txt?

    Hello,
    just a short question: Why does Muse not create a robots.txt?
    A couple of months ago I had a client who didn't show up in any search results, even though the site had been online for more than a year.
    We investigated and found out that the client had no robots.txt on his server. Google mentions (sorry, I cannot find the source right now) that it will not index a page if there is no robots file.
    I think it is important to know this. It would be cool if there were a feature in the export dialog (a checkbox "create robots.txt", and maybe a settings panel: follow, nofollow, no directories...).
    Regards
    Andreas

    Here's one example of the text Google is posting:
    http://webcache.googleusercontent.com/search?rlz=1T4GGLR_enUS261US323&hl=en&q=cache:SSb_hvtcb_EJ:http://www.inmamaskitchen.com/RECIPES/RECIPES/poultry/chicken_cuban.html+cuban+chicken+with+okra&ct=clnk (Robots.txt File, May 31, 2011)
    http://webcache.googleusercontent.com/search?q=cache:yJThMXEy-ZIJ:www.inmamaskitchen.com/Nutrition/ (Robots.txt File, May 31, 2011)
    Then there are things relating to Facebook????
    http://www.facebook.com/plugins/like.php?channel_url=http%3A%2F%2Fwww.inmamaskitchen.com%2FNutrition%2FBlueberries.html%3Ffb_xd_fragment%23%3F%3D%26cb%3Df2bfa6d78d5ebc8%26relation%3Dparent.parent%26transport%3Dfragment&href=http%3A%2F%2Fwww.facebook.com%2Fritzcrackers%3Fsk%3Dapp_205395202823189&layout=standard&locale=en_US&node_type=1&sdk=joey&send=false&show_faces=false&width=225
    THANK YOU!

Maybe you are looking for

  • I can not send email from my iPad . Says SMTP is incorrect

    Why can't I send email from my iPad? It says my SMTP is incorrect but it will not let me correct it.

  • How can I use an old XMP file on a new set of images??

    I have an older XMP file that contains settings used to retouch some previous images in a particular way. I'd like to apply those same settings to a new set of images. How can I use that older XMP file on a new set of images in Lightroom? The content

  • JDBC with Excel / MS-Access

    I wish to connect from Java to a table in Excel or MS-Access and don't have a clue on "how to?". I'm working with J2SE 1.5.0_01 and BlueJ IDE. I get an error on "import sun.jdbc.*;": "package sun.jdbc does not exist". I've checked my computer and coul

  • Call block for smartphones

    call block widget in future version upgrades for smartphones..

  • Import Duty

    Hi All, I am facing a problem: in my company, we have done the GRN of imported material without doing MIRO for the import duties, customs, CVD etc. Now we want to post the customs. How can we do this? Thanks & Regards Pankaj Garg