New robots.txt commands: make sure that Google can index your website more efficiently

Posted in Google, Search Engine Optimization, Yahoo! on Nov 30, 2007

It seems that search engine spiders are currently experimenting with new robots.txt commands. If your robots.txt file accidentally contains some of these new commands, it may tell search engine spiders not to index your site.

What is a robots.txt file?
The robots.txt file is a small text file that must be placed in your root folder (http://www.anysite.com/robots.txt). It tells search engine bots which sections of your website they may index and which sections they should stay out of.
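Because bots only look for the file at the root of the host, the robots.txt location for any page can be worked out from the page's URL alone. Here is a minimal Python sketch of that rule; the function name robots_url is hypothetical and only meant for illustration:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the host serving page_url.

    robots_url is a hypothetical helper for illustration only.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port); replace the path and
    # drop the query string and fragment, because bots only request /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("http://www.anysite.com/blog/2007/11/post.html?ref=home"))
```

A robots.txt file placed in a subfolder such as /blog/robots.txt would therefore never be read.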

You can use any plain text editor to create a robots.txt file. The content of a robots.txt file consists of so-called "records".

Each record holds the information for a particular search engine bot. A record consists of two kinds of fields: a User-agent line and one or more Disallow lines. Here's an example:

User-agent: googlebot
Disallow: /cgi-bin/

This robots.txt file allows the bot "googlebot", Google's search engine spider, to index every section of your website except the files in the "cgi-bin" folder. All files in the "cgi-bin" folder will be ignored by googlebot.
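If you want to confirm how a standards-following parser reads this record, Python's standard library ships one in urllib.robotparser. The sketch below feeds it the example record directly (no network access) and asks which URLs googlebot may fetch. The page URLs under www.anysite.com are made up for illustration, and keep in mind this is Python's parser, not Google's own crawler.

```python
from urllib.robotparser import RobotFileParser

# The example record from above, one directive per line.
ROBOTS_LINES = [
    "User-agent: googlebot",
    "Disallow: /cgi-bin/",
]

rp = RobotFileParser()
rp.parse(ROBOTS_LINES)  # parse the lines directly instead of downloading them

home_ok = rp.can_fetch("googlebot", "http://www.anysite.com/index.html")
cgi_ok = rp.can_fetch("googlebot", "http://www.anysite.com/cgi-bin/search.cgi")
# The record names only googlebot, so other bots are not restricted by it.
other_ok = rp.can_fetch("Slurp", "http://www.anysite.com/cgi-bin/search.cgi")

print(home_ok, cgi_ok, other_ok)  # True False True
```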

Which new commands is Google testing?
Webmasters have noticed that search engines seem to be experimenting with a Noindex command for the robots.txt file. It appears to do exactly the same as Disallow, so it's not clear why search engines are using it.

Other commands that are being tested by search engines are Noarchive and Nofollow. However, none of these commands is official yet.
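One reason unofficial commands are risky is that different parsers treat them differently. As an illustration (not a description of Google's crawler), Python's urllib.robotparser, which follows the original robots exclusion rules, simply skips a line it does not recognise. In this sketch the same hypothetical /private/ folder is blocked by Disallow but left open when the file says Noindex instead; the allowed helper is a name made up for this example.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_lines, agent, url):
    """Hypothetical helper: parse robots_lines and check one URL for one agent."""
    rp = RobotFileParser()
    rp.parse(robots_lines)
    return rp.can_fetch(agent, url)


URL = "http://www.anysite.com/private/page.html"

with_disallow = allowed(["User-agent: *", "Disallow: /private/"], "googlebot", URL)
# urllib.robotparser does not know "Noindex", so it ignores that line entirely.
with_noindex = allowed(["User-agent: *", "Noindex: /private/"], "googlebot", URL)

print(with_disallow, with_noindex)  # False True
```

If Google really does treat Noindex like Disallow, the same file can mean different things to different bots.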

How does this affect your rankings on Google? If you mistakenly use the wrong commands, you may be telling search engines to back off from sections you actually want them to notice.

This is why it is important to re-check the contents of your robots.txt file.

How to check your robots.txt file
Open your web browser and enter http://www.anysite.com/robots.txt to view the contents of your robots.txt file. Here are the most important tips for a good robots.txt file:

1. Search engines document only two commands for the robots.txt file: User-agent and Disallow. Avoid using any other commands.

2. Do not change the order of the commands. Begin with the User-agent line and then add the Disallow lines:

User-agent: *
Disallow: /cgi-bin/

3. Do not list more than one folder in a Disallow line. "Disallow: /support /cgi-bin/ /images/" does not work. Create a separate Disallow line for every directory:

User-agent: *
Disallow: /support
Disallow: /secret/

4. Be sure to use the right case (UPPER CASE or lower case). The file names on your web servers are case sensitive. If the name of your folder is "Support", never write "support" in the robots.txt file.
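The first three tips are easy to check mechanically. Below is a rough Python sketch of such a check. check_robots and its warning messages are hypothetical; it deliberately flags every directive except User-agent and Disallow (following tip 1, not any search engine's current documentation), and for tip 2 it only catches a Disallow line that appears before any User-agent line. The last few lines use Python's standard urllib.robotparser to illustrate tip 4: a rule for /Support/ does not cover /support/.

```python
from urllib.robotparser import RobotFileParser

# Per tip 1, only these two directives are treated as documented.
KNOWN_DIRECTIVES = {"user-agent", "disallow"}


def check_robots(text):
    """Hypothetical linter: return (line_number, message) warnings for a robots.txt body."""
    warnings = []
    seen_user_agent = False
    for num, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue
        if ":" not in line:
            warnings.append((num, "missing ':' separator"))
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field not in KNOWN_DIRECTIVES:  # tip 1
            warnings.append((num, f"undocumented directive {field!r}"))
        elif field == "user-agent":
            seen_user_agent = True
        else:  # a Disallow line
            if not seen_user_agent:  # tip 2
                warnings.append((num, "Disallow before any User-agent line"))
            if len(value.split()) > 1:  # tip 3
                warnings.append((num, "more than one path on one Disallow line"))
    return warnings


SAMPLE = """Disallow: /tmp/
User-agent: *
Disallow: /support /cgi-bin/ /images/
Noindex: /private/
Disallow: /secret/
"""

for num, message in check_robots(SAMPLE):
    print(num, message)

# Tip 4: path matching is case sensitive.
rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /Support/"])
print(rp.can_fetch("googlebot", "http://www.anysite.com/Support/faq.html"))  # False
print(rp.can_fetch("googlebot", "http://www.anysite.com/support/faq.html"))  # True
```

For the sample file this reports line 1 (Disallow before any User-agent line), line 3 (more than one path) and line 4 (the undocumented Noindex directive).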

You can find user agent names in your log files by checking for requests to robots.txt. Usually, all search engine spiders should be given the same rights. To do that, use User-agent: * in your robots.txt file.
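To find those user agent names without reading the logs by hand, you could scan them for robots.txt requests. This Python sketch assumes the common Apache/Nginx "combined" log format. The regular expression, the robots_txt_agents function and the sample log lines (which use documentation IP addresses) are all hypothetical, so adjust the pattern to whatever your server actually writes.

```python
import re

# Assumed "combined" log format:
# host ident user [time] "METHOD path protocol" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def robots_txt_agents(log_lines):
    """Return the sorted, de-duplicated user agents that requested /robots.txt."""
    agents = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        # Ignore any query string, e.g. /robots.txt?foo=1.
        if match and match.group("path").split("?", 1)[0] == "/robots.txt":
            agents.add(match.group("agent"))
    return sorted(agents)


SAMPLE_LOG = [
    '192.0.2.10 - - [30/Nov/2007:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 52 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.7 - - [30/Nov/2007:10:05:00 +0000] "GET /robots.txt HTTP/1.0" 200 52 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '203.0.113.5 - - [30/Nov/2007:10:06:00 +0000] "GET /index.html HTTP/1.1" 200 1043 "-" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1)"',
]

for agent in robots_txt_agents(SAMPLE_LOG):
    print(agent)
```

The third sample line is skipped because it requests /index.html rather than /robots.txt.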

What happens if you don't have a robots.txt file?
If your website doesn't have a robots.txt file (you can check this by entering http://www.anysite.com/robots.txt in your web browser), bots will automatically index everything they can find on your website.
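You can see the same default in Python's standard urllib.robotparser: with no records at all, every URL is allowed for every bot. The sketch below parses an empty file instead of fetching one; as far as I know, the module's read() method reaches the same result when the server answers with a 404, since it treats 4xx responses other than 401 and 403 as "allow all". The URLs are illustrative only.

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([])  # behaves like a site with no robots.txt rules at all

for path in ("/", "/cgi-bin/search.cgi", "/secret/plans.html"):
    url = "http://www.anysite.com" + path
    print(path, rp.can_fetch("googlebot", url))  # prints the path followed by True
```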

Maintaining your robots.txt file is important if you want Google to index your web pages.
