Robots.txt

April 2015

Presentation of the robots.txt File

robots.txt is a text file containing commands that tell search engine indexing robots which pages they may and may not index. When a search engine explores a website, it starts by looking for the robots.txt file at the root of the site.
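Because the file always sits at the root of the site, its address can be derived from the address of any page. Here is a minimal Python sketch using only the standard library; the function name robots_url and the example.com addresses are illustrative and are not part of the original article:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (with port, if any); replace the path
    # and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.example.com/directory/path/page.html?x=1"))
```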

robots.txt File Format

The robots.txt file is an ASCII file found at the root of the site. It can contain the following commands:

  • User-agent: used to specify the robot that is subject to the following orders. The value * means "all search engines".
  • Disallow: used to identify the pages to be excluded during indexing. Each page or path that is to be excluded must be on a separate line and must start with /. The value / alone means "all of the website's pages".
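The Disallow rule described above amounts to a prefix test on the path of each page. Here is a minimal Python sketch, assuming plain prefix matching with no wildcards; the function name is_excluded is illustrative and does not come from the article:

```python
from typing import Iterable


def is_excluded(path: str, disallow_values: Iterable[str]) -> bool:
    """Return True if path starts with any non-empty Disallow value."""
    # An empty Disallow value excludes nothing, so it is skipped.
    return any(rule and path.startswith(rule) for rule in disallow_values)


print(is_excluded("/directory/page.html", ["/directory/"]))
print(is_excluded("/index.html", [""]))
```

With these inputs the first call reports that the page is excluded and the second reports that it is not, since an empty Disallow value matches no page.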

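Warning: Do not put empty lines inside a record (a User-agent line together with the Disallow lines that follow it)! Robots that follow the original convention treat an empty line as the end of the record, so an empty line may only appear between records.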

Here are some examples of robots.txt files:

  • All pages are excluded:
    User-agent: *
    Disallow: /
  • No pages are excluded (equivalent to having no robots.txt file, meaning that all the pages are visited):
    User-agent: *
    Disallow:
  • Only one robot is authorized:
    User-agent: RobotName
    Disallow:
    User-agent: *
    Disallow: /
  • One robot is excluded:
    User-agent: RobotName
    Disallow: /
    User-agent: *
    Disallow:
  • One page is excluded:
    User-agent: *
    Disallow: /directory/path/page.html
  • All pages from a directory and its subfolders are excluded:
    User-agent: *
    Disallow: /directory/
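Files like the ones above can be checked before they are published. The sketch below uses Python's standard urllib.robotparser module on two of the examples, written with the hyphenated User-agent spelling that parsers expect. The helper name allowed, the robot name OtherBot and the example.com addresses are assumptions made for illustration:

```python
from urllib.robotparser import RobotFileParser

ONLY_ONE_ROBOT = """\
User-agent: RobotName
Disallow:
User-agent: *
Disallow: /
"""

ONE_PAGE_EXCLUDED = """\
User-agent: *
Disallow: /directory/path/page.html
"""


def allowed(robots_txt: str, robot: str, url: str) -> bool:
    """Return True if the given robot may visit url under robots_txt."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network
    # access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(robot, url)


page = "http://www.example.com/directory/path/page.html"
print(allowed(ONLY_ONE_ROBOT, "RobotName", page))
print(allowed(ONLY_ONE_ROBOT, "OtherBot", page))
print(allowed(ONE_PAGE_EXCLUDED, "OtherBot", page))
```

Only the first call should report that the visit is permitted: RobotName is authorized everywhere, every other robot is excluded by the first file, and the second file excludes that exact page for all robots.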

Examples of User Agents

Here are a few examples of User Agents for the most popular search engines:

Search Engine Name    User Agent
Alta Vista            Scooter
Excite                ArchitextSpider
Google                Googlebot
HotBot                Slurp
InfoSeek              InfoSeek Sidewinder
Lycos                 T-Rex
Voilà                 Echo
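The names in the right-hand column are the values to place on a User-agent line. As a rough illustration, the following Python sketch builds a file that excludes a chosen set of robots and admits all others, following the "One robot is excluded" pattern; the function name exclude_robots is hypothetical:

```python
def exclude_robots(robot_names):
    """Build a robots.txt that excludes the named robots and admits all others."""
    lines = []
    for name in robot_names:
        # One record per excluded robot: every page is off limits.
        lines.append(f"User-agent: {name}")
        lines.append("Disallow: /")
    # Final record: every other robot may visit all pages.
    lines.append("User-agent: *")
    lines.append("Disallow:")
    return "\n".join(lines) + "\n"


print(exclude_robots(["Scooter", "Slurp"]), end="")
```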

For More Information

The web robots page

This document entitled « Robots.txt » from Kioskea (en.kioskea.net) is made available under the Creative Commons license. You can copy and modify copies of this page, under the conditions stipulated by the license, as long as this note appears clearly.