What is robots.txt?
Robots.txt is a text file that contains site indexing parameters for the search engine robots.
We recommend watching How to manage site indexation.
How to set up robots.txt
- Create a file named robots.txt in a text editor and fill it in using the guidelines below.
- Check the file in the Yandex.Webmaster service (Robots.txt analysis in the menu).
- Upload the file to your site's root directory.
The User-agent directive
The Yandex robot supports the robots exclusion standard with enhanced capabilities described below.
The Yandex robot's work is based on sessions: for every session, there is a pool of pages for the robot to download.
A session begins with the download of the robots.txt file. If the file is missing, is not a text file, or the robot's request returns an HTTP status other than
200 OK, the robot assumes that it has unrestricted access to the site's documents.
In the robots.txt file, the robot checks for records starting with User-agent: and looks for either the substring Yandex (the case doesn't matter) or *. If the string User-agent: Yandex is detected, directives for User-agent: * are ignored. If neither User-agent: Yandex nor User-agent: * is found, the robot is considered to have unlimited access.
You can enter separate directives for the following Yandex robots:
- YandexBot: The main indexing robot.
- YandexDirect: Downloads information about the content on Yandex Advertising Network partner sites for selecting relevant ads. Interprets robots.txt in a special way.
- YandexDirectDyn: Generates dynamic banners. Interprets robots.txt in a special way.
- YandexMedia: Indexes multimedia data.
- YandexImages: Indexer of Yandex.Images.
- YaDirectFetcher: The Yandex.Direct robot. Interprets robots.txt in a special way.
- YandexBlogs: The blog search robot. Indexes posts and comments.
- YandexNews: The Yandex.News robot.
- YandexPagechecker: Semantic markup validator.
- YandexMetrika: The Yandex.Metrica robot.
- YandexMarket: The Yandex.Market robot.
- YandexCalendar: The Yandex.Calendar robot.
If there are directives for a specific robot, the User-agent: Yandex and User-agent: * directives aren't used for that robot.
User-agent: YandexBot # will be used only by the main indexing robot
Disallow: /*id=

User-agent: Yandex # will be used by all Yandex robots
Disallow: /*sid= # except for the main indexing robot

User-agent: * # won't be used by Yandex robots
Disallow: /cgi-bin
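The selection order described above can be sketched in Python. This is a minimal illustration, not the robot's actual implementation: the function name `select_group` and the dictionary layout are hypothetical, and tokens are compared as whole names (case-insensitively) rather than by substring search.

```python
def select_group(groups, robot_name):
    """Pick the User-agent group a given Yandex robot would read.

    `groups` maps a User-agent token (as written in robots.txt) to its rules.
    Returns the chosen token, or None when no group applies
    (the robot then has unrestricted access).
    """
    lowered = {token.lower(): token for token in groups}
    # 1. A group naming the specific robot wins over everything else.
    if robot_name.lower() in lowered:
        return lowered[robot_name.lower()]
    # 2. Otherwise fall back to the generic Yandex group, then to '*'.
    for fallback in ("yandex", "*"):
        if fallback in lowered:
            return lowered[fallback]
    return None


groups = {
    "YandexBot": ["Disallow: /*id="],
    "Yandex": ["Disallow: /*sid="],
    "*": ["Disallow: /cgi-bin"],
}
print(select_group(groups, "YandexBot"))     # YandexBot
print(select_group(groups, "YandexImages"))  # Yandex
```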
Disallow and Allow directives
To prohibit the robot from accessing your site or certain sections of it, use the Disallow directive.

User-agent: Yandex
Disallow: / # blocks access to the whole site

User-agent: Yandex
Disallow: /cgi-bin # blocks access to the pages starting with '/cgi-bin'

According to the standard, you should insert a blank line before every User-agent directive.
The # character designates commentary. Everything following this character, up to the first line break, is disregarded.
Use the Allow directive to allow the robot to access specific parts of the site or the entire site.

User-agent: Yandex
Allow: /cgi-bin
Disallow: /
# prohibits downloading anything except for the pages
# starting with '/cgi-bin'
The Allow and Disallow directives from the corresponding User-agent block are sorted according to URL prefix length (from shortest to longest) and applied in order. If several directives match a particular site page, the robot selects the last one in the sorted list. This way, the order of directives in the robots.txt file doesn't affect the way they are used by the robot. Examples:
# Source robots.txt:
User-agent: Yandex
Allow: /catalog
Disallow: /

# Sorted robots.txt:
User-agent: Yandex
Disallow: /
Allow: /catalog
# only allows downloading pages
# starting with '/catalog'

# Source robots.txt:
User-agent: Yandex
Allow: /
Allow: /catalog/auto
Disallow: /catalog

# Sorted robots.txt:
User-agent: Yandex
Allow: /
Disallow: /catalog
Allow: /catalog/auto
# prohibits downloading pages starting with '/catalog',
# but allows downloading pages starting with '/catalog/auto'.
If there is a conflict between two directives with prefixes of the same length, the Allow directive takes precedence.
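The sorting rule can be sketched in Python as follows. This is a simplified model that assumes plain path prefixes only (no * or $ characters); the function name `is_allowed` and the tuple layout are hypothetical.

```python
def is_allowed(rules, path):
    """Decide access for `path` from one User-agent block.

    `rules` is a list of (directive, prefix) pairs, where directive is
    'Allow' or 'Disallow'. Only plain prefixes are handled here.
    """
    # Sort by prefix length; on equal length Disallow sorts first,
    # so an Allow of the same length comes later and wins the tie.
    ordered = sorted(rules, key=lambda rule: (len(rule[1]), rule[0] == "Allow"))
    verdict = True  # no matching rule means access is allowed
    for directive, prefix in ordered:
        if path.startswith(prefix):
            verdict = directive == "Allow"  # the last match in sorted order wins
    return verdict


rules = [("Allow", "/"), ("Allow", "/catalog/auto"), ("Disallow", "/catalog")]
print(is_allowed(rules, "/catalog/auto/page"))  # True
print(is_allowed(rules, "/catalog/moto"))       # False
```

Because the rules are sorted before evaluation, passing them in a different order gives the same result.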
Allow and Disallow directives without parameters
If the directives don't contain parameters, the robot handles the data as follows:
User-agent: Yandex
Disallow: # same as Allow: /

User-agent: Yandex
Allow: # isn't taken into account by the robot
Using the special characters * and $
You can use the special characters * and $ to set regular expressions when specifying paths for the Allow and Disallow directives. The * character indicates any sequence of characters (or none). Examples:

User-agent: Yandex
Disallow: /cgi-bin/*.aspx # prohibits '/cgi-bin/example.aspx'
                          # and '/cgi-bin/private/test.aspx'
Disallow: /*private # prohibits both '/private'
                    # and '/cgi-bin/private'
The $ character
By default, the * character is appended to the end of every rule described in the robots.txt file. Example:

User-agent: Yandex
Disallow: /cgi-bin* # blocks access to pages starting with '/cgi-bin'
Disallow: /cgi-bin # the same

To cancel * at the end of the rule, use the $ character, for example:

User-agent: Yandex
Disallow: /example$ # prohibits '/example', but allows '/example.html'

User-agent: Yandex
Disallow: /example # prohibits both '/example' and '/example.html'

The $ character doesn't forbid * at the end, that is:

User-agent: Yandex
Disallow: /example$ # prohibits only '/example'
Disallow: /example*$ # exactly the same as 'Disallow: /example':
                     # prohibits both '/example.html' and '/example'
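One way to reason about * and $ is to translate a rule into an ordinary regular expression. The Python sketch below does that; the helper names `rule_to_regex` and `matches` are hypothetical, and the translation is an approximation of the behavior described above rather than the robot's own matcher.

```python
import re


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern with * and $ into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal part, then join the parts with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    # Without a trailing '$', the rule behaves as if it ended with '*'.
    return re.compile(body + ("" if anchored else ".*"))


def matches(pattern, path):
    """Return True if the whole path is covered by the pattern."""
    return rule_to_regex(pattern).fullmatch(path) is not None


print(matches("/example$", "/example.html"))   # False
print(matches("/example*$", "/example.html"))  # True
```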
The Sitemap directive
If you use a Sitemap file to describe your site's structure, indicate the path to the file as a parameter of the Sitemap directive (if you have multiple files, indicate all paths). Example:

User-agent: Yandex
Allow: /
sitemap: https://example.com/site_structure/my_sitemaps1.xml
sitemap: https://example.com/site_structure/my_sitemaps2.xml
The directive is intersectional, meaning it is used by the robot regardless of its location in robots.txt.
The robot remembers the path to your file, processes your data and uses the results during the next visit to your site.
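Because the Sitemap directive is read regardless of its position, a parser can collect it in a single pass without tracking User-agent groups. A minimal Python sketch, with the hypothetical helper name `sitemap_urls`:

```python
def sitemap_urls(robots_txt):
    """Collect every Sitemap value, wherever it appears in the file."""
    urls = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        name, sep, value = line.partition(":")    # split at the first colon only
        if sep and name.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


robots = """User-agent: Yandex
Allow: /
sitemap: https://example.com/site_structure/my_sitemaps1.xml
sitemap: https://example.com/site_structure/my_sitemaps2.xml
"""
print(sitemap_urls(robots))
```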
The Host directive
If your site has mirrors, a special mirror bot (Mozilla/5.0 (compatible; YandexBot/3.0; MirrorDetector; +http://yandex.com/bots)) detects them and forms a mirror group for your site. Only the main mirror is included in the search. You can indicate which site is the main mirror in the robots.txt file. The name of the main mirror should be listed in the Host directive.
The Host directive does not guarantee that the specified main mirror will be selected. However, the decision-making algorithm will assign it a high priority. Example:

# If https://www.main-mirror.com is your site's main mirror, then
# robots.txt for all your sites from the mirror group will look like this:
User-Agent: *
Disallow: /forum
Disallow: /cgi-bin
Host: https://www.main-mirror.com
To maintain compatibility with robots that may deviate from the standard when processing robots.txt, add the Host directive to the group that starts with the User-Agent record, right after the Disallow and Allow directives. The Host directive argument is the domain name with the port number (80 by default), separated by a colon.

# Example of a well-formed robots.txt file, where
# the Host directive will be taken into account during processing
User-Agent: *
Disallow:
Host: https://www.myhost.ru

The Host directive is intersectional and is used by the robot regardless of its location in robots.txt.
Only one Host directive is processed per robots.txt file. If several directives are indicated in the file, the robot will use the first one.

Host: myhost.ru # is used

User-agent: *
Disallow: /cgi-bin

User-agent: Yandex
Disallow: /cgi-bin
Host: https://www.myhost.ru # isn't used
The Host directive should contain:
- The HTTPS protocol if the mirror is available only over a secure channel. If you use the HTTP protocol, there is no need to indicate it.
- One valid domain name that conforms to RFC 952 and is not an IP address.
- The port number, if necessary.
An incorrectly formed Host directive is ignored.

# Examples of Host directives that will be ignored
Host: www.myhost-.com
Host: www.-myhost.com
Host: www.myhost.com:100000
Host: www.my_host.com
Host: .my-host.com:8000
Host: my-host.com.
Host: my..host.com
Host: www.myhost.com:8080/
Host: 22.214.171.124
Host: www.firsthost.ru,www.secondhost.com
Host: www.firsthost.ru www.secondhost.com
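The requirements above can be approximated with a small validator. The Python sketch below is an assumption-laden illustration: the name `is_valid_host` is hypothetical, labels are allowed to start with a digit (stricter readings of RFC 952 require a letter), and any name made only of numeric labels is treated as an IP address.

```python
import re

# One domain label: letters, digits, and inner hyphens only.
LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?")


def is_valid_host(value):
    """Rough check of a Host directive argument: [https://]domain[:port]."""
    if value.startswith("https://"):
        value = value[len("https://"):]
    host, sep, port = value.partition(":")
    if sep:
        # The port must be a plain decimal number in the valid TCP range.
        if not (port.isascii() and port.isdigit() and 1 <= int(port) <= 65535):
            return False
    labels = host.split(".")
    # Empty labels (leading, trailing, or doubled dots) and bad characters fail here.
    if not all(LABEL.fullmatch(label) for label in labels):
        return False
    # A name consisting only of numeric labels is treated as an IP address.
    return not all(label.isdigit() for label in labels)


print(is_valid_host("https://www.myhost.ru"))   # True
print(is_valid_host("www.myhost.com:100000"))   # False
```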
Example of the Host directive usage:

# domain.myhost.ru is the main mirror for
# www.domain.myhost.com, so the correct use of
# the Host directive is:
User-Agent: *
Disallow:
Host: domain.myhost.ru
The Crawl-delay directive
If the server is overloaded and it isn't possible to process downloading requests, use the Crawl-delay directive. You can specify the minimum interval (in seconds) for the search robot to wait after downloading one page, before starting to download another.
To maintain compatibility with robots that may deviate from the standard when processing robots.txt, add the Crawl-delay directive to the group that starts with the User-Agent entry, right after the Disallow and Allow directives.
The Yandex search robot supports fractional values for Crawl-delay, such as "0.5". This doesn't mean that the search robot will access your site every half a second, but it may speed up crawling of the site.

User-agent: Yandex
Crawl-delay: 2 # sets a 2-second timeout

User-agent: *
Disallow: /search
Crawl-delay: 4.5 # sets a 4.5-second timeout
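To illustrate how fractional values might be read per User-agent group, here is a small Python sketch. The function name `crawl_delays` is hypothetical, and the grouping logic is a simplification: a new group is assumed to start at the first User-agent line that follows any other directive.

```python
def crawl_delays(robots_txt):
    """Map each User-agent token to its Crawl-delay in seconds (as a float)."""
    delays = {}
    agents = []       # User-agent tokens of the group being read
    in_rules = False  # True once the current group has at least one directive
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        name, sep, value = line.partition(":")
        if not sep:
            continue  # blank or malformed line
        name, value = name.strip().lower(), value.strip()
        if name == "user-agent":
            if in_rules:  # the previous group has ended
                agents, in_rules = [], False
            agents.append(value)
        else:
            in_rules = True
            if name == "crawl-delay":
                for agent in agents:
                    delays[agent] = float(value)  # accepts "2" and "4.5" alike
    return delays


sample = """User-agent: Yandex
Crawl-delay: 2 # sets a 2-second timeout

User-agent: *
Disallow: /search
Crawl-delay: 4.5 # sets a 4.5-second timeout
"""
print(crawl_delays(sample))
```

A non-numeric value would raise ValueError in this sketch; a production parser would more likely skip such a line.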
The Clean-param directive
If your site page addresses contain dynamic parameters that don't affect the content (for example, identifiers of sessions, users, referrers, and so on), you can describe them using the Clean-param directive.
The Yandex robot uses this information to avoid reloading duplicate information. This improves the robot's efficiency and reduces the server load.
For example, your site contains the following pages:

www.example.com/some_dir/get_book.pl?ref=site_1&book_id=123
www.example.com/some_dir/get_book.pl?ref=site_2&book_id=123
www.example.com/some_dir/get_book.pl?ref=site_3&book_id=123

The ref parameter is only used to track which resource the request was sent from. It doesn't change the page content. All three URLs will display the same page with the book_id=123 book. Then, if you indicate the directive in the following way:

User-agent: Yandex
Disallow:
Clean-param: ref /some_dir/get_book.pl
the Yandex robot will converge all the page addresses into one. If a page without the ref parameter is available on the site, all other URLs are replaced with it after the robot indexes it. Other pages of your site will be crawled more often, because there will be no need to update the duplicate pages that differ only in the ref parameter.
The directive syntax is:
Clean-param: p0[&p1&p2&..&pn] [path]
In the first field, list the parameters that must be disregarded, separated by the & character.
In the second field, indicate the path prefix for the pages the rule should apply to.
The prefix can contain a regular expression in a format similar to the one used in the robots.txt file, but with some restrictions: you can only use the characters A-Za-z0-9.-/*_. The * character is interpreted in the same way as in robots.txt, and a * is always implicitly appended to the end of the prefix. For example:
Clean-param: s /forum/showthread.php
means that the s parameter is disregarded for all URLs that begin with /forum/showthread.php. The second field is optional; if it is omitted, the rule applies to all pages on the site.
The directive is case sensitive.
The maximum length of the rule is 500 characters.
For example:
Clean-param: abc /forum/showthread.php
Clean-param: sid&sort /forum/*.php
Clean-param: someTrash&otherTrash
# for addresses like:
www.example1.com/forum/showthread.php?s=681498b9648949605&t=8243
www.example1.com/forum/showthread.php?s=1e71c4427317a117a&t=8243
# robots.txt will contain the following:
User-agent: Yandex
Disallow:
Clean-param: s /forum/showthread.php

# for addresses like:
www.example2.com/index.php?page=1&sort=3a&sid=2564126ebdec301c607e5df
www.example2.com/index.php?page=1&sort=3a&sid=974017dcd170d6c4a5d76ae
# robots.txt will contain the following:
User-agent: Yandex
Disallow:
Clean-param: sid /index.php

# if there are several of these parameters:
www.example1.com/forum_old/showthread.php?s=681498605&t=8243&ref=1311
www.example1.com/forum_new/showthread.php?s=1e71c417a&t=8243&ref=9896
# robots.txt will contain the following:
User-agent: Yandex
Disallow:
Clean-param: s&ref /forum*/showthread.php

# if the parameter is used in multiple scripts:
www.example1.com/forum/showthread.php?s=681498b9648949605&t=8243
www.example1.com/forum/index.php?s=1e71c4427317a117a&t=8243
# robots.txt will contain the following:
User-agent: Yandex
Disallow:
Clean-param: s /forum/index.php
Clean-param: s /forum/showthread.php
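The effect of a Clean-param rule on a single URL can be sketched in Python. This is an illustration only: the function name `apply_clean_param` is hypothetical, the URLs are given an http:// scheme so that the standard library can parse them, and re-encoding the query with `urlencode` may normalize unusual characters differently from the original URL.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def apply_clean_param(url, params, path_prefix=""):
    """Drop the listed query parameters when the URL path matches the prefix.

    `params` uses the directive's own format, for example "s&ref".
    Parameter names are compared case-sensitively.
    """
    ignored = set(params.split("&"))
    parts = urlsplit(url)
    # '*' in the prefix matches any sequence; a trailing '*' is always implied.
    prefix_re = ".*".join(re.escape(p) for p in path_prefix.split("*")) + ".*"
    if not re.fullmatch(prefix_re, parts.path):
        return url  # the rule doesn't apply to this path
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


url = "http://www.example.com/some_dir/get_book.pl?ref=site_1&book_id=123"
print(apply_clean_param(url, "ref", "/some_dir/get_book.pl"))
# http://www.example.com/some_dir/get_book.pl?book_id=123
```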
Using Cyrillic characters
The use of the Cyrillic alphabet is not allowed in the robots.txt file or in HTTP server headers.
For domain names, use Punycode. For page addresses, use the same encoding as the one used for the current site structure.
Example of the robots.txt file:
# Incorrect:
User-agent: Yandex
Disallow: /корзина
Host: интернет-магазин.рф

# Correct:
User-agent: Yandex
Disallow: /%D0%BA%D0%BE%D1%80%D0%B7%D0%B8%D0%BD%D0%B0
Host: xn----8sbalhasbh9ahbi6a2ae.xn--p1ai
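Both conversions can be done with the Python standard library. The sketch below assumes the site uses UTF-8 for page addresses (the text above only says to match the site's existing encoding) and relies on Python's built-in `idna` codec; the helper names are hypothetical. The path `/корзина` is the decoded form of the percent-encoded path in the example above, while `пример.рф` is a made-up domain.

```python
from urllib.parse import quote


def encode_path(path):
    """Percent-encode a path as UTF-8, leaving '/' separators intact."""
    return quote(path, safe="/")


def encode_domain(domain):
    """Convert an internationalized domain name to its Punycode (ASCII) form."""
    return domain.encode("idna").decode("ascii")


print(encode_path("/корзина"))  # /%D0%BA%D0%BE%D1%80%D0%B7%D0%B8%D0%BD%D0%B0
print(encode_domain("пример.рф"))
```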
The Yandex robot supports only the robots.txt directives listed on this page. The file processing rules described above represent an extension of the basic standard. Other robots may interpret robots.txt contents in a different way.
The results when using the extended robots.txt format may differ from results that use the basic standard, particularly:
User-agent: Yandex
Allow: /
Disallow: /
# without extensions everything was prohibited because 'Allow: /' was ignored,
# with extensions supported, everything is allowed

User-agent: Yandex
Disallow: /private*html
# without extensions, '/private*html' was prohibited,
# with extensions supported, '/private*html',
# '/private/test.html', '/private/html/test.aspx', and so on are prohibited as well

User-agent: Yandex
Disallow: /private$
# without extensions supported, '/private$' and '/private$test', and so on were prohibited,
# with extensions supported, only '/private' is prohibited

User-agent: *
Disallow: /
User-agent: Yandex
Allow: /
# without extensions supported, because of the missing line break,
# 'User-agent: Yandex' would be ignored
# the result would be 'Disallow: /', but the Yandex robot
# parses strings based on the 'User-agent:' substring.
# In this case, the result for the Yandex robot is 'Allow: /'

User-agent: *
Disallow: /
# comment1...
# comment2...
# comment3...
User-agent: Yandex
Allow: /
# same as in the previous example (see above)
Examples using the extended robots.txt format:
User-agent: Yandex
Allow: /archive
Disallow: /
# allows everything that starts with '/archive'; the rest is prohibited

User-agent: Yandex
Allow: /obsolete/private/*.html$ # allows HTML files
                                 # in the '/obsolete/private/...' path
Disallow: /*.php$ # prohibits all '*.php' on the site
Disallow: /*/private/ # prohibits all subpaths containing
                      # '/private/', but the Allow above negates
                      # part of the prohibition
Disallow: /*/old/*.zip$ # prohibits all '*.zip' files containing
                        # '/old/' in the path

User-agent: Yandex
Disallow: /add.php?*user= # prohibits all 'add.php?' scripts with the 'user' option
When forming the robots.txt file, you should keep in mind that the robot places a reasonable limit on its size. If the file size exceeds 32 KB, the robot assumes it allows everything, meaning it is interpreted the same way as a User-agent: Yandex block with an empty Disallow directive.
Similarly, robots.txt is assumed to allow everything if it couldn't be downloaded (for example, if HTTP headers are not set properly or a
404 Not found status is returned).
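The fallback behavior can be summarized in a short Python sketch. It is a simplified model under stated assumptions: the name `effective_robots` is hypothetical, 32 KB is taken to mean 32 * 1024 bytes, and "not a text file" is approximated by a failed UTF-8 decode.

```python
MAX_ROBOTS_SIZE = 32 * 1024  # assumed byte value of the 32 KB limit


def effective_robots(status, body):
    """Return the rules text to apply, or '' when everything is allowed.

    `status` is the HTTP status code of the robots.txt request and
    `body` is the raw response body as bytes.
    """
    if status != 200:
        return ""  # missing or unavailable file: no restrictions
    if len(body) > MAX_ROBOTS_SIZE:
        return ""  # oversized file: treated as allowing everything
    try:
        return body.decode("utf-8")
    except UnicodeDecodeError:
        return ""  # not readable as text: no restrictions


print(repr(effective_robots(404, b"User-agent: *\nDisallow: /")))  # ''
```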
A number of Yandex robots download web documents for purposes other than indexing. To avoid being unintentionally blocked by site owners, they may ignore the robots.txt directives designed for random robots (User-agent: *).
In addition, robots may ignore some robots.txt restrictions for certain sites if there is an agreement between “Yandex” and the owners of those sites.
Yandex robots that don't follow common disallow directives in robots.txt:
- YaDirectFetcher downloads ad landing pages to check their availability and content. This is needed for placing ads in the Yandex search results and on partner sites. When crawling a site, the robot does not use the robots.txt file and ignores the directives set for it.
- YandexCalendar regularly downloads calendar files by users' requests. These files are often located in directories prohibited from indexing.
- YandexDirect downloads information about the content of Yandex Advertising network partner sites to identify their topic categories to match relevant advertising.
- YandexDirectDyn is the robot that generates dynamic banners.
- YandexMobileBot downloads documents to determine if their layout is suitable for mobile devices.
- YandexAccessibilityBot downloads pages to check their accessibility for users.
- YandexScreenshotBot takes a screenshot of a page.
- YandexMetrika is the Yandex.Metrica robot.
- YandexVideoParser is the Yandex video indexer.
- YandexSearchShop regularly downloads product catalogs in YML files by users' requests. These files are often placed in directories prohibited for indexing.
To prevent this behavior, you can restrict access for these robots to some pages or the whole site using the robots.txt directives, for example:
User-agent: YandexCalendar
Disallow: /

User-agent: YandexMobileBot
Disallow: /private/*.txt$