The internet is built on links that lead from one page to another or from one site to another. The Yandex search robot follows these links and analyzes them. If no document on your site links to a particular page, the Yandex robot may never learn about that page, and it will be missing from the search. This is why it is important to monitor how your pages are linked together. Here are a few tips on organizing the site structure:
Keep the link structure of the site clear. Each document should belong to a specific section. Each document must be reachable through an ordinary link marked up with the <a> tag in the HTML code of the page: <a href=...>...</a>. The time the Yandex robot needs to index a page depends, among other factors, on the nesting depth of that page. The deeper the page is nested, the longer it may take to include it in the index.
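One way to check nesting depth and find pages that nothing links to is a breadth-first walk over the site's own links. The sketch below is only an illustration: the SITE dictionary, its URLs, and the helper names are hypothetical, and a real check would fetch pages and resolve relative links instead of reading them from memory.

```python
from collections import deque
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from ordinary <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def click_depths(pages, start="/"):
    """Breadth-first search: minimum number of clicks from start to each page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        for link in extract_links(pages.get(url, "")):
            if link in pages and link not in depths:
                depths[link] = depths[url] + 1
                queue.append(link)
    return depths


# Hypothetical site: URL -> HTML body.
SITE = {
    "/": '<a href="/catalog/">Catalog</a> <a href="/about/">About</a>',
    "/catalog/": '<a href="/catalog/phones/">Phones</a>',
    "/catalog/phones/": '<a href="/">Home</a>',
    "/about/": "<p>No outgoing links.</p>",
    "/old-promo/": '<a href="/">Home</a>',  # no page links here
}

depths = click_depths(SITE)
orphans = sorted(set(SITE) - set(depths))
print(depths)
print(orphans)
```

Pages with a large depth value are candidates for a shortcut link from a section page, and anything listed in orphans cannot be reached by following links from the home page.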
When you link site documents, take one more factor into account: most often, the home page serves as the entry point to your site, because it is much easier for people to remember the site name (domain name) than an internal page with a complicated URL. Site navigation should therefore let users find documents quickly and easily, so that they don't fail to find the information and leave the site disappointed.
Use a Sitemap file. For large projects containing many pages, provide a Sitemap file. You can submit it in Yandex.Webmaster or specify a link to it in the robots.txt file. This helps the search robot index and analyze the documents on your site.
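As an illustration, a basic Sitemap file in the sitemaps.org XML format can be generated with Python's standard library. This is a minimal sketch: the build_sitemap helper and the example.com URLs are assumptions, and optional elements such as lastmod are left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a Sitemap XML document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical page list.
sitemap_xml = build_sitemap([
    "https://example.com/",
    "https://example.com/catalog/phones/",
])
print(sitemap_xml)
```

The resulting text would typically be saved as a file such as sitemap.xml on the site and then referenced from robots.txt or submitted in Yandex.Webmaster.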
Restrict indexing of technical information. Numerous duplicate pages, site search results, visit statistics, and similar pages can consume the robot's resources and prevent it from indexing the site's main content. Such pages have no value for the search engine because they don't give users any unique information in the search results. Block such pages from indexing in the robots.txt file. If you don't exclude them, technical pages may be indexed often because they are regularly added and updated, while pages with important information might go unnoticed by the robot.
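Before publishing robots.txt rules, it can help to check them against a few sample URLs. The sketch below uses Python's urllib.robotparser for that. The paths /search/, /stats/, and /print/ and the example.com domain are hypothetical, and Python's parser only approximates how a real search robot interprets the file, so treat the result as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block technical sections, point to the Sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /stats/
Disallow: /print/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path, agent="Yandex"):
    """True if the rules above let the given user agent fetch the path."""
    return parser.can_fetch(agent, "https://example.com" + path)


for path in ["/catalog/phones/", "/search/?q=phones", "/stats/2024/"]:
    print(path, is_allowed(path))

# site_maps() requires Python 3.8 or later.
print(parser.site_maps())
```

Only simple prefix rules are used here because they behave the same way in most parsers.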
Each page should have a unique URL. The URL should give a clue about the page content. Using transliteration in page URLs helps the robot understand what the page may be about. For example, the URL http://download.yandex.ru/company/experience/Baitin_Korrekciya%20gramotnosti.pdf gives the search robot a lot of information about the document: it can be downloaded, its format is probably PDF, it is probably relevant for the query “correcting grammar” (in Russian), and so on.
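A transliterated URL fragment can be produced from a Russian title with a simple character table. The mapping below is one possible scheme chosen for illustration, not a standard that the robot requires, and the slugify helper and the hyphen separator are assumptions.

```python
import re

# One possible Cyrillic-to-Latin mapping (an assumption, not an official scheme).
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def slugify(title):
    """Transliterate a title and join the words with hyphens."""
    latin = "".join(TRANSLIT.get(ch, ch) for ch in title.lower())
    # Collapse every run of other characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", latin).strip("-")


print(slugify("Коррекция грамотности"))  # korrekciya-gramotnosti
```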
Provide text links to other sections of the site to give the robot more information about their content.
Make sure that symlinks are correct so that the URL doesn't grow infinitely when the user navigates the site. Pages with multiple repeated tokens in the URL, such as example.com/vasya/vasya/vasya/vasya/, might not be indexed.
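URLs with repeated tokens can be spotted in a crawl log or link list by counting how many times the same path segment appears in a row. This is a rough sketch: the function name is hypothetical, and the number of repeats that actually prevents indexing is not stated, so any threshold you apply is your own choice.

```python
from urllib.parse import urlsplit


def max_consecutive_repeat(url):
    """Length of the longest run of identical adjacent path segments."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    longest = run = 0
    previous = None
    for segment in segments:
        run = run + 1 if segment == previous else 1
        longest = max(longest, run)
        previous = segment
    return longest


print(max_consecutive_repeat("https://example.com/vasya/vasya/vasya/vasya/"))  # 4
print(max_consecutive_repeat("https://example.com/catalog/phones/"))  # 1
```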
- Use the robots.txt file to prevent indexing of pages not intended for users.
- Use the same encoding for the site pages and Cyrillic URLs. When the robot finds a Cyrillic link like href="/корзина" on a page with UTF-8 encoding, it saves the link in this encoding. This means the linked page should be available at "/%D0%BA%D0%BE%D1%80%D0%B7%D0%B8%D0%BD%D0%B0".
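The effect of the page encoding on a Cyrillic link can be reproduced with urllib.parse.quote, which percent-encodes the bytes of a path in a chosen encoding. The sketch below compares UTF-8 with Windows-1251, which is used here only as an example of a different encoding.

```python
from urllib.parse import quote, unquote

path = "/корзина"

# Percent-encode the path as UTF-8 bytes (the default for quote).
utf8_path = quote(path)
print(utf8_path)  # /%D0%BA%D0%BE%D1%80%D0%B7%D0%B8%D0%BD%D0%B0

# The same link encoded as Windows-1251 produces a different URL.
cp1251_path = quote(path, encoding="cp1251")
print(cp1251_path)

# Decoding succeeds only with the encoding that was used to build the URL.
print(unquote(utf8_path) == path)
```

If the server expects one form and the page links produce the other, the robot requests an address that may not resolve, which is why the two encodings should match.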