robots.txt
web development
Wednesday, 31 March 2021
A robots.txt file tells search engine crawlers which posts, pages, or files the crawlers can or cannot request from your website. It is used mainly to avoid overloading your website with requests; it is not a mechanism for keeping a web page out of Google. To keep a web page out of Google, use a noindex directive or password-protect the page.
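For the noindex route, the directive can be placed in the page's HTML as <meta name="robots" content="noindex">, or sent as an X-Robots-Tag response header. Below is a minimal sketch of the header approach using Python's built-in http.server; the handler name, port, and page body are placeholders, not part of the original post.

from http.server import BaseHTTPRequestHandler, HTTPServer

class NoIndexHandler(BaseHTTPRequestHandler):
    """Serves a page that crawlers may fetch but should not index."""

    def do_GET(self):
        self.send_response(200)
        # X-Robots-Tag is the HTTP equivalent of
        # <meta name="robots" content="noindex">.
        self.send_header("X-Robots-Tag", "noindex")
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(b"<html><body>Not for search results</body></html>")

if __name__ == "__main__":
    # Port 8000 is a placeholder for this sketch.
    HTTPServer(("", 8000), NoIndexHandler).serve_forever()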
What is robots.txt?
robots.txt is used primarily to manage search engine crawler traffic to your website. Search engines use programs called ‘robots’ or ‘spiders’ to crawl and index the posts and pages on a website; these robots are also known as "user-agents".
Sometimes crawlers make their way onto pages that website owners did not want indexed, for example an under-construction website or a private site.
The robots.txt file consists of one or more rules. Each rule blocks (or allows) a given crawler's access to a specified file path on that site.
Here is a simple robots.txt example:
# Mediapartners-Google (the AdSense crawler) may fetch every page;
# an empty Disallow means nothing is blocked for it.
User-agent: Mediapartners-Google
Disallow:

# All other crawlers: block search and category pages, allow the rest.
User-agent: *
Disallow: /search
Disallow: /category
Allow: /

# "siteurl" and "weburl" are placeholders for your blog's address.
Sitemap: siteurl/atom.xml?redirect=false&start-index=1&max-results=500
Sitemap: weburl/sitemap.xml
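To check how a well-behaved crawler would read these rules, here is a small sketch using Python's standard urllib.robotparser module; the example.com URLs are placeholders standing in for your blog's address.

from urllib.robotparser import RobotFileParser

# The rules from the example above. "Disallow:" with no path means
# Mediapartners-Google may fetch everything.
rules = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Disallow: /category
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Generic crawlers are blocked from search and category paths...
print(rp.can_fetch("*", "https://example.com/search?q=python"))      # False
print(rp.can_fetch("*", "https://example.com/category/web"))         # False
# ...but may fetch ordinary posts.
print(rp.can_fetch("*", "https://example.com/2021/03/post.html"))    # True
# The AdSense crawler is allowed everywhere.
print(rp.can_fetch("Mediapartners-Google",
                   "https://example.com/search?q=python"))           # True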