Google Advises Not To Update Robots.txt File Throughout The Day

Google Search Advocate John Mueller has cleared up confusion about updating a robots.txt file multiple times a day, saying the practice makes no difference to Googlebot because the file can be cached for up to 24 hours.
Robots.txt file recap
A robots.txt file contains directives you add to your site or server to tell Google how to crawl it, such as blocking access to particular URLs, keeping media files out of search results, or preventing crawlers from reaching sensitive files.
Google’s explanation:
“A robots.txt file tells search engine crawlers which URLs the crawler can access on your site. This is used mainly to avoid overloading your site with requests; it is not a mechanism for keeping a web page out of Google.”
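For illustration only, a minimal robots.txt along those lines might look like the following. The user-agents are real Google crawler names, but the paths are hypothetical examples, not recommendations for any particular site:

# Block all crawlers from a hypothetical private directory
User-agent: *
Disallow: /private/

# Keep images out of Google Images by blocking Google's image crawler
User-agent: Googlebot-Image
Disallow: /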
Blocking Googlebot to stop server overloading
Mueller was asked on Bluesky about blocking Googlebot from crawling a website at certain times of day to avoid overloading the server. He said it's a bad idea because robots.txt can be cached for up to 24 hours, so Google won't necessarily register that you don't want a page crawled at 10:00 am but do want it crawled at 4:00 pm.
The question:
“One of our technicians asked if they could upload a robots.txt file in the morning to block Googlebot and another one in the afternoon to allow it to crawl, as the website is extensive, and they thought it might overload the server. Do you think this would be a good practice?”
“(Obviously, the crawl rate of Googlebot adapts to how well the server responds, but I found it an interesting question to ask you) Thanks!”
John’s response to the question:
“It’s a bad idea because robots.txt can be cached up to 24 hours (developers.google.com/search/docs/… ). We don’t recommend dynamically changing your robots.txt file like this over the course of a day. Use 503/429 when crawling is too much instead.”
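To make that suggestion concrete, here is a minimal sketch of a server answering with a 503 status and a Retry-After header when it is under too much load; Googlebot treats 503 and 429 responses as signals to slow down and retry later. This assumes a standalone Python http.server demo rather than any particular production stack, and the overload check is a placeholder.

from http.server import BaseHTTPRequestHandler, HTTPServer

OVERLOADED = True  # placeholder; in practice this would come from real load metrics


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if OVERLOADED:
            # 503 tells crawlers the server is temporarily unavailable;
            # 429 ("Too Many Requests") works similarly for rate limiting.
            self.send_response(503)
            self.send_header("Retry-After", "3600")  # ask the crawler to wait (seconds)
            self.end_headers()
            return
        # Normal response when the server has capacity
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(b"<html><body>OK</body></html>")


if __name__ == "__main__":
    HTTPServer(("", 8000), Handler).serve_forever()

Unlike swapping robots.txt files during the day, status codes are evaluated on every request, so the crawl rate adjusts as soon as the server starts (or stops) returning them.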