26

Web crawler stupidity

From my access logs:

65.55.3.134 - - [15/Oct/2010:09:49:13 +0000] "GET /robots.txt HTTP/1.1" 404 270 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"
65.55.3.134 - - [15/Oct/2010:09:49:14 +0000] "GET / HTTP/1.1" 404 270 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"
65.55.3.134 - - [15/Oct/2010:09:49:16 +0000] "GET /robots.txt HTTP/1.1" 404 270 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"
65.55.3.134 - - [15/Oct/2010:09:49:17 +0000] "GET / HTTP/1.1" 404 270 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"
65.55.3.134 - - [15/Oct/2010:09:49:20 +0000] "GET / HTTP/1.1" 404 270 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"
65.55.3.134 - - [15/Oct/2010:09:49:22 +0000] "GET / HTTP/1.1" 404 270 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"
65.55.3.134 - - [15/Oct/2010:09:49:25 +0000] "GET / HTTP/1.1" 404 270 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"
65.55.3.134 - - [15/Oct/2010:09:49:28 +0000] "GET / HTTP/1.1" 404 270 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"
65.55.3.134 - - [15/Oct/2010:09:49:31 +0000] "GET / HTTP/1.1" 404 270 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"
65.55.3.134 - - [15/Oct/2010:09:49:33 +0000] "GET / HTTP/1.1" 404 270 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"

65.55.3.134 - - [20/Oct/2010:17:04:13 +0000] "GET /robots.txt HTTP/1.1" 200 76 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"
65.55.3.134 - - [20/Oct/2010:17:04:13 +0000] "GET / HTTP/1.1" 404 270 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"
65.55.3.134 - - [20/Oct/2010:17:05:15 +0000] "GET /robots.txt HTTP/1.1" 200 76 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"
65.55.3.134 - - [20/Oct/2010:17:05:17 +0000] "GET / HTTP/1.1" 404 270 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"
65.55.3.134 - - [20/Oct/2010:17:05:19 +0000] "GET / HTTP/1.1" 404 270 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"
65.55.3.134 - - [20/Oct/2010:17:05:22 +0000] "GET / HTTP/1.1" 404 270 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"
65.55.3.134 - - [20/Oct/2010:17:05:28 +0000] "GET / HTTP/1.1" 404 270 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"
65.55.3.134 - - [20/Oct/2010:17:05:30 +0000] "GET / HTTP/1.1" 404 270 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"
65.55.3.134 - - [20/Oct/2010:17:05:33 +0000] "GET / HTTP/1.1" 404 270 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"
65.55.3.134 - - [20/Oct/2010:17:05:38 +0000] "GET / HTTP/1.1" 404 270 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"

65.55.3.134 - - [25/Oct/2010:19:22:24 +0000] "GET /robots.txt HTTP/1.1" 200 76 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"
65.55.3.134 - - [25/Oct/2010:19:22:25 +0000] "GET / HTTP/1.1" 404 270 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"
65.55.3.134 - - [25/Oct/2010:19:22:27 +0000] "GET /robots.txt HTTP/1.1" 200 76 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"
65.55.3.134 - - [25/Oct/2010:19:22:28 +0000] "GET / HTTP/1.1" 404 270 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"
65.55.3.134 - - [25/Oct/2010:19:22:31 +0000] "GET / HTTP/1.1" 404 270 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"
65.55.3.134 - - [25/Oct/2010:19:22:34 +0000] "GET / HTTP/1.1" 404 270 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"
65.55.3.134 - - [25/Oct/2010:19:22:37 +0000] "GET / HTTP/1.1" 404 270 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"
65.55.3.134 - - [25/Oct/2010:19:22:39 +0000] "GET / HTTP/1.1" 404 270 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"
65.55.3.134 - - [25/Oct/2010:19:22:42 +0000] "GET / HTTP/1.1" 404 270 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"
65.55.3.134 - - [25/Oct/2010:19:22:44 +0000] "GET / HTTP/1.1" 404 270 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"

I don't know why they think they need to try eight times within about 20 seconds to fetch a page that, as they are told every single time, doesn't exist.
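For anyone who wants to check the same thing in their own logs, here is a minimal sketch that counts repeated 404s per client, day and path. It assumes log lines in the combined format shown above; the names `LOG_RE`, `parse` and `repeated_404s` are made up for this example, and grouping by calendar day is just a crude stand-in for "one burst".

```python
import re
from datetime import datetime

# Matches the start of a combined-format access log line (assumed format,
# based on the excerpts above): ip, two ignored fields, timestamp, request, status, size.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

def parse(line):
    """Return (ip, timestamp, path, status), or None if the line does not match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    # %b expects English month abbreviations, which holds in the default C locale.
    ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("ip"), ts, m.group("path"), int(m.group("status"))

def repeated_404s(lines):
    """Map (ip, date, path) to (count, first_seen, last_seen) for 404 responses."""
    stats = {}
    for line in lines:
        rec = parse(line)
        if rec is None or rec[3] != 404:
            continue
        ip, ts, path, _status = rec
        key = (ip, ts.date(), path)
        count, first, last = stats.get(key, (0, ts, ts))
        stats[key] = (count + 1, min(first, ts), max(last, ts))
    return stats

# Possible usage ("access.log" is a placeholder path):
# with open("access.log") as fh:
#     for (ip, day, path), (n, first, last) in repeated_404s(fh).items():
#         print(ip, day, path, n, (last - first).total_seconds())
```

Fed the ten lines from 15 October above, this should report eight 404s for `/` spread over 19 seconds, plus two for `/robots.txt`.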

Plus, what do they think will have changed in the robots.txt in the two or three seconds since they last checked?
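What I would have expected instead is for a crawler to remember what it was just told, for robots.txt and for 404s alike. The following is only a sketch of that idea, not a description of how msnbot or any other real crawler works; the class `FetchCache`, its `ttl_seconds` parameter and the `fetch` callback are all invented for illustration.

```python
import time

class FetchCache:
    """Remember recent fetch results so a URL is not re-requested within its TTL.

    Error responses such as 404 are cached too, so a missing page is not
    hammered repeatedly.
    """

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so the cache can be tested without sleeping
        self._entries = {}          # url -> (expiry_time, status, body)

    def get(self, url, fetch):
        """Return (status, body) for url, calling fetch(url) only on a miss or after expiry."""
        now = self.clock()
        entry = self._entries.get(url)
        if entry is not None and entry[0] > now:
            return entry[1], entry[2]
        status, body = fetch(url)
        self._entries[url] = (now + self.ttl, status, body)
        return status, body
```

With something like `FetchCache(ttl_seconds=3600)` in front of the fetcher (the one-hour value is an arbitrary assumption), eight requests for the same missing page within twenty seconds would turn into one request on the wire.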

What a waste of their resources and mine…