Anatoly Lubarsky

MSSQL, .NET, Design. Life and Music

Tailrank Bot Does not Respect robots.txt and Autodiscovery

Tailrank is a service that spiders thousands of weblogs looking for heavily linked and discussed links and citations, and stores them in its index. Once Tailrank discovers something important, it promotes it to the main website.


Tailrank has a bot that identifies itself with the following User-Agent string:


Mozilla/5.0+(X11;+U;+Linux+i686;+en-US;+rv:1.2.1;+aggregator: Tailrank;+http://tailrank.com/robot)+Gecko/20021130


The Tailrank bot is rather stupid: it keeps requesting feeds on blogs.x2line.com that do not exist, such as "/al/atom.xml" or "/atom.xml". Each of them got up to 600 requests this month alone, despite the HTTP 404 (Not Found) response.
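Avoiding this does not take much. Below is a minimal sketch of a crawler remembering which feed URLs answered 404 and not asking again for a while. The `DeadUrlCache` class and the one-week retry window are hypothetical choices for illustration, not anything Tailrank actually uses.

```python
import time
from typing import Dict, Optional


class DeadUrlCache:
    """Hypothetical helper: remembers URLs that answered 404/410 and
    suppresses refetches until a retry window has passed."""

    def __init__(self, retry_after_seconds: float = 7 * 24 * 3600) -> None:
        # Assumed retry window (one week by default); purely illustrative.
        self.retry_after = retry_after_seconds
        # url -> timestamp of the most recent 404/410 response
        self._dead: Dict[str, float] = {}

    def record(self, url: str, status: int, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        if status in (404, 410):  # Not Found / Gone
            self._dead[url] = now
        else:
            # Any other status clears the "dead" marker for this URL.
            self._dead.pop(url, None)

    def should_fetch(self, url: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        last_404 = self._dead.get(url)
        # Fetch if the URL never 404'd, or if the retry window has expired.
        return last_404 is None or now - last_404 >= self.retry_after


if __name__ == "__main__":
    cache = DeadUrlCache()
    cache.record("http://blogs.x2line.com/atom.xml", 404, now=0)
    print(cache.should_fetch("http://blogs.x2line.com/atom.xml", now=3600))  # False
```

With something like this, a missing feed costs a few requests per week instead of hundreds per month.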


The Tailrank bot:


  • Does not respect the HTTP 404 status response.
  • Does not know how to use feed autodiscovery (my blog advertises both Atom and RSS feeds), or maybe its data is out of sync?
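Feed autodiscovery just means reading the `<link rel="alternate" type="application/rss+xml">` (or `application/atom+xml`) elements in a page's HTML and following their `href`. Here is a minimal sketch using only Python's standard library; the function and class names, and the example.com URLs in the test markup, are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Tuple
from urllib.parse import urljoin

# MIME types that mark a <link> as an RSS or Atom feed.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects <link rel="alternate"> elements that advertise feeds."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.feeds: List[Tuple[str, str]] = []  # (mime type, absolute URL)

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "link":
            return
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()
        mime = a.get("type", "").lower().strip()
        href = a.get("href")
        if "alternate" in rels and mime in FEED_TYPES and href:
            # Resolve relative hrefs against the page URL.
            self.feeds.append((mime, urljoin(self.base_url, href)))


def discover_feeds(html: str, base_url: str) -> List[Tuple[str, str]]:
    """Return every feed the page advertises, in document order."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.feeds
```

A bot that did this on each visit would request only the feed URLs the blog actually advertises, instead of guessing paths like "/atom.xml".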

Is anyone from Tailrank reading this (or maybe spidering it :))? Please fix this.


Update: The Tailrank bot also does not respect robots.txt; it never even requests it.
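Honoring robots.txt is also cheap. Python ships `urllib.robotparser.RobotFileParser` for exactly this. The sketch below parses robots.txt text the crawler has already downloaded; the helper names and the robots.txt rules are hypothetical examples, not this blog's actual file.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def robots_url_for(page_url: str) -> str:
    """robots.txt always lives at the root of the host."""
    return urljoin(page_url, "/robots.txt")


def build_robot_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been fetched."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


# Hypothetical robots.txt content, used only for illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: Tailrank
Disallow: /al/

User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    rules = build_robot_rules(SAMPLE_ROBOTS_TXT)
    # Pass the bot's product token, not the whole Mozilla/5.0 string:
    # robotparser only compares the part before the first "/".
    print(rules.can_fetch("Tailrank", "http://example.com/al/atom.xml"))  # False
    print(rules.can_fetch("Tailrank", "http://example.com/index.html"))   # True
    print(robots_url_for("http://example.com/al/atom.xml"))
```

`RobotFileParser` can also download the file itself via `set_url()` and `read()`; parsing pre-fetched text just keeps the example offline.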



Saturday, December 30, 2006 12:59 AM
