How news sites keep robots away

Written by Adrian Holovaty on August 20, 2002

After today's lunchtime links entry and the reader comments it brought about, I got to thinking about news sites' robots.txt files. Are other sites as robot-hostile as nytimes.com? I took a peek at a few news sites' files -- after all, if they're accessible to robots, they're accessible to humans -- and here are a few observations, along with links to the files themselves.
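For anyone who wants to repeat the check, here is a minimal sketch of testing whether a robots.txt file shuts out a given robot, using Python's standard urllib.robotparser module. The two sample files, the ExampleBot user-agent name, and the is_robot_hostile helper are all made up for illustration; none of them is copied from the sites discussed here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents (assumptions, not taken from any real site).
HOSTILE_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

OPEN_ROBOTS_TXT = """\
User-agent: *
Disallow:
"""


def is_robot_hostile(robots_txt: str, user_agent: str = "ExampleBot", path: str = "/") -> bool:
    """Return True if the given rules forbid user_agent from fetching path."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print("hostile sample blocks ExampleBot:", is_robot_hostile(HOSTILE_ROBOTS_TXT))
    print("open sample blocks ExampleBot:", is_robot_hostile(OPEN_ROBOTS_TXT))
```

To try this against a live site, the same parser can instead be pointed at a URL with set_url() and read(), though the answer depends on whatever the site is serving at that moment.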

Robot-hostile sites

Trends

Random observations

Comments

Posted by Jay Small on August 21, 2002 at 9:23 a.m.:

Hi, Adrian. Projo.com happens to be one of the sites I oversee for Belo Interactive, so I was surprised to see you got an error message on it with Mozilla. I'm not able to repeat that error, at least not on the first several pages I browsed this morning (either with Mozilla or Netscape 7 preview).

But I'd certainly like to investigate it -- wondering if you can provide any details that would help track it down. As for the robots.txt files on BI sites, you're right. We're still trying to figure out how best to combine a robots.txt file with the user registration protocols we run. As I commented yesterday, it may be handy to negotiate a back door for key searchers to index articles without running into registration.

Posted by Adrian Holovaty on August 21, 2002 at 9:40 a.m.:

No, no...Projo was fine...I meant the main Belo Interactive site. :) Sorry for the confusion; I should've been more clear. I've reworded the blog entry.

Posted by Jay Small on August 21, 2002 at 11:30 a.m.:

Whew! I feel better. Of course, we really ought to get the corporate site straight, too, though it's managed completely differently than our local sites. One small step at a time! <g>

Posted by Anil on August 22, 2002 at 1:55 a.m.:

We changed servers at the Voice during the days after the attacks in September, pushing up a scheduled move, which is why our logging system changed then. One of our servers will show the old logs at the URL you linked to, and one of ours will show current stats, so some of your visitors might not see what you've described.

Posted by Adrian on August 22, 2002 at 9:39 a.m.:

Ah, I see. Sure enough, I clicked on the link again and saw the current stats. Thanks for the insight.

Posted by KEvin on October 31, 2002 at 11:45 p.m.:

I would recommend www.imhosted.com - I have used them for a year and it's great.
