adrian holovaty


August 20, 2002, 8:50 PM ET

How news sites keep robots away

After today's lunchtime links entry and the reader comments it prompted, I got to thinking about news sites' robots.txt files. Are other sites as robot-hostile as nytimes.com? I took a peek at a few news sites' files -- after all, if they're accessible to robots, they're accessible to humans -- and here are a few observations, along with links to the files themselves.
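For anyone who wants to do the same kind of peeking, here's a rough sketch in Python that summarizes which paths a robots.txt file closes off for each robot. The sample file, the function names and the "blanket ban" test are all my own assumptions for illustration, not a copy of any site's actual file, and the parsing is deliberately simplistic.

```python
# Hypothetical robots.txt contents, made up for illustration.
SAMPLE = """\
# sample file
User-agent: *
Disallow: /archive/
Disallow: /search/

User-agent: BadBot
Disallow: /
"""

def disallowed_paths(robots_txt: str) -> dict[str, list[str]]:
    """Map each user-agent named in the file to its Disallow paths."""
    rules: dict[str, list[str]] = {}
    agents: list[str] = []   # user-agents the current record applies to
    in_rules = False         # True once the current record has a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:      # a User-agent line after rules starts a new record
                agents = []
                in_rules = False
            agents.append(value)
            rules.setdefault(value, [])
        elif field == "disallow":
            in_rules = True
            if value:         # an empty Disallow bans nothing
                for agent in agents:
                    rules[agent].append(value)
    return rules

def bans_everything(robots_txt: str, agent: str = "*") -> bool:
    """Rough heuristic: does the file contain 'Disallow: /' for this agent?"""
    return "/" in disallowed_paths(robots_txt).get(agent, [])

print(disallowed_paths(SAMPLE))
print(bans_everything(SAMPLE), bans_everything(SAMPLE, "BadBot"))
```

The heuristic only catches an outright `Disallow: /`. A site that lists nearly every directory one by one would slip past it, so treat the result as a first pass.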

Robot-hostile sites

Trends

Random observations


August 20, 2002, 12:51 PM ET

Tuesday's lunchtime links

Evolt.org: Describing Document Text for Accessibility -- "A key focus of accessible web site design is providing equivalent alternatives to auditory and visual content."

Joe Gregorio points out that nytimes.com's robots.txt file (the file that tells robots which parts of a Web site they're allowed to index) isn't very friendly. Namely, it bans robots from just about every file on the server. (More info about robots.txt.) Now that he mentions it, I've never come across an NYT story via a Google search. What a hostile policy!
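To make that concrete, here's a minimal sketch using Python's standard-library `urllib.robotparser`. The rules below are a made-up example of a blanket ban, not the actual contents of nytimes.com's file, and the robot name and URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that bans every robot from every path.
# This is an illustration, not a copy of any real site's file.
BLANKET_BAN = """\
User-agent: *
Disallow: /
"""

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# "ExampleBot" and the URL are placeholders.
print(is_allowed(BLANKET_BAN, "ExampleBot", "http://www.example.com/news/story.html"))
```

A well-behaved robot that reads rules like these skips the whole site, which would explain stories never turning up in search results.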



