Google News smarter than ever

Written by Adrian Holovaty on September 19, 2002

The geniuses at Google have debuted a much-improved Google News site. It's still in beta, according to the about page, but it's nothing short of an incredible technological feat already. Get this:

The headlines that appear on our homepage are selected entirely by a mathematical algorithm, based on how and where the stories appear elsewhere on the web. There are no human editors at Google selecting or grouping the headlines and no individual decides which stories get top placement.

Some kinks still need to be worked out -- since when is the Austin (Texas) American-Statesman an authority on Bush's dealings with Saddam Hussein? -- but I really foresee this concept taking off when they fix a few things. Some suggestions:

  • Don't link to duplicate wire stories. Right now, many of the story links are the same Associated Press or Reuters stories, repurposed on different sites.
  • Create a "hierarchy of trust" that defines which news organizations are more accountable than others -- or let users specify their own hierarchies. This would eliminate the American-Statesman oddity mentioned above.
  • In the long term, dynamically generate a full-blown news article for each story that has more than one source, by combining information from several news sources to create one "definitive" piece.
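
The first two suggestions can be sketched in a few lines of Python. This is only a rough illustration, not how Google does it: the `Story` fields, the `TRUST` scores and the `dedupe_and_rank` helper are all made up for the example, and comparing normalized body text is an assumed (and fairly crude) way to spot reposted wire copy.

```python
import re
from dataclasses import dataclass


@dataclass
class Story:
    source: str    # name of the news organization
    headline: str
    body: str


# Hypothetical trust scores (higher = more trusted). A user could supply
# a personal table instead of this default one.
TRUST = {"Reuters": 3, "Associated Press": 3, "Example Daily": 1}


def fingerprint(text):
    """Reduce text to lowercase words so reformatted copies compare equal."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def dedupe_and_rank(stories, trust=TRUST):
    """Keep one copy of each story body, preferring the most trusted source."""
    best = {}
    for story in stories:
        key = fingerprint(story.body)
        current = best.get(key)
        if current is None or trust.get(story.source, 0) > trust.get(current.source, 0):
            best[key] = story
    # Most trusted sources first; unknown sources count as 0.
    return sorted(best.values(), key=lambda s: trust.get(s.source, 0), reverse=True)
```

A real system would need fuzzier matching, since sites often trim or lightly edit wire stories, but the idea is the same: collapse the duplicates, then let the trust table decide whose link gets shown.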

I suspect many in the journalism world will be quick to criticize. Some -- likely those with backgrounds in editing -- will decry the lack of human judgment in the site's story selection. Others will hasten to remove their sites from the search engine's indexing, claiming Google's deep links will cut the number of users who access their sites' home pages directly -- with Google, in effect, removing the middleman (news sites) between users and the individual news stories they want to read.

But deep inside, operators of news Web sites will panic. If you listen closely, you just might start hearing the screams.

UPDATE, 3:30 p.m. EST: It appears the site is only being released to some users, because others have told me they still see the old page (as cached here). Upon further inspection, I'm only able to access the new Google News site in Mozilla on my PC -- but NOT in IE, Netscape 6, or Opera on the very same PC, which still bring up the old site. Similarly, I'm not able to access the new site in IE, Netscape 6, Opera or Mozilla on my Mac. Here's a screenshot of the new site, in case your browser still brings up the old one.

UPDATE, 6:30 p.m. EST: Aaron Swartz of the Google Weblog tells me one of his users reported the new Google News showed up only in IE/Mac. And on my home PC, there's no sign of the new site -- no matter which browser I use. Clearly this is still in the testing phase, so only a few, seemingly random, users will see the new page.

Comments

Posted by Stephen Downes on September 19, 2002, at 7:22 p.m.:

Since when is the Austin (Texas) American-Statesman NOT an authority on Bush's dealings with Saddam Hussein? They probably understand Bush better than, say, a NY or DC paper. And they work from essentially the same sources.

Posted by Adrian Holovaty on September 19, 2002, at 7:35 p.m.:

Point taken; that was a bad example. But I still sense something suspicious here. I checked Google's entertainment news page a few seconds ago, and The American-Statesman showed up as a news source on Damien Hirst's Sept. 11 comment and Iman, the model -- both topics that I'm pretty sure have nothing to do with Austin, Texas. This leads me to believe that Google has a preference for the A-S for some sort of technological reasons; maybe the way their pages are structured is extra Google-friendly. Very strange.

Posted by anonymous on September 19, 2002, at 8:29 p.m.:

The new site is only in beta on IE for Mac. Weird.

Posted by Adrian Holovaty on September 19, 2002, at 9:05 p.m.:

Yeah, something's fishy. It appears the new site is only accessible to a few users; others still get the old one. In fact, I just checked the site with IE on a PC, and I got the old page. And my Mac only shows the old page, too. So the only browser that brings up the new page for me is Mozilla 1.1 on a PC. Here's a screenshot of what the new site looks like.

Posted by Ben Meadowcroft on September 19, 2002, at 9:14 p.m.:

"In the long term, dynamically generate a full-blown news article for each story that has more than one source, by combining information from several news sources to create one 'definitive' piece."

This is something that I doubt, mainly for copyright reasons. Just because it's Google doesn't mean they can arbitrarily take portions of other people's work. If anyone started doing this on a large scale (without having an agreement in place, etc.), you can be sure lawyers would be all over it, especially when you consider potential losses in advertising revenue.

Posted by Craig Saila on September 19, 2002, at 9:33 p.m.:

It's coming, Ben. Wired.com has a good story on the Columbia Newsblaster and a similar service called News Troll.

Both combine various versions of a news story and produce a one-paragraph summary. They seem to work pretty well.

Posted by anonymous on September 19, 2002, at 11:51 p.m.:

The headlines are "selected entirely by a mathematical algorithm" based on criteria that are defined by humans (editors). It's not brain surgery to figure out that if the same keywords are in the top story position on cnn.com, nyt.com, and washingtonpost.com then you've found an important story.

Don't get me wrong, the site is cool, but it hardly makes editors irrelevant. Google just doesn't have to hire them.

Posted by Tim Chambers on September 20, 2002, at 12:25 a.m.:

Maybe it's just the layout that's special in Mozilla. I see a plain list of stories, but the about page still says the stories are selected algorithmically.

Posted by Ashitaka on September 20, 2002, at 12:27 a.m.:

It isn't working for me.

Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.1) Gecko/20020826 MultiZilla/v1.1.22

Sniff... sniff... I think I smell a UserAgent detector?

Posted by Adrian Holovaty on September 20, 2002, at 12:43 a.m.:

I don't think it's a UserAgent detector, because I tried it with identical browsers (at home and work), and it worked in one but not the other. It could possibly be on only one of Google's many servers, or (who knows) employ a randomizer of some sort.

Posted by larcher on September 20, 2002, at 1:01 a.m.:

Maybe they grabbed cookies at random. :)

That is, randomly (or not?) select a handful of the cookies stored on the server, and when a browser comes to the news.google.com, check its google cookie and see if it's supposed to get the new news page or old news page.
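
A minimal sketch of that idea, purely as a guess at how such a rollout could work and not a description of what Google actually does. Instead of keeping a list of chosen cookies on the server, this version swaps in a hash of the cookie ID to decide the bucket; `sees_new_page`, `cookie_id` and `rollout_percent` are hypothetical names.

```python
import hashlib


def sees_new_page(cookie_id: str, rollout_percent: int = 5) -> bool:
    """Return True if the browser holding this cookie gets the new page.

    The cookie ID is hashed into one of 100 buckets, so the same cookie
    always gets the same answer without the server storing a list.
    """
    digest = hashlib.sha256(cookie_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < rollout_percent
```

Since every browser on a machine carries its own cookie, something like this would be consistent with one browser getting the new page while the others on the same PC get the old one.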

Posted by Wondergirl on September 20, 2002, at 1:59 a.m.:

Intense.

Posted by Guy Teague on September 20, 2002, at 9:15 p.m.:

tks for the tip re the new google news site. when i worked for the abilene reporter-news in the 80's we provided support in various ways to the austin american-statesman as they did not have a large staff at that time. we were both owned by harte-hanks communications. the statesman has all the ap feeds that larger papers have and is a well-respected paper in texas, especially for political matters, as you might imagine, considering its offices are catty-corner to the state capitol building. when i worked for the arn, the statesman had a polling expert whose poll results were widely distributed throughout the harte-hanks chain. /gt

Posted by bryan on September 23, 2002, at 4:27 a.m.:

I noticed earlier they had links to NYTimes articles (which require registration on their site to actually view the article). It's annoying to click on a headline news link only to be presented with a "sorry you must register first" page. If they are going to index these types of sites they should clearly indicate "Registration Required" next to the link so you don't run into a brick wall...

my 2 cents...

Posted by Adrian on September 23, 2002, at 4:45 a.m.:

Along those lines, it'd be really cool if Google used something like the NYT Random Login Generator to bypass the registration screen entirely! :-)

Posted by Randy on September 23, 2002, at 3:41 p.m.:

AllTheWeb with FAST Search & Transfer has had news search for a long time.

Google is copying allTheWeb.

Copycat.

Posted by Ashitaka on September 23, 2002, at 5:49 p.m.:

Google has had a news search for a long time, too.

This is just a new layout for it.

Silly.

Posted by Randy on September 23, 2002, at 6:12 p.m.:

It's no secret that Google has lost its grip.

Now that they are copying allTheWeb it just gets more obvious.

Google is becoming a copycat.

allTheWeb has the best site, so no wonder Google is shaking a bit...

Posted by Adrian on September 23, 2002, at 7:05 p.m.:

Yes, AllTheWeb has had news search for a while, but I don't believe it uses automated editorial judgment to produce a dynamic news page with a clear hierarchy of what stories are more important than others. That's the big innovation here, in my opinion -- the presentation, not just the search engine part of it.

Posted by Ben Meadowcroft on September 24, 2002, at 1:20 a.m.:

News search has been around for ages, for example TotalNews.

www.wired.com/news/business/0,1367,4385,00.html covers the story of Total News and the copyright troubles they had with framing other people's content. But Google isn't framing the content, right? Well, the question to be asked then is whether framing is the same as embedding. http://www.gcwf.com/articles/journal/jil_june98_2.html

I am sure Google has lawyers who have thought through the implications of this, but I think it will still be interesting to see if anything happens in the near future with regard to this. I guess the guys at www.google-watch.org will have a pop at this in the near future.

Posted by Ben on May 1, 2003, at 4:07 a.m.:

I am most interested in many of the comments above. I am preparing a presentation based on 'algorithm-based' news platforms, i.e. Google News. I would be grateful for any links or discussion questions surrounding the issues of ethics, transparency in programming or simply any contemporary news stories dealing with these relatively new news services.
