(It has been way too long since I launched a side project. Time to get back into the game. This blog post is adapted from a five-minute talk I gave today at Ignite Chicago.)
Everybody knows YouTube comments are atrocious. This is referenced all over the place, from a bunch of blogs to xkcd (twice!). One guy even took the time to make YouTube Comment Snob, a brilliant browser extension that hides poor quality comments.
If only we could extract some value from all that crap.
The YouTube community is so huge at this point that "meta" comments are very common. One particular "meta" comment stems from the fact that YouTube changed its rating system in early 2010. It previously used a five-star rating system but switched to a simpler "thumbs up" or "thumbs down" model, citing the fact that most people gave either one- or five-star ratings anyway.
In this new rating system, each video displays how many people have liked -- and disliked -- it. In a classic example of interfaces influencing behavior, this has encouraged users to make insulting remarks about the dislikers. I'm sure you've seen the type: it's comments like "439 people own a zune" on the 2001 Steve Jobs iPod presentation. (439 people had disliked the video at the time the comment was posted.)
After groaning about these types of comments for a while, I realized they're a sort of semistructured information -- it's always a number, followed by some sort of insult. And, besides, some of them are actually kind of witty. Hence my new project: The YouTube Insult Generator.
This is basically a "search engine for insults." Type in a search term, and it'll give you insults you can use against a person who doesn't like that term.
For example, enter "the godfather," and it'll give you "You sleep with the fishes," "You sleeps with horsehead in bed" and "You will get an offer you can't refuse." Enter "alfred hitchcock" and it'll say "You had your eyes plucked out by crows" and "You have Vertigo." Enter "mario brothers" and it'll say "You aren't Super enough for Mario," "You can't beat world 1-1" and "You are bowser." You get the idea.
It finds insults for only about half of all searches, but it works surprisingly well when it does work. Try general terms ("car") and pop culture ("michael jordan", "i love lucy"). Each insult includes a link to its source YouTube video.
How does this work? It uses the YouTube API to search for the top 50 most relevant videos for your search term. For each of those videos, it grabs the latest 50 comments. Then it looks through all that for comments starting with a number followed by a word such as "people," "youtubers" or "nincompoops." (View source for the full list, a regular expression that would make Alex Gaynor proud.) Finally, it just replaces the number and that word with "You," so "439 people own a zune" becomes "You own a zune."
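Here's a rough sketch of the matching-and-replacing step in Python. It's not the generator's actual code: the word list below holds only the three words mentioned above (the real regular expression is much longer), and it assumes the comments have already been fetched from the YouTube API, so `find_insults` just takes hypothetical (video URL, comments) pairs.

```python
from __future__ import annotations

import re
from typing import Iterable, Iterator, Optional

# Hypothetical, abbreviated word list: only the three words named in the post.
# The real generator's regular expression covers many more variants.
INSULT_PATTERN = re.compile(
    r"^\s*(\d[\d,]*)\s+(people|youtubers|nincompoops)\b\s*(?P<rest>.+)$",
    re.IGNORECASE,
)


def comment_to_insult(comment: str) -> Optional[str]:
    """Turn '439 people own a zune' into 'You own a zune'; return None if no match."""
    match = INSULT_PATTERN.match(comment)
    if match is None:
        return None
    # Drop the number and the matched word, keep the insult itself.
    return "You " + match.group("rest").strip()


def find_insults(videos: Iterable[tuple[str, list[str]]]) -> Iterator[tuple[str, str]]:
    """Yield (insult, video_url) pairs.

    `videos` is assumed to already hold (url, comments) pairs fetched elsewhere
    (e.g. top 50 videos, latest 50 comments each); fetching is out of scope here.
    """
    for url, comments in videos:
        for comment in comments:
            insult = comment_to_insult(comment)
            if insult is not None:
                yield insult, url


if __name__ == "__main__":
    # Placeholder URL for illustration only.
    sample = [("https://www.youtube.com/watch?v=EXAMPLE",
               ["439 people own a zune", "great video"])]
    print(list(find_insults(sample)))
    # prints [('You own a zune', 'https://www.youtube.com/watch?v=EXAMPLE')]
```

The substitution is purely textual, so this sketch does no grammar correction on whatever follows the matched word.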
It doesn't do anything fancy like caching or giving users a way to mark searches as particularly funny, but it's not bad for a quick hack. Enjoy, and please use the insults wisely.
UPDATE: Wired wrote about it. Also, a few people have been confused about what the point of this is. It's not intended to be literal -- I mean, I don't expect people to use this, at face value, to create insults. The larger point is that it's a demonstration of finding structured data in unexpected places. I think we should experiment more with this sort of "poor man's data mining" on things like YouTube comments. Thanks for checking it out!