<?xml version="1.0"?><rss version="0.91">
	<channel>
		<title>Holovaty.com</title>
		<link>http://www.holovaty.com/</link>
		<description>The latest Holovaty.com blog entries matching the search term: ;. Holovaty.com is a weblog discussing technical aspects of news Web sites.</description>
		<language>en-us</language>
		<item>
			<title>Request: Headless HTML rendering engine?</title>
			<description>&lt;!--pythonfeed--&gt;&lt;p&gt;Warning: Seriously geeky request ahead!&lt;/p&gt;

&lt;p&gt;I'm looking for a way to render arbitrary Web pages -- including CSS and JavaScript -- and access the resulting DOM tree programmatically, i.e., in an automated/headless fashion. I want to be able to ask the following questions of the resulting DOM tree:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For a given element, what font family, size, and color is the text?&lt;/li&gt;
&lt;li&gt;How tall and wide (in pixels) is a given &lt;code&gt;&amp;lt;div&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;table&amp;gt;&lt;/code&gt;, etc.?&lt;/li&gt;
&lt;li&gt;What are the x/y coordinates of a given element (from the upper-left corner of the page, or lower-left, or wherever)?&lt;/li&gt;
&lt;li&gt;For a given element, what is its text content?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The rendering must be state-of-the-art, handling advanced CSS that Firefox, Safari and IE handle. It should work on Linux. Bonus points if there's a Python API for this magical DOM tree.&lt;/p&gt;

&lt;p&gt;This is all stuff that standard in-page JavaScript could accomplish, but the catch with me is that I need to be able to do it in a completely automated way, on arbitrary pages, on a headless server.&lt;/p&gt;

&lt;p&gt;I know &lt;a href=&quot;http://en.wikipedia.org/wiki/Gecko_(layout_engine)&quot;&gt;Gecko&lt;/a&gt; and &lt;a href=&quot;http://en.wikipedia.org/wiki/WebKit&quot;&gt;WebKit&lt;/a&gt; provide this, but I'm not sure where to start with them. The docs and articles I've read seem to be focused more on embedding the full browser window in a GUI application than embedding the rendering engine itself and manipulating the resulting pages.&lt;/p&gt;

&lt;p&gt;Help! If you have any clues, I'd be grateful if you left a comment or &lt;a href=&quot;http://holovaty.com/contact/&quot;&gt;got in touch with me&lt;/a&gt;.&lt;/p&gt;</description>
			<link>http://www.holovaty.com/blog/archive/2008/05/02/0136</link>
		</item>
		<item>
			<title>In memory of chicagocrime.org</title>
			<description>&lt;!--pythonfeed--&gt;&lt;!--djangofeed--&gt;
&lt;p&gt;It's with mixed feelings that I announce the end of one of my projects, &lt;a href=&quot;http://www.chicagocrime.org/&quot;&gt;chicagocrime.org&lt;/a&gt;. This site has been serving Chicago residents since May 2005. I hope you'll indulge me in a brief retrospective.&lt;/p&gt;

&lt;img src=&quot;http://holovaty.com/images/2008-01-30_chicagocrime.png&quot;  align=&quot;right&quot; /&gt;
&lt;p&gt;Chicagocrime.org was one of the original map mashups, combining crime data from the Chicago Police Department with Google Maps. It offered a page and RSS feed for every city block in Chicago and a multitude of ways to browse crime data &amp;#151; by type, by location type (e.g., sidewalk or apartment), by ZIP code, by street/address, by date, and even by an arbitrary route. The New York Times Magazine &lt;a href=&quot;http://www.nytimes.com/2005/12/11/magazine/11ideas1-13.html&quot;&gt;featured it&lt;/a&gt; in its 2005 &quot;Year in Ideas&quot; issue, and it won the &lt;a href=&quot;http://www.j-lab.org/batten05winners.shtml&quot;&gt;2005 Batten Award for Innovations in Journalism&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It's been a fun ride. When I launched the site, Google Maps hadn't yet released the mapping API that's so common &amp;#151; even pass&amp;eacute;? &amp;#151; today. I can't help but feel like an old-timer: &quot;Back in my day, we had to &lt;em&gt;reverse-engineer&lt;/em&gt; Google's obfuscated JavaScript just to get maps embedded on our own sites!&quot; Now it seems like every other Web site finds an excuse to use those familiar, bubbly, yellow-white-blue-pastel map tiles.&lt;/p&gt;

&lt;p&gt;Chicagocrime.org wasn't the &lt;em&gt;first&lt;/em&gt; Google Maps mashup. That honor belongs to Paul Rademacher's &lt;a href=&quot;http://www.housingmaps.com/&quot;&gt;HousingMaps&lt;/a&gt;, which, at that time, was modestly titled &quot;Craigslist + Google Maps.&quot; The straightforwardness of that original title illustrates the excitement of it all: just the mere fact that somebody had mixed Craigslist data with Google's maps was new and remarkable. Kudos to Paul for keeping the site up and running for all these years. Not only was it a groundbreaking technical achievement; it remains genuinely useful.&lt;/p&gt;

&lt;p&gt;A lot of good has come out of chicagocrime.org. At the local level, countless Chicago residents have contacted me to express their thanks for the public service. Community groups have brought print-outs of the site to their police-beat meetings, and passionate citizens have taken the site's reports to their aldermen to point out troublesome intersections where the city might consider installing brighter street lights.&lt;/p&gt;

&lt;p&gt;It's done some good on a larger scale, too. The site helped influence Google to &lt;a href=&quot;http://googleblog.blogspot.com/2005/06/world-is-your-javascript-enabled_29.html&quot;&gt;open up its mapping API for all to use&lt;/a&gt;. It inspired at least a dozen &quot;spin-off&quot; sites in other cities, from &lt;a href=&quot;http://berkeleyca.crimelog.org/&quot;&gt;Berkeley&lt;/a&gt; to &lt;a href=&quot;http://www.newhavencrimelog.org/&quot;&gt;New Haven&lt;/a&gt; to &lt;a href=&quot;http://houstoncrimemaps.com/&quot;&gt;Houston&lt;/a&gt; &amp;#151; most of whose designs were very similar to &lt;a href=&quot;http://www.wilsonminer.com/&quot;&gt;Wilson&lt;/a&gt;'s beautiful chicagocrime.org design. And the site's slashdotting forced me to write parts of &lt;a href=&quot;http://www.djangoproject.com/documentation/cache/&quot;&gt;Django's cache system&lt;/a&gt;. (Django itself was released open-source two months later; chicagocrime.org was the first public Django-powered site not run by the &lt;a href=&quot;http://www.ljworld.com/&quot;&gt;Lawrence Journal-World&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;A few weeks ago, I received an e-mail from the folks at &lt;a href=&quot;http://aws.amazon.com/ec2&quot;&gt;Amazon EC2&lt;/a&gt;, where the crime site is hosted, saying the server instance that houses the site will be terminated on February 15 &amp;#151; and that it will no longer be accessible after January 31. This is happening because I was an early user of EC2 and their network &lt;a href=&quot;http://developer.amazonwebservices.com/connect/ann.jspa?annID=273&quot;&gt;has gone through some changes&lt;/a&gt; that require all customers of a certain tenure to rebuild their servers. Instead of going through the hassle of upgrading my server instance, I'll let the Amazon staff shut it down on Thursday. All pages will redirect to the appropriate pages on my newest project, &lt;a href=&quot;http://www.everyblock.com/&quot;&gt;EveryBlock&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In many ways, EveryBlock is the next generation of chicagocrime.org. I've often described it to people as &quot;chicagocrime.org on steroids &amp;#151; more than just crime, and more than just Chicago.&quot; It's brought to you by the same people (Wilson and me from chicagocrime.org, plus &lt;a href=&quot;http://www.everyblock.com/about/#team&quot;&gt;Paul and Dan&lt;/a&gt;, who've worked on &lt;a href=&quot;http://www.civicfootprint.org/&quot;&gt;similar&lt;/a&gt; &lt;a href=&quot;http://www.chicagoworksforyou.com/&quot;&gt;projects&lt;/a&gt;), and it has the same philosophies. As we developed EveryBlock, we kept chicagocrime.org firmly in our minds &amp;#151; this new thing we were making had to be a &lt;em&gt;superset&lt;/em&gt;, an expansion, a significant step forward. So there's almost nothing you could do on the old chicagocrime.org that you can't do on EveryBlock. And, unlike chicagocrime.org, which was always a side project, EveryBlock has a team of four people improving it full-time, meaning we have the resources to add features, such as e-mail alerts (&lt;a href=&quot;http://blog.everyblock.com/2008/jan/29/emailalerts/&quot;&gt;just added yesterday&lt;/a&gt;), that chicagocrime.org never had. We hope EveryBlock is a worthy successor.&lt;/p&gt;

&lt;p&gt;This story has a fitting epilogue. Just a few weeks after chicagocrime.org goes offline, the site will be featured in an exhibition at New York's &lt;a href=&quot;http://www.moma.org/&quot;&gt;Museum of Modern Art&lt;/a&gt;, called &lt;a href=&quot;http://www.moma.org/exhibitions/exhibitions.php?id=5632&quot;&gt;Design and the Elastic Mind&lt;/a&gt;. Chicagocrime.org will have ended its life and become a museum piece.&lt;/p&gt;</description>
			<link>http://www.holovaty.com/blog/archive/2008/01/31/0102</link>
		</item>
		<item>
			<title>Django Book has shipped -- and, thoughts on the next book</title>
			<description>&lt;!-- pythonfeed djangofeed --&gt;

&lt;p&gt;It's here! At long last, the print copy of the Django Book has shipped. I received my author copies late last week and am still poking at them to make sure that, yes, a tangible book with my name on the cover has actually been printed, on real paper, by a &lt;a href=&quot;http://www.apress.com/&quot;&gt;real publisher&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Early drafts of the book have been available free at &lt;a href=&quot;http://www.djangobook.com/&quot;&gt;djangobook.com&lt;/a&gt; for more than a year, and co-author Jacob and I are grateful to all of the readers who submitted corrections and suggestions. Jacob is going to update the site soon with the final text of the book (which will be available free under an open-source license), and we plan to revise the online text over time with corrections and additions. There's something nice about having a paper copy, of course.&lt;/p&gt;

&lt;p&gt;The book is &lt;a href=&quot;http://www.amazon.com/gp/product/1590597257/&quot;&gt;available on Amazon&lt;/a&gt;, and I'm told the big brick-and-mortar bookstores should begin stocking it soon. Looks like it's gotten some buzz already, as it was the &lt;a href=&quot;http://www.amazon.com/gp/bestsellers/books/4016/ref=pd_zg_hrsr_b_1_5_last&quot;&gt;number one best-selling &quot;Software Development&quot; book&lt;/a&gt; and &lt;a href=&quot;http://www.amazon.com/gp/bestsellers/books/69766/ref=pd_zg_hrsr_b_3_4_last&quot;&gt;number four best-selling &quot;Internet&quot; book&lt;/a&gt;. Not bad at all! What I'm most proud of is not the fact that the &lt;em&gt;book itself&lt;/em&gt; is doing well, but the larger fact that &lt;em&gt;demand for information about the framework&lt;/em&gt; is high.&lt;/p&gt;

&lt;p&gt;Now that the Django Book is finally in the can, I'm mulling the idea of writing another book -- this time, a book about online journalism. In the past two years, I've been to (way too) many journalism-related events and conferences trying to spread the good word about &quot;journalism via computer programming,&quot; and I've detected a strong, I daresay &lt;em&gt;furious&lt;/em&gt;, demand, from journalists at all levels in the org chart, for information about this new form of journalism. Higher-ups want to know &lt;em&gt;why&lt;/em&gt; they should employ programmers; middle managers want to know how to find them and how to treat them; and working journalists want to learn these skills and strategies. The problem is that I can't point them anywhere for in-depth information. This book would attempt to solve that.&lt;/p&gt;

&lt;p&gt;I want to take a shot at writing a manual, a manifesto, a practical guidebook to this emerging discipline of database-driven Web journalism. It would be a combination of high-level strategy and low-level technique, probably split cleanly into two parts (one for the suits, one for the non-suits).&lt;/p&gt;

&lt;p&gt;That's about all the thought I've given to this idea. What do you think? If you're a journalist (or even not), is this something you'd be interested in?&lt;/p&gt;</description>
			<link>http://www.holovaty.com/blog/archive/2007/12/12/1311</link>
		</item>
		<item>
			<title>Lead video on YouTube</title>
			<description>&lt;p&gt;Wow, somehow my &lt;a href=&quot;http://www.youtube.com/watch?v=Z1CZ7yCgkOM&quot;&gt;MacGyver theme song video&lt;/a&gt; was selected to be the top featured video on &lt;a href=&quot;http://www.youtube.com/&quot;&gt;YouTube's home page&lt;/a&gt; right now. A screenshot, for posterity:&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot;&gt;&lt;a href=&quot;http://flickr.com/photos/hirefrank/478758900/&quot;&gt;&lt;img src=&quot;http://farm1.static.flickr.com/179/478758900_bcd9c1883d.jpg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I'm not sure what amuses me more -- the fact that the YouTube editors decided to feature this, or the immediate, endless stream of gratuitous &quot;Please watch my video&quot; comments that the video is getting (and I'm deleting).&lt;/p&gt;</description>
			<link>http://www.holovaty.com/blog/archive/2007/04/30/1521</link>
		</item>
		<item>
			<title>Work with me at washingtonpost.com</title>
			<description>&lt;!-- pythonfeed djangofeed --&gt;

&lt;p&gt;Attention, Web developers! We're hiring somebody to work with me at &lt;a href=&quot;http://www.washingtonpost.com/&quot;&gt;washingtonpost.com&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We're looking for somebody who is really good at making dynamic Web applications, on deadline. You're a great candidate if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have significant experience building database-driven Web sites.&lt;/li&gt;
&lt;li&gt;You pick up new technologies very quickly, enjoy learning new things and enjoy opportunities to apply your new knowledge.&lt;/li&gt;
&lt;li&gt;You're great at cleaning digital information -- parsing data feeds, screen scraping, etc.&lt;/li&gt;
&lt;li&gt;You enjoy automating things to save people time.&lt;/li&gt;
&lt;li&gt;You have experience using &lt;a href=&quot;http://www.djangoproject.com/&quot;&gt;Django&lt;/a&gt;. Ruby on Rails experience is fine, too, if you're willing to unlearn all that black magic. ;-)&lt;/li&gt;
&lt;li&gt;You have a solid understanding of relational databases and experience with open-source databases, particularly PostgreSQL. (MySQL experience is fine, too.)&lt;/li&gt;
&lt;li&gt;You are experienced using (X)HTML, CSS, JavaScript, Ajax...yadda yadda yadda.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You get bonus points if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You've contributed to open-source projects.&lt;/li&gt;
&lt;li&gt;You've launched a side project (or two) on the Web.&lt;/li&gt;
&lt;li&gt;You have a weblog.&lt;/li&gt;
&lt;li&gt;You have journalism experience.&lt;/li&gt;
&lt;li&gt;You are passionate about improving the world through information.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In no particular order, here are some examples of the types of sites you'll be building:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://projects.washingtonpost.com/fec/specials/mccain/&quot;&gt;John McCain's campaign contributions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://projects.washingtonpost.com/fallen/&quot;&gt;Faces of the Fallen&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://projects.washingtonpost.com/politicalads/&quot;&gt;Mixed Messages&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://projects.washingtonpost.com/maptimelines/1/&quot;&gt;President Bush Latin America trip map&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://projects.washingtonpost.com/2007/clinton-speeches/&quot;&gt;Clinton's Golden Voice&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://projects.washingtonpost.com/congress/&quot;&gt;U.S. Congress votes database&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://projects.washingtonpost.com/videogames/&quot;&gt;Video Game Reviews database&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's a mix of short-deadline projects, long-term projects and general site improvements. There's enough variety to keep it interesting. In most cases, you'll be expected to build a site in a matter of hours or days, not weeks or months. It's an exciting, fast-paced environment.&lt;/p&gt;

&lt;p&gt;Why should you take this job?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Fun and freedom&lt;/strong&gt; -- Building Web apps with Django is fun, and you'll have significant say in what your apps should do and how they should work. You won't be a cog in the wheel; in many cases, the development team will be &lt;em&gt;you&lt;/em&gt;, or &lt;em&gt;you and I&lt;/em&gt;. No requirements documents, if I can help it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Visibility&lt;/strong&gt; -- Your work will be seen by hundreds of thousands of people -- maybe more -- around the world.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cool tools&lt;/strong&gt; -- You get to use open-source technologies such as Python, Django and PostgreSQL, and get paid for it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Great people&lt;/strong&gt; -- Since Day One, I've been continually impressed with the talent and dedication of Washington Post employees. This is the cream of the crop.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Great company&lt;/strong&gt; -- C'mon, it's the Washington Post, one of the most highly reputable news organizations in the world. The Post is, hands down, the most innovative large newspaper company around. You won't find our killer combination -- dedication to quality journalism and willingness to innovate -- at any other company of our size in this industry.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Journalism experience is not required. A formal computer-science background is not required. I'm much more interested in seeing your work than reading bullets on a resume.&lt;/p&gt;

&lt;p&gt;Also, you don't necessarily have to be a designer. Our design team makes things look good.&lt;/p&gt;

&lt;p&gt;The job is located in the Washington, D.C., area -- technically, Arlington, Virginia. The washingtonpost.com office is near the Court House Metro stop on the Orange Line.&lt;/p&gt;

&lt;p&gt;If you think you're a good fit, &lt;a href=&quot;http://holovaty.com/contact/&quot;&gt;contact me&lt;/a&gt;. Send some links to work you've done, along with a resume.&lt;/p&gt;</description>
			<link>http://www.holovaty.com/blog/archive/2007/03/08/2108</link>
		</item>
	</channel>
</rss>
