Wanted: Browser filter features

Written by Adrian Holovaty on October 26, 2003

Mozilla Firebird is superb, but it hasn't stopped me from thinking about a few power-user features I'd like to see in my Web browser:

URL filters. I want to be able to write a small script handler that is invoked every time my browser loads a new page. The script would receive the URL of the page it was about to request and could run it through various text parsing routines (e.g. searches/replacements) before the page was requested. It would function, essentially, as a client-side mod_rewrite.

This could serve as a typo corrector (e.g. rewriting "google.co" to "google.com"), at the very least. For paranoid types, it could also be an NSFW protector, rewriting any URL that contains a naughty word to a verification Web page on the local machine -- just in case you stumble upon something inappropriate at work. The dishonest folks in the crowd could whip up the one-line regular expression that converts a subscription-required Wall Street Journal article into a free one. I'm sure there are plenty of other applications.
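Here's a rough sketch of what such a handler might look like. The registerURLFilter hook is imaginary -- no browser exposes anything like it -- and the rules are only for illustration:

// Imaginary hook: the browser would hand every URL to this function before
// requesting it, then fetch whatever string comes back instead.
function urlFilter(url) {
    // Typo corrector: "google.co" -> "google.com"
    url = url.replace(/^(https?:\/\/(?:www\.)?google)\.co(\/|$)/, "$1.com$2");

    // NSFW guard: bounce suspicious URLs to a verification page on the local machine
    if (/naughtyword/i.test(url)) {
        return "http://localhost/verify?target=" + encodeURIComponent(url);
    }

    return url;
}

browser.registerURLFilter(urlFilter);  // hypothetical registration call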

HTML filters. Same as above, but the script would be able to alter the entirety of the document's HTML -- after it was downloaded but before it was rendered. The applications for this are vast. From a design/accessibility perspective, it'd be a far more flexible type of user stylesheet, capable of altering layouts completely. It could also be an advanced adult-check filter that looks out for inappropriate words and serves up a warning/confirmation page. But most importantly, it'd introduce the ability to filter plain text to users' liking -- for example, replacing text in all caps with the lowercase equivalent, or automatically linking up URLs and e-mail addresses that aren't already linked.
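As a sketch of the idea -- registerHTMLFilter is just as imaginary as the hook above -- a filter that links up bare URLs might look something like this, with a deliberately naive regular expression:

// Imaginary hook: the browser would pass the full downloaded HTML through
// this function and render the returned string instead.
function htmlFilter(html) {
    // Naive pass: wrap bare URLs (preceded by whitespace or ">") in anchor
    // tags, leaving URLs that sit inside attributes alone.
    return html.replace(/(^|[\s>])(https?:\/\/[^\s<"']+)/g,
                        '$1<a href="$2">$2</a>');
}

browser.registerHTMLFilter(htmlFilter);  // hypothetical registration call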

Clearly, there'd have to be a "revert to original HTML" button. But with such an intense level of customization, it'd be hard to go back.

Comments

Posted by Wolfgang Flamme on October 26, 2003, at 12:07 p.m.:

How about Privoxy ( http://www.privoxy.org/ ) ?

Posted by Nathan Ashby-Kuhlman on October 26, 2003, at 2:40 p.m.:

Those are some slick ideas. The URL filter ought to be easy to implement, but I would think an HTML filter might be significantly harder because Web pages load incrementally. If you're trying to detect "inappropriate" words of known maximum length (n), say, you could buffer the last (n-1) characters of each chunk before piping them to the rendering engine. It becomes a little more complicated if you're trying to match regular expressions of unknown length, like URLs and e-mail addresses.
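A rough sketch of that buffering approach, applied to each chunk of source as it arrives (the plumbing that feeds chunks in and flushes the leftover tail at the end of the stream is assumed):

// Hold back the last (n - 1) characters of each chunk so a word of length
// up to n that straddles a chunk boundary is still matched.
function makeChunkFilter(badWords, n) {
    var carry = "";
    var pattern = new RegExp(badWords.join("|"), "gi");
    return function (chunk) {
        var text = (carry + chunk).replace(pattern, "****");
        carry = text.slice(text.length - (n - 1));          // withhold the tail
        return text.slice(0, text.length - carry.length);   // pipe the rest to the renderer
    };
}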

But the real problem I see is that if you want to move beyond detecting words and building links -- things that can be done with text processing of the HTML source -- into your grander vision of "altering layouts completely," you are going to need not just the power of regular expressions but the power of the DOM. That's where I see the difficulty -- I don't think it'd be very easy to traverse and modify an incomplete document tree. I for one wouldn't end up using any HTML filters that prevented incremental loading of pages.

Posted by Andyed on October 26, 2003, at 5:33 p.m.:

As Nathan points out, DOM access would make these filters much easier to write. Mozilla generates a DOMContentLoaded event that's ideal for timing access to the HTML source, often before any image requests go out. There is still the chance that some transformations will cause visible alterations.
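For a feel of what that looks like, here's a rough sketch of a DOMContentLoaded handler -- assuming it runs with access to the content document, the way an extension or user script would -- that lowercases text written entirely in caps:

// Once the DOM is parsed (but often before images load), walk the text
// nodes and lowercase anything written entirely in capitals.
document.addEventListener("DOMContentLoaded", function (event) {
    var doc = event.target;
    var walker = doc.createTreeWalker(doc.body, NodeFilter.SHOW_TEXT, null, false);
    var node;
    while ((node = walker.nextNode())) {
        if (/^[^a-z]*[A-Z]{4,}[^a-z]*$/.test(node.nodeValue)) {
            node.nodeValue = node.nodeValue.toLowerCase();
        }
    }
}, false);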

There is/was a project at mozdev to do this: sardine.mozdev.org. There was never any code, and when I just checked, the homepage didn't load, so I'm not sure what's up there. Mozdev is getting slammed with traffic these days. AdBlock@mozdev takes a similar but more limited approach; peek at adblock.js for an idea of what's needed.

If anyone's seriously interested in writing this, you could be up and running in under 5 hours with the right pointers to existing mozdev/mozilla code. Mail or irc://mozilla.org#mozdev

Posted by Arkaid_81 on October 26, 2003, at 10:42 p.m.:

Sounds like maybe you should look into some client-side proxies. Rather than having the browser do the filtering, let it stick to just rendering web pages, and have a separate tool do the filtering. Just have the browser connect through the proxy.

That way, the Mozilla team (or Opera team, or IE team) can stick to just writing browsers that do one thing well, rather than doing many things in a mediocre way.

Posted by Simon Willison on October 27, 2003, at 8:13 a.m.:

My dream browser feature would be a persistent version of the Edit Style bookmarklet, preferably in some kind of side panel. There are plenty of sites I visit every day that have something about their design that doesn't quite gel with me, usually to do with the width of the page or the size and typeface used for their text. I can (and often do) fix these sites using the Edit Styles bookmarklet, but I lose the changes I have made between visits. If I could add additional style rules to those sites in a persistent way via a side panel (or web panel, as they're now called in Firebird), I'd be happy as a pig in mud.

Adrian: your HTML filtering thing would be best served using Javascript -- how about an addition to my CSS stylesheet idea where you can define a bunch of Javascript for an individual page (or all pages) that uses the DOM to manipulate the page to your liking?

Combine those two features into a sidebar/web panel and you'd have an absolutely killer extension for Firebird. If only I knew how to do it!
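A sketch of roughly what that might boil down to -- the tweak table, hostname and rules here are made up purely for illustration:

// Hypothetical per-site tweak table, applied by a user script or extension
// once the page is available.
var siteTweaks = {
    "www.example.com": {
        css: "body { max-width: 40em; font: 100%/1.5 Georgia, serif; }",
        js: function (doc) { /* any extra DOM fiddling for this site */ }
    }
};

var tweak = siteTweaks[location.hostname];
if (tweak) {
    var style = document.createElement("style");
    style.type = "text/css";
    style.appendChild(document.createTextNode(tweak.css));
    document.getElementsByTagName("head")[0].appendChild(style);
    if (tweak.js) tweak.js(document);
}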

Posted by Wilson on October 27, 2003, at 11:37 p.m.:

It would be like SmartTags for good, not evil! Kind of.

Posted by markku on October 28, 2003, at 6:58 a.m.:

The closest thing to URL filters is Firebird's quicksearch keywords, though they provide only basic functionality.

Posted by Micah on October 31, 2003, at 1:17 a.m.:

Lynx has simple URL filtering, through RULEs in lynx.cfg. For example, this line... well, you can guess what it does:

RULE:Redirect http://www.nytimes.com/*.html http://nytimes.com/*.html?pagewanted=print

But still, your post reminds me of things I've also wished for many times.

Posted by brian on November 1, 2003, at 12:56 a.m.:

One warning with the HTML filtering: many people obfuscate their e-mail addresses using Javascript or other methods. What you are describing is basically a post-processor, which could be used to harvest e-mail addresses that are unlinked and no longer obfuscated.

Posted by jacob on November 4, 2003, at 7:08 a.m.:

I would love to see a feature like this in my web browser. Some kind of user-specified Javascript file, much like a user stylesheet, would be enough for me. Off the top of my head, there's only one thing I'd want mine to do, but it would improve my browsing experience tremendously: I want the script to walk the links in the document and set every target attribute to "_self." Right now I use a bookmarklet for this purpose, but that's not terribly convenient.
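For what it's worth, the body of that script could be roughly this short:

// Walk every link in the document and force it to open in the current window.
var links = document.getElementsByTagName("a");
for (var i = 0; i < links.length; i++) {
    links[i].target = "_self";
}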

Posted by sil on November 5, 2003, at 10:10 a.m.:

Dream feature: user Javascript scripts, just like user stylesheets. I think about how cool an idea this would be *all* the time; I just don't ever get around to thinking about how to implement it. You could do it with a proxy (just have all pages get a <script type="text/javascript" src="file:///home/aquarius/userScript.js"></script> added to the HEAD, or similar), and I'm pretty sure you could do it with a Mozilla/Firebird extension, too, but I never get the time to research this properly...

Posted by Aaron on January 23, 2005, at 2:40 a.m.:

Whoa, weird that I never saw this thread before. I wrote greasemonkey for exactly this purpose, and it was inspired by Adrian's allmusic extension.

Comments have been turned off for this page.