This is a read-only archive!

Blog replacement fun

So I'm still thinking about how I want to replace this blog. I still plan to write something from scratch, for fun's sake.

One thing I'm sure of is that I don't want to write HTML by hand, at all, under any circumstances. HTML and XML are not human-writable or human-readable languages. They rely too much on things human beings suck at, namely consistency and repetition. Forget a closing tag? Typo a tag name? Now your document is malformed. Undefined behavior, at best. They're too verbose, with too much needless punctuation.

It's also too hard to manipulate the markup or do anything with it after you write it. There's XPath, which is itself a mess to work with: you manipulate huge strings of crap via slightly smaller strings with their own funky syntax quirks. I've never found an XML-parsing library with an interface that I liked, and I've had to use them extensively in Perl, Ruby and Python.

So first thing, I'm going to convert all posts and comments into Markdown and use that for future posting and commenting. I like Markdown. It's hard to get wrong when typing it by hand and it doesn't get in your way. It also doesn't tie you to one implementation; you can turn Markdown into HTML client-side via JavaScript or easily parse it server-side. Or you can display it as-is and it's still readable. It's a very nice idea.

Second thing, I plan to use my programming language to write the HTML for the skeleton of my site for me. Opening, closing, and properly nesting tags is something a machine should do for me. Making sure my tags belong to a well-defined list of allowed valid HTML tags is something a machine should check for me.

More than likely I'm going to write this in Clojure, because s-expressions (and better yet, a combination of Clojure literal lists, maps and vectors) make writing HTML very easy and foolproof. I've also written an HTML-producing DSL in Ruby in the past though; it's not hard to do in any language.
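To make the idea concrete, here is a minimal sketch of such an HTML-producing DSL. It is written in Python rather than Clojure purely for illustration, and everything in it is an assumption: the `render` function, the nested-list shape `[tag, {attrs}, children...]`, and the tiny `ALLOWED_TAGS` whitelist are all made up for this example.

```python
from html import escape

# Assumption: a deliberately tiny whitelist; a real site would list more tags.
ALLOWED_TAGS = {"html", "head", "title", "body", "div", "h1", "p", "a", "ul", "li", "br"}
VOID_TAGS = {"br"}  # tags that never take children or a closing tag


def render(node):
    """Render a nested list like ["p", {"class": "x"}, "text"] into an HTML string."""
    if isinstance(node, str):
        # Plain text is always escaped, so it cannot break the markup.
        return escape(node, quote=False)
    tag, *rest = node
    if tag not in ALLOWED_TAGS:
        raise ValueError(f"unknown tag: {tag!r}")
    attrs = {}
    if rest and isinstance(rest[0], dict):
        # An optional dict right after the tag name holds the attributes.
        attrs, *rest = rest
    attr_text = "".join(
        f' {name}="{escape(str(value), quote=True)}"' for name, value in attrs.items()
    )
    if tag in VOID_TAGS:
        if rest:
            raise ValueError(f"<{tag}> cannot have children")
        return f"<{tag}{attr_text}>"
    # Opening and closing tags are generated together, so nesting is always balanced.
    children = "".join(render(child) for child in rest)
    return f"<{tag}{attr_text}>{children}</{tag}>"


page = ["div", {"id": "post"},
        ["h1", "Blog replacement fun"],
        ["p", "Tags & nesting are ", ["a", {"href": "/about"}, "machine-made"], "."]]
print(render(page))
```

A mistyped tag name raises `ValueError` instead of silently producing a malformed page, and a forgotten closing tag simply cannot happen because the closing tag is never typed by hand.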

Another thing I'm sure of is that I need a good anti-spam system but that I have no idea what that system should be. Akismet in WordPress has caught 50,000(!) spam comments since I started my blog. Some spam still sneaks past it now and then. I've never used a CAPTCHA and don't plan to; they just don't work. I'm probably going to come up with some funky custom anti-spam measures (which are invisible to users) and rely on the fact that no one is going to take the time to break them. My site isn't a huge or popular target, so here's hoping.
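As one possible shape for "invisible" measures, here is a rough Python sketch combining two common tricks: a honeypot field that real users never see (hidden with CSS) and a minimum time between rendering the form and submitting it. This is not a settled design. The field name `website_url`, the `MIN_SECONDS` threshold and the `looks_like_spam` function are all hypothetical.

```python
import time

# Assumption: nobody reads a post and writes a comment in under three seconds.
MIN_SECONDS = 3.0


def looks_like_spam(form, rendered_at, now=None):
    """Return True if a submitted comment form trips one of the invisible checks.

    form        -- dict of submitted field names to values
    rendered_at -- Unix timestamp of when the form was served to the visitor
    now         -- current Unix timestamp (defaults to time.time(); injectable for tests)
    """
    now = time.time() if now is None else now
    # Honeypot: "website_url" is a hypothetical field hidden from humans via CSS.
    # Bots that fill in every field they find will put something here.
    if form.get("website_url", "").strip():
        return True
    # Timing: a form submitted almost instantly was probably not typed by a person.
    if now - rendered_at < MIN_SECONDS:
        return True
    return False


print(looks_like_spam({"comment": "Nice post", "website_url": ""}, rendered_at=100.0, now=160.0))
```

In a real deployment `rendered_at` would have to be something the client cannot forge (signed, or stored server-side), and both checks only stop lazy bots, which matches the bet that nobody will bother to break a small site's custom scheme.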

A third complication I'm dreading is how to do this without breaking every link anyone ever made to my site. WordPress's permalink system is OK, but I'd like to change it. Problem is I can't just change it; every link to my site from another site is a dependency. So I might have to mod_rewrite redirect the old URLs to new ones, or use two permalink schemes simultaneously. I don't know.
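The redirect option could also live in the blog's own code instead of mod_rewrite. Here is a small Python sketch of that mapping. Both URL shapes are assumptions for illustration: the old one is WordPress's "day and name" style (`/2009/01/15/some-slug/`), and the new `/blog/some-slug` scheme is invented, as is the `redirect_for` helper.

```python
import re

# Assumed old scheme: WordPress "day and name" permalinks, e.g. /2009/01/15/some-slug/
OLD_PERMALINK = re.compile(r"^/(\d{4})/(\d{2})/(\d{2})/([a-z0-9-]+)/?$")


def redirect_for(path):
    """Map an old-style permalink to (status, new_path), or None if it isn't one."""
    match = OLD_PERMALINK.match(path)
    if match is None:
        return None  # not an old URL; let the normal routing handle it
    slug = match.group(4)
    # 301 marks the move as permanent, so browsers and crawlers remember the new URL.
    return (301, f"/blog/{slug}")  # hypothetical new scheme


print(redirect_for("/2009/01/15/blog-replacement-fun/"))
```

Old inbound links keep working this way while only the new scheme is ever generated by the site itself.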

Fun times ahead. How to design a blog is a problem lots of people have solved but no one has really solved perfectly, or else there wouldn't be so many frameworks and packages to do it. The good thing about writing your own from scratch is that it'll work exactly how you want. WordPress is close but not close enough.

January 15, 2009 @ 1:55 PM PST
Category: Programming


Quoth circuit breaker on January 16, 2009 @ 12:04 PM PST

Neat, thanks for pointing out Markdown. I had heard of it but didn't realize its simplicity. I too have been wanting to roll my own engine code (just look at the previous security gaffes a lot of blog engines have had); this is a good excuse to write a simple parser.

Over-engineering is an art form.

Quoth Job on February 26, 2009 @ 8:28 PM PST

Can't you keep using Akismet? It obviously works pretty well?

Quoth Brian on February 27, 2009 @ 2:29 AM PST

Akismet misses more spam than I'd like. Sometimes a couple of spams a day; really obvious ones too. I may still use it, but WordPress sort of lets a mess accumulate, then cleans it up with Akismet afterwards. I think I need something else to take out the brunt of the spam before it gets to that point.