My blog is still working, in spite of my best efforts to crash it. So that's good. But lately I've been thinking that an SQL database is overkill just to run a little blog like this.
My blog only has around 450 posts total (over the course of many years), and about an equal number of user comments (thanks to all commenters!). Why do I need a full-blown database for that? All of my posts plus comments plus all meta-data come to only 2 MB as a flat text file, 700 KB gzipped.
By far the most complicated part of my blog engine is the part that stuffs data into the database and gets it back out again in a sane manner (translating Clojure data to SQL values, and back again; splitting up my Clojure data structures into rows for different tables, and then re-combining values joined from multiple tables into one data structure). Eliminating that mess would be nice.
Inevitably I ended up with some logic in the database too: enforcing uniqueness of primary keys, marking some fields as NOT NULL, giving default values and so on. But a lot of other logic was in my Clojure code, e.g. higher-level semantic checking, and some things I wanted to set as default values were impossible to implement in SQL.
Wouldn't it be nice for all the logic to be in Clojure? And the data store on disk to be a simple dump of a Clojure data structure? I can (and did) write a few macros to give me SQL-like field declaration and data validation, for uniqueness of IDs and data types etc. For my limited needs it works OK.
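Here's a rough sketch of what SQL-like field declarations and validation could look like in plain Clojure. It uses ordinary maps and functions rather than macros, and the names post-fields, apply-defaults and validate are made up for illustration, so treat it as one possible shape and not the blog's actual code. It assumes a Clojure recent enough to have some? (1.6 or later).

```clojure
;; Hypothetical field declarations: each field has a :check predicate,
;; an optional :required flag, and an optional :default function.
(def post-fields
  {:id      {:check integer? :required true}
   :title   {:check string?  :required true}
   :created {:check #(instance? java.util.Date %)
             :default #(java.util.Date.)}})

(defn apply-defaults
  "Fill in missing fields that declare a :default function."
  [fields record]
  (reduce (fn [r [k {:keys [default]}]]
            (if (and default (not (contains? r k)))
              (assoc r k (default))
              r))
          record
          fields))

(defn validate
  "Return a vector of error strings for record; empty means valid.
  existing-ids is the set of IDs already in the data store."
  [fields existing-ids record]
  (vec
   (concat
    ;; NOT NULL and type checks, one field at a time.
    (for [[k {:keys [check required]}] fields
          :let [v (get record k)]
          :when (or (and required (nil? v))
                    (and (some? v) (not (check v))))]
      (str "bad or missing value for " k))
    ;; Primary-key uniqueness.
    (when (contains? existing-ids (:id record))
      [(str "duplicate id " (:id record))]))))
```

Since defaults are ordinary functions, they can compute anything Clojure can, which covers the kind of default that's awkward to express in SQL.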
The next question is what format to use for dumping to disk. Happily, Clojure is a Lisp, so dumping the data as one huge s-exp via pr-str works fine, and reading it back in later via read-string is trivial.
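A minimal sketch of that round trip, assuming the whole data store fits in one map. The function names save-data! and load-data are hypothetical. The binding is there because pr-str honors *print-length* and *print-level*, which would silently truncate the dump if they happen to be set (at a REPL, say).

```clojure
(defn save-data!
  "Write data to path as one readable s-expression."
  [path data]
  (binding [*print-length* nil
            *print-level*  nil]
    (spit path (pr-str data))))

(defn load-data
  "Read back whatever save-data! wrote to path."
  [path]
  (read-string (slurp path)))
```

Maps, vectors, sets, keywords, strings and numbers all survive the trip unchanged, so (load-data path) should be equal to whatever was passed to save-data!.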
Some Java data types can't be printed readably by default, for example java.util.Dates, which print like this:
#<Date Wed May 20 22:39:00 PDT 2009>
The #<> reader macro deliberately throws an error if you try to read that back in, because the reader isn't smart enough to craft Date objects from strings by default. But Clojure is extensible; you can specify a readable-print method for any data type like this:
(defmethod clojure.core/print-method java.util.Date [o w]
  (.write w (str "#=" `(java.util.Date. ~(.getTime o)))))
Now dates print as
#=(java.util.Date. 1242884415044)
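and if you read that back via read-string, the #= reader macro evaluates the form and creates a Date object like you'd expect. (This relies on *read-eval* being true, which is the default, so only read files you trust this way.)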
user> (def x (read-string "#=(java.util.Date. 1242884415044)"))
#'user/x
user> (class x)
java.util.Date
user> (str x)
"Wed May 20 22:40:15 PDT 2009"
Storing data in a plain file has the added benefit of letting me grep my data from the command line, or even edit the data in a text editor and re-load it into the blog (God help me if that's ever necessary).
Having multiple threads banging on a single file on disk is a horrible idea, but Clojure's refs, agents and transactions make it easy to funnel all writes through one place. I do still have to work out how not to lose all my data if the server crashes in the middle of a file update. (I've lost data, recoverably, to a server crash in the middle of a MySQL update too, so this is a problem for everyone.) Perhaps I'll keep a running history of my data, each update being a new timestamped file, so old files can't possibly be corrupted. Or use the old write-to-tmp-file-and-rename-to-real-file routine. Or heck, I could keep my data in Git and use Git commands from Clojure. It'd be nice to have a history of edits.
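The tmp-file-and-rename option might look something like the sketch below. It's an illustration under assumptions, not a finished design: write-atomically!, save-async! and the writer agent are hypothetical names, and it uses java.nio.file, so it needs Java 7 or later. The temp file is created in the same directory as the target so the rename stays on one filesystem.

```clojure
(import '(java.io File)
        '(java.nio.file Files CopyOption StandardCopyOption))

(defn write-atomically!
  "Dump data to a temp file next to path, then rename it over path."
  [^String path data]
  (let [target (.getAbsoluteFile (File. path))
        tmp    (File/createTempFile "blog" ".tmp" (.getParentFile target))]
    (spit tmp (binding [*print-length* nil
                        *print-level*  nil]
                (pr-str data)))
    ;; ATOMIC_MOVE asks the OS for an atomic rename. Whether an existing
    ;; target is replaced is implementation specific; on typical POSIX
    ;; filesystems it is replaced.
    (Files/move (.toPath tmp) (.toPath target)
                (into-array CopyOption [StandardCopyOption/ATOMIC_MOVE]))
    path))

;; One agent serializes all writes, so threads never write concurrently.
(def writer (agent nil))

(defn save-async!
  "Queue a write of data to path on the writer agent."
  [path data]
  (send-off writer (fn [_] (write-atomically! path data))))
```

A crash before the rename leaves a stray .tmp file behind but the real file untouched. One thing this doesn't handle: if a write throws, the agent ends up in a failed state and needs an error handler or a restart before it accepts more writes.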
If this idea works out I'll upload code for everything to GitHub, as usual.