The first version of this blog used MySQL; then I switched to Tokyo Cabinet. But now I've switched back to an RDBMS, this time PostgreSQL. Here's why.
Why did I switch to TC to begin with?
There weren't any good ORM-type libraries for Clojure at the time (over a year ago). So there was a bit of an impedance mismatch trying to query and work with my data. In the DB I have separate tables for posts, comments, tags, categories. But 90% of the time I want to fetch a post and end up in Clojure with all related tags, comments etc. (A JOIN won't work here without a lot of work to un-mangle the results; you usually need multiple queries.)
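To make the un-mangling problem concrete, here is a minimal sketch. It uses Python and an in-memory SQLite database rather than Clojure and MySQL, purely so it is easy to run; the table layout, column names, and the `unmangle` helper are made up for illustration and are not the blog's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts    (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE tags     (id INTEGER PRIMARY KEY, post_id INTEGER NOT NULL, name TEXT NOT NULL);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER NOT NULL, body TEXT NOT NULL);
    INSERT INTO posts    VALUES (1, 'Hello');
    INSERT INTO tags     VALUES (1, 1, 'clojure'), (2, 1, 'sql');
    INSERT INTO comments VALUES (1, 1, 'First!'), (2, 1, 'Nice post'), (3, 1, 'Thanks');
""")

# Joining two child tables at once multiplies the rows:
# 2 tags x 3 comments = 6 rows for a single post.
joined = conn.execute("""
    SELECT p.id, p.title, t.id, t.name, c.id, c.body
    FROM posts p
    LEFT JOIN tags t     ON t.post_id = p.id
    LEFT JOIN comments c ON c.post_id = p.id
    ORDER BY p.id, t.id, c.id
""").fetchall()

def unmangle(rows):
    """Collapse the flat, repetitive JOIN rows back into one nested dict per post."""
    posts = {}
    seen = set()  # (kind, id) pairs already attached to a post
    for post_id, title, tag_id, tag, comment_id, comment in rows:
        post = posts.setdefault(
            post_id, {"id": post_id, "title": title, "tags": [], "comments": []}
        )
        if tag_id is not None and ("tag", tag_id) not in seen:
            seen.add(("tag", tag_id))
            post["tags"].append(tag)
        if comment_id is not None and ("comment", comment_id) not in seen:
            seen.add(("comment", comment_id))
            post["comments"].append(comment)
    return list(posts.values())

nested = unmangle(joined)
print(len(joined))  # 6
print(nested)
```

Every extra child table multiplies the row count again and adds another branch to the de-duplication loop, which is the "lot of work" in question.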
With TC I could store anything in the DB, so I just dumped a post into the DB as serialized hash-maps with all the comments, tags, and categories as sub-keys. So querying was easy. (Or was it? More below.)
MySQL was really slow. This is largely because my queries were terrible, as I tried to brute-force my way around the impedance mismatch described above. TC, on the other hand, is fast.
I tried to solve the speed problem by using a Clojure ref as a cache. But tying the STM to a database's transaction system is, as far as I know, difficult or impossible right now (per many threads on the mailing list). I had a lot of potential race conditions, which (as far as I know) never bit me, but probably would've eventually. I had to deal with keeping the cache up-to-date as comments were posted and posts were added and deleted and renamed. Remember:
"There are only two hard problems in Computer Science: cache invalidation and naming things." --Phil Karlton
So why did I stop using TC?
I have no idea how to use a key/value store database properly. TC will take anything you dump into it, which is both a strength and a weakness.
There's a lot of crap you have to do by hand that a proper database does for you. Consider checking for null values, for example; I ended up with a lot of nils in my data because my validations weren't 100% foolproof, or because I imported data via code that didn't run the validations and I never noticed. Or enforcing uniqueness of values; I had tag objects in the database with the same key but different values (due to capitalization differences), which screwed up a lot of stuff.
On the other hand, there's a lot of information about how to use an RDBMS properly, and I have a lot of experience with it already. Constraints are easy to set up. Columns have types, which is nice. (Strange that I gravitate toward statically-typed databases while I gravitate toward dynamically-typed programming languages.)
I have to compile and install Tokyo Cabinet by hand on my Linux distro. It's probably not worth the distro maintainers' time to maintain a package that so few people use. MySQL and PostgreSQL have lots of people working on keeping them running OK on most Linux distros.
Some kinds of queries were still awkward in TC. "Give me post X" was great: I'd also get all the tags, categories etc. for free. But then how do you query to get all tags across all posts? Or all comments? Fetch all the posts and iterate over them, collecting their tags, then uniquify the resulting list? Not so pretty, and not so fast. So I was back to caching again, which still gave me nightmares about race conditions and dirty data.
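The asymmetry looks roughly like this. A plain Python dict stands in for Tokyo Cabinet here, with JSON as the serialization format; the key names and the `put_post`, `get_post`, and `all_tags` helpers are invented for the sketch and are not TC's API.

```python
import json

store = {}  # stand-in for a key/value store: string key -> serialized value

def put_post(key, post):
    store[key] = json.dumps(post)

def get_post(key):
    # One lookup returns the post with its tags and comments attached.
    return json.loads(store[key])

put_post("post:1", {"title": "Hello", "tags": ["clojure", "sql"],
                    "comments": ["First!"]})
put_post("post:2", {"title": "Again", "tags": ["sql", "postgresql"],
                    "comments": []})

def all_tags():
    """Collect the distinct tags by deserializing every stored post."""
    seen = set()
    for raw in store.values():
        seen.update(json.loads(raw)["tags"])
    return sorted(seen)

print(get_post("post:1")["tags"])  # ['clojure', 'sql']
print(all_tags())                  # ['clojure', 'postgresql', 'sql']
```

`get_post` is a single lookup, but `all_tags` has to read and decode the whole store, so its cost grows with the number of posts rather than the number of tags.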
So now why am I using an RDBMS again?
An RDBMS is exactly what I really need, if I could just query the thing concisely and get it to run fast. Thankfully there are some ORM-like libraries for Clojure in the works nowadays, already usable for a hobby project like this blog. There are clj-record, Carte, ClojureQL, and my own Oyako, and possibly others in the works.
For my tiny blog's database, Oyako gives me slightly slower performance than TC, but within the same order of magnitude, which is good enough.
Via Oyako I can (fairly concisely) fetch posts and get the associated tags, comments etc. But I can also easily fetch all tags, or all comments, since they're in their own tables. The "relational" part of RDBMS does come in handy sometimes.
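The general shape of that approach, one query per table with the results stitched together in memory, can be sketched as follows. This is not Oyako's API or its implementation; it is a Python and SQLite illustration with a hypothetical `fetch_posts_with` helper and made-up tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be read by column name
conn.executescript("""
    CREATE TABLE posts    (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE tags     (id INTEGER PRIMARY KEY, post_id INTEGER NOT NULL, name TEXT NOT NULL);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER NOT NULL, body TEXT NOT NULL);
    INSERT INTO posts    VALUES (1, 'Hello'), (2, 'Again');
    INSERT INTO tags     VALUES (1, 1, 'clojure'), (2, 1, 'sql'), (3, 2, 'sql');
    INSERT INTO comments VALUES (1, 1, 'First!');
""")

def fetch_posts_with(conn, child_tables):
    """Fetch all posts, then run one extra query per child table and attach the rows.

    child_tables must be trusted, hard-coded table names, since they are
    interpolated into the SQL text.
    """
    posts = [dict(r) for r in conn.execute("SELECT id, title FROM posts ORDER BY id")]
    if not posts:
        return posts
    by_id = {p["id"]: p for p in posts}
    placeholders = ",".join("?" * len(by_id))
    for table in child_tables:
        for p in posts:
            p[table] = []
        query = f"SELECT * FROM {table} WHERE post_id IN ({placeholders}) ORDER BY id"
        for row in conn.execute(query, list(by_id)):
            by_id[row["post_id"]][table].append(dict(row))
    return posts

posts = fetch_posts_with(conn, ["tags", "comments"])

# Because tags live in their own table, "all tags" is a single query.
tag_names = [r["name"] for r in
             conn.execute("SELECT DISTINCT name FROM tags ORDER BY name")]
print(tag_names)  # ['clojure', 'sql']
```

This issues three small queries for any number of posts, avoiding both the row explosion of one big JOIN and a separate query per post.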
I switched to TC to begin with because I was using SQL wrong, and it was too slow and clumsy. Once I figured out how to use SQL correctly, it was a no-brainer to go back.