Deploying Clojure websites

* This page is related to "Clojure and Compojure to the rescue, again".

On my server I'm running one Java process, which handles four of my websites on four different domains. These are all running on Clojure + Compojure. Some people asked for details of how to do this, so here's a rough outline. For the sake of brevity I'm only going to talk about two domains here, though it scales up to however many you want pretty easily.

This is surely not the only way to do this, and probably not the best way, but it's what I've arrived at after a year of goofing off.

Summary: Emacs + SLIME + Clojure running in GNU Screen; all requests are handled by Apache and mod_proxy sends them to the appropriate Jetty instance / servlet.
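The Apache/mod_proxy half of this lives in Apache's config. The "one JVM, several domains" half can be sketched in plain Clojure. This is a hypothetical sketch, not the exact setup described here. It assumes Ring-style request and response maps with lowercased header names, made-up domains and handler names, and that the original Host header actually reaches Jetty (with mod_proxy that usually means enabling ProxyPreserveHost). Each site is just a function, and a top-level handler picks one by host.

```clojure
(require '[clojure.string :as str])

(defn blog-handler
  "Hypothetical handler for one site."
  [_request]
  {:status 200 :headers {"Content-Type" "text/plain"} :body "blog"})

(defn gallery-handler
  "Hypothetical handler for a second site."
  [_request]
  {:status 200 :headers {"Content-Type" "text/plain"} :body "gallery"})

;; Host name (lowercase, no port) -> handler. The domains are made up.
(def sites
  {"blog.example.com"    blog-handler
   "gallery.example.com" gallery-handler})

(defn request-host
  "Returns the Host header, lowercased and without any :port suffix, or nil."
  [request]
  (some-> (get-in request [:headers "host"])
          (str/split #":")
          first
          str/lower-case))

(defn dispatch-by-host
  "Top-level handler: forwards the request to the site registered for its
  host, or returns a 404 when no site matches."
  [request]
  (if-let [handler (get sites (request-host request))]
    (handler request)
    {:status 404 :headers {"Content-Type" "text/plain"} :body "Unknown host"}))
```

If each site instead gets its own Jetty port or servlet and Apache does the split, no Host dispatch is needed on the Clojure side at all.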

Clojure and Compojure to the rescue, again

I haven't posted here much recently because I've been hacking on another recently-sort-of-completed website. One of my favorite hobbies is old 8-bit video games. The first thing I ever programmed was a website about Final Fantasy for the old NES, and I've fiddled with it for the past 10 years or so.

A while back I decided to rewrite the whole thing using Clojure + Compojure with data in MySQL. This went really well. I know lines of code isn't that great a metric, but it gives a rough idea of size: the whole website is 3,400 lines of Clojure, including all of the HTML "templates" and the DB layer I had to write. And it's Clojure all the way down. The only things not written in Clojure are a few bits of JavaScript here and there, and the stylesheet.

I suspect the target audience of this blog and the target audience of that website don't overlap that much, but I figured someone might be interested in some of the details of how it's implemented. A few things I learned...

Blog and CRUD

I updated my blog source code on github. I also split my CRUD library out into its own clj-crud repo. It is cruddy, so the name is apt.

This code still isn't polished enough for someone to drop it on a server and fire it up, but maybe it'll give someone some ideas. I think the new code is cleaner and it'll be easier for me to add features now.

Beware of bugs; I'm positive I introduced some.

EDIT: A word about the CRUD library... persisting data to disk is hard when the data may be mutated by many threads at once and the destination for your data is an SQL database that may or may not even be running. I have more respect now for people who've written libraries that actually do this kind of thing and work right. Granted, I only spent 3 days on mine, but still, it's tricky.
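One way to cope with that problem, sketched here as an assumption rather than as what clj-crud actually does: keep the authoritative data in a ref, and hand every committed change to an agent that writes to the database. An agent runs one action at a time, so writes never overlap, and catching exceptions inside the action keeps an unreachable database from putting the agent into a failed state. save-to-sql! is a hypothetical stand-in that just records snapshots in an atom.

```clojure
;; All data lives in RAM in a ref; the SQL database is only a backing store.
(def db (ref {}))

;; Hypothetical stand-in for the real SQL write. It only records snapshots;
;; a real version would talk to the database and might throw if it's down.
(def saved (atom []))
(defn save-to-sql! [snapshot]
  (swap! saved conj snapshot))

;; The agent's state remembers the last failure instead of letting an
;; exception put the agent into a failed state.
(def writer (agent {:last-error nil}))

(defn persist
  "Agent action: writes the *current* contents of db. Reading the ref here,
  instead of using the value passed to the watch, means an action that runs
  late can never overwrite newer data with an older snapshot."
  [state]
  (try
    (save-to-sql! @db)
    (assoc state :last-error nil)
    (catch Exception e
      (assoc state :last-error e))))

;; Whenever a transaction commits a change to db, queue a write.
(add-watch db :persist
  (fn [_key _ref old-val new-val]
    (when (not= old-val new-val)
      (send-off writer persist))))
```

In a standalone script, call (shutdown-agents) when finished; otherwise the send-off thread pool can keep the JVM alive for a while after the last write.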

I gave up for a while and tried clj-record, but it was prohibitively slow. It has the classic N+1 queries problem: selecting an object that has N sub-objects costs one query for the object plus N more, one per sub-object. In real life you'd write SQL joins to avoid such things. Ruby on Rails, on the other hand, gets around this via some nasty find syntax.

I get around it by having all my data in a Clojure ref in RAM already so it doesn't matter. And by using hooks so each object keeps a list of its sub-objects and the list is always up-to-date (updates of sub-objects propagate to their parents). But the crap I have to do to get this to just barely work is pretty painful.
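A simplified sketch of that idea, under stated assumptions: everything lives in one ref, and instead of hooks, adding a comment updates its parent post's set of comment ids in the same transaction, so the parent's list of sub-objects can never be out of date. The names store, add-post!, add-comment! and post-with-comments are hypothetical, not clj-crud's API. Fetching a post with all its comments is then just map lookups, with no N+1 queries.

```clojure
;; Hypothetical in-RAM store: posts and comments keyed by id.
(def store (ref {:posts {} :comments {}}))

(defn add-post! [id title]
  (dosync
    (alter store assoc-in [:posts id] {:id id :title title :comment-ids #{}})))

(defn add-comment!
  "Adds a comment and, in the same transaction, records its id on the parent
  post, so the parent always knows about all of its comments."
  [id post-id text]
  (dosync
    (when-not (get-in @store [:posts post-id])
      (throw (IllegalArgumentException. (str "No post " post-id))))
    (alter store
           (fn [s]
             (-> s
                 (assoc-in [:comments id] {:id id :post-id post-id :text text})
                 (update-in [:posts post-id :comment-ids] conj id))))))

(defn post-with-comments
  "Returns the post with its comments attached (ordered by id), or nil."
  [post-id]
  (let [{:keys [posts comments]} @store
        post (get posts post-id)]
    (when post
      (assoc post :comments (map comments (sort (:comment-ids post)))))))
```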

Blog source code released

* This page is related to "Clojure 1, PHP 0".

By popular demand, I've released the source code for my blog. Hope someone finds it useful.

http://github.com/briancarper/cow-blog/tree/master

Feedback and bug reports welcome, email me or post them somewhere on my blog and I'll find them.

Clojure

cow-blog

The source code for the blog you're reading right now.

http://github.com/briancarper/cow-blog/tree/master

clj-crud

SQL Database CRUD operations for Clojure.

http://github.com/briancarper/clj-crud/tree/master

clj-qt4-mailtray

A system tray app to check your email and notify you of new messages.

http://github.com/briancarper/clj-qt4-mailtray/tree/master

Mandelbrot Set in Swing

http://briancarper.net/clojure/mandelbrot-swing.clj

Sample pictures:

These were colored in GIMP, because I cheated. They make good wallpapers.

Mandelbrot Set in ASCII

http://briancarper.net/clojure/mandelbrot.clj

Anti-spam field still holding

* This page is related to "Darn you, spammers.".

So far my silly anti-spam measures are working. Since last week I've had 1861 spam comment attempts, of which 0 were successful. 1857 of them didn't even alter the text in the captcha field at all. Four of them inexplicably HTML-escaped the < into a &lt;.

One feature I didn't implement from WordPress is subscribing to comments via email. Sending email from Java is possible but a little painful to implement; the JavaMail API is a monster.

I do think it's useful to be able to know when someone responds to a comment you left, but is spamming your inbox really the best way? I have to think there's a better way.

I did implement an RSS feed for each individual post's comments, plus separate RSS feeds for every tag and every category on my blog. When RSS feeds are generated dynamically, why not? This is all of the code for the tag feeds:

(defn tag-rss
  "RSS feed of up to 25 posts carrying the given tag, or a 404 if no such tag exists."
  [tagname]
  (if-let [tag (get-tag tagname)]
    (rss (str "briancarper.net Tag: " (:name tag))
         (str "http://briancarper.net/" (:url tag))
         "briancarper.net"
         (map rss-item (take 25 (all-posts-with-tag tag))))
    (error-404)))

Plus the routing code:

(GET "/feed/tag/:name" (tag-rss (route :name)))
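The rss and rss-item helpers used by tag-rss aren't shown in the post. A minimal guess at what they might look like, assuming each post is a map with :title and :url keys, is below. It builds an RSS 2.0 string by hand and escapes text so titles containing & or < don't break the XML. The real helpers may well differ, and a real feed would usually also carry pubDate and guid for each item.

```clojure
(require '[clojure.string :as str])

(defn xml-escape
  "Escapes the five XML special characters; nil becomes an empty string."
  [s]
  (str/escape (str s) {\& "&amp;" \< "&lt;" \> "&gt;" \" "&quot;" \' "&apos;"}))

(defn rss-item
  "Hypothetical: turns a post map with :title and :url into an RSS <item>."
  [post]
  (str "<item>"
       "<title>" (xml-escape (:title post)) "</title>"
       "<link>" (xml-escape (str "http://briancarper.net/" (:url post))) "</link>"
       "</item>"))

(defn rss
  "Hypothetical: wraps already-rendered items in a minimal RSS 2.0 channel
  (title, link and description are the required channel elements)."
  [title link description items]
  (str "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
       "<rss version=\"2.0\"><channel>"
       "<title>" (xml-escape title) "</title>"
       "<link>" (xml-escape link) "</link>"
       "<description>" (xml-escape description) "</description>"
       (apply str items)
       "</channel></rss>"))
```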

But I haven't uploaded the comment-feed feature because I don't know if it's overkill. Personally I am liberal with my RSS feeds, I just pop them into my Akregator and off I go. But I don't know if other people take their feeds more seriously, or what. RSS feeds can be a bit heavyweight. Maybe I should make a feed for all of my comments across all posts.

Blog is still going strong

* This page is related to "Darn you, spammers.".

After I implemented that silly CAPTCHA yesterday, the spam stopped. There's also a honeypot form field: it's hidden via CSS so humans don't know it's there, and if a bot POSTs any text for that field, the data is rejected automatically. It's silly and easily defeated, yet it stopped all 262 spam attempts since yesterday. It looks like all the spam is for one site, but it's coming from a huge range of IPs, so it's probably a botnet. Thanks, MS Windows!
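The honeypot check fits in a few lines. In this sketch the field name "website" and the function handle-comment-post are hypothetical; params is assumed to be a map of form-field names to strings, and save-fn stands in for whatever actually stores the comment. Answering a tripped honeypot with an error (as here) rather than a fake success is a judgment call.

```clojure
(require '[clojure.string :as str])

(def honeypot-field
  "Hypothetical name of the CSS-hidden form field; humans leave it empty."
  "website")

(defn honeypot-tripped?
  "True when something filled in the hidden field, i.e. probably a bot."
  [params]
  (not (str/blank? (get params honeypot-field))))

(defn handle-comment-post
  "Rejects the POST if the honeypot was filled in; otherwise passes the
  params to save-fn and reports success."
  [params save-fn]
  (if (honeypot-tripped? params)
    {:status 403 :headers {"Content-Type" "text/plain"} :body "Rejected"}
    (do (save-fn params)
        {:status 200 :headers {"Content-Type" "text/plain"} :body "Thanks"})))
```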

I rewrote my whole CRUD layer so that I could use it for more than one database at once, and then rewrote my gallery code to take advantage, and now two hours later I have my origami gallery back up and running. Both sites are running from the same JVM. I wonder how many sites I can have going at once before the server melts into a puddle of Java-inflicted goo.

  PID PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
11338 16   0  512m 128m  12m S    0  0.3   0:28.33 java

Good thing I have plenty of RAM on the server. Comparing memory usage before and after, 66 MB is the JVM itself, and another 40 MB is Jetty, Compojure, my code and all the dependencies. The last 20 MB or so is my database slurped into RAM. So I can probably fit another few tens of thousands of posts and comments in here before I have to worry much. The real test will be letting this thing run for a couple of weeks and seeing how badly it leaks.

Fun with HTTP headers

* This page is related to "Darn you, spammers.".

One fun thing about playing with Compojure is that it doesn't do much with HTTP headers for you, which is a good learning opportunity. RFC 2616 is rather helpful here.

For example, I learned that if you don't set a Cache-Control or Expires header, your browser may happily re-fetch files over and over, which is a bit of a performance hit. Static files that don't change often, like images, can be served with an Expires date further in the future so they're cached.
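A sketch of building those caching headers, assuming Java 8 or later for java.time and a response whose headers are a plain map of names to strings (the names cache-headers and http-date-format are made up). Expires must be an HTTP date in GMT; Cache-Control: max-age is the HTTP/1.1 equivalent, and RFC 2616 says max-age overrides Expires when both are present.

```clojure
(import '(java.time ZonedDateTime ZoneOffset)
        '(java.time.format DateTimeFormatter))

;; HTTP date, e.g. "Tue, 06 Jan 2009 12:00:00 GMT". Java's built-in
;; RFC_1123_DATE_TIME doesn't zero-pad the day, so use an explicit pattern.
(def http-date-format
  (DateTimeFormatter/ofPattern "EEE, dd MMM yyyy HH:mm:ss 'GMT'" java.util.Locale/US))

(defn cache-headers
  "Headers allowing browsers to cache a response for max-age-seconds,
  counted from now (or from the given ZonedDateTime)."
  ([max-age-seconds]
   (cache-headers max-age-seconds (ZonedDateTime/now ZoneOffset/UTC)))
  ([max-age-seconds ^ZonedDateTime now]
   {"Cache-Control" (str "public, max-age=" max-age-seconds)
    "Expires" (.format http-date-format
                       (.plusSeconds (.withZoneSameInstant now ZoneOffset/UTC)
                                     (long max-age-seconds)))}))
```

For example, merging (cache-headers 86400) into the headers of image responses would let browsers reuse those images for a day without asking the server again.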

Another thing to keep in mind (note to self) is that using mod_proxy to forward traffic to a local Jetty server means that the "remote IP" you get from (.getRemoteAddr request) will always be 127.0.0.1. If you want the user's real remote IP, you have to look in the X-Forwarded-For header (easily accessed as (:x-forwarded-for headers) in Compojure). Given that Identicons are generated from a hash of an IP address, my forgetting this has resulted in some screwed-up (wrongly identical) avatars for a bunch of people in posts over the past couple of days. Oops. Not much I can do to fix that now.
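A sketch of picking the client address, assuming exactly one trusted proxy (Apache) sits in front of Jetty. client-ip is a made-up helper; in the setup above its arguments would be the values of (.getRemoteAddr request) and (:x-forwarded-for headers). mod_proxy adds the address it saw to any X-Forwarded-For value the client already sent, so the header can be a comma-separated list whose earlier entries are client-controlled. With a single proxy, the right-most entry is the one Apache added.

```clojure
(require '[clojure.string :as str])

(defn client-ip
  "Best guess at the visitor's IP behind exactly one reverse proxy.
  Uses the last entry of X-Forwarded-For (the one the proxy appended) and
  falls back to the socket address when the header is missing or empty."
  [remote-addr x-forwarded-for]
  (let [last-hop (some-> x-forwarded-for (str/split #",") last str/trim)]
    (if (str/blank? last-hop)
      remote-addr
      last-hop)))
```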

In other non-news, I just set up spam logging for the blog so I can see the kinds of things bots are doing to get around my feeble anti-spam measures. Sadly, the spam seems to have stopped entirely right after I set this up. How annoying.