Getting list of referers out of Apache logs

I use Google Analytics, but it has a noticeable lag in updating its information. When my site is being hammered, I'd like to see where all the traffic is coming from. It'd also be nice to see how many hits my RSS feed is getting, and how many images and static files are being direct-linked, which Google Analytics currently isn't tracking for me at all.

So this script will look in my Apache logs and print referers for some URL, thanks to ApacheLogRegex:

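#!/usr/bin/ruby

require 'apachelogregex'

raise "USAGE: #{$0} log_filename desired_url" unless ARGV[0] and ARGV[1]

# This must match the LogFormat directive in the Apache config.
format = '%v:%p %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"'
parser = ApacheLogRegex.new(format)
pat = Regexp.new(ARGV[1])
refs = Hash.new(0)  # referer => number of hits

# Read the log one line at a time instead of slurping the whole file.
File.foreach(ARGV[0]) do |line|
  x = parser.parse(line)
  next unless x  # parse returns nil for lines that don't fit the format
  if pat.match(x["%r"])
    refs[x["%{Referer}i"]] += 1
  end
end
# Print the most frequent referers first.
refs.sort_by{|k,v| -v}.each do |ref,count|
  puts "%s: %s" % [count,ref]
end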
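
For anyone who'd rather not install the gem, the same tally can be roughly sketched with only the standard library. This is a sketch under assumptions, not what I actually run: LINE_RE is a hand-written regex that mirrors the format string above, count_referers is a made-up helper name, and the sample log lines are invented for illustration. It also assumes no escaped quotes inside the quoted fields.

```ruby
# Hypothetical stdlib-only version: tally referers for matching requests.
# LINE_RE is an assumption that mirrors the log format used above:
#   vhost:port host ident user [time] "request" status bytes "referer" "agent"
LINE_RE = /\A\S+ \S+ \S+ \S+ \[[^\]]*\] "(?<request>[^"]*)" \d{3} \S+ "(?<referer>[^"]*)" "[^"]*"/

def count_referers(lines, pattern)
  counts = Hash.new(0)
  lines.each do |line|
    m = LINE_RE.match(line)
    next unless m                           # skip lines that don't fit the format
    next unless pattern.match(m[:request])  # only count the URL we care about
    counts[m[:referer]] += 1
  end
  counts.sort_by { |_ref, n| -n }           # most frequent first
end

# Invented sample lines, just to show the shape of the input.
sample = [
  'example.com:80 203.0.113.5 - - [10/Oct/2009:13:55:36 -0700] "GET /feed HTTP/1.1" 200 2326 "http://example.org/a" "Mozilla/5.0"',
  'example.com:80 203.0.113.6 - - [10/Oct/2009:13:56:01 -0700] "GET /feed HTTP/1.1" 200 2326 "http://example.org/a" "Mozilla/5.0"',
  'example.com:80 203.0.113.7 - - [10/Oct/2009:13:57:12 -0700] "GET /feed HTTP/1.1" 304 - "-" "FeedReader/1.0"',
  'example.com:80 203.0.113.8 - - [10/Oct/2009:13:58:40 -0700] "GET /about HTTP/1.1" 200 512 "http://example.org/b" "Mozilla/5.0"',
  'this line is not in the expected format'
]

count_referers(sample, %r{/feed}).each do |ref, n|
  puts "#{n}: #{ref}"
end
```

Passing File.foreach(filename) instead of the sample array should work the same way on a real log, since the helper only needs something that responds to each.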

I used to use awstats for this, but it was too heavyweight and a hassle to set up and keep running. Google Analytics is a no-brainer to use, even though the accuracy isn't as good as parsing Apache logs. At least I get an idea of which of my blatherings people are most interested in.

Making an RPG in Clojure (part one of many?)

What do you get when you combine old-school Final Fantasy-style RPGs with Clojure? Fun times for all. Well, for me at least.

I'm working on a sort of RPG engine in Clojure so I can make my own RPG. Click the thumbnail for a very preliminary video demo (6.5 MB) showing the engine in action.

RPG Demo

All I do in the video is walk around, and eventually start adding random NPCs to the map to test collision detection. Not all that exciting, but I'm proud nonetheless.

To forestall questions, yes I'll eventually post the source code, but no, not yet. It barely works. Just a proof of concept so far.

Right now I can walk around a map while NPCs also randomly walk around the map, not much more. So there isn't much to talk about. But not bad for 4 days and 600 lines of code (one tenth of which is ASCII art... more on that later). Keep in mind that I have no idea what I'm doing.

Collision detection works so people don't walk through walls or each other, and after endless tweaking I got all the animations to be very smooth, even when I add a few dozen NPCs to the map (as I do in the video). The video is a bit jerky but it looks better in person. All of the admittedly poor artwork is also my own, thanks to the GIMP and some hastily-read tutorials on pixel art.

It all runs on plain old Swing in Clojure. Here's some of what went right and what went wrong so far.

Out of memory... ouch

* This page is related to "Deploying Clojure websites".

I've written before about how I'm running four Clojure-driven websites out of a single JVM on my VPS. No problems for many months, but today I tried to make a blog post and got all kinds of out-of-memory errors. Hopefully I didn't lose any / many user comments on this blog in the past couple days, but it's possible.

I restarted the JVM and gave it a bit more RAM to play with, I imagine this will fix things. But we'll see. It occurs to me now that there may be such a thing as too much caching.

Deploying Clojure websites

* This page is related to "Clojure and Compojure to the rescue, again".

On my server I'm running one Java process, which handles four of my websites on four different domains. These are all running on Clojure + Compojure. Some people asked for details of how to do this, so here's a rough outline. For the sake of brevity I'm only going to talk about two domains here, though it scales up to however many you want pretty easily.

This is surely not the only way to do this, and probably not the best way, but it's what I've arrived at after a year of goofing off.

Summary: Emacs + SLIME + Clojure running in GNU Screen; all requests are handled by Apache and mod_proxy sends them to the appropriate Jetty instance / servlet.

Clojure and Compojure to the rescue, again

I haven't posted here much recently because I've been hacking on another recently-sort-of-completed website. One of my favorite hobbies is old 8-bit video games. The first thing I ever programmed was a website about Final Fantasy for the old NES, and I've fiddled with it for the past 10 years or so.

A while back I decided to rewrite the whole thing using Clojure + Compojure with data in mysql. This went really well. I know lines of code isn't that great a metric, but it can give a rough estimate: this whole website is done in 3,400 lines of Clojure, which includes all of the HTML "templates" and the DB layer I had to write. And it's turtles Clojure all the way down. The only things not written in Clojure are a couple bits of Javascript here and there and the stylesheet.

I suspect the target audience of this blog and the target audience of that website don't overlap that much, but I figured someone might be interested in some of the detail of how it's implemented. A few things I learned...

Comments work again

I broke the ability to leave comments a couple days ago. Thanks to everyone who let me know. It's fixed now.

I broke it while uploading yet another website I finished a couple days ago. It's yet another Compojure/Clojure site, this time a bit more ambitious than my humble blog. I plan to write about that whole experience once I have a bit of time.

Let's parse

Is there anything more fun than parsing strings? I submit to you that there is not. I'm currently reading my way through Parsing Techniques - A Practical Guide, which has a first edition free online. (I'm hoping Santa brings me a copy of the 2nd edition this year.)

This is a good book, with enough math to be rigorous but not so much that it's completely unreadable. It starts from the absolute basics ("What's a grammar?") and goes through the Chomsky hierarchy and then dives into parsing techniques in great detail, in a language-agnostic way.

Languages and grammars are fascinating. In high school I studied Spanish, French, Latin and German, largely in my spare time. When I was 16, if people asked what I wanted to do for a living, I said "translator".

The plan to become a translator failed partly because the quality of my early education was horrendous and partly because mastering a language is extremely difficult and at 16 I wasn't motivated enough. And then computers showed up in my life, which gave me a never-ending supply of languages to play with, while being fun (and profitable) in so many other ways. But I still took two years of Japanese classes in college for no reason other than enjoyment, and I'm still trying (and failing) to learn Japanese in my spare time 8 years later.

Perl was my first favorite language probably for no reason other than regular expressions. I can understand how people call PCRE syntax line-noise, but to me it's beautiful line noise. I live and breathe regular expressions nowadays. My favorite CS class in college was one where we went through and laboriously built finite-state automata and pushdown automata and Turing machines. Seeing the equivalence of these simple machines with the different classes of grammars was a huge epiphany. Such a simple concept with such huge consequences.

Dijkstra said:

Besides a mathematical inclination, an exceptionally good mastery of one's native tongue is the most vital asset of a competent programmer.

I strongly agree with that sentiment. People tell me at times that I'm good at written communication. I have my doubts, and anyways I find it funny because I'm so terrible at verbal communication. I think if I have any success at writing, it's because I view writing as a mechanical process.

I told a prof in college once that I felt like my papers wrote themselves once I had an idea in mind. There are rules of grammar and style, and you learn them and follow them, or break them deliberately if you have a good reason to. You write some prose, then you debug it until it "works" mentally. I don't care about typos and I split infinitives and comma-splice on purpose, but ambiguous or awkward phrases usually stand out to me like compiler bugs in my brain.

What's more important than language? Few things. Language is important enough to be nearly hard-wired into our brains. Children learn it instinctively. Human beings can still easily and effortlessly out-perform the best supercomputer at the task of parsing and interpreting speech. We think in words. The programming languages computers understand are dirt-simple by comparison, but writing code still feels like writing "thoughts for the computer" sometimes.

There are very few times you'll hear me say "What a wonderful world we live in". But one of those times is when I have the opportunity to explore an area of study like language. It's such an enjoyable experience to struggle and try to master such a thing. It's an amazing universe where we have these weird little rules and they work and we can understand them and manipulate them and produce things with them.

Clojure funding

Rich Hickey works on Clojure full-time for free, and he's asking people who get something out of Clojure to contribute some cash. $100 is less than I spend on random books and crap over the course of a year, so I gladly chipped in.

Clojure doesn't make me any money, I use it for hobby websites that actually cost me money every month just to keep going. But it's so much fun I think $100 is a small price to pay, if it keeps Clojure development going.

Lame comment spam management that works

It's been nine months since I ditched Wordpress and moved to a blog system I wrote from scratch (in Clojure). This was a great move in so many ways. One of those ways is comment spam. My site is as popular now (or maybe slightly more popular now) as it was when I was running Wordpress, so I think comparing before and after is valid.

With Wordpress, every morning I'd do the ritual of deleting overnight spambot droppings. Typically I got between 1 and 5 every night. I had a default Wordpress install and all I used for spam filtering was Akismet. Akismet did a surprisingly good job, catching literally hundreds if not thousands of spams every week which otherwise would've been ruining my site. But inevitably some would still get through. And what's worse, there were a lot more false positives than I could tolerate.

Since I started counting with my new system, which was around 6 months ago, to the best of my knowledge I've gotten zero spambot-produced comments that made it through my filters. This is pleasant, to say the least.

The system I'm using is stupid. None of it is stuff I thought of myself, I got ideas from lots of other blogs or articles I read, but the implementation is mine and it's not sophisticated. It would take a bot author a few seconds to work around it. But no one has bothered. Why bother writing a bot for my one-man blog, when you can write a bot for Wordpress and have it work on tens of thousands of blogs? And I can change my system to defeat the bots with a few lines of code just as easily as they can work around it.

So here's why I think it's working.

Adminer, where have you been all my life?

How do you view and edit data in a mysql DB? Lots of ways.

There's always commandline mysql. This is how I do it around 50% of the time. But it could be better. ASCII-art table dumps are not the easiest things to read, and readline-based history and editing only gets you so far.

phpMyAdmin is what I grew up with. It's pretty good but it's way too heavyweight. It also has all kinds of funky Javascript, to the point where I actually use Greasemonkey to remove some of it. Auto-selecting query text when you try to edit it in particular drives me crazy. Auto-selecting ANYTHING on Linux, where selecting text usually equals clobbering the X clipboard, is a really really bad idea. Last I tried, it takes one line of Javascript in Greasemonkey to fix this, by the way:

document.getElementById('sqlquery').removeAttribute('onfocus');

But phpMyAdmin may have changed since last I used it, it's been a while.

There's the native mysql GUI, called mysql-gui-tools in Gentoo. It's a standalone app that's pretty good, but the Linux version is gimped up compared to the Windows version for some strange reason. In any case it seems to be discontinued or something. There's some new MySQL Workbench thing coming, which I'll probably try once it hits Gentoo, but it looks like overkill for my simple needs.

SQuirreL SQL is another cross-platform GUI app. It connects to lots of different kinds of DBs (pretty much any DB that Java can talk to), so at work where I have to get my mysql server to talk to someone else's MS SQLServer (ugh) I use this. But it's very heavyweight and not the most enjoyable interface.

Last week I chanced upon something pretty good. Adminer is a single PHP file you throw on a server and there you go. It gives you something vaguely similar to phpMyAdmin but far more lightweight. There's very little Javascript messing with my query-writing and the styling is minimal and easy to read. I don't know how secure it is, so I don't plan to put it on any public servers, but on my test server doing web development it's good. I love bouncing back and forth between the database in one browser tab and my website in another tab. This is what I use now.