86 Posts Tagged "Ruby"


Getting a list of referers out of Apache logs

I use Google Analytics, but it has a noticeable lag in updating its information. When my site is being hammered, I'd like to see where all the traffic is coming from. It'd also be nice to see how many hits my RSS feed is getting, and how many images and static files are being direct-linked, which Google Analytics currently isn't tracking for me at all.

So this script, built on ApacheLogRegex, reads my Apache log and prints the referers (with hit counts, most frequent first) for every request whose request line matches a given pattern:

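#!/usr/bin/ruby
# Print referers (most frequent first) for requests whose request line
# matches a pattern. Requires the apachelogregex gem.
require 'apachelogregex'

abort "USAGE: #{$0} log_filename desired_url" unless ARGV[0] and ARGV[1]

# Must match the LogFormat your Apache vhost actually uses.
format = '%v:%p %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"'
parser = ApacheLogRegex.new(format)
pat = Regexp.new(ARGV[1])
refs = Hash.new(0)   # referer => hit count

# foreach streams the log line by line instead of loading it all into memory
File.foreach(ARGV[0]) do |line|
  x = parser.parse(line)
  next unless x      # parse returns nil for lines that don't fit the format
  if pat.match(x["%r"])
    refs[x["%{Referer}i"]] += 1
  end
end

refs.sort_by { |ref, count| -count }.each do |ref, count|
  puts "%s: %s" % [count, ref]
end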
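If you'd rather not depend on the apachelogregex gem, a hand-written regular expression can do the same job for one fixed log format. This is a sketch under assumptions, not what the script above does: it hard-codes the same `%v:%p %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"` layout, its `[^"]*` fields won't cope with escaped quotes inside the request or referer, and `referer_counts` is a hypothetical helper name.

```ruby
# Matches the "%v:%p %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\""
# layout. Assumption: no escaped quotes inside the quoted fields.
VHOST_COMBINED = /\A(?<vhost>\S+) (?<host>\S+) (?<ident>\S+) (?<user>\S+) \[(?<time>[^\]]+)\] "(?<request>[^"]*)" (?<status>\d{3}) (?<bytes>\S+) "(?<referer>[^"]*)" "(?<agent>[^"]*)"/

# Hypothetical helper: counts referers for requests whose request line
# matches url_pattern, returned as [[referer, count], ...], busiest first.
def referer_counts(lines, url_pattern)
  counts = Hash.new(0)
  lines.each do |line|
    m = VHOST_COMBINED.match(line)
    next unless m                      # skip lines in some other format
    counts[m[:referer]] += 1 if url_pattern.match?(m[:request])
  end
  # descending count, then alphabetical so ties print in a stable order
  counts.sort_by { |ref, n| [-n, ref] }
end
```

Called as `File.open(log) { |f| referer_counts(f.each_line, Regexp.new(url)) }`, it reads the log one line at a time.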

I used to use awstats for this, but it was too heavyweight and a hassle to set up and keep running. Google Analytics is a no-brainer to use, even though the accuracy isn't as good as parsing Apache logs. At least I get an idea of which of my blatherings people are most interested in.

Clojure ORM-ish stuff

Suppose I have this:

user> (def foo [{:id 1 :foo 123} {:id 2 :foo 456}])
#'user/foo
user> (def bar [{:foo_id 1 :bar 111} {:foo_id 1 :bar 222}])
#'user/bar

What I want is to "join" foo and bar so that each item in foo ends up with a sub-list of bars based on matching key fields.

In real life, these lists-of-hash-maps are coming out of a database via clojure.contrib.sql, so this is something I actually want to do pretty often. This is also vaguely similar to what you get out of a Rails-like ORM, where you end up with an object that has lists of sub-objects anywhere you have a one-to-many relationship.

Here's how I end up doing this in Clojure:

(defn one-to-many [xs name ys f]
  ;; give each x (under key `name`) every y for which (f x y) is true
  (for [x xs]
    (assoc x name (filter (partial f x) ys))))

Now I can do this:

user> (pprint (one-to-many foo :bars bar #(= (:id %1) (:foo_id %2))))
({:bars ({:foo_id 1, :bar 111} {:foo_id 1, :bar 222}), :id 1, :foo 123}
 {:bars (), :id 2, :foo 456})

And if I define a helper function:

(defn key=
  ([xkey ykey]
     #(= (xkey %1) (ykey %2))))

Then I can write it more concisely:

user> (pprint (one-to-many foo :bars bar (key= :id :foo_id)))
;; same as above

And if I have another "table" of data like this:

user> (def baz [{:foo_id 1 :baz 555} {:foo_id 2 :baz 999}])
#'user/baz

Then I can join them all like this:

user> (pprint (-> foo
                  (one-to-many :bars bar (key= :id :foo_id))
                  (one-to-many :bazzes baz (key= :id :foo_id))))
({:bazzes ({:foo_id 1, :baz 555}),
  :bars ({:foo_id 1, :bar 111} {:foo_id 1, :bar 222}),
  :id 1,
  :foo 123}
 {:bazzes ({:foo_id 2, :baz 999}), :bars (), :id 2, :foo 456})
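For comparison with the Rails world this tag is about, here's a rough Ruby translation of `one-to-many` and `key=` over arrays of hashes. It's only a sketch, not anything from ActiveRecord; `one_to_many` and `key_eq` are hypothetical names mirroring the Clojure functions, and it uses `Object#then` (Ruby 2.6+) in place of the `->` threading macro.

```ruby
# Rough analogue of one-to-many: returns new hashes; inputs aren't mutated.
def one_to_many(xs, name, ys, f)
  xs.map { |x| x.merge(name => ys.select { |y| f.call(x, y) }) }
end

# Analogue of key=: a predicate comparing x[xkey] with y[ykey].
def key_eq(xkey, ykey)
  ->(x, y) { x[xkey] == y[ykey] }
end

foo = [{ id: 1, foo: 123 }, { id: 2, foo: 456 }]
bar = [{ foo_id: 1, bar: 111 }, { foo_id: 1, bar: 222 }]
baz = [{ foo_id: 1, baz: 555 }, { foo_id: 2, baz: 999 }]

# Object#then stands in for Clojure's -> threading macro.
joined = foo
  .then { |rows| one_to_many(rows, :bars, bar, key_eq(:id, :foo_id)) }
  .then { |rows| one_to_many(rows, :bazzes, baz, key_eq(:id, :foo_id)) }
```

Like the Clojure version, this compares every x against every y, which is fine for report-sized data; for big tables you'd index `ys` by key first (e.g. with `group_by`).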

This is pretty concise. It may be possible to do it even more concisely (if so, do share). If I were willing to adhere to some Rails-y naming convention for my table names and the id fields in my tables, I could make this shorter by not having to specify the names of the id fields, but I don't want to go there. It's trivial to write similar functions for a one-to-one relationship, or to use a join table to "join" two tables with a many-to-many relationship.
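The join-table case can be sketched along the same lines, here in Ruby: find the join rows for each x, then look up the ys they point to. To avoid rescanning every table for every row, this version indexes the join table with `group_by` and the target table by id first. The table and key names (`posts`, `tags`, `posts_tags`, `post_id`, `tag_id`) are made up for illustration, and `many_to_many` is a hypothetical helper, not part of any library.

```ruby
# xs, ys and join are arrays of hashes (rows). For each x, attach under
# `name` the ys linked to it: join row j links x when j[jx] == x[xkey]
# and links y when j[jy] == y[ykey].
def many_to_many(xs, name, join, jx, jy, ys, xkey: :id, ykey: :id)
  links = join.group_by { |j| j[jx] }            # x id => its join rows
  by_id = ys.map { |y| [y[ykey], y] }.to_h       # y id => y (ids assumed unique)
  xs.map do |x|
    linked = (links[x[xkey]] || []).map { |j| by_id[j[jy]] }.compact
    x.merge(name => linked)
  end
end

posts      = [{ id: 1, title: "a" }, { id: 2, title: "b" }]
tags       = [{ id: 10, tag: "ruby" }, { id: 11, tag: "lisp" }]
posts_tags = [{ post_id: 1, tag_id: 10 }, { post_id: 1, tag_id: 11 },
              { post_id: 2, tag_id: 11 }]

tagged = many_to_many(posts, :tags, posts_tags, :post_id, :tag_id, tags)
```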

I am happily surprised sometimes by how simple it is to roll my own version of things that previously seemed like dark magic. I used Rails for a long time and it seemed like a crapload of code must have gone into making the ORM work. But four lines of code gets me 75% of what I ever needed Rails' ORM for.

This may be more thanks to me opening my eyes a bit than to Clojure being awesome, but either way, I'll take it.

Happy 2nd Birthday Clojure

Clojure is two years old and it's looking good. Clojure development has been a bit quiet lately but that's because lots of big changes are apparently being worked on behind the scenes, for example rewriting Clojure in Clojure (and enhancing the Java-Clojure interop along the way, to help make this more possible). Meanwhile clojure-contrib continues to grow and the community continues to be vibrant.

I've been putting Clojure to good use at work in data munging and reporting. I've got data in an MS SQL Server database on one box (not by choice, I assure you) and a MySQL database on another box, and then there's a bunch of data files in a wide variety of other formats floating around the network. I use Clojure to query and compare it all.

Thanks to JDBC and clojure.contrib.sql it's easy to slurp data from a DB into a bunch of Clojure hash-maps. Thanks to clojure.contrib.duck-streams and Clojure's good regex support, the same is true of data files in general.

In the past I'd have written some enormous SQL queries to generate reports, but it's so much easier to use Clojure's wide array of sequence-manipulation functions to manipulate hash-maps. Doing what I want is rarely more difficult than mapping some transformation over the data, filtering it down to the rows I want, and then formatting it nicely (which is easy thanks to Common Lisp-style formatting).

And once I notice patterns in how I'm using those things, I write a few functions and macros to make it more concise. Concision is one area where Lisps cannot be beaten (short of APL). For example, here's a query for all data from the MySQL db "data4" that was collected at "Site1" before 2008, grouped by the person who collected it:

user> (group-by :collector 
                (filter #(and (= (:site %) "Site1")
                              (date-before? (:date %) 
                                            (date-from-string "2008-01-01")))
                        (mysql-data :data4)))

date-from-string isn't a standard function but here's how easy it is to write it, thanks to the JVM:

(defn date-from-string [s]
  ;; MM = month, dd = day of month (mm would be minutes, DD day of year)
  (.parse (java.text.SimpleDateFormat. "yyyy-MM-dd") s))
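The same query reads much the same in Ruby with `select` and `group_by` from Enumerable. This is only a sketch: the rows are hard-coded stand-ins for what a hypothetical `mysql_data(:data4)` would return, and dates are parsed with `Date.strptime` from Ruby's standard `date` library. Whichever library you use, check the pattern letters; in Java's `SimpleDateFormat`, `MM` is month and `dd` is day of month, while `mm` is minutes and `DD` is day of year.

```ruby
require 'date'

# Hypothetical stand-in for rows pulled from the "data4" database.
rows = [
  { collector: "ann", site: "Site1", date: "2007-05-03" },
  { collector: "bob", site: "Site1", date: "2007-11-20" },
  { collector: "ann", site: "Site2", date: "2007-06-01" },
  { collector: "ann", site: "Site1", date: "2008-02-14" },
  { collector: "ann", site: "Site1", date: "2006-12-31" },
]

# Ruby counterpart of date-from-string: %m is month, %d day of month.
def date_from_string(s)
  Date.strptime(s, "%Y-%m-%d")
end

cutoff = date_from_string("2008-01-01")

# Keep Site1 rows dated before the cutoff, then group them by collector.
by_collector = rows
  .select { |r| r[:site] == "Site1" && date_from_string(r[:date]) < cutoff }
  .group_by { |r| r[:collector] }
```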

That's pretty much how my data-querying looks. To some that probably looks terrifying, but thanks to Emacs+Paredit it takes only a few handy keystrokes to type, auto-complete, manipulate and automatically pretty-format such things, and thanks to a suitably large dose of Lisp brand Kool-Aid I find it very natural and comfortable to read at this point.

Then I can (csv ...) it or (plaintext-table ...) it or whatever. I replaced thousands of lines of Ruby and SQL queries with a few hundred lines of Clojure this way, and the Clojure version does more and does it better.

One of the things I like about Clojure is that it's such a small language: you can be reasonably sure you know the whole language (or at least have some passing familiarity with all parts of it) once you've read the docs and the API a few times. This is in contrast to languages like Common Lisp, where the standard is thick enough to be considered a deadly weapon. Java lurks underneath, but you can get away with ignoring it almost entirely, until those rare times you need it.

Alan Perlis said:

It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures

This is true, but Clojure takes it further: It's even better to have a lot of functions that act on one abstraction of a bunch of data structures. Clojure gives you a bunch of data structures that can all be accessed under the same seq abstraction, and a bunch of functions that work on that abstraction, and the end result is more than the sum of the parts, because everything is interchangeable in lovely ways. I bounce data between sets and hash-maps and vectors without even thinking about it half the time.
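Ruby's closest analogue to the seq abstraction is the `Enumerable` module: anything that defines `each` and includes it gets `map`, `select`, `sort_by` and friends. The sketch below (with a hypothetical `collectors` helper) runs one function unchanged over an Array, a Set, and a Hash's values. One difference from Clojure is that `map` always hands back an Array, so you convert back explicitly when the collection type matters.

```ruby
require 'set'

# Hypothetical helper: works on any Enumerable whose elements are hashes
# with a :collector key, whatever the concrete collection type.
def collectors(rows)
  rows.map { |r| r[:collector] }.uniq.sort
end

list  = [{ collector: "bob" }, { collector: "ann" }, { collector: "bob" }]
set   = Set.new(list)                  # the duplicate "bob" row collapses
by_id = { 1 => list[0], 2 => list[1], 3 => list[2] }

# each_value without a block returns an Enumerator, itself Enumerable.
results = [collectors(list), collectors(set), collectors(by_id.each_value)]

# map returns an Array even for a Set; convert back if you need a Set.
names_set = set.map { |r| r[:collector] }.to_set
```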

That's just one reason of many that I love Clojure nowadays. I get crap done with it, quickly and easily and with surprising amounts of fun. Thanks Clojure and Clojure devs for making my life better.

Now I have two problems

I'm converting one of my websites from Ruby on Rails to Clojure in my spare time. I stupidly put a bunch of RoR-style links inline into certain bits of plaintext content, so in my DB there are a bunch of text fields with <%= link_to ... %> in the middle.

It was easy to fix with a regex though:

;; re-gsub and str-join come from clojure.contrib.str-utils
(use '[clojure.contrib.str-utils :only (re-gsub str-join)])

(defn clean [txt]
  (re-gsub #"<%=\s*link_to\s+(\"[^\"]+\"|'[^']+')\s*(?:,\s*'([^']+)'\s*)?(?:,\s*image_path\(['\"]([^'\"]+)['\"]\)\s*)?(?:,\s*:controller\s*=>\s*(?::(\S+)|['\"]([^\"']+)['\"])\s*)?(?:,\s*:action\s*=>\s*(?::(\S+)|['\"]([^\"']+)['\"])\s*)?(?:,\s*:id\s*=>\s*(?:(\d+)|:(\S+)|['\"]([^\"']+)['\"])\s*)?\s*%>"
           (fn [[_ s & parts]]
             (let [href (str-join "/" (filter identity parts))]
               (str "<a href=\"/" href "\">" (re-gsub #"^[\"']|[\"']$" "" s) "</a>")))
           txt))

And by easy, I mean not easy.

Note to self, try something other than a regex next time.

Note to self, don't bury some framework's funky-syntax DSL in the middle of plaintext content. Next time use HTML or do the conversion from DSL to HTML early rather than late.
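As a sketch of what "something other than a regex" might look like, here's a two-stage version in Ruby: a short regex only finds each `<%= link_to ... %>` tag, and a tiny hand-written parser turns its arguments into a URL. The names `clean`, `link_to_html` and `strip_quotes` are hypothetical; it only covers the argument shapes the Clojure regex above handles (link text, a plain path string, `image_path(...)`, `:controller`, `:action`, `:id`), and it naively assumes no commas inside quoted strings.

```ruby
# Stage 1: find each ERB link_to tag (non-greedy, so adjacent tags stay separate).
LINK_TO = /<%=\s*link_to\s+(.*?)\s*%>/m

# Drop one leading quote or symbol colon and one trailing quote.
def strip_quotes(s)
  s.strip.sub(/\A[:"']/, "").sub(/["']\z/, "")
end

# Stage 2: turn the argument list into an <a> tag.
# Naive assumption: no commas inside the quoted strings.
def link_to_html(args)
  parts = args.split(",").map(&:strip)
  text  = strip_quotes(parts.shift)
  path  = []
  opts  = {}
  parts.each do |part|
    if (m = part.match(/\A:(\w+)\s*=>\s*(.+)\z/))
      opts[m[1]] = strip_quotes(m[2])          # :controller => :blog
    elsif (m = part.match(/\Aimage_path\(\s*['"]([^'"]+)['"]\s*\)\z/))
      path << m[1]                             # image_path('foo.png')
    else
      path << strip_quotes(part)               # plain 'some/path' string
    end
  end
  path.concat(opts.values_at("controller", "action", "id").compact)
  "<a href=\"/#{path.join('/')}\">#{text}</a>"
end

def clean(txt)
  txt.gsub(LINK_TO) { link_to_html($1) }
end
```

Each stage can now be read and tested on its own, which is most of the point.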

Silly how two years ago I thought I'd be using Ruby for that site forever.

Plasma + Ruby = Ouch

I wrote my first KDE4 plasmoid the other day. I can't release it because it's essentially a clone of something you aren't allowed to copy (maybe I can replace him with a penguin and release it that way though).

But I need to rewrite it first anyways, because I did it using the Ruby bindings for Qt4 and Plasma, and wow, it's painful. It has a 50/50 shot of even initializing at any given point. When it does initialize, it has about a 1 in 8 chance of immediately crashing Plasma. And some things I just can't get to work at all, e.g. setting a default size or resizing the applet programmatically; X-Plasma-DefaultSize in the metadata is supposed to do it, but it does nothing. And it's not just my system (KDE 4.3): I tried it on a Kubuntu machine running stable KDE 4.2 and had the same problems.

The other snag is that the documentation of the Plasma API is buried so deep on the KDE site that I don't even know how I found it. Here it is for those who care (and for my own future reference). I hit lots of dead links on the KDE site on the way there.

Next step is to rewrite the plasmoid in Python or C++ I guess.

Practicality: PHP vs. Lisp?

Eric at LispCast wrote an article about why PHP is so ridiculously dominant as a web language, when arguably more powerful languages like Common Lisp linger in obscurity.

I think the answer is pretty easy. In real life, practicality usually trumps everything else. Most programmers aren't paid to revolutionize the world of computer science. Most programmers are code monkeys, or to put it more nicely, they're craftsmen who build things that other people pay them to create. The code is a tool to help people do a job. The code is not an end in itself.

In real life, here's a typical situation. You have to make a website for your employer that collects survey data from various people out in the world, in a way that no current off-the-shelf program quite does correctly. If you could buy a program to do it that'd be ideal, but you can't find a good one, so you decide to write one from scratch. The data collection is time-sensitive and absolutely must start by X date. The interface is a web page, and people are going to pointy-clicky their way through, and type some numbers, that's it; the backend just doesn't matter. For your server, someone dug an old dusty desktop machine out of a closet and threw Linux on there for you and gave you an SSH account. Oh right, and this project isn't your only job. It's one of many things you're trying to juggle in a 40-hour work week.

One option is to write it in Common Lisp. You can start by going on a quest for a web server. My advice, based on past experience: don't even think about mod_lisp. Hunchentoot is good, or you can pay a fortune for one of the commercial Lisps. If you want, you could also look for a web framework; there are many to choose from, each more esoteric, more poorly documented and harder to install than the last. Then you get to hunt for a Lisp implementation that actually runs those frameworks. Then you get to try to install it and all of your libraries on your Linux server, and on the Windows desktop machine you have to use as a workstation. Good luck.

Once you manage to get Emacs and SLIME going (I'm assuming you already know Emacs intimately, because if you don't, you already lose) you get to start writing your app. Collecting data and moving it around and putting it into a database and exporting it to various statistics packages is common, so you'd do well to start looking for some libraries to help you out with such things. In the Common Lisp world you're likely not to find what you need, or if you're lucky, you'll find what you need in the form of undocumented abandonware. So you can just fix or write those libraries yourself, because Lisp makes writing libraries from scratch easy! Not as easy as downloading one that's already been written and debugged and matured, but anyways. Then you can also roll your own method of deploying your app to your server and keeping it running 24/7, which isn't quite so easy. If you like, you can try explaining your hand-rolled system to the team of sysadmins in another department who keep your server machine running.

Don't bet on anyone in your office being able to help you with writing code, because no one knows Lisp. Might not want to mention to your boss that if you're run over by a bus tomorrow, it's going to be impossible to hire someone to replace you, because no one will be able to read what you wrote. When your boss asks why it's taking you so long, you can mention that the YAML parser you had to write from scratch to interact with a bunch of legacy stuff is super cool and a lovely piece of Lisp code, even if it did take you a week to write and debug given your other workload.

Be sure to wave to your deadline as it goes whooshing by. If you're a genius, maybe you managed to do all of the above and still had time to roll out a 5-layer-deep Domain Specific Language to solve all of your problems so well it brings tears to your eye. But most of us aren't geniuses, especially on a tight deadline.

Another option is to use PHP. Apache is everywhere. MySQL is one simple apt-get away. PHP works with no effort. You can download a single-click-install LAMP stack for Windows nowadays. PHP libraries for everything are everywhere and free and mature because thousands of people already use them. The PHP official documentation is ridiculously thorough, with community participation at the bottom of every page. Google any question you can imagine and you come up with a million answers because the community is huge. Or walk down the hall and ask anyone who's ever done web programming.

The language is stupid, but stupid means easy to learn. You can learn PHP in a day or two if you're familiar with any other language. You can write PHP code in any editor or environment you want. Emacs? Vim? Notepad? nano? Who cares? Whatever floats your boat. Being a stupid language also means that everyone knows it. If you jump ship, your boss can throw together a "PHP coder wanted" ad and replace you in short order.

And what do you lose? You have to use a butt-ugly horrid language, but the price you pay in headaches and swallowed bile is more than offset by the practical gains. PHP is overly verbose and terribly inconsistent and lacks powerful methods of abstraction and proper closures and easy-to-use meta-programming goodness and Lisp-macro syntactic wonders; in that sense it's not a very powerful language. Your web framework in PHP probably isn't continuation-based, and it probably doesn't compile your s-expression HTML tree into assembler code before rendering it.

But PHP is probably the most powerful language around for many jobs if you judge by the one and only measure that counts for many people: wall clock time from "Here, do this" to "Yay, I'm done, it's not the prettiest thing in the world but it works".

The above situation was one I experienced at work, and I did choose PHP right from the start, and I did get it done quickly, and it was apparently not too bad because everyone likes the website. No one witnessed the pain of writing all that PHP code, but that pain doesn't matter to anyone but the code monkey.

If I had to do it over again I might pick Ruby, but certainly never Lisp. I hate PHP more than almost anything (maybe with the exception of Java) but I still use it when it's called for. An old rusty wobbly-headed crooked-handled hammer is the best tool for the job if it's right next to you and you only need to pound in a couple of nails.

Perl6 features borrowed from Lisp

Via PerlMonks I found a couple of articles discussing in good detail some of the new features of Perl6.

Perl6 steals even more things from Common Lisp than Perl5 did: it has multimethods / multiple dispatch for example, which is a huge plus. Via this interview with Damian Conway we learn that Perl6 will also have named, optional, and "rest" parameters to subs, just like in CL. That's also a good thing; CL's parameter-passing styles are nice, and it's awesome how you can combine them. Certainly better than Perl5 (but everything is better than Perl5). There's also apparently special Perl6 syntax for applying functions to lists and currying functions, and weird Capture objects to explicitly deal with multiple-value returns from subs. Good stuff.

Perl6 is also apparently taking first-class function objects to an extreme; blocks, subs, and methods are all objects and there are all kinds of metaprogramming hooks to screw around with them. This is one area where Ruby is just a little bit lacking: functions and methods aren't quite first-class enough in Ruby. Most people seem to pass around symbols / names of methods rather than pass around methods as objects themselves. Anonymous blocks are used liberally but mostly via yield, limiting you to one block per method and largely hiding away the block objects themselves.
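For the record, Ruby does let you pull a method out as an object and pass several callables to one method; it just isn't the idiomatic default. A small sketch (the `shout` and `transform_all` names are made up for illustration):

```ruby
def shout(s)
  s.upcase + "!"
end

# A Method object: a real first-class value you can store and pass around.
shout_method = method(:shout)

# Taking several callables as ordinary arguments sidesteps the
# one-block-per-call limit of yield.
def transform_all(items, before, after)
  items.map { |i| after.call(before.call(i)) }
end

strip  = ->(s) { s.strip }             # an explicit lambda object
result = transform_all(["  hi ", " yo"], strip, shout_method)

# Method and Proc objects convert to blocks with &, so they also work
# anywhere a block is expected.
mapped = ["a", "b"].map(&shout_method)
```

The same `&` conversion is what makes the symbol style (`map(&:upcase)`) so common in everyday Ruby.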

I'm honestly a bit excited about Perl6, but largely as a curiosity or new toy to play with. It is kind of interesting how languages keep creeping more and more toward Common Lisp. If Perl is a nicer-looking Common Lisp which I can edit properly in Vim, it'll be almost a dream come true; I hate Emacs and Common Lisp tends to be butt-ugly. (Not talking about the parens, mostly about the verbosity and cruft and inconsistencies. Larry Wall famously said that Common Lisp looks like (paraphrased) "oatmeal with toenail clippings mixed in". Perl is certainly at the other extreme.)

http://rakudo.org/ is a good site for keeping up on Perl6 news. It's pretty active. Here's hoping we see a real release of Perl6 someyear.

Vim + screen + REPL = win

Via the Clojure wiki I found a great page describing how you can use GNU screen and some Vim magic to let Vim play nicely with an interactive commandline program like a Common Lisp REPL, Ruby's irb, or Python's, well, python.

That page is a very stripped-down and simpler version of what Limp does for Vim+Lisp. But Jonathan Palardy's version has the benefit of being so simple that you can set it up yourself manually in a second or two. I still have never gotten Limp to work quite right and I don't have the time to debug a big mess of Vim script.

The idea is to start up a named screen session via e.g. screen -S foo -t bar, then start an irb session (or whatever) in there, and then in Vim you can simply yank some text into a named register and send it off to screen via a system call. Download Jonathan's code and see.
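The "send it off to screen" step boils down to screen's `-X stuff` command, which types a string into a window of a running session (`-S` picks the session, `-p` the window). Here's a hedged Ruby sketch of that call, assuming a session started with `screen -S foo -t bar`. It passes arguments as an array so no shell quoting is involved, and appends a newline so the REPL actually evaluates the text. screen may still interpret some escape characters in stuffed text, which this doesn't handle. `screen_stuff_command` and `send_to_screen` are hypothetical names.

```ruby
# Builds the argv for: screen -S <session> -p <window> -X stuff <text>
def screen_stuff_command(session, window, text)
  text += "\n" unless text.end_with?("\n")   # press Enter in the REPL
  ["screen", "-S", session, "-p", window, "-X", "stuff", text]
end

# Sends text (e.g. the contents of a yanked Vim register) to the REPL in
# that screen window. Returns true if screen exited with status 0,
# false or nil otherwise.
def send_to_screen(session, window, text)
  system(*screen_stuff_command(session, window, text))
end
```

For example, `send_to_screen("foo", "bar", "1 + 1")` would type `1 + 1` and Enter into the irb running in window `bar`.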

It's not a full-blown SLIME; it doesn't have tab-completion or weird interactive debugging windows or such bullcrap. It doesn't capture the output of your command and feed it back into your Vim buffer. But hey, it's pretty good for something you can throw together in 2 minutes, and it works.

So there goes my last reason to ever use Emacs. Good riddance, I must say.

Honestly, Emacs just frustrates the living hell out of me. Oh how I tried to like it. I really did. I've used it on and off constantly over the past year. I have Emacs shortcuts written all over the whiteboard in my office. But its braindead window management, its terrible broken undo/redo system, its finger-crippling key-chord combos, its lack of features I need (like line numbering), its reliance on broken 3rd-party elisp hack scripts for things Vim has built in (like line numbering!), its ugly fonts and GUI elements, and so on and so forth. Vim is such a joy in comparison.

Making Java not suck

There are some good things about Java. The virtual machine has been refined for quite some time. The garbage collector is likely to perform well. The standard library has gone through many iterations and is very encompassing and complete and amazingly well-documented. The community is enormous. The language is as cross-platform as you could reasonably expect any huge program to be. It has nice GUI frameworks (which nowadays even look native on Windows and Linux, if you use SWT), a good threading library, good socket libraries, and all the things I wish Ruby or Common Lisp had.

The one unignorably bad thing about Java is that you have to write it in Java. It's next to impossible to write Java by hand, and it's still a whole lot of pain even if you use one of the massive Java IDEs that trick you into not noticing the pain. The language is way too verbose. The syntax is busy and full of mandatory brackets and parens and punctuation and bullcrap. The demand that you catch every conceivable exception is tiresome. The ability to abstract all of those things away isn't present. The package / import scheme is way too much typing for any human being. No Lisp-style macros, no easy-to-use anonymous functions, clunky iterators, primitive looping constructs. Everything is forced into an object-oriented mindset even if it doesn't fit well. And so on.

But the good thing is that nowadays you don't have to write Java in Java. You can write Java in Ruby using JRuby, or write Java in Lisp using Clojure. I gave both of these a try in the past week or so, and both are awesome. You can write the bulk of your program in a nice powerful fun-to-write language, and call out to Java to handle the GUI bits or whatever Java is good at handling. You can write tasty Ruby or Lisp abstractions that hide the horrible mess that is Java's syntax. (There's also Jython, if you swing that way.) You'd think it'd be more effort to write Ruby code that translates to Java than just to write plain old Java, but Ruby is so much better than Java that it actually ends up being easier. For me anyways.

It seems like the lines between programming languages are awfully blurry nowadays. Most languages have some way to interface directly with C code. There are GTK and Qt bindings for everything under the sun. We have people writing Lisp interpreters in Javascript, Python interpreters in Lisp, and so on. You have all these VMs (JVM and .NET and Parrot (if it ever gets done)) which let you write the same program in whatever language you feel like. And most of them are cross-platform to some degree or other.

It's an interesting trend which makes a lot of practical sense. No language is great at everything, so why bother limiting yourself to one language per program? Especially with computers being so fast today, we can get away with layering language on top of language. It's a nice situation, if you want to get programs done as fast and with as little effort as humanly possible.

Wish list

What's the Common Lisp version of Perlmonks or Ruby-forum? I have yet to find it.

comp.lang.lisp is largely crap. 50% of the traffic on that list is spam about shoes and fake watches. The other half is equally split between:

  • People debating tiny, silly semantic points of the Common Lisp Hyperspec.
  • People stuck in the 70s or 80s, talking about the good old days, ruminating about Lisp history.
  • Flame wars.
  • New people asking for help. Some get good honest advice and helpful answers, many are flamed and ridiculed into next week if they even hint that they dislike the parentheses.

The Common Lisp community (if you can call it that) is a bunch of really smart guys, but they all live isolated in hermit shacks up in the mountains and they spend their time doing magic tricks with Lisp that few people ever see, and if you wander too close they throw rocks at you.

What's the Common Lisp equivalent of perldoc or rdoc? We have the Hyperspec. It's an impressive document, but it's a bunch of painful HTML that looks like it was created in the early 90s, probably because it was. It reads like a dusty, dry, technical document, probably because it is. What it's not is friendly or easily readable.

Perl has CPAN, Ruby has rubygems, what does Lisp have? Either a hand-rolled system definition script, or if you're lucky an ASDF install file. ASDF is the semi-standard Lisp way of installing libraries, except that it doesn't quite work in Windows, it doesn't check dependencies or handle different versions of a package very well, and it doesn't work the same on all Lisp implementations. Many people in the so-called community think it's not very good.

The fellow running Lispcast makes another good point. Where can you download Lisp? It's not obvious.

You could say "OK Brian, good idea, now get to work!" The problem is that even if I had the time or willpower, I'm not the smartest guy in the world. I honestly don't think I could design and run and maintain a CPAN. And even if I did, would anyone use it? But I do know that there ARE plenty of smart, enthusiastic people using Lisp. Yet high-quality friendly code is largely not being produced.

Peter Christensen wrote about "language snobs" and the importance of community. One point made is that some really ugly, horrific languages have been extremely successful simply because they've been accessible and fun. An example given is the scripting language in Second Life, which has over 2.5 billion lines of code written in it by tens of thousands of amateurs and has accurately modeled a realistic 3D environment with thousands of users at any given time. All in an ugly language some guy invented AND implemented in one week. The developers admit that the language is total crap, but it doesn't matter: 1) it has very good and accessible documentation, 2) it has a very newbie-friendly community, and 3) it's easy to pick up, throw together some code and get immediate results. Three things Common Lisp lacks.

This is something I've said myself many times: an active, supportive, enthusiastic community is essential for the health of any programming language. Common Lisp simply doesn't have one and it's a shame.

I still secretly hope that Clojure or NewLisp or Arc turn out to be a huge success. They are the kinds of things Lisp needs today.