19 Posts Tagged 'Wordpress'
Lame comment spam management that works
It's been nine months since I ditched Wordpress and moved to a blog system I wrote from scratch (in Clojure). This was a great move in so many ways. One of those ways is comment spam. My site is as popular now (or maybe slightly more popular now) as it was when I was running Wordpress, so I think comparing before and after is valid.
With Wordpress, every morning I'd do the ritual of deleting overnight spambot droppings. Typically I got between 1 and 5 every night. I had a default Wordpress install and all I used for spam filtering was Akismet. Akismet did a surprisingly good job, catching literally hundreds if not thousands of spams every week which otherwise would've been ruining my site. But inevitably some would still get through. And what's worse, there were a lot more false positives than I could tolerate.
In the roughly 6 months since I started counting with my new system, to the best of my knowledge I've gotten zero spambot-produced comments that made it through my filters. This is pleasant, to say the least.
The system I'm using is stupid. None of it is stuff I thought of myself; I got ideas from lots of other blogs or articles I read, but the implementation is mine and it's not sophisticated. It would take a bot author a few seconds to work around it. But no one has bothered. Why bother writing a bot for my one-man blog, when you can write a bot for Wordpress and have it work on tens of thousands of blogs? And I can change my system to defeat the bots with a few lines of code just as easily as they can work around it.
So here's why I think it's working.
Clojure 1, PHP 0
Goodbye Wordpress
As I mentioned many times, I've been working on replacing Wordpress for my blogging needs. Wordpress has been pretty good for the past three years, but it's time to move on, for a bunch of reasons.
Primarily, the way Wordpress automatically mangles my text is annoying. For example, it turns newlines into paragraphs inconsistently (especially when it comes to pre/code blocks). This blog is mostly about programming, which means being able to post code without having my quotes turned into "smart" quotes and my --flags turned into long-dashes is kind of important. HTML is sometimes automatically escaped, and sometimes not. I can't count how many comments I've gotten where someone posted some code, then posted again to inform me that Wordpress ate the code for dinner. There are plugins to fix some of this, which break every time Wordpress releases a new version, and have never really worked that well for me.
Writing a theme for Wordpress means a mix of PHP and HTML and CSS, which is painful to read and even more painful to write. Aside from the considerable ugliness of PHP itself, there's a lot of weird magic involved with themes, based on naming conventions for files, weird fall-through behavior when certain theme files aren't present and so on. The Wordpress API is enormous and not fun to work with if you want to do something other than the standard Wordpressy kind of blog structure. Static pages aren't too much fun to work with in Wordpress either.
Lately I think I was getting hammered with spam partly because Wordpress is such an easy target. Akismet is nice but it wasn't catching enough lately; maybe 10-15 spams per week were slipping through. And there was always the chance that some widely-known exploit in Wordpress was going to leave my site susceptible to some roving bot.
And so on.
Hello Clojure
Why Clojure? Because it's awesome and fun and powerful and I wanted to learn it better.
Compojure is a web framework for Clojure that made a lot of this very easy. Coming here from a Ruby on Rails background, Compojure has a lot going for it in comparison. Compojure is lightweight and more low-level than Rails. For example Compojure doesn't enforce MVC on you, doesn't force a unit testing framework on you, and doesn't care how you access your data. Compojure just lets you route HTTP requests to Clojure functions based on the URL and request method (RESTfully: POST/GET/DELETE/PUT), and it gives you easy access to the request information, session, GET/POST parameters and cookies.
Under the hood it's all servlets and Jetty, both of which are solid, stable, well-tested, well-documented technologies. However, thankfully, all of that Java stuff is under the hood, and well under it. I didn't have to write a single line of Java or interact with a single servlet directly. Everything (session, params, headers) is a Clojure hash-map from the perspective of my code.
Compojure also comes with a domain-specific language for writing HTML, which is similar to CL-WHO and myriad other Common Lisp HTML DSL's. All of which are awesome. I can't say enough how much nicer it is to write (or generate) structured s-exps than to write HTML by hand. More on that below.
Compojure doesn't come with any way to interact with a database, so I had to write one. clojure.contrib has an SQL lib which easily lets you interact with a MySQL database. (Clojure can talk to MySQL via MySQL's JDBC connector, of course.) I used clojure.contrib.sql to write a small (192 lines) library which slurps up a bunch of database tables into Clojure refs, and provides a few functions for basic CRUD operations so that any updates to the ref data are also transparently reflected in the database. The database is essentially only for keeping an on-disk cache of the data in case I need to restart the server. The average number of DB queries per page is zero; everything except posting/editing/deleting data just reads out of a Clojure ref.
With possibly multiple users posting data at once, it's nice to have Clojure's built-in concurrency support. Updating the data refs with new data is always safe from multiple threads simply by throwing a (dosync) around all of the write accesses. This was completely painless to write.
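A minimal sketch of the ref-plus-database idea, not the actual 192-line library. Every name here (posts, fake-db, persist!, add-post!, get-post) is made up for illustration, and an atom stands in for the MySQL table so the example runs without a database:

```clojure
;; Hypothetical names throughout; the real library used clojure.contrib.sql.
(def posts (ref {}))      ; in-memory data, keyed by post id
(def fake-db (atom []))   ; stand-in for a MySQL table

(defn persist!
  "Pretend database write. A real version would issue an INSERT or UPDATE."
  [row]
  (swap! fake-db conj row))

(defn add-post!
  "Updates the ref inside a transaction, then mirrors the change to the 'database'."
  [post]
  ;; The write to the ref is wrapped in dosync, so it is safe from many threads.
  (dosync (alter posts assoc (:id post) post))
  ;; The side effect happens after the transaction commits, so a retried
  ;; transaction cannot write the same row twice.
  (persist! post)
  post)

(defn get-post
  "Reads come straight out of the ref; no database query involved."
  [id]
  (get @posts id))

(add-post! {:id 1 :title "Hello"})
(get-post 1)
```

Keeping the database write outside the dosync is a design choice of this sketch: transactions can be retried, so side effects inside them are best avoided.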
I decided I wanted to use Markdown for posting comments and authoring new pages. This was also very simple to do; I outlined how to get Markdown working in Java and Clojure, in a previous post. The real-time previews for comments are largely inspired by / ripped-off from Stack Overflow, implemented mostly using open-source Javascript libraries like Showdown, JQuery, TypeWatch and TextAreaResizer.
A Brief Comparison: Clojure vs. Wordpress
All of my code including the CRUD library, all of the HTML for the templates and layout, admin controls, and all the glue to put it together is 1,253 lines of code. Wordpress is somewhere over 78,000 lines of PHP depending what you count (doesn't include any themes or layout, but does include Wordpress features I didn't need and didn't implement). It's still a pretty nice reduction in code overall, any way you look at it.
As an example, in my old Wordpress site I had a plugin catcloud to generate a "tag cloud". This plugin itself is 226 lines of PHP, not bad. However, here's the Clojure code to generate a similar tag cloud (which you can see here currently):
(defn tag-cloud []
  (let [tags (sort-by #(.toLowerCase (:name (first %))) (all-tags-with-counts))
        counts (map second tags)
        max-count (apply max counts)
        min-count (apply min counts)
        min-size 90.0
        max-size 200.0
        color-fn (fn [val]
                   (let [b (min (- 255 (Math/round (* val 255))) 200)]
                     (str "rgb(" b "," b "," b ")")))
        tag-fn (fn [[tag c]]
                 (let [weight (/ (- (Math/log c) (Math/log min-count))
                                 (- (Math/log max-count) (Math/log min-count)))
                       size (+ min-size (Math/round (* weight
                                                       (- max-size min-size))))
                       color (color-fn (* weight 1.0))]
                   [:a {:href (:url tag)
                        :style (str "font-size: " size "%;" "color:" color)}
                    (:name tag)]))]
    (block nil
           [:h2 "Tags"]
           [:div.tag-cloud
            (apply html (interleave (map tag-fn tags)
                                    (repeat " ")))])))
This is 10 times less code, which is a good reduction in my opinion. Most of the code is the math to generate a weight logarithmically for each tag so they scale nicely. (all-tags-with-counts) fetches a seq of two-item pairs: the tags themselves (which are hash-maps) and a count of posts for each tag. There are two locally-defined functions in the let which generate the text color and the font size and HTML for each tag.
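To make the logarithmic weighting concrete, here is the same arithmetic pulled out into two standalone functions. The names log-weight and font-size are made up for this sketch; the 90.0 and 200.0 bounds are the min-size and max-size from the function above:

```clojure
(defn log-weight
  "Where count c falls between min-count and max-count on a log scale (0.0 to 1.0)."
  [c min-count max-count]
  (/ (- (Math/log c) (Math/log min-count))
     (- (Math/log max-count) (Math/log min-count))))

(defn font-size
  "Maps a weight between 0.0 and 1.0 onto a font size between 90% and 200%."
  [weight]
  (+ 90.0 (Math/round (* weight (- 200.0 90.0)))))

;; With tag counts ranging from 1 to 100:
(log-weight 1 1 100)     ; 0.0, the rarest tag
(log-weight 10 1 100)    ; roughly 0.5, even though 10 is nowhere near halfway to 100
(log-weight 100 1 100)   ; 1.0, the most common tag

(font-size (log-weight 1 1 100))     ; 90.0
(font-size (log-weight 100 1 100))   ; 200.0
```

On a linear scale a tag with 10 posts would be nearly as small as one with 1 post; the log scale is what spreads the sizes out.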
The vectors that look like [:h2 "Tags"] are input for Compojure's HTML-generating DSL; this would be transformed for example into <h2>Tags</h2>. (block ...) is a macro which wraps its content in HTML for the rounded borders of my layout. (Math/log ...) and friends are calls to standard Java math functions.
This whole function is less code than just the horrible boilerplate array declarations at the top of the Wordpress plugin:
$catcloud_field_data = array(
    array('name' => 'Minimum Font Size', 'option' => 'catcloud_min_font_size', 'size' => '4', 'maxlength' => '3',
        'default' => '9', 'note' => 'Used for the least frequent categories', 'validation' => '/^\d{1,3}(\.\d{1,3})?$/'),
    array('name' => 'Maximum Font Size', 'option' => 'catcloud_max_font_size', 'size' => '4', 'maxlength' => '3',
        'default' => '18', 'note' => 'Used for the most frequent categories', 'validation' => '/^\d{1,3}(\.\d{1,3})?$/'),
    array('name' => 'Font Face', 'option' => 'catcloud_font_face', 'size' => '15', 'maxlength' => '254',
        'default' => '', 'note' => 'Set an optional list of font faces', 'validation' => '/.*/'),
    array('name' => 'Font Units', 'option' => 'catcloud_font_units', 'size' => '3', 'maxlength' => '2',
        'default' => 'pt', 'note' => 'Choose one of em, pt, px or %', 'validation' => '/^(%|em|pt|px)$/'),
    array('name' => 'Color Start', 'option' => 'catcloud_color_start', 'size' => '7', 'maxlength' => '6',
        'default' => '0066CC', 'note' => 'For the least frequent categories. Use a hexadecimal RGB triplet. ie. 0066CC',
        'validation' => '/^[\dA-F]{6}$/i'),
    array('name' => 'Color End', 'option' => 'catcloud_color_end', 'size' => '7', 'maxlength' => '6',
        'default' => 'CC6600', 'note' => 'For the most frequent categories. Use a hexadecimal RGB triplet. ie. CC6600',
        'validation' => '/^[\dA-F]{6}$/i'),
    array('name' => 'Before Category', 'option' => 'catcloud_before', 'size' => '3', 'maxlength' => '20',
        'default' => '[', 'note' => 'Set the character(s) to display before category names', 'validation' => '/.*/'),
    array('name' => 'After Category', 'option' => 'catcloud_after', 'size' => '3', 'maxlength' => '20',
        'default' => ']', 'note' => 'Set the character(s) to display after category names', 'validation' => '/.*/'),
    array('name' => 'Show Top N Categories', 'option' => 'catcloud_top_n_cats', 'size' => '5', 'maxlength' => '3',
        'default' => '', 'note' => 'Show only the top N categories (where N is a number like 10 or 25 or whatever. Set to 0 or empty for no limit.',
        'validation' => '/^\d*$/'),
    array('name' => 'Excluded Categories', 'option' => 'catcloud_excluded_cats', 'size' => '15', 'maxlength' => '254',
        'default' => '', 'note' => 'A comma-separated list of category ids.',
        'validation' => '/^[\d, ]*$/'),
)
Ugh. As another example, here's the code that handles a POST request to add a new blog page:
(defn do-new-post []
  (check-login
    (let [post (add-post *params*)]
      (sync-tags post (:all-tags *params*))
      (redirect-to "/"))))
It does exactly what it says: Check to make sure the user is logged in, add the post based on the POST params, sync up the tags for that post and redirect to the front page. Lisp lets you say what you want very concisely, with a bare minimum of boilerplate.
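The definition of check-login isn't shown here. One way a macro like it could be written, purely as a guess: this sketch assumes the session lives in a dynamic var called *session* with a :logged-in key, and that a failed check returns a 403 response map. Neither detail comes from the real code.

```clojure
;; Hypothetical: where the session lives and what it contains are assumptions.
(def ^:dynamic *session* {})

(defmacro check-login
  "Runs body only when the current session is logged in."
  [& body]
  `(if (:logged-in *session*)
     (do ~@body)
     {:status 403 :body "Not logged in"}))

(defn do-secret-thing []
  (check-login
    "secret"))

;; Not logged in: the body never runs.
(do-secret-thing)

;; Logged in: the body's value is returned.
(binding [*session* {:logged-in true}]
  (do-secret-thing))
```

Because it's a macro, the body is only evaluated once the check passes, which is what lets the handler read as a plain list of steps.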
How about speed? My Clojure code is actually generating HTML in the most brute-force and wasteful way possible. The HTML for each page is regenerated from scratch, via a cascade of a couple dozen function and macro calls, every time you load a page. But it's still pretty fast, a couple hundred milliseconds for most page requests. This is slightly faster than the Wordpress version of my site. If I ever have performance issues I can switch to another Clojure HTML library, like clj-html which uses the same vector-style syntax but pre-compiles the HTML.
How hard was it to set up on the server? Wordpress is pretty famous for being dirt-easy to deploy anywhere. My Clojure app by comparison was slightly more difficult, as you might expect, but it wasn't brain surgery. My server runs Debian. First I installed the JVM via apt, then I rsynced a bunch of jars and clj files to the server, then I installed emacs and screen also via apt. Then I put two lines into an Apache config file to proxy-forward traffic to a local port where jetty would be listening. I started Emacs, did (require 'bcc.blog.server), did (bcc.blog.server/go) to start everything, and that's about it. Took about 15 minutes to set up from scratch. When I find a bug, I SSH in, re-attach to screen, fix it in Emacs, hit C-c C-c to recompile just the functions I need to update, and then detach from screen again.
I'm pretty pleased with this so far. It was fun to write and has all the features I used from Wordpress, plus more, and the building blocks are there to extend things if I imagine up a new feature I like.
Looks like my blog is still running today in spite of my predictions. Still waiting for the JVM to crash though, I know it's coming. I plan to post the source code for some of this once I'm sure it works.
New Blog... I think...
OK, here's the new blog. Apologies to anyone who may be following my RSS feed, because the whole feed is probably going to be reset by switching blog engines.
If you can call this an "engine". This is my Clojure rewrite. I'll have much more to write about this tomorrow when I'm awake. In the meantime, bug reports are welcome.
Here are my estimates:
- 52% chance the blog is crashed and down by the time I wake up tomorrow.
- 27% chance my feeble anti-spam measures are easily defeated, and hundreds of spam comments are waiting for me in the morning.
- 14% chance the JVM brings down the whole server.
- 7% chance everything works swimmingly.
I had to take down my origami gallery site just to get this to run. Fun times ahead.
When I came up with this blog layout I thought it was great, but three weeks of looking at it and now I'm starting to hate it. I can work on making it all pretty later though.
Ah well, more tomorrow. Keeping my fingers crossed.
Migrating away from Wordpress: permalinks
Work is progressing on my blog-rewrite in Clojure. It's been lots of fun and I keep adding features. Hopefully not too many features; the whole point of ditching Wordpress is that it's far too bloated. But my new blog already has categories and tags and pages with parent pages and so on and so forth. One of these days I'll actually start using it publicly, maybe.
One issue with migrating away from Wordpress is how to avoid breaking all the existing links that point to my Wordpress blog. Most people with Wordpress blogs (including myself) seem to use some date-based permalink structure, which I'd like to avoid.
I thought I'd have to set up some horrid mod_rewrite thing after the switch, to avoid breaking links, but actually Compojure's routes are powerful enough. Any request that looks like a Wordpress permalink, I pass to a redirect function which spits out some redirect headers to the new location. Simple enough, just a few lines of code of this sort:
(defn old-blog-redirect [name]
  (redirect-to (str "/blog/" name)))
(defservlet blog-servlet
  (GET "/2009/:m/:d/:name/" (old-blog-redirect (route :name))))
This redirects e.g. /2009/01/31/foo to /blog/foo.
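The URL mapping itself can be sketched without Compojure at all. This is a plain-Clojure illustration using a regex, not what the route above does internally; the function name old-permalink->new is made up, and unlike the route it accepts any four-digit year:

```clojure
(defn old-permalink->new
  "Returns the new path for a Wordpress-style date permalink, or nil if the
  path doesn't look like one."
  [path]
  ;; Four-digit year, two-digit month and day, then the post name,
  ;; with an optional trailing slash.
  (when-let [[_ name] (re-matches #"/\d{4}/\d{2}/\d{2}/([^/]+)/?" path)]
    (str "/blog/" name)))

(old-permalink->new "/2009/01/31/foo")    ; "/blog/foo"
(old-permalink->new "/2009/01/31/foo/")   ; "/blog/foo"
(old-permalink->new "/about")             ; nil
```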
Focusing so much on dates is kind of silly. A lot of blogs have a sidebar with a little calendar, or have a list of links to archives of all their posts by month. How useful is this really? Never once have I read someone's blog and thought "Wow, nice post. I wonder what this guy said in November of 2007. Good thing there's a link right there on the front page!"
Does anyone really care enough about the date something was posted, that the date needs to be encoded in the URL? The way I see it, the only thing people are going to use a date in a URL for is to say "That's too old, so I'm not reading that". How many people are going to be persuaded to read your blog by noticing (based on the URL) that a post is brand new, where otherwise they wouldn't have clicked the link? I have to think not many.
Blog replacement fun
So I'm still thinking how I want to replace this blog. I still plan to write something from scratch, for fun's sake.
One thing I'm sure of is that I don't want to write HTML by hand, at all, under any circumstances. HTML and XML are not human-writable or human-readable languages. They rely too much on things human beings suck at, namely consistency and repetition. Forget a closing tag? Typo a tag name? Now your document is malformed. Undefined behavior, at best. It's too verbose, it has too much needless punctuation.
It's also too hard to manipulate it or do anything with it after you write it. There's XPath, which is itself a mess to work with, manipulating huge strings of crap via slightly smaller strings with its own funky syntax quirks. I've never found an XML-parsing library with an interface that I liked, and I've had to use them extensively in Perl, Ruby and Python.
So first thing, I'm going to convert all posts and comments into Markdown and use that for future posting and commenting. I like Markdown. It's hard to get wrong typing it by hand and it doesn't get in your way. It also doesn't tie you to one implementation; you can turn Markdown into HTML client-side via Javascript or easily parse it server-side. Or you can display it as-is and it's still readable. It's a very nice idea.
Second thing, I plan to use my programming language to write the HTML for the skeleton of my site for me. Opening, closing, and properly nesting tags is something a machine should do for me. Making sure my tags belong to a well-defined list of allowed valid HTML tags is something a machine should check for me.
More than likely I'm going to write this in Clojure, because s-expressions (and better yet, a combination of Clojure literal lists, maps and vectors) make writing HTML very easy and foolproof. I've also written an HTML-producing DSL in Ruby in the past though; it's not hard to do in any language.
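As a rough illustration of how little code such a DSL needs, here is a toy renderer for nested vectors. The function name render is made up, this is not Compojure's implementation, and it deliberately skips things a real library handles, such as escaping text and checking tag names:

```clojure
(defn render
  "Turns a nested vector such as [:h2 \"Tags\"] into an HTML string.
  An optional map in second position supplies the attributes."
  [node]
  (if (vector? node)
    (let [[tag & more] node
          attrs    (when (map? (first more)) (first more))
          children (if attrs (rest more) more)]
      (str "<" (name tag)
           ;; Each attribute becomes  key="value"
           (apply str (for [[k v] attrs]
                        (str " " (name k) "=\"" v "\"")))
           ">"
           ;; Children are rendered recursively, so nesting is always balanced.
           (apply str (map render children))
           "</" (name tag) ">"))
    ;; Anything that isn't a vector is treated as text (no escaping here).
    (str node)))

(render [:h2 "Tags"])
;; "<h2>Tags</h2>"

(render [:div [:a {:href "/blog/foo"} "foo"] " " [:em "bar"]])
;; "<div><a href=\"/blog/foo\">foo</a> <em>bar</em></div>"
```

The closing tag is generated from the same keyword as the opening tag, so a forgotten or mistyped closing tag simply can't happen.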
Another thing I'm sure of is that I need a good anti-spam system but that I have no idea what that system should be. Akismet in Wordpress has caught 50,000(!) spam comments since I started my blog. Some spam still sneaks through on me now and then. I've never used a CAPTCHA and don't plan to; they just don't work. I'm probably going to come up with some funky custom anti-spam measures (which are invisible to users) and rely on the fact that no one is going to take the time to break it. My site isn't a huge or popular target, so here's hoping.
A third complication I'm dreading is how to do this without breaking every link anyone ever made to my site. Wordpress's permalink system is OK, but I'd like to change it. Problem is I can't change it; every link to my site from another site is a dependency. So I might have to mod_rewrite redirect the old URLs to new ones, or use two permalink schemes simultaneously. I don't know.
Fun times ahead. How to design a blog is a problem lots of people have solved but no one has really solved perfectly, or else there wouldn't be so many frameworks and packages to do it. The good thing about writing your own from scratch is that it'll work exactly how you want. Wordpress is close but not close enough.
Wordpress is no good for programmers
Wordpress is a good least common denominator when it comes to blogging. It's good for someone to throw some PHP scripts on a random server and have a basic blog running in a few minutes with little configuration.
It's no good for programmers. For me in particular anyways. Reasons:
- PHP, yuck. Blogging is something I do purely for fun. PHP is not fun. PHP is a horrendous, sometimes-necessary evil to be avoided if at all possible.
- Text mangling. Quotes are turned to "smart" (dumb) quotes. Newlines are mangled into paragraph blocks. wp-includes/formatting.php has 1200 lines of mostly regular expression replacements. Getting text to show up literally is sometimes hard. People post comments and sometimes Wordpress eats them for dinner. I get complaints all the time.
  One of the most important things for a blog like mine is the ability to post plaintext and not have it altered in any way. This is possible if you hack Wordpress enough, or take your chances with various plugins. I've never had success with those kinds of plugins or with plugins of my own. Wordpress changes too often, the plugins break, or there are one or two hooks that are overlooked, or one or two regex-replacements I forget to undo.
- Bloat. The only features I use are the ability to post text, categorize it via tags, let people comment on it, and let people browse archives of posts. Spam filtering is nice too. These are not difficult tasks. Wordpress adds all kinds of baggage on top of that, most of which I never use. Pings? Trackbacks? Blogrolls? User registration? Draft posts? Private posts? File uploading? Crappy WYSIWYG in-browser text editors? Plugin systems? I don't need that stuff.
- Permalinks are handled by Apache mod_rewrite rules. Yuck again. There are better ways.
So I'm thinking of migrating away from Wordpress. Which is probably going to be a bit painful if I want to retain all of my current posts, and avoid breaking every link anyone ever made to my site. But it's doable.
What to migrate to? I'll probably write something myself. It's usually easier to write something yourself than to hack up someone else's stuff to get it to work right.
White text? Black text? Cow text?
I took a screenshot of my blog and went into Gimp and did Colors => Invert and thus a new blog layout was born. I also brought back the purple/green one. You can change it via the little skin-selector drop-down thing that's hopefully showing up and working properly for everyone. Skin selection is courtesy of a WP plugin; that site is not in English, but the instructions in the download are, if you want to use it yourself.
Black text on white vs. white text on black... the age-old question. My Vim theme has forever been a black background (ps_color to be specific). Even in broad daylight I find that a black background reduces eye strain considerably. Or maybe it's all in my mind, but then again this is a subjective sort of thing, so whatever's in my mind is all that really matters isn't it?
It's notoriously difficult to use a dark-background GTK/QT theme. Too many programs are written with the assumption that your theme is going to be light backgrounded. However thanks to Kore and a few tweaks here and there I've been getting along pretty well for a few months with a dark theme in KDE. I really need to start posting desktop screenshots more often again. Note to self.
So what's up with the cows? Cows are big, dumb, silly beasts. They can represent strength, or embody vulnerability. They're so disgusting that it somehow wraps around again to awesome.
Are cows really dumb though? Does their silent cud-chewing indicate stupidity, or thoughtfulness? Are cows really silly? Or do we project our own latent silliness onto them? Cows thus embody some of the deepest philosophical questions man has ever dared to ask.
Not really. I've been told by various people that I have the kind of sense of humor where it's impossible to tell whether I'm joking or being serious. Sometimes even I can't tell whether I'm joking or not. I love walking that line. Cows are partly a joke that I never get tired of telling, but also they really do make me smile. Cows are a way to have fun with this website. I view my website almost as a sort of parody of a blog, but a parody I still take seriously in a way. I believe it was Friedrich Nietzsche who said:
If you look long enough into the cow, the cow begins to look back through you.
The internet does not have to be serious business, and I don't want my website (or my opinions) to be taken as seriously as many people seem to want to. My secret hope is that whenever someone comes on here to flame me about my opinions, they'll look up and see a cow in a fedora and say "Wait a second... what am I doing?"
Also Gentoo's mascot is a cow. I estimate that the cows on my website increase its overall performance by 14%.
New blog layout (now with more cows)
"OK, what's up with the cows?", you might ask. Actually no one ever has asked. So I'm not saying.
I'm experimenting a bit with a new blog layout. It's black text on white rather than my customary white text on black/grey. I drew the cow in the top right using Inkscape. (Inkscape is such a wonderful program. So easy to use.)
I put this layout together from scratch in exactly 4 hours (not counting cow-drawing time). Just goes to show how easy it is to make Wordpress themes I guess.
I've gone from a sort of three-column layout to a one-column layout. This is partly / largely because I post source code snippets, and I need the full width of the screen for those.
Source code should generally go into PRE tags to preserve whitespace. However that usually has the side effect of preventing text from line-wrapping. In many blogs with multi-column layouts, source code snippets overrun into the sidebar area, and either end up being hidden under the sidebar (thus unreadable), overflow over top of the sidebar (thus looking messy), or if you're lucky, the PRE element itself is side-scrollable like a mini frame (which is kind of annoying). In a one-column layout, I get to use the full width of the screen, which is much nicer in my opinion.
The other reason I like one-column is that the sidebar columns in a multi-column layout invariably have just a few links at the top, and then you have a 200-pixel wide column of blank space all the way down the page to the bottom. It's a bit of a waste. This layout on the other hand is still fairly readable even if I resize Firefox to 400 pixels wide.
Do users really USE most of the links webmasters put into their sidebars? If people want to look at my archives, they can probably go to an archives page. It seems almost absurd how much we try to cater to people's whims and impatience by loading up websites (and computer interfaces in general) with every single thing anyone might want to do on every single page. It ends up looking cluttered and most of those links are probably unused the vast majority of the time. I wish I knew how many times anyone ever used the tag cloud I had displayed in my sidebar for so long.
If anyone has any suggestions about things that would make this layout easier to read or use, or if you loathe the new layout (or love it, I guess that's a slim possibility) feel free to leave me a scathing comment. I've tested it in Firefox 2 / 3, IE7, Opera and Konqueror. IE6 can die in a fire.
nofollow
Someone recently mentioned that author comment links in my blog were all given a nofollow attribute. So I wrote a plugin that removes nofollow from comment author links. If you search around it's not hard to find a million other plugins that do the same thing. Putting something like this into /wp-content/plugins (and activating the plugin) should probably do it:
<?php
/*
Plugin Name: Remove nofollow
Plugin URI: http://briancarper.net
Description: Remove 'nofollow' from comment author links
Version: 1.0
Author: Brian Carper
Author URI: http://briancarper.net
*/

// Strip the single-quoted rel attribute (which carries nofollow) from the link HTML.
function remove_nofollow($text) {
    return preg_replace('/ rel=\'.*?\'/', '', $text);
}

// get_comment_author_link is a filter hook, so register the callback with add_filter.
add_filter('get_comment_author_link', 'remove_nofollow');
?>
There is apparently some amount of controversy about nofollow. Who woulda thought. I personally don't see much need for it. Links are links. If I didn't want links to be followed, I wouldn't post them or let other people post them in the first place.
Wordpress theme download (Cow 1.0)
I put my old Wordpress theme up for download as requested. Enjoy. I think.
