<?xml version="1.0" encoding="UTF-8" ?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc=" http://purl.org/dc/elements/1.1/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>briancarper.net (λ) (Tag: Spam)</title><link>http://briancarper.net/tag/7/spam</link><description>Some guy's blog about programming and Linux and cows.</description><item><title>Ads on license plates?</title><link>http://briancarper.net/blog/ads-on-license-plates</link><guid>http://briancarper.net/blog/ads-on-license-plates</guid><pubDate>Sun, 20 Jun 2010 22:09:08 -0700</pubDate><description>&lt;p&gt;What if when your car stops at a red light, your &lt;a href=&quot;http://www.mercurynews.com/ci_15338527?IADID=Search-www.mercurynews.com-www.mercurynews.com&amp;amp;IADID=Search-www.mercurynews.com-www.mercurynews.com&amp;amp;nclick_check=1&quot;&gt;license plate displays ad banners&lt;/a&gt;?  What could possibly go wrong?&lt;/p&gt;

&lt;p&gt;Quoth the person(?) who wrote this bill:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&quot;We're just trying to find creative ways of generating additional revenues,&quot; he said. &quot;It's an exciting marriage of technology with need, and an opportunity to keep California in the forefront.&quot; &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The forefront of annoying the hell out of people.  Certainly what I need is more distractions on the road.  I mean, what if there's a new brand of toothpaste and I didn't find out yet?  Someone somewhere needs to earn a dime for telling me about it by any means necessary.&lt;/p&gt;

&lt;p&gt;I'm just waiting for the first company to propose paying new parents a few hundred dollars to tattoo ads on their babies.&lt;/p&gt;</description></item><item><title>Printer spam: what could possibly go wrong?</title><link>http://briancarper.net/blog/printer-spam-what-could-possibly-go-wrong</link><guid>http://briancarper.net/blog/printer-spam-what-could-possibly-go-wrong</guid><pubDate>Thu, 17 Jun 2010 10:59:57 -0700</pubDate><description>&lt;p&gt;As further evidence that there are no depths to which companies won't stoop when it comes to advertising, HP has come up with a great idea: Get people to hook their printers up to the internet and then &lt;a href=&quot;http://www.computerworld.com/s/article/9178128/HP_partners_with_Yahoo_for_targeted_ads&quot;&gt;spew advertisements out of their printers&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Well, it's a win-win situation for the companies doing the advertising: Not only will people see your ads, they'll pay for the ink and paper to print them.  Maybe not such a great situation for the end-user though.&lt;/p&gt;

&lt;p&gt;And then there are the privacy implications of targeting ads based on geolocating the IP address of the printer.  Which I find a bit disturbing, but I guess advertisers already do that with online ads.  But wait, there's more:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Ads can also be targeted based on a user's behavior as well as the content, said Vyomesh Joshi, head of the HP's Imaging and Printing Group.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Looking at what I'm printing so you can try to sell me things?  Just a bit creepy.&lt;/p&gt;

&lt;p&gt;Most troubling to me is the intrusiveness of the whole thing.  They're taking control of a physical object in my house and using it against me.  May as well kidnap my cat and train him to spell out &quot;BUY PEPSI&quot; in his cat litter.&lt;/p&gt;

&lt;p&gt;Quoth some slimeball at HP:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&quot;What we discovered is that people were not bothered by it [an advertisement],&quot; Nigro said. &quot;Part of it I think our belief is you're used to it. You're used to seeing things with ads.&quot;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Translation: &quot;&lt;em&gt;We know this is a really horrible idea, but if people are complacent enough to sit there and take it without complaint, what's stopping us?&lt;/em&gt;&quot;&lt;/p&gt;

&lt;p&gt;He's right though, people are used to it.&lt;/p&gt;

&lt;p&gt;I guess TV, radio, internet, phones, product placement in movies and games, print media, billboards and the postal service just aren't enough.  Clearly what the world really needs is another ad-delivery mechanism.&lt;/p&gt;</description></item><item><title>Lame comment spam management that works</title><link>http://briancarper.net/blog/lame-comment-spam-management-that-works</link><guid>http://briancarper.net/blog/lame-comment-spam-management-that-works</guid><pubDate>Sun, 06 Dec 2009 02:34:08 -0800</pubDate><description>&lt;p&gt;It's been nine months since I ditched Wordpress and moved to a blog system I wrote from scratch (in Clojure).  This was a great move in so many ways.  One of those ways is comment spam.  My site is as popular now (or maybe slightly more popular now) as it was when I was running Wordpress, so I think comparing before and after is valid.&lt;/p&gt;

&lt;p&gt;With Wordpress, every morning I'd do the ritual of deleting overnight spambot droppings.  Typically I got between 1 and 5 every night.  I had a default Wordpress install and all I used for spam filtering was Akismet.  Akismet did a surprisingly good job, catching literally hundreds if not thousands of spams every week which otherwise would've been ruining my site.  But inevitably some would still get through.  And what's worse, there were a lot more false positives than I could tolerate.&lt;/p&gt;

&lt;p&gt;In the roughly six months since I started counting with my new system, to the best of my knowledge I've gotten &lt;strong&gt;zero&lt;/strong&gt; spambot-produced comments that made it through my filters.  This is pleasant, to say the least.&lt;/p&gt;

&lt;p&gt;The system I'm using is stupid.  None of it is stuff I thought of myself; I got ideas from lots of other blogs or articles I read, but the implementation is mine and it's not sophisticated.  It would take a bot author a few seconds to work around it.  But no one has bothered.  Why bother writing a bot for my one-man blog, when you can write a bot for Wordpress and have it work on tens of thousands of blogs?  And I can change my system to defeat the bots with a few lines of code just as easily as they can work around it.&lt;/p&gt;

&lt;p&gt;So here's why I think it's working.&lt;/p&gt;

&lt;!--more Spam prevention measures below.--&gt; 

&lt;h1&gt;1. It's not Wordpress&lt;/h1&gt;

&lt;p&gt;Just by using something slightly different from Wordpress, I think I'm already ahead.  For example if you have a blog where a form posts comments to &lt;code&gt;/wp-comments-post.php&lt;/code&gt;, a bot doesn't even need to look at your site to spam you.  They can blast your server with POST data at that URL in a format they already know Wordpress will accept.  My site is all custom code, so everything is different enough that default bot attempts fail immediately.&lt;/p&gt;

&lt;p&gt;I think this is the reason that only &lt;strong&gt;1853&lt;/strong&gt; spam comments have even been POSTed at me in the last six months.  That's an improvement of one or two orders of magnitude already.&lt;/p&gt;

&lt;h1&gt;2. Honeypot text field&lt;/h1&gt;

&lt;p&gt;So what about the comments that are actually POSTed?  They are presumably the result of bots that parse sites' HTML looking for comment forms and try to POST data that satisfies the form.&lt;/p&gt;

&lt;p&gt;So in my comment form I have a field called &lt;code&gt;referer&lt;/code&gt;.  A &quot;How did you find my site?&quot; kind of thing.  In fact I don't care how you found this site, this field is a honeypot.  The div containing this field is hidden via CSS.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;div#referer-row {
    display: none;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So you shouldn't ever see it if you're a human using a browser.  But bots parsing the HTML see this field, unless they also bother to parse my CSS and see that it's hidden, which would be expensive and apparently they don't do it.&lt;/p&gt;

&lt;p&gt;If you put anything in this &lt;code&gt;referer&lt;/code&gt; field, your comment will be rejected as spam.  Simple enough.   &lt;/p&gt;
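
&lt;p&gt;Roughly, in Ruby (my blog is Clojure, so this is only a sketch and not the code the site runs; the name &lt;code&gt;honeypot_tripped?&lt;/code&gt; and the params hash are made up for illustration):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical helper; params is assumed to be a hash of submitted form fields.
# A human never sees the hidden referer field, so any text in it means spam.
def honeypot_tripped?(params)
  !params.fetch(:referer, '').to_s.empty?
end

puts honeypot_tripped?(comment: 'Nice post')                  # prints false
puts honeypot_tripped?(comment: 'Nice post', referer: 'xyz')  # prints true
&lt;/code&gt;&lt;/pre&gt;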

&lt;p&gt;Many blogs require you to fill in every field or else the comment is rejected, so it seems reasonable to expect most bots to fill in all of your form fields.  (My blog actually requires you to fill in nothing but the comment text; author will be set to &lt;code&gt;Anonymous Cow&lt;/code&gt; if you don't fill it in.)&lt;/p&gt;

&lt;p&gt;In fact this seems to be the case; of 1853 spam comments since March, &lt;strong&gt;1810&lt;/strong&gt; put something into this field.  Most of the time it's a random string of letters.  Not even a URL.  Sometimes it's a couple words like &quot;insurance quotes&quot; or something about drugs or casinos.&lt;/p&gt;

&lt;p&gt;The downside of this is if you're a human using a browser that doesn't understand CSS, you will see this field.  Then if you type something into it and try to comment, it'll end up as spam.  So Lynx users and time travelers from 1987 trying to leave me comments might be confused at first.  &lt;/p&gt;

&lt;p&gt;However as far as I can tell, no intelligible data has ever been entered into this field by a human, so I don't think it's a concern.  Six times, the word &quot;None&quot; was entered, but I don't think this is a human because that's a nonsense answer to &quot;How did you find this site?&quot;.  But you never know.&lt;/p&gt;

&lt;h1&gt;3. Lame static CAPTCHA&lt;/h1&gt;

&lt;p&gt;That leaves 43 spam comments that made it this far.  My other anti-spam measure is a word you have to type.  But it's always the same word, and the word is &lt;strong&gt;COWS&lt;/strong&gt;.  This CAPTCHA caught the remaining 43.  It looks like this:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/random/captcha.png&quot; alt=&quot;CAPTCHA&quot; title=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;There's a normal text field with a default value of &lt;code&gt;&amp;lt;= Type this word&lt;/code&gt; specified right in the HTML.  &lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;input type=&quot;text&quot; value=&quot;&amp;lt;= Type this word&quot; name=&quot;test&quot; id=&quot;test&quot;/&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;There are no other instructions besides &quot;Type this word&quot;.  I'm assuming either that commenters are familiar enough with CAPTCHAs to know what I want, or can figure it out using common sense.  Given that my target audience is computer geeks and programmers, this should be a safe assumption.  In fact I've had fewer than a dozen false positives in the past six months via people failing this; see below for details.&lt;/p&gt;

&lt;p&gt;To post a comment, the value of this field must contain the word &quot;COWS&quot; somewhere in it, case-insensitively.  Otherwise it's spam.  Easy enough to implement.&lt;/p&gt;
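
&lt;p&gt;That check is about one line.  A Ruby sketch of it (again not the site's real Clojure code; &lt;code&gt;passes_captcha?&lt;/code&gt; is a made-up name, and I'm assuming the submitted value arrives as a string or nil):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical helper; answer is whatever was submitted in the test field.
# Passes if the word COWS appears anywhere in it, ignoring case.
def passes_captcha?(answer)
  answer.to_s.downcase.include?('cows')
end

puts passes_captcha?('Cows!')  # prints true
puts passes_captcha?('COW')    # prints false
puts passes_captcha?(nil)      # prints false
&lt;/code&gt;&lt;/pre&gt;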

&lt;p&gt;If you have Javascript enabled, clicking on this field will clear out the default value.  If you unfocus the field without typing anything, Javascript will put the default value back in.  This is only for convenience.  If you don't have Javascript enabled, you have to highlight and backspace over the default text.  I don't think this is a huge burden.&lt;/p&gt;

&lt;p&gt;Of 1853 spam comments, here's the breakdown for what values end up in this field.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;1131: &quot;&amp;amp;lt;= Type this word&quot;
691:  &quot;&amp;lt;= Type this word&quot;
21:   Random letters and numbers
6:    empty
2:    A bunch of URLs
2:    Human beings making typos, e.g. &quot;COW&quot; or &quot;COS&quot;.
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So most bots are too stupid even to remove the default value from the field, and none of them entered the correct value.  691 times the bot was somehow smart enough to un-escape &lt;code&gt;&amp;amp;lt;&lt;/code&gt; into &lt;code&gt;&amp;lt;&lt;/code&gt;, which is interesting, but didn't help it defeat the filter.  A lot of the random words look like they were made by Markov chains, e.g. 'fridwolfur' and 'lyndonvolk' and 'calbertdom'.  If I need to write a children's poem I'll know where to look for ideas.  One time a bot or spammer managed to type &quot;vows&quot; somehow, but this might or might not be coincidence.&lt;/p&gt;

&lt;p&gt;I think this is better than a normal CAPTCHA because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It's always the word &quot;COWS&quot; in a normal font, so it requires no thought or eye strain to figure out.&lt;/li&gt;
&lt;li&gt;It's black and white, so hopefully people with minor vision problems and color blindness can see it.&lt;/li&gt;
&lt;li&gt;It fits thematically with my blog layout (it's COWS and it has cow spots).&lt;/li&gt;
&lt;li&gt;It's kind of silly, so hopefully people chuckle rather than become ticked off.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It's worse than a real CAPTCHA because it requires no effort to break. So it wouldn't work for Wordpress or vBulletin or something with a million users.  But I wonder if every Wordpress install had a single static word as a CAPTCHA, but a different word for every blog (generated at install-time maybe), would it work better or worse than the random mangled multi-color CAPTCHAs no one can read?  Real randomly-generated CAPTCHAs don't work anyways; bots can already beat them via OCR or other means.  A simple word would be less annoying for a human, to be sure.&lt;/p&gt;

&lt;p&gt;The other downside is that this is not very accessible to the blind or other people using screen readers, or browsers without image support.  This is unfortunate and I'm still trying to figure out how to get around this.  Right now the ALT text for the CAPTCHA image is &lt;code&gt;This says 'COWS'&lt;/code&gt;; I don't know if this is enough help for people in those situations.&lt;/p&gt;

&lt;p&gt;Of course I'll never know how many people see my CAPTCHA and storm away in a rage without even trying to post a comment.  But I've never heard a complaint.  If this level of CAPTCHA ticks you off personally, please swallow your anger and leave me a comment here saying so, if you feel obliged; I'd love to hear it.&lt;/p&gt;

&lt;h1&gt;False positives&lt;/h1&gt;

&lt;p&gt;As best I can tell, there are no false positives from people filling in the honeypot field.  But even as simple as it is, some people don't succeed at the CAPTCHA image.  Either they typo it or they ignore it entirely.&lt;/p&gt;

&lt;p&gt;I just checked and I counted around 6 comments by real humans where the CAPTCHA was ignored and the default &lt;code&gt;&amp;lt;= Type this word&lt;/code&gt; ended up in the spam DB.  4 of those people re-posted their comment successfully immediately afterwards by filling in the CAPTCHA.  I'm not sure I'm ever going to get much better than that.&lt;/p&gt;

&lt;h1&gt;Spam that makes it through&lt;/h1&gt;

&lt;p&gt;I &lt;em&gt;have&lt;/em&gt; still gotten spam.  Maybe a dozen or so in the past six months.  It's all been in the form of a human typing a normal-looking and relevant comment, about open source software or BASH for example, but with a spammy URL buried in it, e.g. a link to a really dodgy-looking blog trying to sell something, or some scummy SEO site.  It's either a human or a very sophisticated (or lucky) bot; the comment text in these is indistinguishable from a real comment other than the spam URLs.  I have to delete these by hand.&lt;/p&gt;

&lt;p&gt;But I was getting these with Wordpress too.  No automated anti-spam system is going to defeat a human being, so I don't worry about it.&lt;/p&gt;

&lt;h1&gt;That's it&lt;/h1&gt;

&lt;p&gt;The moral of this story is that it doesn't take much to protect yourself from comment spam if you write the code yourself.  As long as it's unique, you'll probably be fine.&lt;/p&gt;

&lt;p&gt;The other moral is that you don't have to annoy the hell out of your users to filter spam effectively.  I'm making the assumption here that my COWS method is not that annoying; tell me if I'm wrong.&lt;/p&gt;

&lt;p&gt;I don't know how well this scales.  Probably not so well.  My blog isn't that highly trafficked.  If my site were more popular it might be worse for me.  But the improvement over Wordpress is unquestionable.&lt;/p&gt;

&lt;p&gt;I've seen all kinds of complicated measures suggested elsewhere, like trying to predict if it's a bot by how many milliseconds it takes between page load and comment posting, or measuring keypress speed, or escaping the HTML of your forms and un-escaping it at load time via Javascript, or setting and retrieving cookies and such.  But a lot of this stuff seems fragile and if your browser doesn't support Javascript or cookies (or your users block them), you're screwed.  I block these things myself, so I expect visitors to do the same.&lt;/p&gt;

&lt;p&gt;If everyone wrote their own blog engines, the world would be a slightly less spammy place.  Or else we'd have much smarter bots.&lt;/p&gt;</description></item><item><title>Spam spam spam spam spammity spam</title><link>http://briancarper.net/blog/spam-spam-spam-spam-spammity-spam</link><guid>http://briancarper.net/blog/spam-spam-spam-spam-spammity-spam</guid><pubDate>Sat, 17 Oct 2009 11:59:05 -0700</pubDate><description>&lt;p&gt;I woke up this morning to about 50 spam emails and some notifications from my host that my CPU usage was about 200% over the past four hours.  Turns out &lt;code&gt;spamd&lt;/code&gt; was going mental.  Not sure what caused it but it seems to be working again after I restarted it.&lt;/p&gt;

&lt;p&gt;One of the worst things about running your own mail server is spam.  I don't know much about how to do it properly.  I have SpamAssassin running, I tweaked the settings and trained it well, and it works OK.  Of 8,000 spams in the past week or two, I think only two made it through to my inbox.  But I keep thinking there must be a better way.&lt;/p&gt;

&lt;p&gt;For a while I tried &lt;a href=&quot;http://en.wikipedia.org/wiki/Greylisting&quot;&gt;greylisting&lt;/a&gt;.  Greylisting means you pseudo-bounce every email you get, and force the mail server to resend it.  Once it's resent, that server is added to a whitelist.  The idea is that spam servers won't bother resending and genuine mail servers will.&lt;/p&gt;
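
&lt;p&gt;A toy version of that idea in Ruby, assuming the only thing tracked is the sending server's address (the &lt;code&gt;Greylist&lt;/code&gt; class is made up for illustration; real greylisters such as Postgrey do more than this):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;require 'set'

# Toy greylist keyed on the sending server's address only.
class Greylist
  def initialize
    @seen = Set.new       # servers deferred once, awaiting a retry
    @whitelist = Set.new  # servers that retried and are now trusted
  end

  # Returns :accept or :defer for a delivery attempt from server.
  def check(server)
    return :accept if @whitelist.include?(server)

    if @seen.delete?(server)
      @whitelist.add(server)
      :accept
    else
      @seen.add(server)
      :defer
    end
  end
end

greylist = Greylist.new
puts greylist.check('mail.example.org')  # first attempt: prints defer
puts greylist.check('mail.example.org')  # the retry: prints accept
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A server that never retries just sits in the seen set forever and its mail never arrives, which is the whole point when it's a spammer.&lt;/p&gt;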

&lt;p&gt;I ran this way via &lt;a href=&quot;http://postgrey.schweikert.ch/&quot;&gt;Postgrey&lt;/a&gt; for a couple months.  The good thing is that it works pretty much as advertised.  I went from hundreds of spam emails per day, to fewer than a dozen.  SpamAssassin caught all of those dozen and I never saw them.  It was nice.&lt;/p&gt;

&lt;p&gt;The problem with this, however, is twofold.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;All mail from people you've never heard from before is delayed 5-10 minutes.  This is very annoying in certain circumstances, e.g. registering for an account at a new message board or buying something from an online store you never used before.  I'd rather like to see the receipt or user registration right away.  So to get around this I had to go add them to a whitelist on the server every time, which was ridiculous.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Not all genuine mail servers bother resending after the temporary bounce, so you lose mail.  You need only look in &lt;code&gt;/etc/postgrey/whitelist_clients&lt;/code&gt; and see the enormous list of mail servers that Postgrey knows NOT to greylist, to be scared into never using Postgrey again.  This includes yahoo.com, ebay.com, a bunch of airlines, and so on.  The list goes back to 2005 and obviously is an incomplete list, since it only includes servers that people reported having problems with.  I had to add gmail.com to it myself to avoid losing mail from my wife (domains that use large pools of mail servers will always be greylisted, it seems).&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Losing mail is the reason I stopped using Postgrey.  So I'm back to SpamAssassin alone and dealing with an occasional spam or two, while my spam inbox balloons.&lt;/p&gt;</description></item><item><title>Anti-spam field still holding</title><link>http://briancarper.net/blog/anti-spam-field-still-holding</link><guid>http://briancarper.net/blog/anti-spam-field-still-holding</guid><pubDate>Mon, 23 Mar 2009 19:38:27 -0700</pubDate><description>&lt;p&gt;So far my silly anti-spam measures are working.  Since last week I've had 1861 spam comment attempts, of which 0 were successful.  1857 of them didn't even alter the text in the captcha text field at all.  Four of them inexplicably HTML-escaped the &lt;code&gt;&amp;lt;&lt;/code&gt; into a &lt;code&gt;&amp;amp;lt;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;One feature I didn't implement from Wordpress is subscribing to comments via email.  Sending an email from Java is possible but a little bit painful to implement.  The Javamail API is a monster.&lt;/p&gt;

&lt;p&gt;I do think it's useful to be able to know when someone responds to a comment you left, but is spamming your inbox really the best way?  I have to think there's a better way.&lt;/p&gt;

&lt;p&gt;I did implement an RSS feed for each individual post's comments.  And separate RSS feeds for all the tags on my blog, and all the categories.  When RSS feeds are generated dynamically, why not?  This is all of the code for the tag feeds:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;;; Feed for one tag: the first 25 posts carrying it, or a 404 if the tag is unknown.
(defn tag-rss [tagname]
  (if-let [tag (get-tag tagname)]
    (rss
     (str &quot;briancarper.net Tag: &quot; (:name tag))
     (str &quot;http://briancarper.net/&quot; (:url tag))
     &quot;briancarper.net&quot;
     (map rss-item (take 25 (all-posts-with-tag tag))))
    (error-404)))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Plus the routing code:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;(GET &quot;/feed/tag/:name&quot; (tag-rss (route :name)))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;But I haven't uploaded the comment-feed feature because I don't know if it's overkill.  Personally I am liberal with my RSS feeds, I just pop them into my Akregator and off I go.  But I don't know if other people take their feeds more seriously, or what.  RSS feeds can be a bit heavyweight.  Maybe I should make a feed for all of my comments across all posts.&lt;/p&gt;</description></item><item><title>Blog is still going strong</title><link>http://briancarper.net/blog/blog-is-still-going-strong</link><guid>http://briancarper.net/blog/blog-is-still-going-strong</guid><pubDate>Wed, 18 Mar 2009 22:01:11 -0700</pubDate><description>&lt;p&gt;After I implemented that silly CAPTCHA yesterday, the spam was stopped.  There's also a honeypot form field (it's hidden via CSS so humans don't know it's there, and if any bot POSTs text for that field, the data is rejected automatically).  It's silly and easily defeated, yet it stopped all 262 spam attempts since yesterday.  It looks like all the spam is for one site, but it's coming from a huge range of IPs.  So it's probably a botnet.  Thanks, MS Windows!&lt;/p&gt;

&lt;p&gt;I rewrote my whole CRUD layer so that I could use it for more than one database at once, and then rewrote my gallery code to take advantage, and now two hours later I have my &lt;a href=&quot;http://origamigallery.net&quot;&gt;origami gallery&lt;/a&gt; back up and running.  Both sites are running from the same JVM.  I wonder how many sites I can have going at once before the server melts into a puddle of Java-inflicted goo.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;  PID PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
11338 16   0  512m 128m  12m S    0  0.3   0:28.33 java
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Good thing I have plenty of RAM on the server.  From looking at before and after shots of the memory usage, 66 MB is the JVM itself, and 40MB more is Jetty and Compojure and my code and all the dependencies.  Then the last ~20 MB or so is my database slurped into RAM.  So I can probably fit another few tens of thousands of posts and comments in here before I have to worry much.  The real test will be letting this thing run for a couple weeks and see how hard it leaks.&lt;/p&gt;</description></item><item><title>Fun with HTTP headers</title><link>http://briancarper.net/blog/fun-with-http-headers</link><guid>http://briancarper.net/blog/fun-with-http-headers</guid><pubDate>Tue, 17 Mar 2009 22:14:10 -0700</pubDate><description>&lt;p&gt;One fun thing about playing with Compojure is that it doesn't do much with HTTP headers for you, which is a good learning opportunity.  &lt;a href=&quot;http://tools.ietf.org/html/rfc2616&quot;&gt;RFC 2616&lt;/a&gt; is rather helpful here.&lt;/p&gt;

&lt;p&gt;For example I learned that if you don't set a &lt;code&gt;Cache-Control&lt;/code&gt; or &lt;code&gt;Expires&lt;/code&gt; header, your browser will happily re-fetch files over and over, which is a bit of a performance hit.  Static files that don't change often like images etc. can be set with a higher &lt;code&gt;Expires&lt;/code&gt; value so they're cached.&lt;/p&gt;

&lt;p&gt;Another thing to keep in mind (note to self) is that using &lt;a href=&quot;http://httpd.apache.org/docs/2.0/mod/mod_proxy.html&quot;&gt;mod_proxy&lt;/a&gt; to forward traffic to a local Jetty server means that the &quot;remote IP&quot; you get from &lt;code&gt;(.getRemoteAddr request)&lt;/code&gt; will always be &lt;code&gt;127.0.0.1&lt;/code&gt;.  If you want the user's real remote IP, you have to look in the &lt;code&gt;X-Forwarded-For&lt;/code&gt; header (easily accessed as &lt;code&gt;(:x-forwarded-for headers)&lt;/code&gt; in Compojure).  Given that &lt;a href=&quot;http://en.wikipedia.org/wiki/Identicon&quot;&gt;Identicons&lt;/a&gt; are generated from a hash of an IP address, this has resulted in some screwed up (wrongly identical) avatars for a bunch of people in posts for the past couple days.  Oops.  Not much I can do to fix that now.&lt;/p&gt;
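
&lt;p&gt;The lookup amounts to something like this, sketched in Ruby and not the Clojure the site runs (&lt;code&gt;client_ip&lt;/code&gt; is a made-up name, and I'm assuming the headers arrive as a hash keyed the way Compojure keys them):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical helper; headers is assumed to be a hash keyed by symbols
# like :'x-forwarded-for', and remote_addr is the socket peer address
# (always 127.0.0.1 when sitting behind a local proxy).
def client_ip(headers, remote_addr)
  forwarded = headers[:'x-forwarded-for'].to_s
  # The header may list several hops; the first entry is the original client.
  first_hop = forwarded.split(',').first.to_s.strip
  first_hop.empty? ? remote_addr : first_hop
end

puts client_ip({ 'x-forwarded-for': '203.0.113.7, 10.0.0.1' }, '127.0.0.1')  # prints 203.0.113.7
puts client_ip({}, '127.0.0.1')                                              # prints 127.0.0.1
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Anyone can send this header themselves, so it's only worth trusting when the request really did come through your own proxy.&lt;/p&gt;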

&lt;p&gt;In other non-news, I just set up the spam logging for the blog so I can see the kinds of things bots are doing to get around my feeble anti-spam measures.  Sadly the spam seems to have stopped entirely, right after I set this up.  How annoying.&lt;/p&gt;</description></item><item><title>Darn you, spammers.</title><link>http://briancarper.net/blog/darn-you-spammers</link><guid>http://briancarper.net/blog/darn-you-spammers</guid><pubDate>Tue, 17 Mar 2009 18:35:18 -0700</pubDate><description>&lt;p&gt;I was in a rush to get this darn blog finally done, so I threw some stupid anti-spam measures on here.  Namely, the comment form included 20 textareas, 19 of which were &lt;code&gt;display: none&lt;/code&gt; and one of which was randomly the right one, and any text in the hidden ones would cause the comment posting to fail.&lt;/p&gt;

&lt;p&gt;It only took a spam bot 48 hours to figure this out, I guess, because the last hour I've been hammered.  So I implemented a CAPTCHA as another short-term holdover until I can code up something good.  At least it immediately stopped this spam bot whose crap I've been deleting for the past hour.  &lt;/p&gt;

&lt;p&gt;Hopefully this isn't too intrusive.  I think it fits the site fairly well, as you will probably agree once you see it.&lt;/p&gt;</description></item><item><title>Email woes</title><link>http://briancarper.net/blog/email-woes-2</link><guid>http://briancarper.net/blog/email-woes-2</guid><pubDate>Fri, 18 Apr 2008 20:49:16 -0700</pubDate><description>&lt;p&gt;I own my own domain (or five) and one of the good things about that is having nearly infinitely many email accounts if you want them.  So I tend to make up a new account for every site I register at.  This leads to amusing things like getting an email from a marketing firm asking me to complete a survey for an airline &quot;who wants to remain STRICTLY ANONYMOUS&quot;.  Sent of course to &lt;strong&gt;UNITED@briancarper.net&lt;/strong&gt;.  Oops.&lt;/p&gt;

&lt;p&gt;Because of laziness I set up a catchall account on my domain so every email sent to anything @briancarper.net would be sent to me.  This was such a horribly bad idea, I'm unsure how I lasted for a couple of years this way.  I was getting a few hundred spam emails per day.  Amazingly SpamAssassin + Thunderbird's junk mail filter caught almost every single one of them to the point where I hardly even noticed.  Spam filters can be bad in the same way pain killers can be bad.  They don't solve a problem, they only mask the pain so you can ignore the problem.&lt;/p&gt;

&lt;p&gt;So I decided to stop using a catchall.  Problem is that I already have around a hundred email addresses I've used for various message boards and companies and friends and family, and there's no way I'm going around to change them all.  So I decided to just get a list of them all and set up a big list of postfix aliases for now.&lt;/p&gt;

&lt;p&gt;So, I downloaded my whole email account in mbox format and wrote a Ruby script to crawl it and make a list of all the email addresses I've ever received mail at.  Thank you Linux mailserver for storing email sanely in plaintext.  Luckily for me, I haven't deleted any emails from my server since 2005; so my generated list of emails is likely to be pretty complete.  It pays to be obsessive sometimes.&lt;/p&gt;

&lt;p&gt;Even a braindead brute-force Ruby script is fast enough to do this.  Took a minute or two to scan 200MB of plaintext.&lt;/p&gt;

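&lt;pre&gt;&lt;code&gt;#!/usr/bin/ruby
# Print every distinct @briancarper.net address found under the path in ARGV[0].
require 'find'

found = {}
Find.find(ARGV[0]) do |fn|
  next unless File.file? fn
  # Read raw bytes so mail in odd encodings doesn't break the scan.
  # The dot is escaped so it only matches a literal '.'.
  File.binread(fn).scan(/[A-Za-z0-9_-]+@briancarper\.net/) do |email|
    next if found[email]
    found[email] = true
    puts email
  end
end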
&lt;/code&gt;&lt;/pre&gt;</description></item></channel></rss>

