Email woes

I own my own domain (or five) and one of the good things about that is having nearly infinitely many email accounts if you want them. So I tend to make up a new account for every site I register at. This leads to amusing things like getting an email from a marketing firm asking me to complete a survey for an airline "who wants to remain STRICTLY ANONYMOUS". Sent of course to UNITED@briancarper.net. Oops.

Out of laziness I set up a catchall account on my domain, so every email sent to anything @briancarper.net would be delivered to me. This was such a horribly bad idea that I'm unsure how I lasted a couple of years this way. I was getting a few hundred spam emails per day. Amazingly, SpamAssassin + Thunderbird's junk mail filter caught almost every single one of them, to the point where I hardly even noticed. Spam filters can be bad in the same way painkillers can be bad. They don't solve a problem, they only mask the pain so you can ignore the problem.

So I decided to stop using a catchall. Problem is that I already have around a hundred email addresses I've used for various message boards and companies and friends and family, and there's no way I'm going around to change them all. So I decided to just get a list of them all and set up a big list of Postfix aliases for now.

So, I downloaded my whole email account in mbox format and wrote a Ruby script to crawl it and make a list of every address at my domain that has ever shown up in my mail. Thank you, Linux mail server, for storing email sanely in plaintext. Luckily for me, I haven't deleted any emails from my server since 2005, so my generated list of addresses is likely to be pretty complete. It pays to be obsessive sometimes.

Even a braindead brute-force Ruby script is fast enough to do this. Took a minute or two to scan 200MB of plaintext.

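#!/usr/bin/ruby
require 'find'

found = {}  # addresses already printed

# Walk every file under the path given as the first argument.
Find.find(ARGV[0]) do |fn|
  next unless File.file? fn
  # Read raw bytes so odd encodings in old mail don't raise.
  # The dot in the domain is escaped so it only matches a literal ".".
  File.binread(fn).scan(/[A-Za-z0-9_-]+@briancarper\.net/) do |email|
    next if found[email]
    found[email] = true
    puts email
  end
end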
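
For the Postfix side, here's a rough sketch of how that list could be turned into virtual alias map entries, one "address destination" pair per line. This is not the script I actually ran. The method name virtual_alias_lines and the destination mailbox are made up for illustration, and it assumes you want the addresses lowercased and sorted.

```ruby
# Sketch only: turn a list of addresses into Postfix virtual alias
# map lines of the form "alias  destination".
# DESTINATION is a hypothetical mailbox, not a real setting.
DESTINATION = 'me@example.org'

def virtual_alias_lines(addresses, destination = DESTINATION)
  # Normalise: trim whitespace, lowercase, drop blanks and duplicates.
  cleaned = addresses.map { |a| a.strip.downcase }
                     .reject(&:empty?)
                     .uniq
                     .sort
  # Pad the left column so the file is easy to read by eye.
  width = cleaned.map(&:length).max || 0
  cleaned.map { |a| "#{a.ljust(width)}  #{destination}" }
end

# Example (addresses.txt is a hypothetical file holding the list):
#   puts virtual_alias_lines(File.readlines('addresses.txt'))
```

The output would go into whatever file virtual_alias_maps points at in main.cf, followed by running postmap on that file so Postfix picks up the changes.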
April 18, 2008 @ 1:49 PM PDT
Category: Programming
Tags: Spam, Email, Ruby, Linux

1 Comment

Quoth Patrick on April 18, 2008 @ 8:32 PM PDT

"I was getting a few hundred spam emails per day. Amazingly, SpamAssassin + Thunderbird's junk mail filter caught almost every single one of them, to the point where I hardly even noticed. Spam filters can be bad in the same way painkillers can be bad. They don't solve a problem, they only mask the pain so you can ignore the problem."

In my opinion, it's exactly the opposite. Content-based spam filters are 'the only real thing'. Blacklists, "unknown email addresses" and whatever other obscure things one can think of are short-sighted. A well-trained content-based filter will let through the mails that you want to read, based on your training, and nothing else, no matter where the mail comes from or what the target address is.