Busting referrer spam is … a busted process?
by ZetaGecko | Server Administration
I've been updating my referrer spam blocks every few days, but the **** who's spamming me keeps registering new domain names to do their spamming from. Updating the blocks keeps my server logs a little cleaner, but it feels like more work than I should have to be doing... sort of like email spam. Blocking based on the spammer's URL just doesn't seem like the right way to do it.
First, a little history: when I set up my referrer spam blocker last month, I figured I'd better be careful not to block legitimate traffic. Because of the nature of my products, I tend to get links from a wide variety of places, so blocking traffic from pages with spammy-looking words anywhere in the URL just wasn't an appropriate solution. So I cooked up a pattern that only looks at the domain name, as follows:
SetEnvIfNoCase Referer "[^/]*?//[^/]*(bnetsol|buy-2005|casino-)" BadReferrer
I've shortened the list of things I'm searching for considerably for simplicity in posting here, but that should give you the idea...if you're familiar with regular expressions. In case not, here's what that all means:
[^/]*? = any number of non-slash characters, as few as possible (e.g. "http:")
// = two slashes
[^/]* = any number (including zero) of non-slash characters
(bnetsol|buy-2005|casino-) = bnetsol or buy-2005 or casino-
So the pattern will match http://www.buy-2005.com/, but not http://www.hello.com/buy-2005.
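For anyone who wants to try the pattern outside of Apache, here is a minimal Python sketch. It assumes that Python's `re.search` with `re.IGNORECASE` is a close enough stand-in for how SetEnvIfNoCase applies the expression (case-insensitive, matching anywhere in the header); the function name `is_bad_referrer` is made up for illustration.

```python
import re

# The pattern from the SetEnvIfNoCase line above (shortened keyword list).
# re.IGNORECASE plus re.search approximates a case-insensitive,
# unanchored match against the Referer header.
BAD_REFERRER = re.compile(r"[^/]*?//[^/]*(bnetsol|buy-2005|casino-)", re.IGNORECASE)


def is_bad_referrer(referrer: str) -> bool:
    """Return True if the pattern matches anywhere in the referrer string."""
    return BAD_REFERRER.search(referrer) is not None
```

With this, `is_bad_referrer("http://www.buy-2005.com/")` is true and `is_bad_referrer("http://www.hello.com/buy-2005")` is false. One caveat: because the pattern isn't anchored, a referrer that contains a second "//" later on (say, a URL embedded in a query string) followed by one of the keywords would also match. Starting the pattern with `^` should limit it to the real domain name.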
But enough history--what did I do today? Today, I grepped my server log for " 403 ", the error code that my server returns when it denies someone access based on the above pattern. I extracted the IP addresses that had been blocked, counted the occurrences of each (using a Perl script--not manually, of course), and added some "deny from" lines to my server configuration to block those with the highest counts. Presuming the spammer is operating from the same servers each time they fling their slop on my server, I won't have to update my pattern to catch their new domain names.
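The counting step can be sketched roughly as follows. This is not the Perl script mentioned above, just a Python approximation of the same idea. It assumes the log is in the usual Common/Combined Log Format (client IP first, status code right after the quoted request line), and the function names and the `min_hits` threshold are hypothetical. Note that it counts every 403, not only the ones caused by the referrer rule.

```python
import re
from collections import Counter

# Assumes Common/Combined Log Format: the client IP is the first field and
# the status code follows the quoted request line. Adjust for other formats.
LOG_LINE = re.compile(r'^(\S+) .*?"[^"]*" (\d{3}) ')


def count_blocked_ips(lines):
    """Count 403 responses per client IP."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group(2) == "403":
            counts[m.group(1)] += 1
    return counts


def deny_lines(counts, min_hits=10):
    """Build "deny from" lines for IPs with at least min_hits blocked requests."""
    return [f"deny from {ip}" for ip, n in counts.most_common() if n >= min_hits]
```

Feeding it the open log file, as in `deny_lines(count_blocked_ips(open(path)))` with `path` pointing at your own access log, gives a list of lines to review before pasting anything into the server configuration.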
What's the problem with this approach? First, the perpetrator may be using somebody else's computer or a shared computer to spam me (many of the offending IP addresses appeared to be proxy servers), so I'm probably blocking some legitimate people. I'm guessing that few if any other users of those computers were planning on coming to my site, so I'm not too worried about that.
Second, I still have to run through a manual process to update my blocks if they use another server someday. But I'm guessing (hoping) that they'll switch IP addresses less often than they switch spam URLs. So far, it appears likely that that will be the case.
Finally, today, I tried tracking down the spammer so that I could email them and request to be removed from their list. Why would I think such a thing would work? With an email spammer, I wouldn't, but in this case, since I can guarantee that they're not going to get any value out of spamming my referrer logs (since my referrer logs are not exposed to the web in any way), they might be convinced to stop wasting their bandwidth spamming me.
The problem with that idea was that they're pretty much impossible to track down. The domain names they spam me with are registered to possibly fictitious people with email addresses at the same domain. That domain name exists, but there's no web or email server there. Thus, no way to contact them.
I sure hope someone comes up with a better solution for this problem.