comment spam countermeasures (mollom, honeypots, hashcash, bad behavior)

The spam situation has recently gotten a bit out of hand and automated defenses have been giving in.

I've taken a close look at the problem and implemented a gauntlet of spam countermeasures that I'm hoping will give us back the upper hand for a while longer, without compromising on the user experience.

Previously, we only used mollom to filter out comments by anonymous users. This was easy to get around for two reasons:

  1. there was nothing filtering comments/topics by authenticated users
  2. there was nothing preventing automated account creation (e.g., captcha), except the need to verify your e-mail, which many bots are capable of doing automatically.

With the rise of spam by authenticated users this obviously was no longer enough. Rather than play whack-a-mole with specific spamming methods, I've tried to implement a defense-in-depth strategy.

Here are the additional countermeasures now implemented on the TurnKey website:

  1. upgraded to the latest mollom module:

    New features:

    • honeypot: pretty sucky implementation though. Should be very easy for spambots to detect and work around.
    • blacklists: ability to manually configure blacklists that block posts that contain certain keywords, links, etc. This is a tool we humans can use to fight against specific troublemakers.
    • ability to retain "spam": instead of discarding anything it detects as spam, it will save it to the database. That way we can spot/diagnose false positives and find "false negative" patterns to blacklist.
  2. mollom filters authenticated users as well: I'm hoping to turn this off once we prove that the other countermeasures work well enough that it isn't necessary.

    Moderators are still exempt though. It would be really cool if Mollom supported exempting users that pass a configurable Karma threshold from the filtering.

  3. honeypot fields on submission forms: we're now using the upgraded mollom module and spamicide to insert a bunch of fake fields into HTML forms that could be targeted by spam bots. These fields are hidden by CSS so users don't see them, but most spam bots aren't sophisticated enough to realize that and fill them out anyway. That's how we get them.

  4. hashcash in JavaScript: submitting to frequently spammed forms now requires the browser to perform a Hashcash calculation in JavaScript. This technique relies on the fact that most spam bots don't have a full-blown embedded JavaScript engine, and even if they do, performing expensive calculations for every submission would be sure to slow them down.

  5. HTTP/IP fingerprinting: we're now using Bad Behavior to detect and shut out known attack bots. Bad Behavior relies on a broad range of HTTP fingerprinting techniques to tell apart real users (e.g., using real browsers) and good bots (e.g., googlebot) from the bad bots.

    One of the things I like about Bad Behavior is that it can detect and ban evildoers before they even reach Drupal. This may improve performance by reducing the load on the database. OTOH, this doesn't come for free, but the analysis involved is pretty cheap so I think the overall result is a net gain.

    Besides spam, attack bots can take up significant server resources, which gives us another excellent reason to block them.
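
To make the hashcash idea from item 4 more concrete, here is a rough sketch of the proof-of-work principle in Python. It is not the code running on the site (that runs in the browser, in JavaScript). The challenge string, the use of SHA-256, the difficulty value and all function names are assumptions made up for illustration.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # A non-zero byte contributes its own leading zeros, then we stop.
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash is enough to check the client's work."""
    # SHA-256 is an assumption of this sketch, not a claim about the module.
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force nonces until the hash has enough zero bits."""
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce


if __name__ == "__main__":
    challenge = "form-token-123"  # hypothetical per-form challenge value
    nonce = solve(challenge, difficulty=12)
    print(nonce, verify(challenge, nonce, difficulty=12))
```

The asymmetry is the point: the submitter needs about 2^difficulty hash attempts on average, while the server verifies with one. A real visitor pays that cost once per post, a bot pays it for every single submission.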

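The honeypot check from item 3 boils down to very little logic on the server side. The following is a minimal sketch in Python, not the actual Mollom or Spamicide implementation; the field names and the function name are hypothetical.

```python
# Hypothetical names for the fake fields that CSS hides from real users.
HONEYPOT_FIELDS = ("homepage_url", "email_confirm")


def is_probable_bot(form_data: dict) -> bool:
    """Return True if any hidden honeypot field came back filled in."""
    return any(
        str(form_data.get(name, "")).strip() for name in HONEYPOT_FIELDS
    )


if __name__ == "__main__":
    human = {"comment": "Thanks for the update!", "homepage_url": ""}
    bot = {"comment": "cheap pills", "homepage_url": "http://spam.example"}
    print(is_probable_bot(human), is_probable_bot(bot))
```

A real user never sees the hidden fields and leaves them empty, so any value in one of them is a strong hint that the form was filled out by a script.
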