#language en

The current implementation of Spamikaze has only a primitive way of avoiding false positives: it queries one or more WhiteLists of known mail servers (e.g. ''whitelist.surriel.com'') and does not list those IP addresses.  Maintaining such a whitelist by hand is a lot of work, and the result is bound to be incomplete or inaccurate.

Spamikaze's goal should be to block only those IP addresses that send out a lot of spam and little legitimate email; that way the users of Spamikaze-powered DNSBLs would get little spam, while losing very little legitimate email.  There are various ideas on how to identify both spammy IP addresses (which should be blocked) and IP addresses that are the source of lots of legitimate email (which should not be blocked).  Please add your idea to this list, so we can discuss them all and decide what to do:

= Rik's idea =

This method should work best for large sites, or for DNSBLs that get a reasonable number of queries.

  * For each IP address, measure:
    * The number of spamtrap mails recently received.
    * The total number of emails received.  If we assume that the number of DNSBL queries about an IP address corresponds to the number of emails the DNSBL users received from it, we can simply count the DNSBL queries about this IP address.
  * For the database as a whole:
    * Calculate the average ratio of (spamtrap mails / total emails) - "spamtrap ratio".
    * Calculate the standard deviation of the spamtrap ratio.
  * For each IP address in the database:
    * Blocklist if the IP address has a spamtrap ratio higher than average + standard deviation.
    * Auto-whitelist if the IP address has a spamtrap ratio lower than average - standard deviation.
    * If the ratio is near the average, follow the user's preferences:
      * An aggressive list may still want to blocklist, if a spamtrap mail was received recently.
      * A more cautious list may not want to block these IP addresses.
      * A third option would be to be cautious for IP addresses that are on one of the WhiteLists, and aggressive for other IP addresses.
    * The score of an IP address can be modified depending on reverse DNS: missing or dynamic-looking reverse DNS can get a host blocked faster than reverse DNS that looks like a mail server.
  * The IP address will automatically expire from the database if it stops sending mail to spamtraps, since the legitimate mail will cause the spamtrap ratio to go below the average.
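
The steps above could be sketched roughly as follows.  This is only an illustration, not Spamikaze code: the function name `classify`, the policy names (`cautious`, `aggressive`, `mixed`) and the verdict strings are made up for this example, the per-IP counts are assumed to be collected already, and the population standard deviation is used because the whole database is measured.  The reverse DNS adjustment is left out.

```python
from statistics import mean, pstdev


def classify(counts, policy="cautious", whitelisted=frozenset()):
    """Return {ip: verdict} for counts = {ip: (spamtrap_mails, total_mails)}.

    policy (hypothetical names):
      "cautious"   - never block IPs whose ratio is near the average
      "aggressive" - block those IPs if they hit a spamtrap at all
      "mixed"      - cautious for whitelisted IPs, aggressive otherwise
    """
    # Spamtrap ratio per IP; IPs without any counted mail are skipped.
    ratios = {ip: trap / total
              for ip, (trap, total) in counts.items() if total > 0}
    if not ratios:
        return {}

    values = list(ratios.values())
    avg = mean(values)
    dev = pstdev(values)  # population standard deviation

    verdicts = {}
    for ip, ratio in ratios.items():
        if ratio > avg + dev:
            verdicts[ip] = "block"
        elif ratio < avg - dev:
            verdicts[ip] = "whitelist"
        else:
            # Near the average: follow the list owner's preference.
            aggressive = policy == "aggressive" or (
                policy == "mixed" and ip not in whitelisted)
            seen_in_trap = counts[ip][0] > 0
            verdicts[ip] = "block" if aggressive and seen_in_trap else "pass"
    return verdicts
```

Calling `classify(counts, policy="mixed", whitelisted={"192.0.2.3"})` would treat `192.0.2.3` cautiously and every other near-average address aggressively.  One thing the sketch makes visible: if most addresses have a spamtrap ratio close to zero, average - standard deviation can be negative, and then nothing gets auto-whitelisted.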

= lonki's idea =

This method should work best for small (or even personal) Spamikaze installations.

  * Use SpamAssassin to identify the IP addresses that send ham (legitimate mail).
  * Use that data to build up a whitelist.
  * The sample size is probably too small for statistical analysis.
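
A rough sketch of how this could look, under several assumptions that are not part of the idea above: mail has already been scored by SpamAssassin and carries its `X-Spam-Status` header, the topmost `Received` header was written by our own mail server and contains the connecting IPv4 address in square brackets, and an address is only whitelisted after a minimum number of ham messages.  The function names and the `min_ham` threshold are hypothetical.  Since the sample is too small for statistics, the sketch just counts.

```python
import re
from collections import Counter
from email import message_from_string

# IPv4 address in square brackets, as commonly written in Received headers.
IP_IN_RECEIVED = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def ham_sender_ip(raw_message):
    """Return the sending IP if SpamAssassin judged the mail ham, else None."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    if not status.strip().lower().startswith("no"):
        return None  # spam, or not scored at all
    received = msg.get_all("Received") or []
    if not received:
        return None
    # Assumption: the topmost Received header comes from our own server.
    match = IP_IN_RECEIVED.search(received[0])
    return match.group(1) if match else None


def build_whitelist(raw_messages, min_ham=3):
    """Whitelist every IP that sent at least min_ham ham messages."""
    ham_counts = Counter()
    for raw in raw_messages:
        ip = ham_sender_ip(raw)
        if ip is not None:
            ham_counts[ip] += 1
    return {ip for ip, n in ham_counts.items() if n >= min_ham}
```

A setup with internal relays or IPv6 senders would need a different way of picking the sending address out of the headers.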