#language en
The current implementation of Spamikaze has only a primitive way of avoiding false positives: querying a whitelist of known mail servers (e.g. ''whitelist.surriel.com'') and not listing those IP addresses. Not only is such a whitelist a lot of work to maintain by hand, it is also bound to be incomplete and/or inaccurate.

Spamikaze's goal should be to block only those IP addresses that send out a lot of spam and little legitimate email. That way the users of Spamikaze-powered DNSBLs would get little spam while losing very little legitimate email.

There are various ideas on how to identify both spammy IP addresses (which should be blocked) and IP addresses that are the source of lots of legitimate email (which should not be blocked). Please add your idea to this list, so we can discuss them all and decide what to do:

= Rik's idea =
 * For each IP address, measure:
  * The number of spamtrap mails received recently.
  * The total amount of email received. If we assume that the number of DNSBL queries about an IP address corresponds to the number of emails the DNSBL users received from it, we can count those DNSBL queries.
 * For the database as a whole:
  * Calculate the average ratio of (spamtrap mails / total emails), the "spamtrap ratio".
  * Calculate the standard deviation of the spamtrap ratio.
 * For each IP address in the database:
  * Blocklist it if its spamtrap ratio is higher than average + standard deviation.
  * Auto-whitelist it if its spamtrap ratio is lower than average - standard deviation.
  * If the ratio is near the average, follow the list's configured preferences:
   * An aggressive list may still want to blocklist the IP address if a spamtrap mail was received from it recently.
   * A more cautious list may not want to block these IP addresses.
 * An IP address will automatically expire from the database once it stops sending mail to spamtraps, since its continuing legitimate mail will push its spamtrap ratio below the average.
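The classification step of Rik's idea can be sketched in Python as below. This is a minimal sketch under stated assumptions, not Spamikaze code: the `IpStats` record, its field names and the `classify` function are hypothetical; it assumes `spamtrap_hits` already covers only a recent time window, that spamtrap mails and the mails counted through DNSBL queries do not overlap (so the total is their sum), and it uses the population standard deviation. For the "aggressive list" case it treats any recent spamtrap hit as the trigger.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class IpStats:
    """Hypothetical per-IP counters; not Spamikaze's actual database schema."""
    spamtrap_hits: int  # spamtrap mails received within the recent time window
    dnsbl_queries: int  # DNSBL lookups for this IP, a proxy for mail to DNSBL users


def spamtrap_ratio(s: IpStats) -> float:
    """Spamtrap mails / total emails, assuming the two counts do not overlap."""
    total = s.spamtrap_hits + s.dnsbl_queries
    return s.spamtrap_hits / total if total else 0.0


def classify(stats: dict[str, IpStats], aggressive: bool = False) -> dict[str, str]:
    """Label each IP 'block', 'whitelist' or 'neutral' (average +/- one std dev)."""
    if not stats:
        return {}
    ratios = {ip: spamtrap_ratio(s) for ip, s in stats.items()}
    values = list(ratios.values())
    avg = mean(values)
    sd = pstdev(values)  # population standard deviation over all IPs
    labels = {}
    for ip, r in ratios.items():
        if r > avg + sd:
            labels[ip] = "block"
        elif r < avg - sd:
            labels[ip] = "whitelist"
        elif aggressive and stats[ip].spamtrap_hits > 0:
            # Near the average: an aggressive list blocks on any recent spamtrap hit.
            labels[ip] = "block"
        else:
            labels[ip] = "neutral"  # a cautious list leaves these unlisted
    return labels


# Example with made-up counters (ratios 0.0, 0.2, 0.2, 0.6).
example = {
    "192.0.2.1": IpStats(spamtrap_hits=0, dnsbl_queries=100),
    "192.0.2.2": IpStats(spamtrap_hits=2, dnsbl_queries=8),
    "192.0.2.3": IpStats(spamtrap_hits=2, dnsbl_queries=8),
    "192.0.2.4": IpStats(spamtrap_hits=6, dnsbl_queries=4),
}
print(classify(example))                   # .1 whitelist, .2/.3 neutral, .4 block
print(classify(example, aggressive=True))  # .2/.3 become block as well
```

One property worth noting: spamtrap ratios are never negative, so if most IP addresses sit at a ratio of 0, average - standard deviation can drop below 0 and nothing gets auto-whitelisted.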

= lonki's idea =
 * Use SpamAssassin to find out which IP addresses send ham.
 * Use that data to build up a whitelist.
 * Good for personal Spamikaze lists; not sure whether it is suitable for widely used lists.
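A minimal Python sketch of lonki's idea, assuming the mail has already been scanned by SpamAssassin with its default `X-Spam-Status` header (which begins with `Yes` or `No`) and stored in an mbox file. The function names, the `min_ham` threshold and the "no spam at all from this IP" rule are hypothetical choices, and taking the connecting IP from the topmost `Received` header assumes that header was written by your own mail server.

```python
import ipaddress
import mailbox
import re
from collections import Counter
from email.message import Message
from typing import Iterable, Optional

# A bracketed IPv4 literal such as "[192.0.2.5]", as most MTAs write it in Received.
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def sender_ip(msg: Message) -> Optional[str]:
    """Connecting IP from the topmost Received header (assumed written by our own MX)."""
    received = msg.get_all("Received") or []
    if not received:
        return None
    match = BRACKETED_IPV4.search(str(received[0]))
    if match is None:
        return None
    try:
        return str(ipaddress.IPv4Address(match.group(1)))
    except ipaddress.AddressValueError:
        return None  # e.g. "999.1.2.3" is not a valid address


def is_ham(msg: Message) -> bool:
    """SpamAssassin's default X-Spam-Status header starts with "Yes" or "No"."""
    return msg.get("X-Spam-Status", "").strip().lower().startswith("no")


def build_whitelist(messages: Iterable[Message], min_ham: int = 5) -> set[str]:
    """IPs with at least `min_ham` ham messages and no spam (hypothetical policy)."""
    ham, spam = Counter(), Counter()
    for msg in messages:
        ip = sender_ip(msg)
        if ip is None or "X-Spam-Status" not in msg:
            continue  # cannot attribute the message, or it was never scanned
        if is_ham(msg):
            ham[ip] += 1
        else:
            spam[ip] += 1
    return {ip for ip, n in ham.items() if n >= min_ham and spam[ip] == 0}


def whitelist_from_mbox(path: str, min_ham: int = 5) -> set[str]:
    """Read an mbox of SpamAssassin-scanned mail (the path is up to you)."""
    box = mailbox.mbox(path, create=False)
    try:
        return build_whitelist(box, min_ham=min_ham)
    finally:
        box.close()
```

Requiring zero spam from an IP is deliberately strict; a shared mail server that occasionally relays spam would never make it onto this whitelist, which may or may not be what a personal list wants.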