Distributed Spamikaze has a number of goals:
- Block a spam source before it has spammed me, because it is known to have already spammed other Spamikaze installations.
- Notice when many people receive legitimate email from a certain IP address, so that the address does not get listed.
- When in doubt about an IP address, get a second opinion.
- Make it harder for a real spammer to get his/her IP addresses removed from all the lists.
- Make Spamikaze invulnerable to the DDoS attacks that sometimes take out centralised DNSBLs.
There is not yet a design that fulfils all these criteria. The main problem seems to be that "block a spam source that spammed other Spamikaze instances" requires a push model, while getting a second opinion on an IP address is better suited to a pull model. If you have an idea on how to fix this, please write it down here:
Proposal by Walter:
Push and pull do not necessarily exclude each other. Imagine a Spamikaze message bus on which nodes can push spam-trap hits with additional details, as well as requests about a certain IP address's reputation. This introduces additional design goals that have to be met:
- Prevent spammers from either tarnishing a certain IP's reputation (a Joe Job) or getting a handle on how well their spam run will survive filtering, as often happens with SpamAssassin.
- Maintain the anonymity of the participating nodes.
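A minimal sketch of such a bus in Python, assuming a single in-process bus, a hypothetical `Node`/`MessageBus` API, and nodes that answer a reputation query with a plain hit count (none of these names come from the proposal). A pushed hit reaches every node, so each node can block the source on its own; a pull query collects second opinions from the other nodes.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from dataclasses import dataclass, field


def hitter_id(application_type: str, ip: str) -> str:
    """MD5 of application_type + IP address, as proposed for <hitter_id>."""
    return hashlib.md5((application_type + ip).encode("ascii")).hexdigest()


@dataclass
class Node:
    """A hypothetical Spamikaze node attached to the bus."""
    name: str
    # hitter_id -> hit probabilities received via push
    hits: defaultdict = field(default_factory=lambda: defaultdict(list))

    def on_hit(self, hid: str, probability: float) -> None:
        self.hits[hid].append(probability)

    def on_query(self, hid: str) -> int:
        # Answer a pull request with the number of hits seen for this hitter_id.
        return len(self.hits.get(hid, []))


class MessageBus:
    """Hypothetical in-memory bus carrying pushed hits and pulled queries."""

    def __init__(self) -> None:
        self.nodes: list[Node] = []

    def join(self, node: Node) -> None:
        self.nodes.append(node)

    def push_hit(self, hid: str, probability: float) -> None:
        # Push: every node, including the one whose trap was hit, records the hit.
        for node in self.nodes:
            node.on_hit(hid, probability)

    def pull_reputation(self, requester: Node, hid: str) -> dict[str, int]:
        # Pull: ask every other node for a second opinion.
        return {n.name: n.on_query(hid) for n in self.nodes if n is not requester}


bus = MessageBus()
a, b, c = Node("a"), Node("b"), Node("c")
for n in (a, b, c):
    bus.join(n)
hid = hitter_id("SMTP", "192.0.2.1")
bus.push_hit(hid, 0.95)             # a's spam trap was hit; the hit is announced
print(c.on_query(hid))              # 1: c can block before being spammed itself
print(bus.pull_reputation(c, hid))  # {'a': 1, 'b': 1}
```

Both models share one transport here: announcements are broadcast, while queries are answered on demand from whatever each node has accumulated.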
Messages that announce spam-trap hits could have a format as follows:
<public_key_id>an id of the public RSA-key of the announcing node</public_key_id>
<unix_time_of_hit>the unix time+timezone of the hit</unix_time_of_hit>
<application_type>SMTP</application_type> (for the time being only SMTP, but distributed Spamikaze could later be extended to other malicious use of TCP/IP-based services, for example exploits of HTTP server vulnerabilities)
<hitter_id type:MD5>a cryptographic hash of the application_type+IP-address of the source</hitter_id>
<hit-characteristics class:SMTP-message-body type:MD5>a hash of the message body</hit-characteristics>
<hit_probability>probability of the hit not being a false-positive</hit_probability>
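The announcement format above can be sketched as a Python function that fills in the fields and renders them in the proposed tag style. The concatenation order for `hitter_id` (application_type directly followed by the dotted IP address), the `time+offset` rendering of `unix_time_of_hit`, and the name `build_announcement` are assumptions; signing is left out of this sketch.

```python
import hashlib


def md5_hex(data: str) -> str:
    return hashlib.md5(data.encode("utf-8")).hexdigest()


def build_announcement(public_key_id: str, unix_time: int, tz_offset: str,
                       source_ip: str, message_body: bytes,
                       hit_probability: float,
                       application_type: str = "SMTP") -> str:
    """Render a spam-trap hit announcement in the proposed tag format.

    Assumed: hitter_id = MD5(application_type + IP) with no separator, and
    unix_time_of_hit rendered as "<seconds><offset>", e.g. "1700000000+0000".
    """
    fields = [
        ("public_key_id", "", public_key_id),
        ("unix_time_of_hit", "", f"{unix_time}{tz_offset}"),
        ("application_type", "", application_type),
        # Only the hash of the source IP is announced, never the IP itself.
        ("hitter_id", " type:MD5", md5_hex(application_type + source_ip)),
        ("hit-characteristics", " class:SMTP-message-body type:MD5",
         hashlib.md5(message_body).hexdigest()),
        ("hit_probability", "", f"{hit_probability:.2f}"),
    ]
    return "\n".join(f"<{tag}{attrs}>{value}</{tag}>" for tag, attrs, value in fields)


print(build_announcement("key-1", 1700000000, "+0000", "192.0.2.1",
                         b"Buy now!", 0.95))
```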
The message should also be cryptographically signed by the sending node. To discuss: should the unix_time_of_hit and the application_type not be encrypted using the announcer's public key, and should the announcement not be made after a random delay, in order to make detection of spam traps more difficult?
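One way to implement the random delay raised in this discussion point is a priority queue of pending announcements, each released at its submission time plus a random delay. The `DelayedAnnouncer` class, the uniform distribution and the `max_delay` parameter are hypothetical choices, not part of the proposal.

```python
from __future__ import annotations

import heapq
import random
from typing import Optional


class DelayedAnnouncer:
    """Hold announcements and release each one after a random delay.

    The uniform distribution and max_delay (seconds) are assumptions; the
    proposal only asks for "a random delay".
    """

    def __init__(self, max_delay: float, rng: Optional[random.Random] = None) -> None:
        self.max_delay = max_delay
        self.rng = rng if rng is not None else random.Random()
        self._queue: list[tuple[float, int, str]] = []
        self._counter = 0  # tie-breaker so equal release times never compare messages

    def submit(self, message: str, now: float) -> float:
        """Schedule a message; return the time at which it becomes due."""
        release_at = now + self.rng.uniform(0, self.max_delay)
        heapq.heappush(self._queue, (release_at, self._counter, message))
        self._counter += 1
        return release_at

    def due(self, now: float) -> list[str]:
        """Pop every message whose release time has been reached, earliest first."""
        ready = []
        while self._queue and self._queue[0][0] <= now:
            ready.append(heapq.heappop(self._queue)[2])
        return ready


announcer = DelayedAnnouncer(max_delay=600, rng=random.Random(42))
t = announcer.submit("hit announcement", now=0.0)
print(announcer.due(now=t))  # ['hit announcement']
```

Passing a seeded `random.Random` makes the delays reproducible for testing; a deployed node would use the default, unseeded generator.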
Verification requests could have the following format:
<public_key_id>an id of the public RSA-key of the requesting node</public_key_id>
<unix_time_of_request>the unix time + timezone of the request</unix_time_of_request>
<request_id type:MD5>a hash of unix_time_of_request+requester's IP-address</request_id>
<hitter_id type:MD5>a cryptographic hash of the application_type + IP-address of the machine whose reputation is queried</hitter_id>
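A sketch of building a verification request and answering it locally, assuming requests are plain dicts keyed by the proposed field names. Because `hitter_id` is a hash, a responding node matches it against the hashes it already stores rather than against raw IP addresses. The function names `build_request` and `answer`, and the `tz_offset` rendering, are hypothetical.

```python
from __future__ import annotations

import hashlib
import time
from typing import Optional


def md5_hex(data: str) -> str:
    return hashlib.md5(data.encode("utf-8")).hexdigest()


def build_request(public_key_id: str, requester_ip: str, queried_ip: str,
                  application_type: str = "SMTP", tz_offset: str = "+0000",
                  now: Optional[int] = None) -> dict[str, str]:
    """Assemble the verification-request fields (names follow the proposal)."""
    unix_time = int(time.time()) if now is None else now
    unix_time_of_request = f"{unix_time}{tz_offset}"
    return {
        "public_key_id": public_key_id,
        "unix_time_of_request": unix_time_of_request,
        # Hash of unix_time_of_request + requester's IP address.
        "request_id": md5_hex(unix_time_of_request + requester_ip),
        # Hash of application_type + IP address whose reputation is queried.
        "hitter_id": md5_hex(application_type + queried_ip),
    }


def answer(request: dict[str, str], local_hits: dict[str, int]) -> int:
    """Number of local spam-trap hits stored under the queried hitter_id (0 if none)."""
    return local_hits.get(request["hitter_id"], 0)


req = build_request("key-1", "198.51.100.7", "192.0.2.1", now=1700000000)
print(answer(req, {md5_hex("SMTP192.0.2.1"): 3}))  # 3
```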
Each node that is not the originator of a request should add a cryptographically signed report of the number of requests it has received from the requesting party in the past hour, and remove such reports added by previous nodes, in order not to reveal the topology of the distributed Spamikaze network.
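A sketch of what a relaying node might do, assuming requests are dicts carrying at most one `relay_report` field and that the requesting party is identified by its `public_key_id`. The hypothetical `RelayNode` counts requests from that party in a sliding one-hour window, drops any report added by an earlier node, and attaches its own. The proposal calls for a public-key signature; since the Python standard library has no RSA, HMAC-SHA256 with a node secret stands in for it here, which is a substitution and not the proposed scheme.

```python
from __future__ import annotations

import hashlib
import hmac
import json
from collections import defaultdict, deque

WINDOW = 3600  # "the past hour", in seconds


class RelayNode:
    """Hypothetical relaying node; HMAC stands in for an RSA signature."""

    def __init__(self, node_key: bytes) -> None:
        self.node_key = node_key        # stand-in for the node's private key
        self.seen = defaultdict(deque)  # requester public_key_id -> request times

    def _count_recent(self, requester: str, now: int) -> int:
        # Record this request, then drop timestamps older than one hour.
        times = self.seen[requester]
        times.append(now)
        while times and times[0] <= now - WINDOW:
            times.popleft()
        return len(times)

    def relay(self, request: dict, now: int) -> dict:
        count = self._count_recent(request["public_key_id"], now)
        report = {"requests_past_hour": count}
        payload = json.dumps(report, sort_keys=True).encode()
        report["signature"] = hmac.new(self.node_key, payload, hashlib.sha256).hexdigest()
        # Remove the previous node's report so the path is not revealed.
        forwarded = {k: v for k, v in request.items() if k != "relay_report"}
        forwarded["relay_report"] = report
        return forwarded


node = RelayNode(b"node-secret")
fwd = node.relay({"public_key_id": "key-1", "hitter_id": "example-hitter-id",
                  "relay_report": {"requests_past_hour": 9}}, now=0)
print(fwd["relay_report"]["requests_past_hour"])  # 1
```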