Posted to server-dev@james.apache.org by Stefano Mazzocchi <st...@apache.org> on 2004/02/04 05:22:38 UTC
Why fighting spam with whitelists doesn't work [was Re: Contributing a mailet]

On 3 Feb 2004, at 17:16, <Br...@nokia.com> wrote:

> Stefano,  I found your questions quite thought-provoking.

Good.

> Would you mind answering a couple of questions?

of course not.

> 1) I feel that no other solution other than pure whitelisting will work
> in the long run.

A whitelist approach assumes that the sender is a human being (and so
is able to judge and take an action) and that the from: address was not
forged. Both are pathetically wrong assumptions these days, especially
after the SoBig and MyDoom worm outbreaks.

> I have had my personal email address for many years
> and there are days when I receive over 1000 spams per day.

Join the club.

> I am
> currently using several public blacklists and SpamAssassin set at its
> most aggressive setting, which worked for years until a few months ago,
> but now spammers are getting very smart about bypassing normal
> anti-spam tools.

I use bayesian filtering (bogofilter because it's very fast). It's good
enough for almost all sorts of spam except the "random dictionary +
image" type. That type is easily modelled with a rule engine (though I
receive so few of them lately that it's not even worth bothering to
write one).

Over the last two weeks, I had 4800 spam messages and only 50 false
negatives (99% correctness) and no false positives so far (even if it's
admittedly hard to tell; my filter is better than I am at rating spam,
that's for sure).

My bogofilter database contains something like 30000 ham messages and
10000 spam messages from my own inbox and it's 35 MB in size. The
database is retrained differentially every 5 minutes so that it adapts
to messages I move from my inbox to the spam folder or the various ham
folders [I use my 'outbox' as a ham folder as well, since I'm likely to
like email that looks like the email I send out]
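
To make the idea concrete, here is a minimal sketch of a word-count
bayesian filter with differential retraining. It is a toy written for
illustration, not bogofilter's actual algorithm or database format; the
class name TinyBayes, the tokenizer, and the add-one smoothing are all
assumptions.

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Toy bayesian filter (hypothetical; not how bogofilter works inside)."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.msgs = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        # Each distinct word counts once per message.
        return set(re.findall(r"[a-z0-9$']+", text.lower()))

    def train(self, text, label, sign=1):
        # sign=-1 unregisters a message, e.g. one that was moved out of
        # a folder, so the database can be retrained differentially.
        self.msgs[label] += sign
        for token in self.tokens(text):
            self.words[label][token] += sign

    def spam_probability(self, text):
        # Sum log-likelihood ratios of the words present in the message,
        # with add-one smoothing so unseen words never divide by zero.
        log_odds = math.log((self.msgs["spam"] + 1) / (self.msgs["ham"] + 1))
        for token in self.tokens(text):
            p_spam = (self.words["spam"][token] + 1) / (self.msgs["spam"] + 2)
            p_ham = (self.words["ham"][token] + 1) / (self.msgs["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so math.exp cannot overflow on very long messages.
        log_odds = max(-700.0, min(700.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))
```

Moving a message from the inbox to the spam folder would then be one
train(text, "ham", sign=-1) followed by one train(text, "spam").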

> What alternative would you propose to whitelist-only email?

a computation-based approach for senders [see
http://research.microsoft.com/research/sv/PennyBlack/] plus digital
signatures for receivers (so that you can check whether the from
address was forged or not) [see the one attached to this message]

You will still need some sort of statistical analysis to remove the
email that manages to get through, but the volume would be dramatically
reduced if they find a proper algorithm for the computation-based
approach [which is a very interesting problem from a research
perspective]
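
As an illustration of what "computation-based" means, here is a sketch
of a hashcash-style stamp: the sender burns CPU time searching for a
value, and the receiver checks it with a single hash. This is only one
possible puzzle and is not claimed to be what the Penny Black project
proposes; the stamp layout, the function names, and the choice of
SHA-256 are assumptions.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest):
    """Count the zero bits at the start of a bytes digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint(recipient, date, bits=16):
    """Sender side: try counters until the hash starts with `bits` zeros.

    The expected cost is about 2**bits hash computations per recipient.
    """
    for n in count():
        stamp = f"{recipient}:{date}:{n}"
        digest = hashlib.sha256(stamp.encode()).digest()
        if leading_zero_bits(digest) >= bits:
            return stamp


def verify(stamp, recipient, bits=16):
    """Receiver side: one hash, however long the minting took."""
    if not stamp.startswith(recipient + ":"):
        return False  # stamp was minted for somebody else
    digest = hashlib.sha256(stamp.encode()).digest()
    return leading_zero_bits(digest) >= bits
```

The asymmetry is the point: a cost that is negligible for one message
adds up quickly for someone sending millions of them.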

> 2) I know that creating a new "reply" email directed to the "from" or
> "reply-to" address can be abused for relaying.

no, that's not my concern.

My concern is: if I'm *NOT* the one who sent that email, I don't want 
your stinking "are you really you" whitelist message because that's 
unsolicited email and that's exactly what we are trying to avoid in the 
first place!

>  But wouldn't a reject
> of the incoming SMTP transaction itself (with an appropriate error
> message) go back ONLY to the real sender?

what real sender? Without some sort of trust facility you have no way
to tell whether the from: address is really the guy who sent the
email... and trust is not something that you can take for granted or
write an algorithm for in a piece of software.

> The point is that if somebody
> isn't willing to go through some necessary hassle the first (and only
> the first) time he sends email to me, then that person is not someone I
> want to hear from - EVER.

> I am assuming that the mailet API is called
> -->before<-- the transaction is complete.  And of course, there are
> situations, like when joining a mailing list, where whitelisting would
> have to be done in advance by the recipient.  But please correct me if
> I am wrong.

It's not about being right or wrong, it's about assumptions. You assume
that the guy you receive email from is really the guy who sent it. This
is a false assumption almost every day.

I receive so many "f**k you!" emails from people who believe I'm the
one with the stinking worm that tries to infect them. While quite
humorous, you are making the same incorrect assumption: I *DON'T* have
a virus, it's somebody else's machine pretending to be me! That's why I
have to sign email now. [even if some stupid email clients can't read
the signature, or it scares people because they think that attachment
is a virus!]

For that matter, it doesn't make any difference on this planet if your
whitelist server sends me "f**k you, stop it!" or "hey dude, are you a
spammer?": since I didn't write to you, such a message is unsolicited
and therefore spam.

So, in order to stop *your* spam, you are increasing mine.
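
The effect can be sketched with a small simulation. The receiving
server only ever sees the claimed from: address; the second field of
each pair below exists only in the simulation, to show where the
challenges end up. The function name and the data layout are
hypothetical.

```python
from collections import Counter


def challenge_recipients(messages, whitelist):
    """Tally who gets a whitelist challenge.

    `messages` is a list of (claimed_from, actual_sender) pairs.
    Returns two Counters: challenges that reach the real sender, and
    challenges that reach someone whose address was forged.
    """
    reached_sender = Counter()
    hit_bystander = Counter()
    for claimed_from, actual_sender in messages:
        if claimed_from in whitelist:
            continue  # delivered, no challenge sent
        if claimed_from == actual_sender:
            reached_sender[claimed_from] += 1
        else:
            # The challenge goes to the forged address: unsolicited
            # mail for a person who never wrote to us.
            hit_bystander[claimed_from] += 1
    return reached_sender, hit_bystander
```

With worm traffic, where nearly every from: address is forged, almost
all challenges land in the second counter.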

With all due respect, this is what I call a stupid solution.

> BTW, OT, I hope you manage to avoid software patents in Europe.  Here
> in the US they are already being used to kill many open source
> projects.
> RedHat is already leaving key functionality out to avoid lawsuits (like
> MP3 playing and other capabilities) and the cacerts.org site was down
> for months because of a patent issue (but they eventually returned).

That's FUD.

Several of the things the foundation first introduced were later
patented by corporations. They will never sue the foundation or anybody
else because if they do they will be publicly ridiculed (can you say
SCO?). They patent to prevent other corporations from suing them.
Patents are becoming defensive mechanisms. Very few companies ever used
them as offensive tools, and the ones who did, did it because they were
desperate. The markets react by killing them faster: investors know
that's the last resort, so they sell the stock.

Don't get me wrong, I find the patent situation in the US ridiculous,
but it's not even close to being as bad as you paint it.

BTW, don't be paranoid about it, I don't think anybody will be willing
to sue you for an idea that makes the problem even worse.

--
Stefano.