Posted to server-dev@james.apache.org by Stefano Mazzocchi <st...@apache.org> on 2004/02/04 05:22:38 UTC
Why fighting spam with whitelists doesn't work [was Re: Contributing a mailet]
On 3 Feb 2004, at 17:16, <Br...@nokia.com> wrote:
> Stefano, I found your questions quite thought-provoking.
Good.
> Would you mind answering a couple of questions?
of course not.
> 1) I feel that no other solution other than pure whitelisting will work
> in the long run.
A whitelist approach assumes that the sender is a human being (and so
is able to judge and take an action) and that the From: address was not
forged. Both assumptions are pathetically wrong these days, especially
after the SoBig and MyDoom worm outbreaks.
> I have had my personal email address for many years
> and there are days when I receive over 1000 spams per day.
Join the club.
> I am
> currently using several public blacklists and SpamAssassin set at its
> most aggressive setting, which worked for years until a few months ago,
> but now spammers are getting very smart about bypassing normal
> anti-spam tools.
I use Bayesian filtering (bogofilter, because it's very fast). It's good
enough for almost every sort of spam except the "random dictionary +
image" type. That type is easily modelled with a rule engine, but I
receive so few of them lately that it's not even worth writing one.
Over the last two weeks, I had 4800 spam messages and only 50 false
negatives (99% correctness) and no false positives so far (admittedly
that's hard to tell, but my filter is better than I am at rating spam,
that's for sure).
My bogofilter database contains something like 30000 ham messages and
10000 spam messages from my own inbox, and it's 35 MB in size. The
database is retrained differentially every 5 minutes so that it adapts
to messages I move from my inbox to the spam folder or the various ham
folders. [I use my 'outbox' as a ham folder as well, since I'm likely to
like email that looks like what I send out.]
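To make the idea of incremental ("differential") training concrete, here
is a minimal sketch of a word-frequency filter. It is a toy naive-Bayes
style scorer, not bogofilter's actual algorithm; the class name
TinyBayesFilter, its methods, and the sample messages are all
hypothetical, and it ignores class priors for brevity.

```python
import math
import re
from collections import Counter


class TinyBayesFilter:
    """Toy word-frequency spam scorer (illustrative, not bogofilter)."""

    def __init__(self):
        # Number of messages of each class that contained each token.
        self.counts = {"ham": Counter(), "spam": Counter()}
        # Number of messages trained per class.
        self.messages = {"ham": 0, "spam": 0}

    @staticmethod
    def tokens(text):
        # Each distinct lowercase word counts once per message.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def train(self, text, label):
        # Differential training: fold in one newly filed message
        # without rebuilding the whole database.
        self.counts[label].update(self.tokens(text))
        self.messages[label] += 1

    def spam_score(self, text):
        # Sum of per-token log-likelihood ratios with add-one smoothing.
        # A positive score means the message looks more like spam.
        score = 0.0
        for tok in self.tokens(text):
            p_spam = (self.counts["spam"][tok] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.messages["ham"] + 2)
            score += math.log(p_spam / p_ham)
        return score
```

Calling train() each time a message is moved into the spam folder or a
ham folder (including the outbox) is what lets such a filter adapt to
one person's mail rather than to a generic rule set.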
> What alternative would you propose to whitelist-only email?
A computation-based approach for senders [see
http://research.microsoft.com/research/sv/PennyBlack/] plus digital
signatures for receivers (so that you can check whether the From:
address was forged or not) [see the one attached to this message].
You will still need some sort of statistical analysis to remove the
email that manages to get through, but the volume would be dramatically
reduced if they find a proper algorithm for the computation-based
approach [which is a very interesting problem from a research
perspective].
> 2) I know that creating a new "reply" email directed to the "from" or
> "reply-to" address can be abused for relaying.
no, that's not my concern.
My concern is: if I'm *NOT* the one who sent that email, I don't want
your stinking "are you really you" whitelist message because that's
unsolicited email and that's exactly what we are trying to avoid in the
first place!
> But wouldn't a reject
> of the incoming SMTP transaction itself (with an appropriate error
> message) go back ONLY to the real sender?
What real sender? Without some sort of trust facility, you have no way
to tell whether the From: address really belongs to the guy who sent
the email... and trust is not something that you can take for granted
or write an algorithm for in a piece of software.
> The point is that if somebody
> isn't willing to go through some necessary hassle the first (and only
> the first) time he sends email to me, then that person is not someone I
> want to hear from - EVER.
> I am assuming that the mailet API is called
> -->before<-- the transaction is complete. And of course, there are
> situations, like when joining a mailing list, where whitelisting would
> have to be done in advance by the recipient. But please correct me if
> I am wrong.
It's not about being right or wrong, it's about assumptions. You assume
that the guy you receive email from is really the guy who sent it. This
is a false assumption almost every day.
I receive so many "f**k you!" emails from people who believe I'm the
one with the stinking worm that tries to infect them. While that's
quite humorous, you are making the same incorrect assumption: I *DON'T*
have a virus, it's somebody else's machine pretending to be me! That's
why I have to sign email now. [even if some stupid email clients can't
read the signature, or it scares people because they think that
attachment is a virus!]
For that matter, it doesn't make any difference on this planet if your
whitelist server sends me "f**k you, stop it!" or "hey dude, are you a
spammer?": since I didn't write you, such a message is unsolicited and
therefore spam.
So, in order to stop *your* spam, you are increasing mine.
With all due respect, this is what I call a stupid solution.
> BTW, OT, I hope you manage to avoid software patents in Europe. Here
> in the US they are already being used to kill many open source projects.
> RedHat is already leaving key functionality out to avoid lawsuits (like
> MP3 playing and other capabilities) and the cacerts.org site was down
> for months because of a patent issue (but they eventually returned).
That's FUD.
Several of the things the foundation first introduced were later
patented by corporations. They will never sue the foundation or anybody
else because if they do they will be publicly ridiculed (can you
say SCO?). They patent to prevent other corporations from suing
them. Patents are becoming defensive mechanisms: very few companies
have ever used them as offensive tools, and the ones who did, did it
out of desperation. The markets react by killing them faster, because
investors know that's the last resort, so they sell the stock.
Don't get me wrong, I find the patent situation in the US ridiculous,
but it's not even close to being as bad as you paint it.
BTW, don't be paranoid about it, I don't think anybody will be willing
to sue you for an idea that makes the problem even worse.
--
Stefano.