Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2004/02/18 01:18:30 UTC
[Bug 3055] New: Bayes: use hash instead of Message-Id?
http://bugzilla.spamassassin.org/show_bug.cgi?id=3055
Summary: Bayes: use hash instead of Message-Id?
Product: Spamassassin
Version: SVN Trunk (Latest Devel Version)
Platform: Other
OS/Version: other
Status: NEW
Severity: minor
Priority: P5
Component: Learner
AssignedTo: spamassassin-dev@incubator.apache.org
ReportedBy: jm@jmason.org
Folks --
this has come up before, but I think we might as well raise it again ;)
Basically, Robert Menschel noted the following in this mail:
Subject: Re[2]: Some real anti-bayes stuffing followup
Date: Fri, 13 Feb 2004 20:59:56 -0800
Cc: spamassassin-users.incubator.apache.org
'I've received multiple spams all using the same message id.
a) If a ham is sent to my domain with four recipients here, then because
of the way I run SA, I could process that email four times, once for each
mailbox. That's expected. And it's expected that each of those emails
will have identical bodies, and identical subjects.
b) I receive spam where in a given day I can receive similar spam,
identical message ids, but with different subject headers (usually random
words or letters added to a subject), and/or with different bodies
(sometimes minor random differences, sometimes very different messages).
c) I receive spam where on Jan 2 I can receive spam with a given message
ID, and I can receive spam (similar or not) with identical message ids on
Jan 14, Jan 30, Feb 12, etc.'
I think this is probably a bayes-evasion technique, since we key
our bayes_seen db on Message-ID if present.
What were the objections to using a hash of some selected headers (From, To,
Subject) and the message body, again? Strikes me this is a more resilient
way to avoid spammers using 1 message ID for all their spam and evading
bayes learning that way.
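
To make the proposal concrete, here is a minimal sketch of keying on a hash of From, To, Subject and the body instead of on Message-ID. It is written in Python for illustration only and is not SpamAssassin code: the function name `seen_key`, the choice of SHA-1, and the sample messages are all assumptions.

```python
import hashlib
from email import message_from_string


def seen_key(raw_message: str) -> str:
    """Hypothetical 'seen' key: SHA-1 over From, To, Subject and the body.

    Message-ID is deliberately ignored, so reusing one ID across many
    different spams no longer makes them look like the same message.
    """
    msg = message_from_string(raw_message)
    # Everything after the first blank line is treated as the body.
    _, _, body = raw_message.partition("\n\n")

    digest = hashlib.sha1()
    for name in ("From", "To", "Subject"):
        # A missing header contributes an empty value.
        value = str(msg.get(name, ""))
        digest.update(f"{name}:{value}\n".encode("utf-8", "replace"))
    digest.update(body.encode("utf-8", "replace"))
    return digest.hexdigest()


# Two made-up spams that share a Message-ID but differ in subject and body.
spam_a = (
    "Message-ID: <same@example.com>\n"
    "From: s@example.com\n"
    "To: r@example.com\n"
    "Subject: offer xkq\n"
    "\n"
    "Buy now\n"
)
spam_b = (
    "Message-ID: <same@example.com>\n"
    "From: s@example.com\n"
    "To: r@example.com\n"
    "Subject: offer zzp\n"
    "\n"
    "Buy today\n"
)
```

With these samples, `seen_key(spam_a)` and `seen_key(spam_b)` differ even though the Message-ID is identical, while a Message-ID lookup would treat the second message as already seen.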
--j.
Re: [Bug 3055] New: Bayes: use hash instead of Message-Id?
Posted by Daniel Quinlan <qu...@pathname.com>.
bugzilla-daemon@bugzilla.spamassassin.org writes:
> What were the objections to using a hash of some selected headers (From, To,
> Subject) and the message body, again? Strikes me this is a more resilient
> way to avoid spammers using 1 message ID for all their spam and evading
> bayes learning that way.
I agree. Let's move to using the hash in 3.0.
I think the main concerns (I wouldn't say objections) were (or should
be):
1. overhead of computing the hash (not a big deal, I think)
2. stability of the hash to minor changes (like whitespace in headers,
whitespace at end of body, header sorting, Received headers, etc.)
that could cause a mismatch in generated ID from one hashing to the
next.
3. backward compatibility with existing Bayes databases.
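
Concern 2 can be illustrated with a small sketch that normalizes the input before hashing. This is Python for illustration only, not SpamAssassin code; the name `normalized_key`, the header selection, the normalization rules and SHA-1 are all assumptions about what a stable ID might need.

```python
import hashlib
from email import message_from_string

# Assumed selection. Hashing in a fixed order makes header sorting
# irrelevant, and Received headers are simply never read.
HASHED_HEADERS = ("From", "To", "Subject")


def normalized_key(raw_message: str) -> str:
    """Hypothetical message ID that tolerates minor formatting changes."""
    raw = raw_message.replace("\r\n", "\n")
    msg = message_from_string(raw)
    # Everything after the first blank line is treated as the body.
    _, _, body = raw.partition("\n\n")

    digest = hashlib.sha1()
    for name in HASHED_HEADERS:
        # Collapse whitespace runs, including folded continuation lines.
        value = " ".join(str(msg.get(name, "")).split())
        digest.update(f"{name.lower()}:{value}\n".encode("utf-8", "replace"))

    # Drop trailing whitespace on each line and trailing blank lines.
    lines = [line.rstrip() for line in body.splitlines()]
    while lines and not lines[-1]:
        lines.pop()
    digest.update("\n".join(lines).encode("utf-8", "replace"))
    return digest.hexdigest()


# The same made-up message as seen on two delivery paths.
first_copy = (
    "From: a@example.com\n"
    "To: b@example.com\n"
    "Subject: hello  world\n"
    "\n"
    "body line\n"
)
second_copy = (
    "Received: from x by y\n"
    "Subject: hello\n"
    " world\n"
    "To: b@example.com\n"
    "From: a@example.com\n"
    "\n"
    "body line   \n"
    "\n"
    "\n"
)
```

Here `normalized_key(first_copy)` equals `normalized_key(second_copy)` despite the added Received header, reordered and folded headers, and trailing whitespace. A real change to the body or to a hashed header still yields a different key.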
Daniel
--
Daniel Quinlan anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/ and open source consulting