Posted to users@spamassassin.apache.org by jdow <jd...@earthlink.net> on 2006/12/04 10:56:08 UTC

Say what?

I have two copies of a message with the same content and the same
source, sent two minutes apart. After I trimmed out the various
verification data and differing timestamps, these are the only
differences between them.

===8<---
$ diff first second
0a1
> Status:  U
6c7
<       by mx-avoceta.atl.sa.earthlink.net (EarthLink SMTP Server) with SMTP id
---
>       by mx-jacana.atl.sa.earthlink.net (EarthLink SMTP Server) with SMTP id
9c10
<       by smtpout02.lax.untd.com with SMTP id
---
>       by smtpout01.lax.untd.com with SMTP id
72a74,76
>
>
>
===8<---

Of course the various "id" strings all differ as well.

The first message scored Bayes 80. The second scored Bayes 95. This
implies that Bayes is training itself on garbage as well as message
content.

Since other sources of filtering deal with the Received: lines and the
message header id lines, should Bayes be paying any attention to them, too?

Should an id string like L8QWHGMP or an X-UNTD-OriginStamp line such as
the one below figure into the Bayes algorithm at all?

X-UNTD-OriginStamp: qTKGdH6+6PX6q6wVyyDAiKpzgjuM3gNrL/xEOWaR9Ko1VNgBJE6wCw==
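
One knob that looks relevant here is the bayes_ignore_header setting in
the SpamAssassin configuration, which tells Bayes to skip a named header.
A minimal sketch, assuming it goes in local.cf or user_prefs and that the
header name is given as it appears in the message; whether it would change
the scores above is untested:

===8<---
# Assumption: keep Bayes away from this upstream-added header
bayes_ignore_header X-UNTD-OriginStamp
===8<---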

{^_^}