Posted to users@spamassassin.apache.org by Paul Boven <p....@chello.nl> on 2005/03/17 11:16:27 UTC
Testing Bayes (auto)-learning
Hi everyone,
There seem to be some learning problems with our Bayes database, which
I'm trying to track down.
Given a particular spam message that got auto-trained as ham, then
re-trained as spam, I would like to be able to do the following:
1.) Determine whether it's in the Bayes database or not, and whether it
is there as ham or as spam. I can use the Berkeley DB tools to dump the
bayes_seen database, but often the Message-ID isn't in there even though
the message got learned; it was probably stored with a '@sa-generated' Message-ID.
Given the original message, how can I determine which Message-ID Bayes
is using to keep track of the message? When will it accept the original
Message-ID, and when will it use the generated one? How can I determine
the sa-generated Message-ID without running it through the learner again?
How sensitive is the generated Message-ID to changes in Received: and
other headers that happen when the mail gets returned to the learner?
2.) With the new SpamAssassin 3.0.2, I can no longer see what score a
particular token has, because they are hashed. Is there an easy way to
generate these hashes or is there an interface that I can use to check
the score for a token?
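As a rough illustration of the hashing in question: the sketch below assumes that SpamAssassin 3.x keys each token by the last 5 bytes of its SHA-1 digest. That detail is recalled from the Bayes module and should be checked against the source of the installed version. The function name hashed_token is hypothetical, and the sketch does not reproduce the tokenizer itself (header prefixes, case handling and so on), so the input must already be the exact token string that Bayes would produce.

```python
import hashlib


def hashed_token(token: str) -> str:
    """Return a hex string for the assumed database key of a Bayes token.

    Assumption (not confirmed by this thread): the key is the last
    5 bytes of the SHA-1 digest of the token.
    """
    digest = hashlib.sha1(token.encode("utf-8")).digest()
    return digest[-5:].hex()


# Print the 10 hex characters that would identify this token under
# the assumption above.
print(hashed_token("viagra"))
```

If the assumption holds, the resulting hex string could be compared against the token column of a database dump to find the counts for one word.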
My problem is that I have end-users that are basically claiming 'the
more I send to the relearn-address, the lower the Bayes score seems to
be getting.' The included headers seem to support that claim, so I
really want to dig a bit deeper into the whole setup.
Regards, Paul Boven.
Re: Testing Bayes (auto)-learning
Posted by Matt Kettler <mk...@evi-inc.com>.
Greg Abbas wrote:
>Paul Boven <p.boven <at> chello.nl> writes:
>
>>Yes, they're forwarding the messages as attachments, and yes, I'm
>>stripping them out of the message/rfc822 attachments before feeding
>>them to Bayes. And in all the tests I've done so far this seems to work,
>>but now that we've upgraded to SA3.0.2 I can't peek 'under the hood'
>>anymore to see if things are still being learned as they should.
>
>On a related note, if I grab messages from a maildir after
>spamassassin has "quarantined" them ("The original message has
>been attached to this so you can view it... yadda yadda") is
>sa-learn smart enough to realize that the spam is contained in
>the attachment?
>
sa-learn is smart enough to undo any changes made by spamassassin
itself, so if you use SA to do your tagging, sa-learn will undo it prior
to learning.
However, if you use a tool like amavis, mimedefang, or mailscanner and
use that tool's own encapsulation methods instead of SA's, then sa-learn
won't undo it.
Re: Testing Bayes (auto)-learning
Posted by Greg Abbas <sp...@abbas.org>.
Paul Boven <p.boven <at> chello.nl> writes:
> Yes, they're forwarding the messages as attachments, and yes, I'm
> stripping them out of the message/rfc822 attachments before feeding
> them to Bayes. And in all the tests I've done so far this seems to work,
> but now that we've upgraded to SA3.0.2 I can't peek 'under the hood'
> anymore to see if things are still being learned as they should.
On a related note, if I grab messages from a maildir after
spamassassin has "quarantined" them ("The original message has
been attached to this so you can view it... yadda yadda") is
sa-learn smart enough to realize that the spam is contained in
the attachment? Or is this the same situation as a user-forward,
where I would need to write something to strip it out?
And as an aside, I'm curious about "peeking under the hood" too,
but in my case it's because I'm curious how many messages have
been trained. (In order to find out how soon the filter is going
to think the corpus is large enough to start using its bayes
rules.)
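One way to get at those counts is the output of "sa-learn --dump magic", which lists nspam and nham among its "non-token data" lines. The sketch below parses such output in Python. The sample text is made up for illustration, the column layout is an assumption based on typical output, and parse_magic and bayes_ready are hypothetical helper names. The thresholds of 200 correspond to the documented defaults of bayes_min_spam_num and bayes_min_ham_num; a local configuration may set them differently.

```python
import re

# Made-up sample in the layout that "sa-learn --dump magic" is assumed
# to print: probability, spam count, ham count/value, atime, label.
SAMPLE_DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        150          0  non-token data: nspam
0.000          0        420          0  non-token data: nham
0.000          0      51234          0  non-token data: ntokens
"""

LINE_RE = re.compile(r"\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (.+)$")


def parse_magic(dump_text):
    """Map each 'non-token data' label to the integer in the third column."""
    counts = {}
    for line in dump_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            counts[match.group(2).strip()] = int(match.group(1))
    return counts


def bayes_ready(counts, min_spam=200, min_ham=200):
    """True once both learned-message counts reach their minimums."""
    return (counts.get("nspam", 0) >= min_spam
            and counts.get("nham", 0) >= min_ham)


counts = parse_magic(SAMPLE_DUMP)
print(counts["nspam"], counts["nham"], bayes_ready(counts))
```

In practice the text would come from running the command (for example through subprocess) instead of the embedded sample.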
TIA. -g.
Re: Testing Bayes (auto)-learning
Posted by Paul Boven <p....@chello.nl>.
Hi Daryl, everyone,
Daryl C. W. O'Shea wrote:
> Paul Boven wrote:
>> My problem is that I have end-users that are basically claiming 'the
>> more I send to the relearn-address, the lower the Bayes score seems to
>> be getting.' The included headers seem to support that claim, so I
>> really want to dig a bit deeper into the whole setup.
> That there sounds like your problem. How are your users sending mail to
> the 'relearn address'? If they're not forwarding messages as an
> attachment, and you're not stripping out these attached messages then it
> isn't going to work to your benefit, and you'll see the result you
> describe.
Yes, they're forwarding the messages as attachments, and yes, I'm
stripping them out of the message/rfc822 attachments before feeding
them to Bayes. And in all the tests I've done so far this seems to work,
but now that we've upgraded to SA3.0.2 I can't peek 'under the hood'
anymore to see if things are still being learned as they should.
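For illustration, the stripping step described above might look roughly like this in Python. It is only a sketch using the standard email package, not the script actually in use here; the function name extract_attached_messages is hypothetical, and handing each extracted message to sa-learn is left out.

```python
import email
from email import policy
from typing import List


def extract_attached_messages(raw: bytes) -> List[bytes]:
    """Return every message/rfc822 attachment found in a forwarded mail.

    Each result is the embedded message serialized back to bytes, with
    its own headers, ready to be fed to a learner.
    """
    outer = email.message_from_bytes(raw, policy=policy.default)
    originals = []
    for part in outer.walk():
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a one-element
            # list holding the embedded message object.
            inner = part.get_payload(0)
            originals.append(inner.as_bytes())
    return originals
```

Note that a forward which quotes the spam inline, instead of attaching it, yields an empty list; such mail should be rejected instead of learned, since its headers and body belong to the forwarding user.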
Regards, Paul Boven.
Re: Testing Bayes (auto)-learning
Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.
Paul Boven wrote:
> My problem is that I have end-users that are basically claiming 'the
> more I send to the relearn-address, the lower the Bayes score seems to
> be getting.' The included headers seem to support that claim, so I
> really want to dig a bit deeper into the whole setup.
That there sounds like your problem. How are your users sending mail to
the 'relearn address'? If they're not forwarding messages as an
attachment, and you're not stripping out these attached messages then it
isn't going to work to your benefit, and you'll see the result you describe.
Daryl