Posted to users@spamassassin.apache.org by aktor <ak...@aktornet.ath.cx> on 2005/03/18 01:42:53 UTC

bayes test

Hi,

I wonder how I have to train SpamAssassin to get the BAYES_XX tests to
start working.

I have a rule that trains the Bayesian filter with the sa-learn tool on
each email I receive. After some months of training (I thought I needed
200 spam and 200 ham messages) I still haven't seen a BAYES_XX test.

The last spam that SpamAssassin caught hit these tests:

>>>
Return-Path: <yj...@hotmail.com>
X-Original-To: aktor{@|aktornet.ath.cx
Delivered-To: aktor{@|aktornet.ath.cx
Received: from 203.90.52.8 (unknown [203.90.52.8])
	by aktornet.ath.cx (Postfix) with SMTP id 375F6BB49
	for <aktor{@|aktornet.ath.cx>; Thu, 17 Mar 2005 06:19:52 +0100 (CET)
From: ydlBobby <yj...@hotmail.com>
To: aktor{@|aktornet.ath.cx
Subject: Better than Vìagra and cheaper, too! npdu
Sender: ydlBobby <yj...@hotmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Date: Wed, 16 Mar 2005 22:25:20 -0600
X-Mailer: Microsoft Outlook Express 5.00.2615.200
Message-Id: <20...@aktornet.ath.cx>
X-Virus-Scanned: por AMAVIS + CLAMAV en aktornet.ath.cx
X-Amavis-Alert: BAD HEADER Non-encoded 8-bit data (char EC hex) in
message header 'Subject'	Subject: Better than V\354agra and cheape... ^
X-Spam-Status: Yes, hits=11.1 tagged_above=0.0 required=4.0
	tests=DRUGS_ERECTILE, DRUGS_ERECTILE_OBFU, FORGED_HOTMAIL_RCVD2,
	FORGED_MUA_OUTLOOK, INFO_TLD, MSGID_FROM_MTA_ID, RCVD_NUMERIC_HELO
X-Spam-Level: ***********
X-Spam-Flag: YES
<<<

No BAYES_XX test.

I run SpamAssassin through amavisd-new, via the Mail::SpamAssassin Perl
module, with default options.

aktor@AsteriX aktor $ sa-learn --dump magic
0.000   0          3     0  non-token data: bayes db version
0.000   0        568     0  non-token data: nspam
0.000   0       1996     0  non-token data: nham
0.000   0     203190     0  non-token data: ntokens
0.000   0 1086896787     0  non-token data: oldest atime
0.000   0 1111102059     0  non-token data: newest atime
0.000   0          0     0  non-token data: last journal sync atime
0.000   0 1111102285     0  non-token data: last expiry atime 
0.000   0   29436939     0  non-token data: last expire atime delta
0.000   0          0     0  non-token data: last expire reduction count

Do I have to do something else? What am I doing wrong?

Thank you,

aktor
-- 
Blessed are the pessimists, for they shall make backups.
		-- Www.frases.com. 

This mail is copyleft-ed to aktor under the terms of the CC License
(Creative Commons). 

Re: bayes test

Posted by aktor <ak...@aktornet.ath.cx>.
Hi,

On Fri, 18 Mar 2005 10:54:08 +0530
crisppy fernandes wrote:

> You have not mentioned that rule or the file in which you wrote it.
> If you can tell us, it will help others to reply better.
> Anyway, let me try to explain.

I haven't written any rule myself. I thought it would start learning by
itself.

Neither of these

/etc/spamassassin/local.cf
~/.spamassassin

has any directives, as I use the amavisd-new default settings.

> > After some months of training (I thought I needed 200 spam and
> > 200 ham messages) I still haven't seen a BAYES_XX test.
> > The last spam that SpamAssassin caught hit these tests:
> 
> Yes, it's mentioned in the SpamAssassin wiki documentation, but in
> reality much more than this is needed.

OK, that's probably the problem. What is the "real" number of emails
needed for the Bayesian filter to start working?

Thx,

aktor
-- 
Buy a MODEM, surf the Internet: gain friends and lose your wife.
		-- Www.frases.com. 


Re: bayes test

Posted by aktor <ak...@aktornet.ath.cx>.
Hi again,

On Fri, 18 Mar 2005 10:54:08 +0530
crisppy fernandes wrote:

> Whenever you write a rule or change scores, do not forget to run the
> command  spamassassin --lint
> and for debugging you can add the -D option.

AsteriX root # amavisd-new debug-sa
[..]
debug: bayes: 20621 tie-ing to DB file R/O /var/lib/amavis/.spamassassin/bayes_toks
debug: bayes: 20621 tie-ing to DB file R/O /var/lib/amavis/.spamassassin/bayes_seen
debug: bayes: found bayes db version 3
debug: bayes: Not available for scanning, only 49 spam(s) in Bayes DB < 200
debug: bayes: 20621 untie-ing
debug: bayes: 20621 untie-ing db_toks
debug: bayes: 20621 untie-ing db_seen
debug: Score set 0 chosen.

I've got this architecture:

postfix -> amavisd-new -> postfix -> maildrop -> sa-learn -> mailbox
               |      |
               V      V
           clamav   spamassassin

So I would like to load per-user bayes_toks and bayes_seen files.

I think my problem is that the only files used by SpamAssassin are
/var/lib/amavis/.spamassassin/bayes_*, and not the per-user ones.

AsteriX root # sa-learn --dump magic --dbpath /var/lib/amavis/.spamassassin/

0.000    0          3     0  non-token data: bayes db version 
0.000    0         49     0  non-token data: nspam 
                   ^^
0.000    0       5240     0  non-token data: nham 
0.000    0     164819     0  non-token data: ntokens 
0.000    0 1106523114     0  non-token data: oldest atime 
0.000    0 1111139568     0  non-token data: newest atime 
0.000    0 1106526477     0  non-token data: last journal sync atime  
0.000    0 1111123833     0  non-token data: last expiry atime 
0.000    0          0     0  non-token data: last expire atime delta
0.000    0          0     0  non-token data: last expire reduction count


aktor@AsteriX aktor $ sa-learn --dump magic
0.000    0          3     0  non-token data: bayes db version
0.000    0        572     0  non-token data: nspam
                  ^^^
0.000    0       1996     0  non-token data: nham
0.000    0     203323     0  non-token data: ntokens
0.000    0 1086896787     0  non-token data: oldest atime
0.000    0 1111127201     0  non-token data: newest atime
0.000    0          0     0  non-token data: last journal sync atime 
0.000    0 1111102285     0  non-token data: last expiry atime 
0.000    0   29436939     0  non-token data: last expire atime delta 
0.000    0          0     0  non-token data: last expire reduction cou
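
The two dumps differ mainly in nspam, and the debug output says the database is skipped when it holds fewer than 200 spam messages. A minimal Python sketch of that check follows, assuming the `--dump magic` line layout shown above. The function names are hypothetical, and using 200 as the ham minimum as well is an assumption (the debug line only confirms it for spam).

```python
import re

# Threshold taken from the debug line "only 49 spam(s) in Bayes DB < 200".
# Using the same value for ham is an assumption.
MIN_SPAM = 200
MIN_HAM = 200

# Matches lines such as:
# 0.000    0         49     0  non-token data: nspam
MAGIC_LINE = re.compile(
    r"^\s*[\d.]+\s+\d+\s+(\d+)\s+\d+\s+non-token data: (.+?)\s*$"
)


def parse_magic(dump_text):
    """Return {field name: integer value} from `sa-learn --dump magic` output."""
    fields = {}
    for line in dump_text.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            fields[match.group(2)] = int(match.group(1))
    return fields


def bayes_ready(fields):
    """True when both learned-message counts reach the minimums."""
    return (fields.get("nspam", 0) >= MIN_SPAM
            and fields.get("nham", 0) >= MIN_HAM)


# Counts copied from the two dumps in this mail.
AMAVIS_DUMP = """\
0.000    0          3     0  non-token data: bayes db version
0.000    0         49     0  non-token data: nspam
0.000    0       5240     0  non-token data: nham
"""

USER_DUMP = """\
0.000    0          3     0  non-token data: bayes db version
0.000    0        572     0  non-token data: nspam
0.000    0       1996     0  non-token data: nham
"""

if __name__ == "__main__":
    for name, dump in (("amavis", AMAVIS_DUMP), ("aktor", USER_DUMP)):
        fields = parse_magic(dump)
        print(name, fields["nspam"], fields["nham"], bayes_ready(fields))
```

With these figures the amavis database fails the check (49 < 200) while the per-user one passes, which matches the debug message.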

Is there any way to solve this?

Thx,

aktor
-- 
Man can still switch off the computer. However, we will have to work
very hard to keep this privilege.
		-- J. Weizembaum. American sociologist and computer
		expert. 


Re: bayes test

Posted by crisppy fernandes <cr...@gmail.com>.
> I wonder how I have to train SpamAssassin to get the BAYES_XX tests to
> start working.

> I have a rule that trains the Bayesian filter with the sa-learn tool on
> each email I receive.

You have not mentioned that rule or the file in which you wrote it. If
you can tell us, it will help others to reply better. Anyway, let me try
to explain.

The BAYES_XX tests work purely on the basis of probability: the filter
looks for tokens in the mail that match tokens it learned earlier. It is
always better to let the Bayes rules learn by themselves, but we can
always create rules to improve the chances of that rule appearing
alongside the other tests. The tests have default scores, which you can
check in the files under the /usr/share/spamassassin/ directory. Rules
you create yourself go either in /etc/mail/spamassassin/local.cf or in
the user-specific user_prefs file in the user's home directory.

Whenever you write a rule or change scores, do not forget to run the
command
spamassassin --lint
and for debugging you can add the -D option.
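
To illustrate the idea of matching tokens against earlier learned ones, here is a simplified Python sketch. It is not SpamAssassin's actual implementation: the token counts, the clamping and the naive-Bayes combining are all assumptions made up for illustration.

```python
import math

# Hypothetical learned data: token -> (times seen in spam, times seen in ham).
LEARNED = {
    "viagra": (40, 1),
    "cheaper": (10, 5),
    "meeting": (0, 30),
}
NSPAM = 100  # hypothetical number of learned spam messages
NHAM = 100   # hypothetical number of learned ham messages


def token_spam_prob(spam_hits, ham_hits, nspam, nham):
    """Estimate P(spam | token) from how often the token was learned."""
    spam_rate = spam_hits / nspam if nspam else 0.0
    ham_rate = ham_hits / nham if nham else 0.0
    if spam_rate + ham_rate == 0:
        return 0.5  # token never learned: no evidence either way
    prob = spam_rate / (spam_rate + ham_rate)
    # Clamp so a single token can never give absolute certainty.
    return min(max(prob, 0.01), 0.99)


def combine(probs):
    """Combine per-token probabilities by summing their log-odds."""
    log_odds = sum(math.log(p) - math.log(1 - p) for p in probs)
    return 1 / (1 + math.exp(-log_odds))


def score_message(tokens):
    """Return an overall spam probability for a list of tokens."""
    probs = [
        token_spam_prob(*LEARNED.get(token, (0, 0)), NSPAM, NHAM)
        for token in tokens
    ]
    return combine(probs)


if __name__ == "__main__":
    print(score_message(["viagra", "cheaper"]))
    print(score_message(["meeting"]))
```

A message made of tokens seen mostly in spam scores close to 1, one made of tokens seen mostly in ham scores close to 0, and tokens that were never learned leave the result unchanged. This is why the filter needs enough learned spam and ham before its output means anything.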


> After some months of training (I thought I needed 200 spam and
> 200 ham messages) I still haven't seen a BAYES_XX test.
> The last spam that SpamAssassin caught hit these tests:

Yes, it's mentioned in the SpamAssassin wiki documentation, but in
reality much more than this is needed.
Read man sa-learn; that will help you understand the process better.

For further queries, mail the list.
-- 
Crisppy Fernandes