Posted to users@spamassassin.apache.org by Luis Hernán Otegui <lu...@gmail.com> on 2008/02/18 18:24:36 UTC

Re: Bayes: What am I missing

2008/2/17, comparity <in...@manngo.net>:
>
>  I have found that in the last few months a lot of mail has been coming
> through. I believe that the bayes filter isn't working. None of the caught
> messages include a bayes score.
>
>  I have dutifully put all of my uncaught spam into a folder for the purposes
> of learning, and run sa-learn from time to time. Below is some information
> which may be relevant:
>
>  I am running spamassassin through procmail
>  SpamAssassin version 3.2.4
>  spamassassin -D bayes< ... indicates a bayes score
>  local.cf:
>      use_bayes               1
>      bayes_auto_learn              1
>      # From http://wiki.apache.org/spamassassin/SiteWideBayesSetup
>      bayes_path /etc/mail/spamassassin/bayes
>      bayes_file_mode 0770
>  sa-learn --dump magic
>      0.000          0          3          0  non-token data: bayes db version
>      0.000          0      14225          0  non-token data: nspam
>      0.000          0       9037          0  non-token data: nham
>      0.000          0     168352          0  non-token data: ntokens
>      0.000          0 1161931609          0  non-token data: oldest atime
>      0.000          0 1203213840          0  non-token data: newest atime
>      0.000          0 1203212640          0  non-token data: last journal sync atime
>      0.000          0 1203212721          0  non-token data: last expiry atime
>      0.000          0   11059200          0  non-token data: last expire atime delta
>      0.000          0      77173          0  non-token data: last expire reduction count
>
>  I have recently (a few months ago ...) cleared out the contents of the
> uncaught spam folders, reasoning that sa should have learned what it needs
> already. However, these folders now have hundreds of new spam to learn from.
>
>  Any ideas?
>
>  Mark
>
Well, what makes you think that Bayes is missing anything? SA needs
up-to-date rules to work properly. Do you use sa-update?

How about sharing an uncaught message with the list? Then we could
have a better idea of what is failing.
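
For reference, the atime fields in the `sa-learn --dump magic` output quoted above are Unix timestamps, and decoding them is a quick way to see whether the database is still being touched. What follows is only a minimal Python sketch, not part of SpamAssassin; the helper names `to_utc` and `describe_magic` are made up for illustration, and the values are copied from the dump above.

```python
from datetime import datetime, timezone

# Values copied from the "sa-learn --dump magic" output quoted above.
MAGIC = {
    "oldest atime": 1161931609,
    "newest atime": 1203213840,
    "last journal sync atime": 1203212640,
    "last expiry atime": 1203212721,
}
EXPIRE_DELTA_SECONDS = 11059200  # "last expire atime delta"


def to_utc(epoch_seconds):
    """Render a Unix timestamp as a UTC date and time string."""
    stamp = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%d %H:%M:%S")


def describe_magic(magic):
    """Return {field name: UTC time string} for every atime field."""
    return {name: to_utc(value) for name, value in magic.items()}


if __name__ == "__main__":
    for name, when in describe_magic(MAGIC).items():
        print(f"{name:25s} {when} UTC")
    print(f"expire delta: {EXPIRE_DELTA_SECONDS / 86400:.0f} days")
```

Here the newest atime decodes to 17 February 2008, the day of the original post, so this database was apparently still being accessed. Whether the procmail invocation reads the same database is a separate question that the dump alone cannot answer.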



> --
>
>
> Mark Simon
>
> Comparity Net
>  Computer Training & Support
>
> Phone/Fax: 1300 726 000
>  mobile: 0411 246 672
>
> email: mark@comparity.net
>  web: http://www.comparity.net
>
> Resume: http://mark.manngo.net
>  Calendar: http://www.comparity.net/calendar.php

Regards,


Luis
-- 
-------------------------------------------------
GNU-GPL: "May The Source Be With You...
Linux Registered User #448382.
When I grow up, I wanna be like Theo...
-------------------------------------------------

Re: Bayes: What am I missing

Posted by spamis <sp...@pobladores.com>.


comparity wrote:
> 
> spamis wrote:
> 
>  
> Hi comparity,
> were you able to fix the problem by updating SA?
> 
> ----
> 
> No, not as far as I can tell. I still get the same spam, and no
> indication that bayes has been applied. 
> 
> -- 
> 
> 

I'm using an older version of qmail-scanner-st. I saw in the changelog a
fixed issue which may be causing our problem (I don't know whether you are
using qmail-scanner with SpamAssassin). I will test the new version as soon
as I have configured a test server, and I will post my results in a few days.

If I forget to post my results here and you are interested, please remind
me: spamis in pobladores (.com)

-- 
View this message in context: http://www.nabble.com/Bayes%3A-What-am-I-missing-tp15542012p15630417.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.


Re: autolearn vs sa-learn / Bayes

Posted by Diego Pomatta <in...@abelsonsa.com.ar>.
Luis Hernán Otegui wrote:
> Hello, Diego
>
> 2008/2/21, Diego Pomatta <in...@abelsonsa.com.ar>:
>   
>> Hello list.
>>
>>  Does the bayes system use a separate db for the "autolearn" mode?
>>
>>  Today I noticed that my SA bayes has 50 spam and 45 ham mails learned,
>>  when I thought the db had a lot more, because bayes IS being used.
>>
>>  # sa-learn --dump magic
>>  0.000          0          3          0  non-token data: bayes db version
>>  *0.000          0         50          0  non-token data: nspam
>>  0.000          0         45          0  non-token data: nham*
>>
>>  # spamassassin -D --lint
>>  ...
>>  [7896] dbg: bayes: found bayes db version 3
>>  [7896] dbg: bayes: DB journal sync: last sync: 0
>>  *[7896] dbg: bayes: not available for scanning, only 50 spam(s) in bayes DB < 200*
>>  ...
>>
>>  In the beginning, after setting up SA, bayes was not being used.
>>  I had not trained it with anything yet, but my local.cf had:
>>  *use_bayes 1
>>  use_bayes_rules 1
>>  bayes_auto_learn 1*
>>
>>  Reading the logs I noticed that it was only autolearning spam, not ham.
>>  So I added
>>  *bayes_auto_learn_threshold_nonspam 0.5*
>>  and it started learning ham.
>>  I monitored the logs and at some point incoming mails started triggering
>>  the BAYES_20, BAYES_50, BAYES_00, BAYES_95 and BAYES_99 rules.
>>  So I figured it had autolearned the minimum needed amount of ham and spam
>>  (200) to start working.
>>  Every now and then I use sa-learn to feed some spam and ham to bayes,
>>  and I thought I was contributing to the same db. Those must be the 50
>>  spam and 45 ham mails.
>>
>>  So what's the deal? :)
>>  /Regards
>>
>>
>>     
>
> Well, a couple of questions need answering first: How do you call
> SA? Under which user does SA run? Are you learning those mails as
> the right user? Which version are you running? Do you use sa-update?
>
> With those asked, on to the core of the issue: as you said, you have
> only 50 spam and 45 ham messages learned, and the debug output you
> posted shows that this database needs at least 200 spam before Bayes
> is used for scanning. You should feed more data to SA so that the
> Bayes scores kick in. Normally, Bayes scores help SA filter better
> (at least they do here, and I suspect they'll help you too: since you
> work in Argentina, your main locale should be Spanish, and you'll be
> getting mostly Argentinian spam).
>
> Regards,
>
> Luis
>   
Hey Luis. I forgot to add that info, duh.

The setup here is
qmail 3.05
simscan 1.3.1
SpamAssassin 3.2.1 (spamd/spamc)
sa-update is cron'ed to run daily ( no parameters = default channel -> 
updates.spamassassin.org, right? )

Simscan calls spamc under the user "simscan".
I did the manual feeding to sa-learn as root.
So... ummm. I guess root has a separate database and I've been using
sa-learn as the wrong user...?
OK, time to remove head from butt, and insert foot in mouth.... *lol*

Regards
Where are you from, Luis?
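
To illustrate the point about users: in the default per-user setup, each account keeps its own Bayes files under its home directory (`~/.spamassassin`), so training as root would not touch the database consulted for mail scanned as `simscan`. Below is a hedged Python sketch that only builds the commands one might run instead. It assumes `sudo` is available and that no site-wide `bayes_path` is configured; `sa_learn_argv`, `dump_magic_argv` and the folder path are hypothetical names used for illustration.

```python
import shlex


def sa_learn_argv(user, kind, mail_path):
    """Build an argv list that runs sa-learn as `user` via sudo.

    kind is "spam" or "ham"; mail_path is a message file or directory.
    -H asks sudo to set HOME to the target user's home directory, so
    sa-learn works on that user's ~/.spamassassin rather than root's.
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sudo", "-H", "-u", user, "sa-learn", f"--{kind}", mail_path]


def dump_magic_argv(user):
    """Build an argv list that shows the counters of `user`'s database."""
    return ["sudo", "-H", "-u", user, "sa-learn", "--dump", "magic"]


if __name__ == "__main__":
    # Hypothetical folder of missed spam; the user name comes from the thread.
    for argv in (
        sa_learn_argv("simscan", "spam", "/path/to/missed-spam"),
        dump_magic_argv("simscan"),
    ):
        # Printed rather than executed, so running this changes nothing.
        print(shlex.join(argv))
```

Comparing the `--dump magic` counters for root and for the scanning user should show which database the manual training actually went into. The mail folder also has to be readable by the user that sa-learn runs as.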

Re: autolearn vs sa-learn / Bayes

Posted by Luis Hernán Otegui <lu...@gmail.com>.
Hello, Diego

2008/2/21, Diego Pomatta <in...@abelsonsa.com.ar>:
> Hello list.
>
>  Does the bayes system use a separate db for the "autolearn" mode?
>
>  Today I noticed that my SA bayes has 50 spam and 45 ham mails learned,
>  when I thought the db had a lot more, because bayes IS being used.
>
>  # sa-learn --dump magic
>  0.000          0          3          0  non-token data: bayes db version
>  *0.000          0         50          0  non-token data: nspam
>  0.000          0         45          0  non-token data: nham*
>
>  # spamassassin -D --lint
>  ...
>  [7896] dbg: bayes: found bayes db version 3
>  [7896] dbg: bayes: DB journal sync: last sync: 0
>  *[7896] dbg: bayes: not available for scanning, only 50 spam(s) in bayes DB < 200*
>  ...
>
>  In the beginning, after setting up SA, bayes was not being used.
>  I had not trained it with anything yet, but my local.cf had:
>  *use_bayes 1
>  use_bayes_rules 1
>  bayes_auto_learn 1*
>
>  Reading the logs I noticed that it was only autolearning spam, not ham.
>  So I added
>  *bayes_auto_learn_threshold_nonspam 0.5*
>  and it started learning ham.
>  I monitored the logs and at some point incoming mails started triggering
>  the BAYES_20, BAYES_50, BAYES_00, BAYES_95 and BAYES_99 rules.
>  So I figured it had autolearned the minimum needed amount of ham and spam
>  (200) to start working.
>  Every now and then I use sa-learn to feed some spam and ham to bayes,
>  and I thought I was contributing to the same db. Those must be the 50
>  spam and 45 ham mails.
>
>  So what's the deal? :)
>  /Regards
>
>

Well, a couple of questions need answering first: How do you call
SA? Under which user does SA run? Are you learning those mails as
the right user? Which version are you running? Do you use sa-update?

With those asked, on to the core of the issue: as you said, you have
only 50 spam and 45 ham messages learned, and the debug output you
posted shows that this database needs at least 200 spam before Bayes
is used for scanning. You should feed more data to SA so that the
Bayes scores kick in. Normally, Bayes scores help SA filter better
(at least they do here, and I suspect they'll help you too: since you
work in Argentina, your main locale should be Spanish, and you'll be
getting mostly Argentinian spam).
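
The counts mentioned above can be checked mechanically. Here is a minimal Python sketch that parses `sa-learn --dump magic` output and compares nspam and nham with the 200-message minimum shown in the debug log. The function names are invented for illustration, and it assumes the same minimum of 200 applies to ham as well as spam (the log quoted in this thread only mentions spam).

```python
import re

# Sample copied from the thread; real output has more "non-token data" lines.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0         50          0  non-token data: nspam
0.000          0         45          0  non-token data: nham
"""

# Four whitespace-separated columns, then the "non-token data" label.
LINE = re.compile(r"^\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (.+?)\s*$")


def parse_magic(text):
    """Map each 'non-token data' label to its integer value (third column)."""
    values = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            values[match.group(2)] = int(match.group(1))
    return values


def bayes_ready(values, minimum=200):
    """True when both learned-message counts reach the minimum."""
    return values.get("nspam", 0) >= minimum and values.get("nham", 0) >= minimum


if __name__ == "__main__":
    counts = parse_magic(SAMPLE)
    print(counts["nspam"], counts["nham"], bayes_ready(counts))
```

With the numbers posted here (50 spam, 45 ham) the check fails, which matches the "not available for scanning" debug line.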

Regards,

Luis

autolearn vs sa-learn / Bayes

Posted by Diego Pomatta <in...@abelsonsa.com.ar>.
Hello list.

Does the bayes system use a separate db for the "autolearn" mode?

Today I noticed that my SA bayes has 50 spam and 45 ham mails learned, 
when I thought the db had a lot more, because bayes IS being used.

# sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
*0.000          0         50          0  non-token data: nspam
0.000          0         45          0  non-token data: nham*

# spamassassin -D --lint
...
[7896] dbg: bayes: found bayes db version 3
[7896] dbg: bayes: DB journal sync: last sync: 0
*[7896] dbg: bayes: not available for scanning, only 50 spam(s) in bayes DB < 200*
...

In the beginning, after setting up SA, bayes was not being used.
I had not trained it with anything yet, but my local.cf had:
*use_bayes 1
use_bayes_rules 1
bayes_auto_learn 1*

Reading the logs I noticed that it was only autolearning spam, not ham.
So I added
*bayes_auto_learn_threshold_nonspam 0.5*
and it started learning ham.
I monitored the logs and at some point incoming mails started triggering
the BAYES_20, BAYES_50, BAYES_00, BAYES_95 and BAYES_99 rules.
So I figured it had autolearned the minimum needed amount of ham and spam
(200) to start working.
Every now and then I use sa-learn to feed some spam and ham to bayes, 
and I thought I was contributing to the same db. Those must be the 50 
spam and 45 ham mails.

So what's the deal? :)
/Regards
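
As a rough illustration of why raising `bayes_auto_learn_threshold_nonspam` made ham start being learned, here is a deliberately simplified Python sketch of a threshold-based auto-learn decision. It is not SpamAssassin's code: `autolearn_decision` is a made-up name, the defaults of 0.1 and 12.0 are the documented defaults as far as I know, and the real plugin, as I understand it, works from a separately computed score and applies further conditions that are left out here.

```python
def autolearn_decision(score, nonspam_threshold=0.1, spam_threshold=12.0):
    """Classify a message for auto-learning from its score alone.

    Returns "ham", "spam", or None when the score falls between the
    two thresholds and the message is not learned at all.
    """
    if score < nonspam_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return None


if __name__ == "__main__":
    # A legitimate mail scoring 0.3 is skipped with the assumed default
    # threshold of 0.1, but learned as ham once it is raised to 0.5.
    print(autolearn_decision(0.3))
    print(autolearn_decision(0.3, nonspam_threshold=0.5))
```

Under these assumptions, ham that picks up even a few tenths of a point never gets learned with the default threshold, which would be consistent with "only autolearning spam".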


Re: Bayes: What am I missing

Posted by spamis <sp...@pobladores.com>.


comparity wrote:
> 
>   Do you use sa-update?
>   
> 
> No, I don't. However, I have just run it and restarted spamassassin
> (service spamassassin restart), and I'll see what happens. 
> 

Hi comparity,

Were you able to fix the problem by updating SA?



Re: Bayes: What am I missing

Posted by spamis <sp...@pobladores.com>.
Other guys wrote:
-----------------------
  Well, what makes you think that Bayes is missing anything? SA needs to
be updated to work properly. 

I keep all of the captured spam in a folder for examination. Even the
worst of the spam gives the following analysis: 

Content analysis details:   (17.0 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.0 EXTRA_MPART_TYPE       Header has extraneous Content-type:...type= entry
 3.3 TVD_RCVD_IP4           TVD_RCVD_IP4
 1.6 TVD_RCVD_IP            TVD_RCVD_IP
 2.6 RCVD_NUMERIC_HELO      Received: contains an IP address used for HELO
 0.0 T_TVD_FW_GRAPHIC_ID1   BODY: T_TVD_FW_GRAPHIC_ID1
 0.0 HTML_MESSAGE           BODY: HTML included in message
 1.5 HTML_IMAGE_ONLY_04     BODY: HTML: images with 0-400 bytes of words
 2.2 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
                            [Blocked - see <http://www.spamcop.net/bl.shtml?59.92.110.10>]
 0.5 RCVD_IN_PBL            RBL: Received via a relay in Spamhaus PBL
                            [59.92.110.10 listed in zen.spamhaus.org]
 2.9 RCVD_IN_XBL            RBL: Received via a relay in Spamhaus XBL
 1.2 PART_CID_STOCK         Has a spammy image attachment (by Content-ID)
 0.0 PART_CID_STOCK_LESS    Has a spammy image attachment (by Content-ID,
                            more specific)
 0.1 RDNS_NONE              Delivered to trusted network by a host with no rDNS
 0.0 STOCK_IMG_HTML         Stock spam image part, with distinctive HTML
 0.0 STOCK_IMG_HDR_FROM     Stock spam image part, with distinctive From line
with no mention of bayes. 

--------------

I have the same problem. Some mails aren't analyzed by the bayes filter. My
bayes filter is trained correctly and works fine for many mails, but not for
others.
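
To tell quickly which saved messages got a Bayes verdict and which did not, one could scan each "Content analysis details" report for BAYES_* rule names. The following is a minimal Python sketch under that assumption; `rules_hit` and `bayes_rules` are hypothetical helper names, and the sample rows are in the style of the report above.

```python
import re

# A header, a separator and a few rows in the style of the report above.
REPORT = """\
 pts rule name              description
---- ---------------------- --------------------------------------------------
 3.3 TVD_RCVD_IP4           TVD_RCVD_IP4
 2.2 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
 0.0 HTML_MESSAGE           BODY: HTML included in message
"""

# A report row starts with a score, followed by an upper-case rule name.
ROW = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s+([A-Z][A-Z0-9_]*)\b")


def rules_hit(report):
    """Return the rule names listed in a report, in order of appearance."""
    names = []
    for line in report.splitlines():
        match = ROW.match(line)
        if match:
            names.append(match.group(2))
    return names


def bayes_rules(report):
    """Return only the BAYES_* rules, e.g. BAYES_50 or BAYES_99."""
    return [name for name in rules_hit(report) if name.startswith("BAYES_")]


if __name__ == "__main__":
    hits = bayes_rules(REPORT)
    print(hits if hits else "no BAYES_* rule in this report")
```

An empty result for a message means no Bayes rule fired on it. That alone does not say why, so the user the scan ran as and the learned-message counts still need checking.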