Posted to users@spamassassin.apache.org by Jean Caron <ca...@norac.net> on 2005/04/13 17:53:15 UTC
sa-learn - bayes training...
Folks,
I searched the archive, tried different things, yet I need to ask a few
questions.
I'm running SA 3.0.2 with Qmail/QQ 1.25, and procmail, on linux. Works
great. Bayes auto-learns ok, I run sa-learn from a "dedicated" user every
night for ham and spam. My logs show how many msgs were inspected and how
many were learned. So far so good.
Here's the part I'm unsure of: I have one centralized bayes DB owned by this
"dedicated" user. This user runs sa-learn against two shared folders, one
for ham and one for spam. All users (only a handful) may populate the
shared folders. Many thousands of msgs have gone through sa-learn. I thought
this was all too easy...
My problem is bayes does not seem to have any effect whatsoever on the
amount of spam delivered to INBOXes. I keep receiving these low-score spam
msgs still.
I now suspect this centralized DB, updated by this user alone, may not
produce the expected results. I've read in the archive that individual users
should run cron jobs against their own ham and spam folders. The issue with
this is that only one user has an actual shell defined on the system, so the
others can't run cron. Then again, that's just a suspicion; I may be wrong,
and something else may be missing or mis-configured, and that's why I'm
posting this... I'm a little confused. I don't understand how bayes works
exactly, so I can't come to any helpful conclusion about my setup.
Can anyone see through this and help me understand what is happening ?
Thanks in advance,
Jean
Re: sa-learn - bayes training...
Posted by Jean Caron <ca...@norac.net>.
I just had a chance to (finally) get back to this issue. I tried your
suggestion, changed the mode to 0777 and re-started spamd. Apparently
nothing changed.
I did however realize that bayes tests are listed in my log file, even
though they are not in the header of the msgs.
So, I have bayes autolearn working fine. The database is also fine (> 6000
ham & spam learned). My logs show all that's expected. The message headers
are missing the list of Bayes tests, but are otherwise fine. spamassassin
--lint returns no errors. I have the SARE rules installed. Running qmail,
with qmail-scanner v1.25 and SA 3.0.2. Everything works fine...
Yet, I still have a lot of spam (I know that's relative) that slips through,
more than before this SA upgrade. To show some numbers, I used to get a
couple of false negatives per day, if any, before the upgrade; now I get
anywhere from half a dozen to two dozen. Still much better than the 500
without SA, but not quite fine-tuned enough for my taste.
Any suggestions as to where to look next would be appreciated.
Cheers,
Jean
Matt Kettler writes:
> Jean Caron wrote:
>
>>
>> Here's the bayes related I had in there already;
>> use_bayes 1
>> bayes_path /home/bayesUID/bayes
>> bayes_file_mode 0666
>> bayes_auto_learn 1
>> Jean
>
> Suggestion: set bayes_file_mode to 0777 not 0666.
>
> The bayes_file_mode is really a mask not literal permissions, so it
> won't result in executable bits being set for your bayes files. However,
> this mask is sometimes used in directory creation, where the x bit is
> quite appropriate.
>
> This is why the default is 0700, not 0600.
>
>
>
Re: sa-learn - bayes training...
Posted by Matt Kettler <mk...@evi-inc.com>.
Jean Caron wrote:
>
> Here's the bayes related I had in there already;
> use_bayes 1
> bayes_path /home/bayesUID/bayes
> bayes_file_mode 0666
> bayes_auto_learn 1
> Jean
Suggestion: set bayes_file_mode to 0777 not 0666.
The bayes_file_mode is really a mask not literal permissions, so it
won't result in executable bits being set for your bayes files. However,
this mask is sometimes used in directory creation, where the x bit is
quite appropriate.
This is why the default is 0700, not 0600.
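To make the mask behaviour concrete, here is a minimal Python sketch. It assumes, as the SpamAssassin configuration documentation describes, that execute bits are stripped when a Bayes file is created and kept when the mode is used to create a directory. The function name effective_modes is made up for this illustration and is not part of SpamAssassin.

```python
def effective_modes(bayes_file_mode: int) -> tuple[int, int]:
    """Return (file_mode, dir_mode) implied by a bayes_file_mode value.

    Assumption for this sketch: files never get execute bits
    (as if a umask of 0111 were applied), directories keep them.
    """
    file_mode = bayes_file_mode & 0o777 & ~0o111  # drop x bits for files
    dir_mode = bayes_file_mode & 0o777            # directories keep x bits
    return file_mode, dir_mode


if __name__ == "__main__":
    # Compare the setting from this thread, the suggestion, and the default.
    for mode in (0o666, 0o777, 0o700):
        file_mode, dir_mode = effective_modes(mode)
        print(f"bayes_file_mode {mode:04o}: "
              f"files {file_mode:04o}, directories {dir_mode:04o}")
```

Under that assumption, 0666 and 0777 give the same permissions on the database files themselves; the only difference is that a directory created with 0666 would lack the x bit and so could not be entered.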
Re: sa-learn - bayes training...
Posted by Jean Caron <ca...@norac.net>.
Alright. I find it strange that the defaults don't apply to my setup, but in
any case I added the following to local.cf and re-started spamd.
> add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTS_
Here's the bayes related I had in there already;
use_bayes 1
bayes_path /home/bayesUID/bayes
bayes_file_mode 0666
bayes_auto_learn 1
Jean
Kevin Peuhkurinen writes:
> Jean Caron wrote:
>
>> Really ? I never saw bayes score in the header. Should ALL msgs have a
>> bayes score in the header ? Here's a sample header;
>> Received: from 80.231.10.208 by mail (envelope-from
>> <ol...@business-kc.com>, uid 1001) with qmail-scanner-1.25
>> (spamassassin: 3.0.2. Clear:RC:0(80.231.10.208):SA:0(1.5/2.0):. Processed
>> in 3.859362 secs); 14 Apr 2005 07:18:05 -0000
>> X-Spam-Status: No, hits=1.5 required=2.0
>> X-Spam-Level: +
>> Did I miss such an obvious switch somewhere ??
>> Jean
>>
> For some reason, SA is not adding the tests that the email hit in the
> X-Spam-Status header, as is the default. Without this information, it's
> difficult to tell what is going on. Look in your local.cf file for
> either a "remove_header" or "add_header" entry. Remove (or comment out)
> any of the former and if you have any of the latter, make sure they read:
>
> add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTS_
> autolearn=_AUTOLEARN_ version=_VERSION_
>
>
> After making the change, be sure to restart spamd. Then begin to monitor
> your false negatives. The headers should then show which tests are hit.
> Look for BAYES tests and see which they are hitting.
>
>
Re: sa-learn - bayes training...
Posted by Kevin Peuhkurinen <ke...@meridiancu.ca>.
Jean Caron wrote:
> Really ? I never saw bayes score in the header. Should ALL msgs have a
> bayes score in the header ? Here's a sample header;
> Received: from 80.231.10.208 by mail (envelope-from
> <ol...@business-kc.com>, uid 1001) with qmail-scanner-1.25
> (spamassassin: 3.0.2. Clear:RC:0(80.231.10.208):SA:0(1.5/2.0):.
> Processed in 3.859362 secs); 14 Apr 2005 07:18:05 -0000
> X-Spam-Status: No, hits=1.5 required=2.0
> X-Spam-Level: +
> Did I miss such an obvious switch somewhere ??
> Jean
>
For some reason, SA is not adding the tests that the email hit in the
X-Spam-Status header, as is the default. Without this information,
it's difficult to tell what is going on. Look in your local.cf file
for either a "remove_header" or "add_header" entry. Remove (or
comment out) any of the former and if you have any of the latter, make
sure they read:
add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTS_ autolearn=_AUTOLEARN_ version=_VERSION_
After making the change, be sure to restart spamd. Then begin to
monitor your false negatives. The headers should then show which tests
are hit. Look for BAYES tests and see which they are hitting.
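Once the tests= field is back in the header, the BAYES hits can be pulled out with a few lines of Python. This is only a rough sketch: the helper name bayes_tests and the sample header value are invented for illustration (BAYES_50 and HTML_MESSAGE are just example rule names), and it assumes the header uses the key=value layout of the add_header line above.

```python
import re


def bayes_tests(status_header: str) -> list[str]:
    """Return the BAYES_* rule names listed in an X-Spam-Status value."""
    # Unfold continuation lines (a newline followed by spaces or tabs).
    unfolded = re.sub(r"\r?\n[ \t]+", " ", status_header.strip())
    # Capture everything after "tests=" up to the next "key=" or the end.
    match = re.search(r"tests=(.*?)(?=\s+\w+=|$)", unfolded)
    if match is None:
        return []  # no tests= field at all, as in the header quoted above
    names = [name.strip() for name in match.group(1).split(",")]
    return [name for name in names if name.startswith("BAYES_")]


if __name__ == "__main__":
    # Hypothetical header value, folded the way long headers usually are.
    sample = ("No, score=1.5 required=2.0 tests=BAYES_50,\n"
              "\tHTML_MESSAGE autolearn=no version=3.0.2")
    print(bayes_tests(sample))
    # The header from this thread has no tests= field.
    print(bayes_tests("No, hits=1.5 required=2.0"))
```

An empty result for a false negative means either the header carries no test list or no Bayes rule fired on that message, which are two different problems.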
Re: sa-learn - bayes training...
Posted by Jean Caron <ca...@norac.net>.
Really ? I never saw bayes score in the header. Should ALL msgs have a bayes
score in the header ? Here's a sample header;
Received: from 80.231.10.208 by mail (envelope-from
<ol...@business-kc.com>, uid 1001) with qmail-scanner-1.25 (spamassassin:
3.0.2. Clear:RC:0(80.231.10.208):SA:0(1.5/2.0):. Processed in 3.859362
secs); 14 Apr 2005 07:18:05 -0000
X-Spam-Status: No, hits=1.5 required=2.0
X-Spam-Level: +
Did I miss such an obvious switch somewhere ??
Jean
Phil Barnett writes:
> On Friday 15 April 2005 08:03 am, Jean Caron wrote:
>
>> Again, how can I tell for sure ?
>
> Look in the header and see what the bayes score was on the FN.
>
> --
>
> "In the beginning of a change, the patriot is a brave and scarce man, hated
> and scorned. When the cause succeeds, however, the timid join him...for then
> it costs nothing to be a patriot." -Mark Twain
>
Re: sa-learn - bayes training...
Posted by Phil Barnett <ph...@philb.us>.
On Friday 15 April 2005 08:03 am, Jean Caron wrote:
> Again, how can I tell for sure ?
Look in the header and see what the bayes score was on the FN.
--
"In the beginning of a change, the patriot is a brave and scarce man, hated
and scorned. When the cause succeeds, however, the timid join him...for then
it costs nothing to be a patriot." -Mark Twain
Re: sa-learn - bayes training...
Posted by Jean Caron <ca...@norac.net>.
Kevin, my comments/questions are inline.
Kevin Peuhkurinen writes:
> Jean Caron wrote:
>
>> Kevin, your assumption is correct, user accounts are on the server and
>> spamc is used. I already have the central DB setup using bayes_path in
>> local.cf.
>> I think what you are saying confirms what I suspected, but it's still not
>> 100% clear. Even though I have a central DB, all users must train it
>> individually, is that it ?
>> For example, if UserA populates the shared folders respectively with ham
>> and spam from messages he/she received, if UserB trains the central DB
>> against those msgs, it will have no effect for UserA ? All users must
>> individually train the central DB even though they train using the same
>> msgs from the same shared folders ?
>> Sorry if I seem a little dense, but I think I'm getting it. I hope !
>> Jean
>>
> If you have bayes_path set, then all users should be using just the one
> DB, and any training that one user does will affect the results for all
> other users.
Hummm... That's what I *thought*, but then the results led me to believe
otherwise, and now you are confirming that only one user can learn for all.
> So, presuming that the permissions on the Bayes files are
> set correctly so that all of your users have access to it, it would seem
> that you do have things set up properly.
I thought so, but something is not doing its "thing".
>
> It is possible that the database is corrupt.
How can I tell for sure ? As far as I can tell, using spamassassin --lint,
sa-learn --dump, etc. the results seem to indicate a healthy DB.
> Have you in fact
> determined that most or all of your false negatives are due to low Bayes
> scores?
>
Again, how can I tell for sure ? My main lead here is that since I upgraded
to 3.0.2, I also changed from owning the DB myself, as a regular user, to
making it system-wide, owned and trained by a dedicated user. And since then,
I went from a handful of false negatives a day, to almost a hundred. At
first, and this is where I may have assumed wrong, I thought well alright I
have a brand new DB and it needs to be trained that's all. I gave it enough
time and training, but it never got better. I still have way more FN than I
used to. I've also recently (this week) added the SARE rules, and the results
are not much better.
Jean
Re: sa-learn - bayes training...
Posted by Kevin Peuhkurinen <ke...@meridiancu.ca>.
Jean Caron wrote:
> Kevin, your assumption is correct, user accounts are on the server and
> spamc is used. I already have the central DB setup using bayes_path in
> local.cf.
> I think what you are saying confirms what I suspected, but it's still
> not 100% clear. Even though I have a central DB, all users must train
> it individually, is that it ?
> For example, if UserA populates the shared folders respectively with
> ham and spam from messages he/she received, if UserB trains the
> central DB against those msgs, it will have no effect for UserA ? All
> users must individually train the central DB even though they train
> using the same msgs from the same shared folders ?
> Sorry if I seem a little dense, but I think I'm getting it. I hope !
> Jean
>
If you have bayes_path set, then all users should be using just the one
DB, and any training that one user does will affect the results for all
other users. So, presuming that the permissions on the Bayes files are
set correctly so that all of your users have access to it, it would seem
that you do have things set up properly.
It is possible that the database is corrupt. Have you in fact
determined that most or all of your false negatives are due to low Bayes
scores?
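A quick way to check the permissions side is to look at the files behind bayes_path. The sketch below assumes the default DBM storage, where bayes_path is a filename prefix and the database lives in files named with the _toks and _seen suffixes. The function name bayes_file_report is hypothetical, and the path in the example is the one quoted in this thread.

```python
import os
import stat


def bayes_file_report(bayes_path: str) -> dict[str, str]:
    """Map each expected Bayes DB file to its mode string, or 'missing'."""
    report = {}
    for suffix in ("_toks", "_seen"):
        path = bayes_path + suffix
        try:
            st = os.stat(path)
        except FileNotFoundError:
            report[path] = "missing"
        else:
            # stat.filemode renders the mode like "ls -l", e.g. "-rw-rw-rw-"
            report[path] = stat.filemode(st.st_mode)
    return report


if __name__ == "__main__":
    # bayes_path value taken from the local.cf lines in this thread.
    for path, mode in bayes_file_report("/home/bayesUID/bayes").items():
        print(f"{mode:>10}  {path}")
```

If the files are not readable and writable by whichever user spamd ends up running as for each recipient, lookups and auto-learning may fail for those users even though the nightly sa-learn run succeeds.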
>
Re: sa-learn - bayes training...
Posted by Jean Caron <ca...@norac.net>.
Kevin, your assumption is correct, user accounts are on the server and spamc
is used. I already have the central DB setup using bayes_path in local.cf.
I think what you are saying confirms what I suspected, but it's still not
100% clear. Even though I have a central DB, all users must train it
individually, is that it ?
For example, if UserA populates the shared folders respectively with ham and
spam from messages he/she received, if UserB trains the central DB against
those msgs, it will have no effect for UserA ? All users must individually
train the central DB even though they train using the same msgs from the
same shared folders ?
Sorry if I seem a little dense, but I think I'm getting it. I hope !
Jean
Kevin Peuhkurinen writes:
> Jean Caron wrote:
>
>> Folks,
>> I searched the archive, tried different things, yet I need to ask a few
>> questions.
>> I'm running SA 3.0.2 with Qmail/QQ 1.25, and procmail, on linux. Works
>> great. Bayes auto-learns ok, I run sa-learn from a "dedicated" user every
>> night for ham and spam. My logs show how many msgs were inspected and how
>> many were learned. So far so good.
>> Here's the part I'm unsure of: I have one centralized bayes DB owned by
>> this "dedicated" user. This user runs sa-learn against two shared
>> folders, one for ham and one for spam. All users (only a handful) may
>> populate the shared folders. Many thousands of msgs have gone through
>> sa-learn. I thought this was all too easy...
>> My problem is bayes does not seem to have any effect whatsoever on the
>> amount of spam delivered to INBOXes. I keep receiving these low-score
>> spam msgs still.
>> I now suspect this centralized DB, updated by this user alone, may not
>> produce the expected results. I've read in the archive that individual
>> users should run cron jobs against their own ham and spam folders. The
>> issue with this is that only one user has an actual shell defined on the
>> system, so the others can't run cron. Then again, that's just a suspicion;
>> I may be wrong, and something else may be missing or mis-configured, and
>> that's why I'm posting this... I'm a little confused. I don't understand
>> how bayes works exactly, so I can't come to any helpful conclusion about
>> my setup.
>> Can anyone see through this and help me understand what is happening ?
>> Thanks in advance,
>> Jean
>>
> Jean,
> I'm not entirely sure based on the information you provided how spamd is
> getting called, but I'm quite sure that your setup is not doing what you
> expect it to. I'm guessing since you say that you are using procmail
> that you have user accounts set up on the server itself and that spamc is
> being called as individual users from .forward files. If this is the
> case, then each user will have a .spamassassin/ directory in their home
> which will contain their own personal Bayes database. Your problem is
> that you have one particular user who runs sa-learn, so only their Bayes
> DB is being trained (other than through the auto-learning feature, that
> is, which is updating the individual databases).
>
> One easy option you can consider is the use of a global Bayes DB for all
> your users instead of each of them having their own personal DB. Bayes
> tends to be less effective with global rather than personal databases, but
> only if the individual users are able to do their own training. You
> could do this fairly easily by setting the "bayes_path" option in your
> /etc/mail/spamassassin/local.cf file and having it point to the
> .spamassassin/ directory of the user who is doing all the sa-learn training.
>
> Hope that helps.
> Kevin
>
Re: sa-learn - bayes training...
Posted by Kevin Peuhkurinen <ke...@meridiancu.ca>.
Jean Caron wrote:
> Folks,
> I searched the archive, tried different things, yet I need to ask a
> few questions.
> I'm running SA 3.0.2 with Qmail/QQ 1.25, and procmail, on linux. Works
> great. Bayes auto-learns ok, I run sa-learn from a "dedicated" user
> every night for ham and spam. My logs show how many msgs were
> inspected and how many were learned. So far so good.
> Here's the part I'm unsure of: I have one centralized bayes DB owned by
> this "dedicated" user. This user runs sa-learn against two shared
> folders, one for ham and one for spam. All users (only a handful)
> may populate the shared folders. Many thousands of msgs have gone through
> sa-learn. I thought this was all too easy...
> My problem is bayes does not seem to have any effect whatsoever on
> the amount of spam delivered to INBOXes. I keep receiving these
> low-score spam msgs still.
> I now suspect this centralized DB, updated by this user alone, may not
> produce the expected results. I've read in the archive that individual
> users should run cron jobs against their own ham and spam folders. The
> issue with this is that only one user has an actual shell defined on
> the system, so the others can't run cron. Then again, that's just a
> suspicion; I may be wrong, and something else may be missing or
> mis-configured, and that's why I'm posting this... I'm a little
> confused. I don't understand how bayes works exactly, so I can't come
> to any helpful conclusion about my setup.
> Can anyone see through this and help me understand what is happening ?
> Thanks in advance,
> Jean
>
Jean,
I'm not entirely sure based on the information you provided how spamd is
getting called, but I'm quite sure that your setup is not doing what you
expect it to. I'm guessing since you say that you are using procmail
that you have user accounts set up on the server itself and that spamc
is being called as individual users from .forward files. If this is
the case, then each user will have a .spamassassin/ directory in their
home which will contain their own personal Bayes database. Your
problem is that you have one particular user who runs sa-learn, so only
their Bayes DB is being trained (other than through the auto-learning
feature, that is, which is updating the individual databases).
One easy option you can consider is the use of a global Bayes DB for all
your users instead of each of them having their own personal DB. Bayes
tends to be less effective with global rather than personal databases,
but only if the individual users are able to do their own training.
You could do this fairly easily by setting the "bayes_path" option in
your /etc/mail/spamassassin/local.cf file and having it point to the
.spamassassin/ directory of the user who is doing all the sa-learn training.
Hope that helps.
Kevin