
Posted to users@spamassassin.apache.org by Michael B Allen <io...@gmail.com> on 2015/06/23 01:55:04 UTC

No BAYES_XX tags in X-Spam-Report

How can I tell if SA is tagging using bayes?

[root@www .spamassassin]# pwd
/var/log/spamassassin/.spamassassin
[root@www .spamassassin]# ls -la
total 1100
drwx------ 2 spamd spamd    4096 Jun 22 19:42 .
drwx------ 3 spamd spamd    4096 Jun  7 00:41 ..
-rw------- 1 spamd spamd   45056 Jun 22 19:42 bayes_seen
-rw------- 1 spamd spamd 1290240 Jun 22 19:42 bayes_toks
-rw-r--r-- 1 spamd spamd    1869 Jun  7 00:41 user_prefs
[root@www .spamassassin]# sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0        301          0  non-token data: nspam
0.000          0      11236          0  non-token data: nham
0.000          0     419941          0  non-token data: ntokens
0.000          0 1150469108          0  non-token data: oldest atime
0.000          0 1435015894          0  non-token data: newest atime
0.000          0 1435016419          0  non-token data: last journal sync atime
0.000          0 1435016432          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
[root@www .spamassassin]# cat ../../maillog | grep BAYES
[root@www .spamassassin]#

I don't see any BAYES_ tags in X-Spam-Report.

I'm using a default SA install on CentOS 7.

Do I need a local.cf? From looking at the docs, it claims bayes is
enabled by default. How can I check this?

Or maybe I haven't sa-learn'd enough spam?

Mike

Re: No BAYES_XX tags in X-Spam-Report

Posted by RW <rw...@googlemail.com>.

On Tue, 23 Jun 2015 02:04:03 +0200
Reindl Harald wrote:

> oh and independent of not running as root you have only 301 spam 
> messages while the docs clearly state you need at least 400 ham as
> well as 400 spam samples

It's 200 of each.
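
The counters in question can be read straight from the `sa-learn --dump magic` output. A minimal sketch, assuming the default minimum of 200 for both ham and spam (as far as I know this is governed by bayes_min_spam_num and bayes_min_ham_num); the function name bayes_ready is made up for illustration and is not part of SpamAssassin:

```shell
#!/bin/sh
# Hypothetical helper: reads `sa-learn --dump magic` output on stdin and
# succeeds only if both nspam and nham reach the minimum (default 200).
bayes_ready() {
    min="${1:-200}"
    awk -v min="$min" '
        /non-token data: nspam/ { nspam = $3 }   # third column holds the count
        /non-token data: nham/  { nham  = $3 }
        END {
            printf "nspam=%d nham=%d min=%d\n", nspam, nham, min
            exit !(nspam >= min && nham >= min)
        }'
}

# Possible usage, run as the user that owns the Bayes DB:
#   sa-learn --dump magic | bayes_ready && echo "enough samples for Bayes scoring"
```

With the numbers from the original post (301 spam, 11236 ham) this reports both counters above 200, which would suggest the sample count was not the problem.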

Re: No BAYES_XX tags in X-Spam-Report

Posted by Reindl Harald <h....@thelounge.net>.

oh and independent of not running as root you have only 301 spam 
messages while the docs clearly state you need at least 400 ham as well 
as 400 spam samples

On 23.06.2015 at 02:01, Reindl Harald wrote:
> On 23.06.2015 at 01:55, Michael B Allen wrote:
>> How can I tell if SA is tagging using bayes?
>
> if you see the bayes tags in headers and logs
>
>> [root@www .spamassassin]# pwd
>> /var/log/spamassassin/.spamassassin
>> [root@www .spamassassin]# ls -la
>> total 1100
>> drwx------ 2 spamd spamd    4096 Jun 22 19:42 .
>> drwx------ 3 spamd spamd    4096 Jun  7 00:41 ..
>> -rw------- 1 spamd spamd   45056 Jun 22 19:42 bayes_seen
>> -rw------- 1 spamd spamd 1290240 Jun 22 19:42 bayes_toks
>> -rw-r--r-- 1 spamd spamd    1869 Jun  7 00:41 user_prefs
>
> i doubt that SA is using the bayes of root
> so you just train the wrong bayes
>
>> [root@www .spamassassin]# sa-learn --dump magic
>> 0.000          0          3          0  non-token data: bayes db version
>> 0.000          0        301          0  non-token data: nspam
>> 0.000          0      11236          0  non-token data: nham
>> 0.000          0     419941          0  non-token data: ntokens
>> 0.000          0 1150469108          0  non-token data: oldest atime
>> 0.000          0 1435015894          0  non-token data: newest atime
>> 0.000          0 1435016419          0  non-token data: last journal sync atime
>> 0.000          0 1435016432          0  non-token data: last expiry atime
>> 0.000          0          0          0  non-token data: last expire atime delta
>> 0.000          0          0          0  non-token data: last expire reduction count
>
> again: SA doesn't run as root in any sane setup and hence won't use *that*
> bayes-db
>
>> [root@www .spamassassin]# cat ../../maillog | grep BAYES
>> [root@www .spamassassin]#
>>
>> I don't see any BAYES_ tags in X-Spam-Report.
>
> see above
>
>> I'm using a default SA install on CentOS 7.
>
> what is a "default install"?
> it can be spamass-milter or anything else calling SA
>
>> Do I need a local.cf? From looking at the docs, it claims bayes is
>> enabled by default. How can I check this?
>>
>> Or maybe I haven't sa-learn'd enough spam?
>
> no, you just train the wrong bayes and NEVER EVER should run such things
> as root - http://wiki.apache.org/spamassassin/SiteWideBayesSetup
>
> we are running spamd as well as spamass-milter with its user and to be
> explicit have the following line in /etc/mail/spamassassin/local.cf
> while it's not strictly needed in that case since that's the userhome
> bayes_path /var/lib/spamass-milter/.spamassassin/bayes

Re: No BAYES_XX tags in X-Spam-Report

Posted by Michael B Allen <io...@gmail.com>.

On Mon, Jun 22, 2015 at 9:45 PM, Michael B Allen <io...@gmail.com> wrote:
> and after running sa-learn again (as root) on ham, my db is now broken:
>
> [root@www .spamassassin]# sa-learn --dump magic
> bayes: bayes db version 0 is not able to be used, aborting! at
> /usr/share/perl5/vendor_perl/Mail/SpamAssassin/BayesStore/DBM.pm line
> 208.
> bayes: bayes db version 0 is not able to be used, aborting! at
> /usr/share/perl5/vendor_perl/Mail/SpamAssassin/BayesStore/DBM.pm line
> 208.
> ERROR: Bayes dump returned an error, please re-run with -D for more information

Well now the DB is no longer broken and I didn't do anything. I guess
spamassassin ran and detected the broken DB and rebuilt it?

[root@www .spamassassin]# ls -la
total 9288
drwx------ 2 spamd spamd     4096 Jun 22 21:46 .
drwx------ 3 spamd spamd     4096 Jun  7 00:41 ..
-rw------- 1 spamd spamd     2280 Jun 22 21:46 bayes_journal
-rw------- 1 spamd spamd  1306624 Jun 22 21:38 bayes_seen
-rw------- 1 spamd spamd 10485760 Jun 22 21:38 bayes_toks
-rw-r--r-- 1 spamd spamd     1869 Jun  7 00:41 user_prefs
[root@www .spamassassin]# sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0        378          0  non-token data: nspam
0.000          0      10821          0  non-token data: nham
0.000          0     413216          0  non-token data: ntokens
0.000          0 1150469108          0  non-token data: oldest atime
0.000          0 1435023513          0  non-token data: newest atime
0.000          0 1435023514          0  non-token data: last journal sync atime
0.000          0 1435023525          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

Weird.

But now I am seeing:

X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=unavailable
    version=3.4.0

I have restarted spamd with systemctl {stop,start} spamassassin.
Hopefully it was just a transient issue.

Mike

Re: No BAYES_XX tags in X-Spam-Report

Posted by RW <rw...@googlemail.com>.

On Wed, 24 Jun 2015 17:14:37 -0400
Bill Cole wrote:


> You snipped out what I was specifically responding to:
> 
> On 22 Jun 2015, at 21:45 , Michael B Allen wrote:
> 
> > bayes_file_mode 0777

Ok, I misunderstood. I thought you were referring to things being run as
root.

Re: No BAYES_XX tags in X-Spam-Report

Posted by Bill Cole <sa...@billmail.scconsult.com>.

On 24 Jun 2015, at 16:21, RW wrote:

> On Mon, 22 Jun 2015 22:42:09 -0400
> Bill Cole wrote:
>
>> On 22 Jun 2015, at 21:45, Michael B Allen wrote:
>
>>> So with a default install (CentOS 7 in my case and I suspect pretty
>>> much all other systems), bayes will NOT just work by default unless
>>> you explicitly modify /etc/mail/spamassassin/local.cf to tell
>>> sa-learn to use the bayes db owned by spamd
>>> (/var/log/spamassassin/.spamassassin/bayes in my case) and NOT the
>>> one owned by root?
>
>> Don't do that, ever, on any regular file, on any system that has
>> processes running as more than just root. I know it's in the SA Wiki,
>> but it's an irresponsible recommendation.
>
>
> The default is that spamd starts as root and its children drop
> privileges and run as the user running spamc. Running spamc as root is
> the source of the myth that SA by default stores its data under /root.

Yes, if that's how you run spamd, the user running spamc determines 
which per-user config & DBs to use. Which is not actually relevant to 
this thread.

> spamd can also start as root and then drop  to the unprivileged user
> once it's bound to its port.

Yes, and based on OP's description that's *specifically* the 
configuration being discussed: spamd "running as the user spamd" which 
only makes sense as meaning it includes "-u spamd" in its args. Also: an 
absolute and rather odd bayes_path.

> I don't know the wiki passage you are referring to, but I'd be surprised
> if it's actually advocating doing mail scans as root.

You snipped out what I was specifically responding to:

On 22 Jun 2015, at 21:45 , Michael B Allen wrote:

> bayes_file_mode 0777

That is used as an example at 
http://wiki.apache.org/spamassassin/SiteWideBayesSetup so it is 
understandable why it gets used. The text following it denies the 
recommendation, but quite weakly.

I've actually found that page useful for screening sysadmin job 
candidates, without any expectation that they understand SA to find the 
problem. It is much better to never hire the one who will have to be 
instructed later on the generic error of using mode 0777.

Re: No BAYES_XX tags in X-Spam-Report

Posted by RW <rw...@googlemail.com>.

On Mon, 22 Jun 2015 22:42:09 -0400
Bill Cole wrote:

> On 22 Jun 2015, at 21:45, Michael B Allen wrote:

> > So with a default install (CentOS 7 in my case and I suspect pretty
> > much all other systems), bayes will NOT just work by default unless
> > you explicitly modify /etc/mail/spamassassin/local.cf to tell
> > sa-learn to use the bayes db owned by spamd
> > (/var/log/spamassassin/.spamassassin/bayes in my case) and NOT the
> > one owned by root?

> Don't do that, ever, on any regular file, on any system that has 
> processes running as more than just root. I know it's in the SA Wiki, 
> but it's an irresponsible recommendation.

The default is that spamd starts as root and its children  drop
privileges  and run as the user running spamc. Running spamc as root is
the source of the myth that SA by default stores its data under /root. 

spamd can also start as root and then drop  to the unprivileged user
once it's bound to its port.

I don't know the wiki passage you are referring to, but I'd be surprised
if it's actually advocating doing mail scans as root.

Re: No BAYES_XX tags in X-Spam-Report

Posted by Reindl Harald <h....@thelounge.net>.


On 23.06.2015 at 18:48, Bill Cole wrote:
> ***** sa-learn IS NOT THE RIGHT TOOL FOR LEARNING MESSAGES INTO A
> SYSTEM-WIDE DB ****

says who? that below is the result of a customized sa-learn script for a 
ton of users working like a charm on a spamass-milter setup for 10 
months now

[root@mail-gw:~]$ bayes-stats.sh
0      30009    SPAM
0      17486    HAM
0    2125684    TOKEN

total 64M
-rw------- 1 sa-milt sa-milt 5,0M 2015-06-23 18:16 bayes_seen
-rw------- 1 sa-milt sa-milt  80M 2015-06-23 18:16 bayes_toks
-rw------- 1 sa-milt sa-milt   98 2015-02-17 11:37 user_prefs

BAYES_00        27854   71.55 %
BAYES_05          856    2.19 %
BAYES_20          955    2.45 %
BAYES_40          857    2.20 %
BAYES_50         3398    8.72 %
BAYES_60          359    0.92 %
BAYES_80          390    1.00 %
BAYES_95          298    0.76 %
BAYES_99         3962   10.17 %
BAYES_999        3654    9.38 %

DELIVERED       44487   89.75 %
DNSWL           41562   83.85 %
SPF             19953   40.25 %
SPF/DKIM WL     10265   20.70 %
SHORTCIRCUIT    10617   21.41 %
BLOCKED          6154   12.41 %
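
A per-tag tally like the BAYES_ table above can be approximated with standard tools. This is only a sketch, not the bayes-stats.sh script used above (which was not posted); the function name is hypothetical, and it assumes the log lines contain the rule names literally, for example in spamd "result:" lines:

```shell
#!/bin/sh
# Hypothetical helper: count BAYES_NN rule hits in log text read from stdin.
# Prints one "TAG COUNT" pair per line, sorted by tag name.
bayes_tag_counts() {
    grep -o 'BAYES_[0-9][0-9]*' | sort | uniq -c | awk '{ printf "%s %d\n", $2, $1 }'
}

# Possible usage (log path is an assumption, adjust to the local setup):
#   bayes_tag_counts < /var/log/maillog
```

If this prints nothing at all, no message in that log was scored by Bayes, which matches the empty `grep BAYES` result in the original post.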

Re: No BAYES_XX tags in X-Spam-Report

Posted by Bill Cole <sa...@billmail.scconsult.com>.

On 23 Jun 2015, at 14:58, Michael B Allen wrote:

> On Tue, Jun 23, 2015 at 12:48 PM, Bill Cole
> <sa...@billmail.scconsult.com> wrote:
>>> Yes, I want a system-wide bayes db. And I am running spamd and spamc
>>> and I assume that is all working (but of course I have no idea if it
>>> really is).
>>>
>>> But I want users to be able to put spams that get through into
>>> ~/Maildir/.LearnAsSpam and then, every once in a while, I want to run
>>> sa-learn on all of those messages for the system-wide db.
>>>
>>> So can that be done without running sa-learn as root?
>>
>>
>> Of course. As I said in other words that you quoted but apparently
>> misunderstood:
>>
>> ***** sa-learn IS NOT THE RIGHT TOOL FOR LEARNING MESSAGES INTO A
>> SYSTEM-WIDE DB ****
>>
>> Use 'spamc -L (spam|ham)'. Have users run it if they like, or have it run as
>> the user whose magic maildirs are being learned. It talks to the spamd
>> daemon, running as the spamd user, managing the system-wide Bayes DB. If it
>> isn't run as root, it can't do random violence limited only by your capacity
>> for typos.
>
> Well, ever since we stopped using The UNIX® Time-Sharing System back
> in '87 generally "users" don't run stuff on their own like this
> anymore.

Sure, but if you're using Real Users (i.e. if diverse ownership of 
Maildirs is an actual system issue) then maybe you populate a crontab 
for each one as well. Or not. My point is that if you have spamd running 
as the user spamd, it will only ever operate based on the SpamAssassin 
configuration for the user spamd, never as if it were root. No matter 
how you run spamc, it can't make spamd break ownership of the DB files 
so that spamd can't continue to use them. Because sa-learn running as 
root is a root process manipulating files itself (not mediated by spamd) 
you need to be careful about how you invoke it because you MIGHT end up 
with something like this:

# ls -l ~spamd/.spamassassin/
total 400461
-rw-------  1 spamd  spamd   80642048 Jun 23 19:24 auto-whitelist
-rw-------  1 root   spamd      51264 Jun 23 04:29 bayes_journal
-rw-------  1 spamd  spamd  324435968 Jun 23 19:24 bayes_seen
-rw-------  1 spamd  spamd    5046272 Jun 23 19:24 bayes_toks
-rw-r--r--  1 spamd  spamd       1869 Jul 17  2011 user_prefs

(Sigh.... gotta go spank someone...)
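
A stray root-owned file like the bayes_journal above is easy to spot mechanically. A minimal sketch; the function name is made up for illustration, and `-maxdepth` is assumed to be available (it is in GNU and BSD find, but is not strictly POSIX):

```shell
#!/bin/sh
# Hypothetical helper: list regular files directly inside DIR that are
# NOT owned by OWNER (a user name or numeric uid).
find_foreign_owned() {
    dir="$1"; owner="$2"
    find "$dir" -maxdepth 1 -type f ! -user "$owner"
}

# Possible usage for the layout shown above:
#   find_foreign_owned ~spamd/.spamassassin spamd
```

Any output means some file in the Bayes directory belongs to someone other than the daemon user, which is the breakage described here.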

> But if spamc -L could consume an entire Maildir without requiring an
> awk expert, that would be great.

No awk needed. Assuming the Maildir gets cleaned out so you aren't 
constantly trying to re-learn an ever-growing pile of old messages:

cd ~$USER/Maildir/.LearnAsSpam/cur
for x in *; do spamc -L spam < "$x" & done

Replace the '&' with a ';' if you find the concurrency a problem.

A bit fancier, run it hourly for rapid learning of fewer messages, still 
no awk:

for x in $( find /home/*/Maildir/.LearnAsSpam/cur/ -type f -cmin -61 ) ; do spamc -L spam < "$x" & done
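
The same idea as a reusable function, for anyone who prefers a sequential loop that tolerates odd file names. This is a sketch under assumptions: the function name and the SPAMC override variable are invented for illustration, and as far as I know spamd must be started with --allow-tell (-l) before it accepts learn requests from spamc -L at all.

```shell
#!/bin/sh
# Hypothetical helper: feed every regular file in DIR to "spamc -L CLASS".
# SPAMC may be set to another command (for example a stub) for a dry run.
learn_maildir() {
    dir="$1"; class="${2:-spam}"
    count=0
    for msg in "$dir"/*; do
        [ -f "$msg" ] || continue
        # The spamc man page describes non-zero exit codes for learn results
        # in some modes, so the status is reported instead of being treated
        # as a failure.
        status=0
        "${SPAMC:-spamc}" -L "$class" < "$msg" || status=$?
        echo "$msg: spamc exit status $status"
        count=$((count + 1))
    done
    echo "processed $count message(s) as $class"
}

# Possible usage, run as the owner of the Maildir:
#   learn_maildir "$HOME/Maildir/.LearnAsSpam/cur" spam
```

Running the messages one at a time avoids the concurrency concern mentioned above, at the cost of being slower on large folders.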

Re: No BAYES_XX tags in X-Spam-Report

Posted by Michael B Allen <io...@gmail.com>.

On Tue, Jun 23, 2015 at 12:48 PM, Bill Cole
<sa...@billmail.scconsult.com> wrote:
>> Yes, I want a system-wide bayes db. And I am running spamd and spamc
>> and I assume that is all working (but of course I have no idea if it
>> really is).
>>
>> But I want users to be able to put spams that get through into
>> ~/Maildir/.LearnAsSpam and then, every once in a while, I want to run
>> sa-learn on all of those messages for the system-wide db.
>>
>> So can that be done without running sa-learn as root?
>
>
> Of course. As I said in other words that you quoted but apparently
> misunderstood:
>
> ***** sa-learn IS NOT THE RIGHT TOOL FOR LEARNING MESSAGES INTO A
> SYSTEM-WIDE DB ****
>
> Use 'spamc -L (spam|ham)'. Have users run it if they like, or have it run as
> the user whose magic maildirs are being learned. It talks to the spamd
> daemon, running as the spamd user, managing the system-wide Bayes DB. If it
> isn't run as root, it can't do random violence limited only by your capacity
> for typos.

Well, ever since we stopped using The UNIX® Time-Sharing System back
in '87 generally "users" don't run stuff on their own like this
anymore.

But if spamc -L could consume an entire Maildir without requiring an
awk expert, that would be great.

Mike

Re: No BAYES_XX tags in X-Spam-Report

Posted by Bill Cole <sa...@billmail.scconsult.com>.

On 23 Jun 2015, at 0:05, Michael B Allen wrote:

> On Mon, Jun 22, 2015 at 10:42 PM, Bill Cole
> <sa...@billmail.scconsult.com> wrote:
>> On 22 Jun 2015, at 21:45, Michael B Allen wrote:
>>
>>> On Mon, Jun 22, 2015 at 8:01 PM, Reindl Harald <h....@thelounge.net>
>>> wrote:
>>>>>
>>>>> [root@www .spamassassin]# pwd
>>>>> /var/log/spamassassin/.spamassassin
>>>>> [root@www .spamassassin]# ls -la
>>>>> total 1100
>>>>> drwx------ 2 spamd spamd    4096 Jun 22 19:42 .
>>>>> drwx------ 3 spamd spamd    4096 Jun  7 00:41 ..
>>>>> -rw------- 1 spamd spamd   45056 Jun 22 19:42 bayes_seen
>>>>> -rw------- 1 spamd spamd 1290240 Jun 22 19:42 bayes_toks
>>>>> -rw-r--r-- 1 spamd spamd    1869 Jun  7 00:41 user_prefs
>>>>
>>>>
>>>>
>>>> i doubt that SA is using the bayes of root
>>>> so you just train the wrong bayes
>>>
>>>
>>> So with a default install (CentOS 7 in my case and I suspect pretty
>>> much all other systems), bayes will NOT just work by default unless
>>> you explicitly modify /etc/mail/spamassassin/local.cf to tell sa-learn
>>> to use the bayes db owned by spamd
>>> (/var/log/spamassassin/.spamassassin/bayes in my case) and NOT the one
>>> owned by root?
>>>
>>> However, I have done this:
>>>
>>> bayes_path /var/log/spamassassin/.spamassassin/bayes
>>> bayes_file_mode 0777
>>
>>
>> Don't do that, ever, on any regular file, on any system that has processes
>> running as more than just root. I know it's in the SA Wiki, but it's an
>> irresponsible recommendation.
>
> Yeah, I was going to ask about this because it seems to me if the db
> is owned by spamd and spamassassin is running as user spamd and
> sa-learn is running as root then 0600 should be fine (although it's
> not obvious to me why SA needs a "file mode" in the first place).

A diversity of rigs. SA isn't the spamd daemon or the spamc client or 
$PERLLIB/Mail/SpamAssassin.pm or the configured ruleset, it is the whole 
tree of Perl modules in the Mail::SpamAssassin namespace plus *maybe* 
spamd/spamc, the rules, and subsidiary utilities using them like 
sa-learn. Different sites use the Perl framework and tools in different 
ways, so they need different ownership & permission settings. As I don't 
use SpamAssassin on CentOS (or RHEL) I'm not sure precisely what the 
default SA rig there looks like, and how (if at all) RedHat has hooked 
it into Postfix(?) so I can't explain much about the specifics of what 
you get from 'yum install spamassassin'.

More specifically: in its simplest form, SA is designed to be used by 
each of many unprivileged users with independent Bayes DBs fed & used by 
local mail delivery and pre-delivery filtering processes and sa-learn or 
an equivalent tool for learning messages post-delivery. The 
bayes_file_mode defaults to 0700 and usually need not be changed, but on 
some OS's with some mail subsystems it may be necessary to adjust that 
to allow a delivery agent or other component (e.g. filtering tools) 
running as something other than root OR the individual mail recipient to 
read or maybe even write to the users' individual Bayes DBs. You should 
NOT need it changed on a system that only uses a system-wide Bayes DB.

> So then what do you recommend that the bayes_file_mode value be 
> precisely?

The default is usually fine. That's why it is the default. Note that 
this value is only applied when creating a new file in the Bayes DB 
(which is composed of multiple files) so it is possible for the effects 
of changing it to be delayed. If RedHat's packaging of SpamAssassin 
includes a different value, I'd suggest not changing it. Also, moving 
your DB into /var/log/spamassassin/ is a quirky choice that might not be 
compatible with RedHat's integration choices in the package they 
distribute (and which CentOS replicates.) It's your system and your 
choices of course...

> At any rate, the whole thing seems to be working now incidentally. I
> am getting BAYES_XX tags now.

Yes. As documented, you don't get messages scored by the Bayes component 
until it has built an adequate learned history of both ham and spam to 
do valid scoring.

> As stated in my other followup message,
> SA seems to have detected the broken db and fixed it because it
> suddenly just started working and sa-learn --dump magic works and is
> showing the right numbers.

Well, I'm not convinced that's exactly how it worked, but I'm glad you 
seem to have it working.

Note that 'sa-learn' DOES NOT talk to spamd, it uses the SA config that 
it finds for the user running it to figure out which rules it should use 
and where to find the Bayes DB (and AWL or TxRep DBs) for that user. If 
you have spamd running to use a system-wide 
config/ruleset/Bayes/(AWL|TxRep) you should get in the habit of using 
spamc to communicate with the daemon rather than running sa-learn as 
root and relying on a quirky config to assure that you are handling DB 
files that are global and owned by the right non-root user. If in doing 
that you cause the creation of a file in the DB that is owned by root 
and can't be deleted by spamd, your DB will be broken.

> So just for posterity, the problem was I just needed "bayes_path
> /var/log/spamassassin/.spamassassin/bayes" in local.cf to make
> sa-learn use that db instead of /root/.spamassassin/bayes. Looks like
> it choked initially but somehow it's working now.

Yeah, that seems like a very wrong solution. Not saying it didn't work 
for you, but it would not be my choice. Since you seem set on having a 
weird place for your DB, I won't argue the issue.

>>> Everything is installed as user / group spamd and postfix is set to
>>> call spamassassin with user=spamd. And I assume I must run sa-learn as
>>> root so that it can access Maildir directories and that bayes_path
>>> tells sa-learn where the db is. So now what's the problem?
>>
>>
>> Wrong assumption.
>>
>> The sa-learn program is for anyone to manually work with their own Bayes DB,
>> including for the owner of a system-wide Bayes DB to work with that Bayes
>> DB. If you have a system-wide Bayes DB, it should be fed by either a
>> system-wide filtering mechanism operating as part of the delivery process
>> and running as the owner of the global DB or by users running the spamc
>> client under their own ids to feed a spamd daemon running as the owner of
>> the global DB or by a combination of the two. The CentOS 7 package installs
>> spamd and spamc, and if you want to learn already-delivered mail into a
>> global BayesDB, those are the tools to use.
>
> Yes, I want a system-wide bayes db. And I am running spamd and spamc
> and I assume that is all working (but of course I have no idea if it
> really is).
>
> But I want users to be able to put spams that get through into
> ~/Maildir/.LearnAsSpam and then, every once in a while, I want to run
> sa-learn on all of those messages for the system-wide db.
>
> So can that be done without running sa-learn as root?

Of course. As I said in other words that you quoted but apparently 
misunderstood:

***** sa-learn IS NOT THE RIGHT TOOL FOR LEARNING MESSAGES INTO A 
SYSTEM-WIDE DB ****

Use 'spamc -L (spam|ham)'. Have users run it if they like, or have it 
run as the user whose magic maildirs are being learned. It talks to the 
spamd daemon, running as the spamd user, managing the system-wide Bayes 
DB. If it isn't run as root, it can't do random violence limited only by 
your capacity for typos.

> Ideally I would think sa-learn should be able to run as root just to
> access files but use a spamd child to process them and update the
> bayes db. Possible?

That's not how any of this works...

The reason for the 'd' in spamd is that it is a daemon: a long-running 
process that other processes (or network entities) can talk to via a 
local unix socket in the filesystem or a TCP port using a defined 
protocol. The sa-learn program is not a client of spamd speaking that 
protocol but rather a direct manipulator of the BayesDB, just as spamd 
is. You can usually get away with using sa-learn to work with the same 
BayesDB that spamd uses, but you are likely to eventually do something a 
little wrong and either screw up the BayesDB with a file spamd can't 
write to or accidentally and blindly work with a brand new different 
BayesDB because of some environmental change or you've re-installed SA 
or whatever. I don't think there's a real risk of deadlock or data 
corruption or anything like that from using spamd and sa-learn on the 
same DB, but you do have 2 tools that are unaware of each other 
potentially trying to write to the same files, so there is at least some 
possibility for contention problems. And as you've noticed: to learn 
messages in anyone's maildirs into the system BayesDB, you have to run 
sa-learn as root, because it isn't talking to spamd at all but fiddling 
with spamd's file behind spamd's back. Running things as root should be 
resisted and avoided. Use spamc instead, avoid the risks.

-- 
Bill Cole                                        Email: bill@scconsult.com
18847 Rosetta Ave.                      USE THE FROM HEADER IF IT DIFFERS!
Eastpointe, MI USA 48021            MAIN ADDRESS IS HEAVILY SPAM-FILTERED!
Phone: +1-586-774-4357

Re: No BAYES_XX tags in X-Spam-Report

Posted by Reindl Harald <h....@thelounge.net>.

On 23.06.2015 at 06:05, Michael B Allen wrote:
> Yes, I want a system-wide bayes db. And I am running spamd and spamc
> and I assume that is all working (but of course I have no idea if it
> really is).
>
> But I want users to be able to put spams that get through into
> ~/Maildir/.LearnAsSpam and then, every once in a while, I want to run
> sa-learn on all of those messages for the system-wide db.

you need to somehow move those messages into the folder where you run 
sa-learn from and you should consider keeping those samples to have the 
option to completely rebuild the database from scratch - SA 3.4.1 as example 
brought some changes which justified the rebuild with our currently 
40000 samples

> So can that be done without running sa-learn as root?

yes, see above

it needs work on your side, SA is a framework

> Ideally I would think sa-learn should be able to run as root just to
> access files but use a spamd child to process them and update the
> bayes db. Possible?

you don't understand how it works: the bayes db is in fact a file 
living in the userhome, spamd is using that database file, sa-learn is a 
command-line utility to fill up that database, not more and not less

there is no single reason to run anything of that as root except the 
script you write for sa-learning could make sure that the spamd user has 
read permissions and sa-learn gets fired as the spamd-user

su -c "sa-learn --max-size=0 --spam /folder/with/spam" - spamd-user
su -c "sa-learn --max-size=0 --ham /folder/with/ham" - spamd-user

and yes, the site-wide bayes is a good idea even if it needs some work 
because otherwise *each user* would need to train 400 spam as well as 
400 ham samples for his own bayes and experience shows that won't happen

Re: No BAYES_XX tags in X-Spam-Report

Posted by Michael B Allen <io...@gmail.com>.

On Mon, Jun 22, 2015 at 10:42 PM, Bill Cole
<sa...@billmail.scconsult.com> wrote:
> On 22 Jun 2015, at 21:45, Michael B Allen wrote:
>
>> On Mon, Jun 22, 2015 at 8:01 PM, Reindl Harald <h....@thelounge.net>
>> wrote:
>>>>
>>>> [root@www .spamassassin]# pwd
>>>> /var/log/spamassassin/.spamassassin
>>>> [root@www .spamassassin]# ls -la
>>>> total 1100
>>>> drwx------ 2 spamd spamd    4096 Jun 22 19:42 .
>>>> drwx------ 3 spamd spamd    4096 Jun  7 00:41 ..
>>>> -rw------- 1 spamd spamd   45056 Jun 22 19:42 bayes_seen
>>>> -rw------- 1 spamd spamd 1290240 Jun 22 19:42 bayes_toks
>>>> -rw-r--r-- 1 spamd spamd    1869 Jun  7 00:41 user_prefs
>>>
>>>
>>>
>>> i doubt that SA is using the bayes of root
>>> so you just train the wrong bayes
>>
>>
>> So with a default install (CentOS 7 in my case and I suspect pretty
>> much all other systems), bayes will NOT just work by default unless
>> you explicitly modify /etc/mail/spamassassin/local.cf to tell sa-learn
>> to use the bayes db owned by spamd
>> (/var/log/spamassassin/.spamassassin/bayes in my case) and NOT the one
>> owned by root?
>>
>> However, I have done this:
>>
>> bayes_path /var/log/spamassassin/.spamassassin/bayes
>> bayes_file_mode 0777
>
>
> Don't do that, ever, on any regular file, on any system that has processes
> running as more than just root. I know it's in the SA Wiki, but it's an
> irresponsible recommendation.

Yeah, I was going to ask about this because it seems to me if the db
is owned by spamd and spamassassin is running as user spamd and
sa-learn is running as root then 0600 should be fine (although it's
not obvious to me why SA needs a "file mode" in the first place).

So then what do you recommend that the bayes_file_mode value be precisely?

At any rate, the whole thing seems to be working now incidentally. I
am getting BAYES_XX tags now. As stated in my other followup message,
SA seems to have detected the broken db and fixed it because it
suddenly just started working and sa-learn --dump magic works and is
showing the right numbers.

So just for posterity, the problem was I just needed "bayes_path
/var/log/spamassassin/.spamassassin/bayes" in local.cf to make
sa-learn use that db instead of /root/.spamassassin/bayes. Looks like
it choked initially but somehow it's working now.

>> Everything is installed as user / group spamd and postfix is set to
>> call spamassassin with user=spamd. And I assume I must run sa-learn as
>> root so that it can access Maildir directories and that bayes_path
>> tells sa-learn where the db is. So now what's the problem?
>
>
> Wrong assumption.
>
> The sa-learn program is for anyone to manually work with their own Bayes DB,
> including for the owner of a system-wide Bayes DB to work with that Bayes
> DB. If you have a system-wide Bayes DB, it should be fed by either a
> system-wide filtering mechanism operating as part of the delivery process
> and running as the owner of the global DB or by users running the spamc
> client under their own ids to feed a spamd daemon running as the owner of
> the global DB or by a combination of the two. The CentOS 7 package installs
> spamd and spamc, and if you want to learn already-delivered mail into a
> global BayesDB, those are the tools to use.

Yes, I want a system-wide bayes db. And I am running spamd and spamc
and I assume that is all working (but of course I have no idea if it
really is).

But I want users to be able to put spams that get through into
~/Maildir/.LearnAsSpam and then, every once in a while, I want to run
sa-learn on all of those messages for the system-wide db.

So can that be done without running sa-learn as root?

Ideally I would think sa-learn should be able to run as root just to
access files but use a spamd child to process them and update the
bayes db. Possible?

Mike

Re: No BAYES_XX tags in X-Spam-Report

Posted by Bill Cole <sa...@billmail.scconsult.com>.

On 22 Jun 2015, at 21:45, Michael B Allen wrote:

> On Mon, Jun 22, 2015 at 8:01 PM, Reindl Harald 
> <h....@thelounge.net> wrote:
>>> [root@www .spamassassin]# pwd
>>> /var/log/spamassassin/.spamassassin
>>> [root@www .spamassassin]# ls -la
>>> total 1100
>>> drwx------ 2 spamd spamd    4096 Jun 22 19:42 .
>>> drwx------ 3 spamd spamd    4096 Jun  7 00:41 ..
>>> -rw------- 1 spamd spamd   45056 Jun 22 19:42 bayes_seen
>>> -rw------- 1 spamd spamd 1290240 Jun 22 19:42 bayes_toks
>>> -rw-r--r-- 1 spamd spamd    1869 Jun  7 00:41 user_prefs
>>
>>
>> i doubt that SA is using the bayes of root
>> so you just train the wrong bayes
>
> So with a default install (CentOS 7 in my case and I suspect pretty
> much all other systems), bayes will NOT just work by default unless
> you explicitly modify /etc/mail/spamassassin/local.cf to tell sa-learn
> to use the bayes db owned by spamd
> (/var/log/spamassassin/.spamassassin/bayes in my case) and NOT the one
> owned by root?
>
> However, I have done this:
>
> bayes_path /var/log/spamassassin/.spamassassin/bayes
> bayes_file_mode 0777

Don't do that, ever, on any regular file, on any system that has 
processes running as more than just root. I know it's in the SA Wiki, 
but it's an irresponsible recommendation.

> and after running sa-learn again (as root) on ham, my db is now 
> broken:
>
> [root@www .spamassassin]# sa-learn --dump magic
> bayes: bayes db version 0 is not able to be used, aborting! at
> /usr/share/perl5/vendor_perl/Mail/SpamAssassin/BayesStore/DBM.pm line
> 208.
> bayes: bayes db version 0 is not able to be used, aborting! at
> /usr/share/perl5/vendor_perl/Mail/SpamAssassin/BayesStore/DBM.pm line
> 208.
> ERROR: Bayes dump returned an error, please re-run with -D for more 
> information
>
> Can someone clue me in here?

See the last line. "sa-learn -D --dump magic" will provide deeper clues 
as to the specific nature of the problem, which seems to be consistent 
with trying to use a Bayes DB that isn't there or isn't readable.
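
A minimal sketch of that debug run, assuming sudo is available and that the DB owner is the spamd account shown in the directory listing; the function name is made up for illustration. It runs the dump as the DB owner rather than as root, so ~/.spamassassin resolves to the owner's home:

```shell
# Run the Bayes dump with debug output as the account that owns the DB.
# Assumptions: sudo is installed and the owner is "spamd" (from the ls output).
bayes_debug_dump() {
    owner=${1:-spamd}
    # -H sets HOME to the owner's home directory, so the default
    # ~/.spamassassin location is the owner's and not root's.
    # 2>&1 keeps the debug lines (written to stderr) together with the dump.
    sudo -H -u "$owner" sa-learn -D --dump magic 2>&1
}
```

Calling bayes_debug_dump with no argument uses spamd; pass another account name if the DB belongs to a different user.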

I'm not certain what is breaking for SA after you've broken your system, 
but since it's CentOS 7 it may be that SELinux is backstopping the 777 
insanity. I'm unwilling to replicate the mistake to replicate the 
error...

The FIRST step is to undo the breakage you directly inflicted, since 
your system seems to have been working normally before, running in an 
initial learning mode (i.e. without enough learned to use Bayes for 
scoring).

> Everything is installed as user / group spamd and postfix is set to
> call spamassassin with user=spamd. And I assume I must run sa-learn as
> root so that it can access Maildir directories and that bayes_path
> tells sa-learn where the db is. So now what's the problem?

Wrong assumption.

The sa-learn program is for anyone to manually work with their own Bayes 
DB, including for the owner of a system-wide Bayes DB to work with that 
Bayes DB. If you have a system-wide Bayes DB, it should be fed by either 
a system-wide filtering mechanism operating as part of the delivery 
process and running as the owner of the global DB or by users running 
the spamc client under their own ids to feed a spamd daemon running as 
the owner of the global DB or by a combination of the two. The CentOS 7 
package installs spamd and spamc, and if you want to learn 
already-delivered mail into a global BayesDB, those are the tools to 
use.
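
As a sketch of the spamc route, the loop below submits every message in one Maildir folder to spamd for learning. It assumes spamd was started with --allow-tell (without it spamd refuses learn requests) and that the installed spamc supports -L; which Bayes DB ends up updated depends on how spamd itself is configured. The function name and the folder path in the usage line are illustrative only:

```shell
# Feed each message in a Maildir folder to spamd through spamc.
# Assumptions: spamd runs with --allow-tell; learn_type is "spam" or "ham".
learn_folder_via_spamc() {
    learn_type=$1
    folder=$2
    count=0
    # Maildir keeps delivered messages in cur/ and new/.
    for msg in "$folder"/cur/* "$folder"/new/*; do
        # Skip the unexpanded pattern when a subdirectory is empty.
        [ -f "$msg" ] || continue
        # The exit status is deliberately not interpreted; see spamc(1).
        spamc -L "$learn_type" < "$msg" >/dev/null || :
        count=$((count + 1))
    done
    # Print how many messages were submitted.
    echo "$count"
}
```

A user could then run something like: learn_folder_via_spamc spam "$HOME/Maildir/.LearnAsSpam"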

Re: No BAYES_XX tags in X-Spam-Report

Posted by Reindl Harald <h....@thelounge.net>.


On 23.06.2015 at 03:45, Michael B Allen wrote:
> On Mon, Jun 22, 2015 at 8:01 PM, Reindl Harald <h....@thelounge.net> wrote:
>>> [root@www .spamassassin]# pwd
>>> /var/log/spamassassin/.spamassassin
>>> [root@www .spamassassin]# ls -la
>>> total 1100
>>> drwx------ 2 spamd spamd    4096 Jun 22 19:42 .
>>> drwx------ 3 spamd spamd    4096 Jun  7 00:41 ..
>>> -rw------- 1 spamd spamd   45056 Jun 22 19:42 bayes_seen
>>> -rw------- 1 spamd spamd 1290240 Jun 22 19:42 bayes_toks
>>> -rw-r--r-- 1 spamd spamd    1869 Jun  7 00:41 user_prefs
>>
>> i doubt that SA is using the bayes of root
>> so you just train the wrong bayes
>
> So with a default install (CentOS 7 in my case and I suspect pretty
> much all other systems), bayes will NOT just work by default unless
> you explicitly modify /etc/mail/spamassassin/local.cf to tell sa-learn
> to use the bayes db owned by spamd
> (/var/log/spamassassin/.spamassassin/bayes in my case) and NOT the one
> owned by root?
>
> However, I have done this:
>
> bayes_path /var/log/spamassassin/.spamassassin/bayes
> bayes_file_mode 0777
>
> and after running sa-learn again (as root) on ham, my db is now broken

STOP ACTING AS ROOT IN GENERAL

you need to train as the user spamd is running
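
One way this could look when the messages sit in Maildir folders the spamd account cannot read: copy them to a staging directory first, then learn from the copy as spamd. This is only a sketch; the function names and the staging path are hypothetical, the spamd user and group names are taken from the directory listing, and sudo is assumed to be installed. Only the copy and the chown need elevated rights, sa-learn itself runs as spamd:

```shell
# Copy the messages of one Maildir folder into a staging directory.
stage_messages() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    staged=0
    for msg in "$src"/cur/* "$src"/new/*; do
        # Skip the unexpanded pattern when a subdirectory is empty.
        [ -f "$msg" ] || continue
        cp "$msg" "$dest/" && staged=$((staged + 1))
    done
    # Print how many messages were copied.
    echo "$staged"
}

# Hand the staged copies to spamd and learn them as spam under that account.
# The staging directory must be somewhere the spamd account can reach.
learn_staged_as_spamd() {
    dest=$1
    chown -R spamd:spamd "$dest" &&
    sudo -H -u spamd sa-learn --spam "$dest"
}
```

For example: stage_messages /home/someuser/Maildir/.LearnAsSpam /var/tmp/learn-spam, followed by learn_staged_as_spamd /var/tmp/learn-spam.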

Re: No BAYES_XX tags in X-Spam-Report

Posted by Michael B Allen <io...@gmail.com>.

On Mon, Jun 22, 2015 at 8:01 PM, Reindl Harald <h....@thelounge.net> wrote:
>> [root@www .spamassassin]# pwd
>> /var/log/spamassassin/.spamassassin
>> [root@www .spamassassin]# ls -la
>> total 1100
>> drwx------ 2 spamd spamd    4096 Jun 22 19:42 .
>> drwx------ 3 spamd spamd    4096 Jun  7 00:41 ..
>> -rw------- 1 spamd spamd   45056 Jun 22 19:42 bayes_seen
>> -rw------- 1 spamd spamd 1290240 Jun 22 19:42 bayes_toks
>> -rw-r--r-- 1 spamd spamd    1869 Jun  7 00:41 user_prefs
>
>
> i doubt that SA is using the bayes of root
> so you just train the wrong bayes

So with a default install (CentOS 7 in my case and I suspect pretty
much all other systems), bayes will NOT just work by default unless
you explicitly modify /etc/mail/spamassassin/local.cf to tell sa-learn
to use the bayes db owned by spamd
(/var/log/spamassassin/.spamassassin/bayes in my case) and NOT the one
owned by root?

However, I have done this:

bayes_path /var/log/spamassassin/.spamassassin/bayes
bayes_file_mode 0777

and after running sa-learn again (as root) on ham, my db is now broken:

[root@www .spamassassin]# sa-learn --dump magic
bayes: bayes db version 0 is not able to be used, aborting! at
/usr/share/perl5/vendor_perl/Mail/SpamAssassin/BayesStore/DBM.pm line
208.
bayes: bayes db version 0 is not able to be used, aborting! at
/usr/share/perl5/vendor_perl/Mail/SpamAssassin/BayesStore/DBM.pm line
208.
ERROR: Bayes dump returned an error, please re-run with -D for more information

Can someone clue me in here?

Everything is installed as user / group spamd and postfix is set to
call spamassassin with user=spamd. And I assume I must run sa-learn as
root so that it can access Maildir directories and that bayes_path
tells sa-learn where the db is. So now what's the problem?

Mike

Re: No BAYES_XX tags in X-Spam-Report

Posted by Reindl Harald <h....@thelounge.net>.


On 23.06.2015 at 01:55, Michael B Allen wrote:
> How can I tell if SA is tagging using bayes?

if you see the bayes tags in headers and logs

> [root@www .spamassassin]# pwd
> /var/log/spamassassin/.spamassassin
> [root@www .spamassassin]# ls -la
> total 1100
> drwx------ 2 spamd spamd    4096 Jun 22 19:42 .
> drwx------ 3 spamd spamd    4096 Jun  7 00:41 ..
> -rw------- 1 spamd spamd   45056 Jun 22 19:42 bayes_seen
> -rw------- 1 spamd spamd 1290240 Jun 22 19:42 bayes_toks
> -rw-r--r-- 1 spamd spamd    1869 Jun  7 00:41 user_prefs

i doubt that SA is using the bayes of root
so you just train the wrong bayes

> [root@www .spamassassin]# sa-learn --dump magic
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0        301          0  non-token data: nspam
> 0.000          0      11236          0  non-token data: nham
> 0.000          0     419941          0  non-token data: ntokens
> 0.000          0 1150469108          0  non-token data: oldest atime
> 0.000          0 1435015894          0  non-token data: newest atime
> 0.000          0 1435016419          0  non-token data: last journal sync atime
> 0.000          0 1435016432          0  non-token data: last expiry atime
> 0.000          0          0          0  non-token data: last expire atime delta
> 0.000          0          0          0  non-token data: last expire

again: SA doesn't run as root in any sane setup and hence won't use *that* 
bayes-db

> reduction count
> [root@www .spamassassin]# cat ../../maillog | grep BAYES
> [root@www .spamassassin]#
>
> I don't see any BAYES_ tags in X-Spam-Report.

see above

> I'm using a default SA install on CentOS 7.

what is a "default install"?
it can be spamass-milter or anything else calling SA

> Do I need a local.cf? From looking at the docs, it claims bayes is
> enabled by default. How can I check this?
>
> Or maybe I haven't sa-learn'd enough spam?

no, you just train the wrong bayes and should NEVER EVER run such things 
as root - http://wiki.apache.org/spamassassin/SiteWideBayesSetup

we are running spamd as well as spamass-milter as its user and, to be 
explicit, have the following line in /etc/mail/spamassassin/local.cf, 
although it's not strictly needed in that case since that's the user's home:
bayes_path /var/lib/spamass-milter/.spamassassin/bayes
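
To see whether the scanner actually applies Bayes once the path is right, one option is to count log lines that carry a BAYES_ rule name. A small sketch, assuming spamd's result lines end up in the mail log and list the rules that hit; the function name and the log path in the usage line are placeholders:

```shell
# Count log lines that mention a Bayes rule such as BAYES_00 or BAYES_99.
count_bayes_hits() {
    logfile=$1
    # grep -c prints the number of matching lines; it exits non-zero when
    # that number is 0, so the status is ignored and only the count is used.
    grep -c 'BAYES_[0-9]' "$logfile" || :
}
```

Usage might be: count_bayes_hits /var/log/maillog. A result of 0 after mail has been scanned suggests Bayes is not being applied yet, for instance because the DB in use has not learned enough ham and spam.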

Re: No BAYES_XX tags in X-Spam-Report

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.

On 22.06.15 19:55, Michael B Allen wrote:
>How can I tell if SA is tagging using bayes?
>
>[root@www .spamassassin]# pwd
>/var/log/spamassassin/.spamassassin

>[root@www .spamassassin]# ls -la
>total 1100
>drwx------ 2 spamd spamd    4096 Jun 22 19:42 .
>drwx------ 3 spamd spamd    4096 Jun  7 00:41 ..
>-rw------- 1 spamd spamd   45056 Jun 22 19:42 bayes_seen
>-rw------- 1 spamd spamd 1290240 Jun 22 19:42 bayes_toks
>-rw-r--r-- 1 spamd spamd    1869 Jun  7 00:41 user_prefs
>[root@www .spamassassin]# sa-learn --dump magic
>0.000          0          3          0  non-token data: bayes db version
>0.000          0        301          0  non-token data: nspam
>0.000          0      11236          0  non-token data: nham
>0.000          0     419941          0  non-token data: ntokens
>0.000          0 1150469108          0  non-token data: oldest atime
>0.000          0 1435015894          0  non-token data: newest atime
>0.000          0 1435016419          0  non-token data: last journal sync atime
>0.000          0 1435016432          0  non-token data: last expiry atime
>0.000          0          0          0  non-token data: last expire atime delta
>0.000          0          0          0  non-token data: last expire
>reduction count

>I don't see any BAYES_ tags in X-Spam-Report.

how did you incorporate spamassassin into the mail flow? Do you run a milter,
amavisd, or simply spamc from procmailrc?

the bayes database you have shown belongs to user spamd, but the user does
not get all the mail, does it?

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
I just got lost in thought. It was unfamiliar territory.