Posted to users@spamassassin.apache.org by Michael B Allen <io...@gmail.com> on 2015/06/23 01:55:04 UTC
No BAYES_XX tags in X-Spam-Report
How can I tell if SA is tagging using bayes?
[root@www .spamassassin]# pwd
/var/log/spamassassin/.spamassassin
[root@www .spamassassin]# ls -la
total 1100
drwx------ 2 spamd spamd 4096 Jun 22 19:42 .
drwx------ 3 spamd spamd 4096 Jun 7 00:41 ..
-rw------- 1 spamd spamd 45056 Jun 22 19:42 bayes_seen
-rw------- 1 spamd spamd 1290240 Jun 22 19:42 bayes_toks
-rw-r--r-- 1 spamd spamd 1869 Jun 7 00:41 user_prefs
[root@www .spamassassin]# sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 301 0 non-token data: nspam
0.000 0 11236 0 non-token data: nham
0.000 0 419941 0 non-token data: ntokens
0.000 0 1150469108 0 non-token data: oldest atime
0.000 0 1435015894 0 non-token data: newest atime
0.000 0 1435016419 0 non-token data: last journal sync atime
0.000 0 1435016432 0 non-token data: last expiry atime
0.000 0 0 0 non-token data: last expire atime delta
0.000 0 0 0 non-token data: last expire reduction count
[root@www .spamassassin]# cat ../../maillog | grep BAYES
[root@www .spamassassin]#
I don't see any BAYES_ tags in X-Spam-Report.
I'm using a default SA install on CentOS 7.
Do I need a local.cf? From looking at the docs, it claims bayes is
enabled by default. How can I check this?
Or maybe I haven't sa-learn'd enough spam?
Mike
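One way to answer "how can I tell" is to look at the X-Spam-Status header that SpamAssassin adds to a scanned message: its tests= list names the BAYES_xx rule whenever Bayes took part in scoring. The bash function below is a small sketch, not part of SpamAssassin. The name bayes_tags is made up here, and it assumes the message (for example the output of spamc) arrives on stdin with the usual blank line between headers and body.

```shell
# bayes_tags (hypothetical helper): print the BAYES_xx rule names listed in
# the X-Spam-Status header of the message on stdin. Folded header lines
# (continuations starting with a space or tab) are joined before matching,
# and the message body is never examined.
bayes_tags() {
  awk '
    /^$/     { exit }                    # blank line ends the header block
    /^[ \t]/ { line = line $0; next }    # continuation of a folded header
             { if (line != "") print line; line = $0 }
    END      { if (line != "") print line }
  ' | grep -i '^X-Spam-Status:' | grep -oE 'BAYES_[0-9]+' | sort -u
}

# Example (spamd must be running):
#   spamc < sample.eml | bayes_tags
```

No output means the header's tests= list contained no Bayes rule, as with the "tests=none" header quoted later in this thread.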
Re: No BAYES_XX tags in X-Spam-Report
Posted by RW <rw...@googlemail.com>.
On Tue, 23 Jun 2015 02:04:03 +0200
Reindl Harald wrote:
> oh and independent of not running as root you have only 301 spam
> messages while the docs clearly state you need at least 400 ham as
> well as 400 spam samples
It's 200 of each.
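RW's figure matches SpamAssassin's defaults: bayes_min_spam_num and bayes_min_ham_num are both 200, and the Bayes rules do not fire until the database has learned at least that many of each. Below is a bash sketch that reads `sa-learn --dump magic` output and checks the nspam/nham counts against those thresholds. The function name bayes_ready is hypothetical; pass other numbers if your local.cf overrides the defaults. Run sa-learn as the user whose database spamd actually uses, or you will be checking the wrong database.

```shell
# bayes_ready (hypothetical helper): read `sa-learn --dump magic` output on
# stdin, print the nspam/nham counts, and return 0 only if both reach the
# minimums. Defaults of 200 mirror bayes_min_spam_num / bayes_min_ham_num.
bayes_ready() {
  awk -v min_spam="${1:-200}" -v min_ham="${2:-200}" '
    /non-token data: nspam/ { spam = $3 }   # third column holds the count
    /non-token data: nham/  { ham = $3 }
    END {
      printf "nspam=%d nham=%d\n", spam, ham
      exit !(spam >= min_spam && ham >= min_ham)
    }'
}

# Example, checking the DB as the spamd user rather than as root:
#   sudo -H -u spamd sa-learn --dump magic | bayes_ready
```

With the counts in the original post (nspam 301, nham 11236) this prints "nspam=301 nham=11236" and returns success.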
Re: No BAYES_XX tags in X-Spam-Report
Posted by Reindl Harald <h....@thelounge.net>.
oh and independent of not running as root you have only 301 spam
messages while the docs clearly state you need at least 400 ham as well
as 400 spam samples
Am 23.06.2015 um 02:01 schrieb Reindl Harald:
> Am 23.06.2015 um 01:55 schrieb Michael B Allen:
>> How can I tell if SA is tagging using bayes?
>
> if you see the bayes tags in headers and logs
>
>> [root@www .spamassassin]# pwd
>> /var/log/spamassassin/.spamassassin
>> [root@www .spamassassin]# ls -la
>> total 1100
>> drwx------ 2 spamd spamd 4096 Jun 22 19:42 .
>> drwx------ 3 spamd spamd 4096 Jun 7 00:41 ..
>> -rw------- 1 spamd spamd 45056 Jun 22 19:42 bayes_seen
>> -rw------- 1 spamd spamd 1290240 Jun 22 19:42 bayes_toks
>> -rw-r--r-- 1 spamd spamd 1869 Jun 7 00:41 user_prefs
>
> i doubt that SA is using the bayes of root
> so you just train the wrong bayes
>
>> [root@www .spamassassin]# sa-learn --dump magic
>> 0.000 0 3 0 non-token data: bayes db version
>> 0.000 0 301 0 non-token data: nspam
>> 0.000 0 11236 0 non-token data: nham
>> 0.000 0 419941 0 non-token data: ntokens
>> 0.000 0 1150469108 0 non-token data: oldest atime
>> 0.000 0 1435015894 0 non-token data: newest atime
>> 0.000 0 1435016419 0 non-token data: last journal sync atime
>> 0.000 0 1435016432 0 non-token data: last expiry atime
>> 0.000 0 0 0 non-token data: last expire atime delta
>> 0.000 0 0 0 non-token data: last expire reduction count
>
> again: SA doesn't run as root in any sane setup and hence won't use *that*
> bayes-db
>
>> [root@www .spamassassin]# cat ../../maillog | grep BAYES
>> [root@www .spamassassin]#
>>
>> I don't see any BAYES_ tags in X-Spam-Report.
>
> see above
>
>> I'm using a default SA install on CentOS 7.
>
> what is a "default install"?
> it can be spamass-milter or anything else calling SA
>
>> Do I need a local.cf? From looking at the docs, it claims bayes is
>> enabled by default. How can I check this?
>>
>> Or maybe I haven't sa-learn'd enough spam?
>
> no, you just train the wrong bayes and NEVER EVER should run such things
> as root - http://wiki.apache.org/spamassassin/SiteWideBayesSetup
>
> we are running spamd as well as spamass-milter with its user and to be
> explicit have the following line in /etc/mail/spamassassin/local.cf
> while it's not strictly needed in that case since that's the userhome
> bayes_path /var/lib/spamass-milter/.spamassassin/bayes
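To see which database a given user's sa-learn actually opens, SpamAssassin's debug mode (-D, which the error message later in this thread also suggests) mentions the Bayes files it uses. The bash filter below (bayes_db_files is a made-up name) just pulls file paths ending in _toks out of that output. It assumes only that the debug lines contain the full path, since the exact wording can vary between versions.

```shell
# bayes_db_files (hypothetical helper): from SpamAssassin debug output on
# stdin, print each distinct path ending in "_toks", i.e. the Bayes token
# database file(s) the run actually referred to.
bayes_db_files() {
  grep -oE '/[^[:space:]]*_toks' | sort -u
}

# Example (debug output goes to stderr, hence 2>&1). Run both and compare;
# different paths mean root and spamd are using different databases:
#   sa-learn -D --dump magic 2>&1 | bayes_db_files
#   sudo -H -u spamd sa-learn -D --dump magic 2>&1 | bayes_db_files
```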
Re: No BAYES_XX tags in X-Spam-Report
Posted by Michael B Allen <io...@gmail.com>.
On Mon, Jun 22, 2015 at 9:45 PM, Michael B Allen <io...@gmail.com> wrote:
> and after running sa-learn again (as root) on ham, my db is now broken:
>
> [root@www .spamassassin]# sa-learn --dump magic
> bayes: bayes db version 0 is not able to be used, aborting! at
> /usr/share/perl5/vendor_perl/Mail/SpamAssassin/BayesStore/DBM.pm line
> 208.
> bayes: bayes db version 0 is not able to be used, aborting! at
> /usr/share/perl5/vendor_perl/Mail/SpamAssassin/BayesStore/DBM.pm line
> 208.
> ERROR: Bayes dump returned an error, please re-run with -D for more information
Well now the DB is no longer broken and I didn't do anything. I guess
spamassassin ran and detected the broken DB and rebuilt it?
[root@www .spamassassin]# ls -la
total 9288
drwx------ 2 spamd spamd 4096 Jun 22 21:46 .
drwx------ 3 spamd spamd 4096 Jun 7 00:41 ..
-rw------- 1 spamd spamd 2280 Jun 22 21:46 bayes_journal
-rw------- 1 spamd spamd 1306624 Jun 22 21:38 bayes_seen
-rw------- 1 spamd spamd 10485760 Jun 22 21:38 bayes_toks
-rw-r--r-- 1 spamd spamd 1869 Jun 7 00:41 user_prefs
[root@www .spamassassin]# sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 378 0 non-token data: nspam
0.000 0 10821 0 non-token data: nham
0.000 0 413216 0 non-token data: ntokens
0.000 0 1150469108 0 non-token data: oldest atime
0.000 0 1435023513 0 non-token data: newest atime
0.000 0 1435023514 0 non-token data: last journal sync atime
0.000 0 1435023525 0 non-token data: last expiry atime
0.000 0 0 0 non-token data: last expire atime delta
0.000 0 0 0 non-token data: last expire reduction count
Weird.
But now I am seeing:
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=unavailable
version=3.4.0
I have restarted spamd with systemctl {stop,start} spamassassin.
Hopefully it was just a transient issue.
Mike
Re: No BAYES_XX tags in X-Spam-Report
Posted by RW <rw...@googlemail.com>.
On Wed, 24 Jun 2015 17:14:37 -0400
Bill Cole wrote:
> You snipped out what I was specifically responding to:
>
> On 22 Jun 2015, at 21:45 , Michael B Allen wrote:
>
> > bayes_file_mode 0777
Ok, I misunderstood. I thought you were referring to things being run as
root.
Re: No BAYES_XX tags in X-Spam-Report
Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 24 Jun 2015, at 16:21, RW wrote:
> On Mon, 22 Jun 2015 22:42:09 -0400
> Bill Cole wrote:
>
>> On 22 Jun 2015, at 21:45, Michael B Allen wrote:
>
>>> So with a default install (CentOS 7 in my case and I suspect pretty
>>> much all other systems), bayes will NOT just work by default unless
>>> you explicitly modify /etc/mail/spamassassin/local.cf to tell
>>> sa-learn to use the bayes db owned by spamd
>>> (/var/log/spamassassin/.spamassassin/bayes in my case) and NOT the
>>> one owned by root?
>
>> Don't do that, ever, on any regular file, on any system that has
>> processes running as more than just root. I know it's in the SA Wiki,
>> but it's an irresponsible recommendation.
>
>
> The default is that spamd starts as root and its children drop
> privileges and run as the user running spamc. Running spamc as root is
> the source of the myth that SA by default stores its data under /root.
Yes, if that's how you run spamd, the user running spamc determines
which per-user config & DBs to use. Which is not actually relevant to
this thread.
> spamd can also start as root and then drop to the unprivileged user
> once it's bound to its port.
Yes, and based on OP's description that's *specifically* the
configuration being discussed: spamd "running as the user spamd" which
only makes sense as meaning it includes "-u spamd" in its args. Also: an
absolute and rather odd bayes_path.
> I don't know the wiki passage you are referring to, but I'd be surprised
> if it's actually advocating doing mail scans as root.
You snipped out what I was specifically responding to:
On 22 Jun 2015, at 21:45 , Michael B Allen wrote:
> bayes_file_mode 0777
That is used as an example at
http://wiki.apache.org/spamassassin/SiteWideBayesSetup so it is
understandable why it gets used. The text following it disclaims the
recommendation, but only weakly.
I've actually found that page useful for screening sysadmin job
candidates, without any expectation that they understand SA to find the
problem. It is much better to never hire the one who will have to be
instructed later on the generic error of using mode 0777.
Re: No BAYES_XX tags in X-Spam-Report
Posted by RW <rw...@googlemail.com>.
On Mon, 22 Jun 2015 22:42:09 -0400
Bill Cole wrote:
> On 22 Jun 2015, at 21:45, Michael B Allen wrote:
> > So with a default install (CentOS 7 in my case and I suspect pretty
> > much all other systems), bayes will NOT just work by default unless
> > you explicitly modify /etc/mail/spamassassin/local.cf to tell
> > sa-learn to use the bayes db owned by spamd
> > (/var/log/spamassassin/.spamassassin/bayes in my case) and NOT the
> > one owned by root?
> Don't do that, ever, on any regular file, on any system that has
> processes running as more than just root. I know it's in the SA Wiki,
> but it's an irresponsible recommendation.
The default is that spamd starts as root and its children drop
privileges and run as the user running spamc. Running spamc as root is
the source of the myth that SA by default stores its data under /root.
spamd can also start as root and then drop to the unprivileged user
once it's bound to its port.
I don't know the wiki passage you are referring to, but I'd be surprised
if it's actually advocating doing mail scans as root.
Re: No BAYES_XX tags in X-Spam-Report
Posted by Reindl Harald <h....@thelounge.net>.
Am 23.06.2015 um 18:48 schrieb Bill Cole:
> ***** sa-learn IS NOT THE RIGHT TOOL FOR LEARNING MESSAGES INTO A
> SYSTEM-WIDE DB ****
says who? below is the result of a customized sa-learn script for a
ton of users, working like a charm on a spamass-milter setup for 10
months now
[root@mail-gw:~]$ bayes-stats.sh
0 30009 SPAM
0 17486 HAM
0 2125684 TOKEN
total 64M
-rw------- 1 sa-milt sa-milt 5,0M 2015-06-23 18:16 bayes_seen
-rw------- 1 sa-milt sa-milt 80M 2015-06-23 18:16 bayes_toks
-rw------- 1 sa-milt sa-milt 98 2015-02-17 11:37 user_prefs
BAYES_00 27854 71.55 %
BAYES_05 856 2.19 %
BAYES_20 955 2.45 %
BAYES_40 857 2.20 %
BAYES_50 3398 8.72 %
BAYES_60 359 0.92 %
BAYES_80 390 1.00 %
BAYES_95 298 0.76 %
BAYES_99 3962 10.17 %
BAYES_999 3654 9.38 %
DELIVERED 44487 89.75 %
DNSWL 41562 83.85 %
SPF 19953 40.25 %
SPF/DKIM WL 10265 20.70 %
SHORTCIRCUIT 10617 21.41 %
BLOCKED 6154 12.41 %
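A breakdown similar to Reindl's can be approximated by counting BAYES_xx rule names in the mail log. This is not his bayes-stats.sh, which isn't shown. It is a hypothetical bash sketch that assumes spamd logs each message's rule hits on one line of the log you grep (for example /var/log/maillog). Percentages here are shares of all BAYES_xx hits, so they will not match his table, whose BAYES rows add up to more than 100%, consistent with BAYES_999 firing together with BAYES_99 on the same messages.

```shell
# bayes_histogram (hypothetical helper): count BAYES_xx rule names in the
# given log file(s), or stdin if none, and print each with its share of all
# Bayes hits. LC_ALL=C keeps the final sort byte-wise and predictable.
bayes_histogram() {
  grep -ohE 'BAYES_[0-9]+' "$@" |
    awk '{ n[$1]++; total++ }
         END { for (k in n) printf "%-10s %7d %6.2f %%\n", k, n[k], 100 * n[k] / total }' |
    LC_ALL=C sort
}

# Example:
#   bayes_histogram /var/log/maillog
```

An empty result, like the empty grep in the original post, means no Bayes rule has fired yet.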
Re: No BAYES_XX tags in X-Spam-Report
Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 23 Jun 2015, at 14:58, Michael B Allen wrote:
> On Tue, Jun 23, 2015 at 12:48 PM, Bill Cole
> <sa...@billmail.scconsult.com> wrote:
>>> Yes, I want a system-wide bayes db. And I am running spamd and spamc
>>> and I assume that is all working (but of course I have no idea if it
>>> really is).
>>>
>>> But I want users to be able to put spams that get through into
>>> ~/Maildir/.LearnAsSpam and then, every once in a while, I want to run
>>> sa-learn on all of those messages for the system-wide db.
>>>
>>> So can that be done without running sa-learn as root?
>>
>>
>> Of course. As I said in other words that you quoted but apparently
>> misunderstood:
>>
>> ***** sa-learn IS NOT THE RIGHT TOOL FOR LEARNING MESSAGES INTO A
>> SYSTEM-WIDE DB ****
>>
>> Use 'spamc -L (spam|ham)'. Have users run it if they like, or have it run as
>> the user whose magic maildirs are being learned. It talks to the spamd
>> daemon, running as the spamd user, managing the system-wide Bayes DB. If it
>> isn't run as root, it can't do random violence limited only by your capacity
>> for typos.
>
> Well, ever since we stopped using The UNIX® Time-Sharing System back
> in '87 generally "users" don't run stuff on their own like this
> anymore.
Sure, but if you're using Real Users (i.e. if diverse ownership of
Maildirs is an actual system issue) then maybe you populate a crontab
for each one as well. Or not. My point is that if you have spamd running
as the user spamd, it will only ever operate based on the SpamAssassin
configuration for the user spamd, never as if it were root. No matter
how you run spamc, it can't make spamd break ownership of the DB files
so that spamd can't continue to use them. Because sa-learn running as
root is a root process manipulating files itself (not mediated by spamd)
you need to be careful about how you invoke it because you MIGHT end up
with something like this:
# ls -l ~spamd/.spamassassin/
total 400461
-rw------- 1 spamd spamd 80642048 Jun 23 19:24 auto-whitelist
-rw------- 1 root spamd 51264 Jun 23 04:29 bayes_journal
-rw------- 1 spamd spamd 324435968 Jun 23 19:24 bayes_seen
-rw------- 1 spamd spamd 5046272 Jun 23 19:24 bayes_toks
-rw-r--r-- 1 spamd spamd 1869 Jul 17 2011 user_prefs
(Sigh.... gotta go spank someone...)
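A quick way to catch what Bill shows here (a bayes_journal owned by root), or the bayes_file_mode 0777 problem discussed elsewhere in this thread, is to list any file in the DB directory that is not owned by the expected user or is world-writable. This is a minimal bash sketch using standard find(1) tests; the function name bayes_audit is hypothetical, and the directory and owner are whatever your setup uses (~spamd/.spamassassin here, /var/log/spamassassin/.spamassassin in the original poster's case).

```shell
# bayes_audit (hypothetical helper): list files directly inside the Bayes DB
# directory that are either not owned by the expected user or writable by
# everyone. No output means nothing suspicious was found.
#   $1 = DB directory, $2 = user that spamd runs as (the DB owner)
bayes_audit() {
  local dir=$1 owner=$2
  find "$dir" -maxdepth 1 -type f \( ! -user "$owner" -o -perm -0002 \) -print
}

# Example:
#   bayes_audit ~spamd/.spamassassin spamd
# Anything listed can then be fixed with chown spamd:spamd and chmod 600.
```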
> But if spamc -L could consume an entire Maildir without requiring an
> awk expert, that would be great.
No awk needed. Assuming the Maildir gets cleaned out so you aren't
constantly trying to re-learn an ever-growing pile of old messages:
cd "$HOME/Maildir/.LearnAsSpam/cur" || exit 1
for x in *; do spamc -L spam < "$x" & done
Replace the '&' with a ';' if you find the concurrency a problem.
A bit fancier, run it hourly for rapid learning of fewer messages, still
no awk:
for x in $( find /home/*/Maildir/.LearnAsSpam/cur/ -type f -cmin -61 ) ;
do spamc -L spam < "$x" & done
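The loops above can be wrapped in a slightly more careful bash function that checks spamc's exit status for every file. This is a sketch, not a tested tool: the name learn_dir and the SPAMC override variable are hypothetical. It relies on two things worth confirming against your installed man pages. First, spamd only accepts learning requests from spamc -L when it was started with --allow-tell. Second, spamc(1) documents exit status 5 for a message that was learned and 6 for one that was already learned (the function also accepts 0).

```shell
# learn_dir (hypothetical helper): feed each regular file in one Maildir
# folder to spamd with `spamc -L`, counting successes and failures.
#   $1 = folder, e.g. "$HOME/Maildir/.LearnAsSpam/cur"
#   $2 = spam or ham
# Set SPAMC to use a different client command (defaults to spamc).
learn_dir() {
  local dir=$1 type=$2 ok=0 failed=0 f rc
  case $type in
    spam|ham) ;;
    *) echo "learn_dir: type must be spam or ham" >&2; return 2 ;;
  esac
  for f in "$dir"/*; do
    [ -f "$f" ] || continue                 # also skips an unmatched glob
    rc=0
    "${SPAMC:-spamc}" -L "$type" < "$f" > /dev/null || rc=$?
    case $rc in
      0|5|6) ok=$((ok + 1)) ;;              # learned, or already learned
      *) failed=$((failed + 1)); echo "learn_dir: spamc exit $rc on $f" >&2 ;;
    esac
  done
  echo "ok=$ok failed=$failed"
  [ "$failed" -eq 0 ]
}

# Example:
#   learn_dir "$HOME/Maildir/.LearnAsSpam/cur" spam
```

It runs sequentially, which avoids the concurrency concern, and it does not delete anything, so the folder still needs cleaning out afterwards as Bill notes.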
Re: No BAYES_XX tags in X-Spam-Report
Posted by Michael B Allen <io...@gmail.com>.
On Tue, Jun 23, 2015 at 12:48 PM, Bill Cole
<sa...@billmail.scconsult.com> wrote:
>> Yes, I want a system-wide bayes db. And I am running spamd and spamc
>> and I assume that is all working (but of course I have no idea if it
>> really is).
>>
>> But I want users to be able to put spams that get through into
>> ~/Maildir/.LearnAsSpam and then, every once in a while, I want to run
>> sa-learn on all of those messages for the system-wide db.
>>
>> So can that be done without running sa-learn as root?
>
>
> Of course. As I said in other words that you quoted but apparently
> misunderstood:
>
> ***** sa-learn IS NOT THE RIGHT TOOL FOR LEARNING MESSAGES INTO A
> SYSTEM-WIDE DB ****
>
> Use 'spamc -L (spam|ham)'. Have users run it if they like, or have it run as
> the user whose magic maildirs are being learned. It talks to the spamd
> daemon, running as the spamd user, managing the system-wide Bayes DB. If it
> isn't run as root, it can't do random violence limited only by your capacity
> for typos.
Well, ever since we stopped using The UNIX® Time-Sharing System back
in '87 generally "users" don't run stuff on their own like this
anymore.
But if spamc -L could consume an entire Maildir without requiring an
awk expert, that would be great.
Mike
Re: No BAYES_XX tags in X-Spam-Report
Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 23 Jun 2015, at 0:05, Michael B Allen wrote:
> On Mon, Jun 22, 2015 at 10:42 PM, Bill Cole
> <sa...@billmail.scconsult.com> wrote:
>> On 22 Jun 2015, at 21:45, Michael B Allen wrote:
>>
>>> On Mon, Jun 22, 2015 at 8:01 PM, Reindl Harald
>>> <h....@thelounge.net>
>>> wrote:
>>>>>
>>>>> [root@www .spamassassin]# pwd
>>>>> /var/log/spamassassin/.spamassassin
>>>>> [root@www .spamassassin]# ls -la
>>>>> total 1100
>>>>> drwx------ 2 spamd spamd 4096 Jun 22 19:42 .
>>>>> drwx------ 3 spamd spamd 4096 Jun 7 00:41 ..
>>>>> -rw------- 1 spamd spamd 45056 Jun 22 19:42 bayes_seen
>>>>> -rw------- 1 spamd spamd 1290240 Jun 22 19:42 bayes_toks
>>>>> -rw-r--r-- 1 spamd spamd 1869 Jun 7 00:41 user_prefs
>>>>
>>>>
>>>>
>>>> i doubt that SA is using the bayes of root
>>>> so you just train the wrong bayes
>>>
>>>
>>> So with a default install (CentOS 7 in my case and I suspect pretty
>>> much all other systems), bayes will NOT just work by default unless
>>> you explicitly modify /etc/mail/spamassassin/local.cf to tell sa-learn
>>> to use the bayes db owned by spamd
>>> (/var/log/spamassassin/.spamassassin/bayes in my case) and NOT the one
>>> owned by root?
>>>
>>> However, I have done this:
>>>
>>> bayes_path /var/log/spamassassin/.spamassassin/bayes
>>> bayes_file_mode 0777
>>
>>
>> Don't do that, ever, on any regular file, on any system that has
>> processes running as more than just root. I know it's in the SA Wiki,
>> but it's an irresponsible recommendation.
>
> Yeah, I was going to ask about this because it seems to me if the db
> is owned by spamd and spamassassin is running as user spamd and
> sa-learn is running as root then 0600 should be fine (although it's
> not obvious to me why SA needs a "file mode" in the first place).
A diversity of rigs. SA isn't the spamd daemon or the spamc client or
$PERLLIB/Mail/SpamAssassin.pm or the configured ruleset, it is the whole
tree of Perl modules in the Mail::SpamAssassin namespace plus *maybe*
spamd/spamc, the rules, and subsidiary utilities using them like
sa-learn. Different sites use the Perl framework and tools in different
ways, so they need different ownership & permission settings. As I don't
use SpamAssassin on CentOS (or RHEL) I'm not sure precisely what the
default SA rig there looks like, and how (if at all) RedHat has hooked
it into Postfix(?) so I can't explain much about the specifics of what
you get from 'yum install spamassassin'.
More specifically: in its simplest form, SA is designed to be used by
each of many unprivileged users with independent Bayes DBs fed & used by
local mail delivery and pre-delivery filtering processes and sa-learn or
an equivalent tool for learning messages post-delivery. The
bayes_file_mode defaults to 0700 and usually need not be changed, but on
some OS's with some mail subsystems it may be necessary to adjust that
to allow a delivery agent or other component (e.g. filtering tools)
running as something other than root OR the individual mail recipient to
read or maybe even write to the users' individual Bayes DBs. You should
NOT need it changed on a system that only uses a system-wide Bayes DB.
> So then what do you recommend that the bayes_file_mode value be
> precisely?
The default is usually fine. That's why it is the default. Note that
this value is only applied when creating a new file in the Bayes DB
(which is composed of multiple files) so it is possible for the effects
of changing it to be delayed. If RedHat's packaging of SpamAssassin
includes a different value, I'd suggest not changing it. Also, moving
your DB into /var/log/spamassassin/ is a quirky choice that might not be
compatible with RedHat's integration choices in the package they
distribute (and which CentOS replicates.) It's your system and your
choices of course...
> At any rate, the whole thing seems to be working now incidentally. I
> am getting BAYES_XX tags now.
Yes. As documented, you don't get messages scored by the Bayes component
until it has built an adequate learned history of both ham and spam to
do valid scoring.
> As stated in my other followup message,
> SA seems to have detected the broken db and fixed it because it
> suddenly just started working and sa-learn --dump magic works and is
> showing the right numbers.
Well, I'm not convinced that's exactly how it worked, but I'm glad you
seem to have it working.
Note that 'sa-learn' DOES NOT talk to spamd, it uses the SA config that
it finds for the user running it to figure out which rules it should use
and where to find the Bayes DB (and AWL or TxRep DBs) for that user. If
you have spamd running to use a system-wide
config/ruleset/Bayes/(AWL|TxRep) you should get in the habit of using
spamc to communicate with the daemon rather than running sa-learn as
root and relying on a quirky config to assure that you are handling DB
files that are global and owned by the right non-root user. If in doing
that you cause the creation of a file in the DB that is owned by root
and can't be deleted by spamd, your DB will be broken.
> So just for posterity, the problem was I just needed "bayes_path
> /var/log/spamassassin/.spamassassin/bayes" in local.cf to make
> sa-learn use that db instead of /root/.spamassassin/bayes. Looks like
> it choked initially but somehow it's working now.
Yeah, that seems like a very wrong solution. Not saying it didn't work
for you, but it would not be my choice. Since you seem set on having a
weird place for your DB, I won't argue the issue.
>>> Everything is installed as user / group spamd and postfix is set to
>>> call spamassassin with user=spamd. And I assume I must run sa-learn as
>>> root so that it can access Maildir directories and that bayes_path
>>> tells sa-learn where the db is. So now what's the problem?
>>
>>
>> Wrong assumption.
>>
>> The sa-learn program is for anyone to manually work with their own Bayes DB,
>> including for the owner of a system-wide Bayes DB to work with that Bayes
>> DB. If you have a system-wide Bayes DB, it should be fed by either a
>> system-wide filtering mechanism operating as part of the delivery process
>> and running as the owner of the global DB or by users running the spamc
>> client under their own ids to feed a spamd daemon running as the owner of
>> the global DB or by a combination of the two. The CentOS 7 package installs
>> spamd and spamc, and if you want to learn already-delivered mail into a
>> global BayesDB, those are the tools to use.
>
> Yes, I want a system-wide bayes db. And I am running spamd and spamc
> and I assume that is all working (but of course I have no idea if it
> really is).
>
> But I want users to be able to put spams that get through into
> ~/Maildir/.LearnAsSpam and then, every once in a while, I want to run
> sa-learn on all of those messages for the system-wide db.
>
> So can that be done without running sa-learn as root?
Of course. As I said in other words that you quoted but apparently
misunderstood:
***** sa-learn IS NOT THE RIGHT TOOL FOR LEARNING MESSAGES INTO A
SYSTEM-WIDE DB ****
Use 'spamc -L (spam|ham)'. Have users run it if they like, or have it
run as the user whose magic maildirs are being learned. It talks to the
spamd daemon, running as the spamd user, managing the system-wide Bayes
DB. If it isn't run as root, it can't do random violence limited only by
your capacity for typos.
> Ideally I would think sa-learn should be able to run as root just to
> access files but use a spamd child to process them and update the
> bayes db. Possible?
That's not how any of this works...
The reason for the 'd' in spamd is that it is a daemon: a long-running
process that other processes (or network entities) can talk to via a
local unix socket in the filesystem or a TCP port using a defined
protocol. The sa-learn program is not a client of spamd speaking that
protocol but rather a direct manipulator of the BayesDB, just as spamd
is. You can usually get away with using sa-learn to work with the same
BayesDB that spamd uses, but you are likely to eventually do something a
little wrong and either screw up the BayesDB with a file spamd can't
write to or accidentally and blindly work with a brand new different
BayesDB because of some environmental change or you've re-installed SA
or whatever. I don't think there's a real risk of deadlock or data
corruption or anything like that from using spamd and sa-learn on the
same DB, but you do have 2 tools that are unaware of each other
potentially trying to write to the same files, so there is at least some
possibility for contention problems. And as you've noticed: to learn
messages in anyone's maildirs into the system BayesDB, you have to run
sa-learn as root, because it isn't talking to spamd at all but fiddling
with spamd's file behind spamd's back. Running things as root should be
resisted and avoided. Use spamc instead, avoid the risks.
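Two spamd options matter for the setup Bill describes: -u (--username), so spamd drops to one fixed user that owns the system-wide DB, and -l (--allow-tell), so it accepts the learn requests that spamc -L sends. Without -u, as RW describes, spamd started as root has its children run as whichever user called spamc. A rough bash check against an options string is sketched below. It is hypothetical: the function name is made up, it only recognises those options written as separate words (not bundled short options), and the place your distribution keeps the options is an assumption to verify (on CentOS/RHEL packages this is typically a SPAMDOPTIONS line in /etc/sysconfig/spamassassin).

```shell
# check_spamd_opts (hypothetical helper): inspect a list of spamd options
# and report whether spamd will run as a fixed user (-u/--username) and
# accept learning requests from spamc -L (-l/--allow-tell).
# Returns 0 only if both are present.
check_spamd_opts() {
  local opts=" $* " status=0
  case $opts in
    *" -l "*|*" --allow-tell "*) echo "allow-tell: yes" ;;
    *) echo "allow-tell: no (spamc -L requests will be refused)"; status=1 ;;
  esac
  case $opts in
    *" -u "*|*" --username "*|*" --username="*) echo "fixed user: yes" ;;
    *) echo "fixed user: no (children run as the user calling spamc)"; status=1 ;;
  esac
  return $status
}

# Example, assuming the options live in SPAMDOPTIONS in that file:
#   . /etc/sysconfig/spamassassin && check_spamd_opts $SPAMDOPTIONS
```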
--
Bill Cole
Email: bill@scconsult.com
USE THE FROM HEADER IF IT DIFFERS! MAIN ADDRESS IS HEAVILY SPAM-FILTERED!
18847 Rosetta Ave.
Eastpointe, MI USA 48021
Phone: +1-586-774-4357
Re: No BAYES_XX tags in X-Spam-Report
Posted by Reindl Harald <h....@thelounge.net>.
Am 23.06.2015 um 06:05 schrieb Michael B Allen:
> Yes, I want a system-wide bayes db. And I am running spamd and spamc
> and I assume that is all working (but of course I have no idea if it
> really is).
>
> But I want users to be able to put spams that get through into
> ~/Maildir/.LearnAsSpam and then, every once in a while, I want to run
> sa-learn on all of those messages for the system-wide db.
you need to somehow move those messages into the folder you run
sa-learn from, and you should consider keeping those samples so you have
the option to completely rebuild the database from scratch - SA 3.4.1, for
example, brought some changes which justified a rebuild with our current
40000 samples
> So can that be done without running sa-learn as root?
yes, see above
it needs work on your side, SA is a framework
> Ideally I would think sa-learn should be able to run as root just to
> access files but use a spamd child to process them and update the
> bayes db. Possible?
you don't understand how it works: the bayes db is in fact a file
living in the userhome, spamd is using that database file, sa-learn is a
command-line utility to fill up that database, not more and not less
there is no reason at all to run any of that as root, except for the
script you write for sa-learning, which can make sure that the spamd user
has read permissions and then run sa-learn as the spamd user
su -c "sa-learn --max-size=0 --spam /folder/with/spam" - spamd-user
su -c "sa-learn --max-size=0 --ham /folder/with/ham" - spamd-user
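Reindl's approach (sa-learn run as the DB owner rather than as root) needs the messages to sit somewhere that user can read. Below is a hypothetical bash sketch of that staging step; stage_messages, the staging directory and the group are placeholders. Only the copy step needs root; sa-learn itself then runs as the spamd user. Keeping the staged copies also gives you the sample corpus Reindl suggests keeping for a full rebuild.

```shell
# stage_messages (hypothetical helper): copy every regular file from a
# learn folder into a staging directory and make it group-readable, so the
# user that owns the Bayes DB can read it without sa-learn running as root.
#   $1 = source folder, $2 = staging directory, $3 = group of the DB owner
# Prints the number of files copied. Parent directories of the staging
# directory must also be traversable by that user.
stage_messages() {
  local src=$1 dest=$2 group=$3 n=0 f
  mkdir -p "$dest" || return 1
  for f in "$src"/*; do
    [ -f "$f" ] || continue                 # skip an unmatched glob
    cp -p "$f" "$dest/" && n=$((n + 1))
  done
  chgrp -R "$group" "$dest" && chmod -R g+rX "$dest" || return 1
  echo "$n"
}

# Example (as root; paths are placeholders):
#   stage_messages /home/mike/Maildir/.LearnAsSpam/cur /var/spool/sa-stage/spam spamd
#   su -s /bin/sh -c 'sa-learn --spam /var/spool/sa-stage/spam' spamd
```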
and yes, the site-wide bayes is a good idea even if it needs some work
because otherwise *each user* would need to train 400 spam as well as
400 ham samples for his own bayes and experience shows that won't happen
Re: No BAYES_XX tags in X-Spam-Report
Posted by Michael B Allen <io...@gmail.com>.
On Mon, Jun 22, 2015 at 10:42 PM, Bill Cole
<sa...@billmail.scconsult.com> wrote:
> On 22 Jun 2015, at 21:45, Michael B Allen wrote:
>
>> On Mon, Jun 22, 2015 at 8:01 PM, Reindl Harald <h....@thelounge.net>
>> wrote:
>>>>
>>>> [root@www .spamassassin]# pwd
>>>> /var/log/spamassassin/.spamassassin
>>>> [root@www .spamassassin]# ls -la
>>>> total 1100
>>>> drwx------ 2 spamd spamd 4096 Jun 22 19:42 .
>>>> drwx------ 3 spamd spamd 4096 Jun 7 00:41 ..
>>>> -rw------- 1 spamd spamd 45056 Jun 22 19:42 bayes_seen
>>>> -rw------- 1 spamd spamd 1290240 Jun 22 19:42 bayes_toks
>>>> -rw-r--r-- 1 spamd spamd 1869 Jun 7 00:41 user_prefs
>>>
>>>
>>>
>>> i doubt that SA is using the bayes of root
>>> so you just train the wrong bayes
>>
>>
>> So with a default install (CentOS 7 in my case and I suspect pretty
>> much all other systems), bayes will NOT just work by default unless
>> you explicitly modify /etc/mail/spamassassin/local.cf to tell sa-learn
>> to use the bayes db owned by spamd
>> (/var/log/spamassassin/.spamassassin/bayes in my case) and NOT the one
>> owned by root?
>>
>> However, I have done this:
>>
>> bayes_path /var/log/spamassassin/.spamassassin/bayes
>> bayes_file_mode 0777
>
>
> Don't do that, ever, on any regular file, on any system that has processes
> running as more than just root. I know it's in the SA Wiki, but it's an
> irresponsible recommendation.
Yeah, I was going to ask about this because it seems to me if the db
is owned by spamd and spamassassin is running as user spamd and
sa-learn is running as root then 0600 should be fine (although it's
not obvious to me why SA needs a "file mode" in the first place).
So then what do you recommend that the bayes_file_mode value be precisely?
At any rate, the whole thing seems to be working now incidentally. I
am getting BAYES_XX tags now. As stated in my other followup message,
SA seems to have detected the broken db and fixed it because it
suddenly just started working and sa-learn --dump magic works and is
showing the right numbers.
So just for posterity, the problem was I just needed "bayes_path
/var/log/spamassassin/.spamassassin/bayes" in local.cf to make
sa-learn use that db instead of /root/.spamassassin/bayes. Looks like
it choked initially but somehow it's working now.
>> Everything is installed as user / group spamd and postfix is set to
>> call spamassassin with user=spamd. And I assume I must run sa-learn as
>> root so that it can access Maildir directories and that bayes_path
>> tells sa-learn where the db is. So now what's the problem?
>
>
> Wrong assumption.
>
> The sa-learn program is for anyone to manually work with their own Bayes DB,
> including for the owner of a system-wide Bayes DB to work with that Bayes
> DB. If you have a system-wide Bayes DB, it should be fed by either a
> system-wide filtering mechanism operating as part of the delivery process
> and running as the owner of the global DB or by users running the spamc
> client under their own ids to feed a spamd daemon running as the owner of
> the global DB or by a combination of the two. The CentOS 7 package installs
> spamd and spamc, and if you want to learn already-delivered mail into a
> global BayesDB, those are the tools to use.
Yes, I want a system-wide bayes db. And I am running spamd and spamc
and I assume that is all working (but of course I have no idea if it
really is).
But I want users to be able to put spams that get through into
~/Maildir/.LearnAsSpam and then, every once in a while, I want to run
sa-learn on all of those messages for the system-wide db.
So can that be done without running sa-learn as root?
Ideally I would think sa-learn should be able to run as root just to
access files but use a spamd child to process them and update the
bayes db. Possible?
Mike
Re: No BAYES_XX tags in X-Spam-Report
Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 22 Jun 2015, at 21:45, Michael B Allen wrote:
> On Mon, Jun 22, 2015 at 8:01 PM, Reindl Harald
> <h....@thelounge.net> wrote:
>>> [root@www .spamassassin]# pwd
>>> /var/log/spamassassin/.spamassassin
>>> [root@www .spamassassin]# ls -la
>>> total 1100
>>> drwx------ 2 spamd spamd 4096 Jun 22 19:42 .
>>> drwx------ 3 spamd spamd 4096 Jun 7 00:41 ..
>>> -rw------- 1 spamd spamd 45056 Jun 22 19:42 bayes_seen
>>> -rw------- 1 spamd spamd 1290240 Jun 22 19:42 bayes_toks
>>> -rw-r--r-- 1 spamd spamd 1869 Jun 7 00:41 user_prefs
>>
>>
>> i doubt that SA is using the bayes of root
>> so you just train the wrong bayes
>
> So with a default install (CentOS 7 in my case and I suspect pretty
> much all other systems), bayes will NOT just work by default unless
> you explicitly modify /etc/mail/spamassassin/local.cf to tell sa-learn
> to use the bayes db owned by spamd
> (/var/log/spamassassin/.spamassassin/bayes in my case) and NOT the one
> owned by root?
>
> However, I have done this:
>
> bayes_path /var/log/spamassassin/.spamassassin/bayes
> bayes_file_mode 0777
Don't do that, ever, on any regular file, on any system that has
processes running as more than just root. I know it's in the SA Wiki,
but it's an irresponsible recommendation.
> and after running sa-learn again (as root) on ham, my db is now
> broken:
>
> [root@www .spamassassin]# sa-learn --dump magic
> bayes: bayes db version 0 is not able to be used, aborting! at
> /usr/share/perl5/vendor_perl/Mail/SpamAssassin/BayesStore/DBM.pm line
> 208.
> bayes: bayes db version 0 is not able to be used, aborting! at
> /usr/share/perl5/vendor_perl/Mail/SpamAssassin/BayesStore/DBM.pm line
> 208.
> ERROR: Bayes dump returned an error, please re-run with -D for more information
>
> Can someone clue me in here?
See the last line. "sa-learn -D --dump magic" will provide deeper clues
as to the specific nature of the problem, which seems to be consistent
with trying to use a Bayes DB that isn't there or isn't readable.
I'm not certain what is breaking for SA after you've broken your system,
but since it's CentOS 7 it may be that SELinux is backstopping the 777
insanity. I'm unwilling to replicate the mistake to replicate the
error...
The FIRST step is to undo the breakage you directly inflicted, since
your system seems to have been working normally before, running in an
initial learning mode (i.e. without enough learned to use Bayes for
scoring).
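A minimal sketch of that undo step, assuming the DB directory from this thread and assuming the bayes_file_mode 0777 line has already been removed from local.cf. It restores the modes shown in the original ls -la listing: 0700 on the directory and 0600 on the bayes_* files. fix_bayes_perms and DB_OWNER are hypothetical names. Ownership is only reset when DB_OWNER is set, because chown needs root and files written by a root-run sa-learn may now belong to root.

```sh
# Hypothetical helper: put a Bayes DB directory back to owner-only access.
fix_bayes_perms() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "fix_bayes_perms: no such directory: $dir" >&2
        return 1
    fi
    # Optional ownership reset (needs root). DB_OWNER is a hypothetical
    # knob, e.g. DB_OWNER=spamd:spamd.
    if [ -n "${DB_OWNER:-}" ]; then
        chown -R "$DB_OWNER" "$dir" || return 1
    fi
    # Directory back to drwx------ as in the original listing.
    chmod 0700 "$dir" || return 1
    # bayes_seen, bayes_toks, bayes_journal, ... back to -rw-------.
    for f in "$dir"/bayes_*; do
        if [ -f "$f" ]; then
            chmod 0600 "$f" || return 1
        fi
    done
    return 0
}
```

For example, as root: DB_OWNER=spamd:spamd fix_bayes_perms /var/log/spamassassin/.spamassassin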
> Everything is installed as user / group spamd and postfix is set to
> call spamassassin with user=spamd. And I assume I must run sa-learn as
> root so that it can access Maildir directories and that bayes_path
> tells sa-learn where the db is. So now what's the problem?
Wrong assumption.
The sa-learn program is for anyone to manually work with their own Bayes
DB, including for the owner of a system-wide Bayes DB to work with that
Bayes DB. If you have a system-wide Bayes DB, it should be fed by either
a system-wide filtering mechanism operating as part of the delivery
process and running as the owner of the global DB or by users running
the spamc client under their own ids to feed a spamd daemon running as
the owner of the global DB or by a combination of the two. The CentOS 7
package installs spamd and spamc, and if you want to learn
already-delivered mail into a global BayesDB, those are the tools to
use.
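One way to do that for a folder of already-delivered mail is to pipe each message to spamc -L spam (or ham). This sketch assumes spamd was started with --allow-tell, which spamc's -L option requires, and that spamc runs as a user whose mail spamd should learn into the global DB. learn_maildir and the SPAMC override, which exists only so the loop can be tried with a stub, are hypothetical.

```sh
# Hypothetical helper: submit every message in a Maildir folder
# (its cur/ and new/ subdirectories) to spamd for learning via spamc -L.
learn_maildir() {
    kind=$1   # spam or ham
    dir=$2    # a Maildir folder, e.g. a user's junk folder
    case $kind in
        spam|ham) ;;
        *) echo "usage: learn_maildir spam|ham MAILDIR_FOLDER" >&2; return 2 ;;
    esac
    n=0
    for msg in "$dir"/cur/* "$dir"/new/*; do
        [ -f "$msg" ] || continue   # skip unmatched glob patterns
        # spamc's own messages go to stderr so stdout carries only the count.
        # Its exit status is not interpreted here; see the spamc manual.
        "${SPAMC:-spamc}" -L "$kind" < "$msg" >&2
        n=$((n + 1))
    done
    echo "$n"   # number of messages submitted
}
```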
Re: No BAYES_XX tags in X-Spam-Report
Posted by Reindl Harald <h....@thelounge.net>.
On 23.06.2015 at 03:45, Michael B Allen wrote:
> On Mon, Jun 22, 2015 at 8:01 PM, Reindl Harald <h....@thelounge.net> wrote:
>>> [root@www .spamassassin]# pwd
>>> /var/log/spamassassin/.spamassassin
>>> [root@www .spamassassin]# ls -la
>>> total 1100
>>> drwx------ 2 spamd spamd 4096 Jun 22 19:42 .
>>> drwx------ 3 spamd spamd 4096 Jun 7 00:41 ..
>>> -rw------- 1 spamd spamd 45056 Jun 22 19:42 bayes_seen
>>> -rw------- 1 spamd spamd 1290240 Jun 22 19:42 bayes_toks
>>> -rw-r--r-- 1 spamd spamd 1869 Jun 7 00:41 user_prefs
>>
>> I doubt that SA is using the bayes of root
>> so you just train the wrong bayes
>
> So with a default install (CentOS 7 in my case and I suspect pretty
> much all other systems), bayes will NOT just work by default unless
> you explicitly modify /etc/mail/spamassassin/local.cf to tell sa-learn
> to use the bayes db owned by spamd
> (/var/log/spamassassin/.spamassassin/bayes in my case) and NOT the one
> owned by root?
>
> However, I have done this:
>
> bayes_path /var/log/spamassassin/.spamassassin/bayes
> bayes_file_mode 0777
>
> and after running sa-learn again (as root) on ham, my db is now broken
STOP ACTING AS ROOT IN GENERAL
you need to train as the user spamd is running as
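Following that advice, a thin wrapper can make sure sa-learn always runs as the user spamd runs as, so that ~/.spamassassin resolves to that user's DB rather than root's. It assumes sudo is available and that the user is spamd, as in this thread. sa_learn_as, SA_USER and the SUDO test hook are hypothetical names.

```sh
# Hypothetical wrapper: run sa-learn as the spamd user. sudo -H sets HOME
# to the target user's home directory, so sa-learn opens that user's DB.
# SUDO exists only so the wrapper can be exercised with a stub.
sa_learn_as() {
    "${SUDO:-sudo}" -u "${SA_USER:-spamd}" -H sa-learn "$@"
}
```

For example, sa_learn_as --dump magic, or sa_learn_as --ham on a directory of message copies the spamd user can read. Messages inside users' Maildirs are usually not readable by spamd, which is one reason to feed them through spamc instead.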
Re: No BAYES_XX tags in X-Spam-Report
Posted by Michael B Allen <io...@gmail.com>.
On Mon, Jun 22, 2015 at 8:01 PM, Reindl Harald <h....@thelounge.net> wrote:
>> [root@www .spamassassin]# pwd
>> /var/log/spamassassin/.spamassassin
>> [root@www .spamassassin]# ls -la
>> total 1100
>> drwx------ 2 spamd spamd 4096 Jun 22 19:42 .
>> drwx------ 3 spamd spamd 4096 Jun 7 00:41 ..
>> -rw------- 1 spamd spamd 45056 Jun 22 19:42 bayes_seen
>> -rw------- 1 spamd spamd 1290240 Jun 22 19:42 bayes_toks
>> -rw-r--r-- 1 spamd spamd 1869 Jun 7 00:41 user_prefs
>
>
> I doubt that SA is using the bayes of root
> so you just train the wrong bayes
So with a default install (CentOS 7 in my case and I suspect pretty
much all other systems), bayes will NOT just work by default unless
you explicitly modify /etc/mail/spamassassin/local.cf to tell sa-learn
to use the bayes db owned by spamd
(/var/log/spamassassin/.spamassassin/bayes in my case) and NOT the one
owned by root?
However, I have done this:
bayes_path /var/log/spamassassin/.spamassassin/bayes
bayes_file_mode 0777
and after running sa-learn again (as root) on ham, my db is now broken:
[root@www .spamassassin]# sa-learn --dump magic
bayes: bayes db version 0 is not able to be used, aborting! at /usr/share/perl5/vendor_perl/Mail/SpamAssassin/BayesStore/DBM.pm line 208.
bayes: bayes db version 0 is not able to be used, aborting! at /usr/share/perl5/vendor_perl/Mail/SpamAssassin/BayesStore/DBM.pm line 208.
ERROR: Bayes dump returned an error, please re-run with -D for more information
Can someone clue me in here?
Everything is installed as user / group spamd and postfix is set to
call spamassassin with user=spamd. And I assume I must run sa-learn as
root so that it can access Maildir directories and that bayes_path
tells sa-learn where the db is. So now what's the problem?
Mike
Re: No BAYES_XX tags in X-Spam-Report
Posted by Reindl Harald <h....@thelounge.net>.
On 23.06.2015 at 01:55, Michael B Allen wrote:
> How can I tell if SA is tagging using bayes?
If you see the BAYES_ tags in the headers and logs.
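A quick way to apply that check to a saved message is to look for a BAYES_ rule name in its header section. This sketch assumes the message is stored with plain LF line endings (for example a Maildir file); bayes_tag is a hypothetical helper name.

```sh
# Hypothetical helper: print the first BAYES_xx rule name found in the
# header section of a message file; exit non-zero if there is none.
bayes_tag() {
    # The header section ends at the first empty line; sed quits there,
    # so a BAYES_ string in the body is never matched.
    sed '/^$/q' "$1" | grep -oE 'BAYES_[0-9]+' | head -n 1 | grep .
}
```

spamd normally also logs a result line listing the rules that hit, so something like grep -oE 'BAYES_[0-9]+' /var/log/maillog | sort | uniq -c gives a rough tally. An empty result there, as in the original post, suggests Bayes is not yet contributing to scores.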
> [root@www .spamassassin]# pwd
> /var/log/spamassassin/.spamassassin
> [root@www .spamassassin]# ls -la
> total 1100
> drwx------ 2 spamd spamd 4096 Jun 22 19:42 .
> drwx------ 3 spamd spamd 4096 Jun 7 00:41 ..
> -rw------- 1 spamd spamd 45056 Jun 22 19:42 bayes_seen
> -rw------- 1 spamd spamd 1290240 Jun 22 19:42 bayes_toks
> -rw-r--r-- 1 spamd spamd 1869 Jun 7 00:41 user_prefs
I doubt that SA is using the bayes of root
so you just train the wrong bayes
> [root@www .spamassassin]# sa-learn --dump magic
> 0.000 0 3 0 non-token data: bayes db version
> 0.000 0 301 0 non-token data: nspam
> 0.000 0 11236 0 non-token data: nham
> 0.000 0 419941 0 non-token data: ntokens
> 0.000 0 1150469108 0 non-token data: oldest atime
> 0.000 0 1435015894 0 non-token data: newest atime
> 0.000 0 1435016419 0 non-token data: last journal sync atime
> 0.000 0 1435016432 0 non-token data: last expiry atime
> 0.000 0 0 0 non-token data: last expire atime delta
> 0.000 0 0 0 non-token data: last expire reduction count
again: SA doesn't run as root in any sane setup and hence won't use *that*
bayes DB
> [root@www .spamassassin]# cat ../../maillog | grep BAYES
> [root@www .spamassassin]#
>
> I don't see any BAYES_ tags in X-Spam-Report.
see above
> I'm using a default SA install on CentOS 7.
what is a "default install"?
it can be spamass-milter or anything else calling SA
> Do I need a local.cf? From looking at the docs, it claims bayes is
> enabled by default. How can I check this?
>
> Or maybe I haven't sa-learn'd enough spam?
no, you just train the wrong bayes and NEVER EVER should run such things
as root - http://wiki.apache.org/spamassassin/SiteWideBayesSetup
we are running spamd as well as spamass-milter under its own user and, to be
explicit, have the following line in /etc/mail/spamassassin/local.cf,
even though it's not strictly needed in that case since that's the user's home directory
bayes_path /var/lib/spamass-milter/.spamassassin/bayes
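On the question of having learned enough: SpamAssassin only starts using Bayes once the DB holds a minimum number of learned spam and ham messages, set by bayes_min_spam_num and bayes_min_ham_num (both 200 by default, as RW points out). This sketch reads sa-learn --dump magic output on stdin and compares the nspam and nham counters against those thresholds. bayes_ready is a hypothetical name, and the thresholds are parameters in case local.cf overrides them.

```sh
# Hypothetical helper: read `sa-learn --dump magic` output on stdin and
# succeed only if nspam and nham reach the given minimums (default 200,
# the documented defaults of bayes_min_spam_num / bayes_min_ham_num).
bayes_ready() {
    awk -v min_spam="${1:-200}" -v min_ham="${2:-200}" '
        # Third field of "0.000 0 301 0 non-token data: nspam" is the count.
        / non-token data: nspam$/ { spam = $3 }
        / non-token data: nham$/  { ham  = $3 }
        END {
            printf "nspam=%d nham=%d\n", spam, ham
            exit !(spam >= min_spam && ham >= min_ham)
        }'
}
```

With the numbers posted in this thread (nspam 301, nham 11236) it reports success, which fits Reindl's point that the message count is not the problem. The catch is that the dump has to come from the DB spamd actually uses, e.g. sudo -u spamd -H sa-learn --dump magic | bayes_ready.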
Re: No BAYES_XX tags in X-Spam-Report
Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
On 22.06.15 19:55, Michael B Allen wrote:
>How can I tell if SA is tagging using bayes?
>
>[root@www .spamassassin]# pwd
>/var/log/spamassassin/.spamassassin
>[root@www .spamassassin]# ls -la
>total 1100
>drwx------ 2 spamd spamd 4096 Jun 22 19:42 .
>drwx------ 3 spamd spamd 4096 Jun 7 00:41 ..
>-rw------- 1 spamd spamd 45056 Jun 22 19:42 bayes_seen
>-rw------- 1 spamd spamd 1290240 Jun 22 19:42 bayes_toks
>-rw-r--r-- 1 spamd spamd 1869 Jun 7 00:41 user_prefs
>[root@www .spamassassin]# sa-learn --dump magic
>0.000 0 3 0 non-token data: bayes db version
>0.000 0 301 0 non-token data: nspam
>0.000 0 11236 0 non-token data: nham
>0.000 0 419941 0 non-token data: ntokens
>0.000 0 1150469108 0 non-token data: oldest atime
>0.000 0 1435015894 0 non-token data: newest atime
>0.000 0 1435016419 0 non-token data: last journal sync atime
>0.000 0 1435016432 0 non-token data: last expiry atime
>0.000 0 0 0 non-token data: last expire atime delta
>0.000 0 0 0 non-token data: last expire reduction count
>I don't see any BAYES_ tags in X-Spam-Report.
How did you incorporate SpamAssassin into the mail flow? Do you run a milter,
amavisd, or simply spamc from procmailrc?
The Bayes database you have shown belongs to user spamd, but that user does
not get all the mail, does it?
--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
I just got lost in thought. It was unfamiliar territory.
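For a Postfix setup like the one described in this thread (a pipe transport running spamc with user=spamd), the user= value on that master.cf entry typically decides whose Bayes DB gets used, since spamc reports the user it runs as to spamd unless told otherwise with -u. This sketch pulls that value out of master.cf. spamc_pipe_user is a hypothetical helper, and it only handles entries where user= and argv= share a line.

```sh
# Hypothetical helper: print the user= value of master.cf entries whose
# argv runs spamc (default file: /etc/postfix/master.cf).
spamc_pipe_user() {
    grep -E 'argv=[^[:space:]]*spamc' "${1:-/etc/postfix/master.cf}" |
        grep -oE 'user=[^[:space:]]+' |
        cut -d= -f2
}
```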