Posted to users@spamassassin.apache.org by Karl Denninger <ka...@denninger.net> on 2016/11/21 16:12:55 UTC

Bayes scoring and role accounts

I'm using SpamAssassin on a system that uses Postfix as the MTA and Dovecot
for final delivery.  SpamAssassin is called from Postfix through spamd,
with:

#
# Spam Assassin bayesian filter updaters
#
sa-spam unix    -       n       n       -       -       pipe
  user=spamd:spamd argv=/usr/local/bin/sa-wrapper.pl spam ${sender}
sa-ham  unix    -       n       n       -       -       pipe
  user=spamd:spamd argv=/usr/local/bin/sa-wrapper.pl ham ${sender}

I have a substantial number of role accounts on the box, all aliased
to the various places they need to go.  Most of these do not have
entries in /etc/passwd; that is, they're not real login accounts.

The issue is that, if I am reading the code correctly, my particular Bayes
database (for "karl") is not being consulted, and can't be, for anything
that comes in to a role account, since the user part of the email address
is (obviously) not altered in the message.  As a result I get the
rulesets but none of the "training" that individual Bayes recognition
would provide, and there is no way for that training to take place,
since none of these accounts are "real".

sa-learn --dump magic -u karl shows the expected (large) number of
tokens in the database, but the same command targeting any of the role
account names shows nearly nothing (which isn't surprising, since they're
role accounts and not real user logins).
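
One way to make that comparison repeatable is to pull the counters out of
the "--dump magic" output for each account. The following is a minimal
Python sketch, not a tested tool. It assumes the usual layout in which each
magic line ends in "non-token data: <name>" and carries its value in the
third column. The function name and the sample text are hypothetical, and
the numbers in the sample are invented for illustration, not taken from
this system.

```python
import re

# Matches lines such as:
#   0.000          0       1234          0  non-token data: nspam
# The value of interest is assumed to be the third column.
MAGIC_LINE = re.compile(
    r"^\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (.+?)\s*$"
)


def parse_dump_magic(text):
    """Return {counter name: integer value} from `sa-learn --dump magic` output."""
    counters = {}
    for line in text.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


# Illustrative output only; these numbers are invented.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0       4210          0  non-token data: nspam
0.000          0       9875          0  non-token data: nham
0.000          0     152340          0  non-token data: ntokens
"""

if __name__ == "__main__":
    stats = parse_dump_magic(SAMPLE)
    print(stats["nspam"], stats["nham"], stats["ntokens"])
```

Feeding it the captured output of "sa-learn --dump magic -u NAME" for karl
and for each role account would give one small dictionary per account to
compare. Whether -u selects a separate database at all may depend on the
Bayes backend in use.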

How have people dealt with this -- or do they?

-- 
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/

Re: Bayes scoring and role accounts

Posted by Joe Quinn <jq...@pccc.com>.
On 11/21/2016 11:27 AM, Karl Denninger wrote:
>
> To add to this: the way the Bayes database gets built (other than via
> auto-add) is from anything that a user puts in the "Junk" folder.
> An hourly cron job runs sa-learn against that folder and then moves
> everything it finds there to a "Junk-Saved" folder, expiring anything
> older than 14 days from that folder (so spam emails are held for
> 2 weeks).  Dovecot is configured to deliver confirmed spam to the
> "Junk" folder as well.
>
> Is the best way to handle role accounts to (1) create a "dummy" user
> account for them and (2) have the script that runs sa-learn add spam
> not only to the target's account but also, if the target is a role
> account, to each of the role account's database entries?
> That's a somewhat messy maintenance job if/when role accounts are
> added, removed, or changed, but it appears to be the only way to
> accomplish the goal.

I can't speak to making it work with Postfix specifically, but you
usually want a site-wide Bayes database. No matter which user (real or
fake) receives the message, it gets trained as the spamd user, or
whatever user ends up running SA. That same user runs SA and reads the
same database, which gets training from everyone and classifies
based on a much more statistically useful volume of data.
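
As a rough illustration of the site-wide idea, the training step can ignore
the recipient entirely and always learn into one shared identity. The
Python sketch below only builds the command line and runs nothing. The
user name "spamd" is taken from the user=spamd:spamd setting quoted
earlier in the thread; the function name and the example path are
hypothetical. It also assumes that spamd scans mail as that same user.
Whether -u alone selects the database may depend on the Bayes backend;
with file-based storage the command would probably have to be run as
that user instead.

```python
def sitewide_learn_command(kind, mail_path, bayes_user="spamd"):
    """Build an sa-learn command that trains one shared Bayes database.

    kind      -- "spam" or "ham"
    mail_path -- message file or directory to learn from
    The recipient is deliberately not a parameter: role accounts and
    real users all train the same database.
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, "-u", bayes_user, mail_path]


if __name__ == "__main__":
    # Hypothetical path, for illustration only.
    command = sitewide_learn_command("spam", "/home/karl/Maildir/.Junk/cur")
    print(" ".join(command))
```

Because the recipient never enters into it, adding or removing a role
account requires no change to the training step.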


Re: Bayes scoring and role accounts

Posted by Karl Denninger <ka...@denninger.net>.
To add to this: the way the Bayes database gets built (other than via
auto-add) is from anything that a user puts in the "Junk" folder.
An hourly cron job runs sa-learn against that folder and then moves
everything it finds there to a "Junk-Saved" folder, expiring anything
older than 14 days from that folder (so spam emails are held for
2 weeks).  Dovecot is configured to deliver confirmed spam to the
"Junk" folder as well.
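
For readers who want to reproduce the move-and-expire part of that hourly
job, here is a minimal Python sketch. It is not the script actually in use
on this system. It assumes one message per file (Maildir-style storage),
leaves out the sa-learn call that would run before the move, and measures
age from each file's modification time. The function and directory names
are hypothetical.

```python
import os
import shutil
import time


def archive_and_expire(junk_dir, saved_dir, max_age_days=14, now=None):
    """Move every file from junk_dir to saved_dir, then delete files in
    saved_dir that are older than max_age_days.

    Returns (moved, expired) as two lists of file names.
    """
    now = time.time() if now is None else now
    os.makedirs(saved_dir, exist_ok=True)

    moved = []
    for name in sorted(os.listdir(junk_dir)):
        src = os.path.join(junk_dir, name)
        if os.path.isfile(src):
            shutil.move(src, os.path.join(saved_dir, name))
            moved.append(name)

    # Age is judged by modification time, which the move preserves.
    cutoff = now - max_age_days * 86400
    expired = []
    for name in sorted(os.listdir(saved_dir)):
        path = os.path.join(saved_dir, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            os.remove(path)
            expired.append(name)
    return moved, expired
```

A cron entry would call this once per user after the learning step has
finished, so that nothing is moved before it has been learned.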

Is the best way to handle role accounts to (1) create a "dummy" user
account for them and (2) have the script that runs sa-learn add spam
not only to the target's account but also, if the target is a role
account, to each of the role account's database entries?
That's a somewhat messy maintenance job if/when role accounts are
added, removed, or changed, but it appears to be the only way to
accomplish the goal.
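
To make the maintenance cost of that approach concrete, here is a small
Python sketch of the fan-out. It reflects one reading of the proposal:
when a real user files a message as spam, train that user's database and
also the database of every role account aliased to that user. It does not
look at which address the message was actually sent to. The alias table,
the user "ann", and both function names are hypothetical, and the sketch
only builds sa-learn command lines without running them.

```python
def learn_targets(owner, role_aliases):
    """Return the Bayes user names to train when `owner` files a message.

    role_aliases maps each role account to the real users it is aliased
    to, for example {"postmaster": ["karl"]}.
    """
    targets = [owner]
    for role, recipients in sorted(role_aliases.items()):
        if owner in recipients and role not in targets:
            targets.append(role)
    return targets


def learn_commands(kind, owner, mail_path, role_aliases):
    """Build one sa-learn command per Bayes user; nothing is executed."""
    return [
        ["sa-learn", "--" + kind, "-u", user, mail_path]
        for user in learn_targets(owner, role_aliases)
    ]


if __name__ == "__main__":
    # Hypothetical alias table; in practice it would have to be kept in
    # step with the Postfix alias configuration.
    aliases = {"postmaster": ["karl"], "sales": ["karl", "ann"]}
    for command in learn_commands("spam", "karl", "/tmp/junk", aliases):
        print(" ".join(command))
```

The alias table is the part that has to track every added, removed, or
changed role account, which is the maintenance burden described above.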

-- 
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/