
Posted to users@spamassassin.apache.org by Grant Taylor <gt...@tnetconsulting.net> on 2018/11/26 03:54:14 UTC

Is $THIS possible?

Is it possible to have per recipient rules (when running spamd & 
spamass-milter) that read a (hashed) list of addresses?

I'm pontificating about creating tests against To: / CC: addresses to see how 
many of them I've added to a list.

Ultimately I'd like to have a (hashed) list of addresses that I recognize 
and add (0.1?) to the spam score for each unknown address.
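SpamAssassin plugins are Perl modules, so the following is not a plugin; it is only a rough Python sketch of the scoring idea, under stated assumptions. It assumes the "hash" is a SHA-256 digest of the trimmed, lowercased address, that each distinct To:/Cc: address is counted once, and that 0.1 is the per-address penalty. The function names are hypothetical.

```python
import hashlib
from email import message_from_string
from email.utils import getaddresses

# Hypothetical per-address penalty, taken from the "(0.1?)" in the question.
UNKNOWN_ADDRESS_SCORE = 0.1


def hash_address(address: str) -> str:
    """Normalise an address (trim, lowercase) and return its SHA-256 hex digest."""
    return hashlib.sha256(address.strip().lower().encode("utf-8")).hexdigest()


def build_known_set(addresses) -> set:
    """The (hashed) list of recognized addresses."""
    return {hash_address(a) for a in addresses}


def unknown_recipient_score(raw_message: str, known_hashes: set) -> float:
    """Add UNKNOWN_ADDRESS_SCORE for each distinct To:/Cc: address not in known_hashes."""
    msg = message_from_string(raw_message)
    header_values = msg.get_all("To", []) + msg.get_all("Cc", [])
    # getaddresses() splits "Name <addr>, addr2" lists into (name, addr) pairs.
    recipients = {hash_address(addr) for _name, addr in getaddresses(header_values) if addr}
    unknown = recipients - known_hashes
    return round(len(unknown) * UNKNOWN_ADDRESS_SCORE, 2)


if __name__ == "__main__":
    known = build_known_set(["grant@example.net", "Friend@Example.org"])
    raw = (
        "From: someone@example.org\n"
        "To: Grant <grant@example.net>, stranger@example.com\n"
        "Cc: friend@example.org, other@example.com\n"
        "Subject: hello\n"
        "\n"
        "body\n"
    )
    print(unknown_recipient_score(raw, known))  # two unknown addresses -> 0.2
```

Hashing before comparison means the stored list never holds plain-text addresses, at the cost of having to normalise case and whitespace identically on both sides.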

Is anything like this possible with SpamAssassin?  Or do I need to back 
up and refactor my problem / solution?



-- 
Grant. . . .
unix || die

Re: Is $THIS possible?

Posted by Grant Taylor <gt...@tnetconsulting.net>.

Hi Giovanni,

On 11/27/2018 12:56 AM, Giovanni Bechis wrote:
> I do not know if it's viable for your own use but amavisd penpal feature 
> could be an option (https://www.ijs.si/software/amavisd/#features-spam). It 
> creates a redis database where it correlates outbound msg-id and replies 
> so it can subtract score if an email msg is a reply to a known sender.
Intriguing.  I'll have to check that out.

It sounds like it's conceptually similar to a stateful firewall for 
email.  That is, if there is known email conversation state (akin to 
connection state), a (small?) value is deducted from the spam score. 
This means that messages that might be flagged as spam on their own might 
pass through unmodified if they are part of an ongoing conversation.
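A rough illustration of that "stateful firewall" analogy, and explicitly not amavisd's actual implementation: remember the Message-IDs of outbound mail and, when an incoming message's In-Reply-To: or References: header names one of them, apply a small negative adjustment. Assumptions: an in-memory set stands in for the redis database, and the -1.0 adjustment and all names are hypothetical placeholders.

```python
import re
from email import message_from_string

# Hypothetical adjustment for a reply to our own outbound mail (negative lowers the score).
REPLY_ADJUSTMENT = -1.0

# A Message-ID looks like <local@domain>; pull every such token out of a header.
_MSG_ID = re.compile(r"<[^<>]+>")


class ConversationTracker:
    """Remembers outbound Message-IDs; an in-memory set stands in for Redis."""

    def __init__(self) -> None:
        self._sent_ids: set = set()

    def record_outgoing(self, raw_message: str) -> None:
        msg = message_from_string(raw_message)
        self._sent_ids.update(_MSG_ID.findall(msg.get("Message-ID", "")))

    def score_incoming(self, raw_message: str) -> float:
        msg = message_from_string(raw_message)
        refs = " ".join(msg.get_all("In-Reply-To", []) + msg.get_all("References", []))
        # Known conversation state: the message refers to something we sent.
        if any(ref in self._sent_ids for ref in _MSG_ID.findall(refs)):
            return REPLY_ADJUSTMENT
        return 0.0


if __name__ == "__main__":
    tracker = ConversationTracker()
    tracker.record_outgoing("Message-ID: <123@example.net>\nTo: friend@example.org\n\nhi\n")
    print(tracker.score_incoming("In-Reply-To: <123@example.net>\n\nthanks\n"))  # -1.0
```

Message-IDs are not secret, so a sender who has seen one could forge a "reply"; keeping the adjustment small limits the damage.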

Very interesting.

Thank you for sharing amavisd penpal with me.  :-)

-- 
Grant. . . .
unix || die

Re: Is $THIS possible?

Posted by Giovanni Bechis <gi...@paclan.it>.

On 11/26/18 11:10 PM, Grant Taylor wrote:
> On 11/26/2018 02:33 PM, Martin Gregorie wrote:
>> I think that fear is unfounded
> 
> Please don't mistake my laziness for fear.  I simply am not motivated enough to construct a solution that will harvest outgoing recipient addresses.
> 
I do not know if it's viable for your own use but amavisd penpal feature could be an option (https://www.ijs.si/software/amavisd/#features-spam).
It creates a redis database where it correlates outbound msg-id and replies so it can subtract score if an email msg is a reply to a known sender.

 Giovanni


> I might be interested and motivated enough to (eventually) construct something to check against an LDAP address book.  I've been pontificating about creating an LDAP address book anyway.  So if something else can make use of it, all the better.  Especially if said something else is directly related to email (filtering).
> 
>> IOW, if you build a whitelist containing just the addresses your outgoing mail is addressed to and periodically trim it to retain only addresses that stuff has been sent to in the last 24 months, I predict that your list size will stabilise despite user churn simply because most people's address lists don't change much from year to year.
> 
> That all makes sense and I tend to agree with it.  It's just not what I'm currently pontificating about doing.
> 
>> And, of course, mail concerning online purchases is 99% incoming, so the addresses on it will never get into this type of whitelist.
> 
> I initially thought the same thing about address books.  But some MUAs have an option (maybe on by default) that automatically adds senders and / or outgoing recipients to their address book.  I prefer to manually manage my address book.  But that's just me and I do realize that I'm odd like that.

Re: Is $THIS possible?

Posted by Grant Taylor <gt...@tnetconsulting.net>.

On 11/26/2018 02:33 PM, Martin Gregorie wrote:
> I think that fear is unfounded

Please don't mistake my laziness for fear.  I simply am not motivated 
enough to construct a solution that will harvest outgoing recipient 
addresses.

I might be interested and motivated enough to (eventually) construct 
something to check against an LDAP address book.  I've been 
pontificating about creating an LDAP address book anyway.  So if something 
else can make use of it, all the better.  Especially if said something 
else is directly related to email (filtering).

> IOW, if you build a whitelist containing just the addresses your outgoing 
> mail is addressed to and periodically trim it to retain only addresses 
> that stuff has been sent to in the last 24 months, I predict 
> that your list size will stabilise despite user churn simply because 
> most people's address lists don't change much from year to year.

That all makes sense and I tend to agree with it.  It's just not what 
I'm currently pontificating about doing.

> And, of course, mail concerning online purchases is 99% incoming, so 
> the addresses on it will never get into this type of whitelist.

I initially thought the same thing about address books.  But some MUAs 
have an option (maybe on by default) that automatically adds senders and 
/ or outgoing recipients to their address book.  I prefer to manually 
manage my address book.  But that's just me and I do realize that I'm 
odd like that.

-- 
Grant. . . .
unix || die

Re: Is $THIS possible?

Posted by Martin Gregorie <ma...@gregorie.org>.

On Mon, 2018-11-26 at 12:38 -0700, Grant Taylor wrote:
> I agree with your logic.  But I don't know if I want to organically
> grow the list based on outgoing email recipients.  I think I'd rather
> use the contents of address books.  (Obviously something needs to get
> said address book data from MUAs to the server where it can use it.)
> 
I think that fear is unfounded unless your user population has a fairly
high turnover. Mine is static: my database is a mail archive with just
myself as user, so it has an absolutely stable user base. I use a view to
generate the address whitelist by selecting only the addresses that
I've sent mail to, so, for instance, this automatically deselects almost
all the addresses in mass mailings I've received. The current stats are:

messages archived:    189997 
all addresses:         15936
whitelisted addresses: 10919

I don't normally keep stats on the whitelist size but I do watch the
other two and have noticed that the whole address list has stayed at
around 15,000 entries for several years, which is fairly amazing
considering the number of 'mass mailings' I get from friends and around
Xmas.

IOW, if you build a whitelist containing just the addresses your
outgoing mail is addressed to and periodically trim it to retain only
addresses that stuff has been sent to in the last 24 months, I
predict that your list size will stabilise despite user churn simply
because most people's address lists don't change much from year to
year. And, of course, mail concerning online purchases is 99% incoming,
so the addresses on it will never get into this type of whitelist.
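Martin's setup uses a PostgreSQL view over his mail archive. The sketch below approximates the idea with Python's built-in sqlite3 module so it runs anywhere; the archive table, its columns and the whitelist view are assumptions, not his actual schema. Because the view selects only outgoing mail sent within the last 24 months, trimming is implicit, and inbound-only addresses (such as online-purchase mail) never appear.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def open_archive(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS archive (
            address   TEXT NOT NULL,
            direction TEXT NOT NULL CHECK (direction IN ('in', 'out')),
            sent_at   TEXT NOT NULL   -- UTC, 'YYYY-MM-DD HH:MM:SS'
        );
        -- Whitelist = addresses we have sent mail to in the last 24 months.
        CREATE VIEW IF NOT EXISTS whitelist AS
            SELECT DISTINCT lower(address) AS address
            FROM archive
            WHERE direction = 'out'
              AND sent_at >= datetime('now', '-24 months');
    """)
    return conn


def log_message(conn: sqlite3.Connection, address: str, direction: str, when: datetime) -> None:
    # Store UTC in the same text format SQLite's datetime() produces, so comparisons work.
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    conn.execute(
        "INSERT INTO archive (address, direction, sent_at) VALUES (?, ?, ?)",
        (address, direction, stamp),
    )


def whitelisted(conn: sqlite3.Connection) -> set:
    return {row[0] for row in conn.execute("SELECT address FROM whitelist")}


if __name__ == "__main__":
    conn = open_archive()
    now = datetime.now(timezone.utc)
    log_message(conn, "Friend@example.org", "out", now - timedelta(days=30))
    log_message(conn, "old@example.org", "out", now - timedelta(days=3 * 365))
    log_message(conn, "shop@example.com", "in", now - timedelta(days=1))
    print(whitelisted(conn))  # {'friend@example.org'}
```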
 
Martin



> > Other points:
> > 
> > - if each address entry carries the date mail was last sent to it
> > you'll have an easy way to purge the list of addresses that nobody
> > has corresponded with in, say, the last two years: this 'time to live'
> > is long enough to deal with annual subscriptions, etc.
> > 
> > - you'll also need a tool for removing spammers that got on because a
> > user clicked 'send' without reading a message carefully enough to see
> > that it was spam
> 
> I understand your points.  But I think your points' merit depends on the
> organic / automatic growth from outgoing email.  Which I'm not wanting
> to do at this time.
> 
> > I've had this sort of system running for about 10 years now, using
> > PostgreSQL as the database. By and large this looks after itself without
> > needing more than sporadic maintenance, usually when PostgreSQL has a
> > major upgrade every year or two. But then PostgreSQL is designed to be
> > self maintaining apart from making periodic backups. I do these weekly.
> 
> ACK
> 
> I wonder if I could leverage LDAP instead of a (more) traditional SQL
> database.  That way the same data set might be used for more than just
> this purpose.  It might even be possible to use the LDAP address book as
> the data source for this.  }:-)
> 
> I suspect I could just as easily have something dynamically update the
> LDAP address book as I could an SQL database.  Granted, the mechanics
> would be different, but it could still be done.
> 
> Thank you for confirming that (something along the lines of) $THIS is
> possible.

Re: Is $THIS possible?

Posted by Grant Taylor <gt...@tnetconsulting.net>.

On 11/26/2018 06:08 AM, Martin Gregorie wrote:
> Write yourself a plugin which looks up a database table of known 
> addresses. That's not hard if you know a bit of Perl,

ACK

> though the list of incoming addresses sounds too simplistic to be much 
> use: how would it distinguish between spammers and non-spammers?

My idea is to use the number of recognized vs unrecognized addresses in 
the To: & CC: headers as a signal of how likely the message is to be 
spam.  (This is where I was considering adding something to the spam 
score for each unrecognized address.)

> Instead, consider populating the database with addresses that your users 
> have sent mail to because by and large these will not be spammers.

I agree with your logic.  But I don't know if I want to organically grow 
the list based on outgoing email recipients.  I think I'd rather use the 
contents of address books.  (Obviously something needs to get said 
address book data from MUAs to the server where it can use it.)

> Other points:
> 
> - if each address entry carries the date mail was last sent to it 
> you'll have an easy way to purge the list of addresses that nobody 
> has corresponded with in, say, the last two years: this 'time to live' 
> is long enough to deal with annual subscriptions, etc.
> 
> - you'll also need a tool for removing spammers that got on because a 
> user clicked 'send' without reading a message carefully enough to see 
> that it was spam

I understand your points.  But I think your points' merit depends on the 
organic / automatic growth from outgoing email.  Which I'm not wanting 
to do at this time.

> I've had this sort of system running for about 10 years now, using 
> PostgreSQL as the database. By and large this looks after itself without 
> needing more than sporadic maintenance, usually when PostgreSQL has a 
> major upgrade every year or two. But then PostgreSQL is designed to be 
> self maintaining apart from making periodic backups. I do these weekly.

ACK

I wonder if I could leverage LDAP instead of a (more) traditional SQL 
database.  That way the same data set might be used for more than just 
this purpose.  It might even be possible to use the LDAP address book as 
the data source for this.  }:-)

I suspect I could just as easily have something dynamically update the 
LDAP address book as I could an SQL database.  Granted, the mechanics 
would be different, but it could still be done.
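If the lookup were done against an LDAP address book instead of SQL, each To:/Cc: address becomes one directory search. The part worth getting right is the search filter, because the addresses come from untrusted headers. This sketch assumes entries use the standard inetOrgPerson object class with the address in the `mail` attribute; the actual schema and the client library that runs the search (python-ldap, ldap3, etc.) are not shown and are assumptions. The escaping follows RFC 4515.

```python
# Characters RFC 4515 requires to be escaped (backslash + two hex digits)
# inside an LDAP search filter assertion value.
_FILTER_ESCAPES = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}


def escape_filter_value(value: str) -> str:
    """Escape a string for safe use as an assertion value in an LDAP filter."""
    return "".join(_FILTER_ESCAPES.get(ch, ch) for ch in value)


def address_book_filter(address: str, object_class: str = "inetOrgPerson") -> str:
    """Filter matching address-book entries whose 'mail' attribute equals address."""
    return f"(&(objectClass={object_class})(mail={escape_filter_value(address.strip())}))"


if __name__ == "__main__":
    print(address_book_filter("grant@example.net"))
    # (&(objectClass=inetOrgPerson)(mail=grant@example.net))
    print(address_book_filter("*)(mail=*"))  # wildcard injection is neutralised
```

Without the escaping, a crafted header value such as `*)(mail=*` could turn the lookup into a wildcard match and make every address look "recognized".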

Thank you for confirming that (something along the lines of) $THIS is 
possible.

-- 
Grant. . . .
unix || die

Re: Is $THIS possible?

Posted by Henrik K <he...@hege.li>.

On Mon, Nov 26, 2018 at 01:08:04PM +0000, Martin Gregorie wrote:
>
> Instead, consider populating the database with addresses that your
> users have sent mail to because by and large these will not be
> spammers.

If using postfix, one could use my postpals tool for this too.

http://mailfud.org/postpals/

Re: Is $THIS possible?

Posted by Martin Gregorie <ma...@gregorie.org>.

On Sun, 2018-11-25 at 20:54 -0700, Grant Taylor wrote:
> Ultimately I'd like to have a (hashed) list addresses that I
> recognize and add (0.1?) to the spam score for each unknown address.
> 
Write yourself a plugin which looks up a database table of known
addresses. That's not hard if you know a bit of Perl, though the list of
incoming addresses sounds too simplistic to be much use: how would it
distinguish between spammers and non-spammers?

Instead, consider populating the database with addresses that your
users have sent mail to because by and large these will not be
spammers. Other points:

- if each address entry carries the date mail was last sent to it
  you'll have an easy way to purge the list of addresses that nobody
  has corresponded with in, say, the last two years: this 'time to
  live' is long enough to deal with annual subscriptions, etc.

- you'll also need a tool for removing spammers that got on because a
  user clicked 'send' without reading a message carefully enough to
  see that it was spam
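The two maintenance points above can be sketched with an explicit "last sent" date per address: a TTL purge plus a manual removal call for spammers that slipped in. This is a hypothetical in-memory Python stand-in for the PostgreSQL table, not Martin's actual tooling; the two-year TTL follows the "say, the last two years" figure, and the class and method names are illustrative.

```python
from datetime import datetime, timedelta, timezone

TTL = timedelta(days=2 * 365)  # "say, the last two years"


class Whitelist:
    def __init__(self) -> None:
        self.last_sent: dict = {}  # normalised address -> datetime mail was last sent to it

    @staticmethod
    def _key(address: str) -> str:
        return address.strip().lower()

    def record_sent(self, address: str, when: datetime) -> None:
        """Called for each outgoing recipient; keeps the most recent send date."""
        key = self._key(address)
        prev = self.last_sent.get(key)
        if prev is None or when > prev:
            self.last_sent[key] = when

    def purge_stale(self, now: datetime, ttl: timedelta = TTL) -> list:
        """Drop addresses nobody has written to within ttl; return what was removed."""
        cutoff = now - ttl
        stale = [a for a, t in self.last_sent.items() if t < cutoff]
        for a in stale:
            del self.last_sent[a]
        return stale

    def remove(self, address: str) -> bool:
        """Manual removal, e.g. a spammer a user replied to by mistake."""
        return self.last_sent.pop(self._key(address), None) is not None

    def __contains__(self, address: str) -> bool:
        return self._key(address) in self.last_sent


if __name__ == "__main__":
    now = datetime(2018, 11, 26, tzinfo=timezone.utc)
    wl = Whitelist()
    wl.record_sent("Friend@example.org", now - timedelta(days=10))
    wl.record_sent("old@example.org", now - timedelta(days=800))
    print(wl.purge_stale(now))  # ['old@example.org']
```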

I've had this sort of system running for about 10 years now, using
PostgreSQL as the database. By and large this looks after itself
without needing more than sporadic maintenance, usually when PostgreSQL
has a major upgrade every year or two. But then PostgreSQL is designed
to be self maintaining apart from making periodic backups. I do these
weekly.

Martin