Posted to users@spamassassin.apache.org by SuperDuper <dm...@structerre.com.au> on 2011/10/25 09:51:48 UTC

5000 x whitelist_from or whitelist_auth entries - performance hit?

I am planning on exporting a list of our client's email addresses into a file
with 5000 separate lines as such:
whitelist_from client@somebody.co


I'm running an Apple XServe with Intel Xeon quad-cores and 6 GB RAM -
processor fairly underutilised at the moment.  Are 5000 whitelist entries
expected to have a dramatic performance impact?

Also, further to this, will replacing the whitelist_from with whitelist_auth
make a dramatic difference?

Approximately what percentage of servers out there are configured correctly
so that whitelist_auth works correctly?
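
For illustration, the export described above could be sketched roughly as follows. This is only a sketch under assumptions: the function name build_whitelist_lines, the loose address check, and the lower-casing of addresses are made up for the example and are not part of SpamAssassin. It turns a raw list of addresses into one config line per address, dropping blanks, duplicates, and obviously malformed entries.

```python
import re

# Deliberately loose address check (an assumption for this sketch):
# exactly one "@", no whitespace, and a dot somewhere in the domain part.
ADDRESS_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def build_whitelist_lines(addresses, directive="whitelist_auth"):
    """Return sorted, de-duplicated config lines, one address per line."""
    seen = set()
    for raw in addresses:
        addr = raw.strip().lower()
        if not addr or addr.startswith("#"):
            continue  # skip blank lines and comments in the export
        if not ADDRESS_RE.match(addr):
            continue  # skip anything that does not look like an address
        seen.add(addr)
    return [f"{directive} {addr}" for addr in sorted(seen)]


if __name__ == "__main__":
    sample = ["client@somebody.co", "Client@Somebody.co ", "not-an-address", ""]
    for line in build_whitelist_lines(sample):
        print(line)
```

Passing directive="whitelist_from" would produce the other form of the list from the same export.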




Re: 5000 x whitelist_from or whitelist_auth entries - performance hit?

Posted by Benny Pedersen <me...@junc.org>.
On Tue, 25 Oct 2011 11:21:07 +0200, Robert Schetterer wrote:
> You should choose another way of whitelisting,
> e.g. bypass SpamAssassin for trusted server IPs.
> Anyway, why not use e.g. whitelist_from *@somebody.co ?

This opens the door to forged mail, for example where sender and
recipient are the same address (never seen in my logs). So if the MTA
is not checking sender authentication, don't use whitelist_from; it is
safe to use whitelist_auth.
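
The difference can be illustrated with a toy model. This is a simplified sketch, not SpamAssassin's actual code: the function names are hypothetical, only the "*" and "?" wildcards are handled, and a single auth_passed flag stands in for an SPF or DKIM pass. It shows why a forged sender still matches a whitelist_from style entry but not a whitelist_auth style one.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob-style pattern (only * and ?) into a compiled regex."""
    escaped = re.escape(pattern.lower())
    escaped = escaped.replace(r"\*", ".*").replace(r"\?", ".")
    return re.compile(f"^{escaped}$")


def is_whitelisted(sender, patterns, require_auth, auth_passed):
    """Toy model of the two directives.

    require_auth=False mimics whitelist_from: the address match is enough.
    require_auth=True mimics whitelist_auth: the address must match AND the
    message must have passed sender authentication (SPF or DKIM).
    """
    matched = any(glob_to_regex(p).match(sender.lower()) for p in patterns)
    if not matched:
        return False
    return auth_passed if require_auth else True


if __name__ == "__main__":
    patterns = ["*@somebody.co"]
    # A forged message: the address matches but authentication failed.
    print(is_whitelisted("forger@somebody.co", patterns, False, False))
    print(is_whitelisted("forger@somebody.co", patterns, True, False))
```

In this model the first call accepts the forged sender and the second rejects it, which is the point being made about whitelist_from.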

Re: 5000 x whitelist_from or whitelist_auth entries - performance hit?

Posted by John Hardin <jh...@impsec.org>.
On Tue, 25 Oct 2011, RW wrote:

> On Tue, 25 Oct 2011 06:28:41 -0700 (PDT)
> John Hardin wrote:
>
>> Seconded. MTAs typically have efficient facilities for white- or
>> black-listing specific email addresses. Use the capabilities of your
>> MTA and glue layer to completely bypass SA for those addresses since
>> you _know_ you want to receive mail from them.
>
> The downside to that is that it's not going through Bayes, so there's
> no auto-learning or atime updates. So when someone with a whitelisted
> address delegates, moves on, or uses a different account, Bayes may be
> less well prepared than it would otherwise be. I suspect that in some
> cases MTA whitelisting may actually lead to a worse FP rate than doing
> nothing - particularly where BAYES_00 has been given a more substantial
> score.

Modulo manual training with classified & miss corpora, of course. I 
distrust autolearn, but then I've never administered SA in a large user 
environment.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   It is not the business of government to make men virtuous or
   religious, or to preserve the fool from the consequences of his own
   folly.                                              -- Henry George
-----------------------------------------------------------------------
  320 days since the first successful private orbital launch (SpaceX)

Re: 5000 x whitelist_from or whitelist_auth entries - performance hit?

Posted by RW <rw...@googlemail.com>.
On Tue, 25 Oct 2011 06:28:41 -0700 (PDT)
John Hardin wrote:

 
> Seconded. MTAs typically have efficient facilities for white- or 
> black-listing specific email addresses. Use the capabilities of your
> MTA and glue layer to completely bypass SA for those addresses since
> you _know_ you want to receive mail from them.


The downside to that is that it's not going through Bayes, so there's
no auto-learning or atime updates. So when someone with a whitelisted
address delegates, moves on, or uses a different account, Bayes may be
less well prepared than it would otherwise be. I suspect that in some
cases MTA whitelisting may actually lead to a worse FP rate than doing
nothing - particularly where BAYES_00 has been given a more substantial
score.

Re: 5000 x whitelist_from or whitelist_auth entries - performance hit?

Posted by John Hardin <jh...@impsec.org>.
On Tue, 25 Oct 2011, Robert Schetterer wrote:

> On 25.10.2011 09:51, SuperDuper wrote:
>>
>> I am planning on exporting a list of our client's email addresses into a file
>> with 5000 separate lines as such:
>> whitelist_from client@somebody.co
>
> You should choose another way of whitelisting,
> e.g. bypass SpamAssassin for trusted server IPs.

Seconded. MTAs typically have efficient facilities for white- or 
black-listing specific email addresses. Use the capabilities of your MTA 
and glue layer to completely bypass SA for those addresses since you 
_know_ you want to receive mail from them.
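
As a rough sketch of what such a glue-layer check might look like (the names load_trusted_senders and should_scan are hypothetical, and a real MTA would normally use its own lookup tables rather than Python), the decision comes down to a set membership test whose average cost does not grow with the size of the list:

```python
def load_trusted_senders(lines):
    """Build a lookup set from an iterable of address lines (blank lines skipped)."""
    return frozenset(line.strip().lower() for line in lines if line.strip())


def should_scan(envelope_sender, trusted):
    """Return False when the sender is trusted and the scan can be skipped."""
    return envelope_sender.strip().lower() not in trusted


if __name__ == "__main__":
    trusted = load_trusted_senders(["client@somebody.co\n", "\n"])
    for sender in ("Client@Somebody.co", "stranger@example.org"):
        action = "scan with SA" if should_scan(sender, trusted) else "bypass SA"
        print(sender, "->", action)
```

Note that this check trusts the envelope sender as given, so it is only as reliable as whatever sender verification the MTA performs before it.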

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   False is the idea of utility that sacrifices a thousand real
   advantages for one imaginary or trifling inconvenience; that would
   take fire from men because it burns, and water because one may drown
   in it; that has no remedy for evils except destruction. The laws
   that forbid the carrying of arms are laws of such a nature. They
   disarm only those who are neither inclined nor determined to commit
   crime.               -- Cesare Beccaria, quoted by Thomas Jefferson
-----------------------------------------------------------------------
  320 days since the first successful private orbital launch (SpaceX)

Re: 5000 x whitelist_from or whitelist_auth entries - performance hit?

Posted by Robert Schetterer <ro...@schetterer.org>.
On 25.10.2011 09:51, SuperDuper wrote:
> 
> I am planning on exporting a list of our client's email addresses into a file
> with 5000 separate lines as such:
> whitelist_from client@somebody.co
> 
> 
> I'm running an Apple XServe with Intel Xeon quad-cores and 6 GB RAM -
> processor fairly underutilised at the moment.  Are 5000 whitelist entries
> expected to have a dramatic performance impact?
> 
> Also, further to this, will replacing the whitelist_from with whitelist_auth
> make a dramatic difference?
> 
> Approximately what percentage of servers out there are configured correctly
> so that whitelist_auth works correctly?
> 
> 
You should choose another way of whitelisting,
e.g. bypass SpamAssassin for trusted server IPs.
Anyway, why not use e.g. whitelist_from *@somebody.co ?

-- 
Best Regards

Robert Schetterer

Germany/Munich/Bavaria

Re: 5000 x whitelist_from or whitelist_auth entries - performance hit?

Posted by Martin Gregorie <ma...@gregorie.org>.
On Tue, 2011-10-25 at 00:51 -0700, SuperDuper wrote:
> I am planning on exporting a list of our client's email addresses into a file
> with 5000 separate lines as such:
> whitelist_from client@somebody.co
> 
I do essentially the same thing with an SA plugin and rule plus a
database. 

Background: I archive all incoming and outgoing mail in a PostgreSQL
database because it keeps my mail folders nice and empty while making
access to archived mail somewhat faster than searching through mail
folders is. The archive schema includes a view that contains only the
addresses of people I've sent mail to. The plugin does lookups on this
view and has an associated rule that whitelists hits by applying a
suitably large negative score. The benefit of handling whitelisting this
way is that updating is completely automatic and doesn't require SA to
be stopped and restarted each time the list changes: every time I write
or reply to a new correspondent they appear in the view.

Suggestion: there is nothing to stop the plugin from doing its lookups
against a table provided that it contains at least the same column as
the view and you have a way of keeping the table's contents up to date.
The view looks like this:

create view whitelist as
        select  distinct email
        from    address a, addresstype t
        where   a.archive='yes' and 
                a.self = 'no' and
                a.sdbk=t.asdbk and 
                t.type='To';

So a table like the following should be fine and is probably general
enough for it to be used without modification by any RDBMS. Of course it
can have other columns that help to maintain the table and/or make it
useful for other related tasks, e.g. a client list:

create table whitelist 
(
	email varchar(80) primary key
);
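
To show the kind of lookup involved, here is a minimal sketch against a table of that shape. It is not the plugin itself: it uses Python with an in-memory SQLite database purely as a stand-in for PostgreSQL, the helper names are hypothetical, and "insert or ignore" is SQLite-specific syntax.

```python
import sqlite3


def open_demo_db():
    """Create an in-memory database holding the single-column whitelist table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table whitelist (email varchar(80) primary key)")
    return conn


def add_correspondent(conn, email):
    """Add an address; duplicates are ignored thanks to the primary key."""
    conn.execute(
        "insert or ignore into whitelist (email) values (?)", (email.lower(),)
    )
    conn.commit()


def is_whitelisted(conn, email):
    """Primary-key lookup: True if the address is present in the table."""
    row = conn.execute(
        "select 1 from whitelist where email = ?", (email.lower(),)
    ).fetchone()
    return row is not None


if __name__ == "__main__":
    conn = open_demo_db()
    add_correspondent(conn, "client@somebody.co")
    print(is_whitelisted(conn, "Client@Somebody.co"))
    print(is_whitelisted(conn, "stranger@example.org"))
```

Because the lookup happens at scan time, new rows take effect immediately without any restart, which is the benefit described above.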

If this sounds useful to you, the plugin is available here:
http://www.libelle-systems.com/downloads/ma/docs/manual/whitelisting.html

I should probably package the plugin with a table definition and make it
available for freestanding use but that hasn't happened yet: maybe I
should make that my next mini-project.


Martin