Posted to users@spamassassin.apache.org by David Allsopp <dr...@metastack.com> on 2008/09/11 13:57:50 UTC

Querying the AWL

I've been happily using SpamAssassin with MIMEDefang for nearly a year now.
I have a question about controlling and querying the whitelist.

The per-user automatic whitelist is enabled and clearly doing "something"
(because it's growing in size) but I can't find much documentation about it.
Is there any way to query the email addresses stored in the AWL? For
example, periodically I wouldn't mind going through the AWL and promoting
addresses to the actual whitelist for each user (and then mark all the
others as permanently blacklisted) just to "help" the AWL on its way. Also,
as I use MIMEDefang, this would allow me to implement the blacklist earlier
without actually calling spamc as I could bounce the email immediately
following the SMTP FROM clause. Looking through the docs - I can't see any
way of querying the addresses and scores from the AWL... did I miss
something?

Ultimately, what I'm planning on doing, either in my MIMEDefang filter or by
parsing the sendmail logs every now and then, is to update users' whitelists
such that any email address emailed *by* a user is automatically added to
their personal whitelist in user_prefs. Additionally, because most of my
users use Outlook, I'd periodically synchronise Outlook address books with
the server. MIMEDefang is configured to bounce email above a certain
threshold: giving it the users' address books allows this bounce threshold
to be very low (e.g. 3-5) as MIMEDefang could use a higher bounce threshold
(e.g. 10) for recognised email addresses - which would hopefully still catch
SPAM with a forged from address (though from what I can see it's relatively
rare to get SPAM from a forged address that you actually know).

Is this:

a) Sensible (is it a good idea to have a huge number of email addresses in
user_prefs?)
b) Has anyone configured SpamAssassin (via MIMEDefang or any other milter)
to work this way before?

I've googled around and couldn't see anything. I'm invoking SpamAssassin via
spamc running version 3.2.5 on Fedora 9 but I don't expect that's relevant
here.

Thanks in advance for any advice/tips


David

RE: Querying the AWL

Posted by Giampaolo Tomassoni <g....@libero.it>.

> -----Original Message-----
> From: David Allsopp [mailto:dra-news@metastack.com]
> Sent: Thursday, September 11, 2008 1:58 PM
> 
> I've been happily using SpamAssassin with MIMEDefang for nearly a year
> now.
> I have a question about controlling and querying the whitelist.
> 
> The per-user automatic whitelist is enabled and clearly doing
> "something" (because it's growing in size) but I can't find much
> documentation about it.
> Is there any way to query the email addresses stored in the AWL? For
> example, periodically I wouldn't mind going through the AWL and
> promoting
> addresses to the actual whitelist for each user (and then mark all the
> others as permanently blacklisted) just to "help" the AWL on its way.
> Also,
> as I use MIMEDefang, this would allow me to implement the blacklist
> earlier
> without actually calling spamc as I could bounce the email immediately
> following the SMTP FROM clause. Looking through the docs - I can't see
> any
> way of querying the addresses and scores from the AWL... did I miss
> something?

You can use an SQL backend to store the AWL data, which makes querying the
AWL database much easier.

Have a look at the Mail::SpamAssassin::Plugin::AWL perldoc to set up an
SQL-backed AWL.
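
A minimal sketch of what such a query might look like, written in Python with
an in-memory SQLite database standing in for the real backend. The table
layout (an "awl" table with username, email, ip, count and totscore columns)
is assumed from the SQL schema shipped with SpamAssassin 3.2.x and should be
checked against your own installation. The sample rows and the mean_scores
and demo_connection helpers are made up for illustration.

```python
import sqlite3

# Assumed layout of the SQL-backed AWL table; verify against your install.
SCHEMA = """
CREATE TABLE awl (
    username TEXT NOT NULL,
    email    TEXT NOT NULL,
    ip       TEXT NOT NULL,
    "count"  INTEGER NOT NULL,
    totscore REAL NOT NULL,
    PRIMARY KEY (username, email, ip)
)
"""


def mean_scores(conn, username, min_count=3):
    """Return (email, ip, mean score) rows for one user, lowest mean first.

    Entries seen fewer than min_count times are skipped, since a mean
    over one or two messages says little about the sender.
    """
    cur = conn.execute(
        """
        SELECT email, ip, totscore / "count" AS mean
        FROM awl
        WHERE username = ? AND "count" >= ?
        ORDER BY mean ASC
        """,
        (username, min_count),
    )
    return cur.fetchall()


def demo_connection():
    """Build an in-memory database with a few invented AWL rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO awl VALUES (?, ?, ?, ?, ?)",
        [
            ("alice", "friend@example.com", "192.0", 10, -12.0),
            ("alice", "bulk@example.net", "203.0", 5, 60.0),
            ("alice", "once@example.org", "198.51", 1, 2.0),
            ("bob", "friend@example.com", "192.0", 4, 8.0),
        ],
    )
    return conn


if __name__ == "__main__":
    for email, ip, mean in mean_scores(demo_connection(), "alice"):
        print(f"{email:24} {ip:8} {mean:6.2f}")
```

Senders with a strongly negative mean would be the candidates for promotion
to a whitelist, and those with a high mean the candidates for a blacklist.
Where to draw those lines is a policy decision this sketch leaves open.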


> Ultimately, what I'm planning on doing, either in my MIMEDefang filter
> or by parsing the sendmail logs every now and then, is to update
> users' whitelists such that any email address emailed *by* a user is
> automatically added to their personal whitelist in user_prefs.
> Additionally, because most of my users use Outlook, I'd periodically
> synchronise Outlook address books with the server. MIMEDefang is
> configured to bounce email above a certain threshold: giving it the
> users' address books allows this bounce threshold to be very low
> (e.g. 3-5) as MIMEDefang could use a higher bounce threshold
> (e.g. 10) for recognised email addresses - which would hopefully still
> catch SPAM with a forged from address (though from what I can see it's
> relatively rare to get SPAM from a forged address that you actually know).
> 
> Is this:
> 
> a) Sensible (is it a good idea to have a huge number of email addresses
> in
> user_prefs?)
> b) Has anyone configured SpamAssassin (via MIMEDefang or any other
> milter)
> to work this way before?

This feature is often referred to as "Pen Pals". I know amavisd
supports it.

Giampaolo


Re: Querying the AWL

Posted by Jonas Eckerman <jo...@frukt.org>.

David Allsopp wrote:

> The per-user automatic whitelist is enabled and clearly doing "something"
> (because it's growing in size) but I can't find much documentation about it.

perldoc Mail::SpamAssassin::Plugin::AWL
perldoc Mail::SpamAssassin::AutoWhitelist

> Is there any way to query the email addresses stored in the AWL? For
> example, periodically I wouldn't mind going through the AWL and promoting
> addresses to the actual whitelist for each user (and then mark all the
> others as permanently blacklisted) just to "help" the AWL on its way.

If you trust the AWL enough to use it that way, maybe you should
simply raise the "auto_whitelist_factor". This way the AWL's score
adjustments will get bigger without the need for new code anywhere.
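
To illustrate why raising the factor has that effect: as I read the AWL
perldoc, the final score is computed as score + (mean - score) * factor,
where mean is the sender's long-term average, and the factor defaults to 0.5
and ranges from 0 to 1. The Python function below is only an illustration of
that formula, not SpamAssassin code; the name awl_adjusted_score and the
sample numbers are invented.

```python
def awl_adjusted_score(score, mean, factor=0.5):
    """Pull a message's score towards the sender's long-term mean.

    factor=0 leaves the score untouched; factor=1 replaces it with the
    mean. Values outside [0, 1] are rejected, matching the documented
    range of auto_whitelist_factor.
    """
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must be between 0 and 1")
    return score + (mean - score) * factor


if __name__ == "__main__":
    # A sender who usually scores -2.0 sends a message scoring 6.0.
    for factor in (0.0, 0.5, 0.8, 1.0):
        print(factor, awl_adjusted_score(6.0, -2.0, factor))
```

With the default factor of 0.5 the 6.0 message ends up at 2.0. A larger
factor moves it further towards the sender's history.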

Personally I would only be prepared to straight white/blacklist 
for addresses that have a *very* high or low score in the AWL, 
but addresses with very high/low scores will result in a big 
score adjustment from the AWL anyway.

So promoting addresses from the AWL to white/black lists would 
only help if those lists are either used outside SA or used with 
short circuiting.

Considering promoting the addresses to straight black/white lists
in SA, I'm not sure if SA handles partial IP addresses for
whitelist_from_rcvd, which is what is stored in the AWL.

> as I use MIMEDefang, this would allow me to implement the blacklist earlier

This shouldn't be too hard to do if you have the AWL use a SQL 
database.

Otherwise you should be able to do it with the help of 
Mail::SpamAssassin::AutoWhitelist.

> Looking through the docs - I can't see any
> way of querying the addresses and scores from the AWL... did I miss
> something?

perldoc Mail::SpamAssassin::AutoWhitelist should give some hints.

You might need to read some source code though.

> Ultimately, what I'm planning on doing, either in my MIMEDefang filter or by
> parsing the sendmail logs every now and then, is to update users' whitelists
> such that any email address emailed *by* a user is automatically added to
> their personal whitelist in user_prefs.

I'm doing something similar to this with MIMEDefang and a 
SpamAssassin plugin. See below.

> Additionally, because most of my
> users use Outlook, I'd periodically synchronise Outlook address books with
> the server.

Do note that the AWL uses (a part of) the IP address of the relay
as well as the mail address. This information will be missing
from the MUAs' address books.

Whitelisting based only on email addresses often leads to false
negatives (FNs).

> Is there any way to query the email addresses stored in the AWL? For
> example, periodically I wouldn't mind going through the AWL and promoting
> addresses to the actual whitelist for each user (and then mark all the
[snip]
> MIMEDefang is configured to bounce email above a certain
> threshold: giving it the users' address books allows this bounce threshold
> to be very low (e.g. 3-5) as MIMEDefang could use a higher bounce threshold
> (e.g. 10) for recognised email addresses

You don't actually need to do anything special in SA for this.
Since MIMEDefang knows who a mail is from, which relay it came from,
and which local address it is for, you could have MIMEDefang use
different thresholds depending on this information.
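
A rough sketch of that decision logic, in Python for illustration only (a
real mimedefang-filter is written in Perl). The threshold values come from
the examples in the original question; the function names and the
address_books mapping are hypothetical, and the relay check is left out to
keep the sketch short.

```python
KNOWN_SENDER_THRESHOLD = 10.0  # "e.g. 10" for recognised addresses
DEFAULT_THRESHOLD = 4.0        # within the "e.g. 3-5" range for everyone else


def bounce_threshold(sender, recipient, address_books):
    """Pick the bounce threshold for one sender/recipient pair.

    address_books maps a local recipient address to the set of sender
    addresses that recipient knows (both lower-cased).
    """
    known = address_books.get(recipient.lower(), set())
    if sender.lower() in known:
        return KNOWN_SENDER_THRESHOLD
    return DEFAULT_THRESHOLD


def should_bounce(score, sender, recipient, address_books):
    """True if the SpamAssassin score reaches the applicable threshold."""
    return score >= bounce_threshold(sender, recipient, address_books)


if __name__ == "__main__":
    books = {"alice@example.com": {"friend@example.net"}}
    print(should_bounce(6.0, "friend@example.net", "alice@example.com", books))
    print(should_bounce(6.0, "stranger@example.org", "alice@example.com", books))
```

A message scoring 6.0 is accepted from a known correspondent and bounced
from an unknown one, without SpamAssassin itself knowing about the
address book.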

Using the AWL data to adjust the threshold seems odd to me. Since
the AWL data already adjusts the score, an adjustment to the
threshold as well based on the same data will just make the
adjustment stronger. This can be done more easily and with less code
by simply adjusting "auto_whitelist_factor" for SA.

> SPAM with a forged from address (though from what I can see it's relatively
> rare to get SPAM from a forged address that you actually know).

Especially if you check the sending relay as well as the mail
address, which I think you should.

> b) Has anyone configured SpamAssassin (via MIMEDefang or any other milter)
> to work this way before?

Not exactly what you describe, but another solution with somewhat
similar goals.

* My mimedefang-filter saves information about all *outgoing*
mail to a SQL database. I then have a SpamAssassin plugin that
checks to see if incoming mail is likely to be a reply to
outgoing mail.

* The filter also keeps track of incoming spam/ham (as
determined by SA) and uses this both to bypass spamassassin and
to block mail before having to call SA.

Both the filter and the plugin are available at
http://whatever.frukt.org/mimedefangfilter.text.shtml

The filter is *huge*, but I hope it's not too hard to find the 
relevant parts of it.
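
For readers who just want the general idea of the first point, here is a
small Python sketch. It is not taken from the filter or the plugin linked
above; the table layout, the function names and the 30-day window are all
assumptions made up for illustration, and SQLite stands in for whatever SQL
database is actually used.

```python
import sqlite3

THIRTY_DAYS = 30 * 86400  # assumed reply window, in seconds


def open_db():
    """Create an in-memory table of (local sender, remote recipient) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE outgoing (
            local_addr  TEXT NOT NULL,
            remote_addr TEXT NOT NULL,
            sent_at     INTEGER NOT NULL,
            PRIMARY KEY (local_addr, remote_addr)
        )
        """
    )
    return conn


def record_outgoing(conn, local_addr, remote_addr, sent_at):
    """Remember that a local user mailed a remote address at sent_at."""
    conn.execute(
        "INSERT OR REPLACE INTO outgoing VALUES (?, ?, ?)",
        (local_addr.lower(), remote_addr.lower(), sent_at),
    )


def looks_like_reply(conn, sender, recipient, now, max_age=THIRTY_DAYS):
    """True if recipient mailed sender within the last max_age seconds."""
    row = conn.execute(
        "SELECT sent_at FROM outgoing WHERE local_addr = ? AND remote_addr = ?",
        (recipient.lower(), sender.lower()),
    ).fetchone()
    return row is not None and 0 <= now - row[0] <= max_age


if __name__ == "__main__":
    db = open_db()
    record_outgoing(db, "alice@example.com", "bob@example.net", 1000)
    print(looks_like_reply(db, "bob@example.net", "alice@example.com", 2000))
```

A match could then lower the score or raise the bounce threshold for that
message. Checking the relay as well as the address, as suggested earlier,
would need an extra column.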

Regards
/Jonas
-- 
Jonas Eckerman, FSDB & Fruktträdet
http://whatever.frukt.org/
http://www.fsdb.org/
http://www.frukt.org/