Posted to users@spamassassin.apache.org by Andy Figueroa <fi...@andyfigueroa.net> on 2007/01/19 22:36:16 UTC

use or not use awl

I'm growing increasingly wary of running with AWL enabled (I am).  When 
I see AWL invoked on particular messages, the score seems to push as 
often in the wrong direction as in the nominal right direction.

What is the evolving conventional wisdom regarding using AWL?

Andy Figueroa

Re: use or not use awl

Posted by Jonas Eckerman <jo...@frukt.org>.
Andy Figueroa wrote:

> What is the evolving conventional wisdom regarding using AWL?

I have no idea, but I do know what *my* "wisdom" regarding the AWL is.

We don't use the AWL because it gave us problems. I have no idea whether others have had the same problems.

1: Some ham was given very high spam scores because, before the first legit mail arrived from some senders, we had received some high-scoring spam from the same addresses.

2: Some spam was given very low scores because it came from (forged) senders that had previously sent very low-scoring ham.

3: Checking the logs and doing some calculations, I found that the AWL was almost never important in pushing a mail in the right direction. Just about all mail had already been scored ham/spam correctly without the AWL.

So, the AWL simply didn't fit together with our incoming mail flow. It is possible that an AWL that takes SPF and DKIM/DomainKeys into account might be better.

Now, that was when we started to use SpamAssassin a couple of years ago. Things might well be different today.

But today we're already using another custom system in MIMEDefang instead. This system only cares about mail addresses if they have passed SPF, DomainKeys or DKIM. It completely bypasses SpamAssassin for mail coming from relays or addresses that have sent enough ham and *no* spam in the past.

Since this system means that a lot of legit mail never has to go through SpamAssassin at all, I like it better than the AWL anyway. Together with greet-pause, a selective greylist, and an automatic relay blacklist, this means that the majority of mails sent here never have to be checked by SpamAssassin.
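
A rough sketch of the bypass rule described above, written as a MySQL table and query. Everything here is hypothetical: the table name, the column names and the threshold of 5 are illustrations only, and the real system is custom MIMEDefang code whose details are not given in this message.

```sql
-- Hypothetical history table. sender_key is either a relay IP or a mail
-- address that passed SPF, DomainKeys or DKIM; unauthenticated addresses
-- are assumed never to be recorded here.
CREATE TABLE sender_history (
  sender_key  VARCHAR(200) NOT NULL,
  ham_count   INT NOT NULL DEFAULT 0,
  spam_count  INT NOT NULL DEFAULT 0,
  PRIMARY KEY (sender_key)
);

-- Senders allowed to skip SpamAssassin: enough ham and no spam at all.
-- The threshold of 5 is arbitrary.
SELECT sender_key
  FROM sender_history
 WHERE ham_count >= 5
   AND spam_count = 0;
```

Unlike the AWL, a single recorded spam removes the sender from the bypass list entirely instead of only shifting an average.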

Regards
/Jonas
-- 
Jonas Eckerman, FSDB & Fruktträdet
http://whatever.frukt.org/
http://www.fsdb.org/
http://www.frukt.org/


RE: use or not use awl

Posted by "Rosenbaum, Larry M." <ro...@ornl.gov>.
> From: Dave Koontz [mailto:dkoontz@mbc.edu]
> Not necessarily. Put your awl on a sql database and add a timestamp
> column to the awl table, which gets automagically a new timestamp by
> the dbms each time a record is updated. The "timestamp" column type in
> MySQL is such a type.
> 
> show create table awl:
> 
> CREATE TABLE `awl` (
>   `username` varchar(100) collate latin1_german1_ci NOT NULL default '',
>   `email` varchar(200) collate latin1_german1_ci NOT NULL default '',
>   `ip` varchar(10) collate latin1_german1_ci NOT NULL default '',
>   `count` int(11) default '0',
>   `totscore` float default '0',
>   `timestamp` timestamp NOT NULL default CURRENT_TIMESTAMP on update
>     CURRENT_TIMESTAMP,
>   PRIMARY KEY  (`username`,`email`,`ip`)
> ) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_german1_ci
> 
> Then you can easily expire by date with a cron job, for example expire
> all that was not updated for the last 30 days:
> 
> delete from awl where timestamp < now() - interval 30 day

Can you tell me how to do something similar for the bayes_seen table and
MySQL?

Thanks, Larry

RE: use or not use awl

Posted by Dave Koontz <dk...@mbc.edu>.
 

-----Original Message-----
From: Alex Woick [mailto:alex@wombaz.de] 
Sent: Saturday, January 20, 2007 12:24 PM
To: Matt Kettler
Cc: Andy Figueroa; users@spamassassin.apache.org
Subject: Re: use or not use awl

Matt Kettler wrote:
> That said, I think the AWL is a great idea, but not ready for 
> production use on servers with reasonable mail volume. I say that 
> because it completely lacks any kind of useful (ie: atime based) expiry
> mechanism.
> The only way to prune the AWL database is by hitcount, using the 
> check_whitelist script from the tools directory of the source tarball
>   
Not necessarily. Put your awl on a sql database and add a timestamp column
to the awl table, which gets automagically a new timestamp by the dbms each
time a record is updated. The "timestamp" column type in MySQL is such a
type.

show create table awl:

CREATE TABLE `awl` (
  `username` varchar(100) collate latin1_german1_ci NOT NULL default '',
  `email` varchar(200) collate latin1_german1_ci NOT NULL default '',
  `ip` varchar(10) collate latin1_german1_ci NOT NULL default '',
  `count` int(11) default '0',
  `totscore` float default '0',
  `timestamp` timestamp NOT NULL default CURRENT_TIMESTAMP on update
CURRENT_TIMESTAMP,
  PRIMARY KEY  (`username`,`email`,`ip`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_german1_ci

Then you can easily expire by date with a cron job, for example expire all
that was not updated for the last 30 days:

delete from awl where timestamp < now() - interval 30 day

If you are running that sql statement often and have a large awl table, you
may want to add an index to the timestamp column. You can also make your
custom sql statement with a combination of timestamp and totscore as purge
criteria.

Alex



Re: use or not use awl

Posted by Matt Kettler <mk...@verizon.net>.
Alex Woick wrote:
> Matt Kettler wrote:
>> That said, I think the AWL is a great idea, but not ready for production
>> use on servers with reasonable mail volume. I say that because it
>> completely lacks any kind of useful (ie: atime based) expiry mechanism.
>> The only way to prune the AWL database is by hitcount, using the
>> check_whitelist script from the tools directory of the source tarball
>>   
> Not necessarily. Put your awl on a sql database and add a timestamp
> column to the awl table, which gets automagically a new timestamp by
> the dbms each time a record is updated. The "timestamp" column type in
> MySQL is such a type.
>
Fair enough, with end-user add-ons and the use of SQL, the AWL can be
made production ready. However, out-of-the-box, it's not.




RE: use or not use awl

Posted by Dave Koontz <dk...@mbc.edu>.
IMO, all AWL needs is an auto-expiry system like bayes has.

For us as a College, AWL makes a HUGE difference when students submit their
thesis, term papers, etc. which at times may be on sexual debauchery, KP,
internet scams etc.  With AWL, it sees that all previous messages from this
individual over the last x years have been good and does not block this
important email.   We enabled this feature as a direct result of faculty
complaints that some students' most important / critical work sometimes
appeared as spam and was missed as a result.





Re: use or not use awl

Posted by Alex Woick <al...@wombaz.de>.
Matt Kettler wrote:
> That said, I think the AWL is a great idea, but not ready for production
> use on servers with reasonable mail volume. I say that because it
> completely lacks any kind of useful (ie: atime based) expiry mechanism.
> The only way to prune the AWL database is by hitcount, using the
> check_whitelist script from the tools directory of the source tarball
>   
Not necessarily. Put your awl on a sql database and add a timestamp 
column to the awl table, which gets automagically a new timestamp by the 
dbms each time a record is updated. The "timestamp" column type in MySQL 
is such a type.

show create table awl:

CREATE TABLE `awl` (
  `username` varchar(100) collate latin1_german1_ci NOT NULL default '',
  `email` varchar(200) collate latin1_german1_ci NOT NULL default '',
  `ip` varchar(10) collate latin1_german1_ci NOT NULL default '',
  `count` int(11) default '0',
  `totscore` float default '0',
  `timestamp` timestamp NOT NULL default CURRENT_TIMESTAMP on update 
CURRENT_TIMESTAMP,
  PRIMARY KEY  (`username`,`email`,`ip`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_german1_ci

Then you can easily expire by date with a cron job, for example expire 
all that was not updated for the last 30 days:

delete from awl where timestamp < now() - interval 30 day

If you are running that sql statement often and have a large awl table, 
you may want to add an index to the timestamp column. You can also make 
your custom sql statement with a combination of timestamp and totscore 
as purge criteria.
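
A minimal sketch of the two suggestions above, assuming the awl table shown earlier with its `timestamp` column. The index name and the thresholds (30 days, 180 days, an accumulated score below 5 in absolute value) are illustrative assumptions, not recommended values.

```sql
-- Index on the timestamp column so the date-based purge does not have to
-- scan the whole table. The index name is illustrative.
CREATE INDEX awl_timestamp_idx ON awl (`timestamp`);

-- Combined purge criteria: entries whose accumulated score is small carry
-- little information and are dropped after 30 idle days; everything else
-- is dropped after 180 idle days.
DELETE FROM awl
 WHERE (`timestamp` < NOW() - INTERVAL 30 DAY AND ABS(totscore) < 5)
    OR  `timestamp` < NOW() - INTERVAL 180 DAY;
```

Run from the same cron job as the simple date-based delete, this keeps strongly scored senders around longer than near-neutral ones.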

Alex

Re: use or not use awl

Posted by Matt Kettler <mk...@verizon.net>.
Andy Figueroa wrote:
> I'm growing increasingly wary of running with AWL enabled (I am). 
> When I see AWL invoked on particular messages, the score seems to push
> as often in the wrong direction as in the nominal right direction.
>
> What is the evolving conventional wisdom regarding using AWL?

Don't worry about the score direction until you've read:

http://wiki.apache.org/spamassassin/AwlWrongWay
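
To make the score direction concrete: as far as I understand it, the AWL is a score averager. It takes the sender's historical mean (totscore / count) and moves the current message part of the way towards that mean, so a message that scores below the sender's average is pushed up and one that scores above it is pushed down. The query below is only a sketch against an SQL-based awl table; the address, the message score of 2.0 and the factor of 0.5 (which I believe is the default auto_whitelist_factor) are assumptions for illustration.

```sql
-- Hypothetical inputs: pre-AWL score of the new message, and the AWL factor.
SET @msg_score = 2.0;
SET @factor    = 0.5;

-- Historical mean for one sender and the adjustment the AWL would apply.
SELECT email,
       ip,
       totscore / `count`                          AS mean_score,
       (totscore / `count` - @msg_score) * @factor AS awl_adjustment
  FROM awl
 WHERE `count` > 0
   AND email = 'sender@example.com';
```

With a historical mean of 10.0 and a new message scoring 2.0, the adjustment is (10.0 - 2.0) * 0.5 = +4.0, giving a final score of 6.0. That looks like a push in the wrong direction for that one message, but it is the averaging working as designed.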

That said, I think the AWL is a great idea, but not ready for production
use on servers with reasonable mail volume. I say that because it
completely lacks any kind of useful (ie: atime based) expiry mechanism.
The only way to prune the AWL database is by hitcount, using the
check_whitelist script from the tools directory of the source tarball.

For small volume servers and home users it's probably OK, but beware the
AWL database doesn't auto-expire like bayes so you'll have to keep an
eye on its size.