Posted to users@spamassassin.apache.org by Peter Farrell <pe...@gmail.com> on 2007/07/03 12:37:11 UTC

Are W. Stearn's blacklist in 3.2.* usable?

Hi all.

Testing new setup:
CentOS 4.4
amavisd-new-2.5.1
SpamAssassin version 3.2.1
  running on Perl version 5.8.5
+RulesDuJour
Quad proc Dell PE w/ 4 GB RAM.

Using calls to the timestamp function I've been testing this setup
over the past week. While following the debug output I've removed:
SARE_SPECIFIC
SARE_FRAUD and
SARE_HEADER0 from my TRUSTED_RULESETS in RulesDuJour/config

And also removed
99_sare_fraud_post25x.cf,
70_sare_header0.cf,
70_sare_specific.cf from /etc/mail/spamassassin, since 'spamassassin -D
--lint' reported them as not compatible with SA3.2.

Fair enough. But the processing time during the manual test was still
really slow. Depending on the message, the total processing time
averaged between 8-15 >>minutes<< per message!
*If I then dropped both the blacklist[-uri] out, the timing was a
consistent ~45 seconds per message.
(using)
# su vscan -c 'spamassassin -D < sample-spam-GTUBE-junk.txt 2>&1' |
timestamp > $HOME/SAdebug_spam-GTUBE_10

[root@sabik ~]# head -1 SAdebug_spam-GTUBE_10; tail -1 SAdebug_spam-GTUBE_10
10:03:02.508 2.354 2.354 [32673] dbg: logger: adding facilities: all
10:12:29.882 569.727 0.000
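
The 'timestamp' filter used above is not shown in the post. A minimal
sketch of such a filter, assuming it prints wall-clock time, total
elapsed seconds and seconds since the previous line (which is what the
sample output looks like), might be the following. The name stamp_lines
and its parameters are hypothetical, not the poster's actual tool.

```python
import time
from datetime import datetime


def stamp_lines(lines, clock=time.monotonic, now=datetime.now):
    """Yield each line prefixed with wall-clock time, total elapsed
    seconds, and seconds since the previous line."""
    start = previous = clock()
    for line in lines:
        current = clock()
        wall = now().strftime("%H:%M:%S.%f")[:-3]  # trim microseconds to ms
        yield "%s %.3f %.3f %s" % (
            wall,
            current - start,      # time since the filter started
            current - previous,   # time since the previous line arrived
            line.rstrip("\n"),
        )
        previous = current
```

To use it as a pipe filter, iterate over stamp_lines(sys.stdin) and
print each result. A large value in the third column then points at the
debug line that followed a long stall.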

It was down to the 2 blacklist files, so I removed them. I couldn't
see it in an 'obvious' way in the debug output; it would just hang
forever after:
dbg: plugin: loading Mail::SpamAssassin::Plugin::ImageInfo from @INC
- So, I pulled all the rulesdujour out of /etc/mail/spamassassin and
added them in one by one, along w/ a debug test message until I could
find which rules were holding it back.

After, I put it in the production stream (w/ no blacklist) and let
around 5000 messages through (with pyzor-razor2-dcc-SA-amavisd-clamd
all running correctly). I awk'd the timing out of mail.log and saw
an average 'total processing time' of 4-7 seconds per message. There
were no errors in the test debug output or anything via syslogd.
I'm quite happy with this, but I'd like to make use of the blacklist
as well!
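
The awk one-liner used to pull the timings is not shown. As a rough
equivalent, here is a sketch that averages the per-message totals,
assuming the log lines carry a 'TIMING [total N ms]' marker; that line
shape is an assumption, so adjust the pattern to whatever your mail.log
actually contains. The function name is hypothetical.

```python
import re

# Assumed line shape: "... TIMING [total 4521 ms] - ..."; adjust to your log.
TOTAL_MS = re.compile(r"TIMING \[total (\d+) ms\]")


def average_total_seconds(log_lines):
    """Return (message_count, mean_seconds) over lines carrying a total."""
    totals = [int(m.group(1)) for m in map(TOTAL_MS.search, log_lines) if m]
    if not totals:
        return 0, 0.0
    return len(totals), sum(totals) / len(totals) / 1000.0
```

Passing an open file object for mail.log works, since the function only
iterates over lines.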

So my questions are:
1. is the timing 'normal' when using the blacklist rules called
through 'spamassassin'? Is it just a storm in a teacup? When it's
called from Perl will it all be loaded into memory and the timing will
drop down?
2. are the rules compatible w/ the 3.2 branch of SA?
3. if it's 'wrong' how does one debug further? I've enabled level 5 in
amavisd.conf & 'smtpd -v' at the top of my master.cf. Am I looking in
the wrong place? Am I missing some sort of Perl module that would
mitigate this in some way? (I'll list these at the end)

-Peter Farrell
Cardiff, Wales


############################
installed modules
############################
Archive::Extract -- 0.18
Archive::Tar -- 1.30
Archive::Zip -- 1.18
BerkeleyDB -- 0.31
CPAN -- 1.9102
CPAN::Reporter -- 0.44
Class::ErrorHandler -- 0.01
Class::Loader -- 2.03
Compress::Raw::Zlib -- 2.004
Compress::Zlib -- 2.004
Config::Tiny -- 2.10
Convert::ASCII::Armour -- 1.4
Convert::PEM -- 0.07
Convert::TNEF -- 0.17
Convert::UUlib -- 1.08
Crypt::Blowfish -- 2.10
Crypt::CAST5_PP -- 1.04
Crypt::CBC -- 2.22
Crypt::DES -- 2.05
Crypt::DES_EDE3 -- 0.01
Crypt::DSA -- 0.14
Crypt::IDEA -- 1.08
Crypt::OpenPGP -- 1.03
Crypt::OpenSSL::RSA -- 0.24
Crypt::OpenSSL::Random -- 0.03
Crypt::Primes -- 0.50
Crypt::RIPEMD160 -- 0.04
Crypt::RSA -- 1.58
Crypt::Random -- 1.25
Crypt::Rijndael -- 1.04
Crypt::Twofish -- 2.12
Cwd -- 3.25
DB_File -- 1.815
Data::Buffer -- 0.04
Data::Dump -- 1.08
Digest::MD2 -- 2.03
Digest::MD5 -- 2.36
Digest::SHA -- 5.44
Digest::SHA1 -- 2.11
Encode::Detect -- 1.00
Error -- 0.17008
ExtUtils::CBuilder -- 0.19
ExtUtils::MakeMaker -- 6.32
File::Copy::Recursive -- 0.33
File::HomeDir -- 0.65
File::Temp -- 0.18
File::Which -- 0.05
File::pushd -- 0.99
HTML::Parser -- 3.56
IO -- 1.23
IO::CaptureOutput -- 1.03
IO::Compress::Base -- 2.004
IO::Compress::Zlib -- ???
IO::Socket::INET6 -- 2.51
IO::Socket::SSL -- 1.06
IO::Stringy -- 2.110
IO::Zlib -- 1.05
IP::Country -- 2.23
IPC::Cmd -- 0.36
IPC::Run3 -- 0.037
Image::Info -- 1.24
LWP -- 5.805
Locale::Maketext::Simple -- 0.18
Log::Message -- 0.01
Log::Message::Simple -- 0.01
MIME-tools -- ???
MIME::Base64 -- 3.07
Mail -- ???
Mail::DKIM -- 0.24
Mail::SPF -- v2.004
Mail::SPF::Query -- 1.999.1
Mail::SpamAssassin -- 3.002001
Math::Pari -- 2.010709
Module::Build -- 0.2808
Module::CoreList -- 2.11
Module::Load -- 0.10
Module::Load::Conditional -- 0.16
Module::Loaded -- 0.01
Module::Pluggable -- 3.6
Net -- ???
Net::CIDR::Lite -- 0.20
Net::DNS -- 0.59
Net::DNS::Resolver::Programmable -- 0.002.2
Net::IP -- 1.25
Net::Ident -- 1.20
Net::SSLeay -- 1.30
Net::Server -- 0.96
NetAddr::IP --  4.004
Object::Accessor -- 0.32
Package::Constants -- 0.01
Params::Check -- 0.26
Perl -- 5.8.5
Pod::Escapes -- 1.04
Pod::Parser -- 1.35
Pod::Simple -- 3.05
Probe::Perl -- 0.01
Socket6 -- 0.19
Sort::Versions -- 1.5
Sys::Hostname::Long -- 1.4
Tee -- 0.13
Term::ReadKey -- 2.14
Term::ReadLine -- 1.01
Term::UI -- 0.14
Test::Harness -- 2.64
Test::Reporter -- 1.27
Tie::EncryptedHash -- 1.8
Time::HiRes -- 1.9707
Time::Local -- 1.17
URI -- 1.35
Unix::Syslog -- 0.99
YAML -- 0.62
razor-agents -- ???
version -- 0.7203
########################

Re: Are W. Stearn's blacklist in 3.2.* usable?

Posted by Richard Frovarp <Ri...@sendit.nodak.edu>.
Jeff Chan wrote:
> Quoting Peter Farrell <pe...@gmail.com>:
>
>   
>> Hi all.
>>
>> Testing new setup:
>> CentOS 4.4
>> amavisd-new-2.5.1
>> SpamAssassin version 3.2.1
>>   running on Perl version 5.8.5
>> +RulesDuJour
>> Quad proc Dell PE w/ 4 GB RAM.
>>
>> Using calls to the timestamp function I've been testing this setup
>> over the past week. While following the debug output I've removed:
>> SARE_SPECIFIC
>> SARE_FRAUD and
>> SARE_HEADER0 from my TRUSTED_RULESETS in RulesDuJour/config
>>
>> And also removed
>> 99_sare_fraud_post25x.cf,
>> 70_sare_header0.cf,
>> 70_sare_specific.cf in /etc/mail/spamassassin -D --lint. It is not
>> compatible with SA3.2.
>>
>> Fair enough. But the processing time during the manual test was still
>> really slow. Depending on the message, the total processing time
>> averaged between 8-15 >>minutes<< per message!
>> *If I then dropped both the blacklist[-uri] out, the timing was a
>> consistent ~45 seconds per message.
>>     
>
>
> Please DO NOT use sa-blacklist.  Use multi.surbl.org instead.  Bill will tell
> you the same thing when he gets a chance.
>
> No one should be using sa-blacklist any more.  It's way too large and
> inefficient.  The WS bit in multi.surbl.org has the same data and it's in
> DNSBL form so there is no huge ruleset to fill up your memory, just DNS
> queries.   In your case it's probably causing spamassassin to swap out of
> memory.
>
> See:
>
>   http://www.surbl.org/
>
> Jeff C.
>
>   
Make sure you have a caching name server on the machine as well.

Richard

Re: Are W. Stearn's blacklist in 3.2.* usable?

Posted by Jeff Chan <je...@surbl.org>.
Quoting Theo Van Dinter <fe...@apache.org>:

> On Tue, Jul 03, 2007 at 06:04:33AM -0500, Jeff Chan wrote:
> > Please DO NOT use sa-blacklist.  Use multi.surbl.org instead.  Bill will tell
> > you the same thing when he gets a chance.
>
> It seems as if the blacklist.cf file is still available for people to
> download, since this question comes up periodically.  If people aren't
> supposed to use it, rm blacklist.cf ?


Yes, probably, and Bill would probably agree too.

Jeff C.

Re: Are W. Stearn's blacklist in 3.2.* usable?

Posted by Theo Van Dinter <fe...@apache.org>.
On Tue, Jul 03, 2007 at 06:04:33AM -0500, Jeff Chan wrote:
> Please DO NOT use sa-blacklist.  Use multi.surbl.org instead.  Bill will tell
> you the same thing when he gets a chance.

It seems as if the blacklist.cf file is still available for people to
download, since this question comes up periodically.  If people aren't
supposed to use it, rm blacklist.cf ?

-- 
Randomly Selected Tagline:
"It is not the strongest of the species that survives, not the most
 intelligent, but the one most responsive to change."    - Charles Darwin

Re: Are W. Stearn's blacklist in 3.2.* usable?

Posted by Jeff Chan <je...@surbl.org>.
Quoting Peter Farrell <pe...@gmail.com>:

> Hi all.
>
> Testing new setup:
> CentOS 4.4
> amavisd-new-2.5.1
> SpamAssassin version 3.2.1
>   running on Perl version 5.8.5
> +RulesDuJour
> Quad proc Dell PE w/ 4 GB RAM.
>
> Using calls to the timestamp function I've been testing this setup
> over the past week. While following the debug output I've removed:
> SARE_SPECIFIC
> SARE_FRAUD and
> SARE_HEADER0 from my TRUSTED_RULESETS in RulesDuJour/config
>
> And also removed
> 99_sare_fraud_post25x.cf,
> 70_sare_header0.cf,
> 70_sare_specific.cf in /etc/mail/spamassassin -D --lint. It is not
> compatible with SA3.2.
>
> Fair enough. But the processing time during the manual test was still
> really slow. Depending on the message, the total processing time
> averaged between 8-15 >>minutes<< per message!
> *If I then dropped both the blacklist[-uri] out, the timing was a
> consistent ~45 seconds per message.


Please DO NOT use sa-blacklist.  Use multi.surbl.org instead.  Bill will tell
you the same thing when he gets a chance.

No one should be using sa-blacklist any more.  It's way too large and
inefficient.  The WS bit in multi.surbl.org has the same data and it's in
DNSBL form so there is no huge ruleset to fill up your memory, just DNS
queries.   In your case it's probably causing spamassassin to swap out of
memory.

See:

  http://www.surbl.org/

Jeff C.
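
To illustrate why the DNSBL form is so much lighter than a loaded
ruleset, here is a sketch of what a single lookup amounts to. It
assumes the WS list is signalled by bit value 4 in the last octet of a
127.0.0.x answer; check that value against the SURBL documentation
before relying on it. SpamAssassin's own URIDNSBL plugin does this for
you, so the function names below are purely illustrative.

```python
import socket

WS_BIT = 4  # assumed bit for the WS list in multi.surbl.org answers


def surbl_query_name(domain, zone="multi.surbl.org"):
    """Build the DNS name to look up for a URI's registered domain."""
    return "%s.%s" % (domain.strip(".").lower(), zone)


def in_ws_list(answer):
    """True if a 127.0.0.x answer has the WS bit set in its last octet."""
    octets = answer.split(".")
    if len(octets) != 4 or octets[:3] != ["127", "0", "0"]:
        raise ValueError("not a SURBL-style answer: %r" % answer)
    return bool(int(octets[3]) & WS_BIT)


def lookup(domain):
    """Return the A record for the SURBL name, or None if not listed."""
    try:
        return socket.gethostbyname(surbl_query_name(domain))
    except socket.gaierror:
        return None
```

The per-message cost is one small DNS query per URI domain, and nothing
has to be held in the scanner's memory.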

Re: Are W. Stearn's blacklist in 3.2.* usable?

Posted by Peter Farrell <pe...@gmail.com>.
Thanks for all the advice. It's been extremely helpful.
RE: the comment for local caching name server - I'd not really thought
about that when I was deploying these, but it makes sense and I rolled
that out this afternoon.

RE: RulesDuJour
I didn't find these things documented anywhere, i.e. what's for
production, what's for research, when not to mix-n-match, why one is
deprecated for another, etc.
As I said before - I was trying them by trial and error to see what
works while tracking my timing...  At the end of the day I'm left w/ a
much edited and picked apart parameter list for 'TRUSTED RULESETS'.

I had been on the SURBL site just this morning but nothing really
'clicked' for me. I re-read the docs, I knew it already existed in
/usr/share/spamassassin, etc.

I went over to William Stearns's website as well thinking I'd just had
a duffer file or something and saw that the last update was July 3rd -
and just assumed that I was meant to be using it. I mean, it's
integrated into the RDJ's, the site's updated regularly, he seems like
a pretty legit player, etc. What's a girl to do?

In any case - I've updated all local documentation for the next
person, the next time around. Many thanks!

-Peter Farrell

On 03/07/07, Matt Kettler <mk...@verizon.net> wrote:
> Peter Farrell wrote:
> > Hi all.
> >
> > Testing new setup:
> > CentOS 4.4
> > amavisd-new-2.5.1
> > SpamAssassin version 3.2.1
> >  running on Perl version 5.8.5
> > +RulesDuJour
> > Quad proc Dell PE w/ 4 GB RAM.
> >
> Point blank. In general, *NOBODY* should use WS's blacklist files for
> ANYTHING. It is most unfortunate that RDJ has a built-in configuration
> for this file.
>
> Just take a look at the size of the files. sa-blacklist is over 24 MB!
>
> 1) the uri blacklist is redundant with SURBL. SURBL is lightweight and
> reasonably fast, while the uri blacklist is a heavy memory burden and
> relatively slow.
>
> 2) the email address blacklist is interesting for research purposes, but
> its real-world use is almost pointless. Spammers rotate domains in from
> addresses so often that the gains of this blacklist are limited, and the
> memory consumption is absurd.
>
> The files add something like 500MB to an instance of SA. That's *HUGE*.
> Check your memory usage and see if the blacklist file is making your box
> page. Your box *might* be enough to handle the sa-blacklist, but
> personally I'd consider your box kinda borderline stats-wise for running
> sa-blacklist. I'd generally think more on the scale of 8GB of ram unless
> I was going to constrain SA to only existing in 1 or 2 instances.
> > So my questions are:
> > 1. is the timing 'normal' when using the blacklist rules called
> > through 'spamassassin'? Is it just a storm in a teacup? When it's
> > called from Perl will it all be loaded into memory and the timing will
> > drop down?
> Well, calling 'spamassassin' with sa-blacklist loaded is going to be
> very painful. sa-blacklist will cause SA to initialize around 500MB of
> memory, that's not quick.
>
> Or were those multi-minute times from amavis? That would be a bit much,
> and I'd be checking to see if you're thrashing your swap partition.
>
> Even so, I'd still expect it to take at least 60 seconds to scan a
> message with these blacklist files loaded, on a very fast CPU.
>
> > 2. are the rules compatible w/ the 3.2 branch of SA?
> Yes, both of WS's blacklist files are technically compatible with most
> any version of SA, save very, very old ones that don't support the uri
> keyword. (at the very least, both will work with anything from 2.40 and
> higher. Digging back further than 2.40 is an archaeological dig I'm not
> really interested in at the moment).
>
> However, in practice, sa-blacklist is not practical for real-world use,
> so you could also say it's incompatible with every version of SA.
>
> > 3. if it's 'wrong' how does one debug further? I've enabled level 5 in
> > amavisd.conf & 'smtpd -v' at the top of my master.cf. Am I looking in
> > the wrong place? Am I missing some sort of Perl module that would
> > mitigate this in some way? (I'll list these at the end)
> Nope. sa-blacklist is just too huge for practical purposes. SA is
> designed to efficiently support hundreds, even thousands of
> blacklist_from's, but sa-blacklist has hundreds of thousands of them.
> (691,372 in fact).
>
>
>
>

Re: Are W. Stearn's blacklist in 3.2.* usable?

Posted by Matt Kettler <mk...@verizon.net>.
Peter Farrell wrote:
> Hi all.
>
> Testing new setup:
> CentOS 4.4
> amavisd-new-2.5.1
> SpamAssassin version 3.2.1
>  running on Perl version 5.8.5
> +RulesDuJour
> Quad proc Dell PE w/ 4 GB RAM.
>
Point blank. In general, *NOBODY* should use WS's blacklist files for
ANYTHING. It is most unfortunate that RDJ has a built-in configuration
for this file.

Just take a look at the size of the files. sa-blacklist is over 24 MB!

1) the uri blacklist is redundant with SURBL. SURBL is lightweight and
reasonably fast, while the uri blacklist is a heavy memory burden and
relatively slow.

2) the email address blacklist is interesting for research purposes, but
its real-world use is almost pointless. Spammers rotate domains in from
addresses so often that the gains of this blacklist are limited, and the
memory consumption is absurd.

The files add something like 500MB to an instance of SA. That's *HUGE*.
Check your memory usage and see if the blacklist file is making your box
page. Your box *might* be enough to handle the sa-blacklist, but
personally I'd consider your box kinda borderline stats-wise for running
sa-blacklist. I'd generally think more on the scale of 8GB of ram unless
I was going to constrain SA to only existing in 1 or 2 instances.
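
The arithmetic behind that estimate can be sketched as below. The
500 MB per instance is the rough figure quoted above; the 1 GB reserved
for the OS and other daemons is a hypothetical allowance, not a
measurement. Forked children may share some pages, so treat the result
as a pessimistic bound.

```python
def max_instances(total_ram_mb, per_instance_mb, reserved_mb=0):
    """How many scanner processes fit in RAM before the box starts paging."""
    usable = total_ram_mb - reserved_mb
    return max(usable // per_instance_mb, 0)


# 4 GB box, 500 MB per instance, 1 GB kept back (assumed) for everything else.
four_gb = max_instances(4096, 500, reserved_mb=1024)
eight_gb = max_instances(8192, 500, reserved_mb=1024)
```

Under these assumptions a 4 GB box holds about 6 instances and an 8 GB
box about 14, which is why 4 GB looks borderline for a busy scanner.
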
> So my questions are:
> 1. is the timing 'normal' when using the blacklist rules called
> through 'spamassassin'? Is it just a storm in a teacup? When it's
> called from Perl will it all be loaded into memory and the timing will
> drop down?
Well, calling 'spamassassin' with sa-blacklist loaded is going to be
very painful. sa-blacklist will cause SA to initialize around 500MB of
memory, that's not quick.

Or were those multi-minute times from amavis? That would be a bit much,
and I'd be checking to see if you're thrashing your swap partition.

Even so, I'd still expect it to take at least 60 seconds to scan a
message with these blacklist files loaded, on a very fast CPU.

> 2. are the rules compatible w/ the 3.2 branch of SA?
Yes, both of WS's blacklist files are technically compatible with most
any version of SA, save very, very old ones that don't support the uri
keyword. (at the very least, both will work with anything from 2.40 and
higher. Digging back further than 2.40 is an archaeological dig I'm not
really interested in at the moment).

However, in practice, sa-blacklist is not practical for real-world use,
so you could also say it's incompatible with every version of SA.

> 3. if it's 'wrong' how does one debug further? I've enabled level 5 in
> amavisd.conf & 'smtpd -v' at the top of my master.cf. Am I looking in
> the wrong place? Am I missing some sort of Perl module that would
> mitigate this in some way? (I'll list these at the end)
Nope. sa-blacklist is just too huge for practical purposes. SA is
designed to efficiently support hundreds, even thousands of
blacklist_from's, but sa-blacklist has hundreds of thousands of them.
(691,372 in fact).
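
For anyone wanting to check how many entries a given .cf file carries
before loading it, a small counting sketch follows. It assumes the
usual config syntax, where one blacklist_from line may list several
addresses and '#' starts a comment; the function name is hypothetical.

```python
def count_blacklist_from(lines):
    """Count addresses listed on blacklist_from lines of a .cf file."""
    count = 0
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        fields = line.split()
        if fields and fields[0] == "blacklist_from":
            count += len(fields) - 1  # every field after the keyword
    return count
```

Passing an open file object works, since the function only iterates
over lines. A result in the hundreds of thousands is a good hint that
the file is too big to load per scanner instance.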