Posted to users@spamassassin.apache.org by Juhapekka Tolvanen <ju...@cc.jyu.fi> on 2004/09/22 20:27:29 UTC

DSPAM-plugin for SpamAssassin 3.* ?

Okay... SpamAssassin 3.0 is out with its much-hyped plugin system. So
now it would be nice to have a plugin that asks DSPAM whether an e-mail
is spam.

http://www.nuclearelephant.com/projects/dspam/

What I exactly want is this:

1) Switch off that Bayesian filter of SpamAssassin, because it is
implemented in slow interpreted language called Perl.

2) Use DSPAM as Bayesian-like filter, because it is implemented in
lightning-fast compiled language called C.

3) If DSPAM can give percentage-based probability of spamminess, use
it to give different scores.

4) Use Berkeley DB-4 as storage-method in DSPAM, first. Other databases
may be added later.

5) When using spam-learning of SpamAssassin, make DSPAM to learn that
spam (or ham), too. In addition, let SpamAssassin to give all spam to
Razor2, Pyzor and DCC, too.
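
A minimal sketch of item 3, assuming the external filter reports a spam
probability between 0 and 1. The band thresholds, the score values and
the function name are hypothetical; none of them come from SpamAssassin
or DSPAM.

```python
# Hypothetical bands: (minimum probability, score to assign).
# Both columns are illustrative assumptions, listed from highest to lowest.
BANDS = [
    (0.99, 3.5),
    (0.95, 3.0),
    (0.80, 2.0),
    (0.60, 1.0),
]


def score_for_probability(p: float) -> float:
    """Return the score of the highest band whose threshold p reaches."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for threshold, score in BANDS:
        if p >= threshold:
            return score
    # Below every band: contribute nothing to the total score.
    return 0.0
```

A real plugin would register one rule per band so that each score can
be tuned separately; the table above only shows the mapping itself.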

You may find this helpful when doing that plugin.

http://thread.gmane.org/gmane.mail.spam.spamassassin.general/44126

Any takers?

 * * *

Of course, I know there is a little overhead each time a DSPAM process
is fork()ed and started, but hopefully it does not matter too much.
There also seems to be a library called "libdspam". Maybe the plugin
could call its functions instead of starting DSPAM processes.

But remember: libdspam is licensed under GNU GPL 2. And SpamAssassin 3.0
is licensed under Apache Software License, Version 2. According to FSF
that license is incompatible with GNU GPL:

http://www.fsf.org/licenses/license-list.html#GPLIncompatibleLicenses

Maybe author of DSPAM could change license of that library to GNU LGPL.


-- 
Juhapekka "naula" Tolvanen * http colon slash slash iki dot fi slash juhtolv
"halpojen hoitojen maailma uljas haluaa taistosi latistaa, mielesi
lipeävedellä valkaistuun ruotuunsa, joka on hautausmaa"                  CMX

Re: [sa-list] DSPAM-plugin for SpamAssassin 3.* ?

Posted by Kai Schaetzl <ma...@conactive.com>.
Juhapekka Tolvanen wrote on Thu, 23 Sep 2004 00:41:42 +0300:

> I just want to
> know, how much faster SpamAssassin will be, if its Bayesian engine is
> replaced with something else,
>

Not much. You are completely missing the point. If you want to have 
something faster and less resource-hungry (that's much more of an issue 
with SA than the "slowness") you have to code it COMPLETELY in C. I 
suggest you start tomorrow.

> And I can hear it myself, when
> hard disks make awful noise of swapping.

Amen.

Sorry, but I find your attitude quite unpleasant.

Kai

-- 

Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org




Re: [sa-list] Re: DSPAM-plugin for SpamAssassin 3.* ?

Posted by Kenneth Porter <sh...@sewingwitch.com>.
--On Thursday, September 23, 2004 2:08 AM +0300 Juhapekka Tolvanen 
<ju...@cc.jyu.fi> wrote:

> I can not code. But I can test your plugins and then send bug-reports.
> And if you give me good instructions, I may even run benchmarks.

I don't believe we have a shortage of testers. But the time available to 
the developers is limited and they already have lots to do. Diverting them 
to this means they have to drop something else. Which outstanding items 
listed in bugzilla should be dropped?



Re: [sa-list] Re: DSPAM-plugin for SpamAssassin 3.* ?

Posted by Juhapekka Tolvanen <ju...@cc.jyu.fi>.
On Thu, 23 Sep 2004, +01:45:06 EEST (UTC +0300),
David Brodbeck <gu...@gull.us> pressed some keys:

> Juhapekka Tolvanen wrote:
> >I reiterate: It does not hurt, if we try out and see what happens.

> Is that an offer to try it?  There's nothing wrong with experimenting, 
> but you shouldn't expect other people to spend time testing your ideas 
> for you.

I can not code. But I can test your plugins and then send bug-reports.
And if you give me good instructions, I may even run benchmarks.



Re: [sa-list] Re: DSPAM-plugin for SpamAssassin 3.* ?

Posted by David Brodbeck <gu...@gull.us>.
Juhapekka Tolvanen wrote:

>On Wed, 22 Sep 2004, +22:45:09 EEST (UTC +0300),
>Dan Mahoney, System Admin <da...@prime.gushi.org> pressed some keys:
>
>>On Wed, 22 Sep 2004, Daniel Quinlan wrote:
>>
>>>Juhapekka Tolvanen <ju...@cc.jyu.fi> writes:
>>>
>>>>1) Switch off that Bayesian filter of SpamAssassin, because it is
>>>>implemented in slow interpreted language called Perl.
>>>>
>>>>2) Use DSPAM as Bayesian-like filter, because it is implemented in
>>>>lightning-fast compiled language called C.
>
>>Okay, and not to get off the topic on your opinion on perl versus c,
>>but the first thing perl does when it executes a script is compiles
>>it. This is why spamd is a decent solution despite being written in
>>perl, because it only starts up once.
>
>>I'm not saying that a constantly-running perl program is as fast as a
>>compiled C app all of the time, but if you're going to sit here and
>>suggest changes to the SpamAssassin development team without possibly
>>having evaluated 3.0.0-Release for 24 hours, you might want to drop
>>the condescending attitude, since I'm *sure* we all know what perl and
>>C are.
>
>I really don't care about attitudes of author of DSPAM. I just want to
>know, how much faster SpamAssassin will be, if its Bayesian engine is
>replaced with something else, for example with DSPAM. It does not hurt,
>if we try it out and see what happens. And it does not hurt, if people
>have more alternatives.
>
I don't see how just replacing the Bayesian part of SpamAssassin with 
something written in C is going to help.  In fact, I'd be willing to bet 
that the overhead of starting up another program would consume more time 
than you'd gain from any efficiency increase.  You'd have to rewrite the 
rule processing parts as well to get a real improvement.  (If you want 
to try it, though, "Bogofilter" is a nice C-based Bayesian filter and 
quite simple to set up.  I've used it for quite a while, but now I'm 
testing SpamAssassin as an added layer of protection.)

Also, the question is "is SpamAssassin fast enough?", not "could it be 
faster if it were rewritten in another language?"  While the author of 
DSPAM is welcome to his opinion that running perl-based SpamAssassin on 
a production server is "a death wish," there are a lot of sites that 
*are* using it in production, which seems to indicate the contrary.  I 
think you'll find that people who write software are very partisan about 
their own work and aren't reliable sources for comparison with other 
packages.

Keep in mind, too, that unless you're so short on memory that your 
computer is constantly swapping, SpamAssassin generally spends most of 
its time waiting for responses from RBL network servers.

>When I used SpamAssassin and its Bayesian filter in my home computer,
>it really was slow.
>
I expect it would be, with only 64 megabytes of RAM.  I'd consider that 
inadequate for just about any purpose these days, especially with RAM so 
cheap.


>I reiterate: It does not hurt, if we try out and see what happens.
>  
>
Is that an offer to try it?  There's nothing wrong with experimenting, 
but you shouldn't expect other people to spend time testing your ideas 
for you.


Re: [sa-list] Re: DSPAM-plugin for SpamAssassin 3.* ?

Posted by Rakesh <ra...@netcore.co.in>.
Juhapekka Tolvanen wrote:

>but if you
>        plan on running this on a production system with live users, it
>        is a death wish."
>
Death wish? I really don't think so. I run SpamAssassin + Razor + URI 
checks and a good number of rulesets with MailScanner, all written in 
Perl, on a production system processing about a million messages a day 
for about 120 virtual domains, with three virus scanners. The load on my 
system never exceeds 0.8, so I would never believe that SpamAssassin is 
a death wish for a production system just because someone with a 
low-RAM system running unnecessary processes says so. In fact, 
SpamAssassin has saved my life from irritating client complaints about 
spam.

>I can not code anything like that myself. I am just (l)user. 
>
I think users who cannot code shouldn't boss the developers around about 
what to do and what not to do. At least we should write a few lines to 
thank them for spending so much of their time and effort for no pay. 
I am a user too, and I really thank them a lot for the great work that 
they are doing.

>
>I reiterate: It does not hurt, if we try out and see what happens.
>
Trying out new stuff is always a good suggestion, but the attitude behind 
the suggestion always matters a lot.

-- 
Regards, 
Rakesh B. Pal
Emergic CleanMail Team.
Netcore Solutions Pvt. Ltd.

==================================================================
perl -e"map{y/a-z/l-za-k/;print}shift" "Jjhi pcdiwtg Ptga wprztg,"
==================================================================




Re: [sa-list] Re: DSPAM-plugin for SpamAssassin 3.* ?

Posted by Lucas Albers <ad...@cs.montana.edu>.
Could you use the embedded perl?
MIMEDefang uses that for better memory sharing between processes; it
appears to work on most platforms running Perl 5.6 or later.


Justin Mason said:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> Sounds like we need to figure out how to get more of our stuff
> precompiled at spamd startup time so it can be copy-on-writed
> effectively...
>
> - --j.


-- 
Luke Computer Science System Administrator
Security Administrator,College of Engineering
Montana State University-Bozeman,Montana



Re: [sa-list] DSPAM-plugin for SpamAssassin 3.* ?

Posted by snowjack <sn...@fastmail.fm>.
Kai Schaetzl wrote:
> Snowjack wrote on Sat, 25 Sep 2004 16:01:01 -0700:
>>Actually, those numbers were from SA 2.64 with the URIDNSBL patch and 
>>most of the more conservative SARE rulesets. Didn't include BigEvil, of 
>>course, or any of the SARE rules that said they occasionally hit ham.
>
> When I just reduce to the rules coming with SA I still have more than 17 
> or 18 MB as you have, maybe 20. Adding a few SARE and some own rules and 
> I'm at 40 - 50, bigevil adds another 40 or so to that. What Perl are you 
> running?

perl 5.6.1

ls /etc/spamassassin
70_sare_adult.cf
70_sare_bayes_poison_nxm.cf
70_sare_genlsubj0.cf
70_sare_genlsubj_x30.cf
70_sare_header0.cf
70_sare_html0.cf
70_sare_random.cf
70_sare_ratware.cf
70_sare_specific.cf
70_sare_spoof.cf
70_sare_unsub.cf
71_sare_redirect_pre3.0.0.cf
72_sare_bml_post25x.cf
99_sare_fraud_post25x.cf
local.cf
surbl.cf


Re: [sa-list] DSPAM-plugin for SpamAssassin 3.* ?

Posted by Kai Schaetzl <ma...@conactive.com>.
Snowjack wrote on Sat, 25 Sep 2004 16:01:01 -0700:

> Actually, those numbers were from SA 2.64 with the URIDNSBL patch and 
> most of the more conservative SARE rulesets. Didn't include BigEvil, of 
> course, or any of the SARE rules that said they occasionally hit ham.
>

When I just reduce to the rules coming with SA I still have more than 17 
or 18 MB as you have, maybe 20. Adding a few SARE and some own rules and 
I'm at 40 - 50, bigevil adds another 40 or so to that. What Perl are you 
running?


Kai


Re: [sa-list] DSPAM-plugin for SpamAssassin 3.* ?

Posted by snowjack <sn...@fastmail.fm>.
Kai Schaetzl wrote:
> Snowjack wrote on Thu, 23 Sep 2004 11:08:38 -0700:
>>Am I missing something?
>>
> Yes, all the SARE and other custom rulesets ;-) (Just as a FYI, not as a 
> critique).

Actually, those numbers were from SA 2.64 with the URIDNSBL patch and 
most of the more conservative SARE rulesets. Didn't include BigEvil, of 
course, or any of the SARE rules that said they occasionally hit ham.


Re: [sa-list] DSPAM-plugin for SpamAssassin 3.* ?

Posted by Kai Schaetzl <ma...@conactive.com>.
Snowjack wrote on Thu, 23 Sep 2004 11:08:38 -0700:

> Am I missing something?
>

Yes, all the SARE and other custom rulesets ;-) (Just as a FYI, not as a 
critique).


Kai


Re: [sa-list] Re: DSPAM-plugin for SpamAssassin 3.* ?

Posted by snowjack <sn...@fastmail.fm>.
David Brodbeck wrote:
> On Wed, 22 Sep 2004 17:26:12 -0700, snowjack wrote
>>Yeah, and it is true that SpamAssassin uses lots of RAM (20M per 
>>process?) So what, RAM is cheap!
> 
> If I'm not mistaken, some of that 20M is actually shared amongst all the spamd
> processes, so it's not as much memory usage as you'd think.  Five spamd
> processes that each claim to be using 20M may not actually be consuming a
> total of 100M.  *nix is tricky that way. ;)
> 

Hmm
# top
   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
31162 spamdata   9   0 25108  17M  6680 S    17.5  2.3   0:00 spamd
31163 spamdata  14   0 24596  16M  7284 S     7.7  2.2   0:00 spamd

25108 - 6680 = 18428 KB physical RAM usage
24596 - 7284 = 17312 KB physical RAM usage

Am I missing something?
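
The subtraction above can be extended to several processes. This is a
rough sketch only: it assumes the SHARE column counts pages that all
spamd children share with each other, so they need to be counted just
once. That is an assumption, since top's SHARE figure can also include
pages shared with unrelated programs. The function names are made up
for the example.

```python
def private_kb(size_kb: int, share_kb: int) -> int:
    """Memory a single process does not share, as in SIZE - SHARE above."""
    return size_kb - share_kb


def estimated_total_kb(processes: list[tuple[int, int]]) -> int:
    """Estimate total usage for (SIZE, SHARE) pairs, in KB.

    Assumes the shared pages are common to every process, so the
    largest SHARE value is added once instead of once per process.
    """
    if not processes:
        return 0
    private = sum(private_kb(size, share) for size, share in processes)
    shared_once = max(share for _, share in processes)
    return private + shared_once


# The two spamd lines from the top output above: (SIZE, SHARE) in KB.
spamd = [(25108, 6680), (24596, 7284)]
naive_total = sum(size for size, _ in spamd)   # 49704 KB
estimate = estimated_total_kb(spamd)           # 43024 KB
```

Under that assumption the two processes together use somewhat less than
the sum of their SIZE columns, and the saving grows with each additional
child.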

Re: [sa-list] Re: DSPAM-plugin for SpamAssassin 3.* ?

Posted by David Brodbeck <gu...@gull.us>.
On Wed, 22 Sep 2004 17:26:12 -0700, snowjack wrote
> Yeah, and it is true that SpamAssassin uses lots of RAM (20M per 
> process?) So what, RAM is cheap!

If I'm not mistaken, some of that 20M is actually shared amongst all the spamd
processes, so it's not as much memory usage as you'd think.  Five spamd
processes that each claim to be using 20M may not actually be consuming a
total of 100M.  *nix is tricky that way. ;)


Re: [sa-list] Re: DSPAM-plugin for SpamAssassin 3.* ?

Posted by snowjack <sn...@fastmail.fm>.
Juhapekka Tolvanen wrote:
>         "Myth 4: PERL is designed for language processing, so
>         SpamAssassin is written in a more appropriate language.
> 
>         Let me preface this with the fact that I've had about 10
>         years of experience coding PERL. While PERL is very useful
>         for language processing and web applications, it is also an
>         extremely slow, interpreted language. 

Process startup is slow. Perl is pretty efficient once the process is 
running, and a well-set-up SpamAssassin 3 configuration will already 
have the processes started before a spam is even received.

> 	  The average overhead
>         for a single PERL process is around 2MB of RAM. 

Yeah, and it is true that SpamAssassin uses lots of RAM (20M per 
process?) So what, RAM is cheap!

> I really don't care about attitudes of author of DSPAM. I just want to
> know, how much faster SpamAssassin will be, if its Bayesian engine is
> replaced with something else, for example with DSPAM. It does not hurt,
> if we try it out and see what happens. And it does not hurt, if people
> have more alternatives.

I had a little single-processor 1GHz Athlon machine with 256MB RAM using 
SpamAssassin to scan about 30,000 e-mails per day for a while. That was 
pushing the RAM usage a little, but it worked fine. I've since upgraded 
to about 750MB RAM just to be safe, and our load has dropped to about 
25,000 mails per day since I started rejecting (550) the high-scoring 
messages. The DSPAM authors are making it sound like SpamAssassin is 
more of a performance problem than it really is.

> If you want to know, what kind of computer I used, here are its specs:
> http://iki.fi/juhtolv/eng/tietokone.eng.html

Your biggest problem on this computer is only having 64M RAM and having 
all kinds of other software (Gnome? Enlightenment? Those will use a lot 
of your 64M all by themselves!) running when you're trying to load 
SpamAssassin. Your problem is that you need more RAM, not that there's 
something wrong with SpamAssassin! Yes, DSPAM will possibly use quite a 
bit less RAM, so it might be a decent choice for you. But I doubt that 
it's really as effective as SpamAssassin.

> BTW Creating SA-plugin that runs crm114 may be good thing to try
> out, too. And I don't mind, if some people create bogofilter- and
> SpamProbe-plugins for SA. Just do it, if you feel so. But DSPAM seems
> more interesting for me. I haven't been able to try it out, because it
> is not yet available as Debian-package and I haven't yet bothered to
> compile it myself. SpamAssassin is packaged in Debian already, but
> version 3.0 is not yet available as Debian-package.
> 
> I reiterate: It does not hurt, if we try out and see what happens.

Having SpamAssassin call some other program like DSPAM will make your 
performance much worse, because you will already have SA loaded, taking 
up a chunk of RAM, and then it is trying to load another program, which 
will use even _more_ RAM.

Your options are:
1) buy more RAM
-or-
2) quit using Gnome, Enlightenment, and SpamAssassin on that box, find a 
nice thin window manager (IceWM?) and use some low-memory-friendly spam 
scanner. There are mail clients out there that have Bayes filters built 
in. Bogofilter may also be an option.

Re: [sa-list] Re: DSPAM-plugin for SpamAssassin 3.* ?

Posted by Nix <ni...@esperi.org.uk>.
On Thu, 23 Sep 2004, Juhapekka Tolvanen spake:
>         "Myth 4: PERL is designed for language processing, so
>         SpamAssassin is written in a more appropriate language.
>         Let me preface this with the fact that I've had about 10
>         years of experience coding PERL.

... yet he hasn't even read the Perl FAQ, which states

:                                                         But never write
: "PERL", because perl is not an acronym, apocryphal folklore and post-
: facto expansions notwithstanding.

Not a good start.

>                                          While PERL is very useful
>         for language processing and web applications, it is also an
>         extremely slow, interpreted language.

Benchmarks? Oops, I don't see any, just unsubstantiated opinion.
Yes, Perl is interpreted.

>                                               The average overhead
>         for a single PERL process is around 2MB of RAM. Even compiled
>         PERL still requires the use of a bootstrapped interpreter and
>         bytecode translation.

... which happens exactly once, when you start spamd.

>                               PERL is very slow compared to a compiled
>         language, and the regular expression functions PERL supports
>         for text extraction have their roots in the C implementation
>         of regular expressions, which are much faster.

Nonsense. The Perl regexp implementation is *written* in C; so here the
DSPAM author is saying that C is intrinsically faster than itself.

>                                                        DSPAM has very
>         low-level string functions coded in C which are extremely fast,
>         effective, and don't even require the use of processor-intensive
>         regular expressions.

So does Perl. However, if you want to do the sort of textual pattern
matching that requires the construction of a DFA and optional
backtracking to implement in C, one uses a library that's good for that
sort of thing. We call those `regular expression libraries'.

Plus, the Bayesian part of SA (i.e., the only part which DSPAM could
replace) does not use regexps for anything more than identification of
tokens. If you can tokenize without using regular expressions or
anything reducible to them, or anything more formally powerful (which
would consume much more CPU power to match, and for which matching may
not ever terminate), then there's a whole lot of computer scientists
who'd like to talk to you...
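
A minimal sketch of regexp-based token identification, the one job the
paragraph above says the Bayes code uses regular expressions for. The
pattern and the function names are illustrative assumptions and are not
SpamAssassin's actual tokenizer.

```python
import re
from collections import Counter

# Assumed token definition: runs of letters, digits and apostrophes.
TOKEN_RE = re.compile(r"[A-Za-z0-9']+")


def tokenize(text: str) -> list[str]:
    """Split text into lowercase tokens using a single compiled regexp."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


def token_counts(text: str) -> Counter:
    """Count how often each token occurs, as a Bayes learner would need."""
    return Counter(tokenize(text))
```

The pattern is compiled once and reused, so the per-message cost is the
matching itself, which happens inside the regexp engine rather than in
interpreted code.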

It is true that SA spends the majority of its time inside the regular
expression matcher (written in *C*, note: it's not spending its time in
the interpreter proper). However, given the number of regexp-based rules
SA contains, that's unsurprising. If you rip all those rules out and
leave just a Bayes engine, SA would probably be rather faster.

>                              While PERL is useful for data extraction
>         and reporting, it is the completely wrong choice for language
>         processing, especially in a large-scale environment.

Disproved by reality; i.e., SpamAssassin itself.

>         analyzing one mailbox, PERL would be acceptable...but if you
>         plan on running this on a production system with live users, it
>         is a death wish."

Likewise disproved by reality.

> I really don't care about attitudes of author of DSPAM. I just want to

It's not the `attitudes' that are relevant, it's more that everything
he says there is either irrelevant or the purest moonshine.

> know, how much faster SpamAssassin will be, if its Bayesian engine is
> replaced with something else, for example with DSPAM. It does not hurt,
> if we try it out and see what happens. And it does not hurt, if people
> have more alternatives.

No faster.

The bottleneck with large SA installations using Bayes is not CPU time:
it is disk I/O (and RAM, of course). I can't see how reimplementing the
Bayesian engine can help fix this: replacing the *storage mechanism*
might be a good idea, though. (DB_File is the fastest large-scale
key->value storage mechanism available without setup work: if you don't
mind the setup work, you could try Bayes-in-SQL until you find an RDBMS
that's faster than Berkeley DB. You might be searching for some time.)
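
To make the key->value point concrete, here is a small sketch of token
counts kept in an on-disk key->value file. It uses Python's
standard-library dbm.dumb module purely as a stand-in; it is not
Berkeley DB or DB_File, and the file name and the bump() helper are
hypothetical.

```python
import dbm.dumb
import os
import tempfile


def bump(db, token: str, n: int = 1) -> int:
    """Add n to the stored count for token and return the new count."""
    key = token.encode("utf-8")
    current = int(db[key]) if key in db else 0
    db[key] = str(current + n).encode("ascii")
    return current + n


def demo() -> tuple[int, int]:
    """Write a few counts, reopen the file, and read them back."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "tokens")
        with dbm.dumb.open(path, "c") as db:
            bump(db, "viagra")
            bump(db, "viagra")
            bump(db, "hello")
        # Every lookup and update above is a read or write on disk,
        # which is where the text says the time goes.
        with dbm.dumb.open(path, "r") as db:
            return int(db[b"viagra"]), int(db[b"hello"])
```

Each learned or scanned message touches many such keys, so the speed of
the store matters more than the language the surrounding arithmetic is
written in.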

>             I even switched from plain SpamAssassin to spamd.

That's something that virtually everyone with the privileges to do
(i.e. root) should do, I think. Anything else is just being pointlessly
inefficient.

> After all these horrible experiences it is painful to read, when
> somebody tries to explain, how fast Perl is after all. Hell yes I know
> Perl-program is compiled when it starts, but it is not enough. Real
> compiled language like C is faster in many cases.

But SA spends nearly all its time in the regexp engine, and waiting (on
disk I/O, network traffic for net tests, and locks on the Bayes database
if a Bayes expiry is underway). Switching to C would just needlessly
hinder maintenance.

> I reiterate: It does not hurt, if we try out and see what happens.

Try it, then. :)

-- 
`I agree that school is a learning environment, and learning to
 intimidate others -- aka "social skills" -- is part of that.'
   --- jabberwocky

Re: [sa-list] Re: DSPAM-plugin for SpamAssassin 3.* ?

Posted by Juhapekka Tolvanen <ju...@cc.jyu.fi>.
On Wed, 22 Sep 2004, +22:45:09 EEST (UTC +0300),
Dan Mahoney, System Admin <da...@prime.gushi.org> pressed some keys:

> On Wed, 22 Sep 2004, Daniel Quinlan wrote:

> >Juhapekka Tolvanen <ju...@cc.jyu.fi> writes:
> >
> >>1) Switch off that Bayesian filter of SpamAssassin, because it is
> >>implemented in slow interpreted language called Perl.
> >>
> >>2) Use DSPAM as Bayesian-like filter, because it is implemented in
> >>lightning-fast compiled language called C.

> Okay, and not to get off the topic on your opinion on perl versus c,
> but the first thing perl does when it executes a script is compiles
> it. This is why spamd is a decent solution despite being written in
> perl, because it only starts up once.

> I'm not saying that a constantly-running perl program is as fast as a
> compiled C app all of the time, but if you're going to sit here and
> suggest changes to the SpamAssassin development team without possibly
> having evaluated 3.0.0-Release for 24 hours, you might want to drop
> the condescending attitude, since I'm *sure* we all know what perl and
> C are.

If you know so well what C and Perl are, then what do you think about this:

http://www.nuclearelephant.com/projects/dspam/faq.html#1.7

And I'd especially like to know your opinion about this:

        "Myth 4: PERL is designed for language processing, so
        SpamAssassin is written in a more appropriate language.

        Let me preface this with the fact that I've had about 10
        years of experience coding PERL. While PERL is very useful
        for language processing and web applications, it is also an
        extremely slow, interpreted language. The average overhead
        for a single PERL process is around 2MB of RAM. Even compiled
        PERL still requires the use of a bootstrapped interpreter and
        bytecode translation. PERL is very slow compared to a compiled
        language, and the regular expression functions PERL supports
        for text extraction have their roots in the C implementation
        of regular expressions, which are much faster. DSPAM has very
        low-level string functions coded in C which are extremely fast,
        effective, and don't even require the use of processor-intensive
        regular expressions. While PERL is useful for data extraction
        and reporting, it is the completely wrong choice for language
        processing, especially in a large-scale environment. If you were
        analyzing one mailbox, PERL would be acceptable...but if you
        plan on running this on a production system with live users, it
        is a death wish."

I really don't care about attitudes of author of DSPAM. I just want to
know, how much faster SpamAssassin will be, if its Bayesian engine is
replaced with something else, for example with DSPAM. It does not hurt,
if we try it out and see what happens. And it does not hurt, if people
have more alternatives.

I can not code anything like that myself. I am just (l)user. If some
software is slow in my machine, I really feel it and see it. Even simple
system monitor software (for example ProcMeter 3) shows it, when something
is taking too much memory and CPU-time. And I can hear it myself, when
hard disks make awful noise of swapping.

When I used SpamAssassin and its Bayesian filter in my home computer,
it really was slow. Sometimes I saw dozens of E-Mails in output of
mailq-command. I even switched from plain SpamAssassin to spamd. I had
to use renice. And then even rerude or chrt. Then I switched to crm114
and it seems to work much faster: If I run mailq-command right after boot,
I may be able to see few E-Mails in queue, but most of the time it says
my mail queue is empty.

If you want to know, what kind of computer I used, here are its specs:

http://iki.fi/juhtolv/eng/tietokone.eng.html

I got better computer few weeks after switching to crm114.

After all these horrible experiences it is painful to read, when
somebody tries to explain, how fast Perl is after all. Hell yes I know
Perl-program is compiled when it starts, but it is not enough. Real
compiled language like C is faster in many cases.

But crm114 is interpreted language, too. It is specially designed for
creating Bayesian-like algorithms, though. Maybe that is one reason, it
runs so fast in my machine. Another reason may be, because it does only
Bayesian-like filtering and nothing else (like asking from RBLs or running
regexps).

BTW Creating SA-plugin that runs crm114 may be good thing to try
out, too. And I don't mind, if some people create bogofilter- and
SpamProbe-plugins for SA. Just do it, if you feel so. But DSPAM seems
more interesting for me. I haven't been able to try it out, because it
is not yet available as Debian-package and I haven't yet bothered to
compile it myself. SpamAssassin is packaged in Debian already, but
version 3.0 is not yet available as Debian-package.

I reiterate: It does not hurt, if we try out and see what happens.



Re: [sa-list] Re: DSPAM-plugin for SpamAssassin 3.* ?

Posted by "Dan Mahoney, System Admin" <da...@prime.gushi.org>.
On Wed, 22 Sep 2004, Daniel Quinlan wrote:

> Juhapekka Tolvanen <ju...@cc.jyu.fi> writes:
>
>> 1) Switch off that Bayesian filter of SpamAssassin, because it is
>> implemented in slow interpreted language called Perl.
>>
>> 2) Use DSPAM as Bayesian-like filter, because it is implemented in
>> lightning-fast compiled language called C.

Okay, and not to get off the topic on your opinion on perl versus c, but 
the first thing perl does when it executes a script is compiles it.  This 
is why spamd is a decent solution despite being written in perl, because 
it only starts up once.

I'm not saying that a constantly-running perl program is as fast as a 
compiled C app all of the time, but if you're going to sit here and 
suggest changes to the SpamAssassin development team without possibly 
having evaluated 3.0.0-Release for 24 hours, you might want to drop the 
condescending attitude, since I'm *sure* we all know what perl and C are.

The dev team works damned hard, for no pay.  If you want something 
better, you're more than welcome to write it, in your language of choice.

Just my 0.02

-Dan Mahoney

>
> You might also want to look into Bogofilter or SpamProbe.
>
> However, it would be better to integrate something into SpamAssassin and
> just fix the performance issues to avoid reparsing and rerendering
> penalties.
>
>> But remember: libdspam is licensed under GNU GPL 2. And SpamAssassin 3.0
>> is licensed under Apache Software License, Version 2. According to FSF
>> that license is incompatible with GNU GPL:
>>
>> http://www.fsf.org/licenses/license-list.html#GPLIncompatibleLicenses
>
> The ASF disagrees:
>
>  http://www.apache.org/licenses/GPL-compatibility.html
>
>> Maybe author of DSPAM could change license of that library to GNU LGPL.
>
> The author of DSPAM (who I notice is on your Cc: list) seems to have
> some sort of grudge against SpamAssassin, judging from the vitriol on
> the DSPAM web pages, so good luck.  I'm not even sure a change is really
> needed, but if there's some sort of change needed, we could perhaps
> discuss it further.
>
> Daniel
>
> --
> Daniel Quinlan                     ApacheCon! 13-17 November (3 SpamAssassin
> http://www.pathname.com/~quinlan/  http://www.apachecon.com/  sessions & more)
>

--

"I hate Windows"

-Tigerwolf, Anthrocon 2004

--------Dan Mahoney--------
Techie,  Sysadmin,  WebGeek
Gushi on efnet/undernet IRC
ICQ: 13735144   AIM: LarpGM
Site:  http://www.gushi.org
---------------------------


Re: DSPAM-plugin for SpamAssassin 3.* ?

Posted by Daniel Quinlan <qu...@pathname.com>.
Juhapekka Tolvanen <ju...@cc.jyu.fi> writes:

> 1) Switch off that Bayesian filter of SpamAssassin, because it is
> implemented in slow interpreted language called Perl.
> 
> 2) Use DSPAM as Bayesian-like filter, because it is implemented in
> lightning-fast compiled language called C.

You might also want to look into Bogofilter or SpamProbe.

However, it would be better to integrate something into SpamAssassin and
just fix the performance issues to avoid reparsing and rerendering
penalties.

> But remember: libdspam is licensed under GNU GPL 2. And SpamAssassin 3.0
> is licensed under Apache Software License, Version 2. According to FSF
> that license is incompatible with GNU GPL:
> 
> http://www.fsf.org/licenses/license-list.html#GPLIncompatibleLicenses

The ASF disagrees:

  http://www.apache.org/licenses/GPL-compatibility.html
 
> Maybe author of DSPAM could change license of that library to GNU LGPL.

The author of DSPAM (who I notice is on your Cc: list) seems to have
some sort of grudge against SpamAssassin, judging from the vitriol on
the DSPAM web pages, so good luck.  I'm not even sure a change is really
needed, but if there's some sort of change needed, we could perhaps
discuss it further.

Daniel

-- 
Daniel Quinlan                     ApacheCon! 13-17 November (3 SpamAssassin
http://www.pathname.com/~quinlan/  http://www.apachecon.com/  sessions & more)