Posted to users@spamassassin.apache.org by Mathieu Nantel <na...@ecopiabio.com> on 2004/10/18 20:32:10 UTC

[OFFTOPIC] Opinions on DSPAM

Good day list,

As I've read a few articles on DSPAM claiming that it's better/faster/sexier 
than spamassassin, I would appreciate having this list's comment on DSPAM. 
I'm sure quite a few of you have tried it and might have some interesting 
experiences to share. My understanding is that DSPAM relies solely on 
algorithms (Bayes, CHI^2), and that complications arise when you have to 
teach your users on how to train the system (which SA doesn't require as it's 
based on other things aside from Bayes).

I'm not wanting to start a church war - please be objective should you reply.

Thanks.
-- 
Mathieu Nantel, RHCE - Systems Manager
Ecopia BioSciences Inc.
(514) 336-2724 x434

Re: Plugin for DSPAM? Was: Re: [OFFTOPIC] Opinions on DSPAM

Posted by Mathieu Nantel <na...@ecopiabio.com>.

It already exists, as someone else pointed out. Check out the thread...

On October 19, 2004 06:28 am, Federico Giannici wrote:
> Mathieu Nantel wrote:
> > Thanks to all who replied. This sums it up quite well and confirms my
> > doubts about the algorithm-exclusive methods. As always I appreciate this
> > list's objectivity on the topic.
>
> OK, but if it is true that their "learning methods" are better than SA's
> Bayes, then what about creating a SA plugin to use DSPAM as ONE of SA
> tests?
>
> Bye.

-- 
Mathieu Nantel, RHCE - Systems Manager
Ecopia BioSciences Inc.
(514) 336-2724 x434

Plugin for DSPAM? Was: Re: [OFFTOPIC] Opinions on DSPAM

Posted by Federico Giannici <gi...@neomedia.it>.

Mathieu Nantel wrote:
> Thanks to all who replied. This sums it up quite well and confirms my doubts 
> about the algorithm-exclusive methods. As always I appreciate this list's 
> objectivity on the topic. 

OK, but if it is true that their "learning methods" are better than SA's 
Bayes, then what about creating an SA plugin to use DSPAM as ONE of SA's tests?

Bye.

-- 
___________________________________________________
     __
    |-                      giannici@neomedia.it
    |ederico Giannici      http://www.neomedia.it
___________________________________________________

Re: [OFFTOPIC] Opinions on DSPAM

Posted by Mathieu Nantel <na...@ecopiabio.com>.

Thanks to all who replied. This sums it up quite well and confirms my doubts 
about the algorithm-exclusive methods. As always I appreciate this list's 
objectivity on the topic. 

-- 
Mathieu Nantel, RHCE - Systems Manager
Ecopia BioSciences Inc.
(514) 336-2724 x434

Re: [spamassassin] [OFFTOPIC] Opinions on DSPAM

Posted by Don Krause <dk...@optivus.com>.

On Mon, 2004-10-18 at 11:32, Mathieu Nantel wrote:
> Good day list,
> 
> As I've read a few articles on DSPAM claiming that it's better/faster/sexier 
> than spamassassin, I would appreciate having this list's comment on DSPAM. 
> I'm sure quite a few of you have tried it and might have some interesting 
> experiences to share. My understanding is that DSPAM relies solely on 
> algorithms (Bayes, CHI^2), and that complications arise when you have to 
> teach your users on how to train the system (which SA doesn't require as it's 
> based on other things aside from Bayes).
> 
> I'm not wanting to start a church war - please be objective should you reply.
> 
> Thanks.

I'm running it on a test system with few (~15) users. It's been working
and in training for about 4 months, and has yet to achieve better than
75% accuracy.

One user in particular gets about 500 spam messages per month, against
3 good emails per month. In this case, DSPAM fails terribly, failing to
tag ANYTHING as spam. In three months of constant training, it claims to
have achieved 102% accuracy, while clearly passing everything.
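
As a rough check on those numbers, here is a minimal sketch that
assumes "accuracy" means correctly classified messages divided by all
messages. The post does not say how DSPAM computes its own figure, so
the function and its parameter names are illustrative only.

```python
def accuracy(spam_caught, spam_total, ham_passed, ham_total):
    """Fraction of all messages that were classified correctly."""
    correct = spam_caught + ham_passed
    total = spam_total + ham_total
    return correct / total

# Figures from the post: about 500 spam and 3 good emails per month,
# with the filter tagging nothing as spam (so every ham passes).
pass_everything = accuracy(spam_caught=0, spam_total=500,
                           ham_passed=3, ham_total=3)
print(f"{pass_everything:.1%}")
```

Under that definition a pass-everything filter scores well under 1% for
this mailbox, and no filter can exceed 100%, so a reported 102% points
to a problem in how the statistic is counted.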

DSPAM has a nice user interface, but apparently requires way too much
user intervention in the beginning to be very useful.

YMMV

-- 
Don

Re: [OFFTOPIC] Opinions on DSPAM

Posted by Matt Kettler <mk...@evi-inc.com>.

At 02:32 PM 10/18/2004, Mathieu Nantel wrote:
>As I've read a few articles on DSPAM claiming that it's better/faster/sexier
>than spamassassin, I would appreciate having this list's comment on DSPAM.
>I'm sure quite a few of you have tried it and might have some interesting
>experiences to share. My understanding is that DSPAM relies solely on
>algorithms (Bayes, CHI^2), and that complications arise when you have to
>teach your users on how to train the system (which SA doesn't require as it's
>based on other things aside from Bayes).

You've pretty much summed up the answers yourself.

There are a lot of pure-learning systems out there that work VERY well if 
you've got the time to train them, and keep them well trained: DSPAM, 
CRM114, spambayes, etc.

Pure Learning:

The strength of pure-learning systems is speed and simplicity.

They have the strength of quickly adapting to the learning that you feed them.

The weakness is their need for training (they can't run without it) and 
that their accuracy is entirely a function of the training quality. If you 
don't set up good training, they suck completely (garbage in, garbage out).

Another weakness is susceptibility to Bayes-poisoning attacks. Many deal 
quite well with this type of attack, but all have some degree of weakness 
to it that rule and DNSBL systems aren't susceptible to.
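
To make the poisoning effect concrete, here is a minimal sketch using a
simple naive Bayes combination of per-token probabilities. It is not
the algorithm of DSPAM or any other specific filter, and the token
table is made up for illustration.

```python
from math import prod

# Hypothetical per-token spam probabilities, standing in for what a
# trained filter might have learned. These values are invented.
TOKEN_SPAM_PROB = {
    "viagra": 0.99, "discount": 0.90, "pills": 0.95,
    "meeting": 0.10, "thanks": 0.15, "report": 0.10, "lunch": 0.12,
}

def spam_probability(tokens, unknown=0.5):
    """Combine token probabilities, treating tokens as independent."""
    probs = [TOKEN_SPAM_PROB.get(t, unknown) for t in tokens]
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)

plain = ["viagra", "discount", "pills"]
# The same message padded with innocuous words (the "poison").
poisoned = plain + ["meeting", "thanks", "report", "lunch"]

print(round(spam_probability(plain), 3))
print(round(spam_probability(poisoned), 3))
```

Each innocuous token pulls the combined probability down, which is why
padding a message with ordinary text can push it below a filter's spam
cutoff.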

SpamAssassin:

The strength of SA is its use of many sources of spam criteria. It 
combines Bayesian analysis, regex rules, Perl-coded rules, DNSBLs, URIBLs, 
hash systems, and past score-averaging systems. This makes it fairly 
resistant to poisoning: poison techniques that work against one element of 
analysis that SA uses won't work against all of them. However, to some 
degree this is both a strength and a weakness.
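
The combining idea can be sketched as a sum of per-rule scores compared
against a threshold. The rule names, scores, and threshold below are
hypothetical and are not taken from SA's actual rule set.

```python
# Hypothetical rules and scores, for illustration only.
RULE_SCORES = {
    "BAYES_HIGH": 3.5,   # learning component says "probably spam"
    "DNSBL_HIT": 2.0,    # sending IP is on a DNS blocklist
    "URIBL_HIT": 3.0,    # a domain in the body is blocklisted
    "REGEX_PILLS": 1.5,  # a body regex matched
}
THRESHOLD = 5.0  # assumed cutoff for this sketch

def total_score(hits):
    """Sum the scores of every rule that fired on a message."""
    return sum(RULE_SCORES[name] for name in hits)

def is_spam(hits, threshold=THRESHOLD):
    return total_score(hits) >= threshold

# A message that poisoned the Bayes component (no BAYES_HIGH hit)
# can still be caught by the independent network checks.
print(is_spam(["DNSBL_HIT", "URIBL_HIT"]))
```

Because each rule contributes separately, defeating one source of
evidence does not by itself bring the total under the threshold.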

SA's also got learning ability, and even has a self-training ability based 
on the results of the other rules in the system. I know of no other 
self-trainers, but I could be wrong on this.
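
Self-training of this kind can be sketched as follows: a message is fed
to the learner only when the other rules are already confident about
it. The cutoff values and function name are assumptions for
illustration, and the sketch assumes the score passed in excludes the
Bayes result itself so the learner does not train on its own output.

```python
def autolearn_decision(non_bayes_score, ham_below=0.1, spam_above=12.0):
    """Return 'spam', 'ham', or None (do not learn from this message).

    non_bayes_score: total score from rules other than the learner.
    ham_below / spam_above: assumed confidence cutoffs for this sketch.
    """
    if non_bayes_score >= spam_above:
        return "spam"
    if non_bayes_score <= ham_below:
        return "ham"
    return None  # too uncertain; learning here could add bad data

for score in (15.0, 5.0, 0.0):
    print(score, autolearn_decision(score))
```

Keeping a wide gap between the two cutoffs means borderline messages
are never learned automatically, which limits the damage from a
mistaken rule hit.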

Unlike pure-learning systems, SA has the ability to run without any 
training, for those who can't do Bayes training. Its results are a bit 
less accurate, but it's quite workable, particularly with the aid of 
network checks.

Another strength of SA is a high degree of user customization. You can add 
your own regex rules, and now even code-level plugins.

One of SA's weaknesses is speed and resource usage. A fully 
network-and-Bayes-enabled SA queries a lot of stuff, which can take time 
compared to a pure Bayes-only system. It can also chew up a lot of memory 
(although pure-learning systems can chew up a lot too, SA takes a bit of an 
extra hit here due to its "kitchen sink" approach).

Another weakness is rate-of-release for new versions of the regex rules.

Re: [OFFTOPIC] Opinions on DSPAM

Posted by sn...@fastmail.fm.

(replying to self)
Two things: 
1) sorry for the dupe message earlier
2) the comment was referring to SURBLs, if that wasn't clear:

On Mon, 18 Oct 2004 12:35:16 -0700, snowjack@fastmail.fm said:
> Blacklisting domains that are included in the content of spam
> messages has been a very successful technique for us.

Using SURBLs is a HUGE win against the spammers. Thanks Jeff,
Raymond, Erik, and all the people who help make SURBLs so effective!
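
For readers unfamiliar with the idea, here is a minimal sketch of
checking domains found in a message body against a blocklist. Real
SURBL lookups are done as DNS queries; this sketch substitutes a local
set, and the listed domains are made-up placeholders.

```python
import re
from urllib.parse import urlsplit

# Hypothetical blocklisted domains (a local stand-in for a DNS lookup).
BLOCKLIST = {"spam-pills.example", "cheap-meds.example"}

URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)

def body_domains(body):
    """Collect the hostnames of all http(s) URLs in the message body."""
    hosts = set()
    for url in URL_RE.findall(body):
        host = urlsplit(url).hostname
        if host:
            hosts.add(host.lower().rstrip("."))
    return hosts

def listed_domains(body, blocklist=BLOCKLIST):
    """Return blocklisted domains matching any URL host or its parents."""
    found = set()
    for host in body_domains(body):
        labels = host.split(".")
        # Try the full host, then each parent domain down to two labels.
        for i in range(len(labels) - 1):
            candidate = ".".join(labels[i:])
            if candidate in blocklist:
                found.add(candidate)
    return found

print(listed_domains("Buy now at http://www.spam-pills.example/buy today"))
```

The check depends only on where the message tells the reader to go, so
padding the body with innocuous text does not affect it.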
--
  
  snowjack(a)fastmail.fm

Re: [OFFTOPIC] Opinions on DSPAM

Posted by sn...@fastmail.fm.

On Mon, 18 Oct 2004 14:32:10 -0400, "Mathieu Nantel" said:
> As I've read a few articles on DSPAM claiming that it's
> better/faster/sexier than spamassassin, I would appreciate having this
> list's comment on DSPAM. 
> I'm sure quite a few of you have tried it and might have some interesting 
> experiences to share. My understanding is that DSPAM relies solely on 
> algorithms (Bayes, CHI^2), and that complications arise when you have to 
> teach your users on how to train the system (which SA doesn't require as
> it's based on other things aside from Bayes).
> 

Hi Mathieu,
I haven't done any carefully controlled studies, but I've been very
successful with SpamAssassin. I think SA has the better approach.
Spammers have gotten quite good at fooling the pure algorithm methods.
Some get their messages' spam probabilities down to 50% or so if not
lower, mainly by including a lot of innocuous text in their messages. If
you can also use DNS-based databases to look up IP addresses and domains
associated with known spammers, plus other rules such as known ratware
patterns in headers, keywords like 'viagra', and use that information in
combination with the Bayes and other algorithms to determine messages'
spamminess, you will have better accuracy. Blacklisting domains that are
included in the content of spam messages has been a very successful
technique for us.

Unfortunately, SA is a real memory hog and has higher hardware
requirements than DSPAM to handle the same message load. But that's not
a big issue for us. We average about 25,000 messages per day from the
Internet with ~400 users. SpamAssassin is running on a dedicated Athlon
1.5 GHz machine with about 750MB of RAM, and we haven't had any
problems. Peak RAM usage is about 600MB.

--
  
  snowjack(a)fastmail.fm
