Posted to dev@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2005/02/14 20:04:49 UTC

Re: Spam and Ham have different headers - bayesian tricks

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Marc Perkel writes:
> Continuing with my experimenting with a second bayesian filter - using 
> spamprobe and controlling the tokens myself - and using SA to score the 
> output.
> 
> So - I noticed that spam and ham often have different header fields. 
> Some headers only show up in ham - and some headers only show up in 
> spam. So I tokenized the headers themselves and fed just the header 
> names in as data and got some really good results.
> 
> So - I don't know if SA is doing this but tokenizing the header names 
> (excluding the common ones that all messages have) is very effective.

yes, we do that.

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFCEPZRMJF5cimLx9ARArWOAKCNCT7foX79+h06EFFiL3lQ0lZjVQCgrh97
VO71tbPWil5052pDSmyley4=
=1m7C
-----END PGP SIGNATURE-----
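
[Editor's sketch] The header-name tokenization Marc describes above can be illustrated roughly as follows. This is a minimal sketch, not SpamAssassin's or spamprobe's actual tokenizer: the function name header_name_tokens, the "hdr:" token prefix, the COMMON_HEADERS stop list, and the sample message are all assumptions made up for illustration.

```python
from email import message_from_string

# Hypothetical stop list: header fields that nearly every message carries,
# so their presence says little about spam vs ham.
COMMON_HEADERS = {"from", "to", "subject", "date", "message-id", "received"}

def header_name_tokens(raw_message, common=COMMON_HEADERS):
    """Return one token per distinct, uncommon header field name."""
    msg = message_from_string(raw_message)
    # Only the field names are used; the header values are ignored.
    names = {name.lower() for name in msg.keys()}
    return sorted("hdr:" + name for name in names - common)

# Made-up sample message, for illustration only.
sample = (
    "Received: from mail.example.org by mx.example.net\n"
    "From: alice@example.org\n"
    "To: bob@example.net\n"
    "Subject: hello\n"
    "Date: Mon, 14 Feb 2005 12:00:00 -0800\n"
    "Message-ID: <1@example.org>\n"
    "X-Mailer: ExampleMailer 1.0\n"
    "List-Id: <dev.example.org>\n"
    "\n"
    "body text\n"
)

print(header_name_tokens(sample))
```

The resulting tokens would then be fed to whatever Bayesian learner is in use, alongside or instead of its normal tokens.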


Re: Spam and Ham have different headers - bayesian tricks

Posted by Tony Godshall <ap...@gmail.com>.
> 
> Tony Godshall wrote:
> 
> >Hi, Justin, all.
> >
> >I'm doing nearly the opposite:
> >
> >My upstream runs spamassassin.  I run a non-naive bayesian filter
> >(crm114) myself.  The default config for crm114 is to strip spamassassin's
> >headers, but I've found that the SA headers give crm114 very good hints,
> >and have achieved (to me, subjectively) amazing accuracy, not just for
> >spam vs nonspam but for spam vs 20 nonspam mailboxes, including troublesome
> >spammish lists like dev@spamassassin.apache.org.
> >
> >It seems to me that inserting the results of the bevy of SA
> >tests improves the results of bayesian learning significantly.

On Mon, 14 Feb 2005 12:36:52 -0800, Marc Perkel <ma...@perkel.com> wrote:
> I agree. I think that ultimately a second bayesian filter that was
> trained only on rule names could replace the SA scoring, so that the
> rules become self-scoring.

But why only on rule names?  Why not let the learning filter have the
rest of the data too?
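
[Editor's sketch] One way to picture the two options discussed here (rule names only, or rule names plus the rest of the data) is to pull the rule names out of the X-Spam-Status header and add them to the ordinary word tokens. This is a rough sketch under stated assumptions, not how crm114 or SpamAssassin does it: the function names, the "sa:" prefix, the whitespace word splitting, and the sample message are hypothetical, and the code assumes the header carries a comma-separated tests= list that may be folded after a comma.

```python
import re
from email import message_from_string

def sa_rule_tokens(msg):
    """Return the rule names listed in X-Spam-Status as prefixed tokens."""
    status = msg.get("X-Spam-Status", "")
    # Assumption: a folded header breaks after a comma, so removing the
    # whitespace that follows commas rejoins the tests= list.
    status = re.sub(r",\s+", ",", status)
    match = re.search(r"\btests=(\S+)", status)
    if not match:
        return []
    # Assumption: "none" is a placeholder for "no rules hit", not a rule.
    return ["sa:" + rule for rule in match.group(1).split(",")
            if rule and rule != "none"]

def combined_tokens(raw_message):
    """Rule-name tokens plus naive word tokens from a non-multipart body."""
    msg = message_from_string(raw_message)
    body = "" if msg.is_multipart() else msg.get_payload()
    return sa_rule_tokens(msg) + body.lower().split()

# Made-up sample message, for illustration only.
sample = (
    "Subject: hello\n"
    "X-Spam-Status: Yes, score=7.2 required=5.0 tests=BAYES_99,HTML_MESSAGE,\n"
    "\tMIME_HTML_ONLY autolearn=no version=3.0.2\n"
    "\n"
    "Buy now\n"
)

print(combined_tokens(sample))
```

Training on sa_rule_tokens alone corresponds to Marc's rule-names-only idea; combined_tokens corresponds to letting the learner see the rest of the data too.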

Re: Spam and Ham have different headers - bayesian tricks

Posted by Marc Perkel <ma...@perkel.com>.
I agree. I think that ultimately a second bayesian filter that was 
trained only on rule names could replace the SA scoring, so that the 
rules become self-scoring.

Tony Godshall wrote:

>Hi, Justin, all.
>
>I'm doing nearly the opposite:
>
>My upstream runs spamassassin.  I run a non-naive bayesian filter
>(crm114) myself.  The default config for crm114 is to strip spamassassin's 
>headers, but I've found that the SA headers give crm114 very good hints,
>and have achieved (to me, subjectively) amazing accuracy, not just for 
>spam vs nonspam but for spam vs 20 nonspam mailboxes, including troublesome
>spammish lists like dev@spamassassin.apache.org.
>
>It seems to me that inserting the results of the bevy of SA
>tests improves the results of bayesian learning significantly.
>
>Tony

Re: Spam and Ham have different headers - bayesian tricks

Posted by Marc Perkel <ma...@perkel.com>.

Justin Mason wrote:

>-----BEGIN PGP SIGNED MESSAGE-----
>Hash: SHA1
>
>
>Marc Perkel writes:
>  
>
>>Continuing with my experimenting with a second bayesian filter - using 
>>spamprobe and controlling the tokens myself - and using SA to score the 
>>output.
>>
>>So - I noticed that spam and ham often have different header fields. 
>>Some headers only show up in ham - and some headers only show up in 
>>spam. So I tokenized the headers themselves and fed just the header 
>>names in as data and got some really good results.
>>
>>So - I don't know if SA is doing this but tokenizing the header names 
>>(excluding the common ones that all messages have) is very effective.
>>    
>>
>
>yes, we do that.
>
>
>  
>
Great minds think alike .....

-- 
Marc Perkel - marc@perkel.com

Spam Filter: http://www.junkemailfilter.com
    My Blog: http://marc.perkel.com
My Religion: http://www.churchofreality.org
~ "If it's real - we believe in it!" ~