Posted to users@spamassassin.apache.org by Leon Kolchinsky <lk...@univ.haifa.ac.il> on 2006/12/06 16:16:31 UTC

how to modify headers so sa-learn gives more accurate results?

Hello All,

I'm using the following script to report to Razor and to train the Bayesian filter with ham and spam messages.

I have the following questions:
-------------------------------
1) If I have the following in local.cf:
use_bayes        1
bayes_auto_learn 1

From what score is a message automatically learned by Bayes?

2) I quarantine spam mail and manually review all of it. I then put all false positives (ham) in the ham folder and all spam in the spam folder, and run the following script to train Bayes and report to Razor.

Should I remove added headers such as these -
X-Quarantine-ID: X-Spam-Flag: X-Spam-Score: X-Spam-Level: X-Spam-Status: 

Or any others, so learning (sa-learn) would be more accurate?
Any other recommendations?




The script:
-----------
#!/bin/bash

######## Revoking ham ##############
cd /var/spool/imap/user/spamcop/ham/ || exit 1
for i in *.; do
    [ -f "$i" ] || continue   # skip the literal "*." when the folder is empty
    echo "Revoking $i"
    /usr/bin/razor-revoke -home=/var/spool/amavis/.razor/ < "$i"
done
echo "Revoke completed!"

######## Reporting spam ##############
cd /var/spool/imap/user/spamcop/spam/ || exit 1
for i in *.; do
    [ -f "$i" ] || continue
    echo "Reporting $i"
    /usr/bin/razor-report -home=/var/spool/amavis/.razor/ < "$i"
done
echo "Reporting completed!"

######### Bayesian DB population with known ham and spam #########
##### Ham #########
chmod 755 /var/spool/imap/user/spamcop/ham
cd /var/spool/imap/user/spamcop/ham/ || exit 1
chmod 644 *.
# Learn only the message files ("*."), matching the loops above,
# so that other files in the folder are not fed to Bayes.
su vscan -c "sa-learn --showdots --ham *."
echo "ham learning completed!"
##### Spam ########
chmod 755 /var/spool/imap/user/spamcop/spam
cd /var/spool/imap/user/spamcop/spam/ || exit 1
chmod 644 *.
su vscan -c "sa-learn --showdots --spam *."
echo "spam learning completed!"



Best Regards,
Leon Kolchinsky
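
A note on question 1, which the replies below do not address: with bayes_auto_learn enabled, the cut-offs come from two settings of the AutoLearnThreshold plugin. The fragment below is a minimal sketch that restates what are, to the best of this note's knowledge, the stock defaults; verify them with perldoc Mail::SpamAssassin::Plugin::AutoLearnThreshold for the installed version before relying on them.

```
# local.cf (sketch; the two threshold values are the believed stock defaults)
use_bayes                          1
bayes_auto_learn                   1
# messages scoring below this are auto-learned as ham
bayes_auto_learn_threshold_nonspam 0.1
# messages scoring above this are auto-learned as spam
bayes_auto_learn_threshold_spam    12.0
```

The score compared against these thresholds is computed separately for auto-learning (for example, without the Bayes rules' own points), so it can differ from the score shown in X-Spam-Status.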

RE: how to modify headers so sa-learn gives more accurate results?

Posted by Leon Kolchinsky <lk...@univ.haifa.ac.il>.


> -----Original Message-----
> From: Matt Kettler [mailto:mkettler_sa@verizon.net]
> Sent: Thursday, December 07, 2006 12:36 PM
> To: לאון קולצ'ינסקי
> Cc: users@spamassassin.apache.org
> Subject: Re: how to modify headers so sa-learn gives more accurate
> results?
> 
> Leon Kolchinsky wrote:
> > OK, Thanks,
> >
> > So the script should look like this now?
> >
> > sa-learn --showdots --bayes_ignore_header X-Quarantine-ID --bayes_ignore_header X-Amavis-Alert --ham *
> >
> Erm.. bayes_ignore_header isn't a command-line option. It's a
> config-file option. Put it in your local.cf.
> 
> 
> > The problem is that I can't find any bayes_ignore_header option in
> > # man sa-learn
> >
> Of course not. See man Mail::SpamAssassin::Conf

Thanks for pointing that out :)

Re: how to modify headers so sa-learn gives more accurate results?

Posted by Matt Kettler <mk...@verizon.net>.

Leon Kolchinsky wrote:
> OK, Thanks,
>
> So the script should look like this now?
>
> sa-learn --showdots --bayes_ignore_header X-Quarantine-ID --bayes_ignore_header X-Amavis-Alert --ham *
>   
Erm.. bayes_ignore_header isn't a command-line option. It's a
config-file option. Put it in your local.cf.


> The problem is that I can't find any bayes_ignore_header option in
> # man sa-learn
>   
Of course not. See man Mail::SpamAssassin::Conf
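
As a minimal sketch of what this reply suggests, the two headers from the attempted command line would go into local.cf as shown here. The header names are the ones mentioned in this thread; whether X-Amavis-Alert really needs ignoring on a given setup is an assumption.

```
# local.cf (sketch): keep Bayes from tokenizing headers added by the quarantine layer
bayes_ignore_header X-Quarantine-ID
bayes_ignore_header X-Amavis-Alert
```

sa-learn is then run with only its own options, for example "sa-learn --showdots --ham *", and "spamassassin --lint" can be used to check that the edited local.cf still parses.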

RE: how to modify headers so sa-learn gives more accurate results?

Posted by Leon Kolchinsky <lk...@univ.haifa.ac.il>.


> -----Original Message-----
> From: Matt Kettler [mailto:mkettler_sa@verizon.net]
> Sent: Wednesday, December 06, 2006 7:15 PM
> To: לאון קולצ'ינסקי
> Cc: users@spamassassin.apache.org
> Subject: Re: how to modify headers so sa-learn gives more accurate
> results?
> 
> Leon Kolchinsky wrote:
> > Hello All,
> >
> > I'm using the following script to report to Razor and to train the
> > Bayesian filter with ham and spam messages.
> >
> > I have the following questions:
> > -------------------------------
> > 1) If I have the following in local.cf:
> > use_bayes        1
> > bayes_auto_learn 1
> >
> > From what score is a message automatically learned by Bayes?
> >
> > 2) I quarantine spam mail and manually review all of it. I then put
> > all false positives (ham) in the ham folder and all spam in the spam
> > folder, and run the following script to train Bayes and report to Razor.
> >
> > Should I remove added headers such as these -
> > X-Quarantine-ID: X-Spam-Flag: X-Spam-Score: X-Spam-Level: X-Spam-Status:
> >
> sa-learn will automatically ignore any headers and other markup that
> were added by SA, so you don't need to remove those.
> 
> You can either remove X-Quarantine-ID, or use a "bayes_ignore_header"
> setting to tell SA not to tokenize it.
> 
> 

OK, Thanks,

So the script should look like this now?

sa-learn --showdots --bayes_ignore_header X-Quarantine-ID --bayes_ignore_header X-Amavis-Alert --ham *

The problem is that I can't find any bayes_ignore_header option in
# man sa-learn


> > Or any others, so learning (sa-learn) would be more accurate?
> > Any other recommendations?
> >


Regards,
Leon

Re: how to modify headers so sa-learn gives more accurate results?

Posted by Matt Kettler <mk...@verizon.net>.

Leon Kolchinsky wrote:
> Hello All,
>
> I'm using the following script to report to Razor and to train the Bayesian filter with ham and spam messages.
>
> I have the following questions:
> -------------------------------
> 1) If I have the following in local.cf:
> use_bayes        1
> bayes_auto_learn 1
>
> From what score is a message automatically learned by Bayes?
>
> 2) I quarantine spam mail and manually review all of it. I then put all false positives (ham) in the ham folder and all spam in the spam folder, and run the following script to train Bayes and report to Razor.
>
> Should I remove added headers such as these -
> X-Quarantine-ID: X-Spam-Flag: X-Spam-Score: X-Spam-Level: X-Spam-Status: 
>   
sa-learn will automatically ignore any headers and other markup that
were added by SA, so you don't need to remove those.

You can either remove X-Quarantine-ID, or use a "bayes_ignore_header"
setting to tell SA not to tokenize it.

> Or any others, so learning (sa-learn) would be more accurate?
> Any other recommendations?
>
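
The first option mentioned in this reply, removing X-Quarantine-ID before learning, could be sketched as a small shell filter. The function name strip_header is hypothetical (it is not part of SpamAssassin or amavis). It drops the named header, including any folded continuation lines, from the header section only and leaves the body untouched.

```shell
#!/bin/bash
# strip_header NAME: copy one message from stdin to stdout, dropping the
# header NAME (case-insensitive) and its folded continuation lines.
# Hypothetical helper for illustration only.
strip_header() {
    awk -v name="$1" '
        BEGIN { want = tolower(name) ":"; in_body = 0; skipping = 0 }
        in_body  { print; next }                  # body: pass through unchanged
        /^$/     { in_body = 1; print; next }     # first blank line ends the headers
        /^[ \t]/ { if (!skipping) print; next }   # continuation of the previous header
        {
            # a new header starts: decide whether it is the one to drop
            skipping = (tolower(substr($0, 1, length(want))) == want)
            if (!skipping) print
        }
    '
}
```

A possible use, assuming sa-learn reads a single message from standard input when no file is given, would be: strip_header X-Quarantine-ID < "$i" | sa-learn --ham. The bayes_ignore_header setting achieves the same effect without rewriting messages, so this filter is only worth considering when the config file cannot be changed.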