Posted to users@spamassassin.apache.org by Leon Kolchinsky <lk...@univ.haifa.ac.il> on 2006/12/06 16:16:31 UTC
how to modify headers so sa-learn gives more accurate results?
Hello All,
I'm using the following script to report to Razor and to train Bayes with ham and spam messages.
I have the following questions:
-------------------------------
1) If I have the following in local.cf:
use_bayes 1
bayes_auto_learn 1
From what score is a message automatically learned by Bayes?
2) I quarantine spam mails and manually review all of them. Then I put all false positives (ham) into the ham folder and all spam into the spam folder, and run the following script to populate the Bayes database and report to Razor.
Should I remove added headers like these:
X-Quarantine-ID: X-Spam-Flag: X-Spam-Score: X-Spam-Level: X-Spam-Status:
or any others, so that learning (sa-learn) would be more accurate?
Any other recommendations?
The script:
-----------
#!/bin/bash
######## Revoking Ham ##############
# Stop if the folder is missing, so the loop never runs in the wrong directory.
cd /var/spool/imap/user/spamcop/ham/ || exit 1
for i in *.
do
    echo "Revoking $i"
    /usr/bin/razor-revoke -home=/var/spool/amavis/.razor/ < "$i"
done
echo "Revoke Completed!"
######## Reporting Spam ##############
cd /var/spool/imap/user/spamcop/spam/ || exit 1
for i in *.
do
    echo "Reporting $i"
    /usr/bin/razor-report -home=/var/spool/amavis/.razor/ < "$i"
done
echo "Reporting Completed!"
######### Bayesian DB population with known ham and spam #########
##### Ham #########
chmod 755 /var/spool/imap/user/spamcop/ham
cd /var/spool/imap/user/spamcop/ham/ || exit 1
chmod 644 *.
su vscan -c "(sa-learn --showdots --ham *)"
echo "ham learning completed!"
##### Spam ########
chmod 755 /var/spool/imap/user/spamcop/spam
cd /var/spool/imap/user/spamcop/spam/ || exit 1
chmod 644 *.
su vscan -c "(sa-learn --showdots --spam *)"
echo "spam learning completed!"
Best Regards,
Leon Kolchinsky
RE: how to modify headers so sa-learn gives more accurate results?
Posted by Leon Kolchinsky <lk...@univ.haifa.ac.il>.
> -----Original Message-----
> From: Matt Kettler [mailto:mkettler_sa@verizon.net]
> Sent: Thursday, December 07, 2006 12:36 PM
> To: לאון קולצ'ינסקי
> Cc: users@spamassassin.apache.org
> Subject: Re: how to modify headers so sa-learn gives more accurate
> results?
>
> Leon Kolchinsky wrote:
> > OK, Thanks,
> >
> > So the script should look like this now?
> >
> > sa-learn --showdots --bayes_ignore_header X-Quarantine-ID --bayes_ignore_header X-Amavis-Alert --ham *
> >
> Erm.. bayes_ignore_header isn't a command-line option. It's a
> config-file option. put it in your local.cf.
>
>
> > The problem that I can't find any bayes_ignore_header option in
> > # man sa-learn
> >
> Of course not. see man Mail::SpamAssassin::Conf
Thanks for pointing that out :)
Re: how to modify headers so sa-learn gives more accurate results?
Posted by Matt Kettler <mk...@verizon.net>.
Leon Kolchinsky wrote:
> OK, Thanks,
>
> So the script should look like this now?
>
> sa-learn --showdots --bayes_ignore_header X-Quarantine-ID --bayes_ignore_header X-Amavis-Alert --ham *
>
Erm... bayes_ignore_header isn't a command-line option. It's a
config-file option. Put it in your local.cf.
> The problem that I can't find any bayes_ignore_header option in
> # man sa-learn
>
Of course not. See man Mail::SpamAssassin::Conf
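The config-file approach described above can be sketched as a small shell helper. This is only an illustration: add_ignore_header is a hypothetical function (not part of SpamAssassin), and the local.cf path shown in the usage comment is an assumption that depends on the installation. The only SpamAssassin pieces relied on are the bayes_ignore_header setting named in the thread and the spamassassin --lint check.

```shell
#!/bin/sh
# Sketch: append "bayes_ignore_header <name>" to a SpamAssassin config file
# unless that exact line is already there.
# add_ignore_header is a hypothetical helper, not a SpamAssassin tool.
add_ignore_header() {
    cf_file=$1
    header=$2
    line="bayes_ignore_header $header"
    # -x matches the whole line, -F treats the pattern as a fixed string
    if ! grep -qxF "$line" "$cf_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$cf_file"
    fi
}

# Usage (the path is an assumption; adjust it to your installation):
#   add_ignore_header /etc/mail/spamassassin/local.cf X-Quarantine-ID
#   add_ignore_header /etc/mail/spamassassin/local.cf X-Amavis-Alert
#   spamassassin --lint    # check the configuration for syntax errors
```

Because the helper checks for the line first, running it repeatedly should not pile up duplicate entries in the config file.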
RE: how to modify headers so sa-learn gives more accurate results?
Posted by Leon Kolchinsky <lk...@univ.haifa.ac.il>.
> -----Original Message-----
> From: Matt Kettler [mailto:mkettler_sa@verizon.net]
> Sent: Wednesday, December 06, 2006 7:15 PM
> To: לאון קולצ'ינסקי
> Cc: users@spamassassin.apache.org
> Subject: Re: how to modify headers so sa-learn gives more accurate
> results?
>
> Leon Kolchinsky wrote:
> > Hello All,
> >
> > I'm using the following script for reporting Razor and teaching BAYESIAN
> > with ham and spam messages.
> >
> > I have the following questions:
> > -------------------------------
> > 1) If I have the following in local.cf:
> > use_bayes 1
> > bayes_auto_learn 1
> >
> > Starting from what score message automatically learned by Bayesian?
> >
> > 2) I do quarantine to spam mails and manually review all spam, then I
> > put all False Positives (ham) to ham folder and all spam to spam folder
> > and run the following script to populate Bayesian and report to Razor.
> >
> > Should I remove headers added like those -
> > X-Quarantine-ID: X-Spam-Flag: X-Spam-Score: X-Spam-Level: X-Spam-Status:
> >
> sa-learn will automatically ignore any headers and other markups that
> were added by SA, so you don't need to remove those.
>
> You can either remove X-Quarantine-ID, or use a "bayes_ignore_header"
> command to tell SA not to tokenize this.
>
OK, Thanks,
So the script should look like this now?
sa-learn --showdots --bayes_ignore_header X-Quarantine-ID --bayes_ignore_header X-Amavis-Alert --ham *
The problem is that I can't find any bayes_ignore_header option in
# man sa-learn
> > Or any others, so learning (sa-learn) would be more accurate?
> > Any other recommendations?
> >
Regards,
Leon
Re: how to modify headers so sa-learn gives more accurate results?
Posted by Matt Kettler <mk...@verizon.net>.
Leon Kolchinsky wrote:
> Hello All,
>
> I'm using the following script for reporting Razor and teaching BAYESIAN with ham and spam messages.
>
> I have the following questions:
> -------------------------------
> 1) If I have the following in local.cf:
> use_bayes 1
> bayes_auto_learn 1
>
> Starting from what score message automatically learned by Bayesian?
>
> 2) I do quarantine to spam mails and manually review all spam, then I put all False Positives (ham) to ham folder and all spam to spam folder and run the following script to populate Bayesian and report to Razor.
>
> Should I remove headers added like those -
> X-Quarantine-ID: X-Spam-Flag: X-Spam-Score: X-Spam-Level: X-Spam-Status:
>
sa-learn will automatically ignore any headers and other markups that
were added by SA, so you don't need to remove those.
You can either remove X-Quarantine-ID, or use a "bayes_ignore_header"
command to tell SA not to tokenize this.
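For the first option mentioned above (removing X-Quarantine-ID before learning), a minimal sketch could look like the following. strip_header is a hypothetical helper written for this illustration. It drops the named header, including folded continuation lines, from the header section only and leaves the body untouched. The usage comment assumes sa-learn is fed the message on standard input.

```shell
#!/bin/sh
# Sketch: remove one header (and its folded continuation lines) from the
# header section of a single message read on stdin.
# strip_header is a hypothetical helper, not part of SpamAssassin.
strip_header() {
    awk -v name="$1" '
        BEGIN { prefix = tolower(name) ":"; in_headers = 1; skipping = 0 }
        # The first empty line (LF or CRLF) ends the header section.
        in_headers && /^\r?$/ { in_headers = 0; skipping = 0 }
        in_headers {
            if ($0 ~ /^[ \t]/) {
                # Continuation line: drop it only if its header was dropped.
                if (skipping) next
            } else {
                # New header line: compare the name case-insensitively.
                skipping = (index(tolower($0), prefix) == 1)
                if (skipping) next
            }
        }
        { print }
    '
}

# Usage (assumes sa-learn reads the message from stdin):
#   strip_header X-Quarantine-ID < "$msg" | sa-learn --ham
```

Filtering a copy on the way into sa-learn leaves the stored message unchanged, which may be preferable to editing files in the mail spool.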
> Or any others, so learning (sa-learn) would be more accurate?
> Any other recommendations?
>