Posted to users@spamassassin.apache.org by "Sharma, Ashish" <as...@hp.com> on 2011/10/10 17:36:30 UTC

How to create spam score list for sample email messages

Hi,

I have a mail-receiving setup in which Postfix (2.6.6) is the MTA and amavisd-new (with SpamAssassin and ClamAV) is the content filter.

I have enabled the spam report header in my amavisd-new configuration file.

After deploying updated SpamAssassin rulesets, I want to create a report listing the spam score of each sample email, so that I can check the scores stay within permissible limits.

To do that, I am trying a shell script that feeds my test email messages to the following command:

spamassassin -C /etc/amavisd.conf -e --progress < testemail.eml

I would like to use it to create a report that lists the spam scores of all the email messages parsed by the above tool.

Is this possible? So far I have been unable to get the above command to output the spam scores in any form that I could add to the report.

Also, I am passing the amavisd-new config file here. Is that the right approach?

Will the above command affect the Bayesian learning of the SpamAssassin setup? I don't want that to happen.

Thanks
Ashish Sharma

RE: How to create spam score list for sample email messages

Posted by Martin Gregorie <ma...@gregorie.org>.
On Tue, 2011-10-11 at 15:37 +0000, Sharma, Ashish wrote:
> Martin,
> 
> Your testing strategy of spamassassin is interesting to emulate and I
> have following queries:
> 
> Following are the plugins that get loaded in my spamassassin:
> 
> SpamAssassin loaded plugins: AutoLearnThreshold, Bayes, BodyEval,
> Check, DKIM, DNSEval, FreeMail, FuzzyOcr, HTMLEval, HTTPSMismatch,
> Hashcash, HeaderEval, ImageInfo, MIMEEval, MIMEHeader, Pyzor, Razor2,
> RelayEval, ReplaceTags, SPF, SpamCop, URIDNSBL, URIDetail, URIEval,
> VBounce, WLBLEval, WhiteListSubject
> 
> 1. There are network rules in my spamassassin(correct me if I am
> wrong), How do you simulate and test them?
>
My testing machine has the same external access rights as the live box,
and also does all DNS lookups via the copy of bind 9 on the live server,
so I run the same set of spamd plugins on both boxes. Both use the same
SA version and the same sa_update cycle. 

In fact, I don't much mind if there are differences because my local
rule set has ended up with very little reliance on standard rules and
the testing set-up is primarily to develop local rules. This is because
almost all my spam comes from mailing lists where it has been input
through a web forum: the effect is that by and large header-based rules
don't fire on it.

I have one locally developed plugin which whitelists senders who I have
previously sent mail to by accessing a view of my mail archive database.
The associated rule is in its own .cf file. Most of my straightforward
rules are in local.cf and link to rules that use complex patterns in a
second .cf file, which is automatically built by an awk script that
translates human-readable text (i.e. one alternate pattern per line)
into faster but unreadable SA rules. These all populate the testing
system's configuration directory.
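
To make the "one alternate pattern per line" idea concrete, here is a minimal sketch of such a translation step. It is not the released rule generator, whose details are not given in this thread: the function name make_body_rule, the rule name and the score are all hypothetical, and it assumes every input line is already a valid Perl regex fragment with no unescaped '/'.

```shell
# Hypothetical helper (not the released rule generator described above).
# stdin : one regex alternative per line; blank lines and lines starting
#         with '#' are skipped. Patterns are assumed to be valid Perl regex
#         fragments containing no unescaped '/'.
# stdout: one SpamAssassin body rule plus its score and description.
make_body_rule() {
    awk -v name="$1" -v score="$2" '
        /^[ \t]*$/ || /^[ \t]*#/ { next }
        { alt = (alt == "" ? $0 : alt "|" $0) }
        END {
            if (alt == "") exit 1
            printf "body %s /(?:%s)/i\n", name, alt
            printf "score %s %s\n", name, score
            printf "describe %s Generated from a pattern list\n", name
        }
    '
}
```

It could be used as "make_body_rule LOCAL_FORUM_SPAM 2.5 <patterns.txt >generated.cf" (both file names are placeholders). Running "spamassassin --lint" afterwards is a sensible check that the generated file still parses.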

The testing system is managed by a set of scripts that start and stop it
as well as converting test results into readable statistics and
summaries. Finally there is a script that transfers the configuration to
the live system and then restarts it. This has a small amount of
selectivity about what gets transferred. I've released the rule
generator, but the rest of the set-up is probably too specific to my
needs to be worth the trouble of releasing. 

Anyway, I'm certain that anybody who needs something like it can easily
build it by accretion. All my scripts are written in a sinister mix of
bash and awk with a tiny amount of grep thrown in. If you don't know
awk, it's worth getting to grips with: it's fast and you can do a lot with
very little code once you understand its structure. I also use scp to
transfer files between systems: it's very easy to use from a bash script.
   
> 2. How can I divide my spamassassin rulesets, so that network rules
> and local rules can be tested?
> 
Decide on functional rule groupings, put them into separate .cf files
and look at using a script or two to automate the process of applying
them to your live system(s).
   
HTH
Martin




RE: How to create spam score list for sample email messages

Posted by "Sharma, Ashish" <as...@hp.com>.
Martin,

Your SpamAssassin testing strategy is interesting to emulate, and I have the following queries:

Following are the plugins that get loaded in my spamassassin:

SpamAssassin loaded plugins: AutoLearnThreshold, Bayes, BodyEval, Check, DKIM, DNSEval, FreeMail, FuzzyOcr, HTMLEval, HTTPSMismatch, Hashcash, HeaderEval, ImageInfo, MIMEEval, MIMEHeader, Pyzor, Razor2, RelayEval, ReplaceTags, SPF, SpamCop, URIDNSBL, URIDetail, URIEval, VBounce, WLBLEval, WhiteListSubject

1. There are network rules in my SpamAssassin setup (correct me if I am wrong). How do you simulate and test them?
2. How can I divide my SpamAssassin rulesets so that network rules and local rules can be tested?

Can you please elaborate?

Thanks in advance
Ashish




Re: How to create spam score list for sample email messages

Posted by Martin Gregorie <ma...@gregorie.org>.
On Mon, 2011-10-10 at 20:08 +0100, RW wrote:
> On Mon, 10 Oct 2011 17:29:08 +0100
> Martin Gregorie wrote:
> 
> 
> > for f in testdata/*.txt
> > do
> >     spamc <testdata/$f | grep '^X-spam-status: ' >>result.txt
> 
> For that to work you need the setting
> 
> fold_headers 0

Fair comment: I use gawk rather than grep and my filter looks like this:

spamc -l <"$s" | gawk '
BEGIN           { tag=0 }
/^X-Spam/       { tag=1; print; next }           # start of an X-Spam-* header
/^ / || /^\t/   { if (tag==1) { print } next }   # folded continuation line
                { tag = 0 }                      # any other line ends the header
' | where_ever

I don't use 'fold_headers 0' because I don't want *any* differences
between my test config and the live one. From the look of that filter I
obviously ran into the folded-line problem and solved it with gawk.
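
A variation on the same idea, offered only as a sketch: instead of printing the folded lines as they arrive, join each X-Spam-* header into a single line so that a later grep or awk step sees one record per header. The function name unfold_spam_headers is hypothetical. It assumes LF line endings, and it collapses the leading whitespace of each continuation line to one space.

```shell
# Hypothetical filter: print each X-Spam-* header as one unfolded line.
# Reads a whole message on stdin and stops at the blank line that ends
# the header block, so body text is never inspected.
unfold_spam_headers() {
    awk '
        function emit() { if (hdr != "") print hdr; hdr = "" }
        /^$/ { exit }                       # end of headers (END still runs)
        /^[ \t]/ {                          # folded continuation line
            if (hdr != "") {
                sub(/^[ \t]+/, " ")
                hdr = hdr $0
            }
            next
        }
        /^[Xx]-[Ss]pam-/ { emit(); hdr = $0; next }
        { emit() }                          # any other header ends the current one
        END { emit() }
    '
}
```

With spamc this would be used as "spamc <message.txt | unfold_spam_headers", where message.txt is a placeholder, without needing to change fold_headers on either system.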


Martin





Re: How to create spam score list for sample email messages

Posted by RW <rw...@googlemail.com>.
On Mon, 10 Oct 2011 17:29:08 +0100
Martin Gregorie wrote:


> for f in testdata/*.txt
> do
>     spamc <testdata/$f | grep '^X-spam-status: ' >>result.txt

For that to work you need the setting

fold_headers 0

Re: How to create spam score list for sample email messages

Posted by Martin Gregorie <ma...@gregorie.org>.
On Mon, 2011-10-10 at 15:36 +0000, Sharma, Ashish wrote:
> I want to create a report of sample emails with the spam scores
> generated in accordance with permissible limits after deploying the
> spamassassin updated rulesets.
>
> For that I am trying out on a shell script providing with my test
> email messages
>
I do something similar, but keep my test messages as separate text files
in a directory because I find that easier to manage. I do approximately
this on a computer that's entirely separate from my mail host and runs
its own copy of spamd so I can mess around with its rule sets and
configuration without upsetting the live copy of SA. The testing SA runs
in effectively the same configuration as my live SA because the test rig
has an identical set of SA config files: when I'm happy with the test
operation I export the entire set of configuration files to the live
system and then restart spamd. Here's the guts of the test system:

for f in testdata/*.txt
do
    # $f already includes the testdata/ prefix; -i matches X-Spam-Status in any case
    spamc <"$f" | grep -i '^X-Spam-Status: ' >>result.txt
done
analysis_prog result.txt
rm result.txt

My analysis program is an awk script: that or Perl are probably the
weapons of choice for writing this type of program.
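
As an illustration of what such an analysis step might look like (the real analysis program is not shown in this thread), here is a minimal awk sketch. The function name summarize_scores is hypothetical, and it assumes the stock spamd header layout, for example "X-Spam-Status: Yes, score=7.3 required=5.0 tests=...". Headers written by amavisd-new may be laid out differently.

```shell
# Hypothetical analysis step: summarise X-Spam-Status header lines.
# stdin : lines such as "X-Spam-Status: Yes, score=7.3 required=5.0 tests=..."
# stdout: one "verdict<TAB>score" line per message, then a totals line.
summarize_scores() {
    awk '
        tolower($1) == "x-spam-status:" {
            score = ""
            for (i = 2; i <= NF; i++)
                if ($i ~ /^score=/) score = substr($i, 7)
            if (score == "") next           # no score field, skip the line
            verdict = $2
            sub(/,$/, "", verdict)          # "Yes," -> "Yes"
            n++
            total += score
            if (tolower(verdict) == "yes") spam++
            printf "%s\t%s\n", verdict, score
        }
        END {
            if (n > 0)
                printf "messages=%d spam=%d mean=%.2f\n", n, spam, total / n
        }
    '
}
```

In the loop above it would take the place of analysis_prog, as in "summarize_scores <result.txt".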

You probably need to feed the messages to amavisd-new since it is
creating a special header, rather than to spamc/spamd as I do, but I
question whether your command line is right since amavisd-new has direct
access to the Perl modules that make up SpamAssassin.

Disclaimer: the previous paragraph contains almost everything I know
about amavisd-new.

Somebody else may be able to help with the amavisd-new command line, but
not me since I don't use it. What I do know is that Postfix passes a
message at a time to spamc/spamd, so it's entirely probable it does the
same with amavisd-new if you're running that as a Postfix service.


Martin