Posted to users@spamassassin.apache.org by Ben Stover <bx...@yahoo.co.uk> on 2014/06/09 09:47:34 UTC

Spam score range and distribution statistics?

As far as I can tell, SpamAssassin calculates the spam score and puts the value into the email header.
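To read that value back out of a message, you can parse the header SpamAssassin adds. The sketch below assumes the default `X-Spam-Status` layout (`Yes/No, score=... required=... tests=...`). A locally customised header template would need a different pattern, and a header folded across several lines should be unfolded first.

```python
import re

# Assumes the default X-Spam-Status layout, e.g.
#   "No, score=-0.1 required=5.0 tests=SPF_PASS autolearn=ham ..."
# The pattern is lowercase "score=", so uppercase rule names in tests=... won't match.
SCORE_RE = re.compile(r"score=(-?\d+(?:\.\d+)?)")


def parse_spam_score(header_value):
    """Return the score from an X-Spam-Status header value, or None if absent."""
    match = SCORE_RE.search(header_value)
    return float(match.group(1)) if match else None


print(parse_spam_score("Yes, score=1000.0 required=5.0 tests=GTUBE"))  # 1000.0
```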

What is the maximum range of the score?

-10,....,+10

or other?

Are there statistics for an average email account showing how many emails get which score?

In other words, is there something like a graphical visualisation of a Gaussian distribution?

Ben



Re: Spam score range and distribution statistics?

Posted by Antony Stone <An...@spamassassin.open.source.it>.
On Monday 09 June 2014 at 09:50, Matus UHLAR - fantomas wrote:

> On 09.06.14 09:47, Ben Stover wrote:
> >As far as I can tell, SpamAssassin calculates the spam score and puts the
> > value into the email header.
> >
> >What is the maximum range of the score?
> >
> >-10,....,+10
> 
> I don't think it has limits. Maybe just the limits of the integer type.

The GTUBE test rule (http://spamassassin.apache.org/gtube), for example, has a default score of 1000.


Antony.

-- 
"In fact I wanted to be John Cleese and it took me some time to realise that 
the job was already taken."

 - Douglas Adams

                                                     Please reply to the list;
                                                           please don't CC me.

Re: Spam score range and distribution statistics?

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
On 09.06.14 09:47, Ben Stover wrote:
>As far as I can tell, SpamAssassin calculates the spam score and puts the
> value into the email header.
>
>What is the maximum range of the score?
>
>-10,....,+10

I don't think it has limits. Maybe just the limits of the integer type.

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Eagles may soar, but weasels don't get sucked into jet engines. 

Re: Spam score range and distribution statistics?

Posted by Joe Quinn <jq...@pccc.com>.
On 6/9/2014 11:34 AM, Bowie Bailey wrote:
> On 6/9/2014 3:47 AM, Ben Stover wrote:
>> As far as I can tell, SpamAssassin calculates the spam score and puts 
>> the value into the email header.
>>
>> What is the maximum range of the score?
>>
>> -10,....,+10
>>
>> or other?
>
> There are no limits on the score.  The higher the score, the more 
> likely the email is spam and the lower the score, the more likely it 
> is to be non-spam.  Looking through the last month's worth of logs on 
> my server, I see scores ranging from -98 to 101.
>
>> Are there statistics for an average email account showing how many
>> emails get which score?
>>
>> In other words, is there something like a graphical visualisation of
>> a Gaussian distribution?
>
> That would be different on every server depending on what type of spam 
> and ham you see and which rule sets you are running.  I graphed mine 
> out of curiosity, and it forms a reasonable bell curve from -14 to 40,
> peaking at about 9, although there is an odd spike sticking up from
> -3 to 1 for some reason (and a rather large spike at 0).
>
> I'm not a statistics guy, so I can't give you all the distribution 
> numbers -- and, as I said, it will likely differ a fair amount between 
> installations.
>
> Are you just looking for general information, or is there something 
> you are trying to determine?  If you tell us what you are looking for, 
> we may be able to give you some better answers.
>
That spike around zero is going to be your typical boring ham. It passes
SPF and a few other minor ham rules, and hits only very minor spam rules,
if any.
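The SpamAssassin score is the sum of the scores of the rules a message hits. This is why a plain ham message that triggers only a few weak rules lands close to 0, and why there is no fixed overall range. The rule names and weights below are hypothetical, chosen only to illustrate the summation. Real scores depend on the SpamAssassin version, the rule sets in use and local configuration.

```python
# Hypothetical rule weights for illustration only; real scores vary by
# version, rule set and local configuration.
HYPOTHETICAL_RULE_SCORES = {
    "SPF_PASS": -0.001,
    "DKIM_VALID": -0.1,
    "HTML_MESSAGE": 0.001,
}


def total_score(hits, rule_scores):
    """Sum the scores of every rule the message hit."""
    return sum(rule_scores[rule] for rule in hits)


score = total_score(["SPF_PASS", "DKIM_VALID", "HTML_MESSAGE"], HYPOTHETICAL_RULE_SCORES)
print(round(score, 3))  # -0.1
```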

Re: Spam score range and distribution statistics?

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Mon, 2014-06-09 at 11:34 -0400, Bowie Bailey wrote:
> > In other words, is there something like a graphical visualisation of
> > a Gaussian distribution?
> 
> That would be different on every server depending on what type of spam 
> and ham you see and which rule sets you are running.  I graphed mine out 
> of curiosity, and it forms a reasonable bell curve from -14 to 40, peaking
> at about 9, although there is an odd spike sticking up from -3 to 1 for
> some reason (and a rather large spike at 0).

I don't think that second spike is odd. That's the majority of your ham.

Since the data set includes both spam and ham, two spikes are to be
expected. A single bell curve would mean too many messages in the gray
area, no clear distinction between ham and spam, and consequently lots
of false positives and false negatives.
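The "gray area" argument can be made concrete by measuring what fraction of messages score close to the spam threshold. In this minimal sketch, the 5.0 threshold matches SpamAssassin's default `required_score`. The 2-point margin and both sample score lists are made up for illustration.

```python
def gray_area_fraction(scores, threshold=5.0, margin=2.0):
    """Fraction of scores within +/- margin of the spam threshold."""
    if not scores:
        return 0.0
    near = sum(1 for s in scores if abs(s - threshold) <= margin)
    return near / len(scores)


# Made-up samples: a well-separated (two-peak) set and a single-bell set.
two_peaks = [-0.5, 0.0, 0.1, 12.0, 15.3, 20.1]
one_bell = [1.0, 3.5, 4.8, 5.2, 6.0, 9.0]
print(gray_area_fraction(two_peaks), gray_area_fraction(one_bell))
```

With two well-separated peaks, almost nothing falls near the threshold. With a single bell centred on it, a large share does, and those are the messages most likely to be misclassified.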


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: Spam score range and distribution statistics?

Posted by Bowie Bailey <Bo...@BUC.com>.
On 6/9/2014 3:47 AM, Ben Stover wrote:
> As far as I can tell, SpamAssassin calculates the spam score and puts the value into the email header.
>
> What is the maximum range of the score?
>
> -10,....,+10
>
> or other?

There are no limits on the score.  The higher the score, the more likely 
the email is spam and the lower the score, the more likely it is to be 
non-spam.  Looking through the last month's worth of logs on my server, 
I see scores ranging from -98 to 101.
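To find the observed range in your own logs, a sketch like the following can scan spamd log lines. It assumes spamd's "clean message (score/threshold)" and "identified spam (score/threshold)" wording. Check a few lines of your own logs before relying on it, since the format can differ between versions and setups.

```python
import re

# Assumed spamd log wording (verify against your own logs), e.g.
#   "spamd: clean message (-1.9/5.0) for user:1000 in 0.3 seconds, 2345 bytes."
#   "spamd: identified spam (16.4/5.0) for user:1000 in 1.2 seconds, 5678 bytes."
RESULT_RE = re.compile(r"(?:clean message|identified spam) \((-?\d+(?:\.\d+)?)/")


def score_range(log_lines):
    """Return (lowest, highest) score found in the log lines, or None if none match."""
    scores = []
    for line in log_lines:
        match = RESULT_RE.search(line)
        if match:
            scores.append(float(match.group(1)))
    return (min(scores), max(scores)) if scores else None


sample = [
    "spamd: clean message (-1.9/5.0) for user:1000 in 0.3 seconds, 2345 bytes.",
    "spamd: identified spam (16.4/5.0) for user:1000 in 1.2 seconds, 5678 bytes.",
    "spamd: connection from localhost",
]
print(score_range(sample))  # (-1.9, 16.4)
```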

> Are there statistics for an average email account showing how many emails get which score?
>
> In other words, is there something like a graphical visualisation of a Gaussian distribution?

That would be different on every server depending on what type of spam 
and ham you see and which rule sets you are running.  I graphed mine out 
of curiosity, and it forms a reasonable bell curve from -14 to 40, peaking
at about 9, although there is an odd spike sticking up from -3 to 1 for
some reason (and a rather large spike at 0).
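A graph like that can be produced by binning the scores and counting each bin. This is a minimal sketch: the 1-point bin width, the text-bar output and the sample scores are arbitrary choices for illustration, not anything SpamAssassin provides.

```python
import math
from collections import Counter


def score_histogram(scores, bin_width=1.0):
    """Count scores per bin; each key is the lower edge of its bin."""
    return Counter(math.floor(s / bin_width) * bin_width for s in scores)


def print_histogram(hist):
    """Print one '#' per message for each bin, lowest bin first."""
    for edge in sorted(hist):
        print(f"{edge:6.1f} {'#' * hist[edge]}")


hist = score_histogram([-0.5, 0.2, 0.7, 9.1, 9.9, 10.0])
print_histogram(hist)
```

Plotting a month of real scores this way makes both the ham spike near 0 and the spam peak easy to see.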

I'm not a statistics guy, so I can't give you all the distribution 
numbers -- and, as I said, it will likely differ a fair amount between 
installations.

Are you just looking for general information, or is there something you 
are trying to determine?  If you tell us what you are looking for, we 
may be able to give you some better answers.

-- 
Bowie