Posted to dev@spamassassin.apache.org by Jari Fredriksson <ja...@iki.fi> on 2012/08/08 23:31:16 UTC

masscheck corpus quality

How do you read this? This is from the ruleqa page for today's corpus.

I have a lot more ham messages than 3119, and it shows no ham for this
year. What? I have 11230 ham messages in my corpus...
 

jarif          Spam messages   Score range   Ham messages   Score range
  in 2005         0                            80   (0%)    [0,2]
  in 2006         0                             9   (0%)    [0,1]
  in 2007         0                            21   (0%)    [0,2]
  in 2008         0                            12   (0%)    [0,0]
  in 2009         0                            73   (0%)    [0,5]
  in 2010         0                           474   (0%)    [0,5]
  in 2011         2   (0%)    [2,8]          2450   (1%)    [0,8]
  in 2011-09      1   (0%)    [3,3]             0
  in 2011-12      3   (0%)    [0,2]             0
  in 2012-01     16   (0%)    [0,12]            0
  in 2012-02     13   (0%)    [0,46]            0
  in 2012-03     20   (0%)    [0,28]            0
  in 2012-04     75   (0%)    [0,66]            0
  in 2012-05    295   (0%)    [0,49]            0
  TOTAL:        425   (0%)    [0,66]         3119   (2%)    [0,8]
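
As a quick sanity check on the table above, the per-year ham rows can be
added up and compared with the TOTAL line and with the 11230 messages in
the local corpus. This is only a minimal sketch using the figures quoted
in this message; the variable names are made up for illustration and are
not part of ruleqa or mass-check.

```python
# Ham counts per year, copied from the ruleqa table above.
ham_by_year = {
    2005: 80,
    2006: 9,
    2007: 21,
    2008: 12,
    2009: 73,
    2010: 474,
    2011: 2450,
}

reported_total = 3119  # TOTAL line of the ruleqa table
local_corpus = 11230   # ham messages counted in the local corpus

# Sum the yearly rows and compare against the reported total.
ham_sum = sum(ham_by_year.values())
missing = local_corpus - reported_total

print("sum of yearly rows:", ham_sum)
print("matches TOTAL line:", ham_sum == reported_total)
print("ham not accounted for:", missing)
```

The yearly rows add up to the TOTAL line, so the page is internally
consistent. The gap is between the report and the local corpus, and none
of the ham that was counted is dated 2012.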

-- 

"Unfortunately suspend does mean things sometimes"

Husse Apr 25 2007



Re: masscheck corpus quality

Posted by Jari Fredriksson <ja...@iki.fi>.
09.08.2012 04:20, John Hardin wrote:
> On Thu, 9 Aug 2012, Jari Fredriksson wrote:
>
>> How do you read this? This is from ruleqa page for today's corpus.
>>
>> I have lots more ham messages than 3119. And it shows no ham for this
>> year. What? I have 11230 ham messages in my corpus...
>
> What numbers do you see when you hover over the hit % for your corpora
> in the "set 0, broken down by contributor" section? Are they similarly
> underreported?
>

Yes. But let's see how it looks when the current day's stats show up. I
think local problems on my side have caused all of this.

-- 

You have an unusual magnetic personality.  Don't walk too close to
metal objects which are not fastened down.



Re: masscheck corpus quality

Posted by John Hardin <jh...@impsec.org>.
On Thu, 9 Aug 2012, Jari Fredriksson wrote:

> How do you read this? This is from ruleqa page for today's corpus.
>
> I have lots more ham messages than 3119. And it shows no ham for this
> year. What? I have 11230 ham messages in my corpus...

What numbers do you see when you hover over the hit % for your corpora in 
the "set 0, broken down by contributor" section? Are they similarly 
underreported?

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   No representation without taxation!
-----------------------------------------------------------------------
  7 days until the 67th anniversary of the end of World War II