Posted to dev@spamassassin.apache.org by da...@chaosreigns.com on 2011/04/12 04:15:01 UTC

Re: [SA-dev] Script to collect IP reputation data from SA mass-check targets

On 03/30, Adam Katz wrote:
> Be careful about measuring the usefulness of that data; you'll have to
> measure samples against each other, and even then you will have
> imperfect results.

If this ever gets added to the mass-check tests, I'll be more than happy to
create a separate set of the data based only on data from people who are
not contributing to mass-checks.  Right now, I only have data from 1796
emails that aren't run through mass-check, so it's not worth doing yet.  But
I'm keeping all input data separated by who contributed it, so a special
version for mass-check folks will be easy.

I just posted some test results to the users list that I'm pretty happy
with.  I'd really like to get more data though.

Graph of the results:  http://www.chaosreigns.com/iprep/results.svg
The graph is based on training on all corpora except mine, and then training
on mine 1 spam and 1 ham at a time, calculating the accuracy at each step
using a separate test set of my email.  It shows 3 sets of lines from 3 runs,
each using randomly selected training and scoring sets.
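
A rough sketch of that evaluation loop, in Python.  This is not the actual
iprep code: the per-IP ham/spam counting and the "more spam than ham" rule
are stand-in assumptions, and every function name here is hypothetical.  It
only shows the shape of the procedure: train on everyone else's data, then
add one ham and one spam of mine at a time, scoring against a held-out test
set after each step.

```python
from collections import defaultdict


def train(counts, ip, is_spam):
    """Record one message for this IP; counts[ip] is [ham, spam]."""
    counts[ip][1 if is_spam else 0] += 1


def predict_spam(counts, ip):
    """Assumed rule: call it spam if the IP has sent more spam than ham."""
    ham, spam = counts.get(ip, (0, 0))
    return spam > ham


def accuracy(counts, test_set):
    """Fraction of (ip, is_spam) test messages classified correctly."""
    correct = sum(predict_spam(counts, ip) == is_spam for ip, is_spam in test_set)
    return correct / len(test_set)


def learning_curve(other_corpora, my_ham, my_spam, test_set):
    """Accuracy before any of my mail, then after each ham+spam pair."""
    counts = defaultdict(lambda: [0, 0])
    for ip, is_spam in other_corpora:
        train(counts, ip, is_spam)
    curve = [accuracy(counts, test_set)]
    for ham_ip, spam_ip in zip(my_ham, my_spam):
        train(counts, ham_ip, False)
        train(counts, spam_ip, True)
        curve.append(accuracy(counts, test_set))
    return curve


# Made-up example data using documentation-range IP addresses.
other_corpora = [("203.0.113.9", True)]
my_ham = ["192.0.2.1"]
my_spam = ["198.51.100.1"]
test_set = [("192.0.2.1", False), ("198.51.100.1", True), ("203.0.113.9", True)]
curve = learning_curve(other_corpora, my_ham, my_spam, test_set)
```

Repeating this with different random splits of my mail into training and
test sets would give one line per run.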

Project web page:  http://www.chaosreigns.com/iprep/

-- 
"I don't want to die... just yet... not while there's... women."
- J. Matthew Root, 8/23/02 (http://www.jmrart.com/)
http://www.ChaosReigns.com