Posted to dev@spamassassin.apache.org by Adam Katz <an...@khopis.com> on 2011/03/30 17:20:18 UTC

Re: [SA-dev] Script to collect IP reputation data from SA mass-check targets

On 03/29/2011 06:09 PM, darxus@chaosreigns.com wrote:
> I'd like everybody to run it in a daily cron job (along with your
> mass-checks, if you're doing them).

Be careful about measuring the usefulness of that data; you'll have to
measure samples against each other, and even then you will have
imperfect results.


Re: [SA-dev] Script to collect IP reputation data from SA mass-check targets

Posted by da...@chaosreigns.com.
On 03/30, Adam Katz wrote:
> Be careful about measuring the usefulness of that data; you'll have to
> measure samples against each other, and even then you will have
> imperfect results.

If this ever gets added to the mass-check tests, I'll be more than happy to
create a separate set of the data based only on data from people who are
not contributing to mass-checks.  Right now, I only have data from 1796
emails that aren't run through mass-check, so it's not worth it.  But I'm
keeping all input data separated by who contributed it, so a special
version for mass-check folks will be easy.  

I just posted some test results to the users list that I'm pretty happy
with.  I'd really like to get more data though.

Graph of the results:  http://www.chaosreigns.com/iprep/results.svg
The graph is based on training on all corpora except mine, then training
on mine 1 spam and 1 ham at a time, calculating the accuracy at each step
against a separate test set of my email.  The 3 sets of lines come from 3
runs, each using randomly selected training and scoring sets.
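
For anyone who wants to try this kind of evaluation on their own mail,
here is a rough sketch of the procedure in Python.  It is not the actual
iprep code: the IPReputation class, the majority-vote scoring rule, and
the function names (learning_curve, run_trials) and parameters
(test_fraction, seed) are all made up for illustration.  Mail is assumed
to be a list of (ip, is_spam) pairs.

```python
import random
from collections import defaultdict


class IPReputation:
    """Toy per-IP spam/ham counter (hypothetical, not the iprep model)."""

    def __init__(self):
        # ip -> [spam_count, ham_count]
        self.counts = defaultdict(lambda: [0, 0])

    def train(self, ip, is_spam):
        self.counts[ip][0 if is_spam else 1] += 1

    def predict(self, ip):
        # .get() avoids creating entries for unseen IPs.
        spam, ham = self.counts.get(ip, (0, 0))
        # Assumption: unknown or tied IPs are guessed to be ham.
        return spam > ham


def accuracy(model, test_set):
    """Fraction of test messages classified correctly (test_set non-empty)."""
    correct = sum(model.predict(ip) == is_spam for ip, is_spam in test_set)
    return correct / len(test_set)


def learning_curve(other_corpora, my_spam, my_ham, test_set):
    """Train on everyone else's data, then add my mail 1 spam + 1 ham at a
    time, recording accuracy on the held-out test set after each step."""
    model = IPReputation()
    for ip, is_spam in other_corpora:
        model.train(ip, is_spam)
    curve = [accuracy(model, test_set)]  # baseline before any of my mail
    for spam_ip, ham_ip in zip(my_spam, my_ham):
        model.train(spam_ip, True)
        model.train(ham_ip, False)
        curve.append(accuracy(model, test_set))
    return curve


def run_trials(other_corpora, my_mail, runs=3, test_fraction=0.5, seed=0):
    """Repeat the curve with a fresh random training/scoring split per run."""
    rng = random.Random(seed)
    curves = []
    for _ in range(runs):
        shuffled = my_mail[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * test_fraction)
        test_set, train_set = shuffled[:cut], shuffled[cut:]
        my_spam = [ip for ip, is_spam in train_set if is_spam]
        my_ham = [ip for ip, is_spam in train_set if not is_spam]
        curves.append(learning_curve(other_corpora, my_spam, my_ham, test_set))
    return curves
```

Each list returned by run_trials corresponds to one line on a graph like
the one above: accuracy before any of my mail is seen, then after each
spam/ham pair is added.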

Project web page:  http://www.chaosreigns.com/iprep/

-- 
"I don't want to die... just yet... not while there's... women."
- J. Matthew Root, 8/23/02 (http://www.jmrart.com/)
http://www.ChaosReigns.com