Posted to dev@spamassassin.apache.org by da...@chaosreigns.com on 2011/04/12 04:15:01 UTC
Re: [SA-dev] Script to collect IP reputation data from SA mass-check targets
On 03/30, Adam Katz wrote:
> Be careful about measuring the usefulness of that data; you'll have to
> measure samples against each other, and even then you will have
> imperfect results.
If this ever gets added to the mass-check tests, I'll be more than happy to
create a separate set of the data based only on data from people who are
not contributing to mass-checks. Right now, I only have data from 1796
emails that aren't run through mass-check, so it's not worth it. But I'm
keeping all input data separated by who contributed it, so a special
version for mass-check folks will be easy.
I just posted some test results to the users list that I'm pretty happy
with. I'd really like to get more data though.
Graph of the results: http://www.chaosreigns.com/iprep/results.svg
The graph is based on training on all corpora except mine, then training on
my own email one spam and one ham at a time, and calculating the accuracy at
each step against a separate test set of my email. The three sets of lines
come from three runs, each using randomly selected training and scoring sets.
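The post doesn't describe the scoring model or the data format, so what
follows is only a minimal sketch of the evaluation procedure described above,
under loudly stated assumptions. Each message is reduced to a hypothetical
(ip, is_spam) pair. The hypothetical IPReputation class is a toy
majority-vote counter, not the actual iprep algorithm. The sketch pretrains
on the other corpora, then adds my own training mail one spam and one ham at
a time, recording test-set accuracy after each step. Running it with three
different seeds mirrors the three randomized runs in the graph.

```python
import random
from collections import defaultdict

# Hypothetical message representation: (sending IP, is_spam).
# The real iprep data format and scoring are NOT described in the post.
Message = tuple[str, bool]


class IPReputation:
    """Toy per-IP reputation model (an assumption, not iprep's method)."""

    def __init__(self) -> None:
        self.spam = defaultdict(int)  # spam messages seen per IP
        self.ham = defaultdict(int)   # ham messages seen per IP

    def train(self, msg: Message) -> None:
        ip, is_spam = msg
        if is_spam:
            self.spam[ip] += 1
        else:
            self.ham[ip] += 1

    def predict(self, ip: str) -> bool:
        # Assumption: unseen IPs and ties are classified as ham.
        return self.spam.get(ip, 0) > self.ham.get(ip, 0)


def accuracy(model: IPReputation, test: list[Message]) -> float:
    """Fraction of test messages whose spam/ham label is predicted correctly."""
    if not test:
        raise ValueError("empty test set")
    correct = sum(model.predict(ip) == is_spam for ip, is_spam in test)
    return correct / len(test)


def learning_curve(other_corpora: list[Message], own_mail: list[Message],
                   test_fraction: float = 0.5, seed: int = 0) -> list[float]:
    """Pretrain on other corpora, then add own mail one spam + one ham per step."""
    rng = random.Random(seed)
    own = list(own_mail)
    rng.shuffle(own)  # random split into scoring (test) and training sets
    split = int(len(own) * test_fraction)
    test, train = own[:split], own[split:]

    model = IPReputation()
    for msg in other_corpora:
        model.train(msg)

    spam = [m for m in train if m[1]]
    ham = [m for m in train if not m[1]]
    curve = [accuracy(model, test)]  # step 0: other corpora only
    # zip() stops when either spam or ham runs out, keeping each step balanced.
    for s, h in zip(spam, ham):
        model.train(s)
        model.train(h)
        curve.append(accuracy(model, test))
    return curve
```

Three randomized runs, as in the graph, would then be something like
`curves = [learning_curve(other, mine, seed=s) for s in range(3)]`, where
`other` and `mine` are hypothetical lists of (ip, is_spam) pairs.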
Project web page: http://www.chaosreigns.com/iprep/
--
"I don't want to die... just yet... not while there's... women."
- J. Matthew Root, 8/23/02 (http://www.jmrart.com/)
http://www.ChaosReigns.com