Posted to ruleqa@spamassassin.apache.org by da...@chaosreigns.com on 2014/04/15 20:17:04 UTC

Reminder to clean your corpora

http://wiki.apache.org/spamassassin/CorpusCleaning


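cd /path/to/your/spamassassin/masses
# Take the 200 highest-scoring ham entries (sorted on the second field).
sort -rn -k 2 ham.log | head -n 200 > id.hi
# Extract those messages into an mbox and review them by hand.
./mboxget < id.hi > mbox
mutt -f mbox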


I found a number of spam messages in the ham corpus I was running
mass-check on :(  Please check yours as well.  My ham log file was called
ham-net-darxus.log, so use your own log name in place of ham.log above.
It looks important to review the weekly_mass_check output, not just the
nightly_mass_check output (which doesn't include network rules).  I may
have been looking at the wrong one in the past.
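
If your log has a different name, the sort step above can be wrapped in
a small function.  This is only a sketch: top_ham is a made-up helper
name, and it assumes the log is whitespace-separated with the score in
the second field (as the sort command above implies) and that any header
lines start with '#'.

```shell
# Sketch: print the N highest-scoring entries of a mass-check ham log.
# top_ham is a hypothetical helper, not part of the masses/ tools.
top_ham() {
    log=$1
    count=${2:-200}   # default to 200 entries, as in the commands above
    # Drop '#' comment lines (assumed header format), then sort
    # numerically on field 2, highest score first.
    grep -v '^#' "$log" | sort -rn -k 2 | head -n "$count"
}

# Example, using the log name mentioned above:
#   top_ham ham-net-darxus.log 200 > id.hi
```

The result can be fed to ./mboxget exactly as id.hi is above.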

From one example, these are two rules I may have caused to get nerfed by my
negligence:

http://ruleqa.spamassassin.org/20140412-r1586838-n/RAZOR2_CHECK/detail
kpg-core and bpoliakoff hit more ham than me (possible false negatives).

http://ruleqa.spamassassin.org/20140412-r1586838-n/RCVD_IN_BRBL_LASTEXT/detail
axb-coi-bulk hit more ham than me.

-- 
"Since everything in life is but an experience perfect in being what
it is, having nothing to do with good or bad, acceptance or rejection,
one may well burst out in laughter." - Long Chen Pa
http://www.ChaosReigns.com