Posted to commits@spamassassin.apache.org by Apache Wiki <wi...@apache.org> on 2007/01/18 15:18:40 UTC

[Spamassassin Wiki] Update of "RescoreDetails" by JustinMason

The following page has been changed by JustinMason:
http://wiki.apache.org/spamassassin/RescoreDetails

The comment on the change is:
update for 3.2.0

------------------------------------------------------------------------------
  = Rescore Mass-Check Instructions =
  
- '''(These are the instructions for the now completed re-run of 3.1.0 mass-checks; see RescoreMassCheck for the overview of the general process in toto.  This page left as-is for the next time we have to do it!)'''
+ '''(see RescoreDetails310 for historical 3.1.0 mass-check documentation.)'''
  
  Here's the procedure you'll need to follow, if you wish to submit data for the
- rescoring run for 3.1.0 using MassCheck:
+ rescoring run for 3.2.0 using MassCheck:
  
- Clean up the corpus of mail you intend to MassCheck (see CorpusCleaning), and get an rsync account (see RsyncAccounts).  The latter can be done while mass-check is running, btw, it's not needed until the end; and the 'checking for false positives and false negatives' stage of corpus cleaning can be done afterwards as well.
+ Clean up the corpus of mail you intend to MassCheck (see CorpusCleaning). The
+ 'checking for false positives and false negatives' stage of corpus cleaning can
+ be done after mass-checks complete, if you like.
+ 
+ Get an rsync account (see RsyncAccounts).  If you are submitting nightly
+ mass-check results, the account you use for that will work.  Otherwise, getting
+ an account can be done while mass-check is running, since it's not needed until
+ the end.
  
  It's helpful, but not required, to have some or all of the helper applications
  installed:
  
   * the Mail::SPF::Query module
   * the Net::DNS module
+  * Razor
   * Pyzor
  
  If you're running nightly mass-checks, please feel free to disable them when
+ running the rescore mass-check runs.
- running the rescore mass-check runs.  Also, please note that the nightly
- submission accounts will work for rescore submissions as well.
  
- Note that it's essential that you mass-check ''both'' ham and spam for this run, as otherwise the Bayes rules will be affected.
+ Note that it's essential that you mass-check ''both'' ham and spam for this
+ run, as otherwise the Bayes rules will be affected.
  
  Then run these commands:
  
  {{{
-   wget http://people.apache.org/~jm/devel/Mail-SpamAssassin-3.1.0-pre4.tar.gz
+   wget http://people.apache.org/~jm/devel/Mail-SpamAssassin-3.2.0-pre??.tar.gz
-   tar xvfz Mail-SpamAssassin-3.1.0-pre4.tar.gz
+   tar xvfz Mail-SpamAssassin-3.2.0-pre??.tar.gz
-   cd Mail-SpamAssassin-3.1.0
+   cd Mail-SpamAssassin-3.2.0
    perl Makefile.PL < /dev/null
    make
  
@@ -62, +70 @@

  {{{spamassassin/user_prefs}}} file.  But SA should be able to infer it in most
  cases.  A good way to tell is if you see no SPF_PASS results -- SPF will not be used if the message passes through one or more trusted relays.
  
- Once it finishes, check that the results are sane. See CorpusCleaning to remove any result lines that deal with misclassified or corrupt messages.
+ Once it finishes, check that the results are sane. See CorpusCleaning to remove any result lines that deal with misclassified or corrupt messages.  (This step is very important.)
  
  Then submit your results!
  
@@ -75, +83 @@

    rsync -Pcvuzb spam.log $USER@rsync.spamassassin.org::submit/spam-bayes-net-$USER.log
  }}}
  
- (''note: previously, we used -C on those rsync commands.  it should be removed as the current host seems to be running a version of rsync that cannot handle that, giving this error: 'filter rules are too modern for remote rsync. rsync error: syntax or usage error (code 1) at exclude.c(1119)'.'')
- 
  That's it!
  
- The results for this run will need to be in by Friday July 22nd (tentatively).  If you're still running then, submit what you have so far and beg for more time.  We
+ The results for this run will need to be in by Tuesday Feb 6th (tentatively).  If you're still running then, submit what you have so far and beg for more time.  We
  may be pushing it out a little further anyway depending on how things go  ;)
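
For the "check that the results are sane" step above, a quick count of each log can catch an empty or lopsided run before you rsync it. The following is only a rough sketch, not part of the official procedure: summarize_log is a hypothetical helper, and it assumes the usual mass-check log layout in which comment lines start with "#" and each result line starts with "Y" (scored as spam) or "." (scored as ham).

```shell
# Hypothetical helper (not shipped with SpamAssassin): print
# "<result lines> <lines scored as spam>" for one mass-check log.
# Assumption: comment lines start with "#", result lines start
# with "Y" (scored as spam) or "." (scored as ham).
summarize_log() {
  awk '
    /^#/ || /^[[:space:]]*$/ { next }   # skip comments and blank lines
    { total++ }                         # every remaining line is one result
    /^Y/ { flagged++ }                  # result was scored as spam
    END { printf "%d %d\n", total, flagged }
  ' "$1"
}
```

Running summarize_log ham.log and summarize_log spam.log should both report a non-zero first number, since both ham and spam are needed for this run. Under the assumption above, a large second number for ham.log, or a small one for spam.log, suggests messages worth reviewing as described in CorpusCleaning.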