Posted to commits@spamassassin.apache.org by Apache Wiki <wi...@apache.org> on 2009/08/14 22:31:06 UTC

[Spamassassin Wiki] Update of "RescoreMassCheck" by JustinMason

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Spamassassin Wiki" for change notification.

The following page has been changed by JustinMason:
http://wiki.apache.org/spamassassin/RescoreMassCheck

The comment on the change is:
update for 3.3.0

------------------------------------------------------------------------------
  = Rescore Mass-Check =
  
- '''(see RescoreMassCheck310 for the 3.1.x historical page)'''
+ '''(see RescoreMassCheck310 or RescoreMasscheck320 for historical releases)'''
  
  This is the procedure we use to generate new scores.  It takes quite a while and is labour-intensive, so we do it infrequently.
  
@@ -23, +23 @@

  
  = Procedure =
  
- Here's the process for generating the scores as of SpamAssassin 3.2.0:
+ Here's the process for generating the scores as of SpamAssassin 3.3.0:
  
  == 1. heads-up ==
  
@@ -50, +50 @@

  {{{
  ssh spamassassin.zones.apache.org
  cd /home/corpus-rsync
- OLDVERSION="3.1"
+ OLDVERSION="3.2"
  sudo mv corpus/submit scoregen-$OLDVERSION
  sudo mkdir corpus/submit
  sudo chown rsync corpus/submit
@@ -67, +67 @@

  
  svn cp \
          https://svn.apache.org/repos/asf/spamassassin/trunk \
-         https://svn.apache.org/repos/asf/spamassassin/tags/3_2_0_mcsnapshot_1
+         https://svn.apache.org/repos/asf/spamassassin/tags/3_3_0_mcsnapshot_1
  }}}
  
  (we can't use the standard build process here anymore since the dist tarball no longer includes "masses".  Use a descriptive, unique tag name.)
  
  == 2. announce mass-check ==
  
- RescoreDetails is the full announcement text (and instructions) for this phase.  It's sufficient just to send out a mail something like the one we used in 3.1.0:
+ RescoreDetails is the full announcement text (and instructions) for this phase.  It's sufficient just to send out a mail something like the one we used in previous releases:
  
  {{{
  To: users
  Cc: dev
- Subject: NOTICE: 3.2.0 rescoring mass-checks
+ Subject: NOTICE: 3.3.0 rescoring mass-checks
  
  OK, if you're planning to send us mass-check logs for the
- 3.2.0 rescoring, now's the time!
+ 3.3.0 rescoring, now's the time!
  
  http://wiki.apache.org/spamassassin/RescoreDetails has all
  the details.
@@ -122, +122 @@

  ./log-grep-recent -m 6 /home/corpus-rsync/corpus/submit/spam-*.log > spam-full.log
  }}}
  
- We may have to tweak the number of months specified for each type, if there's too much or too little mail resulting from the grep.  but 38 months / 6 months worked well for 3.2.0.
+ We may have to tweak the number of months specified for each type if the grep yields too much or too little mail, but 38 months / 6 months worked well for 3.3.0.
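
One rough way to judge whether the month windows need tweaking is to count the entries that ended up in each generated log. The sketch below assumes that mass-check logs hold one message per line and that header/comment lines start with "#". The function name count_log_entries is hypothetical and is not part of the masses tools.

```shell
# Hypothetical helper (not part of masses/): count message entries in a
# mass-check log. Assumption: one message per line, comments start with "#".
count_log_entries() {
    # -v inverts the match, -c prints the number of matching lines
    grep -v -c '^#' "$1"
}
```

For example, `count_log_entries spam-full.log` prints the number of messages the grep kept. If that looks too large or too small, adjust the `-m` value and regenerate.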
  
  == 4.2 tweak rules for evolver ==
  
- Go through the rulesrc dir, comment out all "score" lines except
- for rules that you think the scores are accurate like carefully-vetted net rules, or 0.001 informational rules.
+ Go through the rulesrc dir and comment out all "score" lines, except for rules whose scores you think are accurate, such as carefully-vetted net rules or 0.001 informational rules.
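
The commenting-out step can be partly scripted. What follows is only a sketch under stated assumptions: the rule files are taken to end in ".cf", and the rules whose scores should be kept are listed by hand, one rule name per line, in a keep-list file. The function name comment_out_scores and the keep-list file are hypothetical and are not part of the SpamAssassin tree. Deciding which rules belong on the keep-list is still a manual judgment call.

```shell
# Hypothetical helper (not part of SpamAssassin): comment out every "score"
# line in the *.cf files under a directory, except for rules named in a
# keep-list file (one rule name per line).
comment_out_scores() {
    dir=$1    # e.g. rulesrc
    keep=$2   # hand-written list of rules whose scores are trusted
    find "$dir" -type f -name '*.cf' | while IFS= read -r f; do
        awk -v keepfile="$keep" '
            # Load the keep-list into an array keyed by rule name.
            BEGIN { while ((getline name < keepfile) > 0) if (name != "") kept[name] = 1 }
            # A score line whose rule name ($2) is not kept gets a leading "#".
            /^[ \t]*score[ \t]/ && !($2 in kept) { print "#" $0; next }
            # Everything else passes through unchanged.
            { print }
        ' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}
```

It could be run as `comment_out_scores rulesrc keep-scores.txt`, where keep-scores.txt is the hypothetical keep-list. Reviewing the result with `svn diff` before continuing is a sensible check, since the files are edited in place.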
  
  == 4.3 resync to mcsnapshot rules list ==
  
@@ -145, +144 @@

  {{{
  cd /path/to/checkout/of/trunk
  svn co \
-   https://svn.apache.org/repos/asf/spamassassin/tags/3_2_0_mcsnapshot_1/rules \
+   https://svn.apache.org/repos/asf/spamassassin/tags/3_3_0_mcsnapshot_1/rules \
    rules-mcsnapshot
  cp rules-mcsnapshot/active.list rules/active.list
  make
@@ -159, +158 @@

  
  == 5. generate scores for score sets ==
  
- See RunningGa.  (in the past we used RunningPerceptron, but it acted up during 3.2.0 generation, so we used the GA again.)
+ See RunningGa.  (in the past we used RunningPerceptron, but it acted up during 3.3.0 generation, so we used the GA again.)
  
  Once this is complete, rules/50_scores.cf will have the generated scores, created by runGA. (TODO: I think.)
  
@@ -185, +184 @@

  Since stuff like the STATISTICS cannot ever be regenerated without the (randomised) test logs, these need to be saved, too.   Currently, I think the best bet is to upload the {{{rescore-logs.tgz}}} file somewhere on spamassassin.zones.apache.org; it doesn't have to be in a public place, ASF-committer-account-required is fine.  Just mention that path in the rescoring bug's comments.  Last time, I did this:
  
  {{{
- sudo mkdir /home/corpus-rsync/ARCHIVE/3.2.0
+ sudo mkdir /home/corpus-rsync/ARCHIVE/3.3.0
- sudo mv rescore-logs.tgz /home/corpus-rsync/ARCHIVE/3.2.0/rescore-logs-bug5270.tgz
+ sudo mv rescore-logs.tgz /home/corpus-rsync/ARCHIVE/3.3.0/rescore-logs-bug6155.tgz
  }}}
  
  == 6.5. mark evolved-score rules as 'always published' ==