Posted to commits@spamassassin.apache.org by Apache Wiki <wi...@apache.org> on 2007/01/29 15:37:43 UTC

[Spamassassin Wiki] Update of "PreflightBuildBot" by JustinMason

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Spamassassin Wiki" for change notification.

The following page has been changed by JustinMason:
http://wiki.apache.org/spamassassin/PreflightBuildBot

The comment on the change is:
updated to buildbot 0.7.5

------------------------------------------------------------------------------
  
  The corpus it mass-checks is split in a certain way so that results will be available very quickly -- typically in under 10 minutes -- with increasing quantities of results becoming available as time elapses.
  
- Progress of the mass-checks is visible on [http://buildbot.spamassassin.org/preflight/ the Buildbot 'waterfall']; as they complete, their results become visible on the RuleQaApp.
+ Progress of the mass-checks is visible on [http://bbmass.spamassassin.org:8011/ the Buildbot 'waterfall']; as they complete, their results become visible on the RuleQaApp.
  
  == The preflight mass-check corpus ==
  
  This corpus is built from a selection of mail rsync'd up from various people; it's then "smoothed out" into several subsets.   These use differing amounts of mail, starting with a small set of mail in the "mc-fast" chunk, and gradually increasing until we get to the largest block in "mc-slower".   This division means that early "fast" results can arrive quickly, with less to scan, and as time goes on, more and more of the "slower" slaves complete their mass-checks and upload the results.
+ 
+ The "smoothing" and subset selection happen in mass-check nowadays.
  
  == What happens during the preflight buildbot process ==
  
@@ -25, +27 @@

  '''Test''': the mass-check takes place here.  This is usually the time-consuming part.
  
  '''Configure''': a final summarisation step.  First off, a 'FAST FREQS REPORT' is output, listing the HitFrequencies from the mass-check.  Next, the logs from the mass-check are copied to a safe location, and the 'corpus-hourly' script is run to generate various reports from them for the RuleQaApp.  The URL for viewing the results in the RuleQaApp is printed prominently.
- 
- == Administrivia: how the corpus is generated ==
- 
- The corpus is created from the UploadedCorpora.  The script 'populate_cor' is run from cron periodically to rebuild the mass-checkable corpus from this.   It attempts to 'smooth out' the multiple corpora into several new corpora, named "mc-fast", "mc-med", "mc-slow", "mc-slower", matching the buildbot slave names at http://buildbot.spamassassin.org/preflight/ .
- 
- It does this by:
- 
- * extracting mboxes into mail directories of one file per message
- * creating symbolic links to those files in new corpus directories
- * for each new corpus dir, creating a 'targets' file for mass-check listing what files it's created for that corpus.
- 
- It attempts to use one person's corpus per each output corpus, but seeing as there's usually a glut of spam and a limited quantity of ham, it's not always anywhere near a one-to-one correlation.  All the same, by looking at [http://buildbot.spamassassin.org/bbmass/corpus_makeup.txt the logs from the build process], you can see where the correlations lie.
- 
- The output looks like this on-disk:
- 
- {{{
- /home/bbmass/tmpfs/cor/CORPUSNAME/TYPE/LINKNAME
- }}}
- 
- Each "CORPUSNAME" directory corresponds to one of the slave names, "mc-fast", "mc-med", etc. Under that, we have "TYPE", which is either "ham" or "spam". Next, "LINKNAME".  This is a readable filename for the symbolic link, which gives the reader an idea of where the message came from in the source corpora.
  
  == Uploading corpora ==
  
@@ -60, +42 @@

  NAME=mc-new
  
  sudo mkdir -p /home/bbmass/slaves/$NAME
- sudo chown bbmass /home/bbmass/slaves/$NAME
+ sudo chown buildbot /home/bbmass/slaves/$NAME
- 
  cd /home/bbmass/slaves/$NAME
- sudo su bbmass -c \
-          "mktap buildbot slave --basedir /home/bbmass/slaves/$NAME \
+ sudo su buildbot -c "buildbot create-slave --usepty=0 \
+          /home/bbmass/slaves/$NAME \
-          --master buildbot.spamassassin.org:9988 --name $NAME \
+          buildbot.spamassassin.org:9988 $NAME $PASSWORD"
-          --passwd $PASSWORD --usepty=0"
  
  echo $PASSWORD > $HOME/pwd
  sudo mv $HOME/pwd /home/buildbot/pwds/$NAME
@@ -76, +56 @@

  sudo vi /home/buildbot/bots/bbmass/master.cfg
  
          [search for mc-fast and add new lines/entries for $NAME]
+         [don't forget the 'scheduler' part!]
  
- sudo vi /etc/init.d/buildbot 
+ sudo vi /etc/init.d/bbmass 
  
          [search for mc-fast and add new lines/entries for $NAME]
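
For reference, the post-change slave-creation commands (the '+' and unchanged lines of the hunk above) can be collected into a dry-run sketch. The helper name `new_slave_cmds` and the example password are hypothetical; the paths, the master address, and the `buildbot create-slave` argument order are copied from the diff. The function only prints the commands, so nothing is created or chowned when it runs.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the commands for adding a new slave.
# new_slave_cmds is a hypothetical helper, not part of the wiki page;
# the paths and master address are taken from the diff above.
new_slave_cmds() {
    name=$1
    password=$2
    basedir=/home/bbmass/slaves/$name
    # Unquoted heredoc delimiter: $basedir, $name and $password are expanded,
    # while the inner double quotes are printed literally.
    cat <<EOF
sudo mkdir -p $basedir
sudo chown buildbot $basedir
sudo su buildbot -c "buildbot create-slave --usepty=0 $basedir buildbot.spamassassin.org:9988 $name $password"
EOF
}

# Example invocation with placeholder values.
new_slave_cmds mc-new 'example-password'
```

Reviewing the printed commands before pasting them into a root shell is one way to catch a mistyped slave name early. Note that the password appears in the output, so the sketch should not be run where the terminal is logged.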