Posted to commits@spamassassin.apache.org by Apache Wiki <wi...@apache.org> on 2006/11/16 20:19:24 UTC

[Spamassassin Wiki] Update of "NightlyMassCheck" by JustinMason

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Spamassassin Wiki" for change notification.

The following page has been changed by JustinMason:
http://wiki.apache.org/spamassassin/NightlyMassCheck

The comment on the change is:
revamp this page and add lots of HOWTO doc

------------------------------------------------------------------------------
  
  (There's also an older, clunkier version of the analysis scripts running on DanielQuinlan's server; see http://www.pathname.com/~corpus .)
  
- == How? ==
+ There are three ways to do this: using a script we distribute, doing it yourself, or just uploading your corpus to our server.
+ 
+ == How? (The Corpus-Nightly Script) ==
  
  The corpus-nightly script in the masses/rule-qa/ directory of the SpamAssassin
+ tree can be used to set up a mass-checker on your mail.  Here's a step-by-step account of the process.
- tree can be used to set this up. It's probably not very well documented,
- (WeLoveVolunteers), but it does work. 
  
- You'll also need to ask for RsyncAccounts and make sure you get a "nightly"
+ First off, you'll need to ask for RsyncAccounts and make sure you get a
- account rather than a release-time account.
+ "nightly" account rather than a release-time account.   You also need to
+ install Subversion to get the "svn" command.
  
- == How? (in more detail) ==
+ Then run:
+ 
+ {{{
+ mkdir $HOME/nightlymc
+ cd $HOME/nightlymc
+ svn co http://svn.apache.org/repos/asf/spamassassin/trunk
+ cp trunk/masses/rule-qa/corpus.example ~/.corpus
+ }}}
+ 
+ Edit {{{~/.corpus}}} (for example, with {{{vi ~/.corpus}}}) so it has values
+ something like these, replacing /home/jm with your own $HOME.
+ 
+ {{{
+ # temporary working directory for summary results
+ tmp=/home/jm/nightlymc/tmp
+ 
+ # subversion directory location
+ # [this is the directory you have already checked out!]
+ tree=/home/jm/nightlymc/trunk
+ 
+ # rsync username and password (see RsyncAccounts)
+ username=jm
+ password=xyzzy
+ 
+ # weekly and nightly mass-check options
+ opts_weekly="--restart=500 --tail=15000 --net -j 8 -f /home/jm/nightlymc/targets"
+ opts_nightly="--restart=500 --tail=15000 -f /home/jm/nightlymc/targets"
+ 
+ # weekly and nightly mass-check user_prefs files
+ prefs_weekly=/home/jm/nightlymc/user_prefs.weekly
+ prefs_nightly=/home/jm/nightlymc/user_prefs.nightly
+ }}}
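Before the first run it can be worth checking this file. The sketch below is a hypothetical POSIX sh helper ({{{check_corpus_conf}}} is not part of SpamAssassin). It assumes the file consists only of shell variable assignments, as in the example above, so it can be sourced in a subshell; it then checks that each setting is non-empty and that {{{tree}}} points at a checkout containing {{{masses/rule-qa/corpus-nightly}}}.

```shell
# Hypothetical helper, not part of SpamAssassin: sanity-check a ~/.corpus
# file like the example above. Assumes the file is plain shell variable
# assignments, so it can be sourced.
check_corpus_conf() {
  conf=${1:-$HOME/.corpus}
  # "." searches $PATH for names without a slash, so make it a path.
  case $conf in */*) ;; *) conf=./$conf ;; esac
  (
    # Source in a subshell so the settings do not leak into the caller.
    . "$conf" || exit 1
    status=0
    for var in tmp tree username password opts_weekly opts_nightly \
               prefs_weekly prefs_nightly; do
      # Read the value of the variable whose name is stored in $var.
      eval "val=\${$var:-}"
      if [ -z "$val" ]; then
        echo "missing setting: $var" >&2
        status=1
      fi
    done
    # "tree" must point at the checked-out trunk directory.
    if [ ! -f "$tree/masses/rule-qa/corpus-nightly" ]; then
      echo "tree=$tree does not look like a SpamAssassin checkout" >&2
      status=1
    fi
    exit "$status"
  )
}
```

For example, {{{check_corpus_conf ~/.corpus && echo ok}}} prints "ok" only if every setting is present and the tree path looks right.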
+ 
+ Now, create those two user_prefs files.  Here are some suggested (basic)
+ settings:
+ 
+ user_prefs.nightly:
+ 
+ {{{
+ use_bayes 0
+ use_auto_whitelist 0
+ internal_networks 127/8
+ trusted_networks 127/8
+ }}}
+ 
+ I suggest just copying ("cp") that file to {{{user_prefs.weekly}}} as well,
+ but if you want different settings to control network rules, go ahead.
+ It may also make sense to extend both files with your full
+ {{{trusted_networks}}} data, if you like.
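If you prefer to script this step, here is a minimal sketch of a hypothetical helper ({{{write_mc_prefs}}}, not part of SpamAssassin) that writes the suggested basic nightly settings into a directory and copies them to the weekly file, as suggested above.

```shell
# Hypothetical helper, not part of SpamAssassin: create the two user_prefs
# files in the given directory (e.g. $HOME/nightlymc).
write_mc_prefs() {
  dir=$1
  mkdir -p "$dir" || return 1
  # The suggested basic settings for the nightly run.
  cat > "$dir/user_prefs.nightly" <<'EOF' || return 1
use_bayes 0
use_auto_whitelist 0
internal_networks 127/8
trusted_networks 127/8
EOF
  # Reuse the same settings for the weekly run; edit this copy afterwards
  # if you want different network-rule settings there.
  cp "$dir/user_prefs.nightly" "$dir/user_prefs.weekly"
}
```

Running {{{write_mc_prefs $HOME/nightlymc}}} matches the {{{prefs_nightly}}} and {{{prefs_weekly}}} paths in the example {{{~/.corpus}}}.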
+ 
+ Create {{{~/nightlymc/targets}}} (the file named by the {{{-f}}} option above):
+ 
+ {{{
+ ham:detect:/local/cor/recent/ham/*
+ spam:detect:/local/cor/recent/spam/*
+ }}}
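A mistyped path in this file is easy to miss. The hypothetical {{{check_targets}}} helper below (not part of SpamAssassin) is a sketch that assumes the {{{class:format:path}}} layout shown above and paths without spaces; it lets the shell expand each pattern and reports any line whose pattern matches no files.

```shell
# Hypothetical helper, not part of SpamAssassin: report targets-file lines
# whose path pattern matches nothing. Assumes "class:format:path" lines as
# in the example above, and paths without spaces.
check_targets() {
  file=$1
  status=0
  while IFS= read -r line; do
    [ -n "$line" ] || continue        # skip blank lines
    class=${line%%:*}                 # e.g. "ham"
    rest=${line#*:}                   # drop "class:"
    path=${rest#*:}                   # drop "format:", keep the pattern
    # Let the shell expand the glob; if nothing matches, the pattern is
    # kept literally and the -e test below fails.
    set -- $path
    if [ ! -e "$1" ]; then
      echo "no files match $class target: $path" >&2
      status=1
    fi
  done < "$file"
  return "$status"
}
```

For example, {{{check_targets ~/nightlymc/targets}}} returns success only if both the ham and spam patterns match at least one file.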
+ 
+ That's it -- now run
+ {{{/home/jm/nightlymc/trunk/masses/rule-qa/corpus-nightly}}} and watch as it
+ starts mass-checking.  Once you're happy enough with it, set that command
+ to run in cron.
+ 
+ Note: the best time to run a mass-check is as soon as possible after 0900
+ UTC.  (Watch out for daylight saving time!  The easiest way to cope is to schedule the job at both of the local times in your timezone that correspond to 0900 and 1000 UTC outside daylight saving time; it'll automatically ignore the one that's too early.)
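Working out those two local times by hand is error-prone, so here is a sketch of a hypothetical helper, {{{cron_lines}}} (not part of SpamAssassin), assuming GNU date (for {{{-d}}} and the {{{%-H}}} no-padding flag). It prints the local minute and hour of 0900 UTC on a January date and on a July date. In a zone with a one-hour daylight saving shift, those are the local equivalents of 0900 and 1000 UTC outside daylight saving time; in a zone without DST, the duplicate is removed and you get a single line.

```shell
# Hypothetical helper, not part of SpamAssassin: print crontab lines that
# run the given script at the local time of 09:00 UTC in both January and
# July. Assumes GNU date.
cron_lines() {
  script=$1
  for day in 2007-01-15 2007-07-15; do
    # Local "minute hour" corresponding to 09:00 UTC on that date.
    date -d "$day 09:00 UTC" '+%-M %-H'
  done | sort -u | while read -r min hour; do
    echo "$min $hour * * * $script"
  done
}
```

For example, {{{cron_lines /home/jm/nightlymc/trunk/masses/rule-qa/corpus-nightly}}} prints crontab lines you can add with {{{crontab -e}}}; with TZ=UTC it prints just {{{0 9 * * * /home/jm/nightlymc/trunk/masses/rule-qa/corpus-nightly}}}.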
+ 
+ == How? (The DIY Version) ==
+ 
+ Here's more detail on that process, if you don't want to use the "corpus-nightly" script.
  
  Get ahold of http://rsync.spamassassin.org/$VERS-versions.txt, where
  $VERS is either "nightly" or "weekly".  "nightly" is updated a little
@@ -81, +151 @@

  (The version of the tree available at rsync://rsync.spamassassin.org/tagged_builds/nightly_mass_check and .../weekly_mass_check already has this file
  included.)
  
- == An Easier Way ==
+ == How? (An Easier Way) ==
  
  There is one: if you rsync up your corpus to the buildbot server, as described in UploadedCorpora, it can be mass-checked there instead.  Unfortunately, this means sharing your mail corpus with anyone who has access to that machine.  (Nobody is expected to actually ''look'', but if you are very concerned about privacy, you may want to strip out the more private mails before uploading, or mass-check on your own machine instead.)
  

Re: [Spamassassin Wiki] Update of "NightlyMassCheck" by JustinMason

Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.
Daryl C. W. O'Shea wrote:
> Apache Wiki wrote:
> 
>> + Note: the best time to run a mass-check is as soon as possible after 0900
>> + UTC.  (Watch out for daylight savings time!  The best way to do that is to run it at the corresponding times for 0900 and 1000 UTC in your timezone; it'll automatically ignore the one that's too early.)
> 
> This doesn't make sense. 0900 UTC is 0900 UTC no matter what timezone 
> you're in. :)

Actually, I just didn't parse that well... I guess that part makes sense.


> In any case, you only want to have it run once... it sleeps for an hour, 
> not exits, no?
> 
> 
> Daryl
> 


Re: [Spamassassin Wiki] Update of "NightlyMassCheck" by JustinMason

Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.
Apache Wiki wrote:

> + Note: the best time to run a mass-check is as soon as possible after 0900
> + UTC.  (Watch out for daylight savings time!  The best way to do that is to run it at the corresponding times for 0900 and 1000 UTC in your timezone; it'll automatically ignore the one that's too early.)

This doesn't make sense. 0900 UTC is 0900 UTC no matter what timezone 
you're in. :)

In any case, you only want to have it run once... it sleeps for an hour, 
not exits, no?


Daryl