Posted to commits@spamassassin.apache.org by Apache Wiki <wi...@apache.org> on 2007/07/31 14:16:46 UTC

[Spamassassin Wiki] Update of "MassCheck" by JustinMason

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Spamassassin Wiki" for change notification.

The following page has been changed by JustinMason:
http://wiki.apache.org/spamassassin/MassCheck

The comment on the change is:
update mass-check wiki docs

------------------------------------------------------------------------------
  
  "mass-check" is a tool included with the SpamAssassin source distribution in the [wiki:MassesOverview 'masses' directory] to test rules for accuracy and hit-rate.  If you're writing custom rules, you really should use this to test them.
  
- First, you need HandClassifiedCorpora.  Let's say that's made up of two maildir folders, "/path/to/ham" and "/path/to/spam".
+ First, you need HandClassifiedCorpora.  Let's say that's made up of two mbox folders, "/path/to/ham" and "/path/to/spam".
  
  Next, cd into the "masses" directory of the source distribution:
  
  {{{
      cd masses
      ./mass-check --progress \
-               ham:dir:/path/to/ham \
+               ham:mbox:/path/to/ham \
-               spam:dir:/path/to/spam
+               spam:mbox:/path/to/spam
  }}}
  
  This will create two files, "ham.log" and "spam.log", recording which rules (read from the rules directory "../rules") hit each message in the corresponding corpus.  Each line of the two log files describes one email message, and there is a line for every message.
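  Because each log line describes one message, a quick sanity check is to compare the line counts of the logs against the sizes of the corpora.  The following is a minimal sketch, not part of mass-check itself: "count_messages" is a hypothetical helper name, and it assumes that any header or comment lines in the logs start with "#", which this page does not confirm.

```shell
# Hypothetical helper: count the per-message lines in a mass-check log.
# Assumption: lines starting with "#" are headers/comments, not messages.
count_messages() {
    # -v inverts the match, -c prints the number of selected lines.
    # grep exits non-zero when that number is 0, so ignore its status.
    grep -vc '^#' "$1" || true
}

# Usage (run from the "masses" directory after mass-check finishes):
#   count_messages ham.log
#   count_messages spam.log
```

  If a count is lower than the number of messages in the corresponding corpus, some messages were probably not scanned.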
  
  mass-check also takes other options to control whether network tests are run, whether multiple processes are run in parallel, how the output is presented, etc.; read the comments at the top of the file for details.  Here are some key bits:
+ 
+ == Configuration File ==
+ 
+ Mass-check reads a "user_prefs" file in "spamassassin/user_prefs".  You need to create this yourself; it will not be created for you.
  
  == Using network tests ==
  
@@ -36, +40 @@

      echo "use_bayes 1" > spamassassin/user_prefs
  }}}
  
+ or to turn it off:
+ 
+ {{{
+     cd masses
+     mkdir spamassassin
+     echo "use_bayes 0" > spamassassin/user_prefs
+ }}}
+ 
  == Once mass-check completes ==
  
  The next step is to run hit-frequencies: see HitFrequencies for details.
  
  == Usage ==
  
- 
- 
- usage:[[BR]]
  mass-check [options] target ...
   
  ||-c=file      || set configuration/rules directory[[BR]]||
@@ -85, +94 @@

   
  Just left over functions we should remove at some point: [[BR]]
  ||--bayes      || report score from Bayesian classifier [[BR]]||
+ 
+ == Usage: Targets ==
   
  Non-option arguments are used as target names (mail files and folders).
  The target format is: <class>:<format>:<location> [[BR]]
  ||class      || is "spam" or "ham" [[BR]]||
- ||format     || is "dir", "file", "mbx", or "mbox" [[BR]]||
+ ||format     || is "dir", "file", "mbx", "mbox", or "detect" [[BR]]||
  ||location   || is a file or directory name. Globbing of ~ and * is supported. [[BR]]||
+ 
+ "detect" can be used as a format.  This assumes "mbox" for any file whose path matches the pattern "/\.mbox/i", "file" for anything else that is not a directory, and "dir" otherwise.
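  The "detect" rule can be sketched as a small shell function.  This is only an illustration of the rule as described on this page, not the code mass-check uses; "detect_format" is a hypothetical name, and checking for a directory first is an assumption about the order of the tests.

```shell
# Hypothetical helper mirroring the "detect" rule described above.
# Prints the format that would be assumed for a given location.
detect_format() {
    path=$1
    if [ -d "$path" ]; then
        # Directories are treated as "dir" (assumed to be checked first).
        echo dir
    elif printf '%s\n' "$path" | grep -qi '\.mbox'; then
        # Non-directory whose path contains ".mbox", case-insensitively.
        echo mbox
    else
        # Anything else that is not a directory.
        echo file
    fi
}
```

  For example, "detect_format /path/to/ham.mbox" prints "mbox" when that path is not a directory.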
+ 
  
  ----------------------