Posted to commits@spamassassin.apache.org by Apache Wiki <wi...@apache.org> on 2005/11/18 06:40:45 UTC

[Spamassassin Wiki] Update of "MassCheck" by JustinMason

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Spamassassin Wiki" for change notification.

The following page has been changed by JustinMason:
http://wiki.apache.org/spamassassin/MassCheck

The comment on the change is:
moved from MassesOverview

------------------------------------------------------------------------------
  
  "mass-check" is a tool included with the SpamAssassin source distribution to test rules for accuracy and hit-rate.  If you're writing custom rules, you really should use this to test them.
  
- First, you need HandClassifiedCorpora.  Let's say that's divided into two maildir folders, "/path/to/ham" and "/path/to/spam".
+ First, you need HandClassifiedCorpora.  Let's say that's made up of two maildir folders, "/path/to/ham" and "/path/to/spam".
  
  Next, cd into the "masses" directory of the source distribution:
  
@@ -15, +15 @@

                spam:dir:/path/to/spam
  }}}
  
- This will create two files, "ham.log" and "spam.log" containing hit-rates from the rules in the rules dir "../rules" as they are applied to that corpus.
+ This will create two files, "ham.log" and "spam.log", listing the rules that hit when the rules read from the rules dir "../rules" are applied to that corpus.  Each line of the two log files gives details about one email message, and there's a line for every message.
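
Since each message gets one log line, counting lines is a quick sanity check that the whole corpus was scanned. A minimal sketch, assuming that any header or comment lines in the logs start with "#" (this is an assumption, so check your own logs); the helper name count_messages is hypothetical:

```shell
# Count the per-message lines in a mass-check log file.
# Assumption: lines beginning with '#' are comments, not messages.
# Blank lines are ignored as well.
count_messages() {
    grep -c '^[^#]' "$1"
}

# Example usage, once mass-check has written its logs:
#   count_messages ham.log
#   count_messages spam.log
```

The two counts should match the number of messages in the ham and spam folders, unless some messages were skipped (for example, big messages when --all is not given).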
  
  mass-check also takes other options to control whether network tests are run, whether multiple processes are run in parallel, how the output is presented, etc.; read the comments at the top of the file for details.  Here are some key bits:
  
@@ -40, +40 @@

  
  The next step is to run hit-frequencies: see HitFrequencies for details.
  
+ == Usage ==
+ 
+ 
+ 
+ usage:[[BR]]
+ mass-check [options] target ...
+  
+ ||-c=file      || set configuration/rules directory[[BR]]||
+ ||-p=dir        ||set user-prefs directory[[BR]]||
+ ||-f=file      || read list of targets from <file>[[BR]]||
+ ||-j=jobs      || specify the number of processes to run simultaneously[[BR]]||
+ ||--net        || turn on network checks![[BR]]||
+ ||--mid        || report Message-ID from each message[[BR]]||
+ ||--debug       ||report debugging information[[BR]]||
+ ||--progress    ||show progress updates during check[[BR]]||
+ ||--rewrite=OUT ||save rewritten message to OUT (default is /tmp/out)[[BR]]||
+ ||--showdots    ||print a dot for each scanned message[[BR]]||
+ ||--rules=RE   || Only test rules matching the given regexp RE[[BR]]||
+ ||--restart=N  || restart all of the children after processing N messages[[BR]]||
+ ||--deencap=RE || Extract SpamAssassin-encapsulated spam mails only if they were encapsulated by servers matching the regexp RE (default = extract all SpamAssassin-encapsulated mails)||
+  
+ log options[[BR]]
+ ||-o            ||write all logs to stdout[[BR]]||
+ ||--loghits     ||log the text hit for patterns (useful for debugging)[[BR]]||
+ ||--loguris     ||log the URIs found[[BR]]||
+ ||--hamlog=log  ||use <log> as ham log ('ham.log' is default)[[BR]]||
+ ||--spamlog=log ||use <log> as spam log ('spam.log' is default)[[BR]]||
+  
+ message selection options[[BR]]
+ ||-n            ||no date sorting or spam/ham interleaving[[BR]]||
+ ||--after=N    || only test mails received after time_t N (negative values are an offset from the current time, e.g. -86400 = last day) or after a date as parsed by Time::ParseDate (e.g. '-6 months') [[BR]]||
+ ||--before=N   || same as --after, except received times are before time_t N [[BR]]||
+ ||--all        || don't skip big messages [[BR]]||
+ ||--head=N     || only check first N ham and N spam (N messages if -n used) [[BR]]||
+ ||--tail=N     || only check last N ham and N spam (N messages if -n used) [[BR]]||
+  
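
To make the time_t arithmetic behind --after and --before concrete, here is a minimal sketch of how a negative value can be read as an offset from the current time. It only illustrates the rule described above and is not taken from the mass-check source; the helper name resolve_time is hypothetical:

```shell
# Resolve an --after/--before style numeric value to an absolute time_t.
#   $1 = value; negative means "offset from the current time"
#   $2 = current time as time_t (optional, defaults to now)
resolve_time() {
    rt_value=$1
    rt_now=${2:-$(date +%s)}
    if [ "$rt_value" -lt 0 ]; then
        # e.g. -86400 means "one day before now"
        echo $(( rt_now + rt_value ))
    else
        # non-negative values are already absolute
        echo "$rt_value"
    fi
}

# Example usage: the cutoff that --after=-86400 would correspond to right now
#   resolve_time -86400
```

Date strings such as '-6 months' are not handled by this sketch; those are parsed by the Perl module named in the table.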
+ simple target options (implies -o and no ham/spam classification) [[BR]]
+ ||--dir        || subsequent targets are directories [[BR]]||
+ ||--file       || subsequent targets are files in RFC 822 format [[BR]]||
+ ||--mbox       || subsequent targets are mbox files [[BR]]||
+ ||--mbx         ||subsequent targets are mbx files [[BR]]||
+  
+ Leftover functions that should be removed at some point: [[BR]]
+ ||--bayes      || report score from Bayesian classifier [[BR]]||
+  
+ Non-option arguments are used as target names (mail files and folders).
+ The target format is: <class>:<format>:<location> [[BR]]
+ ||class      || is "spam" or "ham" [[BR]]||
+ ||format     || is "dir", "file", "mbx", or "mbox" [[BR]]||
+ ||location   || is a file or directory name. Globbing of ~ and * is supported. [[BR]]||
+ 
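
The target format above can be checked before a long run starts. The following is a minimal sketch that splits a target into its three parts and validates the class and format against the values listed in the table; the helper name parse_target is hypothetical and is not part of mass-check:

```shell
# Split a target of the form <class>:<format>:<location> and validate it.
# Prints "class format location" on success, returns 1 on a malformed target.
parse_target() {
    # Require at least two colons, so all three parts are present.
    case $1 in
        *:*:*) ;;
        *) echo "malformed target: $1" >&2; return 1 ;;
    esac
    pt_class=${1%%:*}          # text before the first colon
    pt_rest=${1#*:}            # everything after the first colon
    pt_format=${pt_rest%%:*}   # text before the second colon
    pt_location=${pt_rest#*:}  # remainder (may itself contain colons)
    case $pt_class in
        ham|spam) ;;
        *) echo "bad class: $pt_class" >&2; return 1 ;;
    esac
    case $pt_format in
        dir|file|mbx|mbox) ;;
        *) echo "bad format: $pt_format" >&2; return 1 ;;
    esac
    echo "$pt_class $pt_format $pt_location"
}

# Example usage:
#   parse_target spam:dir:/path/to/spam
```

A target such as "spam:dir:/path/to/spam" passes this check, while "junk:dir:/x" or "ham:dir" is rejected. Globbing of ~ and * in the location is left to mass-check itself.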
  ----------------------
  
  CategorySoftware