Posted to ruleqa@spamassassin.apache.org by Henrik Krohns <he...@hege.li> on 2018/09/27 05:06:41 UTC

Please use --after in mass-checks

Hello mass checkers,

Please note the --after clauses that have been added to automasscheck.cf.

# Use --after selector for corpus to prevent unnecessary processing.
# Current ruleqa settings: ham 6 years, spam 2 months
# Anything older than that will be ignored by ruleqa regardless.
run_all_masschecks() {
  ### sample: single corpus ###
  run_masscheck single-corpus \
          --after=-174182400 ham:dir:/path/to/Maildir/.Ham/ \
          --after=-4838400 spam:dir:/path/to/Maildir/.Spam/


Some of you are submitting spam older than 8 weeks.  It breaks nothing,
but it wastes your own resources, since ruleqa will filter it out
anyway.  :-)


$ find spam*log -mtime -30 | while read -r f; do echo "=== $f"; perl -ne 'next unless /\btime=(\d+)/; $age = (time-$1)/604800; print "$age\n"' < "$f" | histogram; done

(message ages are in weeks here)

=== spam-darxus.log
Count: 43283
Range:  0.203 - 299.126; Mean: 127.438; Median: 133.250; Stddev: 78.538
Percentiles:  90th: 234.266; 95th: 250.646; 99th: 289.647
   0.203 -    1.090:     5 |
   1.090 -    2.629:     9 |
   2.629 -    5.301:    30 |
   5.301 -    9.943:  1005 ####
   9.943 -   18.003:  2509 #########
  18.003 -   32.001:  3413 #############
  32.001 -   56.309:  4700 ##################
  56.309 -   98.521:  5097 ###################
  98.521 -  171.826: 12462 ###############################################
 171.826 -  299.126: 14053 #####################################################

=== spam-grenier.log
Count: 3345
Range:  0.292 - 306.112; Mean: 164.772; Median: 186.234; Stddev: 64.757
Percentiles:  90th: 236.786; 95th: 240.271; 99th: 245.981
   0.292 -    1.233:     3 |
   1.233 -    2.859:     8 |
   2.859 -    5.669:     8 |
   5.669 -   10.525:    18 #
  10.525 -   18.919:    31 #
  18.919 -   33.424:    85 ##
  33.424 -   58.494:   119 ###
  58.494 -  101.821:   404 ############
 101.821 -  176.701:   839 ########################
 176.701 -  306.112:  1830 #####################################################

=== spam-jarif.log
Count: 1556
Range:  0.252 - 189.368; Mean: 18.291; Median: 12.747; Stddev: 15.556
Percentiles:  90th: 35.101; 95th: 36.848; 99th: 38.096
   0.252 -    1.069:    72 #####
   1.069 -    2.420:    94 #######
   2.420 -    4.652:   203 ###############
   4.652 -    8.341:   344 #########################
   8.341 -   14.438:    94 #######
  14.438 -   24.515:    21 ##
  24.515 -   41.169:   725 #####################################################
  41.169 -  189.368:     3 |

=== spam-jbrooks.log
Count: 6039
Range:  0.457 - 59.422; Mean: 13.405; Median: 10.897; Stddev: 10.720
Percentiles:  90th: 34.105; 95th: 34.978; 99th: 36.631
   0.457 -    1.115:   315 ###########
   1.115 -    2.070:   371 #############
   2.070 -    3.455:   188 #######
   3.455 -    5.465:   932 ##################################
   5.465 -    8.384:   852 ###############################
   8.384 -   12.619:   613 ######################
  12.619 -   18.765:  1472 #####################################################
  18.765 -   27.686:   391 ##############
  27.686 -   40.632:   900 ################################
  40.632 -   59.422:     5 |

=== spam-llanga.log
Count: 10805
Range:  0.284 - 78.659; Mean: 45.645; Median: 50.956; Stddev: 18.487
Percentiles:  90th: 66.387; 95th: 69.045; 99th: 77.248
   0.284 -    0.941:    38 |
   0.941 -    1.932:    71 #
   1.932 -    3.431:   108 #
   3.431 -    5.694:   153 ##
   5.694 -    9.115:   264 ###
   9.115 -   14.284:   236 ##
  14.284 -   22.093:   616 ######
  22.093 -   33.892:  1119 ###########
  33.892 -   51.721:  3001 ###############################
  51.721 -   78.659:  5199 #####################################################
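
For anyone without a histogram tool, a rough count of entries beyond the
8-week spam cutoff may be enough.  The following is only a sketch: the
function name count_older_than is made up for illustration, it assumes each
log line carries a time=<epoch> field (the same field the perl one-liner
above matches), and the current time is passed in explicitly so the result
is reproducible.

```shell
# Hypothetical helper, not part of the mass-check tools.
# Usage: count_older_than MAX_WEEKS NOW_EPOCH < spam-yourname.log
count_older_than() {
  max_weeks=$1
  now=$2
  awk -v max="$max_weeks" -v now="$now" '
    # Find a time=<digits> field that is not part of a longer word.
    match($0, /(^|[^A-Za-z0-9_])time=[0-9]+/) {
      s = substr($0, RSTART, RLENGTH)
      sub(/^.*time=/, "", s)          # keep only the epoch digits
      total++
      # 604800 seconds = 1 week
      if ((now - s) / 604800 > max) old++
    }
    END { printf "%d of %d entries older than %d weeks\n", old, total, max }
  '
}
```

Something like `count_older_than 8 "$(date +%s)" < spam-yourname.log` would
then report how much of a log falls outside the window (the log file name
here is a placeholder).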


Cheers,
Henrik

Re: Please use --after in mass-checks

Posted by Henrik Krohns <he...@hege.li>.
On Thu, Sep 27, 2018 at 06:03:13PM +0300, Henrik Krohns wrote:
> On Thu, Sep 27, 2018 at 07:52:14AM -0700, John Hardin wrote:
> > On Thu, 27 Sep 2018, Henrik Krohns wrote:
> > 
> > >
> > >Hello mass checkers,
> > >
> > >Please notice the --after clauses added to automasscheck.cf.
> > >
> > ># Use --after selector for corpus to prevent unnecessary processing.
> > ># Current ruleqa settings: ham 6 years, spam 2 months
> > ># Anything older than that will be ignored by ruleqa regardless.
> > >run_all_masschecks() {
> > > ### sample: single corpus ###
> > > run_masscheck single-corpus \
> > >         --after=-174182400 ham:dir:/path/to/Maildir/.Ham/ \
> > >         --after=-4838400 spam:dir:/path/to/Maildir/.Spam/
> > 
> > What are those values in terms of? delta seconds from now?
> 
> Yep. I figured people don't have parsedate. :-)
> 
> $ ./mass-check --help
> 
>   --after=N     only test mails received after time_t N (negative values
>                 are an offset from current time, e.g. -86400 = last day)
>                 or after date as parsed by Time::ParseDate (e.g. '-6 months')

FYI, the server-side values can be found in masses/rule-qa/reports-from-logs:

# what's the max age of mail we will accept data from? (in weeks)
# TODO: maybe this should be in ~/.corpus
my $OLDEST_HAM_WEEKS    = 72 * 4;       # 72 months = 6 years
my $OLDEST_SPAM_WEEKS    = 2 * 4;       # 2 months
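
The --after offsets in automasscheck.cf are these week counts converted to
seconds.  A minimal sketch of that arithmetic, in case the server-side
values ever change (the function name weeks_to_after is only an
illustration, not something shipped with the tools):

```shell
# Hypothetical helper: convert a max age in weeks to a negative
# --after offset in seconds (1 week = 7 * 24 * 60 * 60 = 604800 s).
weeks_to_after() {
  weeks=$1
  echo "-$((weeks * 7 * 24 * 60 * 60))"
}

weeks_to_after $((72 * 4))   # ham:  72 * 4 weeks -> -174182400
weeks_to_after $((2 * 4))    # spam:  2 * 4 weeks -> -4838400
```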


Re: Please use --after in mass-checks

Posted by Henrik Krohns <he...@hege.li>.
On Thu, Sep 27, 2018 at 07:52:14AM -0700, John Hardin wrote:
> On Thu, 27 Sep 2018, Henrik Krohns wrote:
> 
> >
> >Hello mass checkers,
> >
> >Please notice the --after clauses added to automasscheck.cf.
> >
> ># Use --after selector for corpus to prevent unnecessary processing.
> ># Current ruleqa settings: ham 6 years, spam 2 months
> ># Anything older than that will be ignored by ruleqa regardless.
> >run_all_masschecks() {
> > ### sample: single corpus ###
> > run_masscheck single-corpus \
> >         --after=-174182400 ham:dir:/path/to/Maildir/.Ham/ \
> >         --after=-4838400 spam:dir:/path/to/Maildir/.Spam/
> 
> What are those values in terms of? delta seconds from now?

Yep. I figured people don't have parsedate. :-)

$ ./mass-check --help

  --after=N     only test mails received after time_t N (negative values
                are an offset from current time, e.g. -86400 = last day)
                or after date as parsed by Time::ParseDate (e.g. '-6 months')
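
To make the numeric case concrete, here is a small sketch of how such a
value maps to an absolute cutoff, based only on the help text above.  It is
not mass-check's own code; the function name resolve_after is made up, and
the Time::ParseDate string form is not handled.

```shell
# Hypothetical helper: resolve a numeric --after value to an absolute time_t.
# Usage: resolve_after N [NOW_EPOCH]
resolve_after() {
  n=$1
  now=${2:-$(date +%s)}    # default to the current time
  if [ "$n" -lt 0 ]; then
    # Negative values are an offset from the current time.
    echo $((now + n))
  else
    # Non-negative values are already an absolute time_t.
    echo "$n"
  fi
}
```

For example, `resolve_after -4838400` prints the epoch time 8 weeks before
now, which is the oldest spam the sample setting above would process.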


Re: Please use --after in mass-checks

Posted by John Hardin <jh...@impsec.org>.
On Thu, 27 Sep 2018, Henrik Krohns wrote:

>
> Hello mass checkers,
>
> Please notice the --after clauses added to automasscheck.cf.
>
> # Use --after selector for corpus to prevent unnecessary processing.
> # Current ruleqa settings: ham 6 years, spam 2 months
> # Anything older than that will be ignored by ruleqa regardless.
> run_all_masschecks() {
>  ### sample: single corpus ###
>  run_masscheck single-corpus \
>          --after=-174182400 ham:dir:/path/to/Maildir/.Ham/ \
>          --after=-4838400 spam:dir:/path/to/Maildir/.Spam/

What are those values in terms of? delta seconds from now?

