Posted to dev@spamassassin.apache.org by Henrik K <he...@hege.li> on 2018/09/03 14:29:20 UTC

Masscheck reuse

Hey guys,

I'm wondering why pretty much no masscheck submitter is using --reuse?

I just committed fixes for lots of missing reuse flags, and now I can
actually do a ./mass-check --reuse --net run without ANY DNS lookups
launching.  So it's super fast too.

What reason would there be to prefer running without reuse?  Is this simply
a case of missing guidance/documentation?  Looking at some corpus logs, and
judging by Maildir file timestamps, even messages a few years old are being
run through.  How can that make any sense?  I wouldn't run anything older
than an hour through DNSBLs.

Of course I understand if someone's messages don't have a scan-time
X-Spam-Status header for some reason, but even that could easily be fixed
by simply running the messages through a dedicated spamd as soon as possible
to add the headers.
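
To make the idea concrete: the scan-time X-Spam-Status header already lists
the rules that hit, so those results can be read back later instead of being
looked up again.  Below is a minimal Python sketch of that idea only; it is
not the actual Reuse.pm logic.  The function name reused_tests and the sample
header are hypothetical, and it assumes the common "tests=A,B,C" layout with
header folding after commas.

```python
import email
import re


def reused_tests(raw_message):
    """Return the rule names stored in X-Spam-Status, or None if the header is missing."""
    status = email.message_from_bytes(raw_message).get("X-Spam-Status")
    if status is None:
        return None  # nothing to reuse: the message was never scanned
    # Assumption: long test lists are folded after commas, so undo that first.
    flat = re.sub(r",\s+", ",", str(status))
    match = re.search(r"\btests=(\S*)", flat)
    if match is None or match.group(1) in ("", "none"):
        return set()
    return set(match.group(1).split(","))


sample = (
    b"Subject: example\n"
    b"X-Spam-Status: Yes, score=7.2 required=5.0 tests=BAYES_99,RCVD_IN_XBL,\n"
    b"\tURIBL_BLACK autolearn=no version=3.4.2\n"
    b"\n"
    b"body\n"
)
print(sorted(reused_tests(sample)))
```

A message without the header returns None, which is the case where running it
through a dedicated spamd first would be needed.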

Cheers,
Henrik

Re: Masscheck reuse

Posted by "Kevin A. McGrail" <km...@apache.org>.

Adding a few more to this email question.

On 9/3/2018 10:29 AM, Henrik K wrote:
> Hey guys,
>
> I'm wondering why pretty much no masscheck submitter is using --reuse?
>
> I just committed fixes for lots of missing reuse flags, and now I can
> actually do a ./mass-check --reuse --net run without ANY DNS lookups
> launching.  So it's super fast too.
>
> What reason would there be to prefer running without reuse?  Is this simply
> a case of missing guidance/documentation?  Looking at some corpus logs, and
> judging by Maildir file timestamps, even messages a few years old are being
> run through.  How can that make any sense?  I wouldn't run anything older
> than an hour through DNSBLs.
>
> Of course I understand if someone's messages don't have a scan-time
> X-Spam-Status header for some reason, but even that could easily be fixed
> by simply running the messages through a dedicated spamd as soon as possible
> to add the headers.
>
> Cheers,
> Henrik


-- 
Kevin A. McGrail
VP Fundraising, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171

Re: Masscheck reuse

Posted by "Kevin A. McGrail" <km...@apache.org>.

Might want to put this on the wiki too!  Adding SASA group too for their
input.
--
Kevin A. McGrail
VP Fundraising, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171


On Thu, Oct 4, 2018 at 10:28 AM Henrik K <he...@hege.li> wrote:

>
> Still hoping to get some conversation going on about reuse.
>
> Personally I create my corpus like this:
>
> - hacked amavisd-milter to save unmodified message copy to "pristine"
>   directory
>
> - run a separate clean install of trunk SA/spamd that has default rules,
>   razor/pyzor/dcc etc, and only runs all "reuse" flagged rules
>   (my recent trunk commit)
>     --pre "loadplugin Mail::SpamAssassin::Plugin::Reuse"
>     --pre "run_reuse_tests_only 1"
>
> - cron every minute: run messages from "pristine" directory through
>   above spamd to add X-Spam-Status header and move to "corpus"
>
> - a bit later get mailids and resulting ham/spam status from my main amavis,
>   and sort out "corpus" to "corpus_ham/spam" (of course with some manual
>   vetting, dspam crosscheck etc)
>
> Since my main setup uses extreme whitelisting and shortcircuiting, this is
> the only way to get a 100% legit corpus.  It takes very few resources
> anyway, since that spamd just runs network lookups (which are mostly cached
> already).
>
> Basically I'd like to see masscheckers do something similar.  It doesn't
> matter where you source your corpus; it is possible to clean the messages up
> to "pristine" status and run them ASAP through a spamd setup like the one
> above.  That way they have a legit X-Spam-Status header that can be reused
> even years later.
>
> Of course, if your corpus already has X-Spam-Status from mail receive time
> (and all possible plugins and checks are enabled), then it's simply a case
> of enabling reuse.  But shortcircuited messages should be skipped.
>
> I also recently added REUSE config here:
>
> http://svn.apache.org/viewvc/spamassassin/trunk/masses/contrib/automasscheck-minimal/
>
>
>
>
> On Mon, Sep 03, 2018 at 05:55:05PM +0300, Henrik Krohns wrote:
> >
> > If you look at the ancient mass-check code before Reuse.pm was split from
> > it, it shows the original intention:
> >
> >
> > http://svn.apache.org/viewvc/spamassassin/trunk/masses/mass-check?revision=721962&view=markup
> >
> > # --reuse without --net means we need to just zero ALL net rules; skip net
> > # lookups entirely except for the reused ones.
> > (then it proceeds to zero scores for all "tflags net" rules)
> >
> > OK, I'm not even sure why it's talking about --reuse withOUT --net, since
> > the point here is to do separate scoresets with and without network checks.
> > One would simply run local checks only, or --reuse --net.
> >
> > If everyone used reuse, would there even be a need for "weekly" masschecks,
> > as every day would simply include the network checks!?  If you ask me,
> > without --reuse one should only be allowed to submit "nightly" masschecks
> > (no --net).
> >
> > Current Reuse.pm simply reads "reuse XXX" config clauses, and zeroes scores
> > for those.  So it is important to remember to use "reuse XXX" for any net
> > rules, since it doesn't automatically iterate through them anymore!  Which
> > in my mind is silly: why not simply iterate through "tflags net" again and
> > forget the "reuse" stanza completely?
> >
> > Cheers,
> > Henrik
> >
> >
> >
> >
> > On Mon, Sep 03, 2018 at 05:29:20PM +0300, Henrik K wrote:
> > >
> > > Hey guys,
> > >
> > > I'm wondering why pretty much no masscheck submitter is using --reuse?
> > >
> > > I just committed fixes for lots of missing reuse flags, and now I can
> > > actually do a ./mass-check --reuse --net run without ANY DNS lookups
> > > launching.  So it's super fast too.
> > >
> > > What reason would there be to prefer running without reuse?  Is this
> > > simply a case of missing guidance/documentation?  Looking at some corpus
> > > logs, and judging by Maildir file timestamps, even messages a few years
> > > old are being run through.  How can that make any sense?  I wouldn't run
> > > anything older than an hour through DNSBLs.
> > >
> > > Of course I understand if someone's messages don't have a scan-time
> > > X-Spam-Status header for some reason, but even that could easily be fixed
> > > by simply running the messages through a dedicated spamd as soon as
> > > possible to add the headers.
> > >
> > > Cheers,
> > > Henrik
>

Re: Masscheck reuse

Posted by Henrik K <he...@hege.li>.

Still hoping to get some conversation going about reuse.

Personally I create my corpus like this:

- hacked amavisd-milter to save unmodified message copy to "pristine"
  directory

- run a separate clean install of trunk SA/spamd that has default rules,
  razor/pyzor/dcc etc, and only runs all "reuse" flagged rules
  (my recent trunk commit)
    --pre "loadplugin Mail::SpamAssassin::Plugin::Reuse"
    --pre "run_reuse_tests_only 1"

- cron every minute: run messages from "pristine" directory through
  above spamd to add X-Spam-Status header and move to "corpus"

- a bit later get mailids and resulting ham/spam status from my main amavis,
  and sort out "corpus" to "corpus_ham/spam" (of course with some manual
  vetting, dspam crosscheck etc)
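
The every-minute cron step in the list above could look roughly like the
following Python sketch.  It is only an illustration under assumptions, not
the script actually used here: it assumes a spamc binary on the PATH that
talks to the dedicated spamd, the function name tag_pristine and the directory
arguments are hypothetical, and a message is moved only when the output really
carries an X-Spam-Status header (spamc can fall back to echoing the message
unchanged when spamd is unreachable).

```python
import email
import subprocess
from pathlib import Path


def tag_pristine(pristine, corpus, scan_cmd=("spamc",)):
    """Pipe every file in `pristine` through scan_cmd and store tagged copies in `corpus`.

    Returns the names of the messages that were tagged and moved.
    """
    pristine, corpus = Path(pristine), Path(corpus)
    corpus.mkdir(parents=True, exist_ok=True)
    moved = []
    for src in sorted(p for p in pristine.iterdir() if p.is_file()):
        result = subprocess.run(list(scan_cmd), input=src.read_bytes(),
                                stdout=subprocess.PIPE)
        tagged = result.stdout
        # Leave the message in "pristine" for the next cron run unless the
        # scanner actually added a status header.
        if email.message_from_bytes(tagged).get("X-Spam-Status") is None:
            continue
        tmp = corpus / (src.name + ".tmp")
        tmp.write_bytes(tagged)
        tmp.rename(corpus / src.name)  # rename last so readers never see partial files
        src.unlink()
        moved.append(src.name)
    return moved
```

A cron entry would then call something like tag_pristine("pristine", "corpus")
once a minute; passing a different scan_cmd makes it easy to point at another
host or to test without a running spamd.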
  
Since my main setup uses extreme whitelisting and shortcircuiting, this is
the only way to get a 100% legit corpus.  It takes very few resources
anyway, since that spamd just runs network lookups (which are mostly cached
already).

Basically I'd like to see masscheckers do something similar.  It doesn't
matter where you source your corpus; it is possible to clean the messages up
to "pristine" status and run them ASAP through a spamd setup like the one
above.  That way they have a legit X-Spam-Status header that can be reused
even years later.

Of course, if your corpus already has X-Spam-Status from mail receive time
(and all possible plugins and checks are enabled), then it's simply a case
of enabling reuse.  But shortcircuited messages should be skipped.
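
Skipping shortcircuited messages could be automated with a small filter like
this Python sketch.  It rests on two assumptions that should be checked
against your own headers: that the status header carries a "shortcircuit="
field (as the stock Shortcircuit plugin configuration appears to add) or,
failing that, that a rule named SHORTCIRCUIT shows up in the tests list.  The
function name was_shortcircuited is hypothetical.

```python
import email
import re


def was_shortcircuited(raw_message):
    """Guess from X-Spam-Status whether the scan was cut short by shortcircuiting."""
    status = email.message_from_bytes(raw_message).get("X-Spam-Status")
    if status is None:
        return False  # no scan-time header at all; nothing to judge
    flat = re.sub(r",\s+", ",", str(status))  # undo folding after commas
    field = re.search(r"\bshortcircuit=(\S+)", flat)
    if field is not None and field.group(1) != "no":
        return True  # assumed values: "no" when not shortcircuited, else a type
    tests = re.search(r"\btests=(\S*)", flat)
    names = tests.group(1).split(",") if tests else []
    return "SHORTCIRCUIT" in names  # assumed rule name
```

Messages for which this returns True would be left out of the corpus, since
their header does not reflect a full set of rule hits.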

I also recently added REUSE config here:
http://svn.apache.org/viewvc/spamassassin/trunk/masses/contrib/automasscheck-minimal/




On Mon, Sep 03, 2018 at 05:55:05PM +0300, Henrik Krohns wrote:
> 
> If you look at the ancient mass-check code before Reuse.pm was split from
> it, it shows the original intention:
> 
> http://svn.apache.org/viewvc/spamassassin/trunk/masses/mass-check?revision=721962&view=markup
> 
> # --reuse without --net means we need to just zero ALL net rules; skip net
> # lookups entirely except for the reused ones.
> (then it proceeds to zero scores for all "tflags net" rules)
> 
> OK, I'm not even sure why it's talking about --reuse withOUT --net, since
> the point here is to do separate scoresets with and without network checks.
> One would simply run local checks only, or --reuse --net.
> 
> If everyone used reuse, would there even be a need for "weekly" masschecks,
> as every day would simply include the network checks!?  If you ask me,
> without --reuse one should only be allowed to submit "nightly" masschecks
> (no --net).
> 
> Current Reuse.pm simply reads "reuse XXX" config clauses, and zeroes scores
> for those.  So it is important to remember to use "reuse XXX" for any net
> rules, since it doesn't automatically iterate through them anymore!  Which
> in my mind is silly: why not simply iterate through "tflags net" again and
> forget the "reuse" stanza completely?
> 
> Cheers,
> Henrik
> 
> 
> 
> 
> On Mon, Sep 03, 2018 at 05:29:20PM +0300, Henrik K wrote:
> > 
> > Hey guys,
> > 
> > I'm wondering why pretty much no masscheck submitter is using --reuse?
> > 
> > I just committed fixes for lots of missing reuse flags, and now I can
> > actually do a ./mass-check --reuse --net run without ANY DNS lookups
> > launching.  So it's super fast too.
> > 
> > What reason would there be to prefer running without reuse?  Is this simply
> > a case of missing guidance/documentation?  Looking at some corpus logs, and
> > judging by Maildir file timestamps, even messages a few years old are being
> > run through.  How can that make any sense?  I wouldn't run anything older
> > than an hour through DNSBLs.
> > 
> > Of course I understand if someone's messages don't have a scan-time
> > X-Spam-Status header for some reason, but even that could easily be fixed
> > by simply running the messages through a dedicated spamd as soon as possible
> > to add the headers.
> > 
> > Cheers,
> > Henrik

Re: Masscheck reuse

Posted by Henrik Krohns <he...@hege.li>.

Crossposting to ruleqa (just subbed!), feel free to continue there. :-)

On Mon, Sep 03, 2018 at 05:55:05PM +0300, Henrik Krohns wrote:
> 
> If you look at the ancient mass-check code before Reuse.pm was split from
> it, it shows the original intention:
> 
> http://svn.apache.org/viewvc/spamassassin/trunk/masses/mass-check?revision=721962&view=markup
> 
> # --reuse without --net means we need to just zero ALL net rules; skip net
> # lookups entirely except for the reused ones.
> (then it proceeds to zero scores for all "tflags net" rules)
> 
> OK, I'm not even sure why it's talking about --reuse withOUT --net, since
> the point here is to do separate scoresets with and without network checks.
> One would simply run local checks only, or --reuse --net.
> 
> If everyone used reuse, would there even be a need for "weekly" masschecks,
> as every day would simply include the network checks!?  If you ask me,
> without --reuse one should only be allowed to submit "nightly" masschecks
> (no --net).
> 
> Current Reuse.pm simply reads "reuse XXX" config clauses, and zeroes scores
> for those.  So it is important to remember to use "reuse XXX" for any net
> rules, since it doesn't automatically iterate through them anymore!  Which
> in my mind is silly: why not simply iterate through "tflags net" again and
> forget the "reuse" stanza completely?
> 
> Cheers,
> Henrik
> 
> 
> 
> 
> On Mon, Sep 03, 2018 at 05:29:20PM +0300, Henrik K wrote:
> > 
> > Hey guys,
> > 
> > I'm wondering why pretty much no masscheck submitter is using --reuse?
> > 
> > I just committed fixes for lots of missing reuse flags, and now I can
> > actually do a ./mass-check --reuse --net run without ANY DNS lookups
> > launching.  So it's super fast too.
> > 
> > What reason would there be to prefer running without reuse?  Is this simply
> > a case of missing guidance/documentation?  Looking at some corpus logs, and
> > judging by Maildir file timestamps, even messages a few years old are being
> > run through.  How can that make any sense?  I wouldn't run anything older
> > than an hour through DNSBLs.
> > 
> > Of course I understand if someone's messages don't have a scan-time
> > X-Spam-Status header for some reason, but even that could easily be fixed
> > by simply running the messages through a dedicated spamd as soon as possible
> > to add the headers.
> > 
> > Cheers,
> > Henrik

Re: Masscheck reuse

Posted by Henrik Krohns <he...@hege.li>.

If you look at the ancient mass-check code before Reuse.pm was split from
it, it shows the original intention:

http://svn.apache.org/viewvc/spamassassin/trunk/masses/mass-check?revision=721962&view=markup

# --reuse without --net means we need to just zero ALL net rules; skip net
# lookups entirely except for the reused ones.
(then it proceeds to zero scores for all "tflags net" rules)

OK, I'm not even sure why it's talking about --reuse withOUT --net, since
the point here is to do separate scoresets with and without network checks.
One would simply run local checks only, or --reuse --net.

If everyone used reuse, would there even be a need for "weekly" masschecks,
as every day would simply include the network checks!?  If you ask me,
without --reuse one should only be allowed to submit "nightly" masschecks
(no --net).

Current Reuse.pm simply reads "reuse XXX" config clauses, and zeroes scores
for those.  So it is important to remember to use "reuse XXX" for any net
rules, since it doesn't automatically iterate through them anymore!  Which
in my mind is silly: why not simply iterate through "tflags net" again and
forget the "reuse" stanza completely?
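
Until something like that changes, a quick check for net rules that lack a
"reuse" line can catch the omissions.  The Python sketch below is only an
illustration: it assumes rule files use plain "tflags NAME ... net ..." and
"reuse NAME [OLD_NAME ...]" lines, the function name net_rules_missing_reuse
is hypothetical, and it does not handle conditionals such as ifplugin blocks.

```python
def net_rules_missing_reuse(config_text):
    """List rules flagged "tflags ... net" that have no matching "reuse" line."""
    net_rules, reused = set(), set()
    for line in config_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, then tokenize
        if len(fields) >= 3 and fields[0] == "tflags" and "net" in fields[2:]:
            net_rules.add(fields[1])
        elif len(fields) >= 2 and fields[0] == "reuse":
            reused.add(fields[1])
    return sorted(net_rules - reused)


example = """
tflags RCVD_IN_XBL net
reuse  RCVD_IN_XBL
tflags URIBL_BLACK net
tflags LOCAL_NICE_RULE nice
# reuse URIBL_BLACK
"""
print(net_rules_missing_reuse(example))
```

Here the commented-out "reuse" line does not count, so URIBL_BLACK is the one
rule reported as missing.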

Cheers,
Henrik

On Mon, Sep 03, 2018 at 05:29:20PM +0300, Henrik K wrote:
> 
> Hey guys,
> 
> I'm wondering why pretty much no masscheck submitter is using --reuse?
> 
> I just committed fixes for lots of missing reuse flags, and now I can
> actually do a ./mass-check --reuse --net run without ANY DNS lookups
> launching.  So it's super fast too.
> 
> What reason would there be to prefer running without reuse?  Is this simply
> a case of missing guidance/documentation?  Looking at some corpus logs, and
> judging by Maildir file timestamps, even messages a few years old are being
> run through.  How can that make any sense?  I wouldn't run anything older
> than an hour through DNSBLs.
> 
> Of course I understand if someone's messages don't have a scan-time
> X-Spam-Status header for some reason, but even that could easily be fixed
> by simply running the messages through a dedicated spamd as soon as possible
> to add the headers.
> 
> Cheers,
> Henrik
