Posted to dev@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2004/02/28 02:47:38 UTC

Re: AWL bloat-reducer



>Separate program seems like the way to go, but I am very hesitant at
>adding new commands/options to handle expiry rather than just doing it
>all automatically behind the scenes.

BTW, I'm considering maybe we should have a command for running
periodic expire tasks for Bayes and AWL, and other long-running modes
of operation; this would:

  (a) do bayes expires, if needed
  (b) do AWL expires if needed
  (c) other long-runtime tasks that may be suited to "offline" generation,
      e.g. generating trusted_networks caches from a Bayes db dump
      or similar
  (d) possibly downloading frequently-updated data from a central
      server if needed for future rules

something like "sa-cron".
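The proposed "sa-cron" could be as simple as a loop over housekeeping jobs, where each job knows whether it is due. Here is a minimal sketch under that assumption. `PeriodicTask`, `sa_cron` and the task names are hypothetical and do not correspond to any existing SpamAssassin code.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PeriodicTask:
    """One housekeeping job, e.g. a Bayes or AWL expiry (hypothetical)."""
    name: str
    needed: Callable[[], bool]  # cheap check: is this job due right now?
    run: Callable[[], None]     # the potentially long-running work


def sa_cron(tasks: List[PeriodicTask]) -> List[str]:
    """Run every task whose check says it is due; return the names that ran."""
    ran = []
    for task in tasks:
        if task.needed():
            task.run()
            ran.append(task.name)
    return ran


if __name__ == "__main__":
    log = []
    tasks = [
        PeriodicTask("bayes-expire", lambda: True, lambda: log.append("bayes")),
        PeriodicTask("awl-expire", lambda: False, lambda: log.append("awl")),
    ]
    print(sa_cron(tasks))  # ['bayes-expire']
```

Because the "is it due?" check is separate from the work itself, a frequent cron schedule costs little on runs where nothing needs doing.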

Right now, we just suggest that large-scale bayes users can run
"sa-learn --rebuild" from cron; strikes me that there'll be other jobs
that may need that treatment too.

Or should we just have some kind of inference code to do that stuff from
the engine automatically, like we currently have for bayes?
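The alternative is to have the engine trigger expiry itself during normal scanning. That might look roughly like the sketch below. This is hypothetical: `OpportunisticExpirer`, the interval and the in-memory `last_run` are assumptions. A real implementation would presumably persist the last-run time in the database so that separate processes share it.

```python
import time
from typing import Callable, Optional


class OpportunisticExpirer:
    """Run an expiry job from the normal scan path, at most once per interval.

    Hypothetical sketch: `expire` and `interval_secs` are assumptions, and
    `last_run` lives in memory here; a real engine would store it in the DB.
    """

    def __init__(self, expire: Callable[[], None], interval_secs: float,
                 clock: Callable[[], float] = time.time):
        self.expire = expire
        self.interval = interval_secs
        self.clock = clock  # injectable so the behaviour can be tested
        self.last_run: Optional[float] = None

    def maybe_expire(self) -> bool:
        """Call once per scanned message; return True if expiry ran this time."""
        now = self.clock()
        if self.last_run is not None and now - self.last_run < self.interval:
            return False  # ran recently: skip, keeping this scan cheap
        self.last_run = now
        self.expire()  # the whole (possibly slow) job runs inside this scan
        return True


if __name__ == "__main__":
    fake_now = [0.0]
    runs = []
    e = OpportunisticExpirer(lambda: runs.append(fake_now[0]), 3600.0,
                             clock=lambda: fake_now[0])
    print(e.maybe_expire())  # True: never run before
    fake_now[0] = 1800.0
    print(e.maybe_expire())  # False: only half an interval has passed
    fake_now[0] = 3600.0
    print(e.maybe_expire())  # True
```

The catch with this design is that whichever scan triggers the expiry pays for all of it. That is the same long-running behaviour a separate program would move out of the mail path.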

--j.


Re: AWL bloat-reducer

Posted by Daniel Quinlan <qu...@pathname.com>.
Justin Mason <jm...@jmason.org> writes:

> BTW, I'm considering maybe we should have a command for running
> periodic expire tasks for Bayes and AWL, and other long-running modes
> of operation; this would:

The problem with this is that it goes against the goal of usability.

cron jobs?!?  The only reason we have cron jobs is because we're
software developers, system administrators, etc.  Think like a user who
might struggle through setting up a .procmail file.

>   (a) do bayes expires, if needed
>   (b) do AWL expires if needed
>   (c) other long-runtime tasks that may be suited to "offline" generation,
>       e.g. generating trusted_networks caches from a Bayes db dump
>       or similar
>   (d) possibly downloading frequently-updated data from a central
>       server if needed for future rules
>
> something like "sa-cron".

sa-update

At most, I might be able to live with a once-a-month type of program.
Anything that happens more often should not require a separate program,
I think.  No separate program would be better.

> Right now, we just suggest that large-scale bayes users can run
> "sa-learn --rebuild" from cron; strikes me that there'll be other jobs
> that may need that treatment too.

That's suboptimal too.

> Or should we just have some kind of inference code to do that stuff from
> the engine automatically, like we currently have for bayes?

Isn't there some way we can do the work in smaller amounts?  Argh.
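One way to do the work "in smaller amounts" is to amortize expiry across scans. Each message examines only a small, bounded batch of entries and remembers where it stopped, so no single scan stalls. The sketch below makes two assumptions: each entry carries a last-seen timestamp (the store under discussion may not record one), and the store is a plain dict. `IncrementalExpirer`, `max_age` and `batch_size` are hypothetical names.

```python
import bisect
from typing import Dict, List, Optional


class IncrementalExpirer:
    """Spread expiry over many scans by checking a bounded batch each call.

    Hypothetical sketch: `db` maps an entry key to its last-seen timestamp
    (an assumed field); `max_age` and `batch_size` are assumed tuning knobs.
    """

    def __init__(self, db: Dict[str, float], max_age: float, batch_size: int):
        self.db = db
        self.max_age = max_age
        self.batch_size = batch_size
        self.last_key: Optional[str] = None  # where the previous batch stopped

    def step(self, now: float) -> List[str]:
        """Check up to batch_size entries, delete stale ones, return their keys."""
        keys = sorted(self.db)  # stand-in for an ordered range scan on a real DB
        start = 0
        if self.last_key is not None:
            start = bisect.bisect_right(keys, self.last_key)
        if start >= len(keys):
            start = 0  # reached the end: wrap around and begin a new pass
        batch = keys[start:start + self.batch_size]
        removed = [k for k in batch if now - self.db[k] > self.max_age]
        for k in removed:
            del self.db[k]
        self.last_key = batch[-1] if batch else None
        return removed


if __name__ == "__main__":
    db = {"a": 0.0, "b": 100.0, "c": 0.0, "d": 100.0, "e": 0.0}
    exp = IncrementalExpirer(db, max_age=50.0, batch_size=2)
    for _ in range(3):
        print(exp.step(now=100.0))  # ['a'], then ['c'], then ['e']
    print(sorted(db))  # ['b', 'd']
```

In a store with an ordered cursor, calling `step()` once per scan would bound the extra per-message cost by `batch_size`. Here, `sorted()` only stands in for that cursor and is itself O(n log n) per call.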

Daniel

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/    and open source consulting