Posted to dev@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2004/02/28 02:47:38 UTC
Re: AWL bloat-reducer
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
>Separate program seems like the way to go, but I am very hesitant at
>adding new commands/options to handle expiry rather than just doing it
>all automatically behind the scenes.
BTW, I'm considering maybe we should have a command for running
periodic expire tasks for Bayes and AWL, and other long-running modes
of operation; this would:
(a) do bayes expires, if needed
(b) do AWL expires if needed
(c) other long-runtime tasks that may be suited to "offline" generation,
e.g. generating trusted_networks caches from a Bayes db dump
or similar
(d) possibly downloading frequently-updated data from a central
server if needed for future rules
something like "sa-cron".
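A minimal sketch of how such a command might be structured, assuming each job pairs a cheap "is this needed?" check with the long-running work itself. The names Task and run_periodic are hypothetical, not part of SpamAssassin, and Python is used purely for illustration:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    name: str                    # label only, e.g. "bayes-expire"
    needed: Callable[[], bool]   # cheap check: is there work to do?
    run: Callable[[], None]      # the long-running job itself


def run_periodic(tasks: List[Task]) -> List[str]:
    """Run every task whose `needed` check passes; return the names that ran."""
    ran = []
    for task in tasks:
        if task.needed():
            task.run()
            ran.append(task.name)
    return ran
```

A cron entry would then call run_periodic once with the list of (a) to (d) style jobs, and each job that reports nothing to do is skipped.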
Right now, we just suggest that large-scale bayes users can run
"sa-learn --rebuild" from cron; strikes me that there'll be other jobs
that may need that treatment too.
Or should we just have some kind of inference code to do that stuff from
the engine automatically, like we currently have for bayes?
- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Exmh CVS
iD8DBQFAP/M6QTcbUG5Y7woRAmsOAJ4hFpVCmEzsOk66KULRPg4yE1AlzACfft+J
qcySw8/fC8lf8Qeh/vQFt+U=
=ssA2
-----END PGP SIGNATURE-----
Re: AWL bloat-reducer
Posted by Daniel Quinlan <qu...@pathname.com>.
Justin Mason <jm...@jmason.org> writes:
> BTW, I'm considering maybe we should have a command for running
> periodic expire tasks for Bayes and AWL, and other long-running modes
> of operation; this would:
The problem with this is that it goes against the goal of usability.
cron jobs?!? The only reason we have cron jobs is because we're
software developers, system administrators, etc. Think like a user who
might struggle through setting up a .procmailrc file.
> (a) do bayes expires, if needed
> (b) do AWL expires if needed
> (c) other long-runtime tasks that may be suited to "offline" generation,
> e.g. generating trusted_networks caches from a Bayes db dump
> or similar
> (d) possibly downloading frequently-updated data from a central
> server if needed for future rules
>
> something like "sa-cron".
sa-update
At most, I might be able to live with a once-a-month type of program.
Anything that happens more often should not require a separate program,
I think. No separate program would be better.
> Right now, we just suggest that large-scale bayes users can run
> "sa-learn --rebuild" from cron; strikes me that there'll be other jobs
> that may need that treatment too.
That's suboptimal too.
> Or should we just have some kind of inference code to do that stuff from
> the engine automatically, like we currently have for bayes?
Isn't there some way we can do the work in smaller amounts? Argh.
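One possible reading of "smaller amounts", sketched under assumptions: each call deletes at most a fixed number of stale entries, so the cost is spread over many invocations instead of one long expiry run. The function expire_some and the in-memory dict of last-seen timestamps are hypothetical stand-ins, not the real AWL or Bayes storage:

```python
from itertools import islice
from typing import Dict


def expire_some(last_seen: Dict[str, float], cutoff: float, max_deletes: int) -> int:
    """Delete at most `max_deletes` entries last seen before `cutoff`.

    Returns the number of entries deleted by this call.
    """
    # Collect the keys first; a dict cannot be modified while iterating over it.
    # islice stops the scan as soon as enough stale keys have been found.
    stale_keys = (key for key, seen in last_seen.items() if seen < cutoff)
    batch = list(islice(stale_keys, max_deletes))
    for key in batch:
        del last_seen[key]
    return len(batch)
```

The scan still starts from the beginning on every call, so a real store would probably want to remember a cursor between runs; that part is left out here.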
Daniel
--
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/  and open source consulting