Posted to users@spamassassin.apache.org by "Cedric Knight, GreenNet" <ce...@gn.apc.org> on 2007/10/24 22:14:39 UTC

Re: Bayes - one database per user or one for everybody?

Hi

I've a possibly related enquiry to an old one below, and would be
grateful for advice or pointers.

We haven't actually *needed* Bayes thanks to greylisting, remote URI
lookups and lots of custom rules.  While a few users are interested in
a filter they can manually train, most wouldn't bother, and most
receive similar types of ham mail, which makes me wonder whether a
single group-writeable database would be best (currently
/var/amavis/.spamassassin/bayes), probably with bayes_auto_learn off.
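
For concreteness, a shared file-based setup like the one described might be configured roughly as follows. This is only a sketch: the path is the one mentioned above, the 0770 mode is an assumption about what "group-writeable" would need, and the right values depend on which user and group the scanner runs as.

```
# Sketch of a single shared DBM Bayes database (site-wide local.cf).
use_bayes         1

# bayes_path is a filename prefix, not a directory: SpamAssassin
# creates bayes_toks, bayes_seen etc. from it.
bayes_path        /var/amavis/.spamassassin/bayes

# Assumed mode for group write access; the execute bits are included
# because the mode is also applied to directories.
bayes_file_mode   0770

# Rely on manual training only.
bayes_auto_learn  0
```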

However, as some tech-savvy users do want their own Bayes db, one
thought was to use the default user .spamassassin folders but have
symbolic links to the central database for most users.  Is this crazy?
Has anyone tried it?  What are the implications on disk I/O of the
various options, including several GB worth of individual databases?
Is there anything I particularly need to look out for in terms of
performance on the live server?

The basic problem is that AFAIK bayes_path can't be set as a user
preference (global and then overridden by say a user preferences
database), as would be needed to have some users use a communal
database, and some their own.  I can see bayes_sql_override_username
could achieve a similar function, but that necessarily rules out
having DBM databases.  Users here do have their own home directories,
and would be able to train by forwarding mail as MIME attachments, but
have no shell access.  I realise as I write this that my wish is even more
difficult because amavis doesn't extract or pass user information to
SA in any case, and it would presumably mean running spamc in
procmailrc...  Is there any way of checking two dbs, one global and
one per-user?
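
For reference, the SQL alternative mentioned above would look roughly like the fragment below. It is a sketch under assumptions: the DSN, database name, credentials and the "shared" username are placeholders, and it shows only the everyone-shares-one-database case, not a mix of shared and private databases.

```
# Sketch of SQL-backed Bayes with one shared set of tokens.
# All values below are placeholders, not taken from a real setup.
bayes_store_module           Mail::SpamAssassin::BayesStore::SQL
bayes_sql_dsn                DBI:mysql:sa_bayes:localhost
bayes_sql_username           sa_user
bayes_sql_password           changeme

# Store and look up all tokens under this one name instead of the
# name of the user the message is being scanned for.
bayes_sql_override_username  shared
```

Without the last line, the SQL backend keeps separate token data per username in the same database, which is the per-user case.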

A lot of questions, and any pointers or experience would be appreciated.

One further question: are per-user databases important for accuracy of
auto-whitelisting?

Thanks

Ced

On 11 July 2007, Michał Jęczalik <mi...@jeczalik.com> wrote:
> Hello,
>
> I'm migrating to the SQL Bayes storage method. I have plenty of email
> accounts. Until now, all of them have had their own database in their
> home directories. Unfortunately this approach consumes a lot of disk
> space, so now I'm thinking about the bayes_sql_override_username
> option, which allows me to have one single database for all.
>
> I wonder if it's better to have a single database (which probably
> could be larger than the size of 8MB per user I allowed with DBM
> storage method) or keep per-user ones?
>
> So, what are the advantages of a single database? And does it make
> any sense to make it larger? Maybe 8MB of tokens is simply enough and
> it doesn't pay to use more resources to seek in a larger base? Are
> there any security or privacy problems with this setup?
>
> BTW, users don't have access to their databases, they are unable to
> feed any spam/ham manually, so losing this ability is not a problem
> for me.
>
> Regards,
> --
> Michal Jeczalik, +48.603.64.62.97