Posted to users@spamassassin.apache.org by Raul Dias <ra...@dias.com.br> on 2007/01/26 19:12:45 UTC

per-user and site-wide bayes databases together

Hi,

I would like to have side by side a per-user and a site-wide database.

AFAIK, right now I can have either one or the other.

IMHE, the per-user database is more effective, especially
for HAM, but a site-wide one would help improve SPAM detection (fewer
false negatives) and help users with a low mail count.

So, is this possible right now? 
(I don't think so, but I had to ask.)

I have no problem writing Perl code.  If I have to implement/hack
this, any tips on where to start or how to implement it are very welcome.

Any opinions on why not to do this (or why to do it) are also welcome.


Raul Dias


RE: per-user and site-wide bayes databases together

Posted by Dan Barker <db...@visioncomm.net>.
If "they" say you can't, then this is how you'd do it.<g> (Training would
need to be via scripts, not Autolearn, I imagine)

With the SQL storage backend, SpamAssassin reads Bayes data via database
queries. So, you rename the tables to something different, and define a view
with the name the table used to have. SA will query the view, which returns
whatever you want it to return. In this case, I'd guess that would be the
union of the personal bayes and the site-wide bayes. You'd need to look at
the actual columns to see whether you must sum them for duplicate tokens,
but I imagine that would be pretty trivial logic.

The only hack I see is to update the sa-learn process to use the correct
(renamed) table names. Views are your friend!
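
A minimal sketch of the view idea, not a tested SpamAssassin setup. It uses
Python's sqlite3 module only so that it runs standalone. The table names
bayes_token_user and bayes_token_site are made up for illustration, and the
columns (id, token, spam_count, ham_count) are modeled on SA's SQL Bayes
schema and should be checked against the schema actually installed. The
site-wide rows are repeated for every user id, and duplicate tokens are
summed.

```python
import sqlite3

SCHEMA = """
-- Stand-in for the table that lists user ids (assumed layout).
CREATE TABLE bayes_vars (id INTEGER PRIMARY KEY, username TEXT);

-- The original per-user token table, renamed (hypothetical name).
CREATE TABLE bayes_token_user (
    id INTEGER, token TEXT, spam_count INTEGER, ham_count INTEGER
);

-- Hypothetical site-wide token table shared by all users.
CREATE TABLE bayes_token_site (
    token TEXT, spam_count INTEGER, ham_count INTEGER
);

-- View carrying the old table name: per-user rows plus the site-wide
-- rows repeated for each user, with counts summed per (id, token).
CREATE VIEW bayes_token AS
SELECT id, token,
       SUM(spam_count) AS spam_count,
       SUM(ham_count)  AS ham_count
FROM (
    SELECT id, token, spam_count, ham_count FROM bayes_token_user
    UNION ALL
    SELECT u.id, s.token, s.spam_count, s.ham_count
    FROM bayes_token_site AS s CROSS JOIN bayes_vars AS u
) AS combined
GROUP BY id, token;
"""


def build_demo_db():
    """Create an in-memory database with a little made-up training data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO bayes_vars VALUES (?, ?)",
                     [(1, "alice"), (2, "bob")])
    conn.executemany("INSERT INTO bayes_token_user VALUES (?, ?, ?, ?)",
                     [(1, "viagra", 1, 0), (1, "meeting", 0, 5)])
    conn.executemany("INSERT INTO bayes_token_site VALUES (?, ?, ?)",
                     [("viagra", 40, 1), ("lottery", 30, 0)])
    return conn


def token_counts(conn, user_id, token):
    """Return (spam_count, ham_count) as seen through the combined view."""
    row = conn.execute(
        "SELECT spam_count, ham_count FROM bayes_token "
        "WHERE id = ? AND token = ?",
        (user_id, token),
    ).fetchone()
    return row if row else (0, 0)


if __name__ == "__main__":
    conn = build_demo_db()
    print(token_counts(conn, 1, "viagra"))   # alice: personal + site-wide
    print(token_counts(conn, 2, "viagra"))   # bob: site-wide only
```

Two caveats under the same assumptions: a plain view like this is read-only,
which is why training has to write to the renamed tables directly, and the
per-user message totals that SA keeps alongside the tokens would probably
need the same kind of combining for the probabilities to come out sensibly.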

Dan

ps: "they" are the folks who know SpamAssassin. I know squirrel (er, ah, Ess
Que El).
