Posted to users@spamassassin.apache.org by Gaby Vanhegan <ga...@vanhegan.net> on 2004/11/02 15:53:47 UTC
Site-wide bayes database, autolearn address
Hi,
Just upgraded to 3.0.1 running under qmail on OpenBSD and am happy to
report no problems. However, whilst I was doing this, I had a few
ideas. I've had a shufty through the archives for these but I didn't
find an appropriate answer. I have 3 questions:
1. I would like to set up a site-wide bayes database that all mailboxes
will use. This saves having to make every user learn their own spam and
should improve the overall accuracy of the system. Is this particularly
difficult to set up with an SQL backend? What happens if the database is
unavailable? What is the performance hit on the database in these
situations? We see around 20,000 messages a day on the server.
2. I would like to set up an automatic email address that people can send
uncaught spam to, which will then be learnt as spam and put into the
bayes database. Has anyone managed to do this? The problem I foresee is
handling the different forward-as-attachment and forward-inline styles
that mail clients use. Presumably we would need to make people forward
spam as attachments, then have a procmail script that handles all mail
accordingly.
3. I see entries such as the following in the mail logs:
autolearn=ham
autolearn=spam
autolearn=unavailable
autolearn=none
Is there a spam score threshold that triggers the
autolearning behaviour? Is the default sensible? Should it be a little
lower? I see high-scored spam not being learned as such and wonder if
this ought to be tweaked a little.
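On question 3: as far as I know, the stock settings are
bayes_auto_learn_threshold_nonspam 0.1 and
bayes_auto_learn_threshold_spam 12.0, both tunable in local.cf, and a
message is only autolearned as spam when it also collects at least 3
points from header rules and 3 from body rules. The score that gets
compared is worked out without the Bayes rules themselves, so it can be
lower than the score reported in the headers, which may be why some
high-scored spam is not learned. A rough sketch of that decision in
Python follows; the function and parameter names are made up for
illustration and are not SpamAssassin's own code.

```python
# Illustrative sketch only: these names are hypothetical, not SpamAssassin's API.
# The two thresholds mirror the default bayes_auto_learn_threshold_* settings.
NONSPAM_THRESHOLD = 0.1
SPAM_THRESHOLD = 12.0
MIN_HEADER_POINTS = 3.0  # minimum points from header rules to learn as spam
MIN_BODY_POINTS = 3.0    # minimum points from body rules to learn as spam


def autolearn_decision(score_without_bayes, header_points, body_points):
    """Return 'ham', 'spam' or 'no' for one scanned message."""
    if score_without_bayes < NONSPAM_THRESHOLD:
        return "ham"
    if (score_without_bayes >= SPAM_THRESHOLD
            and header_points >= MIN_HEADER_POINTS
            and body_points >= MIN_BODY_POINTS):
        return "spam"
    return "no"
```

Under these assumptions a message scoring 15 overall, but with only 1
point from body rules, would not be autolearned, so lowering the spam
threshold alone would not necessarily change what gets learned.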
Gaby
--
Ha! Ha! Ha! Dislocation...
- Phil Ken Sebben
gaby@vanhegan.net
http://vanhegan.net
Re: Site-wide bayes database, autolearn address
Posted by Gaby Vanhegan <ga...@vanhegan.net>.
Keith Hackworth wrote:
> As for 1 and 3, I don't know, but 2, I did myself.
> Actually, the biggest problem you'll run into is that when you forward the
> message, it tinkers with the headers of the message. I found a solution
> to this that doesn't require special scripts to strip the 'false' headers.
Forwarding the email as an attachment may help, but as you say, it will
rip out most of the headers. We do have SquirrelMail installed on our
server, but not many of our users use it; most prefer to POP their mail
from home.
I suppose we could put some instructions up where the user would view
the message source and paste that into a web form, and that would get
piped directly into sa-learn and then into the SQL bayes database. It's
pernickety but it would work, and it relies on the site-wide SQL
database working.
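For the forward-as-attachment route, something along these lines might
do as the script behind a reporting address. It is a minimal sketch
using only the Python standard library: it pulls every message/rfc822
attachment out of the forwarded mail and hands each one to
"sa-learn --spam" on stdin. The function names are made up, and it
assumes sa-learn is on the PATH and already configured to use the shared
bayes database.

```python
import email
import subprocess


def attached_messages(raw_bytes):
    """Return the raw bytes of every message/rfc822 attachment found."""
    # The default (compat32) policy keeps the headers as they were received.
    outer = email.message_from_bytes(raw_bytes)
    found = []
    for part in outer.walk():
        if part.get_content_type() == "message/rfc822":
            inner = part.get_payload(0)  # the attached message itself
            found.append(inner.as_bytes())
    return found


def learn_as_spam(message_bytes, sa_learn="sa-learn"):
    """Feed one message to `sa-learn --spam` on stdin (assumed on PATH)."""
    subprocess.run([sa_learn, "--spam"], input=message_bytes, check=True)


def handle_report(raw_bytes):
    """Learn every attached message; return how many were found."""
    messages = attached_messages(raw_bytes)
    for message_bytes in messages:
        learn_as_spam(message_bytes)
    return len(messages)
```

A report that was forwarded inline has no message/rfc822 part, so
handle_report returns 0 for it and nothing is learned. The same
learn_as_spam call could sit behind the web form idea, taking the pasted
message source as its input.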
Gaby
--
Ha! Ha! Ha! Dislocation...
- Phil Ken Sebben
gaby@vanhegan.net
http://vanhegan.net
Re: Site-wide bayes database, autolearn address
Posted by Keith Hackworth <ke...@rpemail.com>.
As for 1 and 3, I don't know, but 2, I did myself.
Actually, the biggest problem you'll run into is that when you forward the
message, it tinkers with the headers of the message. I found a solution
to this that doesn't require special scripts to strip the 'false' headers.
We run SquirrelMail as a webmail front-end to courier-imap. I created a
couple of buttons as an extension to the amavis-sa plugins in
SquirrelMail. The buttons are "this is spam" and "this isn't spam". When
a user clicks one of these, it actually moves the message (yes, at the OS
level) from the mbox of the user who is viewing their email to my
spam-only mailbox. Fortunately, courier is pretty tolerant of this type
of "abuse".
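A rough illustration of that kind of move, for anyone wanting to try it
outside SquirrelMail. This is not the plugin code; it is a sketch using
Python's standard mailbox module, and it assumes both mailboxes are
Maildir directories (the format courier-imap serves). The function name
and paths are hypothetical, and it copies the message and then deletes
the original rather than renaming the file.

```python
import mailbox


def move_to_spam_box(user_maildir, spam_maildir, key):
    """Move the message stored under `key` from one Maildir to another."""
    src = mailbox.Maildir(user_maildir, create=False)
    dst = mailbox.Maildir(spam_maildir, create=True)
    message = src[key]          # read the message, Maildir flags included
    new_key = dst.add(message)  # write it into the spam-only mailbox
    src.remove(key)             # then delete it from the user's mailbox
    return new_key
```

The spam-only mailbox can then be learned from in bulk, since the
headers of the moved messages are untouched.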
Keith