Posted to users@spamassassin.apache.org by Karsten Bräckelmann <gu...@rudersport.de> on 2010/08/04 23:54:03 UTC
Re: Text contained in HTML comments causing BAYES_00 to classify
as non-spam
On Wed, 2010-08-04 at 14:39 -0700, Happy Chap wrote:
> Bowie Bailey wrote:
> > Stupid question here, but are you sure you are training the same
> > database that SA is using?
> >
> > This is a fairly frequent problem. Common cases are:
> >
> > 1) SA being called as 'mailuser' and you are doing manual training on
> > root's database.
> > 2) You are manually training everything to the 'mailuser' database, but
> > SA is actually using per-user databases.
>
> Good question Bowie.
>
> I don't think that's happening. We do have a generic system-wide procmailrc,
> but its first command is DROPPRIVS, which I think then runs everything
> as the specific user. The procmail recipe then makes a call to spamc
> (without the -u option because, as I say, I think that after DROPPRIVS
> it is already running as that user, so -u shouldn't be necessary).
*nod*
> If this doesn't sound right, by all means say - it's quite a while since I
> set all this up!
>
> Training is definitely happening on a per user basis (ie. the script is
> calling sa-learn -u).
So when you confirmed by running sa-learn --dump magic previously, did
you first su to the user in question? The Bayes database does exist in
the user's $HOME/.spamassassin/, right?
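
The check asked about above can be sketched as a small shell helper. This is only a sketch: the function name has_user_bayes and the user "alice" are made up for illustration, and it assumes the default file-based layout with bayes_toks and bayes_seen under $HOME/.spamassassin/.

```shell
# Hypothetical helper: report whether a home directory holds a per-user
# Bayes database in the file-based layout (bayes_toks and bayes_seen).
has_user_bayes() {
    home_dir=$1
    if [ -f "$home_dir/.spamassassin/bayes_toks" ] &&
       [ -f "$home_dir/.spamassassin/bayes_seen" ]; then
        echo "found"
    else
        echo "missing"
    fi
}

# Example, assuming a user named "alice" (the name is an assumption):
#   has_user_bayes /home/alice
#   su - alice -c 'sa-learn --dump magic'
```

Running the dump as the user, rather than as root, is what shows whether the database being inspected is the one that user's mail is scored against.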
Even when running per-user, a site-wide Bayes DB is still possible IIRC,
e.g. if you use an SQL backend.
Anyway, since you still get BAYES_00 on these, you really should have a
close look at the tokens Bayes is most confident about, and why. With
some training, the result should at the very least move up toward
BAYES_50, not stay at 00. The tokens should help tell you why.
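
One way to look at those tokens is to run the message through spamassassin with Bayes debugging enabled and keep only the per-token lines. The sketch below makes some assumptions: the function name bayes_tokens_by_prob and the file message.eml are made up, and the filter expects debug lines of the form "bayes: token '...' => probability", which may differ between SpamAssassin versions.

```shell
# Hypothetical filter: read debug output on stdin, keep the per-token
# lines, and sort them so the lowest (most ham-like) probabilities come
# first. ASSUMPTION: each token line ends in "=> <probability>".
bayes_tokens_by_prob() {
    LC_ALL=C awk '/bayes: token / { printf "%.6f\t%s\n", $NF, $0 }' |
        LC_ALL=C sort -n
}

# Example (message.eml is a placeholder for one of the misclassified mails):
#   spamassassin -D bayes -t < message.eml 2>&1 | bayes_tokens_by_prob | head
```

The tokens at the top of that list are the ones pulling the message toward BAYES_00.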
--
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
Re: Text contained in HTML comments causing BAYES_00 to classify
as non-spam
Posted by Bowie Bailey <Bo...@BUC.com>.
On 8/4/2010 6:07 PM, Happy Chap wrote:
> No, we're not using an SQL backend and every user has their own bayes
> database.
You mentioned previously that you are using 'sa-learn -u'. I thought
that option only worked with SQL databases?
In my setup, I have lots of virtual users under the same UID with
per-user settings and bayes (using spamd's '-x' and
'--virtual-config-dir' options). So when I run sa-learn, I explicitly
specify the database path to make sure it's learning to the right place
(sa-learn --dbpath /path/to/bayes ...).
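
To confirm that training really lands in the database named by --dbpath, the message counters can be compared before and after a learning run. A minimal sketch, assuming the usual "non-token data: nspam" style lines in the --dump magic output; the helper name magic_count is made up.

```shell
# Hypothetical helper: pull one counter (e.g. nspam, nham, ntokens) out
# of `sa-learn --dump magic` output read from stdin.
# ASSUMPTION: counter lines end in "non-token data: <name>" and carry
# the value in the third whitespace-separated column.
magic_count() {
    awk -v key="$1" '/non-token data:/ && $NF == key { print $3 }'
}

# Example, using the same placeholder path as above:
#   sa-learn --dbpath /path/to/bayes --dump magic | magic_count nspam
```

If nspam or nham does not change after learning a new message, the training is probably going to a different database than the one being dumped.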
--
Bowie
Re: Text contained in HTML comments causing BAYES_00 to classify as
non-spam
Posted by Happy Chap <sa...@happychap.plus.com>.
Karsten Bräckelmann-2 wrote:
>
>
> So when you confirmed by running sa-learn --dump magic previously, did
> you first su to the user in question? The Bayes database does exist in
> the user's $HOME/.spamassassin/, right?
>
Yes, I had su'ed to that user and yes, they have their own bayes_seen,
bayes_toks, etc. in $HOME/.spamassassin
Karsten Bräckelmann-2 wrote:
>
>
> Even when running per-user, a site-wide Bayes DB is still possible IIRC,
> e.g. if you use an SQL backend.
>
>
No, we're not using an SQL backend and every user has their own bayes
database.
Karsten Bräckelmann-2 wrote:
>
>
> Anyway, since you still get BAYES_00 on these, you really should have a
> close look at the tokens Bayes is most confident about, and why. With
> some training, the result should at the very least move up toward
> BAYES_50, not stay at 00. The tokens should help tell you why.
>
>
OK, will do.
Thanks again for your help Karsten.
David.
--
View this message in context: http://old.nabble.com/Text-contained-in-HTML-comments-causing-BAYES_00-to-classify-as-non-spam-tp29342874p29351738.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.