Posted to users@spamassassin.apache.org by Karsten Bräckelmann <gu...@rudersport.de> on 2010/08/04 23:54:03 UTC

Re: Text contained in HTML comments causing BAYES_00 to classify as non-spam

On Wed, 2010-08-04 at 14:39 -0700, Happy Chap wrote:
> Bowie Bailey wrote:

> > Stupid question here, but are you sure you are training the same
> > database that SA is using?
> > 
> > This is a fairly frequent problem.  Common cases are:
> > 
> > 1) SA being called as 'mailuser' and you are doing manual training on
> > root's database.
> > 2) You are manually training everything to the 'mailuser' database, but
> > SA is actually using per-user databases.
> 
> Good question Bowie. 
> 
> I don't think that's happening. We do have a generic system-wide procmailrc
> but its first command is for a DROPPRIVS, which I think/thought then runs
> as the specific user and in the procmail recipe a call is then made to spamc
> (although it is called without the -u option because, as I say, I think by
> issuing a DROPPRIVS it's running as that user so -u shouldn't be necessary).

*nod*

> If this doesn't sound right, by all means say - it's quite a while since I
> set all this up!
> 
> Training is definitely happening on a per user basis (ie. the script is
> calling sa-learn -u).

So when you confirmed by running sa-learn --dump magic previously, did
you first su to the user in question? The Bayes database does exist in
the user's $HOME/.spamassassin/, right?
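
As a rough sketch of that check (assuming a Bourne-style shell; the
function name and the example home directory are made up for
illustration), something like this lists the Bayes files for one user:

```shell
# Sketch: list the Bayes database files under a user's home directory.
# $1 is the user's home directory, e.g. /home/alice (a made-up example).
# ls exits non-zero if no bayes_* files exist there.
list_bayes_files() {
    home_dir="$1"
    ls -l "$home_dir"/.spamassassin/bayes_*
}

# Usage (hypothetical path):
#   list_bayes_files /home/alice
```

If bayes_toks and bayes_seen show up there, running
sa-learn --dump magic after su'ing to that user should report the
counters from that same database.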

Even when running per-user, a site-wide Bayes DB is still possible IIRC,
e.g. if you use an SQL backend.


Anyway, since you still get BAYES_00 on these, you really should have a
close look at the tokens Bayes considers most confident. And why. With
some training, it most certainly at least should level up near BAYES_50,
not stay at 00. The tokens should help tell you why.
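
One way to get at those tokens, as a sketch only: run the message
through spamassassin in test mode with Bayes debugging switched on, as
the affected user. The function name is made up, and the exact wording
of the debug lines ("bayes: token") is an assumption that may differ
between versions.

```shell
# Sketch: show the per-token Bayes debug lines for one message.
# $1 is a file holding the raw message (hypothetical path).
# Run this as the user whose Bayes database should be consulted.
# The grep pattern assumes debug lines contain "bayes: token".
show_bayes_tokens() {
    msg_file="$1"
    spamassassin -D bayes -t < "$msg_file" 2>&1 | grep 'bayes: token'
}

# Usage (hypothetical path):
#   show_bayes_tokens /tmp/sample-spam.eml
```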


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: Text contained in HTML comments causing BAYES_00 to classify as non-spam

Posted by Bowie Bailey <Bo...@BUC.com>.
On 8/4/2010 6:07 PM, Happy Chap wrote:
>
> No, we're not using an SQL backend and every user has their own bayes
> database.

You mentioned previously that you are using 'sa-learn -u'.  I thought
that option only worked with SQL databases?

In my setup, I have lots of virtual users under the same UID with
per-user settings and bayes (using spamd's '-x' and
'--virtual-config-dir' options).  So when I run sa-learn, I explicitly
specify the database path to make sure it's learning to the right place
(sa-learn --dbpath /path/to/bayes ...).
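
A minimal sketch of that approach, assuming a Bourne-style shell (the
function name and the message file are hypothetical; the path is passed
to --dbpath exactly as given):

```shell
# Sketch: learn one message as spam into an explicitly named Bayes
# database, then dump that same database's counters.
# $1 = value for --dbpath (e.g. /path/to/bayes, as in the text above)
# $2 = file holding the raw message (hypothetical)
learn_spam_explicit() {
    db_path="$1"
    msg_file="$2"
    sa-learn --dbpath "$db_path" --spam "$msg_file" &&
        sa-learn --dbpath "$db_path" --dump magic
}

# Usage (hypothetical paths):
#   learn_spam_explicit /path/to/bayes /tmp/sample-spam.eml
```

Passing the same path to both calls means the counters you see come
from the database you just trained, whichever user you happen to be.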

-- 
Bowie

Re: Text contained in HTML comments causing BAYES_00 to classify as non-spam

Posted by Happy Chap <sa...@happychap.plus.com>.


Karsten Bräckelmann-2 wrote:
> 
> 
> So when you confirmed by running sa-learn --dump magic previously, did
> you first su to the user in question? The Bayes database does exist in
> the user's $HOME/.spamassassin/, right?
>  

Yes, I had su'ed to that user and yes, they have their own bayes_seen,
bayes_toks, etc. in $HOME/.spamassassin


Karsten Bräckelmann-2 wrote:
> 
> 
> Even when running per-user, a site-wide Bayes DB is still possible IIRC,
> e.g. if you use an SQL backend.
> 
> 

No, we're not using an SQL backend and every user has their own bayes
database.


Karsten Bräckelmann-2 wrote:
> 
> 
> Anyway, since you still get BAYES_00 on these, you really should have a
> close look at the tokens Bayes considers most confident. And why. With
> some training, it most certainly at least should level up near BAYES_50,
> not stay at 00. The tokens should help tell you why.
> 
> 

OK, will do.

Thanks again for your help Karsten.

David.