Posted to users@spamassassin.apache.org by Karsten Bräckelmann <gu...@rudersport.de> on 2008/08/03 05:29:41 UTC

Re: Japanese characters/language blocked by spamassassin + amavisd-new

On Sun, 2008-08-03 at 01:59 +0700, Fuad NAHDI wrote:
> > Train your Bayes, learn ham mail. Also, drop your AWL database and start
> > fresh, since it currently maintains an average score of about 12 for
> > that particular sender.
> >
> > Get rid of third party rules, if they don't apply to your particular
> > mail stream. Seriously, reconsider *all* third party rules and review
> > their performance on *your* mail. This is a problem you created
> > yourself, not an issue with SA.
> >
> >
> > Granted, assuming a neutral Bayes score, no AWL and no whitelisting,
> > stock SA still scores that example at 5.078. Slightly beyond the
> > threshold.
> >
> > However, properly learning ham will correct this, and a sane AWL
> > database will help with future mail, too. If you keep your whitelist,
> > you'll easily get the score down below 0.
> >
> > Also, you should consider LARTing the sender to use a proper MUA. Or, if
> > you run into rules like MIME_HTML_ONLY frequently, adjust the score
> > locally to better cope with your particular mail.
> >
> 
> Hi guenther,
> 
> Yes, I know it is my configuration problem, and I did not suspect either SA
> or amavisd issues.

Sorry, I did not mean to imply that it is your fault alone, or to take
stock SA out of the picture. My main point, though, is that the lack of
Bayes training and the added third-party rules alone accounted for a
score of 12, while SA accounts for 5. I meant to point out that the
lion's share comes from non-SA rules. And I did admit that stock SA with
no Bayes training would have resulted in an FP.

The bottom line is that (a) proper Bayes training is crucial, and
(b) third-party rules must not be used without a close look at their
results with respect to your particular mail stream.


> The thing is I don't know how to figure it out.

You have the rules that triggered and their scores. Check where these
rules come from, and whether they perform according to their score.
Start with the rules that account for large-ish scores. Remove
third-party rules that turn out to have a negative impact. Tune
individual rule scores if need be.
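
To make "start with the large-ish scores" concrete, here is a minimal
sketch that ranks rule hits by score. It assumes the amavisd-new style
of X-Spam-Status header, where hits appear as tests=[RULE=score, ...].
The function name rank_hits, the EXAMPLE_* rule names, and every score
in the sample header are made up for illustration; they are not taken
from the mail discussed in this thread.

```python
import re

def rank_hits(header):
    """Return (rule, score) pairs from a tests=[...] list, largest score first."""
    match = re.search(r"tests=\[(.*?)\]", header, re.DOTALL)
    if not match:
        return []
    hits = []
    for item in match.group(1).split(","):
        name, sep, value = item.strip().partition("=")
        if not sep:
            continue  # skip empty or malformed entries
        hits.append((name.strip(), float(value)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Hypothetical header: rule names (except MIME_HTML_ONLY) and all scores are invented.
sample = (
    "X-Spam-Status: Yes, score=6.5 required=5 "
    "tests=[MIME_HTML_ONLY=1.5, EXAMPLE_THIRD_PARTY_RULE=4.0,\n"
    " EXAMPLE_OTHER_RULE=1.0]"
)

for rule, score in rank_hits(sample):
    print(f"{score:6.2f}  {rule}")
```

Work down the list from the top, and look up which rule file each name
comes from. If a rule is worth keeping but scores too high for your
mail, a line of the form "score RULE_NAME value" in local.cf overrides
it locally; pick the value from what you see on your own mail.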


> Your answer is a seriously detailed explanation. Now I understand the
> problem.
> 
> Many thanks for your reply.

No problem, glad it did help. :)


> Jakarta, INDONESIA

Given your country and the fact that you have a problem with
Japanese-language mail, you might find it particularly important to
train on Japanese mail. If you get a lot of JP spam, but only a few
important hams, this may even include 'sa-learn --forget' on some JP
spam, while still training the ham. Just a guess, though.
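
As a rough sketch of that training plan, the snippet below only builds
and prints the sa-learn command lines, so you can review them before
running anything. The --ham, --spam and --forget options are real
sa-learn options; the helper name sa_learn_argv and both paths are
placeholders I made up, so substitute your own mail folders.

```python
import shlex

# sa-learn options for each training action.
MODES = {"ham": "--ham", "spam": "--spam", "forget": "--forget"}

def sa_learn_argv(mode, path):
    """Build (but do not run) an sa-learn command for one file or directory."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    return ["sa-learn", MODES[mode], path]

# Hypothetical paths: replace with wherever your JP ham and spam live.
plan = [
    ("ham", "/path/to/jp-ham"),
    ("forget", "/path/to/jp-spam-to-unlearn"),
]

for mode, path in plan:
    print(shlex.join(sa_learn_argv(mode, path)))
```

With amavisd-new, the Bayes database usually belongs to the user that
amavisd runs as rather than your own account, so the commands likely
need to run as that user. That is an assumption about your setup, so
check it first.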

  guenther  -- who should be sleeping by now


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}