Posted to users@spamassassin.apache.org by Karsten Bräckelmann <gu...@rudersport.de> on 2008/08/02 18:29:12 UTC

Re: Japanese characters/language blocked by spamassassing + amavisd-new

On Sat, 2008-08-02 at 15:58 +0700, Fuad NAHDI wrote:
> Hi all,
> 
> I have postfix (ver 2.3.3) with mysql, virtual users, amavisd-new, clamav
> and spamassassin (ver 3.2.5), dcc, pyzor and razor running on CentOS
> 5.1.
> Everything works fine, but spamassassin + amavisd-new frequently give a
> high score to emails coming from Japan (written in Japanese
> characters/language).

Sneak preview of the comments below:  Japanese mail scores high on your
system partly because your Bayes has been trained to believe it is
spam, and partly because you are seriously punishing senders from .jp
domains.  But read on.


> Sample X-Spam-Status:
> --------------------
> X-Spam-Status: Yes, score=11.732 tag=x tag2=5 kill=8 tests=[AWL=0.404,
>      BAYES_99=3.5, DBL_12_LETTER_FLDR=0.2, DBL_12_LETTER_PGIMG=0.2,

Your Bayes is trained badly. Use sa-learn to correct it, and learn
Japanese ham as ham.

>      FM_FRM_RN_L_BRACK=2.674, FM_MULTI_ODD2=1.1, FM_WHITEONWHITE=0.45,

Neither the DBL_* nor the FM_* rules are part of stock SA, with the
notable exception of FM_FRM_RN_L_BRACK.

>      HELO_EQ_JP=1.244, HOST_EQ_JP=1.265, HS_INDEX_PARAM=0.001,

The *_EQ_JP rules are not part of stock SA. Given your complaint, you
seriously should not use these.

>      HTML_IMAGE_RATIO_06=0.001, HTML_MESSAGE=0.001,
>      HTML_NONELEMENT_40_50=0.944, MIME_HTML_ONLY=1.457, SARE_RAND_2=2.5,

Bad sending MUA, composing HTML mail with no text/plain part.

>      SARE_URI_BARGAIN=0.634, SARE_URI_LET_DIG_PIC=1.157,
>      USER_IN_WHITELIST_TO=-6]

SARE_* rules are not part of stock SA.


> Any advice will be appreciated.

Train your Bayes, learn ham mail. Also, drop your AWL database and start
fresh, since it currently maintains an average score of about 12 for
that particular sender.

Get rid of third-party rules if they don't apply to your particular
mail stream. Seriously, reconsider *all* third-party rules and review
their performance on *your* mail. This is a problem you created
yourself, not an issue with SA.


Granted, assuming a neutral Bayes score, no AWL and no whitelisting,
stock SA still scores that example at 5.078, slightly above the
threshold of 5 (tag2 in your header).

However, properly learning ham will correct this, and a sane AWL
database will help with future mail, too. If you keep your whitelist,
you'll easily get the score down below 0.
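
The arithmetic behind these numbers can be sketched in a few lines of
Python. This is only an illustration, not part of SA: the scores are
copied from the sample X-Spam-Status header, the stock/third-party split
follows the comments on the individual rules above, and counting
HS_INDEX_PARAM as stock is an assumption (it is what makes the stock sum
come out at 5.078).

```python
from decimal import Decimal

# Rule hits copied from the sample X-Spam-Status header in this thread.
HITS = {
    "AWL": "0.404", "BAYES_99": "3.5",
    "DBL_12_LETTER_FLDR": "0.2", "DBL_12_LETTER_PGIMG": "0.2",
    "FM_FRM_RN_L_BRACK": "2.674", "FM_MULTI_ODD2": "1.1",
    "FM_WHITEONWHITE": "0.45", "HELO_EQ_JP": "1.244",
    "HOST_EQ_JP": "1.265", "HS_INDEX_PARAM": "0.001",
    "HTML_IMAGE_RATIO_06": "0.001", "HTML_MESSAGE": "0.001",
    "HTML_NONELEMENT_40_50": "0.944", "MIME_HTML_ONLY": "1.457",
    "SARE_RAND_2": "2.5", "SARE_URI_BARGAIN": "0.634",
    "SARE_URI_LET_DIG_PIC": "1.157", "USER_IN_WHITELIST_TO": "-6",
}


def category(rule):
    """Classify a rule as described in this reply.

    Anything not named in the reply (e.g. HS_INDEX_PARAM) is assumed
    to be a stock rule.
    """
    if rule == "AWL":
        return "awl"
    if rule.startswith("BAYES_"):
        return "bayes"
    if rule.startswith("USER_IN_WHITELIST"):
        return "whitelist"
    if rule == "FM_FRM_RN_L_BRACK":  # the one FM_* rule that is stock
        return "stock"
    if rule.startswith(("DBL_", "FM_", "SARE_")) or rule.endswith("_EQ_JP"):
        return "third-party"
    return "stock"


# Sum the scores per category using exact decimal arithmetic.
totals = {}
for rule, score in HITS.items():
    key = category(rule)
    totals[key] = totals.get(key, Decimal(0)) + Decimal(score)

for name in sorted(totals):
    print(f"{name:12} {totals[name]}")
print("total       ", sum(totals.values()))
print("stock + whitelist:", totals["stock"] + totals["whitelist"])
```

The stock rules sum to 5.078 and the third-party rules to 8.75. Adding
the 3.5 from BAYES_99, the AWL adjustment and the -6 from the whitelist
gives the reported 11.732. Stock rules plus the whitelist alone end up
below 0.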

Also, you should consider LARTing the sender to use a proper MUA. Or, if
you run into rules like MIME_HTML_ONLY frequently, adjust the score
locally to better cope with your particular mail.


Now, if someone could please translate Doni's reply... ;)

  guenther


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: Japanese characters/language blocked by spamassassing + amavisd-new

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Sun, 2008-08-03 at 01:59 +0700, Fuad NAHDI wrote:
> Hi guenther,
> 
> Yes, I know it is my configuration problem, and I did not suspect either
> SA or amavisd issues.

Sorry, I did not mean to imply that it is your fault alone, nor to
clear stock SA of any blame. My main point, though, is that the lack of
Bayes training and the added third-party rules alone accounted for a
score of about 12, while stock SA accounts for about 5. I meant to point
out that the lion's share comes from non-SA rules. And I did admit that
stock SA with no Bayes training would have resulted in an FP.

The bottom line is that  (a) proper Bayes training is crucial, and
(b) third-party rules must not be used without a close look at their
results with respect to your particular mail stream.


> The thing is I don't know how to figure it out.

You have the rules that triggered and their scores. Check where these
rules come from, and whether they perform according to their score.
Start with the rules that account for large-ish scores. Remove
third-party rules that turn out to have a negative impact. Tune
individual rule scores if need be.
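
As a starting point, the rule hits can be pulled out of the header and
ranked by score. The following Python is a minimal sketch, assuming the
amavisd-new style "tests=[NAME=score, ...]" layout shown in the sample
header; the function name rule_hits is made up for illustration.

```python
import re

# Sample header from this thread, rejoined into one string.
HEADER = """X-Spam-Status: Yes, score=11.732 tag=x tag2=5 kill=8 tests=[AWL=0.404,
     BAYES_99=3.5, DBL_12_LETTER_FLDR=0.2, DBL_12_LETTER_PGIMG=0.2,
     FM_FRM_RN_L_BRACK=2.674, FM_MULTI_ODD2=1.1, FM_WHITEONWHITE=0.45,
     HELO_EQ_JP=1.244, HOST_EQ_JP=1.265, HS_INDEX_PARAM=0.001,
     HTML_IMAGE_RATIO_06=0.001, HTML_MESSAGE=0.001,
     HTML_NONELEMENT_40_50=0.944, MIME_HTML_ONLY=1.457, SARE_RAND_2=2.5,
     SARE_URI_BARGAIN=0.634, SARE_URI_LET_DIG_PIC=1.157,
     USER_IN_WHITELIST_TO=-6]"""


def rule_hits(header):
    """Return (rule, score) pairs from the tests=[...] list of a header."""
    inside = re.search(r"tests=\[(.*?)\]", header, re.S).group(1)
    hits = []
    for item in inside.split(","):
        name, _, score = item.strip().partition("=")
        hits.append((name, float(score)))
    return hits


# Highest-scoring rules first: these are the ones to review first.
ranked = sorted(rule_hits(HEADER), key=lambda hit: hit[1], reverse=True)
for name, score in ranked[:5]:
    print(f"{score:7.3f}  {name}")
```

For the sample header, the top of that list is BAYES_99, followed by
FM_FRM_RN_L_BRACK and SARE_RAND_2, which matches the order in which the
problems are discussed above.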


> Your answer is a seriously detailed explanation. Now I understand the
> problem.
> 
> Many thanks for your reply.

No problem, glad it helped. :)


> Jakarta, INDONESIA

Given your country and the fact that you have a problem with Japanese
language mail, you might find it particularly important to train on
Japanese mail. If you get a lot of JP spam, but only a few important
hams, this may even include 'sa-learn --forget' on some JP spam, while
still training the ham. Just a guess, though.

  guenther  -- who should be sleeping by now




Re: Japanese characters/language blocked by spamassassing + amavisd-new

Posted by Fuad NAHDI <fu...@katalis.web.id>.

Hi guenther,

Yes, I know it is my configuration problem, and I did not suspect either
SA or amavisd issues. The thing is I don't know how to figure it out.
Your answer is a seriously detailed explanation. Now I understand the
problem.

Many thanks for your reply.


>
> Now, if someone please could translate Donis reply... ;)

;) He suspected that the 99_FVGT_meta.cf rule file was causing the high
score, so he asked me to replace it with 00_FVGT_File001.cf. A good
idea, so I followed his recommendation as well. Thanks, pak Doni.


Fuad NAHDI,
Jakarta, INDONESIA

