Posted to users@spamassassin.apache.org by Mauricio Tavares <ra...@gmail.com> on 2013/10/16 18:58:34 UTC

Email in Russian not triggering UNWANTED_LANGUAGE_BODY

Email in question is at http://pastie.org/8403863; I put it
there so it would not harm anyone with its HTTP-Posting-URI header.

In my local.cf I have

ok_languages en
ok_locales en
add_header all Languages _LANGUAGES_

I also have TextCat enabled. Many emails, most recently in Chinese and
Spanish, have been flagged. But the one mentioned above did not
trigger the language check. How come?

Also, why does running it manually (sudo -u amavis spamassassin -D <
spamtest3) result in a higher score,

Content analysis details:   (4.9 points, 4.7 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.0 HK_RANDOM_FROM         From username looks random
 0.9 DKIM_ADSP_NXDOMAIN     No valid author signature and domain not in DNS
 0.7 BAYES_50               BODY: Bayes spam probability is 40 to 60%
                            [score: 0.5018]
 1.5 BODY_8BITS             BODY: Body includes 8 consecutive 8-bit characters
 0.6 INVALID_MSGID          Message-Id is not valid, according to RFC 2822
 0.2 BODY_URI_ONLY          Message body is only one line containing a URI

than running it through postfix+amavis+spamassassin?
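
One way to narrow down a score difference like this is to compare the
rule hits from both runs. Below is a minimal sketch in Python, assuming
both reports are available as text in the "pts rule name description"
layout shown above. The helper names parse_report and diff_reports are
made up for this example, and the regular expression assumes rule names
are upper case.

```python
import re

# Matches report lines such as " 1.0 HK_RANDOM_FROM   From username looks random".
# Assumption: rule names are upper case with digits and underscores.
RULE_LINE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s+([A-Z][A-Z0-9_]*)\s+(.*)$")


def parse_report(text):
    """Return {rule_name: points} for every rule line in a report."""
    hits = {}
    for line in text.splitlines():
        match = RULE_LINE.match(line)
        if match:
            hits[match.group(2)] = float(match.group(1))
    return hits


def diff_reports(a, b):
    """Return rules whose presence or score differs between two reports."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# The report from the manual run, copied from the message above.
manual = parse_report("""
 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.0 HK_RANDOM_FROM         From username looks random
 0.9 DKIM_ADSP_NXDOMAIN     No valid author signature and domain not in DNS
 0.7 BAYES_50               BODY: Bayes spam probability is 40 to 60%
                            [score: 0.5018]
 1.5 BODY_8BITS             BODY: Body includes 8 consecutive 8-bit characters
 0.6 INVALID_MSGID          Message-Id is not valid, according to RFC 2822
 0.2 BODY_URI_ONLY          Message body is only one line containing a URI
""")

total = round(sum(manual.values()), 1)
print(total, sorted(manual))
```

Parsing the report from the other run the same way and passing both
dictionaries to diff_reports would list the rules that fired, or
scored differently, in only one of the two runs.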

Re: Email in Russian not triggering UNWANTED_LANGUAGE_BODY

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Thu, 2013-10-17 at 09:13 -0700, John Hardin wrote:
> On Thu, 17 Oct 2013, Mauricio Tavares wrote:

> > Reading
> > http://www.mail-archive.com/spamassassin-talk@lists.sourceforge.net/msg23962.html
> > I was wondering where those .lm files are (vi.lm, en.lm, etc). I could
> > not find them in the ubuntu box I have spamassassin installed on.

They are in the SA sources lm/ directory and (mostly) taken directly
from the original TextCat program.

> That I don't know, apart from assuming they're somewhere under the SA 
> install directory since they're being shipped as part of a SA plugin.

They aren't. What ships with SA is the 'languages' file generated from
these *.lm files. The source files are only needed to modify existing
languages or add new ones -- which is what the archive link above is
about.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: Email in Russian not triggering UNWANTED_LANGUAGE_BODY

Posted by John Hardin <jh...@impsec.org>.
On Thu, 17 Oct 2013, Mauricio Tavares wrote:

> On Wed, Oct 16, 2013 at 1:44 PM, John Hardin <jh...@impsec.org> wrote:
>> On Wed, 16 Oct 2013, Mauricio Tavares wrote:
>>
>>> Email in question is at http://pastie.org/8403863; I put it
>>> there so it would not harm anyone with its HTTP-Posting-URI header.
>>>
>>> In my local.cf I have
>>>
>>> ok_languages en
>>> ok_locales en
>>> add_header all Languages _LANGUAGES_
>>>
>>> And have textcat enabled. Many emails, most recently in Chinese and
>>> Spanish, have been flagged. But the one mentioned above did not
>>> trigger the language check. How come?
>>
>>
>> Perhaps:
>> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6364
>
>      Quite possibly. Reading
> http://www.mail-archive.com/spamassassin-talk@lists.sourceforge.net/msg23962.html
> I was wondering where those .lm files are (vi.lm, en.lm, etc). I could
> not find them in the ubuntu box I have spamassassin installed on.

That I don't know, apart from assuming they're somewhere under the SA 
install directory since they're being shipped as part of a SA plugin.

Please keep responses on-list so that others can help and can benefit from 
a solution to this problem.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Mine eyes have seen the horror of the voting of the horde;
   They've looted the fromagerie where guv'ment cheese is stored;
   If war's not won before the break they grow so quickly bored;
   Their vote counts as much as yours.                          -- Tam
-----------------------------------------------------------------------
  504 days since the first successful private support mission to ISS (SpaceX)

Re: Email in Russian not triggering UNWANTED_LANGUAGE_BODY

Posted by John Hardin <jh...@impsec.org>.
On Wed, 16 Oct 2013, Mauricio Tavares wrote:

> Email in question is at http://pastie.org/8403863; I put it
> there so it would not harm anyone with its HTTP-Posting-URI header.
>
> In my local.cf I have
>
> ok_languages en
> ok_locales en
> add_header all Languages _LANGUAGES_
>
> And have textcat enabled. Many emails, most recently in Chinese and
> Spanish, have been flagged. But the one mentioned above did not
> trigger the language check. How come?

Perhaps:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6364

