Posted to dev@spamassassin.apache.org by Warren Togami <wt...@redhat.com> on 2009/07/08 05:39:56 UTC

normalize_charset option impact?

> ***************************************************************************
> NOTE: the optional Encode::Detect module is not installed.
>
>   If you plan to use the normalize_charset config setting to detect
>   charsets and convert them into Unicode, you will need to install
>   this module.

What is the performance impact of using normalize_charset?

Why is this not required by default?

Is this necessary to filter mail for Asian languages?

Warren Togami
wtogami@redhat.com

Re: normalize_charset option impact?

Posted by Justin Mason <jm...@jmason.org>.
It allows body rules to be written in UTF-8 while still matching text
written in other charsets.

This is useful if you want to match the *text*, rather than the actual
*bytes* that are being spammed.  In my opinion, though, spammer
patterns can be matched as strings of bytes, since that's how the
spammers send them, so I don't recommend it.

It's probably more useful for people who want to write corporate policy
rules (matching profanity, etc.).
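
The text-versus-bytes distinction can be illustrated with a small Python
sketch. This is not SpamAssassin code and makes no claim about how
normalize_charset is implemented; the sample word, the two charsets, and all
variable names are assumptions chosen for illustration. It also assumes the
charset of each message is already known, whereas the real option relies on
Encode::Detect to guess it.

```python
import re

# Hypothetical sample: the same Russian word ("discount") as it might arrive
# in two messages that use different legacy charsets.
word = "скидка"
raw_koi8 = word.encode("koi8-r")
raw_cp1251 = word.encode("cp1251")

# A byte-level rule written against the KOI8-R form matches only messages
# that were sent in that encoding.
byte_rule = re.compile(re.escape(raw_koi8))
matches_koi8_bytes = bool(byte_rule.search(raw_koi8))
matches_cp1251_bytes = bool(byte_rule.search(raw_cp1251))

# Once both bodies are decoded to Unicode (the "normalized" form), a single
# rule written as text matches either message.
text_rule = re.compile(word)
matches_koi8_text = bool(text_rule.search(raw_koi8.decode("koi8-r")))
matches_cp1251_text = bool(text_rule.search(raw_cp1251.decode("cp1251")))

print(matches_koi8_bytes, matches_cp1251_bytes)
print(matches_koi8_text, matches_cp1251_text)
```

A byte rule has to be repeated for every encoding in which the text may
appear, while a text rule is written once but pays for the decoding step on
every message.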

--j.

On Wed, Jul 8, 2009 at 04:39, Warren Togami<wt...@redhat.com> wrote:
>>
>> ***************************************************************************
>> NOTE: the optional Encode::Detect module is not installed.
>>
>>  If you plan to use the normalize_charset config setting to detect
>>  charsets and convert them into Unicode, you will need to install
>>  this module.
>
> What is the performance impact of using normalize_charset?
>
> Why is this not required by default?
>
> Is this necessary to filter mail for Asian languages?
>
> Warren Togami
> wtogami@redhat.com
>
>

Re: normalize_charset option impact?

Posted by Mark Martinec <Ma...@ijs.si>.
On Wednesday 08 July 2009 05:39:56 Warren Togami wrote:
> >NOTE: the optional Encode::Detect module is not installed.
> >   If you plan to use the normalize_charset config setting to detect
> >   charsets and convert them into Unicode, you will need to install
> >   this module.
>
> What is the performance impact of using normalize_charset?
>
> Why is this not required by default?

It can be really slow with some rules (slower by a factor of 30); see:

  https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5691

I admit I lost interest in it and just turned it off, as my problem report
was dismissed as invalid two years ago.

> Is this necessary to filter mail for Asian languages?

Don't know.

  Mark