Posted to users@spamassassin.apache.org by "Johnson, Robert F" <ro...@intel.com> on 2005/04/13 00:11:19 UTC

Rules to identify simplified and traditional Chinese character sets

I have a requirement for a rule that will identify emails using either
traditional or simplified Chinese character sets. 

I was able to create a rule that finds these charset names in the Internet
headers, but I have noticed that some emails declare the charset in a MIME
part header rather than in the Internet header.

This code fragment illustrates how I do this for Internet headers:

		header   CHINESE_WL_1     Content-Type =~ /gb2312/i
		describe CHINESE_WL_1     White list Simplified Chinese
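The same header test should extend to traditional Chinese, which is most commonly declared with the Big5 charset. This is a minimal, untested sketch. The rule name CHINESE_WL_2 is hypothetical and was chosen to mirror CHINESE_WL_1:

		# hypothetical rule name; untested; matches e.g. charset="big5"
		header   CHINESE_WL_2     Content-Type =~ /big5/i
		describe CHINESE_WL_2     White list Traditional Chinese

Like CHINESE_WL_1, this only tests the top-level Content-Type header. It does not see the headers of individual MIME parts.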

Does anyone know how to create a rule to detect these charset names in a
MIME header?



Re: Rules to identify simplified and traditional Chinese character sets

Posted by Loren Wilton <lw...@earthlink.net>.
> This code fragment illustrates how I do this for Internet headers:
>
> header   CHINESE_WL_1     Content-Type =~ /gb2312/i
> describe CHINESE_WL_1     White list Simplified Chinese
>
> Does anyone know how to create a rule to detect these codes in a mime
> header?

There was talk on the dev list a while back of being able to test the items
in MIME headers.  I'm not clear on whether anything ever came of that.

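In any case, you can use a 'full' rule to look for the headers and find them.
A 'full' rule is matched against the entire raw message: all headers plus the
complete body, including every MIME part's own headers.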
Perhaps something like (untested):

full CHINESE_xxx /^Content-Type:[^\n]*(?:\n[ \t][^\n]*)*charset="?gb2312\b/im

        Loren
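Later SpamAssassin releases include a Mail::SpamAssassin::Plugin::MIMEHeader plugin. It provides a 'mimeheader' rule type that tests headers inside the message's MIME parts, which is what the original question asks for. This is a minimal, untested sketch. It assumes a SpamAssassin version that ships that plugin and has it loaded (normally with a loadplugin line in a .pre file). The rule name CHINESE_MIME_GB2312 is hypothetical:

		ifplugin Mail::SpamAssassin::Plugin::MIMEHeader
		# hypothetical rule name; untested; needs the MIMEHeader plugin loaded
		mimeheader CHINESE_MIME_GB2312  Content-Type =~ /gb2312/i
		describe   CHINESE_MIME_GB2312  MIME part declares simplified Chinese charset
		endif

Wrapping the rule in ifplugin/endif means that installations without the plugin loaded simply skip it. They do not report an unknown rule type.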