Posted to users@spamassassin.apache.org by Vincent Li <vi...@gmail.com> on 2006/01/09 22:58:44 UTC

How does SpamAssassin recognize Chinese characters?

Hi list:

I have been using SpamAssassin for quite a while, and have used SARE rules
and other custom rules. I am interested in writing my own Chinese spam
rules to block Chinese spam email.

I found Chinese rules at
http://www.ccert.edu.cn/spam/sa/Chinese_rules.cf
and http://www.geewhiz.ca/images/b/b2/99_geewhizg_zh.cf

My question is: how does SpamAssassin internally recognize Chinese
characters in Perl regexes, given that all the rules are written in
Chinese characters (encoded in the GB2312 character set)? Why are the
rules not written in a Unicode encoding such as UTF-8?

I have read a lot about Unicode, UTF-8, perlunicode, and perlre, but
still could not find relevant information.

Thanks in advance!

Vincent
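A minimal Python sketch of the situation in the question above, assuming the rule pattern and the message body are both compared as undecoded GB2312 bytes. This is an illustration, not SpamAssassin's actual code, and the phrase 发票 ("invoice") and the sample sentence are hypothetical. Under that assumption the pattern is just a byte sequence, so it matches GB2312 mail without any Unicode handling, but it does not match the same text sent as UTF-8:

```python
import re

# Hypothetical rule phrase: 发票 ("invoice"), stored as GB2312 bytes like a GB2312 rule file
phrase = "发票"
rule = re.compile(re.escape(phrase.encode("gb2312")))  # a bytes pattern

text = "本公司代开发票"  # hypothetical sample spam sentence
body_gb2312 = text.encode("gb2312")  # message sent as GB2312
body_utf8 = text.encode("utf-8")     # the same text sent as UTF-8

# Byte-level matching only succeeds when the body uses the same encoding as the rule
print(bool(rule.search(body_gb2312)))  # True
print(bool(rule.search(body_utf8)))    # False
```

Because multi-byte characters are compared byte by byte here, a short pattern can in principle also match across the boundary between two adjacent characters; decoding the text to characters first avoids that.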

Re: How does SpamAssassin recognize Chinese characters?

Posted by Vincent Li <vi...@gmail.com>.
On 9-Jan-06, at 2:16 PM, Matt Kettler wrote:

> Jon Armitage wrote:
>> Vincent Li wrote:
>>
>>> I have been using SpamAssassin for quite a while, and have used SARE
>>> rules and other custom rules. I am interested in writing my own
>>> Chinese spam rules to block Chinese spam email.
>>>
>> I cheat and use an Exim ACL statement to reject messages composed in
>> unwanted character sets. However, I don't know which other MTAs would
>> be able to do this, or even whether this blanket approach would suit you.
>
> I really think that Vincent is looking to write rules that detect
> Chinese spam without affecting his normal Chinese nonspam mail.
>
> Given that his email is about looking for specific Chinese text, and
> that it references sites that have such rules, I suspect Vincent
> speaks Chinese and receives nonspam in that language.

Yes, I am Chinese and I speak Chinese :-)

Vincent


Re: How does SpamAssassin recognize Chinese characters?

Posted by Matt Kettler <mk...@evi-inc.com>.
Jon Armitage wrote:
> Vincent Li wrote:
> 
>> I have been using SpamAssassin for quite a while, and have used SARE rules
>> and other custom rules. I am interested in writing my own Chinese spam
>> rules to block Chinese spam email.
>>
> I cheat and use an Exim ACL statement to reject messages composed in
> unwanted character sets. However, I don't know which other MTAs would be
> able to do this, or even whether this blanket approach would suit you.

I really think that Vincent is looking to write rules that detect Chinese spam
without affecting his normal Chinese nonspam mail.

Given that his email is about looking for specific Chinese text, and that it
references sites that have such rules, I suspect Vincent speaks Chinese and
receives nonspam in that language.


Re: How does SpamAssassin recognize Chinese characters?

Posted by Jeff Peng <da...@hotmail.com>.
If you are using Perl 5.8.0 or higher, it processes Unicode characters
well, so you should not worry about how Perl interprets the Chinese
characters. Just write the rules the same way you would for English.
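For comparison, a minimal Python sketch of the approach described above, assuming the message is first decoded to Unicode characters using its declared charset before the rule is applied (whether a given SpamAssassin version does this is not stated in this thread). The phrase 发票 ("invoice") and the sample sentence are hypothetical. Once the text is decoded, the same character pattern matches whether the mail was sent as GB2312 or UTF-8:

```python
import re

# Rule written as Unicode characters rather than raw bytes (hypothetical phrase: 发票, "invoice")
rule = re.compile("发票")

# The same hypothetical sample sentence, as it might arrive in two different charsets
raw_bodies = {
    "gb2312": "本公司代开发票".encode("gb2312"),
    "utf-8": "本公司代开发票".encode("utf-8"),
}

for charset, raw in raw_bodies.items():
    text = raw.decode(charset)  # decode with the charset the message declares
    print(charset, bool(rule.search(text)))
# gb2312 True
# utf-8 True
```

The design trade-off is that decoding requires trusting the declared charset; a message with a wrong or missing charset may fail to decode or decode into the wrong characters.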


>From: Vincent Li <vi...@gmail.com>
>To: users@spamassassin.apache.org
>Subject: Re: How does SpamAssassin recognize Chinese characters?
>Date: Mon, 9 Jan 2006 14:26:09 -0800
>
>
>On 9 Jan 2006, at 10:08 PM, Jon Armitage wrote:
>
>>Vincent Li wrote:
>>>I have been using SpamAssassin for quite a while, and have used SARE
>>>rules and other custom rules. I am interested in writing my own
>>>Chinese spam rules to block Chinese spam email.
>>I cheat and use an Exim ACL statement to reject messages composed
>>in unwanted character sets. However, I don't know which other MTAs
>>would be able to do this, or even whether this blanket approach would
>>suit you.
>>
>>Jon
>
>Hi Jon:
>
>I am in an academic environment; we do receive some legitimate Chinese
>email, and the Chinese rules I downloaded work well. I am just
>curious: how do SpamAssassin and Perl interpret rules written in
>Chinese?
>
>Vincent



Re: How does SpamAssassin recognize Chinese characters?

Posted by Vincent Li <vi...@gmail.com>.
On 9 Jan 2006, at 10:08 PM, Jon Armitage wrote:

> Vincent Li wrote:
>> I have been using SpamAssassin for quite a while, and have used SARE
>> rules and other custom rules. I am interested in writing my own
>> Chinese spam rules to block Chinese spam email.
> I cheat and use an Exim ACL statement to reject messages composed
> in unwanted character sets. However, I don't know which other MTAs
> would be able to do this, or even whether this blanket approach would
> suit you.
>
> Jon

Hi Jon:

I am in an academic environment; we do receive some legitimate Chinese
email, and the Chinese rules I downloaded work well. I am just
curious: how do SpamAssassin and Perl interpret rules written in Chinese?

Vincent

Re: How does SpamAssassin recognize Chinese characters?

Posted by Jon Armitage <jo...@hepworthband.co.uk>.
Vincent Li wrote:
> I have been using SpamAssassin for quite a while, and have used SARE rules
> and other custom rules. I am interested in writing my own Chinese spam
> rules to block Chinese spam email.
>
I cheat and use an Exim ACL statement to reject messages composed in
unwanted character sets. However, I don't know which other MTAs would be
able to do this, or even whether this blanket approach would suit you.

Jon
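Jon's actual Exim ACL is not shown in the thread, so the following is not Exim syntax. It is a minimal Python sketch of the same blanket idea, assuming a hypothetical blocklist of charsets used for Chinese text, and it reads the charset declared in each MIME part's Content-Type header with the standard `email` package. As Matt notes earlier in the thread, a check like this would also reject legitimate Chinese mail:

```python
from email import message_from_bytes

# Hypothetical blocklist of charsets commonly used for Chinese text
UNWANTED_CHARSETS = {"gb2312", "gbk", "gb18030", "big5"}

def uses_unwanted_charset(raw: bytes) -> bool:
    """Return True if any MIME part declares a charset from the blocklist."""
    msg = message_from_bytes(raw)
    # get_charsets() returns one lowercased charset per part, or None if a part declares none
    return any(cs in UNWANTED_CHARSETS for cs in msg.get_charsets() if cs)

chinese = b"Subject: test\nContent-Type: text/plain; charset=GB2312\n\nhello\n"
english = b"Subject: test\nContent-Type: text/plain; charset=us-ascii\n\nhello\n"
print(uses_unwanted_charset(chinese))  # True
print(uses_unwanted_charset(english))  # False
```

The check only looks at declared charsets, so it is cheap and needs no decoding, but it says nothing about whether the content is actually spam.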