Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2009/07/03 21:35:32 UTC

[Bug 6146] New: FPs with Oriental text: TVD_SPACE_RATIO etc.

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6146

           Summary: FPs with Oriental text: TVD_SPACE_RATIO etc.
           Product: Spamassassin
           Version: 3.2.3
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: minor
          Priority: P5
         Component: Rules (Eval Tests)
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: cedric@gn.apc.org


The following rules triggered on ham in the gb2312 character set:
HTML_FONT_FACE_BAD, MIME_BASE64_TEXT, TVD_SPACE_RATIO

I don't read Chinese myself, but some text/plain parts in such a character set
have a legitimate reason to be base64-encoded. Also, it seems that Chinese text
rarely uses spaces, and that alone is enough to trigger TVD_SPACE_RATIO.
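To illustrate why text with almost no spaces can trip a rule like this on its own, here is a minimal Python sketch of a space-ratio heuristic. It is not SpamAssassin's actual TVD_SPACE_RATIO implementation. The function names space_ratio and looks_spacefree and the 0.05 threshold are hypothetical, chosen only for illustration.

```python
# Hypothetical sketch: NOT SpamAssassin's TVD_SPACE_RATIO code.

# Hypothetical threshold, chosen only for illustration.
LOW_SPACE_THRESHOLD = 0.05


def space_ratio(text: str) -> float:
    """Fraction of non-newline characters that are ASCII spaces."""
    chars = [c for c in text if c not in "\r\n"]
    if not chars:
        return 0.0
    return sum(1 for c in chars if c == " ") / len(chars)


def looks_spacefree(text: str) -> bool:
    """Hypothetical rule: flag text whose space ratio is below the threshold."""
    return space_ratio(text) < LOW_SPACE_THRESHOLD


english = "Please find the meeting agenda attached for next week."
# Roughly the same sentence in Chinese; note there are no spaces at all.
chinese = "请查收附件中的下周会议议程。"

print(round(space_ratio(english), 3), looks_spacefree(english))
print(space_ratio(chinese), looks_spacefree(chinese))
```

Ordinary English prose stays well above such a threshold, while perfectly normal Chinese ham scores zero. This is why a pure space-ratio test cannot tell the two apart without knowing the language or charset.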

I also find that such email hits the Bayes rules, because all the Chinese email
used to train the filter has been spam. It may be worth checking whether enough
Chinese ham is represented in the corpus, and also excluding big5, gb2312, etc.
parts from TVD_SPACE_RATIO and MIME_BASE64_TEXT.
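The proposed exclusion could look roughly like the following Python sketch, which uses only the standard email package. It is not SpamAssassin code. SPACE_FREE_CHARSETS and rules_to_skip are hypothetical names, and adding gbk and gb18030 to the two charsets named above is an assumption. The example message also shows why base64 is reasonable here: GB2312 text consists of 8-bit bytes, so a mailer may legitimately choose base64 as the transfer encoding.

```python
from email.message import EmailMessage

# Hypothetical charset list. big5 and gb2312 come from the report;
# gbk and gb18030 are an assumption.
SPACE_FREE_CHARSETS = {"gb2312", "gbk", "gb18030", "big5"}


def rules_to_skip(part) -> set:
    """Hypothetical policy: rule names a checker could skip for this part."""
    charset = part.get_content_charset()  # lower-cased string, or None
    if part.get_content_maintype() == "text" and charset in SPACE_FREE_CHARSETS:
        return {"TVD_SPACE_RATIO", "MIME_BASE64_TEXT"}
    return set()


text = "请查收附件中的下周会议议程。"

msg = EmailMessage()
msg["Subject"] = "agenda"
# Encode the body as GB2312 and explicitly choose base64 as the transfer
# encoding, as a Chinese mail client might for this 8-bit text.
msg.set_content(text, charset="gb2312", cte="base64")

for part in msg.walk():
    print(part.get_content_type(), part.get_content_charset(),
          part["Content-Transfer-Encoding"], sorted(rules_to_skip(part)))
```

A charset list is only a coarse signal. Chinese ham sent as UTF-8, or with a mislabeled charset, would still need some other way to be recognized.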
