Posted to users@spamassassin.apache.org by Kenneth Porter <sh...@sewingwitch.com> on 2013/08/28 08:06:55 UTC

Matching base64 subject

I'm trying to use this set of rules to spot Chinese or Russian characters 
in the subject line:

<http://www.timk.de/it-blog/howto-find-chinese-or-russian-spam-encoded-in-utf-8-with-spamassassin/>

To debug the rules, I've replaced the leading __ in sub-rules with T_.

The rules don't seem to match the base64-encoded UTF-8 sequences I'm seeing 
in subject lines.

For example:

X-Spam-Status: No, score=1.7 required=5.0 tests=BAYES_50,
	CHARSET_UTF8_B_SUBJ_LATIN,HTML_FONT_FACE_BAD,HTML_MESSAGE,
	T_CHARSET_SUBJECT_UTF8_B_ENCODED,T_CHARSET_SUBJECT_UTF8_ENCODED 
autolearn=no
	version=3.3.1

Subject: =?utf-8?B?54mp5paZ6K6h5YiS5Y2P6LCDL+iJvueUnw==?=

The first character decodes to code point U+7269 (UTF-8 bytes E7 89 A9), 
which if the rules are correct should be matched by 
__CHARSET__UTF8_SUBJ_CJK1.

I'm using this to decode the base64 between the question marks to inspect 
the result:

<http://www.opinionatedgeek.com/dotnet/tools/base64decode/>
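The same check can also be done locally. This is only a minimal sketch 
using Python's standard library, not anything SpamAssassin itself does; 
the variable names are my own. It decodes the encoded-word from the 
Subject header above and prints the code point and UTF-8 bytes of the 
first character:

```python
from email.header import decode_header

# Subject header copied from the message above
subject = "=?utf-8?B?54mp5paZ6K6h5YiS5Y2P6LCDL+iJvueUnw==?="

# For an RFC 2047 encoded-word, decode_header returns (bytes, charset) pairs
raw_bytes, charset = decode_header(subject)[0]
text = raw_bytes.decode(charset)

# Inspect the first character: its code point and its UTF-8 byte sequence
first = text[0]
utf8_hex = " ".join(f"{b:02X}" for b in first.encode("utf-8"))
print(f"U+{ord(first):04X} {utf8_hex}")  # prints: U+7269 E7 89 A9
```

The byte sequence is what a rule matching on UTF-8 bytes would have to 
see; the code point is what the web decoder shows.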