Posted to users@spamassassin.apache.org by re...@newsguy.com on 2007/11/26 19:21:07 UTC

Where can I read about SA's FARAWAY tags

Where can I read about how SA arrives at the FARAWAY tags?

It's not always from charset headers or two-letter country indicators
in header addresses.  At least, I think I'm seeing some mail that
contains neither, yet SA has still correctly identified it.


Re: Where can I read about SA's FARAWAY tags

Posted by Matt Kettler <mk...@verizon.net>.
reader@newsguy.com wrote:
> Where can I read about how SA arrives at the FARAWAY tags?
>
> It's not always from charset headers or two-letter country indicators
> in header addresses.  At least, I think I'm seeing some mail that
> contains neither, yet SA has still correctly identified it.
>   
Well, it's got to be there somewhere. Be aware that messages are often
encoded in ways that hide this kind of thing from you (e.g., base64, or
your standard HTML-esque character encoding). SpamAssassin decodes all of
this kind of stuff as a matter of course when scanning messages.
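
To illustrate how an encoding can hide the text from a casual look at the
raw source, here is a minimal Python sketch. It is not SpamAssassin's code
(SpamAssassin is written in Perl); the sample strings are made up for
illustration, and the variable names are hypothetical.

```python
import base64
import html

# Two Chinese characters, written as escapes so this file stays ASCII.
sample = "\u4e2d\u6587"

# 1) base64: the raw body shows only ASCII letters, digits, and "=".
#    The sample is first encoded as GB2312 bytes, then base64-encoded.
raw_b64 = base64.b64encode(sample.encode("gb2312")).decode("ascii")
decoded_b64 = base64.b64decode(raw_b64).decode("gb2312")

# 2) HTML numeric character references: the raw body shows "&#NNNNN;".
raw_entities = "&#20013;&#25991;"
decoded_entities = html.unescape(raw_entities)

# Both raw forms look harmless until they are decoded.
print(raw_b64.isascii(), decoded_b64 == sample)
print(raw_entities.isascii(), decoded_entities == sample)
```

A scanner that decodes first sees the same text in both cases, even though
neither raw form contains a visible non-ASCII character.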

As for the rules:

CHARSET_FARAWAY
Underlying eval function: check_for_faraway_charset() in MIMEEval.pm
Detects based on: character set in the MIME Content-Type: of the message
header

MIME_CHARSET_FARAWAY
Underlying eval function: check_for_mime('mime_faraway_charset') in
MIMEEval.pm
Detects based on: character set in the MIME Content-Type: of the message
attachments
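
To see where these two kinds of charset declaration live, the following
Python sketch lists the charset declared by the top-level message and by
each MIME part. It uses Python's standard email package rather than
SpamAssassin's Perl modules, and the sample message, boundary, and
addresses are invented for illustration.

```python
from email import message_from_string

# Hypothetical multipart message: one body part and one attachment,
# each declaring its own charset in its Content-Type: header.
raw = (
    "From: sender@example.com\n"
    "Subject: test\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    'Content-Type: text/plain; charset="GB2312"\n'
    "\n"
    "body text\n"
    "--XYZ\n"
    'Content-Type: text/plain; charset="KOI8-R"\n'
    "Content-Disposition: attachment\n"
    "\n"
    "attachment text\n"
    "--XYZ--\n"
)

msg = message_from_string(raw)

# walk() yields the top-level message first, then each part in order.
# get_content_charset() returns the lowercased charset parameter, or
# None when the Content-Type: header declares no charset.
charsets = [part.get_content_charset() for part in msg.walk()]
print(charsets)
```

Here the multipart container itself declares no charset, so only the two
parts report one.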

HTML_CHARSET_FARAWAY
Underlying eval function: html_charset_faraway() in HTMLEval.pm
Detects based on: character set in the Content-Type: of a meta
http-equiv tag embedded in HTML.
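
As a rough illustration of what "a meta http-equiv tag embedded in HTML"
looks like to a program, this Python sketch pulls the charset out of such
a tag. It is only a sketch using Python's standard html.parser; the class
name MetaCharsetFinder, the regular expression, and the sample HTML are
assumptions for illustration, not SpamAssassin's implementation.

```python
import re
from html.parser import HTMLParser

# Matches "charset=VALUE" inside a content attribute (illustrative only).
CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)


class MetaCharsetFinder(HTMLParser):
    """Collects charsets declared in <meta http-equiv="Content-Type">."""

    def __init__(self):
        super().__init__()
        self.charsets = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("http-equiv") or "").lower() != "content-type":
            return
        match = CHARSET_RE.search(attr.get("content") or "")
        if match:
            self.charsets.append(match.group(1).lower())


html_doc = (
    "<html><head>"
    '<meta http-equiv="Content-Type" content="text/html; charset=GB2312">'
    "</head><body>hello</body></html>"
)

finder = MetaCharsetFinder()
finder.feed(html_doc)
print(finder.charsets)
```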

CHARSET_FARAWAY_HEADER
Underlying eval function: check_for_faraway_charset_in_headers()
Detects based on: Embedded character encoding marks in the Subject: and
From: headers. You'd have to look at the raw message source to see it,
but it's generally things like this somewhere in the header:

=?GB2312?

This indicates that encoded Simplified Chinese text follows.
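
If you want to check a raw header for these marks yourself, a short
Python sketch along these lines should do it. It relies on the standard
email.header.decode_header function; the helper name header_charsets and
the sample Subject value are made up for illustration, and this is not
how SpamAssassin itself does the check.

```python
from email.header import decode_header


def header_charsets(value):
    """Return the charsets named in =?charset?encoding?text?= marks."""
    # decode_header() returns (text, charset) pairs; charset is None for
    # plain unencoded text and lowercased for encoded words.
    return [charset for _text, charset in decode_header(value) if charset]


# Hypothetical raw Subject: value containing one base64 encoded word.
raw_subject = "=?GB2312?B?1tDOxA==?="

found = header_charsets(raw_subject)
print(found)

# The encoded bytes only become readable once decoded with that charset.
payload, charset = decode_header(raw_subject)[0]
print(payload.decode(charset))
```

A header with no encoded words, such as a plain ASCII Subject:, yields an
empty list.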



