Posted to users@spamassassin.apache.org by Thomas Arend <ml...@arend-whv.info> on 2005/02/24 11:42:03 UTC

Character Sets in Subject and To/From

Hello,

I got lots of messages with subjects of the form:

Subject: =?utf-8?q?Wholesale Rolex Watc?=
 =?utf-8?q?hes?=

Mail addresses also use this type of obfuscation.

My question: how are these character set changes handled by SpamAssassin rules
and Bayesian filtering?


Best regards

Thomas Arend
-- 
icq:133073900
http://www.t-arend.de

Re: Character Sets in Subject and To/From

Posted by Matt Kettler <mk...@evi-inc.com>.
At 02:33 PM 2/24/2005, Thomas Arend wrote:
>If I understand you correctly, my Rolex rule is defeated by this trick,
>because
>
>header LOCAL_ENCSUBJECT     Subject: =~ /rolex/i
>
>will not fire on these subjects.

You misunderstood me completely.

That rule should fire on those subject lines just fine.

SA will automatically decode the character sets and then feed the decoded 
text to your rule. You don't need to take any extra action to try to detect 
encoded text. SA handles this for you by default.

SA always decodes headers unless you change the rule to use "Subject:raw" instead of "Subject:".
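The following Python sketch is for illustration only. It shows the difference between matching on the decoded header and matching on the raw one. It assumes that Python's standard email.header.decode_header/make_header decode this sample the way SA's own decoder would. SA is written in Perl and uses its own code, so this is an assumption. The sketch also uses Python's re module in place of Perl regexes. RAW_SUBJECT, decoded_view and matches are hypothetical names.

```python
import re
from email.header import decode_header, make_header

# Folded Subject value from the thread: the continuation line starts with
# whitespace, and "Watches" is split across two RFC 2047 encoded-words.
RAW_SUBJECT = "=?utf-8?q?Wholesale Rolex Watc?=\n =?utf-8?q?hes?="


def decoded_view(raw: str) -> str:
    """Hypothetical stand-in for SA's default (decoded) header view:
    encoded-words are decoded and adjacent ones are joined."""
    return str(make_header(decode_header(raw)))


def matches(pattern: str, text: str) -> bool:
    """Rough Python equivalent of a Perl `=~ /pattern/i` test."""
    return re.search(pattern, text, re.IGNORECASE) is not None


if __name__ == "__main__":
    subject = decoded_view(RAW_SUBJECT)
    print(subject)                                 # Wholesale Rolex Watches
    print(matches(r"rolex", subject))              # True, like Subject =~ /rolex/i
    print(matches(r"rolex watches", subject))      # True on the decoded text
    print(matches(r"rolex watches", RAW_SUBJECT))  # False on the raw text
```

A plain /rolex/i happens to match the raw text as well, because Q-encoding leaves ASCII letters readable. A pattern that spans the split, such as /rolex watches/i, only matches the decoded view.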





Re: Character Sets in Subject and To/From

Posted by Thomas Arend <ml...@arend-whv.info>.
On Thursday, 24 February 2005 19:12, Matt Kettler wrote:
> At 05:42 AM 2/24/2005, Thomas Arend wrote:
> >I got lots of messages with subjects of the form:
> >
> >Subject: =?utf-8?q?Wholesale Rolex Watc?=
> >  =?utf-8?q?hes?=
> >
> >Mail addresses also use this type of obfuscation.
> >
> >My question: how are these character set changes handled by SpamAssassin
> >rules and Bayesian filtering?
>
> Normal rules and Bayes see them after they've been decoded, so as far as
> 90% of SA is concerned, the character set changes aren't there.
>
> Rules that specifically want to detect this stuff can do so by using the
> :raw modifier, i.e.:
>
> header LOCAL_ENCSUBJECT     Subject:raw =~ /\=\?.*\?\=/i
>
> Matches subject lines like:
>
> Subject: =?iso-8859-8?Q?=F2=EC_=E7=EB=EE=FA_=E4=E9=EC=E3?=

If I understand you correctly, my Rolex rule is defeated by this trick,
because

header LOCAL_ENCSUBJECT     Subject: =~ /rolex/i

will not fire on these subjects.

Thomas

Re: Character Sets in Subject and To/From

Posted by Matt Kettler <mk...@evi-inc.com>.
At 05:42 AM 2/24/2005, Thomas Arend wrote:

>I got lots of messages with subjects of the form:
>
>Subject: =?utf-8?q?Wholesale Rolex Watc?=
>  =?utf-8?q?hes?=
>
>Mail addresses also use this type of obfuscation.
>
>My question: how are these character set changes handled by SpamAssassin
>rules and Bayesian filtering?

Normal rules and Bayes see them after they've been decoded, so as far as
90% of SA is concerned, the character set changes aren't there.
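The toy Python sketch below suggests why it matters that Bayes sees decoded text. It uses a hypothetical, deliberately naive whitespace tokenizer (tokens), not SpamAssassin's real Bayes tokenizer, which is more elaborate. It also assumes that Python's email.header decoding is a fair stand-in for SA's decoding of this sample. On the raw header, the split word appears only as fragments. After decoding, a token such as "watches" is intact and can be learned like any other word.

```python
from email.header import decode_header, make_header

# Same folded Subject value as quoted in the thread.
RAW_SUBJECT = "=?utf-8?q?Wholesale Rolex Watc?=\n =?utf-8?q?hes?="


def tokens(text: str) -> list:
    """Deliberately naive tokenizer: lower-case, split on whitespace.
    This is NOT SpamAssassin's Bayes tokenizer; it only shows the idea."""
    return text.lower().split()


raw_tokens = tokens(RAW_SUBJECT)
decoded_tokens = tokens(str(make_header(decode_header(RAW_SUBJECT))))

print(raw_tokens)      # ['=?utf-8?q?wholesale', 'rolex', 'watc?=', '=?utf-8?q?hes?=']
print(decoded_tokens)  # ['wholesale', 'rolex', 'watches']
```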

Rules that specifically want to detect this stuff can do so by using the
:raw modifier, i.e.:

header LOCAL_ENCSUBJECT     Subject:raw =~ /\=\?.*\?\=/i

Matches subject lines like:

Subject: =?iso-8859-8?Q?=F2=EC_=E7=EB=EE=FA_=E4=E9=EC=E3?=
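The Python sketch below shows roughly what Matt's :raw rule tests. The pattern /\=\?.*\?\=/i is translated to Python's re. This assumes Perl and Python treat this simple pattern the same way; "\=" is just a literal "=" in Perl. ENCODED_WORD_HINT and looks_encoded are hypothetical names.

```python
import re

# Python version of the :raw rule's pattern /\=\?.*\?\=/i.
# "\=" in Perl is just a literal "=", so it is written unescaped here.
ENCODED_WORD_HINT = re.compile(r"=\?.*\?=", re.IGNORECASE)


def looks_encoded(raw_subject: str) -> bool:
    """True if the raw (undecoded) header contains something shaped
    like an RFC 2047 encoded-word, i.e. =?charset?encoding?text?=."""
    return ENCODED_WORD_HINT.search(raw_subject) is not None


if __name__ == "__main__":
    print(looks_encoded("=?iso-8859-8?Q?=F2=EC_=E7=EB=EE=FA_=E4=E9=EC=E3?="))  # True
    print(looks_encoded("=?utf-8?q?Wholesale Rolex Watc?="))  # True
    print(looks_encoded("Wholesale Rolex Watches"))           # False
```

The loose `.*` in the middle is arguably a feature here. RFC 2047 does not allow spaces inside encoded-text, so a strict encoded-word pattern would miss the spammer's non-compliant "=?utf-8?q?Wholesale Rolex Watc?=". The loose pattern still flags it.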







Re: Character Sets in Subject and To/From

Posted by Thomas Arend <ml...@arend-whv.info>.
On Friday, 25 February 2005 02:41, Robert Menschel wrote:
> Hello Thomas,
>
> Thursday, February 24, 2005, 2:42:03 AM, you wrote:
>
> TA> Hello,
>
> TA> I got lots of messages with subjects of the form:
> TA> Subject: =?utf-8?q?Wholesale Rolex Watc?=
> TA>  =?utf-8?q?hes?=
> TA> Mail addresses also use this type of obfuscation.
> TA> My question: how are these character set changes handled by SpamAssassin
> TA> rules and Bayesian filtering?
>
> See the UTF8 rules in
> http://www.rulesemporium.com/rules/70_sare_genlsubj_eng.cf
>
> Haven't found this process useful in addresses yet, but if you'll send
> me an example so I can verify what's going on, I'll run a few tests.
>
> Bob Menschel

I have sent you the example by private mail to avoid spoiling other people's
spam filters.

Thomas

Re: Character Sets in Subject and To/From

Posted by Robert Menschel <Ro...@Menschel.net>.
Hello Thomas,

Thursday, February 24, 2005, 2:42:03 AM, you wrote:

TA> Hello,

TA> I got lots of messages with subjects of the form:
TA> Subject: =?utf-8?q?Wholesale Rolex Watc?=
TA>  =?utf-8?q?hes?=
TA> Mail addresses also use this type of obfuscation.
TA> My question: how are these character set changes handled by SpamAssassin rules
TA> and Bayesian filtering?

See the UTF8 rules in
http://www.rulesemporium.com/rules/70_sare_genlsubj_eng.cf

Haven't found this process useful in addresses yet, but if you'll send
me an example so I can verify what's going on, I'll run a few tests.

Bob Menschel