Posted to users@spamassassin.apache.org by wolfgang <me...@gmx.net> on 2005/04/30 13:27:39 UTC
character set / encoding problem?
Again and again, we receive messages that contain stuff like
<a href=3d"http://advinc-ma=2enetfirms=2ecom/">
instead of
<a href="http://advinc-ma.netfirms.com/">
That prevents uri / body rules like e.g.
/netfirms\.com/
and URIBL rules from being triggered. I wonder if there is some "function" to
automatically "de-code" such items instead of having to use stuff like
/netfirms(?:\.|=2e)com/ and how I could use it with SA.
regards,
wolfgang
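A minimal sketch of the workaround described above, written in Python rather than as an SA rule. The helper name qp_tolerant_pattern is hypothetical; it only generalizes the hand-written /netfirms(?:\.|=2e)com/ pattern so that every "." in a literal may also appear as "=2e" or "=2E".

```python
import re


def qp_tolerant_pattern(literal):
    """Build a regex for `literal` where each "." may also appear as =2e or =2E.

    Hypothetical helper for illustration; not part of SpamAssassin.
    """
    parts = [r"(?:\.|=2[eE])" if ch == "." else re.escape(ch) for ch in literal]
    return re.compile("".join(parts))


netfirms = qp_tolerant_pattern("netfirms.com")

# Both the encoded and the plain form from the post are matched.
encoded_hit = netfirms.search('<a href=3d"http://advinc-ma=2enetfirms=2ecom/">')
plain_hit = netfirms.search('<a href="http://advinc-ma.netfirms.com/">')
```

This keeps the pattern readable, but it still has to be applied rule by rule, which is the maintenance burden the question is about.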
Re: character set / encoding problem?
Posted by Fred <sp...@freddyt.com>.
wolfgang wrote:
> In an older episode (Saturday 30 April 2005 14:45), Theo Van Dinter
> wrote:
>> "=3d" is quoted-printable encoding for "=", "=2e" for ".", etc...
>> SA handles "proper" encoding (it handles a lot of non-proper encoding
>> as well), but doesn't make guesses if the MIME part says there is no
>> encoding.
I remember a discussion a while back about this: =2e is invalid while =2E is
valid.
But then I searched and found this:
Rule #1: (General 8-bit representation) Any octet, except those
indicating a line break according to the newline convention of the
canonical (standard) form of the data being encoded, may be
represented by an "=" followed by a two digit hexadecimal
representation of the octet's value. The digits of the
hexadecimal alphabet, for this purpose, are "0123456789ABCDEF".
Uppercase letters must be used when sending hexadecimal data,
though a robust implementation may choose to recognize lowercase
letters on receipt. Thus, for example, the value 12 (ASCII form
feed) can be represented by "=0C", and the value 61 (ASCII EQUAL
SIGN) can be represented by "=3D". Except when the following
rules allow an alternative encoding, this rule is mandatory.
It's this line: "Uppercase letters must be used when sending hexadecimal
data, though a robust implementation may choose to recognize lowercase
letters on receipt."
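To make the strict versus "robust" distinction in Rule #1 concrete, here is a minimal Python sketch. The function decode_qp and its lenient flag are assumptions made for illustration; this is not how SpamAssassin decodes messages, only a demonstration of what accepting lowercase hex digits on receipt changes.

```python
import re

# "=" followed by two hex digits. The strict form accepts only 0-9A-F,
# as Rule #1 requires of senders; the lenient form also accepts a-f.
STRICT_HEX = re.compile(rb"=([0-9A-F]{2})")
LENIENT_HEX = re.compile(rb"=([0-9A-Fa-f]{2})")
# Soft line break: "=" at the end of a line, optionally followed by blanks.
SOFT_BREAK = re.compile(rb"=[ \t]*\r?\n")


def decode_qp(data, lenient=True):
    """Decode quoted-printable escapes in `data` (hypothetical helper)."""
    data = SOFT_BREAK.sub(b"", data)  # drop soft line breaks first
    pattern = LENIENT_HEX if lenient else STRICT_HEX
    # Replace each escape with the single octet it names.
    return pattern.sub(lambda m: bytes([int(m.group(1), 16)]), data)


sample = b'<a href=3d"http://advinc-ma=2enetfirms=2ecom/">'
strict = decode_qp(sample, lenient=False)  # lowercase escapes left untouched
lenient = decode_qp(sample, lenient=True)  # lowercase escapes decoded
```

With the strict pattern the sample comes back unchanged, so a rule like /netfirms\.com/ has nothing to match; with the lenient pattern the plain URL is recovered.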
Re: character set / encoding problem?
Posted by wolfgang <me...@gmx.net>.
In an older episode (Saturday 30 April 2005 14:45), Theo Van Dinter wrote:
> "=3d" is quoted-printable encoding for "=", "=2e" for ".", etc...
> SA handles "proper" encoding (it handles a lot of non-proper encoding
> as well), but doesn't make guesses if the MIME part says there is no
> encoding.
>
> Without samples of the message, it's hard to comment on why something does
> or does not work.
the message headers say
Content-Type: text/html
Content-Transfer-Encoding: quoted-printable
I enclose the message for reference, local user data obfuscated with "xxx".
regards,
wolfgang
Re: character set / encoding problem?
Posted by Theo Van Dinter <fe...@kluge.net>.
On Sat, Apr 30, 2005 at 02:41:57PM -0500, David B Funk wrote:
> We've already gone 'round this issue in past discussions on this list, the
> DEVs reply was, maybe 'fixed' in future releases.
Ok, fair enough. Then FYI: 3.1 handles the lowercase version. :)
--
Randomly Generated Tagline:
"I was up all night trying to round off infinity." - Bob Lazarus
Re: character set / encoding problem?
Posted by wolfgang <me...@gmx.net>.
In an older episode (Saturday 30 April 2005 21:41), David B Funk wrote:
> In the meantime, I've coded local rules that explicitly target this bogus
> encoding as a spam sign:
>
> body L_BOGUS_QP1 /\b=2e(?:com|biz|info|net|org|us)[:\/]\b/
> describe L_BOGUS_QP1 Bogus QuotedPrintable encoding
> score L_BOGUS_QP1 1.1
>
> meta L_BOGUS_QP2 (L_BOGUS_QP1 && HTML_MESSAGE)
> describe L_BOGUS_QP2 HTML message that uses Bogus QP
> score L_BOGUS_QP2 1.5
they don't work for me with the message I enclosed earlier.
why "\b=2e" by the way?
regards,
wolfgang
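On the "\b" question, a small Python sketch of how the word-boundary anchors in L_BOGUS_QP1 behave against the sample URL from this thread. It only exercises the regex on the raw string (for ASCII text, Python's re treats \b the same way Perl does). It does not reproduce how SpamAssassin renders body text before matching, and the second URL with a path is a made-up example.

```python
import re

# The body rule's regex from this thread, copied as-is.
bogus_qp = re.compile(r"\b=2e(?:com|biz|info|net|org|us)[:\/]\b")

# Raw sample from the original post: the "/" is followed by a quote character.
sample = '<a href=3d"http://advinc-ma=2enetfirms=2ecom/">'

# Hypothetical variant with a path, so a word character follows the "/".
with_path = '<a href=3d"http://advinc-ma=2enetfirms=2ecom/index.html">'

hit_sample = bogus_qp.search(sample)        # None: no boundary between "/" and '"'
hit_with_path = bogus_qp.search(with_path)  # matches "=2ecom/"
```

The leading \b requires a word character right before "=", and the trailing \b requires one right after the ":" or "/", so a URL that ends at the slash slips past the pattern.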
Re: character set / encoding problem?
Posted by David B Funk <db...@engineering.uiowa.edu>.
On Sat, 30 Apr 2005, Theo Van Dinter wrote:
> On Sat, Apr 30, 2005 at 01:27:39PM +0200, wolfgang wrote:
> > Again and again, we receive messages that contain stuff like
> > <a href=3d"http://advinc-ma=2enetfirms=2ecom/">
> > instead of
> > <a href="http://advinc-ma.netfirms.com/">
> >
> > I wonder if there is some "function" to
> > automatically "de-code" such items instead of having to use stuff like
> > /netfirms(?:\.|=2e)com/ and how i could use it with SA.
>
> "=3d" is quoted-printable encoding for "=", "=2e" for ".", etc...
> SA handles "proper" encoding (it handles a lot of non-proper encoding
> as well), but doesn't make guesses if the MIME part says there is no
> encoding.
No, '=3d' is BOGUS, it is not RFC-compliant quoted-printable encoding.
The MIME RFC states clearly that the hex characters MUST be CAPS
(e.g. '=3D' is valid QP, '=3d' is not). SA does not handle the bogus form,
although many mail clients do.
We've already gone 'round this issue in past discussions on this list; the
devs' reply was that it might be 'fixed' in future releases.
In the meantime, I've coded local rules that explicitly target this bogus
encoding as a spam sign:
body L_BOGUS_QP1 /\b=2e(?:com|biz|info|net|org|us)[:\/]\b/
describe L_BOGUS_QP1 Bogus QuotedPrintable encoding
score L_BOGUS_QP1 1.1
meta L_BOGUS_QP2 (L_BOGUS_QP1 && HTML_MESSAGE)
describe L_BOGUS_QP2 HTML message that uses Bogus QP
score L_BOGUS_QP2 1.5
--
Dave Funk University of Iowa
<dbfunk (at) engineering.uiowa.edu> College of Engineering
319/335-5751 FAX: 319/384-0549 1256 Seamans Center
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
Re: character set / encoding problem?
Posted by Theo Van Dinter <fe...@kluge.net>.
On Sat, Apr 30, 2005 at 01:27:39PM +0200, wolfgang wrote:
> Again and again, we receive messages that contain stuff like
> <a href=3d"http://advinc-ma=2enetfirms=2ecom/">
> instead of
> <a href="http://advinc-ma.netfirms.com/">
>
> I wonder if there is some "function" to
> automatically "de-code" such items instead of having to use stuff like
> /netfirms(?:\.|=2e)com/ and how i could use it with SA.
"=3d" is quoted-printable encoding for "=", "=2e" for ".", etc...
SA handles "proper" encoding (it handles a lot of non-proper encoding
as well), but doesn't make guesses if the MIME part says there is no
encoding.
Without samples of the message, it's hard to comment on why something does or
does not work.
--
Randomly Generated Tagline:
"M: Can anyone tell us the lesson that has been learned here?
S: Yes Master, not a single one of us could defeat you.
M: You gain wisdom child ... " - The Frantics
Re: character set / encoding problem?
Posted by wolfgang <me...@gmx.net>.
In an older episode (Sunday 01 May 2005 02:07), Loren Wilton wrote:
> > Again and again, we receive messages that contain stuff like
> > <a href=3d"http://advinc-ma=2enetfirms=2ecom/">
> > instead of
> > <a href="http://advinc-ma.netfirms.com/">
> >
> > That prevents uri / body rules like e.g.
>
> no/yes
>
> > /netfirms\.com/
> > and URIBL rules from being triggered. I wonder if there is some "function"
> > to automatically "de-code" such items instead of having to use stuff like
> > /netfirms(?:\.|=2e)com/ and how i could use it with SA.
>
> URI rules should be hitting already; certainly on 3.0.
indeed, URI rules hit, thanks for the hint.
> Body rules on 3.0
> may be failing. But then, I'm not sure that 3.0 will have the uri in the
> body text. Rawbody and full rules will certainly fail. After all, that is
> the whole reason the spammers do that extra extraneous encoding.
>
> But then, it is nice that they put that extra encoding in the uris. Makes
> it easy to add points for useless uri encoding. :-)
>
> Loren