Posted to server-user@james.apache.org by Don Saxton <ds...@pacbell.net> on 2003/12/31 23:20:20 UTC

Interpreting Subject line

I just found something that surprised me while adding to my anti-spam
tricks.  The subject line below is interpreted as "Pay Pennies on the
Dollar for your Prescrip(tion!"

Can someone please enlighten me?  Where is this written?

Don

Subject: =?iso-8859-1?B?UGF5IFBlbm5pZXMgb24gdGhlIERvbGxhciBmb3IgeW91ciBQcmVzY3JpcCh0aW9uIQ==?=


---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org


Re: Interpreting Subject line

Posted by Jason Lea <ja...@kumachan.net.nz>.
euphobot wrote:

>Thanks Jason,
>
>The RFC 2047 was exactly what I was missing.  Especially nice that it is so
>short and sweet -- also easy to miss.    Your summary is even more succinct.
>
>It appears that this is a useful rfc for using non latin 1 languages in
>subjects and the 'personal' part of from/to addresses. But
>"=?iso-8859-1?B?...?=" or "=?iso-8859-1?Q?...?=" are both useless and
>obvious attempts to confuse.  So I think they both signal spam.

Sounds like a good guess to me :)
It is the default encoding so not many programs would set that encoding.

>Living in NZ you may be closer to the action in character coding and where
>more people are aware of this facility. Are there any useful exceptions to
>ruling these two as spam?

I'm probably not the best person to answer this question, as I don't
spend much time dealing with spam.  I use JAMES with
authentication and let Mozilla Thunderbird filter any other spam I get
(which isn't much).  You were just lucky that I had been diving into enabling
my application to send multi-lingual email messages over the
last week.  I think I encounter this character set stuff more because I
have a Japanese wife, so I have taken the small step of making my
application multi-lingual too.



-- 
Jason Lea





Re: Interpreting Subject line

Posted by euphobot <eu...@pacbell.net>.
Thanks Jason,

The RFC 2047 was exactly what I was missing.  Especially nice that it is so
short and sweet -- also easy to miss.    Your summary is even more succinct.

It appears that this is a useful rfc for using non latin 1 languages in
subjects and the 'personal' part of from/to addresses. But
"=?iso-8859-1?B?...?=" or "=?iso-8859-1?Q?...?=" are both useless and
obvious attempts to confuse.  So I think they both signal spam.

Living in NZ you may be closer to the action in character coding and where
more people are aware of this facility. Are there any useful exceptions to
ruling these two as spam?

Don



Re: Interpreting Subject line

Posted by euphobot <eu...@pacbell.net>.
Danny

> I'm a massive fan of Base64 because it lets you safely transfer any payload
> through every torture known to mail, and would hate for it to become
> blackballed just because, through laziness, it allows spammers to obfuscate
> content.

I can understand B encoding for binary attachments of every kind and for text
in every character set except iso-8859-1, which is the one char set that
ancient email passes completely. When it comes to B encoding the subject
line, do you really use =?iso-8859-1?B? and do you have an example that would
show a compelling reason?

Thanks,
Don




RE: Interpreting Subject line

Posted by Danny Angus <da...@apache.org>.
> By choosing the 'B' encoding it is causing every character to be encoded 
> using BASE64 encoding.  Normally if you send mostly ASCII characters you 
> would choose type Q where the 8-bit characters are converted into =XX 
> format (eg '=' becomes '=3D', SPACE becomes '=20').

That is essentially "Quoted-Printable"; it is more common because it is friendlier for those luddites ;-) who use ASCII terminals to read their mail.
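
To make the readability point concrete, here is a rough sketch that builds the same subject text both ways using only Python's standard library. The variable names are illustrative and not from this thread. Note that `quopri` is not a complete Q encoder (for instance, it does not escape "?"), so treat this as a comparison for this particular string rather than a general-purpose encoder.

```python
import base64
import quopri
from email.header import decode_header, make_header

text = "Pay Pennies on the Dollar for your Prescrip(tion!"
raw = text.encode("iso-8859-1")

# B encoding: the whole payload becomes Base64, unreadable by eye.
b_word = "=?iso-8859-1?B?" + base64.b64encode(raw).decode("ascii") + "?="

# Q encoding: header=True maps spaces to "_" and leaves printable ASCII
# alone, so the words stay recognisable on a plain terminal.
q_payload = quopri.encodestring(raw, header=True).decode("ascii")
q_word = "=?iso-8859-1?Q?" + q_payload + "?="

print(b_word)
print(q_word)

# Both forms should decode back to the same subject text.
print(str(make_header(decode_header(b_word))) == text)
print(str(make_header(decode_header(q_word))) == text)
```

The B form hides everything, while the Q form still shows most of the words to a human reader, which is presumably why it is preferred for mostly-ASCII text.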

From the POV of spam filtering, it would be most sensible to decode headers and content first; that way you'll match your patterns no matter what encoding is used.
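
A minimal sketch of that decode-first approach, assuming Python's standard `email.header` module. The function names and the pattern list are hypothetical, made up for illustration, and a real filter would also need to handle unknown or malformed charsets.

```python
import re
from email.header import decode_header, make_header


def decoded_subject(raw_subject):
    """Return the header value with any RFC 2047 encoded-words decoded."""
    return str(make_header(decode_header(raw_subject)))


def looks_like_spam(raw_subject, patterns):
    """Match the (hypothetical) patterns against the decoded subject."""
    subject = decoded_subject(raw_subject)
    return any(p.search(subject) for p in patterns)


# Hypothetical pattern list, for illustration only.
PATTERNS = [re.compile(r"prescrip\W*tion", re.IGNORECASE)]

raw = ("=?iso-8859-1?B?UGF5IFBlbm5pZXMgb24gdGhlIERvbGxhciBmb3IgeW91ciBQ"
       "cmVzY3JpcCh0aW9uIQ==?=")

print(decoded_subject(raw))
print(looks_like_spam(raw, PATTERNS))
```

Because matching happens after decoding, the same pattern catches the subject whether it arrives as plain text, B-encoded, or Q-encoded.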

I'm a massive fan of Base64 because it lets you safely transfer any payload through every torture known to mail, and would hate for it to become blackballed just because, through laziness, it allows spammers to obfuscate content.

d.




Re: Interpreting Subject line

Posted by Jason Lea <ja...@kumachan.net.nz>.
I have been playing around with email encoding over the last few days, 
trying to send UTF-8 encoded email.  This stuff is defined in RFC 2047...

encoded-word = "=?" charset "?" encoding "?" encoded-text "?="

charset=iso-8859-1 (normal Latin-1 encoding)
encoding=B (can be either B or Q encoding)
encoded-text=UGF5IFBlbm5pZXMgb24gdGhlIERvbGxhciBmb3IgeW91ciBQcmVzY3JpcCh0aW9uIQ==

By choosing the 'B' encoding it is causing every character to be encoded 
using BASE64 encoding.  Normally if you send mostly ASCII characters you 
would choose type Q, where only the 8-bit and special characters are 
converted into =XX format (eg '=' becomes '=3D', SPACE becomes '=20' or '_').

see here for more info http://www.faqs.org/rfcs/rfc2047
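
As a rough illustration of the grammar above, the following Python sketch splits an encoded-word into its charset, encoding and encoded-text parts and decodes it by hand. The regular expression and the function name are my own assumptions for the example, not something taken from the RFC or from JAMES; only standard-library calls are used.

```python
import base64
import quopri
import re

# One RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=")


def decode_encoded_word(word):
    """Decode a single encoded-word into a Python string."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        raise ValueError("not an RFC 2047 encoded-word: %r" % word)
    charset, encoding, text = match.groups()
    if encoding.upper() == "B":
        raw = base64.b64decode(text)
    else:
        # header=True makes quopri treat "_" as a space, as Q encoding does
        raw = quopri.decodestring(text.encode("ascii"), header=True)
    return raw.decode(charset)


subject = ("=?iso-8859-1?B?UGF5IFBlbm5pZXMgb24gdGhlIERvbGxhciBmb3IgeW91ciBQ"
           "cmVzY3JpcCh0aW9uIQ==?=")
print(decode_encoded_word(subject))
# prints: Pay Pennies on the Dollar for your Prescrip(tion!
```

In practice a mail library's own header decoder would normally be used instead; the hand-rolled version is only meant to show what each field of the encoded-word does.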

Vinny wrote:

>Sorry if this sounds stupid, but how do you interpret that crazy string of
>characters? Is that some kind of screwed up MD5 sum or something?
>
>>=?iso-8859-1?B?UGF5IFBlbm5pZXMgb24gdGhlIERvbGxhciBmb3IgeW91ciBQcmVzY3JpcCh0aW9uIQ==?=
>
>-Vinny


-- 
Jason Lea





Re: Interpreting Subject line

Posted by Vinny <co...@comcast.net>.
Sorry if this sounds stupid, but how do you interpret that crazy string of
characters? Is that some kind of screwed up MD5 sum or something?

> =?iso-8859-1?B?UGF5IFBlbm5pZXMgb24gdGhlIERvbGxhciBmb3IgeW91ciBQcmVzY3JpcCh0aW9uIQ==?=


-Vinny

