Posted to user@commons.apache.org by Vikram Goyal <te...@craftbits.com> on 2003/11/23 00:51:30 UTC

[Codec] RFC2045 Anomaly?

Hello,

I have been losing sleep over this for the past few days and I am now not
sure if there is something wrong with RFC2045 or my interpretation of it. I
am hoping someone can confirm that it is the latter.

I have come across this while writing a chapter on Codec and hence my
detailed examination of the said rfc for Base64 transformations. Here is a
link to this rfc: http://www.ietf.org/rfc/rfc2045.txt

Consider section 6, second paragraph:

"It is necessary, therefore, to define a standard mechanism for encoding
such data into a 7bit short line format".

This line establishes that the document defines a mechanism for encoding
data into 7bit format. Good. It then goes on to describe two such encoding
mechanisms, the quoted-printable and Base64. Thus we can agree that
Quoted-Printable and/or Base64 encoded data is in 7bit format.

Now consider section 2.7 (definition of 7bit data), second sentence:

"No octets with decimal values greater than 127 are allowed and neither are
NULs (octets with decimal value 0).  "

By this definition, 7bit data must not include NUL data, that is an octet
with decimal value 0.

Now, if we look at the Base 64 vocabulary, we can see that a value of 52 is
encoded as 0, which is in opposition to the sentence above.

The easy explanation for this, which does seem to make more sense now that I
think about it, is that the Base64 vocabulary does not consist of octets;
instead, it maps sextets (6-bit values) to textual characters. However, the
nagging doubt remains that a 0 in an octet is the same as a 0 in a sextet.
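
One way to test this doubt is to encode some bytes and look at the octets
that actually come out. The sketch below uses Python's standard base64
module rather than Commons Codec, and the helper name is_7bit is made up for
illustration. It checks only the two value constraints quoted from section
2.7, not the line-length limit.

```python
import base64
import string

# The Base64 alphabet in index order (A-Z, a-z, 0-9, +, /), as in RFC 2045.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def is_7bit(octets: bytes) -> bool:
    """Hypothetical helper: True if no octet is NUL (0) or greater than 127."""
    return all(0 < b <= 127 for b in octets)

# Input deliberately contains NUL octets and octets above 127.
raw = bytes([0, 0, 0, 211, 255, 128])
encoded = base64.b64encode(raw)

print(ALPHABET[52])        # the character assigned to the 6-bit value 52
print(ord(ALPHABET[52]))   # the octet value transmitted for that character
print(encoded)             # the encoded form of the raw input
print(is_7bit(raw), is_7bit(encoded))
```

The raw input fails the check while its encoded form passes, because every
6-bit value, including 0 and 52, is sent as a printable character and never
as the bare number.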

Comments?

--
Vikram



---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org


Re: [Codec] RFC2045 Anomaly?

Posted by John Zoetebier <jo...@transparent.co.nz>.
On Sun, 23 Nov 2003 17:26:40 +1000, Vikram Goyal <te...@craftbits.com> wrote:

>> There is no problem once you understand that the meaning of "0" is
>> contextual.
>> The first "0" is binary zero.
>> The second "0" is digit zero, a character representing the digit "0".
>>
>
> Yes, but as per the rfc, Base64 encoding results in 7bit short line
> transformations. But if the encoded data contains the character '0' it
> violates this principle as 7bit data should not contain this character. A
> '0' character will be the same as a Nul value.

Character "0" is not the same as binary "0".
Just have a look at the ASCII table at http://www.asciitable.com/ref.html
Character "0" is mapped to a bit sequence with decimal value 48 (hex 30).
In 7-bit notation this is 0110000
The null character is in 7-bit notation 0000000
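
The two bit patterns can be checked directly. This is a minimal sketch using
only Python built-ins (ord and format specifiers); the variable names are
illustrative. Note that ASCII assigns the digit character "0" the decimal
value 48.

```python
zero_char = "0"     # the digit character that appears in Base64 output
nul_char = "\x00"   # the NUL control character that 7bit data forbids

for label, ch in (("digit '0'", zero_char), ("NUL", nul_char)):
    code = ord(ch)
    # Show the code point in decimal and as a zero-padded 7-bit pattern.
    print(f"{label}: decimal {code}, 7-bit {code:07b}")
```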

-- 
John Zoetebier
Web site: http://www.transparent.co.nz



Re: [Codec] RFC2045 Anomaly?

Posted by Vikram Goyal <te...@craftbits.com>.
> There is no problem once you understand that the meaning of "0" is
> contextual.
> The first "0" is binary zero.
> The second "0" is digit zero, a character representing the digit "0".
>

Yes, but as per the rfc, Base64 encoding results in 7bit short line
transformations. But if the encoded data contains the character '0' it
violates this principle as 7bit data should not contain this character. A
'0' character will be the same as a Nul value.

--
Vikram





Re: [Codec] RFC2045 Anomaly?

Posted by John Zoetebier <jo...@transparent.co.nz>.
On Sun, 23 Nov 2003 09:51:30 +1000, Vikram Goyal <te...@craftbits.com> wrote:

[...]
> Now consider section 2.7 (definition of 7bit data), second sentence:
>
> "No octets with decimal values greater than 127 are allowed and neither are
> NULs (octets with decimal value 0).  "
>
> By this definition, 7bit data must not include NUL data, that is an octet
> with decimal value 0.
>
> Now, if we look at the Base 64 vocabulary, we can see that a value of 52 is
> encoded as 0, which is in opposition to the sentence above.

There is no problem once you understand that the meaning of "0" is contextual.
The first "0" is binary zero.
The second "0" is digit zero, a character representing the digit "0".

The concept of Base64 is very simple: map a bit stream onto a character
stream.
1) The character set is equal to the regular expression [A-Za-z0-9+/]
	This is A to Z, a to z, 0 to 9, + or /
	Total number of characters: 26 + 26 + 10 + 2 = 64
2) Split the binary stream up into 6-bit chunks and map each chunk onto a
member of the character set from 1)
	Total number of possible chunk values: 2 ** 6 = 64

As this is a 1-to-1 mapping, we can reverse the process and decode to get
the original bit stream.
The other stuff about identity, 7 or 8 bits is about how the stream of
characters enters a mail server, or any other server for that matter.
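
The two steps above can be sketched in a few lines. This is an illustration
in Python, not the Commons Codec implementation; encode_sketch and
decode_sketch are made-up names. It also adds the "=" padding that RFC 2045
requires when the input length is not a multiple of 3 octets, which the
description above leaves out, and it omits the 76-character line breaks.

```python
import base64
import string

# Step 1: the 64-character set, in index order.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_sketch(data: bytes) -> str:
    # Write the input as one long bit string, 8 bits per octet.
    bits = "".join(f"{b:08b}" for b in data)
    # Pad with zero bits so the length is a multiple of 6.
    bits += "0" * (-len(bits) % 6)
    # Step 2: map every 6-bit chunk to its character.
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Pad the output with '=' to a multiple of 4 characters.
    return "".join(chars) + "=" * (-len(chars) % 4)

def decode_sketch(text: str) -> bytes:
    # Reverse the mapping: each character becomes its 6-bit index.
    bits = "".join(f"{ALPHABET.index(c):06b}" for c in text.rstrip("="))
    # Drop leftover padding bits that do not form a whole octet.
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))

sample = b"Hi"
print(encode_sketch(sample))                  # hand-rolled encoding
print(base64.b64encode(sample).decode())      # standard library, for comparison
print(decode_sketch(encode_sketch(sample)))   # round trip back to the input
```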

That's it :)

-- 
John Zoetebier
Web site: http://www.transparent.co.nz
