Posted to dev@commons.apache.org by "O'brien, Tim" <to...@transolutions.net> on 2003/02/04 18:41:27 UTC

[codec] RE: Base64.java

Here's Martin's post on rpc-dev re: cr/lf:
http://nagoya.apache.org/eyebrowse/ReadMsg?listName=rpc-dev@xml.apache.org&msgNo=713

Here are some observations.  I've copied individuals from both the xml-rpc
and the httpclient projects.  It all boils down to using Base64 encoding in
the context of two different RFCs, 2045 and 2616.  I believe that we can come
to an agreement here by adding some option flags to the method signatures.

*** XML-RPC facts:

1. I believe that XML-RPC is using Base64 in the context of RFC 2045 which
requires Base64 content to be encoded in 76 character "chunks" separated by
a newline character.  The trailing newline character is added to "terminate"
the final chunk.

2. XML-RPC is also adhering to the requirement to discard all whitespace
when decoding base64 data.

3. XML-RPC is not complying with the requirement to convert text to
canonical form - replacing "text line breaks" with "CRLF sequences".

*** HTTPClient facts:

1. HttpClient's usage of Base64 does not create chunks of 76 characters
separated by newlines - as this would interfere with HTTP headers.

2. HttpClient's Base64 doesn't discard whitespace because in the context of
usage, no whitespace is added to the encoded output - see #1



** Here is RFC 2045 Multipurpose Internet Mail Extensions:
http://www.ietf.org/rfc/rfc2045.txt

2045 requirement 1: RFC 2045 on converting text material to canonical form:
"Care must be taken to use the proper octets for line breaks if base64
encoding is applied directly to text material that has not been converted to
canonical form.  In particular, text line breaks must be converted into CRLF
sequences prior to base64 encoding.  The important thing to note is that
this may be done directly by the encoder rather than in a prior
canonicalization step in some implementations."

2045 requirement 2: RFC 2045 on "chunking" and on ignoring white space when
decoding: "The encoded output stream must be
represented in lines of no more than 76 characters each.  All line breaks or
other characters not found in Table 1 must be ignored by decoding software.
In base64 data, characters other than those in Table 1, line breaks, and
other white space probably indicate a transmission error, about which a
warning message or even a message rejection might be appropriate under some
circumstances."
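
To make the line-length requirement concrete, here is a minimal sketch that
uses the JDK's java.util.Base64 (Java 8 and later) as a stand-in; it is not
the Base64 class discussed in this thread, and the class and method names
are made up for illustration.  Note that the JDK MIME encoder separates
lines with CRLF and does not append a trailing line break after the final
chunk.

```java
import java.util.Base64;

class ChunkingDemo {
    // Unchunked form: a single line, usable as an HTTP header value.
    static String encodePlain(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // RFC 2045 form: lines of at most 76 characters, separated by CRLF.
    static String encodeMime(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] data = new byte[100]; // 100 bytes encode to 136 characters
        System.out.println(encodePlain(data).length()); // prints 136
        for (String line : encodeMime(data).split("\r\n")) {
            System.out.println(line.length()); // prints 76, then 60
        }
    }
}
```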


** Here is RFC 2616 HTTP 1.1 which talks about base64 of an MD5 digest in a
header: http://www.ietf.org/rfc/rfc2616.txt?number=2616

"Conversion of all line breaks to CRLF MUST NOT be done before computing or
checking the digest: the line break convention used in the text actually
transmitted MUST be left unaltered when computing the digest."

"Note: while the definition of Content-MD5 is exactly the same for HTTP as
in RFC 1864 for MIME entity-bodies, there are several ways in which the
application of Content-MD5 to HTTP entity-bodies differs from its
application to MIME entity-bodies. One is that HTTP, unlike MIME, does not
use Content-Transfer-Encoding, and does use Transfer-Encoding and
Content-Encoding. Another is that HTTP more frequently uses binary content
types than MIME, so it is worth noting that, in such cases, the byte order
used to compute the digest is the transmission byte order defined for the
type. Lastly, HTTP allows transmission of text types with any of several
line break conventions and not just the canonical form using CRLF."
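
As an illustration of the Content-MD5 case, the sketch below computes the
base64 of an MD5 digest over the bytes exactly as they would be sent, with
no line break conversion.  It relies only on the JDK (MessageDigest and
java.util.Base64); the class and method names are assumptions for this
example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

class ContentMd5Demo {
    // Base64 of the MD5 digest of the entity body, taken byte for byte.
    // The result is a single line, so it fits in a header value.
    static String contentMd5(byte[] entityBody) throws NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        return Base64.getEncoder().encodeToString(md5.digest(entityBody));
    }

    public static void main(String[] args) throws Exception {
        byte[] lf = "line one\nline two\n".getBytes(StandardCharsets.US_ASCII);
        byte[] crlf = "line one\r\nline two\r\n".getBytes(StandardCharsets.US_ASCII);
        // The two digests differ, which is why the line breaks must be
        // left exactly as transmitted.
        System.out.println("Content-MD5: " + contentMd5(lf));
        System.out.println("Content-MD5: " + contentMd5(crlf));
    }
}
```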


--------
Tim O'Brien 

> -----Original Message-----
> From: Jeffrey Dever [mailto:jsdever@sympatico.ca] 
> Sent: Tuesday, February 04, 2003 10:16 AM
> To: O'brien, Tim
> Cc: 'Martin Redington'; rhoegg@isisnetworks.net
> Subject: Re: Base64.java
> 
> 
> HTTP is very cr/lf aware. We use Base64 for encoding/decoding values 
> that are added to headers, which are always terminated with a cr/lf, so a 
> value must not contain the line delimiter.
> 
> Which RFC states the trailing cr/lf?
> 
> Jandalf.
> 



---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


RE: [codec] RE: Base64.java

Posted by "O'brien, Tim" <to...@transolutions.net>.
Alright, changes were made to org.apache.commons.codec.binary.Base64

#1 byte[] Base64.encode( byte[] )           - no chunking, for HttpClient
#2 byte[] Base64.encodeChunked( byte[] )    - chunking for RFC 2045 compliance
#3 byte[] Base64.encode( byte[], boolean )  - there was no reason to make this
                                              method private

We can make #3 private if someone feels strongly about it, but for now
we've got a solution for reuse across two projects.
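
For readers who want to see the shape of those three signatures, here is a
rough sketch.  It is not the committed code: the class name Base64Sketch is
made up, the actual encoding is delegated to the JDK's java.util.Base64,
and the choice of CRLF as the separator plus a terminator after the final
chunk is an assumption based on the XML-RPC behaviour described earlier in
the thread.

```java
import java.io.ByteArrayOutputStream;
import java.util.Base64;

class Base64Sketch {
    private static final int CHUNK_SIZE = 76; // RFC 2045 line limit

    // #1: no chunking, suitable for header values.
    static byte[] encode(byte[] data) {
        return encode(data, false);
    }

    // #2: chunked output for RFC 2045 style consumers.
    static byte[] encodeChunked(byte[] data) {
        return encode(data, true);
    }

    // #3: the shared implementation, selected by an explicit argument
    // rather than by shared state.
    static byte[] encode(byte[] data, boolean chunked) {
        byte[] flat = Base64.getEncoder().encode(data);
        if (!chunked) {
            return flat;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < flat.length; i += CHUNK_SIZE) {
            out.write(flat, i, Math.min(CHUNK_SIZE, flat.length - i));
            out.write('\r'); // assumed separator: CRLF after every chunk,
            out.write('\n'); // including the last one
        }
        return out.toByteArray();
    }
}
```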

Next step: We need a better way to supply test data and expected encodings
to these JUnit tests.  Does anyone have any experience with something like
JXUnit?


--------
Tim O'Brien 

> -----Original Message-----
> From: Jeffrey Dever [mailto:jsdever@sympatico.ca] 
> Sent: Tuesday, February 04, 2003 3:11 PM
> To: Jakarta Commons Developers List
> Subject: Re: [codec] RE: Base64.java
> 
> 
> Well said.  Totally agree.
> 





Re: [codec] RE: Base64.java

Posted by Jeffrey Dever <js...@sympatico.ca>.
Well said.  Totally agree.

Ryan Hoegg wrote:

> We do as well.  Codec should probably end up with the union, not the 
> intersection.  However, ours tests for the MIME specific 76 character 
> line wrapping.
>
> --
> Ryan Hoegg
> ISIS Networks
> http://www.isisnetworks.net
>
> Jeffrey Dever wrote:
>
>> That sounds good here too.  We only call Base64 methods a couple of 
>> times anyway, so adaptation is a minor issue.
>>
>> Just a note about tests, we have a Junit test class (not sure if 
>> xml-rpc has one) that should go along with the main Base64 class.  
>> You did a grep on the code base so I'm sure you are aware of it.
>>
>> Jandalf.
>>
>>> I like your plan Tim.  Let's get 1.1 nailed down so we are all on 
>>> the same codebase, and do the design of 2.0 right.  I don't think 
>>> XML-RPC cares much whether we get the default in decode(byte[]) in 
>>> 1.1, I think that decision should be made based on RFC interpretation.
>>>
>>> -- 
>>> Ryan Hoegg
>>> ISIS Networks
>>> http://www.isisnetworks.net 
>>
>>
>
>
>
>




Re: [codec] RE: Base64.java

Posted by Ryan Hoegg <rh...@isisnetworks.net>.
We do as well.  Codec should probably end up with the union, not the 
intersection.  However, ours tests for the MIME specific 76 character 
line wrapping.

--
Ryan Hoegg
ISIS Networks
http://www.isisnetworks.net

Jeffrey Dever wrote:

> That sounds good here too.  We only call Base64 methods a couple of 
> times anyway, so adaptation is a minor issue.
>
> Just a note about tests, we have a Junit test class (not sure if 
> xml-rpc has one) that should go along with the main Base64 class.  You 
> did a grep on the code base so I'm sure you are aware of it.
>
> Jandalf.
>
>> I like your plan Tim.  Let's get 1.1 nailed down so we are all on the 
>> same codebase, and do the design of 2.0 right.  I don't think XML-RPC 
>> cares much whether we get the default in decode(byte[]) in 1.1, I 
>> think that decision should be made based on RFC interpretation.
>>
>> -- 
>> Ryan Hoegg
>> ISIS Networks
>> http://www.isisnetworks.net 
>




Re: [codec] RE: Base64.java

Posted by Jeffrey Dever <js...@sympatico.ca>.
That sounds good here too.  We only call Base64 methods a couple of 
times anyway, so adaptation is a minor issue.

Just a note about tests, we have a Junit test class (not sure if xml-rpc 
has one) that should go along with the main Base64 class.  You did a 
grep on the code base so I'm sure you are aware of it.

Jandalf.

> I like your plan Tim.  Let's get 1.1 nailed down so we are all on the 
> same codebase, and do the design of 2.0 right.  I don't think XML-RPC 
> cares much whether we get the default in decode(byte[]) in 1.1, I 
> think that decision should be made based on RFC interpretation.
>
> --
> Ryan Hoegg
> ISIS Networks
> http://www.isisnetworks.net
>
>
>
>




Re: [codec] RE: Base64.java

Posted by Ryan Hoegg <rh...@isisnetworks.net>.
O'brien, Tim wrote:

>>1) One of the classes in the package sets the flags in a static block.
>>2) The flags are set before each call to the Base64 encode/decode methods.
>>
>>To use flags safely, you would have to make them instance members and
>>require instantiation of Base64 objects.  This does more harm than good.
>>
>
>No, it's static flags that are harmful.  If you take a look at
>http://cvs.apache.org/~tobrien/codec/evolve.html, you'll notice that we are
>moving towards an approach where one could instantiate a Base64 instance and
>configure certain properties of the encoding algorithm.  We'll keep the
>static methods for ease of use, but behind the scenes those two functions
>would be maintaining and configuring two separate "instances" of the class.
>We'll get there; my main concern right now is to achieve a level of reuse,
>and then we can go about developing sound OO design behind the scenes
>without violating our contract to both xml-rpc and httpclient.
>
>It cannot be overemphasized that we are not talking about configuring a
>class via static flags.  That's a dangerous proposition, especially in
>multi-threaded environments.
>
>I'm out of contact for 2 days, so I'll let the discussion simmer a bit.
>
>--------
>Tim O'Brien 
>
I like your plan, Tim.  Let's get 1.1 nailed down so we are all on the 
same codebase, and do the design of 2.0 right.  I don't think XML-RPC 
cares much whether we get the default in decode(byte[]) in 1.1; I think 
that decision should be made based on RFC interpretation.

--
Ryan Hoegg
ISIS Networks
http://www.isisnetworks.net




RE: [codec] RE: Base64.java

Posted by "O'brien, Tim" <to...@transolutions.net>.
> 1) One of the classes in the package sets the flags in a static block.
> 2) The flags are set before each call to the Base64 encode/decode methods.

> To use flags safely, you would have to make them instance members and
> require instantiation of Base64 objects.  This does more harm than good.

No, it's static flags that are harmful.  If you take a look at
http://cvs.apache.org/~tobrien/codec/evolve.html, you'll notice that we are
moving towards an approach where one could instantiate a Base64 instance and
configure certain properties of the encoding algorithm.  We'll keep the
static methods for ease of use, but behind the scenes those two functions
would be maintaining and configuring two separate "instances" of the class.
We'll get there; my main concern right now is to achieve a level of reuse,
and then we can go about developing sound OO design behind the scenes
without violating our contract to both xml-rpc and httpclient.

It cannot be overemphasized that we are not talking about configuring a
class via static flags.  That's a dangerous proposition, especially in
multi-threaded environments.
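
One way to picture that direction is the sketch below: an immutable,
configurable instance plus static convenience methods that delegate to two
preconfigured instances.  This is only an illustration under assumptions.
The class name ConfigurableBase64, the lineLength property and the use of
the JDK's java.util.Base64 for the encoding itself are all made up here and
are not taken from the evolve.html proposal.

```java
import java.util.Base64;

final class ConfigurableBase64 {
    private static final byte[] CRLF = {'\r', '\n'};

    // Two private, preconfigured instances behind the static methods.
    private static final ConfigurableBase64 PLAIN = new ConfigurableBase64(0);
    private static final ConfigurableBase64 MIME = new ConfigurableBase64(76);

    // Fixed at construction time, so instances are safe to share
    // between threads.  A value of 0 means "no chunking".
    private final int lineLength;

    ConfigurableBase64(int lineLength) {
        this.lineLength = lineLength;
    }

    // Instance method: behaviour depends only on this object's settings.
    byte[] encodeBytes(byte[] data) {
        Base64.Encoder encoder = lineLength > 0
                ? Base64.getMimeEncoder(lineLength, CRLF)
                : Base64.getEncoder();
        return encoder.encode(data);
    }

    // Static convenience methods kept for ease of use.
    static byte[] encode(byte[] data) {
        return PLAIN.encodeBytes(data);
    }

    static byte[] encodeChunked(byte[] data) {
        return MIME.encodeBytes(data);
    }
}
```

Because no setting can change after construction, two callers with
different needs never interfere with each other.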

I'm out of contact for 2 days, so I'll let the discussion simmer a bit.

--------
Tim O'Brien 

> -----Original Message-----
> From: Jeffrey Dever [mailto:jsdever@sympatico.ca] 

> Let's talk about the usage pattern for those flags; there really are only 
> two ways to go:
> 
> In case 1), if you are the only package loaded in the jvm, and you 
> always use Base64 the same way, then you are fine.  But if there are other 
> packages loaded in the jvm that also use Base64, but with different 
> flags, then the class loading order determines which flags are used. 
> You are broken.
> 
> In case 2), if everything is running in one thread, then you are ok. 
> But if there are multiple threads, then you can have your flags set out 
> from under you while doing a decode/encode.  You are broken.
> 
> As we get more applications built out of commons projects, and more 
> commons projects that depend on other commons projects (like 
> HttpClient), these static issues become more and more likely.  To use 
> flags safely, you would have to make them instance members and require 
> instantiation of Base64 objects.  This does more harm than good.
> 
> Static Flags Considered Harmful!
> 
> There does not seem to be much choice other than overloading the method 
> signatures:
> public static byte[] decode(byte[] data);
> public static byte[] decode(byte[] data, boolean chunk);
> 
> Jandalf.





Re: [codec] RE: Base64.java

Posted by Jeffrey Dever <js...@sympatico.ca>.
Fair enough.  I guess the normal case would be just 
encode/decode(byte[]) without 76 character chunks, which is what 
HttpClient would be using anyway.  

So I take back any preference about the method signature, except that the 
"normal" or "default" case should not be chunked.  If xml-rpc is going to be 
using the chunked form, then their preference should be given high weight.

Jandalf.


Henri Yandell wrote:

>I agree with you up until the last point.
>
>Rather than an obscure and irritating boolean argument on the end, just
>offer a different name.
>
>public static byte[] decode(byte[] data);
>public static byte[] decodeChunked(byte[] data);
>
>[bear in mind decodeChunked may be a bad name. I'm just copying :) ]
>
>  
>




Re: [codec] RE: Base64.java

Posted by Jeffrey Dever <js...@sympatico.ca>.
>
>
>Base64 is well understood just as a general encoding scheme outside of RFC
>2045 MIME.   RFC 2045 MIME adds a further requirement that the content be
>put into 76 character chunks.  Past 1.1 we could also talk about Base64.java
>providing the core "base64" encoding, and MIMEBase64.java extending Base64
>and adding "pre" and "post" processing to the algorithm.
>  
>
As long as efficiency is held as a key requirement over OO purity. 
 After all, Base64 is functional in nature.  

There should only be one memory allocation for the output, and one 
iteration over the input.  It would be inefficient to allocate the 
output array, iterate over the input for the general algorithm, then 
have to reallocate another array and iterate over the intermediate 
output just to drop in '\n' every 76 characters.
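
As a rough sketch of what that means in practice, the encoder below sizes
the output array once, line breaks included, and fills it in a single pass
over the input.  The class name SinglePassBase64 is made up for this
example, and ending the final line with '\n' as well is an assumption.

```java
class SinglePassBase64 {
    private static final char[] ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
                    .toCharArray();
    private static final int LINE_LENGTH = 76; // multiple of 4, per RFC 2045

    static byte[] encodeChunked(byte[] in) {
        int encodedLen = ((in.length + 2) / 3) * 4;
        int lines = (encodedLen + LINE_LENGTH - 1) / LINE_LENGTH;
        // The only allocation: encoded characters plus one '\n' per line.
        byte[] out = new byte[encodedLen + lines];
        int o = 0;
        int col = 0;
        // The only iteration: three input bytes become four output characters.
        for (int i = 0; i < in.length; i += 3) {
            int remaining = in.length - i;
            int b0 = in[i] & 0xFF;
            int b1 = remaining > 1 ? in[i + 1] & 0xFF : 0;
            int b2 = remaining > 2 ? in[i + 2] & 0xFF : 0;
            out[o++] = (byte) ALPHABET[b0 >> 2];
            out[o++] = (byte) ALPHABET[((b0 & 0x03) << 4) | (b1 >> 4)];
            out[o++] = (byte) (remaining > 1
                    ? ALPHABET[((b1 & 0x0F) << 2) | (b2 >> 6)] : '=');
            out[o++] = (byte) (remaining > 2 ? ALPHABET[b2 & 0x3F] : '=');
            col += 4;
            if (col == LINE_LENGTH) { // line is full: break it in place
                out[o++] = '\n';
                col = 0;
            }
        }
        if (col > 0) {
            out[o++] = '\n'; // terminate a partially filled last line
        }
        return out;
    }
}
```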

But in the HttpClient case, we only encode/decode small inputs (few 
hundred bytes) anyway.

Jandalf.




RE: [codec] RE: Base64.java

Posted by "O'brien, Tim" <to...@transolutions.net>.
One would have an encodeChunked, but there would be no need for a
decodeChunked.  Whitespace is discarded in base64 data regardless of the
original encoding scheme.
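
A small sketch of why a single decode method is enough, using the JDK's
java.util.Base64 MIME decoder as a stand-in for a whitespace-discarding
decoder (the class and method names are assumptions for this example).
Chunked and unchunked input go through the same method and yield the same
bytes.

```java
import java.util.Arrays;
import java.util.Base64;

class DecodeDemo {
    // The JDK MIME decoder skips line breaks and any other character
    // outside the base64 alphabet, so it accepts both forms.
    static byte[] decodeLenient(String encoded) {
        return Base64.getMimeDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        byte[] data = new byte[99];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        String plain = Base64.getEncoder().encodeToString(data);
        String chunked = Base64.getMimeEncoder().encodeToString(data) + "\r\n";
        System.out.println(Arrays.equals(decodeLenient(plain), data));
        System.out.println(Arrays.equals(decodeLenient(chunked), data));
    }
}
```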

For 1.1, it looks like a consensus is developing for a Base64 with two
functions, one chunked and one not chunked.  The discussions for 2.0 are
open.  Let me add another possibility. 

Base64 is well understood just as a general encoding scheme outside of RFC
2045 MIME.   RFC 2045 MIME adds a further requirement that the content be
put into 76 character chunks.  Past 1.1 we could also talk about Base64.java
providing the core "base64" encoding, and MIMEBase64.java extending Base64
and adding "pre" and "post" processing to the algorithm.
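
To give the idea some shape, here is a hedged sketch of a core encoder with
"pre" and "post" hooks and a MIME subclass that overrides only the "post"
step.  The names CoreBase64, MimeBase64, preProcess and postProcess are
hypothetical, the core encoding is delegated to the JDK's java.util.Base64,
and the CRLF terminator after every chunk is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.util.Base64;

class CoreBase64 {
    // Hook applied to the raw input; the core scheme leaves it untouched.
    protected byte[] preProcess(byte[] raw) {
        return raw;
    }

    // Hook applied to the encoded output; the core scheme leaves it untouched.
    protected byte[] postProcess(byte[] encoded) {
        return encoded;
    }

    final byte[] encode(byte[] raw) {
        return postProcess(Base64.getEncoder().encode(preProcess(raw)));
    }
}

class MimeBase64 extends CoreBase64 {
    private static final int LINE_LENGTH = 76;

    // Post-processing step: break the flat output into 76 character lines.
    @Override
    protected byte[] postProcess(byte[] encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length; i += LINE_LENGTH) {
            out.write(encoded, i, Math.min(LINE_LENGTH, encoded.length - i));
            out.write('\r');
            out.write('\n');
        }
        return out.toByteArray();
    }
}
```

This layout is easy to read, but the post-processing step makes a second
pass over the output and allocates a second buffer.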

--------
Tim O'Brien 

> -----Original Message-----
> From: Ryan Hoegg [mailto:rhoegg@isisnetworks.net] 
> Sent: Tuesday, February 04, 2003 2:40 PM
> To: Jakarta Commons Developers List
> Cc: O'brien, Tim; 'Martin Redington'
> Subject: Re: [codec] RE: Base64.java
> 
> 
> If this is the direction in which we are headed, I would nominate 
> Henri's initial idea.  Bear in mind we are not talking about 1.1 here, 
> but 2.0 (i.e. future).
> 
> For 1.1, my vote (as a committer in XML-RPC) is for Jeffrey's solution, 
> with the default being our consensus on the reading of the relevant RFCs.
> 
> For 2.0, I think the idea of a Base64 interface with different 
> implementations sounds cleaner than either idea.  Reason being, the user 
> of decodeChunked probably wants to be using encodeChunked as well.
> 
> --
> Ryan Hoegg
> ISIS Networks
> http://www.isisnetworks.net
> 
> Henri Yandell wrote:
> 
> >I agree with you up until the last point.
> >
> >Rather than an obscure and irritating boolean argument on the end, just
> >offer a different name.
> >
> >public static byte[] decode(byte[] data);
> >public static byte[] decodeChunked(byte[] data);
> >
> >[bear in mind decodeChunked may be a bad name. I'm just copying :) ]
> >
> >Hen
> >
> >On Tue, 4 Feb 2003, Jeffrey Dever wrote:
> >
> >>There does not seem to be much choice other than overloading the method
> >>signatures:
> >>public static byte[] decode(byte[] data);
> >>public static byte[] decode(byte[] data, boolean chunk);
> >>
> 
> 





Re: [codec] RE: Base64.java

Posted by Ryan Hoegg <rh...@isisnetworks.net>.
If this is the direction in which we are headed, I would nominate 
Henri's initial idea.  Bear in mind we are not talking about 1.1 here, 
but 2.0 (i.e. future).

For 1.1, my vote (as a committer in XML-RPC) is for Jeffrey's solution, 
with the default being our consensus on the reading of the relevant RFCs.

For 2.0, I think the idea of a Base64 interface with different 
implementations sounds cleaner than either idea.  Reason being, the user 
of decodeChunked probably wants to be using encodeChunked as well.

--
Ryan Hoegg
ISIS Networks
http://www.isisnetworks.net

Henri Yandell wrote:

>I agree with you up until the last point.
>
>Rather than an obscure and irritating boolean argument on the end, just
>offer a different name.
>
>public static byte[] decode(byte[] data);
>public static byte[] decodeChunked(byte[] data);
>
>[bear in mind decodeChunked may be a bad name. I'm just copying :) ]
>
>Hen
>
>On Tue, 4 Feb 2003, Jeffrey Dever wrote:
>
>>There does not seem to be much choice other than overloading the method
>>signatures:
>>public static byte[] decode(byte[] data);
>>public static byte[] decode(byte[] data, boolean chunk);
>>




Re: [codec] RE: Base64.java

Posted by Henri Yandell <ba...@generationjava.com>.
I agree with you up until the last point.

Rather than an obscure and irritating boolean argument on the end, just
offer a different name.

public static byte[] decode(byte[] data);
public static byte[] decodeChunked(byte[] data);

[bear in mind decodeChunked may be a bad name. I'm just copying :) ]

Hen

On Tue, 4 Feb 2003, Jeffrey Dever wrote:

> There does not seem to be much choice other than overloading the method
> signatures:
> public static byte[] decode(byte[] data);
> public static byte[] decode(byte[] data, boolean chunk);
>
> Jandalf.
>




Re: [codec] RE: Base64.java

Posted by Jeffrey Dever <js...@sympatico.ca>.
Right now the Base64 class is entirely static, with static methods and 
static members.  The constructor is private, so it cannot be instantiated. 
 I don't think we are looking for any OO here; it's functional in nature. 
We are not looking for radical redesign, changes or improvements, just a 
nice logical home in commons for a class that is currently widely used, 
replicated and forked.

Using static flags to configure Base64 behaviour is *not* going to 
work, particularly given the concept of commons: small reusable 
components.  Here is why:

Let's talk about the usage pattern for those flags.  There really are only 
two ways to go:
1) One of the classes in the package sets the flags in a static block.
2) The flags are set before each call to the Base64 encode/decode methods.

In case 1), if you are the only package loaded in the jvm, and you 
always use Base64 the same way, then you are fine.  But if there are other 
packages loaded in the jvm that also use Base64, but with different 
flags, then the class loading order determines which flags are used. 
 You are broken.

In case 2), if everything is running in one thread, then you are ok. 
 But if there are multiple threads, then you can have your flags set out 
from under you while doing a decode/encode.  You are broken.
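
A minimal sketch of case 1), for illustration only.  FlaggedBase64 and 
StaticFlagDemo are hypothetical names, not part of any commons class, and 
java.util.Base64 (Java 8 and later) stands in for the real encoder.  Two 
"packages" set the same static flag and the last writer wins:

```java
import java.util.Base64;

// Hypothetical class: a static flag shared by every caller in the JVM.
final class FlaggedBase64 {
    static boolean chunk = false;

    private FlaggedBase64() {}

    static String encode(byte[] data) {
        if (chunk) {
            // 76-character lines separated by "\n"; the JDK MIME encoder
            // does not append a separator after the last line.
            return Base64.getMimeEncoder(76, new byte[] {'\n'}).encodeToString(data);
        }
        return Base64.getEncoder().encodeToString(data);
    }
}

final class StaticFlagDemo {
    private StaticFlagDemo() {}

    static String run() {
        FlaggedBase64.chunk = true;   // first package asks for chunked output
        FlaggedBase64.chunk = false;  // second package initializes later
        // The first package now silently gets unchunked output.
        return FlaggedBase64.encode(new byte[100]);
    }
}
```

Case 2) is the same problem with threads in place of class loading order: 
another thread can change the flag between the assignment and the encode 
call.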

As we get more applications built out of commons projects, and more 
commons projects that depend on other commons projects (like 
HttpClient), these static issues become more and more likely.  To use 
flags safely, you would have to make them instance members and require 
instantiation of Base64 objects.  This does more harm than good.

Static Flags Considered Harmful!

There does not seem to be much choice other than overloading the method 
signatures:
public static byte[] decode(byte[] data);
public static byte[] decode(byte[] data, boolean chunk);
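
A rough sketch of the overloading idea, shown on the encode side because 
that is where chunking changes the output.  Base64Sketch is a hypothetical 
name, the String return type is chosen for brevity, and java.util.Base64 
(Java 8 and later) stands in for the real encoder.  The trailing "\n" 
after the final chunk follows the XML-RPC behaviour described in this 
thread; it is an assumption, not something RFC 2045 spells out.

```java
import java.util.Base64;

// Hypothetical class: no shared state, the option travels with each call.
final class Base64Sketch {
    private static final int CHUNK_SIZE = 76;

    private Base64Sketch() {}

    // Default behaviour: one unbroken line, safe for HTTP header values.
    static String encode(byte[] data) {
        return encode(data, false);
    }

    // With chunk == true, emit lines of at most 76 characters,
    // each terminated by "\n" (including the last one).
    static String encode(byte[] data, boolean chunk) {
        String encoded = Base64.getEncoder().encodeToString(data);
        if (!chunk) {
            return encoded;
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < encoded.length(); i += CHUNK_SIZE) {
            int end = Math.min(i + CHUNK_SIZE, encoded.length());
            out.append(encoded, i, end).append('\n');
        }
        return out.toString();
    }
}
```

Because the flag is a parameter, two callers with different needs cannot 
interfere with each other, whatever the class loading order or thread 
interleaving.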

Jandalf.


O'brien, Tim wrote:

>Here's Martin's post on rpc-dev re: cr/lf: 
>http://nagoya.apache.org/eyebrowse/ReadMsg?listName=rpc-dev@xml.apache.org&m
>sgNo=713
>
>Here's some observations, I've copied individuals from both the xml-rpc and
>the httpclient project.  It all boils down to using Base64 encoding in the
>context of two different RFCs, 2045 and 2616.  I believe that we can come to
>an agreement here by adding some option flags to the method signatures.
>
>*** XML-RPC facts:
>
>1. I believe that XML-RPC is using Base64 in the context of RFC 2045 which
>requires Base64 content to be encoded in 76 character "chunks" separated by
>a newline character.  The traling newline character is added to "terminate"
>the final chunk.
>
>2. XML-RPC is also adhereing to the requirement to discard all whitespace
>when decoding base64 data.
>
>3. XML-RPC is not complying with the requirement to convert text to
>canonical form - replacing "text line breaks" with "CRLF sequences".
>
>*** HTTPClient facts:
>
>1. HttpClient's usage of Base64 does not create chunks of 76 characters
>separated by newlines - as this would interfere with HTTP headers.
>
>2. HttpClient's Base64 doesn't discard whitespace because in the context of
>usage, no whitespace is added to the encoded output - see #1
>
>
>
>** Here is RFC 2045 Multipurpose Internet Mail Extensions:
>http://www.ietf.org/rfc/rfc2045.txt
>
>2045 requirement 1: RFC 2045 on converting text material to canonical form:
>"Care must be taken to use the proper octets for line breaks if base64
>encoding is applied directly to text material that has not been converted to
>canonical form.  In particular, text line breaks must be converted into CRLF
>sequences prior to base64 encoding.  The important thing to note is that
>this may be done directly by the encoder rather than in a prior
>canonicalization step in some implementations."
>
>2045 requirement 2: In terms of RFC 2045, requirement for "chunking" and
>ignoring white space when decoding: "The encoded output stream must be
>represented in lines of no more than 76 characters each.  All line breaks or
>other characters not found in Table 1 must be ignored by decoding software.
>In base64 data, characters other than those in Table 1, line breaks, and
>other white space probably indicate a transmission error, about which a
>warning message or even a message rejection might be appropriate under some
>circumstances."
>
>
>** Here is RFC 2616 HTTP 1.1 which talks about base64 of an MD5 digest in a
>header: http://www.ietf.org/rfc/rfc2616.txt?number=2616
>
>"Conversion of all line breaks to CRLF MUST NOT be done before computing or
>checking the digest: the line break convention used in the text actually
>transmitted MUST be left unaltered when computing the digest."
>
>"Note: while the definition of Content-MD5 is exactly the same for HTTP as
>in RFC 1864 for MIME entity-bodies, there are several ways in which the
>application of Content-MD5 to HTTP entity-bodies differs from its
>application to MIME entity-bodies. One is that HTTP, unlike MIME, does not
>use Content-Transfer-Encoding, and does use Transfer-Encoding and
>Content-Encoding. Another is that HTTP more frequently uses binary content
>types than MIME, so it is worth noting that, in such cases, the byte order
>used to compute the digest is the transmission byte order defined for the
>type. Lastly, HTTP allows transmission of text types with any of several
>line break conventions and not just the canonical form using CRLF."
>
>
>--------
>Tim O'Brien 
>
>  
>
>>-----Original Message-----
>>From: Jeffrey Dever [mailto:jsdever@sympatico.ca] 
>>Sent: Tuesday, February 04, 2003 10:16 AM
>>To: O'brien, Tim
>>Cc: 'Martin Redington'; rhoegg@isisnetworks.net
>>Subject: Re: Base64.java
>>
>>
>>Http is very cr/lf aware. We use Base64 for encoding/decoding values 
>>that are added to headers which are always appended with a cr/lf as a 
>>value is not to contain the line delimiter.
>>
>>Where (which) rfc does it state the trailing cr/lf?
>>
>>Jandalf.
>>
>>    
>>
>
>
>
>  
>




RE: [codec] RE: Base64.java

Posted by "O'brien, Tim" <to...@transolutions.net>.
All,

I think the only blocker to HttpClient using your Base64 class as is, is the
chunking.  I do agree that setting flags on an instance of Base64 is
preferable to adding a flag in a method signature - but I think that adding
a flag to a method signature would get us to a short-term goal and
allow for more immediate reuse.

To this end, I drew up a plan for Base64 -
http://cvs.apache.org/~tobrien/codec/evolve.html

I'm trying to set very small defined and attainable goals for a 1.1 release
which will include Base64 to be shared across XML-RPC and HttpClient.  There
are a lot of good discussions, but I think that "step2" would at least get
us to a point of release.


--------
Tim O'Brien 


> -----Original Message-----
> From: Martin Redington [mailto:m.redington@ucl.ac.uk] 
> Sent: Tuesday, February 04, 2003 12:24 PM
> To: tobrien@transolutions.net
> Cc: 'Jeffrey Dever'; commons-dev; rhoegg
> Subject: Re: [codec] RE: Base64.java
> 
> 
> 
> Hi all,
> 
>      personally I favour Ryan's suggestion of setting flags (and/or 
> system properties) separately to obtain non-RFC compliant 
> behaviour (or 
> to specify which RFC to follow, where they conflict), or to specify 
> that exceptions should be raised when encountering a non-Base64 char, 
> rather than adding additional args to method signatures.
> 
> Given the wide usage of this code, and the need to inter-operate 
> smoothly with other implementations that may or may not comply with a 
> particular RFC, giving the end-user as much flexibility as 
> possible is 
> probably a good thing and shouldn't add too much complexity to the 
> code. Maybe both approaches would be appropriate.
> 
>     cheers,
>            m.
> 
> >
> 
> 





RE: [codec] RE: Base64.java

Posted by "O'brien, Tim" <to...@transolutions.net>.
Your example below would make sense if the Base64 implementations were
dramatically different.  It should be noted that the only sticking point is
chunked encoding - in essence, XML-RPC needs an implementation that adds
"\n" every 76 characters, and HttpClient does not.  Making two different
implementation classes in this case might be overkill.

--------
Tim O'Brien

> -----Original Message-----
> From: Henri Yandell [mailto:bayard@generationjava.com] 
> Sent: Tuesday, February 04, 2003 12:56 PM
> To: Jakarta Commons Developers List
> Cc: tobrien@transolutions.net; 'Jeffrey Dever'; rhoegg
> Subject: Re: [codec] RE: Base64.java
> 
> 
> 
> The concept of setting flags etc seems to be quite poor OO. 
> Maybe I'm not understanding things properly though.
> 
> Shouldn't it be a classic FactoryMethod pattern?
> 
> Base64Utils
> Base64 interface
> 
> hidden classes:
> 
> RFCBase64
> OtherBase64
> JimsBase64
> 
> and then:
> 
> 
> Base64Utils->
> 
> public static Base64 RFCBase64 = new RFCBase64().
> ...etc..
> 
> ??
> 
> Hen
> 
> 
> On Tue, 4 Feb 2003, Martin Redington wrote:
> 
> >
> > Hi all,
> >
> >      personally I favour Ryan's suggestion of setting flags (and/or 
> > system properties) separately to obtain non-RFC compliant behaviour 
> > (or to specify which RFC to follow, where they conflict), or to 
> > specify that exceptions should be raised when encountering a 
> > non-Base64 char, rather than adding additional args to method 
> > signatures.
> >
> > Given the wide usage of this code, and the need to inter-operate 
> > smoothly with other implementations that may or may not 
> comply with a 
> > particular RFC, giving the end-user as much flexibility as 
> possible is 
> > probably a good thing and shouldn't add too much complexity to the 
> > code. Maybe both approaches would be appropriate.
> >
> >     cheers,
> >            m.
> >
> > >
> >
> >
> > 
> ---------------------------------------------------------------------
> > To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: commons-dev-help@jakarta.apache.org
> >
> >
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: commons-dev-help@jakarta.apache.org
> 
> 





Re: [codec] RE: Base64.java

Posted by Henri Yandell <ba...@generationjava.com>.
The concept of setting flags etc seems to be quite poor OO. Maybe I'm not
understanding things properly though.

Shouldn't it be a classic FactoryMethod pattern?

Base64Utils
Base64 interface

hidden classes:

RFCBase64
OtherBase64
JimsBase64

and then:


Base64Utils->

public static Base64 RFCBase64 = new RFCBase64().
...etc..
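
One way the outline above might look in Java, as a sketch only.  The 
interface is called Base64Codec here (a hypothetical name, chosen to avoid 
clashing with java.util.Base64), the constants RFC2045 and UNCHUNKED are 
likewise made up, and the JDK encoders (Java 8 and later) stand in for the 
hidden implementations:

```java
import java.util.Base64;

// Hypothetical interface standing in for the "Base64 interface" above.
interface Base64Codec {
    String encode(byte[] data);
    byte[] decode(String text);
}

final class Base64Utils {
    // Callers pick a behaviour by constant; the classes stay hidden.
    static final Base64Codec RFC2045 = new MimeCodec();
    static final Base64Codec UNCHUNKED = new PlainCodec();

    private Base64Utils() {}

    // 76-character lines separated by CRLF; the decoder skips line
    // separators and other characters outside the Base64 alphabet.
    private static final class MimeCodec implements Base64Codec {
        public String encode(byte[] data) {
            return Base64.getMimeEncoder().encodeToString(data);
        }
        public byte[] decode(String text) {
            return Base64.getMimeDecoder().decode(text);
        }
    }

    // A single unbroken line; the decoder rejects non-alphabet characters.
    private static final class PlainCodec implements Base64Codec {
        public String encode(byte[] data) {
            return Base64.getEncoder().encodeToString(data);
        }
        public byte[] decode(String text) {
            return Base64.getDecoder().decode(text);
        }
    }
}
```

Each codec is stateless, so sharing the constants across packages and 
threads does not bring back the static-flag problem.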

??

Hen


On Tue, 4 Feb 2003, Martin Redington wrote:

>
> Hi all,
>
>      personally I favour Ryan's suggestion of setting flags (and/or
> system properties) separately to obtain non-RFC compliant behaviour (or
> to specify which RFC to follow, where they conflict), or to specify
> that exceptions should be raised when encountering a non-Base64 char,
> rather than adding additional args to method signatures.
>
> Given the wide usage of this code, and the need to inter-operate
> smoothly with other implementations that may or may not comply with a
> particular RFC, giving the end-user as much flexibility as possible is
> probably a good thing and shouldn't add too much complexity to the
> code. Maybe both approaches would be appropriate.
>
>     cheers,
>            m.
>
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: commons-dev-help@jakarta.apache.org
>
>




Re: [codec] RE: Base64.java

Posted by Martin Redington <m....@ucl.ac.uk>.
Hi all,

     personally I favour Ryan's suggestion of setting flags (and/or 
system properties) separately to obtain non-RFC compliant behaviour (or 
to specify which RFC to follow, where they conflict), or to specify 
that exceptions should be raised when encountering a non-Base64 char, 
rather than adding additional args to method signatures.

Given the wide usage of this code, and the need to inter-operate 
smoothly with other implementations that may or may not comply with a 
particular RFC, giving the end-user as much flexibility as possible is 
probably a good thing and shouldn't add too much complexity to the 
code. Maybe both approaches would be appropriate.

    cheers,
           m.

>

