You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Julius Davies (JIRA)" <ji...@apache.org> on 2010/06/02 23:39:41 UTC
[jira] Issue Comment Edited: (CODEC-91) Handling of embedded padding in base64 encoded data changed in 1.4

    [ https://issues.apache.org/jira/browse/CODEC-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874811#action_12874811 ] 

Julius Davies edited comment on CODEC-91 at 6/2/10 5:39 PM:
------------------------------------------------------------

I carefully analyzed the codec-1.3 logic.  It doesn't quite handle this scenario correctly.  There's a bug.  Result of running the test case originally supplied by Chris Pimlott against codec-1.3 is the following:

{code}
byte[] result = b64.decode("Y29tbW9ucwo=Y29tbW9ucwo=".getBytes());

// result now contains:
{'c', 'o', 'm', 'm', 'o', 'n', 's', NULL, 'c', 'o', 'm', 'm', 'o', 'n', 's'}
{code}

That NULL shouldn't be there (actually a zero since it's a primitive, but you know what I mean), so the decode in the commons-codec-1.3 logic is actually causing a corruption of the data in this scenario.

----------------------
So what should we do for 1.4.X?  Here's what's on the table at the moment:

Option 0:  Treat first PAD as EOF and stop all processing (current 1.4 behaviour).

Option 1:  Restart decode state when PAD is encountered in any location.

Option 2:  Only restart decode state if PAD's are correctly placed in the stream (ie. AB==, ABC=).  This is closest to the codec-1.3.jar behaviour, but without the NULL bug.


Personally I'm +1 to option 1, for the following reasons:

- Probably easier to program.  The implementation doesn't really need to know much about the PAD when it encounters one (ie. no need to ask is this a good PAD or a bad PAD?).

- Correctly decodes all Base64 inputs where PADs are in the correct place, so in a way we're already implementing option 2 here.

- Also properly decodes the invalid input "AB=AB=" if stream corruption happened because of: s/==/=/g 

- For all other inputs it's garbage-in garbage-out





      was (Author: juliusdavies):
    I carefully analyzed the codec-1.3 logic.  It doesn't quite handle this scenario correctly.  There's a bug.  Result of running the test case originally supplied by Chris Pimlott against codec-1.3 is the following:

{code}
byte[] result = b64.decode("Y29tbW9ucwo=Y29tbW9ucwo=".getBytes());

// result now contains:
{'c', 'o', 'm', 'm', 'o', 'n', 's', NULL, 'c', 'o', 'm', 'm', 'o', 'n', 's'}

That NULL shouldn't be there (actually a zero since it's a primitive, but you know what I mean), so the decode in the commons-codec-1.3 logic is actually causing a corruption of the data in this scenario.

----------------------
So what should we do for 1.4.X?  Here's what's on the table at the moment:

Option 0:  Treat first PAD as EOF and stop all processing (current 1.4 behaviour).

Option 1:  Restart decode state when PAD is encountered in any location.

Option 2:  Only restart decode state if PAD's are correctly placed in the stream (ie. AB==, ABC=).  This is closest to the codec-1.3.jar behaviour, but without the NULL bug.


Personally I'm +1 to option 1, for the following reasons:

- Probably easier to program.  The implementation doesn't really need to know much about the PAD when it encounters one (ie. no need to ask is this a good PAD or a bad PAD?).

- Correctly decodes all Base64 inputs where PADs are in the correct place, so in a way we're already implementing option 2 here.

- Also properly decodes the invalid input "AB=AB=" if stream corruption happened because of: s/==/=/g 

- For all other inputs it's garbage-in garbage-out




  
> Handling of embedded padding in base64 encoded data changed in 1.4
> ------------------------------------------------------------------
>
>                 Key: CODEC-91
>                 URL: https://issues.apache.org/jira/browse/CODEC-91
>             Project: Commons Codec
>          Issue Type: Bug
>    Affects Versions: 1.4
>            Reporter: Chris Pimlott
>         Attachments: codec-91-actually-works-and-tests-yay.patch
>
>
> 1.4 changed the way that embedded padding characters (i.e. "=") were handled when decoding base64 data.  Previously, the decoder ignored them and decoded all the data.  Now it stops upon encountering the first padding byte.  This breaks compatibility with previous versions.
> For example, in 1.4,
> b64.decode("Y29tbW9ucwo=".getBytes());
> gives the same result as
> b64.decode("Y29tbW9ucwo=Y29tbW9ucwo=".getBytes());

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.