Posted to hdfs-dev@hadoop.apache.org by "Laurent Goujon (JIRA)" <ji...@apache.org> on 2014/01/18 06:52:19 UTC

[jira] [Resolved] (HDFS-5798) DFSClient uses non-valid data when computing file checksum

     [ https://issues.apache.org/jira/browse/HDFS-5798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Laurent Goujon resolved HDFS-5798.
----------------------------------

    Resolution: Duplicate

> DFSClient uses non-valid data when computing file checksum
> ----------------------------------------------------------
>
>                 Key: HDFS-5798
>                 URL: https://issues.apache.org/jira/browse/HDFS-5798
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 1.1.2, 2.0.5-alpha
>            Reporter: Laurent Goujon
>
> In DFSClient.java, when computing the file checksum, the MD5 checksums of all blocks are fetched and written to a DataOutputBuffer instance (md5out), and the final checksum is later computed this way:
> {code:title=DFSClient.java}
> final MD5Hash fileMD5 = MD5Hash.digest(md5out.getData());
> {code}
> The problem is that getData() returns the whole backing buffer, of which only the first md5out.getLength() bytes are valid. As a result, fileMD5 is the MD5 of the MD5s of each block PLUS a number of unspecified trailing bytes (here the buffer is not reused, so they should be 0), and that number depends on the Java implementation of ByteArrayOutputStream.
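
The effect described above can be reproduced with the JDK alone. The following is a minimal sketch, not the DFSClient code: ExposedBuffer is a hypothetical stand-in for the Hadoop buffer class (it only exposes the backing array and the valid length of a ByteArrayOutputStream), and the class and helper names are made up for illustration. It shows that digesting the whole backing array gives a different MD5 than digesting only the first getLength() bytes.

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class ChecksumBufferDemo {
    /** Hypothetical stand-in for the buffer used in DFSClient; not the Hadoop class. */
    static final class ExposedBuffer extends ByteArrayOutputStream {
        ExposedBuffer(int capacity) { super(capacity); }
        byte[] getData() { return buf; }    // whole backing array, including the unused tail
        int getLength() { return count; }   // number of bytes actually written
    }

    /** MD5 over data[offset, offset + length). */
    static byte[] md5(byte[] data, int offset, int length) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(data, offset, length);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** The pattern from the report: digest everything getData() returns. */
    static byte[] digestWholeBuffer(ExposedBuffer out) {
        byte[] data = out.getData();
        return md5(data, 0, data.length);
    }

    /** Digest only the valid prefix of the buffer. */
    static byte[] digestValidBytes(ExposedBuffer out) {
        return md5(out.getData(), 0, out.getLength());
    }

    public static void main(String[] args) {
        ExposedBuffer md5out = new ExposedBuffer(32);   // capacity larger than the payload
        byte[] blockMd5s = {1, 2, 3, 4, 5};             // placeholder for per-block checksums
        md5out.write(blockMd5s, 0, blockMd5s.length);

        byte[] whole = digestWholeBuffer(md5out);
        byte[] valid = digestValidBytes(md5out);
        System.out.println("digests equal: " + Arrays.equals(whole, valid));
    }
}
```

With a 32-byte buffer holding 5 valid bytes, the first digest also covers 27 trailing zero bytes, so the two results differ. Restricting the digest to the first getLength() bytes makes the result independent of how the buffer was sized.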



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)