Posted to common-dev@hadoop.apache.org by Sangjin Lee <sj...@apache.org> on 2013/08/31 04:24:46 UTC

issues with distcp copying from 1.0 to 2.0

This may have been discussed in the past, but I haven't been able to find
a thread on it...

It seems that a lot of work has been done to make distcp from 1.0 to 2.0
work with the checksum check enabled (
https://issues.apache.org/jira/browse/HADOOP-8060), and I can see that all
of it has been merged into the 2.0 releases. However, distcp from 1.0 to
2.0 still seems to fail if the CRC check is enabled. Is that a correct
understanding?

I took a quick look at the distcp code (mostly around CopyMapper and
RetriableFileCopyCommand), and I don't see where the source checksum type
is passed in when the target file is created through DFSClient. Nor does
dfs.checksum.type appear to be set once the source checksum type is
discovered (which would have been another way to do it). This is
consistent with my testing, and I can also confirm that the copy works if
I pass in the command-line option "-Ddfs.checksum.type=CRC32".
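For context on why the override matters: Hadoop 2.0 defaults
dfs.checksum.type to CRC32C, whereas 1.0 writes CRC32. So unless the target
file is created with the source's checksum type, the block checksums (and,
presumably, the file-level checksum distcp compares after the copy) will
differ even when the bytes are identical. Below is a minimal standalone
Java sketch, not distcp or HDFS code. The class name ChecksumTypeDemo is
hypothetical, and it needs Java 9+ for java.util.zip.CRC32C. It shows the
two algorithms disagreeing on the same input.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

// Hypothetical demo class: illustrates CRC32 vs CRC32C only, not HDFS internals.
public class ChecksumTypeDemo {
    // Feeds the whole buffer to the given checksum algorithm and returns its value.
    public static long checksum(Checksum algo, byte[] data) {
        algo.update(data, 0, data.length);
        return algo.getValue();
    }

    public static void main(String[] args) {
        // "123456789" is the standard check input for CRC algorithms.
        byte[] data = "123456789".getBytes(StandardCharsets.UTF_8);
        long crc32 = checksum(new CRC32(), data);   // checksum type used by 1.0
        long crc32c = checksum(new CRC32C(), data); // default type in 2.0
        System.out.printf("CRC32  = %08x%n", crc32);   // prints cbf43926
        System.out.printf("CRC32C = %08x%n", crc32c);  // prints e3069283
        // Same bytes, different algorithm: any checksum comparison reports a mismatch.
        System.out.println("match: " + (crc32 == crc32c)); // prints false
    }
}
```

Forcing "-Ddfs.checksum.type=CRC32" on the 2.0 side makes both sides
compute the first value, which fits the behavior I'm seeing.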

Is this understanding accurate? If so, is there a reason this was not done
in distcp? Curious...

Thanks,
Sangjin