Posted to common-user@hadoop.apache.org by Raymond Jennings III <ra...@yahoo.com> on 2010/08/14 05:12:22 UTC

Is HDFS reliable? Very odd error

I copied a 230 GB file into my Hadoop cluster. After my MR job kept failing, I
tracked the error down to a single line of formatted text.

I copied the file back out of HDFS, and when I compared it to the original file,
about 20 bytes on one line (out of 230 GB) were different.

Is no CRC or checksum computed when files are copied into HDFS?

(To be clear, I copied the original file out of HDFS, not the output of my
MR job.)
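For a 230 GB file, one way to do the comparison described above without loading either copy into memory is to checksum both files in fixed-size chunks and report where the checksums diverge. The sketch below is a hypothetical helper, not part of Hadoop; the function names and the 512-byte chunk size are assumptions chosen for illustration. It uses Python's zlib.crc32 on each chunk. A matching CRC does not strictly prove two chunks are identical, but a mismatch proves they differ. Because each file is reduced to a short list of checksums, the lists can also be computed on different machines and compared afterwards.

```python
import zlib


def chunk_crcs(path, chunk_size=512):
    """Return one CRC32 value per chunk_size-byte chunk of the file at path.

    Hypothetical helper: reads the file sequentially, so memory use stays
    at one chunk regardless of file size.
    """
    crcs = []
    with open(path, "rb") as f:
        # iter(callable, sentinel) stops when read() returns b"" at EOF
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crcs.append(zlib.crc32(chunk))
    return crcs


def mismatched_offsets(crcs_a, crcs_b, chunk_size=512):
    """Return byte offsets of chunks whose CRCs differ.

    A chunk present in only one list (the files have different lengths)
    also counts as a mismatch.
    """
    offsets = []
    for i in range(max(len(crcs_a), len(crcs_b))):
        a = crcs_a[i] if i < len(crcs_a) else None
        b = crcs_b[i] if i < len(crcs_b) else None
        if a != b:
            offsets.append(i * chunk_size)
    return offsets
```

Usage would be along the lines of `mismatched_offsets(chunk_crcs("original.txt"), chunk_crcs("copied_out.txt"))`, where both paths are placeholders. Each returned offset marks a 512-byte window worth inspecting byte by byte.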