Posted to user@hive.apache.org by W S Chung <qp...@gmail.com> on 2011/08/17 17:26:21 UTC

org.apache.hadoop.fs.ChecksumException: Checksum error:

After I load some 15 files, each about 150 MB in size, into a partition
of a table and run a select count(*) on the table (both statements are
sketched below the stack trace), I keep getting an error. In the
JobTracker web interface, this turns out to be caused by a number of
checksum errors, like this:

      org.apache.hadoop.fs.ChecksumException: Checksum error: /blk_8155249261522439492:of:/user/hive/warehouse/att_log/collect_time=1313592519963/load.dat at 51794944
	at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:277)
	at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
	at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
	at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
	at org.apache.hadoop.hdfs.DFSClient$BlockReader.read(DFSClient.java:1660)
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:2257)
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2307)
	at java.io.DataInputStream.read(DataInputStream.java:83)
	at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
	at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:136)
	at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:40)
	at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:66)
	at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:32)
	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:67)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
	at org.apache.hadoop.mapred.Child.main(Child.java:159)
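
For reference, the load and the query were along these lines (the local
file path here is just a placeholder; the table and partition names are
taken from the block paths in the errors):

      hive> LOAD DATA LOCAL INPATH '/data/load.dat'
            INTO TABLE att_log PARTITION (collect_time='1313592519963');
      hive> SELECT COUNT(*) FROM att_log;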

I have tried reformatting the filesystem and reloading the data, but
the problem persists, although the number of corrupted blocks varies
each time. I have also tried setting "io.skip.checksum.errors" to
true, but it still makes no difference.
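
For reference, I set that property along these lines from the Hive CLI
before running the query (I believe the equivalent entry in
core-site.xml behaves the same):

      hive> SET io.skip.checksum.errors=true;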

I used fsck to see when the file corruption happens. Oddly, right
after the data is loaded, fsck detects no corrupted blocks; it is only
after the select count(*) that fsck detects corrupted blocks.
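
The check was along these lines, run once after the load and once
after the query:

      $ hadoop fsck /user/hive/warehouse/att_log -files -blocks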

If I just run hadoop fs -cat directly on the underlying HDFS file, it fails as well. The command was roughly this:
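
      $ hadoop fs -cat /user/hive/warehouse/att_log/collect_time=1313592542265/load.dat

and it produces an error like this: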

org.apache.hadoop.fs.ChecksumException: Checksum error: /blk_6876231585863639009:of:/user/hive/warehouse/att_log/collect_time=1313592542265/load.dat at 376832
        at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:277)
        at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
        at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
        at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
        at org.apache.hadoop.hdfs.DFSClient$BlockReader.read(DFSClient.java:1660)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:2257)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2307)
        at java.io.DataInputStream.read(DataInputStream.java:83)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:53)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
        at org.apache.hadoop.fs.FsShell.printToStdout(FsShell.java:118)
        at org.apache.hadoop.fs.FsShell.access$100(FsShell.java:49)
        at org.apache.hadoop.fs.FsShell$1.process(FsShell.java:356)
        at org.apache.hadoop.fs.FsShell$DelayedExceptionThrowing.globAndProcess(FsShell.java:1934)
        at org.apache.hadoop.fs.FsShell.cat(FsShell.java:350)
        at org.apache.hadoop.fs.FsShell.doall(FsShell.java:1568)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:1790)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:1916)
11/08/17 11:22:41 WARN hdfs.DFSClient: Found Checksum error for blk_6876231585863639009_1004 from 192.168.50.192:50010 at 376832
11/08/17 11:22:41 INFO hdfs.DFSClient: Could not obtain block blk_6876231585863639009_1004 from node:  java.io.IOException: No live nodes contain current block
11/08/17 11:22:41 WARN hdfs.DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 1879.3346505085124 msec.
cat: Checksum error: /blk_6876231585863639009:of:/user/hive/warehouse/att_log/collect_time=1313592542265/load.dat at 376832


Does anyone know how to get around this issue? Thanks.