Posted to user@hbase.apache.org by Rakhi Khatwani <ra...@gmail.com> on 2009/04/15 09:45:01 UTC

Map task hangs

Hi,

I was running a mapreduce job which takes data from table ContentTable,
processes it, and stores the results into another table.
My mapreduce program had 20 maps, out of which 19 completed successfully;
the last map, however, took ages to complete, and after 10 hrs we had to
kill the task (at 15-Apr-2009 04:59:39, i.e. 10hrs, 30mins, 3sec in).


Here are the regionserver logs around that time, and it's really weird:
there were no logs for 3 hrs! :(

2009-04-15 02:21:43,417 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Failed major compaction check on ContentTable,http://www.dnaindia.com/report.asp?newsid=1243858,1239719376495
java.io.IOException: Filesystem closed
        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:198)
        at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:567)
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:226)
        at org.apache.hadoop.hbase.regionserver.HStore.getLowestTimestamp(HStore.java:785)
        at org.apache.hadoop.hbase.regionserver.HStore.isMajorCompaction(HStore.java:988)
        at org.apache.hadoop.hbase.regionserver.HStore.isMajorCompaction(HStore.java:976)
        at org.apache.hadoop.hbase.regionserver.HRegion.isMajorCompaction(HRegion.java:2585)
        at org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker.chore(HRegionServer.java:843)
        at org.apache.hadoop.hbase.Chore.run(Chore.java:65)
2009-04-15 02:21:43,417 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Failed major compaction check on ContentTable,http://www.cnbc.com//id/29864724,1239692396718
java.io.IOException: Filesystem closed
        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:198)
        at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:567)
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:226)
        at org.apache.hadoop.hbase.regionserver.HStore.getLowestTimestamp(HStore.java:785)
        at org.apache.hadoop.hbase.regionserver.HStore.isMajorCompaction(HStore.java:988)
        at org.apache.hadoop.hbase.regionserver.HStore.isMajorCompaction(HStore.java:976)
        at org.apache.hadoop.hbase.regionserver.HRegion.isMajorCompaction(HRegion.java:2585)
        at org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker.chore(HRegionServer.java:843)
        at org.apache.hadoop.hbase.Chore.run(Chore.java:65)
2009-04-15 05:08:23,414 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Failed major compaction check on ContentTable,http://blog.taragana.com/n/lovelorn-fiza-to-act-in-desh-drohi-sequel-24445/,1239692371324
java.io.IOException: Filesystem closed
        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:198)
        at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:567)
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:226)
        at org.apache.hadoop.hbase.regionserver.HStore.getLowestTimestamp(HStore.java:785)
        at org.apache.hadoop.hbase.regionserver.HStore.isMajorCompaction(HStore.java:988)
        at org.apache.hadoop.hbase.regionserver.HStore.isMajorCompaction(HStore.java:976)
        at org.apache.hadoop.hbase.regionserver.HRegion.isMajorCompaction(HRegion.java:2585)
        at org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker.chore(HRegionServer.java:843)
        at org.apache.hadoop.hbase.Chore.run(Chore.java:65)
2009-04-15 05:08:23,414 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Failed major compaction check on ContentTable,http://www.modernghana.com/news/208936/1/past-present-and-future-of-the-indian-national-con.html,1239718472792
java.io.IOException: Filesystem closed
        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:198)
        at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:567)
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:226)
        at org.apache.hadoop.hbase.regionserver.HStore.getLowestTimestamp(HStore.java:785)
        at org.apache.hadoop.hbase.regionserver.HStore.isMajorCompaction(HStore.java:988)
        at org.apache.hadoop.hbase.regionserver.HStore.isMajorCompaction(HStore.java:976)
        at org.apache.hadoop.hbase.regionserver.HRegion.isMajorCompaction(HRegion.java:2585)
        at org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker.chore(HRegionServer.java:843)
        at org.apache.hadoop.hbase.Chore.run(Chore.java:65)


But still, the entire log is filled with this warning! Is it serious, or
can it be ignored?
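For what it's worth, DFSClient.checkOpen() throws "Filesystem closed" after something has called close() on the client, and since FileSystem.get() hands out one shared, cached instance per filesystem URI, a close() by any code in the JVM (a task's cleanup, for instance) breaks every other user of that instance. Here is a toy sketch of that failure mode; the class names (SharedFs, FsCacheDemo) are made up, and only the cache-and-close behavior mirrors Hadoop's:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Toy model of Hadoop's FileSystem cache: get() returns one shared
// instance per URI, so close() by any caller invalidates the handle
// for everyone in the JVM.
class SharedFs {
    private boolean open = true;

    void close() { open = false; }

    // Mirrors DFSClient.checkOpen(): fail fast once closed.
    void checkOpen() throws IOException {
        if (!open) throw new IOException("Filesystem closed");
    }

    String[] listPaths() throws IOException {
        checkOpen();
        return new String[] {"/hbase"};
    }
}

public class FsCacheDemo {
    private static final Map<String, SharedFs> CACHE = new HashMap<>();

    static SharedFs get(String uri) {
        return CACHE.computeIfAbsent(uri, k -> new SharedFs());
    }

    public static void main(String[] args) {
        SharedFs a = get("hdfs://nn:9000"); // one component's handle
        SharedFs b = get("hdfs://nn:9000"); // same cached instance!
        a.close();                          // closes it for b as well
        try {
            b.listPaths();                  // e.g. the compaction check
        } catch (IOException e) {
            System.out.println(e.getMessage()); // prints "Filesystem closed"
        }
    }
}
```

So the warning itself is a symptom rather than the root cause: once the region server's DFS client has been closed underneath it, every subsequent HDFS call from that process will fail the same way.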


The datanode logs are fine up until 2009-04-15 05:07:12, where I get the
following exception.

2009-04-15 05:07:12,093 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-1660273199073776411_91663 received exception java.io.IOException: Block blk_-1660273199073776411_91663 is valid, and cannot be written to.
2009-04-15 05:07:12,093 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.255.127.31:50010, storageID=DS-1366610166-10.255.127.31-50010-1239371098677, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: Block blk_-1660273199073776411_91663 is valid, and cannot be written to.
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.writeToBlock(FSDataset.java:958)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:98)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:258)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:102)
        at java.lang.Thread.run(Thread.java:619)
2009-04-15 05:07:13,671 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.255.127.31:50010, storageID=DS-1366610166-10.255.127.31-50010-1239371098677, infoPort=50075, ipcPort=50020) Starting thread to transfer block blk_5200295531482229843_91665 to 10.254.22.255:50010
2009-04-15 05:07:13,672 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.255.127.31:50010, storageID=DS-1366610166-10.255.127.31-50010-1239371098677, infoPort=50075, ipcPort=50020) Starting thread to transfer block blk_-1660273199073776411_91663 to 10.255.107.224:50010
2009-04-15 05:07:14,161 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.255.127.31:50010, storageID=DS-1366610166-10.255.127.31-50010-1239371098677, infoPort=50075, ipcPort=50020):Transmitted block blk_5200295531482229843_91665 to /10.254.22.255:50010

And I have set the datanode xceiver limit (dfs.datanode.max.xcievers) to 2048.
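For reference, this is roughly how that limit is raised in hdfs-site.xml on the datanodes (the property name below is the historically misspelled one used by Hadoop releases of this era; check the name against your version):

```xml
<!-- hdfs-site.xml: raise the per-datanode cap on concurrent
     DataXceiver threads; HBase needs this well above the default. -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>2048</value>
</property>
```

The datanodes must be restarted for the change to take effect.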

What could be the issue?

Thanks
Raakhi

Re: Map task hangs

Posted by jason hadoop <ja...@gmail.com>.
The absence of log messages for 3 hours reminds me of an odd OS-level
failure that would happen on some machines.
The underlying host file system would get into a deadlock state, and the
hadoop processes would hang when they attempted to write a log message.
The first noticeable symptom of this was that the machines had multiple
instances of updatedb running (the once-per-day scan of the file system
that primes the locate command's database).
This was not resolved by the time I left; the monitoring was modified to
catch the failure earlier.

Sagar, did this ever get resolved?


On Wed, Apr 15, 2009 at 12:45 AM, Rakhi Khatwani
<ra...@gmail.com>wrote:

> Hi,
>
> [...]
>
> Thanks
> Raakhi
>



-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422