Posted to user@hbase.apache.org by Jeff Whiting <je...@qualtrics.com> on 2011/11/23 08:04:28 UTC

What should happen if I restart hmaster?

If I run /etc/init.d/hadoop-hbase-master restart what would the expected behavior be?  Are the 
region servers supposed to stay up until the master is back?  Are they supposed to shut down?  What 
is _supposed_ to happen?

My understanding was that the region servers should continue to serve without problems until the 
master comes back up.
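
A quick way to confirm that the region servers ride over a master restart is to ask the shell for 
cluster status before and after the restart.  This is only a sketch: it assumes the hbase shell is 
on the path on the node you run it from, and that the 0.90 shell's status command accepts the 
'simple' format (it lists live and dead region servers):

    # print live/dead region servers and per-server load
    echo "status 'simple'" | hbase shell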

Instead, the master came up and started splitting logs (which I thought it would only do if a region 
server failed).  After it finished splitting the logs, the master got an unhandled exception and 
shut down.  Additionally, 2 of my 3 region servers went down.

Here are the logs from that time period (sorry, they are only at INFO level; I thought I had DEBUG turned on):

11/11/22 23:45:36 INFO wal.SequenceFileLogWriter: Using syncFs -- HDFS-200
11/11/22 23:45:36 INFO wal.SequenceFileLogWriter: Using syncFs -- HDFS-200
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321590507016 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321590507016
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321630125564 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321630125564
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321633728319 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321633728319
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321640931494 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321640931494
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321644533186 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321644533186
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321648134841 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321648134841
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321651736541 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321651736541
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321655338141 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321655338141
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321658939788 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321658939788
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321662541492 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321662541492
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321889464413 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321889464413
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321896668344 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321896668344
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321900269976 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321900269976
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321903871591 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321903871591
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321907473263 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321907473263
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321911074930 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321911074930
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321914676822 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321914676822
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321918278454 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321918278454
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321921880141 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321921880141
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321975907623 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321975907623
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321979509266 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321979509266
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321983110846 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321983110846
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321986712498 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321986712498
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321990314213 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321990314213
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321993915869 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321993915869
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321997517488 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321997517488
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1322001119216 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1322001119216
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1322004720867 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1322004720867
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1322008322499 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1322008322499
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1322011924103 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1322011924103
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1322019127458 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1322019127458
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1322022729045 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1322022729045
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1322026330965 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1322026330965
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1322029932557 
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1322029932557
11/11/22 23:45:36 INFO wal.HLogSplitter: Waiting for split writer threads to finish
11/11/22 23:45:36 INFO wal.HLogSplitter: Split writers finished
11/11/22 23:45:37 INFO wal.HLogSplitter: Closed path 
hdfs://ds1:9000/hbase/contact-contactGroup/001da4163bee0b274b3750a17f81f90f/recovered.edits/0000000000032711862 
(wrote 2034 edits in 44ms)
11/11/22 23:45:37 INFO wal.HLogSplitter: Closed path 
hdfs://ds1:9000/hbase/.META./1028785192/recovered.edits/0000000000032796239 (wrote 31 edits in 16ms)
11/11/22 23:45:37 INFO wal.HLogSplitter: Closed path 
hdfs://ds1:9000/hbase/objectStore-index/2816f23f249dfb174957634347999de0/recovered.edits/0000000000032711323 
(wrote 83 edits in 17ms)
11/11/22 23:45:37 INFO wal.HLogSplitter: Closed path 
hdfs://ds1:9000/hbase/response-recordedCounts/3d9d3e311bf4debcadf65ec616a36b12/recovered.edits/0000000000032711321 
(wrote 1387 edits in 36ms)
11/11/22 23:45:37 INFO wal.HLogSplitter: Closed path 
hdfs://ds1:9000/hbase/responseset/56e6dade0de8b5947db6a366c9b77825/recovered.edits/0000000000032796293 
(wrote 376 edits in 22ms)
11/11/22 23:45:37 INFO wal.HLogSplitter: Closed path 
hdfs://ds1:9000/hbase/contact-globalContact/58598adca3ec0bff2b35071b2884fcf7/recovered.edits/0000000000032711863 
(wrote 1 edits in 8ms)
11/11/22 23:45:37 INFO wal.HLogSplitter: Closed path 
hdfs://ds1:9000/hbase/contact-index/6312eedeedefb5ffa7c5eea73382d8bd/recovered.edits/0000000000032711865 
(wrote 653 edits in 31ms)
11/11/22 23:45:37 INFO wal.HLogSplitter: Closed path 
hdfs://ds1:9000/hbase/-ROOT-/70236052/recovered.edits/0000000000032712155 (wrote 2 edits in 12ms)
11/11/22 23:45:37 INFO wal.HLogSplitter: Closed path 
hdfs://ds1:9000/hbase/response/8a39da5db90d97aa1ea05955f9b3a1c9/recovered.edits/0000000000032711319 
(wrote 1459 edits in 51ms)
11/11/22 23:45:37 INFO wal.HLogSplitter: Closed path 
hdfs://ds1:9000/hbase/auditLog/c996e677c5acbbb5021f0c26deecae83/recovered.edits/0000000000032796260 
(wrote 78 edits in 49ms)
11/11/22 23:45:37 INFO wal.HLogSplitter: Closed path 
hdfs://ds1:9000/hbase/form-data-rows/d2ddf039c6ac8b4bcbfe1d314c448413/recovered.edits/0000000000032711330 
(wrote 13 edits in 48ms)
11/11/22 23:45:37 INFO wal.HLogSplitter: Closed path 
hdfs://ds1:9000/hbase/response-index/f47b1b30cc91dced075434d59e807fab/recovered.edits/0000000000032711320 
(wrote 2020 edits in 60ms)
11/11/22 23:45:37 INFO wal.HLogSplitter: Closed path 
hdfs://ds1:9000/hbase/form-revision/f49f3851905a39dc2173bc9468fc2f4f/recovered.edits/0000000000032711328 
(wrote 195 edits in 39ms)
11/11/22 23:45:37 INFO wal.HLogSplitter: hlog file splitting completed in 35196 ms for 
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993
11/11/22 23:45:37 INFO master.HMaster: -ROOT- assigned=0, rit=false, 
location=ds4.dev.qualtrics.com:60020
11/11/22 23:45:37 INFO master.HMaster: .META. assigned=0, rit=false, 
location=ds4.dev.qualtrics.com:60020
11/11/22 23:45:37 INFO master.HMaster: Master startup proceeding: master failover
11/11/22 23:45:37 FATAL master.HMaster: Unhandled exception. Starting shutdown.
java.lang.NullPointerException
     at org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:731)
     at org.apache.hadoop.hbase.master.AssignmentManager.processFailover(AssignmentManager.java:215)
     at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:422)
     at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:295)
11/11/22 23:45:37 INFO master.HMaster: Aborting
11/11/22 23:45:37 INFO ipc.HBaseServer: Stopping server on 60000
11/11/22 23:45:37 INFO ipc.HBaseServer: IPC Server handler 0 on 60000: exiting
11/11/22 23:45:37 INFO ipc.HBaseServer: IPC Server handler 27 on 60000: exiting


Region server logs around that time:
11/11/22 23:45:46 WARN hdfs.DFSClient: Error Recovery for block blk_6275991319376720851_37314 
failed  because recovery from primary datanode 10.1.37.4:50010 failed 6 times.  Pipeline was 
10.1.37.4:50010. Aborting...
11/11/22 23:45:46 WARN hdfs.DFSClient: Error while syncing
java.io.IOException: Error Recovery for block blk_6275991319376720851_37314 failed  because recovery 
from primary datanode 10.1.37.4:50010 failed 6 times.  Pipeline was 10.1.37.4:50010. Aborting...
     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2833)
     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305)
     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477)
11/11/22 23:45:46 FATAL wal.HLog: Could not append. Requesting close of hlog
java.io.IOException: Reflection
     at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:147)
     at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:981)
     at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:958)
Caused by: java.lang.reflect.InvocationTargetException
     at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:145)
     ... 2 more
Caused by: java.io.IOException: Error Recovery for block blk_6275991319376720851_37314 failed  
because recovery from primary datanode 10.1.37.4:50010 failed 6 times.  Pipeline was 
10.1.37.4:50010. Aborting...
     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2833)
     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305)
     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477)
11/11/22 23:45:46 ERROR wal.HLog: Error while syncing, requesting close of hlog
java.io.IOException: Reflection
     at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:147)
     at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:981)
     at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:958)
Caused by: java.lang.reflect.InvocationTargetException
     at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:145)
     ... 2 more
Caused by: java.io.IOException: Error Recovery for block blk_6275991319376720851_37314 failed  
because recovery from primary datanode 10.1.37.4:50010 failed 6 times.  Pipeline was 
10.1.37.4:50010. Aborting...
     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2833)
     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305)
     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477)
11/11/22 23:45:46 INFO wal.HLog: regionserver60020.logSyncer exiting
11/11/22 23:45:46 FATAL regionserver.HRegionServer: ABORTING region server 
serverName=ds4.dev.qualtrics.com,60020,1321385209993, load=(requests=0, regions=16, usedHeap=96, 
maxHeap=795): IOE in log roller
java.io.IOException: Error Recovery for block blk_6275991319376720851_37314 failed  because recovery 
from primary datanode 10.1.37.4:50010 failed 6 times.  Pipeline was 10.1.37.4:50010. Aborting...
     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2833)
     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305)
     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477)
11/11/22 23:45:46 INFO regionserver.HRegionServer: Dump of metrics: requests=0, regions=16, 
stores=24, storefiles=28, storefileIndexSize=1, memstoreSize=3, compactionQueueSize=0, 
flushQueueSize=0, usedHeap=91, maxHeap=795, blockCacheSize=21610976, blockCacheFree=145296112, 
blockCacheCount=347, blockCacheHitCount=79827, blockCacheMissCount=21630, blockCacheEvictedCount=0, 
blockCacheHitRatio=78, blockCacheHitCachingRatio=99
11/11/22 23:45:46 WARN regionserver.HRegionServer: Unable to report fatal error to master
java.lang.reflect.UndeclaredThrowableException
     at $Proxy6.reportRSFatalError(Unknown Source)
     at org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:1418)
     at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:104)
Caused by: java.net.ConnectException: Connection refused
     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
     at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:883)
     at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
     at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
     ... 3 more


Please note that "hadoop fsck /" reports the file system as "healthy".
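
As far as I know, fsck only checks files that are already closed, and the WAL that failed to sync 
here was still open for write.  Something along these lines (standard fsck/dfsadmin options; adjust 
the path to your hbase.rootdir) would include open files and also show per-datanode state:

    # include files currently open for write (e.g. region server WALs)
    hadoop fsck /hbase -openforwrite -files -blocks -locations
    # summary of live/dead datanodes and their capacity
    hadoop dfsadmin -report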
~Jeff

-- 
Jeff Whiting
Qualtrics Senior Software Engineer
jeffw@qualtrics.com


Re: What should happen if I restart hmaster?

Posted by Jonathan Hsieh <jo...@cloudera.com>.
Filed an issue about the NPE here:

https://issues.apache.org/jira/browse/HBASE-4866

The line numbers are slightly different in CDH3u2, but the root cause looks possible in the 0.90
Apache branch as well.
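
If you want to double-check which build your line numbers correspond to, the hbase wrapper script
can print the version and the source revision the installed jars were built from (assuming the
script is on the path):

    hbase version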

Jon.

On Wed, Nov 23, 2011 at 10:39 AM, Jeff Whiting <je...@qualtrics.com> wrote:

> We are running cdh3u2 with hbase 0.90.4-cdh3u2.  I'm guessing the logs
> were left over from a previous crash.  I also found some errant /etc/hosts
> entries that I'm guessing are the cause of some of the weirdness.  So
> let's just say my problems are due to a misconfiguration and some previous
> crashes that showed up during the restart.
>
> However I'm thinking the null pointer is a real problem that shouldn't
> make the master die.
>
> ~Jeff
>
>
> On 11/23/2011 12:29 AM, Stack wrote:
>
>> On Tue, Nov 22, 2011 at 11:04 PM, Jeff Whiting<je...@qualtrics.com>
>>  wrote:
>>
>>> If I run /etc/init.d/hadoop-hbase-master restart what would the expected
>>> behavior be?  Are the region servers supposed to stay up until the master is
>>> back?  Are they supposed to shut down?  What is _supposed_ to happen?
>>>
>>> My understanding was that the region servers should continue to serve
>>> without problems until the master comes back up.
>>>
>> Yes.  That is what is supposed to happen.
>>
>>
>>> Instead the master came up and started splitting logs (which I thought it
>>> would only do if a region server failed).
>>>
>> That is correct.
>>
>> Nothing had failed at the time.  Were these old logs left over from a
>> previous crash?
>>
>>
>>
>>> 11/11/22 23:45:37 FATAL master.HMaster: Unhandled exception. Starting
>>> shutdown.
>>> java.lang.NullPointerException
>>>    at
>>> org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:731)
>>>    at
>>>
>> Which version of hbase?   I don't see a fix for this one in 0.90 branch.
>>
>>
>>
>>> Region server logs around that time:
>>> 11/11/22 23:45:46 WARN hdfs.DFSClient: Error Recovery for block
>>> blk_6275991319376720851_37314 failed  because recovery from primary
>>> datanode
>>> 10.1.37.4:50010 failed 6 times.  Pipeline was 10.1.37.4:50010.
>>> Aborting...
>>> 11/11/22 23:45:46 WARN hdfs.DFSClient: Error while syncing
>>> java.io.IOException: Error Recovery for block
>>> blk_6275991319376720851_37314
>>> failed  because recovery from primary datanode 10.1.37.4:50010 failed 6
>>> times.  Pipeline was 10.1.37.4:50010. Aborting...
>>>    at
>>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2833)
>>>    at
>>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305)
>>>    at
>>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477)
>>> 11/11/22 23:45:46 FATAL wal.HLog: Could not append. Requesting close of
>>> hlog
>>> java.io.IOException: Reflection
>>>    at
>>>
>>>
>> This looks unrelated.  HDFS went away around this time?  What version of
>> hdfs?
>>
>> St.Ack
>>
>
> --
> Jeff Whiting
> Qualtrics Senior Software Engineer
> jeffw@qualtrics.com
>
>


-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com

Re: What should happen if I restart hmaster?

Posted by Jeff Whiting <je...@qualtrics.com>.
We are running cdh3u2 with hbase 0.90.4-cdh3u2.  I'm guessing the logs were left over from a 
previous crash.  I also found some errant /etc/hosts entries that I'm guessing are the cause of 
some of the weirdness.  So let's just say my problems are due to a misconfiguration and some previous 
crashes that showed up during the restart.
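
For the /etc/hosts side of things, a quick sanity check is to make sure forward and reverse lookups 
agree on every node.  The hostname and IP below are just the ones that appear in the logs above, and 
host(1) may need the bind-utils package installed:

    hostname -f                            # what this box thinks its FQDN is
    getent hosts ds4.dev.qualtrics.com     # forward lookup via /etc/hosts, then DNS
    host 10.1.37.4                         # reverse lookup for the datanode IP from the errors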

However I'm thinking the null pointer is a real problem that shouldn't make the master die.

~Jeff

On 11/23/2011 12:29 AM, Stack wrote:
> On Tue, Nov 22, 2011 at 11:04 PM, Jeff Whiting<je...@qualtrics.com>  wrote:
>> If I run /etc/init.d/hadoop-hbase-master restart what would the expected
>> behavior be?  Are the region servers supposed to stay up until the master is
>> back?  Are they supposed to shut down?  What is _supposed_ to happen?
>>
>> My understanding was that the region servers should continue to serve
>> without problems until the master comes back up.
>>
> Yes.  That is what is supposed to happen.
>
>
>> Instead the master came up and started splitting logs (which I thought it
>> would only do if a region server failed).
> That is correct.
>
> Nothing had failed at the time.  Were these old logs left over from a
> previous crash?
>
>
>
>> 11/11/22 23:45:37 FATAL master.HMaster: Unhandled exception. Starting
>> shutdown.
>> java.lang.NullPointerException
>>     at
>> org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:731)
>>     at
>>
> Which version of hbase?   I don't see a fix for this one in 0.90 branch.
>
>
>
>> Region server logs around that time:
>> 11/11/22 23:45:46 WARN hdfs.DFSClient: Error Recovery for block
>> blk_6275991319376720851_37314 failed  because recovery from primary datanode
>> 10.1.37.4:50010 failed 6 times.  Pipeline was 10.1.37.4:50010. Aborting...
>> 11/11/22 23:45:46 WARN hdfs.DFSClient: Error while syncing
>> java.io.IOException: Error Recovery for block blk_6275991319376720851_37314
>> failed  because recovery from primary datanode 10.1.37.4:50010 failed 6
>> times.  Pipeline was 10.1.37.4:50010. Aborting...
>>     at
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2833)
>>     at
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305)
>>     at
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477)
>> 11/11/22 23:45:46 FATAL wal.HLog: Could not append. Requesting close of hlog
>> java.io.IOException: Reflection
>>     at
>>
>
> This looks unrelated.  HDFS went away around this time?  What version of hdfs?
>
> St.Ack

-- 
Jeff Whiting
Qualtrics Senior Software Engineer
jeffw@qualtrics.com


Re: What should happen if I restart hmaster?

Posted by Stack <st...@duboce.net>.
On Tue, Nov 22, 2011 at 11:04 PM, Jeff Whiting <je...@qualtrics.com> wrote:
> If I run /etc/init.d/hadoop-hbase-master restart what would the expected
> behavior be?  Are the region servers supposed to stay up until the master is
> back?  Are they supposed to shut down?  What is _supposed_ to happen?
>
> My understanding was that the region servers should continue to serve
> without problems until the master comes back up.
>

Yes.  That is what is supposed to happen.


> Instead the master came up and started splitting logs (which I thought it
> would only do if a region server failed).

That is correct.

Nothing had failed at the time.  Were these old logs left over from a
previous crash?
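
One way to check is to list the per-server WAL directories before restarting the master.  The paths
below come straight from the logs above (an hbase.rootdir of hdfs://ds1:9000/hbase); -lsr is the
recursive form of -ls on this version of Hadoop:

    hadoop fs -ls /hbase/.logs        # one directory of WALs per region server
    hadoop fs -lsr /hbase/.logs       # the individual WAL files (the master splits those of servers it doesn't find alive)
    hadoop fs -ls /hbase/.oldlogs     # WALs that have already been split and archived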



> 11/11/22 23:45:37 FATAL master.HMaster: Unhandled exception. Starting
> shutdown.
> java.lang.NullPointerException
>    at
> org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:731)
>    at
>

Which version of hbase?   I don't see a fix for this one in 0.90 branch.



> Region server logs around that time:
> 11/11/22 23:45:46 WARN hdfs.DFSClient: Error Recovery for block
> blk_6275991319376720851_37314 failed  because recovery from primary datanode
> 10.1.37.4:50010 failed 6 times.  Pipeline was 10.1.37.4:50010. Aborting...
> 11/11/22 23:45:46 WARN hdfs.DFSClient: Error while syncing
> java.io.IOException: Error Recovery for block blk_6275991319376720851_37314
> failed  because recovery from primary datanode 10.1.37.4:50010 failed 6
> times.  Pipeline was 10.1.37.4:50010. Aborting...
>    at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2833)
>    at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305)
>    at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477)
> 11/11/22 23:45:46 FATAL wal.HLog: Could not append. Requesting close of hlog
> java.io.IOException: Reflection
>    at
>


This looks unrelated.  HDFS went away around this time?  What version of hdfs?

St.Ack