Posted to user@hbase.apache.org by Jeff Whiting <je...@qualtrics.com> on 2011/11/23 08:04:28 UTC
What should happen if I restart hmaster?
If I run /etc/init.d/hadoop-hbase-master restart what would the expected behavior be? Are the
region servers supposed to stay up until the master is back? Are they supposed to shutdown? What
is _supposed_ to happen?
My understanding was that the region servers should continue to serve without problems until the
master comes back up.
Instead the master came up and started splitting logs (which I thought it would only do if a region
server failed). After it finished splitting the logs, the master got an unhandled exception and
shut down. Additionally, 2 of my 3 region servers went down.
Here are the logs from the time period (sorry, they are only at INFO level; I thought I had debug turned on):
11/11/22 23:45:36 INFO wal.SequenceFileLogWriter: Using syncFs -- HDFS-200
11/11/22 23:45:36 INFO wal.SequenceFileLogWriter: Using syncFs -- HDFS-200
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321590507016
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321590507016
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321630125564
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321630125564
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321633728319
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321633728319
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321640931494
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321640931494
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321644533186
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321644533186
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321648134841
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321648134841
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321651736541
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321651736541
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321655338141
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321655338141
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321658939788
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321658939788
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321662541492
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321662541492
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321889464413
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321889464413
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321896668344
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321896668344
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321900269976
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321900269976
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321903871591
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321903871591
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321907473263
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321907473263
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321911074930
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321911074930
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321914676822
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321914676822
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321918278454
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321918278454
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321921880141
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321921880141
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321975907623
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321975907623
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321979509266
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321979509266
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321983110846
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321983110846
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321986712498
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321986712498
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321990314213
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321990314213
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321993915869
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321993915869
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1321997517488
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1321997517488
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1322001119216
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1322001119216
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1322004720867
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1322004720867
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1322008322499
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1322008322499
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1322011924103
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1322011924103
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1322019127458
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1322019127458
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1322022729045
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1322022729045
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1322026330965
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1322026330965
11/11/22 23:45:36 INFO wal.HLogSplitter: Archived processed log
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993/ds4.dev.qualtrics.com%3A60020.1322029932557
to hdfs://ds1:9000/hbase/.oldlogs/ds4.dev.qualtrics.com%3A60020.1322029932557
11/11/22 23:45:36 INFO wal.HLogSplitter: Waiting for split writer threads to finish
11/11/22 23:45:36 INFO wal.HLogSplitter: Split writers finished
11/11/22 23:45:37 INFO wal.HLogSplitter: Closed path
hdfs://ds1:9000/hbase/contact-contactGroup/001da4163bee0b274b3750a17f81f90f/recovered.edits/0000000000032711862
(wrote 2034 edits in 44ms)
11/11/22 23:45:37 INFO wal.HLogSplitter: Closed path
hdfs://ds1:9000/hbase/.META./1028785192/recovered.edits/0000000000032796239 (wrote 31 edits in 16ms)
11/11/22 23:45:37 INFO wal.HLogSplitter: Closed path
hdfs://ds1:9000/hbase/objectStore-index/2816f23f249dfb174957634347999de0/recovered.edits/0000000000032711323
(wrote 83 edits in 17ms)
11/11/22 23:45:37 INFO wal.HLogSplitter: Closed path
hdfs://ds1:9000/hbase/response-recordedCounts/3d9d3e311bf4debcadf65ec616a36b12/recovered.edits/0000000000032711321
(wrote 1387 edits in 36ms)
11/11/22 23:45:37 INFO wal.HLogSplitter: Closed path
hdfs://ds1:9000/hbase/responseset/56e6dade0de8b5947db6a366c9b77825/recovered.edits/0000000000032796293
(wrote 376 edits in 22ms)
11/11/22 23:45:37 INFO wal.HLogSplitter: Closed path
hdfs://ds1:9000/hbase/contact-globalContact/58598adca3ec0bff2b35071b2884fcf7/recovered.edits/0000000000032711863
(wrote 1 edits in 8ms)
11/11/22 23:45:37 INFO wal.HLogSplitter: Closed path
hdfs://ds1:9000/hbase/contact-index/6312eedeedefb5ffa7c5eea73382d8bd/recovered.edits/0000000000032711865
(wrote 653 edits in 31ms)
11/11/22 23:45:37 INFO wal.HLogSplitter: Closed path
hdfs://ds1:9000/hbase/-ROOT-/70236052/recovered.edits/0000000000032712155 (wrote 2 edits in 12ms)
11/11/22 23:45:37 INFO wal.HLogSplitter: Closed path
hdfs://ds1:9000/hbase/response/8a39da5db90d97aa1ea05955f9b3a1c9/recovered.edits/0000000000032711319
(wrote 1459 edits in 51ms)
11/11/22 23:45:37 INFO wal.HLogSplitter: Closed path
hdfs://ds1:9000/hbase/auditLog/c996e677c5acbbb5021f0c26deecae83/recovered.edits/0000000000032796260
(wrote 78 edits in 49ms)
11/11/22 23:45:37 INFO wal.HLogSplitter: Closed path
hdfs://ds1:9000/hbase/form-data-rows/d2ddf039c6ac8b4bcbfe1d314c448413/recovered.edits/0000000000032711330
(wrote 13 edits in 48ms)
11/11/22 23:45:37 INFO wal.HLogSplitter: Closed path
hdfs://ds1:9000/hbase/response-index/f47b1b30cc91dced075434d59e807fab/recovered.edits/0000000000032711320
(wrote 2020 edits in 60ms)
11/11/22 23:45:37 INFO wal.HLogSplitter: Closed path
hdfs://ds1:9000/hbase/form-revision/f49f3851905a39dc2173bc9468fc2f4f/recovered.edits/0000000000032711328
(wrote 195 edits in 39ms)
11/11/22 23:45:37 INFO wal.HLogSplitter: hlog file splitting completed in 35196 ms for
hdfs://ds1:9000/hbase/.logs/ds4.dev.qualtrics.com,60020,1321385209993
11/11/22 23:45:37 INFO master.HMaster: -ROOT- assigned=0, rit=false,
location=ds4.dev.qualtrics.com:60020
11/11/22 23:45:37 INFO master.HMaster: .META. assigned=0, rit=false,
location=ds4.dev.qualtrics.com:60020
11/11/22 23:45:37 INFO master.HMaster: Master startup proceeding: master failover
11/11/22 23:45:37 FATAL master.HMaster: Unhandled exception. Starting shutdown.
java.lang.NullPointerException
at org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:731)
at org.apache.hadoop.hbase.master.AssignmentManager.processFailover(AssignmentManager.java:215)
at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:422)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:295)
11/11/22 23:45:37 INFO master.HMaster: Aborting
11/11/22 23:45:37 INFO ipc.HBaseServer: Stopping server on 60000
11/11/22 23:45:37 INFO ipc.HBaseServer: IPC Server handler 0 on 60000: exiting
11/11/22 23:45:37 INFO ipc.HBaseServer: IPC Server handler 27 on 60000: exiting
Region server logs around that time:
11/11/22 23:45:46 WARN hdfs.DFSClient: Error Recovery for block blk_6275991319376720851_37314
failed because recovery from primary datanode 10.1.37.4:50010 failed 6 times. Pipeline was
10.1.37.4:50010. Aborting...
11/11/22 23:45:46 WARN hdfs.DFSClient: Error while syncing
java.io.IOException: Error Recovery for block blk_6275991319376720851_37314 failed because recovery
from primary datanode 10.1.37.4:50010 failed 6 times. Pipeline was 10.1.37.4:50010. Aborting...
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2833)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477)
11/11/22 23:45:46 FATAL wal.HLog: Could not append. Requesting close of hlog
java.io.IOException: Reflection
at
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:147)
at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:981)
at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:958)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:145)
... 2 more
Caused by: java.io.IOException: Error Recovery for block blk_6275991319376720851_37314 failed
because recovery from primary datanode 10.1.37.4:50010 failed 6 times. Pipeline was
10.1.37.4:50010. Aborting...
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2833)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477)
11/11/22 23:45:46 ERROR wal.HLog: Error while syncing, requesting close of hlog
java.io.IOException: Reflection
at
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:147)
at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:981)
at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:958)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:145)
... 2 more
Caused by: java.io.IOException: Error Recovery for block blk_6275991319376720851_37314 failed
because recovery from primary datanode 10.1.37.4:50010 failed 6 times. Pipeline was
10.1.37.4:50010. Aborting...
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2833)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477)
11/11/22 23:45:46 INFO wal.HLog: regionserver60020.logSyncer exiting
11/11/22 23:45:46 FATAL regionserver.HRegionServer: ABORTING region server
serverName=ds4.dev.qualtrics.com,60020,1321385209993, load=(requests=0, regions=16, usedHeap=96,
maxHeap=795): IOE in log roller
java.io.IOException: Error Recovery for block blk_6275991319376720851_37314 failed because recovery
from primary datanode 10.1.37.4:50010 failed 6 times. Pipeline was 10.1.37.4:50010. Aborting...
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2833)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477)
11/11/22 23:45:46 INFO regionserver.HRegionServer: Dump of metrics: requests=0, regions=16,
stores=24, storefiles=28, storefileIndexSize=1, memstoreSize=3, compactionQueueSize=0,
flushQueueSize=0, usedHeap=91, maxHeap=795, blockCacheSize=21610976, blockCacheFree=145296112,
blockCacheCount=347, blockCacheHitCount=79827, blockCacheMissCount=21630, blockCacheEvictedCount=0,
blockCacheHitRatio=78, blockCacheHitCachingRatio=99
11/11/22 23:45:46 WARN regionserver.HRegionServer: Unable to report fatal error to master
java.lang.reflect.UndeclaredThrowableException
at $Proxy6.reportRSFatalError(Unknown Source)
at org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:1418)
at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:104)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:883)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
... 3 more
Please note that "hadoop fsck /" says the file system is "healthy".
~Jeff
--
Jeff Whiting
Qualtrics Senior Software Engineer
jeffw@qualtrics.com
Re: What should happen if I restart hmaster?
Posted by Jonathan Hsieh <jo...@cloudera.com>.
Filed an issue about the NPE here:
https://issues.apache.org/jira/browse/HBASE-4866
The line numbers are slightly different in CDH3u2, but the root cause looks possible in the
0.90 apache branch as well.
Jon.
On Wed, Nov 23, 2011 at 10:39 AM, Jeff Whiting <je...@qualtrics.com> wrote:
> We are running cdh3u2 with hbase 0.90.4-cdh3u2. I'm guessing the logs
> were left over from a previous crash. I also found some errant /etc/hosts
> entries that I'm guessing are the cause of some of the weirdness. So
> let's just say my problems are due to a misconfiguration and some previous
> crashes that showed up during the restart.
>
> However I'm thinking the null pointer is a real problem that shouldn't
> make the master die.
>
> ~Jeff
>
>
> On 11/23/2011 12:29 AM, Stack wrote:
>
>> On Tue, Nov 22, 2011 at 11:04 PM, Jeff Whiting<je...@qualtrics.com>
>> wrote:
>>
>>> If I run /etc/init.d/hadoop-hbase-master restart what would the
>>> expected
>>> behavior be? Are the region servers supposed to stay up until the
>>> master is
>>> back? Are they supposed to shutdown? What is _supposed_ to happen?
>>>
>>> My understanding was that the region servers should continue to serve
>>> without problems until the master comes back up.
>>>
>> Yes. That is what is supposed to happen.
>>
>>
>> Instead the master came up and started splitting logs (which I thought it
>>> would only do if a region server failed).
>>>
>> That is correct.
>>
>> Nothing had failed at the time. Were these old logs left over from a
>> previous crash?
>>
>>
>>
>> 11/11/22 23:45:37 FATAL master.HMaster: Unhandled exception. Starting
>>> shutdown.
>>> java.lang.NullPointerException
>>> at
>>> org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:731)
>>> at
>>>
>> Which version of hbase? I don't see a fix for this one in 0.90 branch.
>>
>>
>>
>> Region server logs around that time:
>>> 11/11/22 23:45:46 WARN hdfs.DFSClient: Error Recovery for block
>>> blk_6275991319376720851_37314 failed because recovery from primary
>>> datanode
>>> 10.1.37.4:50010 failed 6 times. Pipeline was 10.1.37.4:50010.
>>> Aborting...
>>> 11/11/22 23:45:46 WARN hdfs.DFSClient: Error while syncing
>>> java.io.IOException: Error Recovery for block
>>> blk_6275991319376720851_37314
>>> failed because recovery from primary datanode 10.1.37.4:50010 failed 6
>>> times. Pipeline was 10.1.37.4:50010. Aborting...
>>> at
>>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2833)
>>> at
>>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305)
>>> at
>>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477)
>>> 11/11/22 23:45:46 FATAL wal.HLog: Could not append. Requesting close of
>>> hlog
>>> java.io.IOException: Reflection
>>> at
>>>
>>>
>> This looks unrelated. HDFS went away around this time? What version of
>> hdfs?
>>
>> St.Ack
>>
>
> --
> Jeff Whiting
> Qualtrics Senior Software Engineer
> jeffw@qualtrics.com
>
>
--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com
Re: What should happen if I restart hmaster?
Posted by Jeff Whiting <je...@qualtrics.com>.
We are running cdh3u2 with hbase 0.90.4-cdh3u2. I'm guessing the logs were left over from a
previous crash. I also found some errant /etc/hosts entries that I'm guessing are the cause of
some of the weirdness. So let's just say my problems are due to a misconfiguration and some
previous crashes that showed up during the restart.
However I'm thinking the null pointer is a real problem that shouldn't make the master die.
~Jeff
On 11/23/2011 12:29 AM, Stack wrote:
> On Tue, Nov 22, 2011 at 11:04 PM, Jeff Whiting<je...@qualtrics.com> wrote:
>> If I run /etc/init.d/hadoop-hbase-master restart what would the expected
>> behavior be? Are the region servers supposed to stay up until the master is
>> back? Are they supposed to shutdown? What is _supposed_ to happen?
>>
>> My understanding was that the region servers should continue to serve
>> without problems until the master comes back up.
>>
> Yes. That is what is supposed to happen.
>
>
>> Instead the master came up and started splitting logs (which I thought it
>> would only do if a region server failed).
> That is correct.
>
> Nothing had failed at the time. Were these old logs left over from a
> previous crash?
>
>
>
>> 11/11/22 23:45:37 FATAL master.HMaster: Unhandled exception. Starting
>> shutdown.
>> java.lang.NullPointerException
>> at
>> org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:731)
>> at
>>
> Which version of hbase? I don't see a fix for this one in 0.90 branch.
>
>
>
>> Region server logs around that time:
>> 11/11/22 23:45:46 WARN hdfs.DFSClient: Error Recovery for block
>> blk_6275991319376720851_37314 failed because recovery from primary datanode
>> 10.1.37.4:50010 failed 6 times. Pipeline was 10.1.37.4:50010. Aborting...
>> 11/11/22 23:45:46 WARN hdfs.DFSClient: Error while syncing
>> java.io.IOException: Error Recovery for block blk_6275991319376720851_37314
>> failed because recovery from primary datanode 10.1.37.4:50010 failed 6
>> times. Pipeline was 10.1.37.4:50010. Aborting...
>> at
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2833)
>> at
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305)
>> at
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477)
>> 11/11/22 23:45:46 FATAL wal.HLog: Could not append. Requesting close of hlog
>> java.io.IOException: Reflection
>> at
>>
>
> This looks unrelated. HDFS went away around this time? What version of hdfs?
>
> St.Ack
--
Jeff Whiting
Qualtrics Senior Software Engineer
jeffw@qualtrics.com
Re: What should happen if I restart hmaster?
Posted by Stack <st...@duboce.net>.
On Tue, Nov 22, 2011 at 11:04 PM, Jeff Whiting <je...@qualtrics.com> wrote:
> If I run /etc/init.d/hadoop-hbase-master restart what would the expected
> behavior be? Are the region servers supposed to stay up until the master is
> back? Are they supposed to shutdown? What is _supposed_ to happen?
>
> My understanding was that the region servers should continue to serve
> without problems until the master comes back up.
>
Yes. That is what is supposed to happen.
> Instead the master came up and started splitting logs (which I thought it
> would only do if a region server failed).
That is correct.
Nothing had failed at the time. Were these old logs left over from a
previous crash?
> 11/11/22 23:45:37 FATAL master.HMaster: Unhandled exception. Starting
> shutdown.
> java.lang.NullPointerException
> at
> org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:731)
> at
>
Which version of hbase? I don't see a fix for this one in 0.90 branch.
> Region server logs around that time:
> 11/11/22 23:45:46 WARN hdfs.DFSClient: Error Recovery for block
> blk_6275991319376720851_37314 failed because recovery from primary datanode
> 10.1.37.4:50010 failed 6 times. Pipeline was 10.1.37.4:50010. Aborting...
> 11/11/22 23:45:46 WARN hdfs.DFSClient: Error while syncing
> java.io.IOException: Error Recovery for block blk_6275991319376720851_37314
> failed because recovery from primary datanode 10.1.37.4:50010 failed 6
> times. Pipeline was 10.1.37.4:50010. Aborting...
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2833)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477)
> 11/11/22 23:45:46 FATAL wal.HLog: Could not append. Requesting close of hlog
> java.io.IOException: Reflection
> at
>
This looks unrelated. HDFS went away around this time? What version of hdfs?
St.Ack