Posted to user@phoenix.apache.org by Konstantinos Kougios <ko...@googlemail.com> on 2015/09/30 14:47:08 UTC

aggregate query makes all region servers crash

I have 3 region servers with 8 GB of memory each, and I am running this
query via sqlline.py:

select count(*),word from words group by word limit 10;

So far 3 region servers have died: the first with no error in its log,
the second with the output below (possibly a race condition with another
region server, since I had been restarting the first crashed server):

2015-09-30 13:26:45,429 INFO  [RS_OPEN_REGION-d1:16020-1] 
coordination.ZkOpenRegionCoordination: Opening of region {ENCODED => 
e211961cd190cf57f8c5a691bd3f265f, NAME => 
'PERFORMANCE_1000,EUSalesforce\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1442843941238.e211961cd190cf57f8c5a691bd3f265f.', 
STARTKEY => 'EUSalesforce\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 
ENDKEY => 'NAApple\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'} failed, 
transitioning from OFFLINE to FAILED_OPEN in ZK, expecting version 2
2015-09-30 13:26:47,786 INFO 
[regionserver/d1.lan/192.168.0.29:16020.logRoller] 
regionserver.LogRoller: LogRoller exiting.
2015-09-30 13:26:47,787 INFO [regionserver/d1.lan/192.168.0.29:16020] 
regionserver.CompactSplitThread: Waiting for Split Thread to finish...
2015-09-30 13:26:47,787 INFO [regionserver/d1.lan/192.168.0.29:16020] 
regionserver.CompactSplitThread: Waiting for Merge Thread to finish...
2015-09-30 13:26:47,787 INFO [regionserver/d1.lan/192.168.0.29:16020] 
regionserver.CompactSplitThread: Waiting for Large Compaction Thread to 
finish...
2015-09-30 13:26:47,787 INFO [regionserver/d1.lan/192.168.0.29:16020] 
regionserver.CompactSplitThread: Waiting for Small Compaction Thread to 
finish...
2015-09-30 13:26:48,282 INFO [regionserver/d1.lan/192.168.0.29:16020] 
client.ConnectionManager$HConnectionImplementation: Closing zookeeper 
sessionid=0x1501e1145c90002
2015-09-30 13:26:48,299 INFO [regionserver/d1.lan/192.168.0.29:16020] 
zookeeper.ZooKeeper: Session: 0x1501e1145c90002 closed
2015-09-30 13:26:48,299 INFO 
[regionserver/d1.lan/192.168.0.29:16020-EventThread] 
zookeeper.ClientCnxn: EventThread shut down
2015-09-30 13:26:48,300 INFO [regionserver/d1.lan/192.168.0.29:16020] 
ipc.RpcServer: Stopping server on 16020
2015-09-30 13:26:48,300 INFO  [RpcServer.listener,port=16020] 
ipc.RpcServer: RpcServer.listener,port=16020: stopping
2015-09-30 13:26:48,301 INFO  [RpcServer.responder] ipc.RpcServer: 
RpcServer.responder: stopped
2015-09-30 13:26:48,335 INFO  [RpcServer.responder] ipc.RpcServer: 
RpcServer.responder: stopping
2015-09-30 13:26:48,387 INFO [regionserver/d1.lan/192.168.0.29:16020] 
zookeeper.ZooKeeper: Session: 0x1501e1145c90000 closed
2015-09-30 13:26:48,387 INFO  [main-EventThread] zookeeper.ClientCnxn: 
EventThread shut down
2015-09-30 13:26:48,387 INFO [regionserver/d1.lan/192.168.0.29:16020] 
regionserver.HRegionServer: stopping server d1.lan,16020,1443613463226; 
zookeeper connection closed.
2015-09-30 13:26:48,387 INFO [regionserver/d1.lan/192.168.0.29:16020] 
regionserver.HRegionServer: regionserver/d1.lan/192.168.0.29:16020 exiting
2015-09-30 13:26:48,388 ERROR [main] 
regionserver.HRegionServerCommandLine: Region server exiting
java.lang.RuntimeException: HRegionServer Aborted
         at 
org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:68)
         at 
org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:87)
         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
         at 
org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
         at 
org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2651)
2015-09-30 13:26:48,390 INFO  [Thread-6] regionserver.ShutdownHook: 
Shutdown hook starting; hbase.shutdown.hook=true; 
fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@31dadd46
2015-09-30 13:26:48,390 INFO  [Thread-6] regionserver.ShutdownHook: 
Starting fs shutdown hook thread.
2015-09-30 13:26:48,391 INFO  [Thread-6] regionserver.ShutdownHook: 
Shutdown hook finished.
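
Because most of the excerpt above is INFO-level shutdown noise, it can help to filter each region server log down to its WARN, ERROR, and FATAL entries and look at what came just before the abort. The following is a minimal Python sketch, assuming the log4j layout seen in the excerpt (timestamp, level, then thread name); the function name severe_entries is illustrative and not part of HBase or Phoenix.

```python
import re

# Assumed entry layout, taken from the excerpt above:
#   "<yyyy-mm-dd> <hh:mm:ss>,<ms> <LEVEL> [<thread>] ..."
# A differently configured log4j pattern would need a different regex.
ENTRY_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+([A-Z]+)\b")


def severe_entries(lines, levels=("WARN", "ERROR", "FATAL")):
    """Return (timestamp, level, text) for every entry at one of the given levels.

    Continuation lines (wrapped text, stack frames) do not start with a
    timestamp, so they are skipped.
    """
    found = []
    for line in lines:
        match = ENTRY_RE.match(line)
        if match and match.group(2) in levels:
            found.append((match.group(1), match.group(2), line.rstrip("\n")))
    return found


# Small demonstration using entries copied from the log above.
sample = [
    "2015-09-30 13:26:48,387 INFO  [main-EventThread] zookeeper.ClientCnxn: ",
    "2015-09-30 13:26:48,388 ERROR [main] ",
    "java.lang.RuntimeException: HRegionServer Aborted",
]
for timestamp, level, text in severe_entries(sample):
    print(timestamp, level)
```

To run it against a real log, pass an open file object instead of the sample list, since the function accepts any iterable of lines.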

I have been watching the region servers via JMX, and they did not appear
to be under any memory pressure.

sqlline exceptions:

15/09/30 12:38:56 ERROR zookeeper.ZooKeeperWatcher: 
hconnection-0x358c99f5-0x501df0e3cf000f, quorum=nn.lan:2181, 
baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$SessionExpiredException: 
KeeperErrorCode = Session expired for /hbase/meta-region-server
     at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
     at 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:360)
     at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:745)
     at 
org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionState(MetaTableLocator.java:482)
     at 
org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionLocation(MetaTableLocator.java:168)
     at 
org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:600)
     at 
org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:580)
     at 
org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:559)
     at 
org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:61)
     at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateMeta(ConnectionManager.java:1185)
     at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1152)
     at 
org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:300)
     at 
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
     at 
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:61)
     at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
     at 
org.apache.hadoop.hbase.client.StatsTrackingRpcRetryingCaller.callWithoutRetries(StatsTrackingRpcRetryingCaller.java:56)
     at 
org.apache.hadoop.hbase.client.ClientSmallReversedScanner.loadCache(ClientSmallReversedScanner.java:211)
     at 
org.apache.hadoop.hbase.client.ClientSmallReversedScanner.next(ClientSmallReversedScanner.java:185)
     at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1249)
     at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1155)
     at 
org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:300)
     at 
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
     at 
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:61)
     at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
     at 
org.apache.hadoop.hbase.client.StatsTrackingRpcRetryingCaller.callWithoutRetries(StatsTrackingRpcRetryingCaller.java:56)
     at 
org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
     at 
org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:295)
     at 
org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160)
     at 
org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:155)
     at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:809)
     at 
org.apache.phoenix.iterate.TableResultIterator.getDelegate(TableResultIterator.java:67)
     at 
org.apache.phoenix.iterate.TableResultIterator.<init>(TableResultIterator.java:88)
     at 
org.apache.phoenix.iterate.TableResultIterator.<init>(TableResultIterator.java:79)
     at 
org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:105)
     at 
org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:100)
     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
     at 
org.apache.phoenix.job.JobManager$InstrumentedJobFutureTask.run(JobManager.java:183)
     at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
     at java.lang.Thread.run(Thread.java:745)