You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hbase.apache.org by Serega Sheypak <se...@gmail.com> on 2015/01/11 20:47:02 UTC
Strange HBase failure
Hi, I have PoC HBase cluster running on 3 VM
deployment schema is:
NODE01 NN, SN, HMaster (HM), RegionServer (RS), Zookeeper server (ZK), DN
NODE02 RegionServer, DN
NODE03 RegionServer, DN
Suddenly ONLY HBase went offline, all services: HM RS
HDFS was working, no alerts were there
ZK server was working, no alerts there.
VMWare didn't publish any alerts.
Only restart of HBase service helped.
We are using this:
http://www.cloudera.com/content/cloudera/en/downloads/cdh/cdh-4-7-0.html
hbase-0.94.15+113
I made a deep dive into logs and found this stuff:
08:15:51.968INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
regionserver60020.periodicFlusher requesting flush for region
epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
after a delay of 3026
08:15:55.011INFOorg.apache.hadoop.hbase.regionserver.StoreFile
Bloom filter type for
hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb:
ROW, CompoundBloomFilterWriter
08:15:55.012INFOorg.apache.hadoop.hbase.regionserver.StoreFile
Delete Family Bloom filter type for
hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb:
CompoundBloomFilterWriter
08:15:55.035INFOorg.apache.hadoop.hbase.regionserver.StoreFile
General Bloom and NO DeleteFamily was added to HFile
(hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb)
08:15:55.035INFOorg.apache.hadoop.hbase.regionserver.Store
Flushed , sequenceid=229362, memsize=7.7 K, into tmp file
hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb
08:15:55.053INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
Loaded ROW (CompoundBloomFilter) metadata for 8e68424066dc4c02a60ca57ec98128fb
08:15:55.072INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
Loaded ROW (CompoundBloomFilter) metadata for 8e68424066dc4c02a60ca57ec98128fb
08:15:55.073INFOorg.apache.hadoop.hbase.regionserver.Store
Added hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/CF/8e68424066dc4c02a60ca57ec98128fb,
entries=8, sequenceid=229362, filesize=2.7 K
08:15:55.076INFOorg.apache.hadoop.hbase.regionserver.HRegion
Finished memstore flush of ~7.7 K/7840, currentsize=0/0 for region
epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
in 80ms, sequenceid=229362, compaction requested=true
08:15:55.077INFOorg.apache.hadoop.hbase.regionserver.HRegion
Starting compaction on CF in region
epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
08:15:55.077INFOorg.apache.hadoop.hbase.regionserver.Store
Starting compaction of 4 file(s) in CF of
epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
into tmpdir=hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp,
seqid=229362, totalSize=76.6 M
08:15:55.096INFOorg.apache.hadoop.hbase.regionserver.StoreFile
Bloom filter type for
hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2:
ROW, CompoundBloomFilterWriter
08:15:55.097INFOorg.apache.hadoop.hbase.regionserver.StoreFile
Delete Family Bloom filter type for
hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2:
CompoundBloomFilterWriter
08:15:59.245INFOorg.apache.hadoop.hbase.regionserver.StoreFile
General Bloom and NO DeleteFamily was added to HFile
(hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2)
08:15:59.255INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
Loaded ROW (CompoundBloomFilter) metadata for 8bf8e92031834676b5d40b352120c5f2
08:15:59.255INFOorg.apache.hadoop.hbase.regionserver.Store
Renaming compacted file at
hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2
to hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/CF/8bf8e92031834676b5d40b352120c5f2
08:15:59.266INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
Loaded ROW (CompoundBloomFilter) metadata for 8bf8e92031834676b5d40b352120c5f2
08:15:59.282INFOorg.apache.hadoop.hbase.regionserver.Store
Completed major compaction of 4 file(s) in CF of
epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
into 8bf8e92031834676b5d40b352120c5f2, size=76.6 M; total size for
store is 76.6 M
08:15:59.283INFO
org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest
completed compaction:
regionName=epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.,
storeName=CF, fileCount=4, fileSize=76.6 M, priority=3,
time=8079772165508216; duration=4sec
08:25:06.274INFOorg.apache.hadoop.hbase.regionserver.wal.HLog
Roll /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420860306123,
entries=3, filesize=7313. for
/hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420863906224
08:25:06.274INFOorg.apache.hadoop.hbase.regionserver.wal.HLog
moving old hlog file
/hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
whose highest sequenceid is 229359 to
/hbase/.oldlogs/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
09:15:52.020INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
regionserver60020.periodicFlusher requesting flush for region
epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947. after a
delay of 14844
09:16:02.021INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
regionserver60020.periodicFlusher requesting flush for region
epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947. after a
delay of 21848
09:16:06.880INFOorg.apache.hadoop.hbase.regionserver.StoreFile
Bloom filter type for
hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d:
ROW, CompoundBloomFilterWriter
09:16:06.880INFOorg.apache.hadoop.hbase.regionserver.StoreFile
Delete Family Bloom filter type for
hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d:
CompoundBloomFilterWriter
09:16:06.914INFOorg.apache.hadoop.hbase.regionserver.StoreFile
General Bloom and NO DeleteFamily was added to HFile
(hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d)
09:16:06.914INFOorg.apache.hadoop.hbase.regionserver.Store
Flushed , sequenceid=229369, memsize=16.3 K, into tmp file
hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d
09:16:06.932INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
Loaded ROW (CompoundBloomFilter) metadata for a148cdce49964965b7b6f4728881427d
09:16:06.952INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
Loaded ROW (CompoundBloomFilter) metadata for a148cdce49964965b7b6f4728881427d
09:16:06.952INFOorg.apache.hadoop.hbase.regionserver.Store
Added hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/CF/a148cdce49964965b7b6f4728881427d,
entries=16, sequenceid=229369, filesize=4.5 K
09:16:06.955INFOorg.apache.hadoop.hbase.regionserver.HRegion
Finished memstore flush of ~16.3 K/16648, currentsize=0/0 for region
epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947. in
90ms, sequenceid=229369, compaction requested=false
09:16:55.986INFOorg.apache.zookeeper.ClientCnxn
Client session timed out, have not heard from server in 40003ms for
sessionid 0x1493f19431b0005, closing socket connection and attempting
reconnect
09:16:56.552INFOorg.apache.zookeeper.ClientCnxn
Client session timed out, have not heard from server in 40020ms for
sessionid 0x1493f19431b0000, closing socket connection and attempting
reconnect
09:16:57.535INFOorg.apache.zookeeper.ClientCnxn
Opening socket connection to server
etp-hdfs-n1-sg.passport.local/172.16.160.8:2181. Will not attempt to
authenticate using SASL (java.lang.SecurityException: Unable to locate
a login configuration)
09:16:57.550INFOorg.apache.zookeeper.ClientCnxn
Socket connection established to
etp-hdfs-n1-sg.passport.local/172.16.160.8:2181, initiating session
09:16:58.448INFOorg.apache.zookeeper.ClientCnxn
Opening socket connection to server
etp-hdfs-n1-sg.passport.local/172.16.160.8:2181. Will not attempt to
authenticate using SASL (java.lang.SecurityException: Unable to locate
a login configuration)
09:16:58.449INFOorg.apache.zookeeper.ClientCnxn
Socket connection established to
etp-hdfs-n1-sg.passport.local/172.16.160.8:2181, initiating session
09:17:20.581INFOorg.apache.zookeeper.ClientCnxn
Unable to reconnect to ZooKeeper service, session 0x1493f19431b0005
has expired, closing socket connection
09:17:20.582INFO
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
This client just lost it's session with ZooKeeper, will automatically
reconnect when needed.
09:17:20.582INFO
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
ZK session expired. This disconnect could have been caused by a
network partition or a long-running GC pause, either way it's
recommended that you verify your environment.
09:17:20.582INFOorg.apache.zookeeper.ClientCnxn
EventThread shut down
09:17:20.584INFOorg.apache.zookeeper.ClientCnxn
Unable to reconnect to ZooKeeper service, session 0x1493f19431b0000
has expired, closing socket connection
09:17:20.633FATALorg.apache.hadoop.hbase.regionserver.HRegionServer
ABORTING region server
etp-hdfs-n1-sg.passport.local,60020,1414102905372:
regionserver:60020-0x1493f19431b0000
regionserver:60020-0x1493f19431b0000 received expired from ZooKeeper,
aborting
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
09:17:20.733FATALorg.apache.hadoop.hbase.regionserver.HRegionServer
RegionServer abort: loaded coprocessors are: []
09:17:20.736INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
Dump of metrics: requestsPerSecond=0, numberOfOnlineRegions=3,
numberOfStores=3, hlogFileCount=2, hlogFileSizeMB=365,
numberOfStorefiles=4, storefileIndexSizeMB=0, rootIndexSizeKB=2,
totalStaticIndexSizeKB=650, totalStaticBloomSizeKB=256,
memstoreSizeMB=0, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0,
readRequestsCount=279141, writeRequestsCount=91793,
compactionQueueSize=0, flushQueueSize=0, usedHeapMB=455,
maxHeapMB=1015, blockCacheSizeMB=2.66, blockCacheFreeMB=251.26,
blockCacheCount=7, blockCacheHitCount=596347,
blockCacheMissCount=3360967, blockCacheEvictedCount=36743,
blockCacheHitRatio=15%, blockCacheHitCachingRatio=93%,
hdfsBlocksLocalityIndex=100, slowHLogAppendCount=0,
fsReadLatencyHistogramMean=330098.85,
fsReadLatencyHistogramCount=2970773.00,
fsReadLatencyHistogramMedian=312261.50,
fsReadLatencyHistogram75th=357708.50,
fsReadLatencyHistogram95th=3147993.00,
fsReadLatencyHistogram99th=4431904.00,
fsReadLatencyHistogram999th=4431904.00,
fsPreadLatencyHistogramMean=3245622.44,
fsPreadLatencyHistogramCount=36742.00,
fsPreadLatencyHistogramMedian=3180486.50,
fsPreadLatencyHistogram75th=3733068.50,
fsPreadLatencyHistogram95th=6424537.75,
fsPreadLatencyHistogram99th=24206578.00,
fsPreadLatencyHistogram999th=24206578.00,
fsWriteLatencyHistogramMean=356278.05,
fsWriteLatencyHistogramCount=2983888.00,
fsWriteLatencyHistogramMedian=538147.00,
fsWriteLatencyHistogram75th=570668.00,
fsWriteLatencyHistogram95th=570668.00,
fsWriteLatencyHistogram99th=570668.00,
fsWriteLatencyHistogram999th=570668.00
09:17:20.749INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
STOPPED: regionserver:60020-0x1493f19431b0000
regionserver:60020-0x1493f19431b0000 received expired from ZooKeeper,
aborting
09:17:20.750INFOorg.apache.zookeeper.ClientCnxn
EventThread shut down
09:17:20.755INFOorg.apache.hadoop.ipc.HBaseServer
Stopping server on 60020
09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
IPC Server handler 0 on 60020: exiting
09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
PRI IPC Server handler 1 on 60020: exiting
09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
PRI IPC Server handler 0 on 60020: exiting
09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
Stopping IPC Server listener on 60020
09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
PRI IPC Server handler 3 on 60020: exiting
09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
PRI IPC Server handler 8 on 60020: exiting
09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
PRI IPC Server handler 4 on 60020: exiting
09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
IPC Server handler 3 on 60020: exiting
09:17:20.757INFOorg.apache.hadoop.ipc.
HBaseServer
PRI IPC Server handler 6 on 60020: exiting
09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
IPC Server handler 2 on 60020: exiting
09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
IPC Server handler 6 on 60020: exiting
09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
PRI IPC Server handler 5 on 60020: exiting
09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
PRI IPC Server handler 7 on 60020: exiting
09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
PRI IPC Server handler 2 on 60020: exiting
09:17:20.757INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker
Sending interrupt to stop the worker thread
09:17:20.758INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
Stopping infoServer
09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
IPC Server handler 4 on 60020: exiting
09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
Stopping IPC Server Responder
09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
IPC Server handler 7 on 60020: exiting
09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
IPC Server handler 8 on 60020: exiting
09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
IPC Server handler 1 on 60020: exiting
09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
PRI IPC Server handler 9 on 60020: exiting
09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
IPC Server handler 5 on 60020: exiting
09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
Stopping IPC Server Responder
09:17:20.761INFOorg.apache.hadoop.ipc.HBaseServer
IPC Server handler 9 on 60020: exiting
09:17:20.763INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker
SplitLogWorker interrupted while waiting for task, exiting:
java.lang.InterruptedException
09:17:20.767INFOorg.apache.hadoop.ipc.HBaseServer
REPL IPC Server handler 0 on 60020: exiting
09:17:20.767INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker
SplitLogWorker etp-hdfs-n1-sg.passport.local,60020,1414102905372 exiting
09:17:20.767INFOorg.apache.hadoop.ipc.HBaseServer
REPL IPC Server handler 2 on 60020: exiting
09:17:20.768INFOorg.mortbay.log
Stopped SelectChannelConnector@0.0.0.0:60030
09:17:20.768INFOorg.apache.hadoop.ipc.HBaseServer
REPL IPC Server handler 1 on 60020: exiting
09:17:20.908INFO
org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager
Stopping RegionServerSnapshotManager abruptly.
09:17:20.908INFO
org.apache.hadoop.hbase.regionserver.HRegionServer$CompactionChecker
regionserver60020.compactionChecker exiting
09:17:20.908INFOorg.apache.hadoop.hbase.regionserver.LogRoller
LogRoller exiting.
09:17:20.909INFOorg.apache.hadoop.hbase.regionserver.MemStoreFlusher
regionserver60020.cacheFlusher exiting
09:17:20.920INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
aborting server etp-hdfs-n1-sg.passport.local,60020,1414102905372
09:17:20.924INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
Waiting on 3 regions to close
09:17:20.943INFOorg.apache.hadoop.hbase.regionserver.Store
Closed CF
09:17:20.949INFOorg.apache.hadoop.hbase.regionserver.HRegion
Closed epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947.
09:17:20.951INFOorg.apache.hadoop.hbase.regionserver.Store
Closed info
09:17:20.953INFOorg.apache.hadoop.hbase.regionserver.HRegion
Closed .META.,,1.1028785192
09:17:20.956INFOorg.apache.hadoop.hbase.regionserver.Store
Closed CF
09:17:20.957ERRORorg.apache.hadoop.hbase.regionserver.HRegion
Memstore size is 13376
09:17:20.957INFOorg.apache.hadoop.hbase.regionserver.HRegion
Closed epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
09:17:21.125INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
stopping server etp-hdfs-n1-sg.passport.local,60020,1414102905372; all
regions closed.
09:17:21.126INFOorg.apache.hadoop.hbase.regionserver.wal.HLog
regionserver60020.logSyncer exiting
09:17:21.240INFOorg.apache.hadoop.hbase.regionserver.Leases
regionserver60020 closing leases
09:17:21.241INFOorg.apache.hadoop.hbase.regionserver.Leases
regionserver60020 closed leases
09:17:22.021INFO
org.apache.hadoop.hbase.regionserver.HRegionServer$PeriodicMemstoreFlusher
regionserver60020.periodicFlusher exiting
09:17:22.031WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
Possibly transient ZooKeeper exception:
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for
/hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
09:17:22.031INFOorg.apache.hadoop.hbase.util.RetryCounter
Sleeping 2000ms before retry #1...
09:17:23.958INFOorg.apache.hadoop.hbase.regionserver.Leases
regionserver60020.leaseChecker closing leases
09:17:23.959INFOorg.apache.hadoop.hbase.regionserver.Leases
regionserver60020.leaseChecker closed leases
09:17:24.032WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
Possibly transient ZooKeeper exception:
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for
/hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
09:17:24.032INFOorg.apache.hadoop.hbase.util.RetryCounter
Sleeping 4000ms before retry #2...
09:17:28.032WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
Possibly transient ZooKeeper exception:
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for
/hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
09:17:28.033INFOorg.apache.hadoop.hbase.util.RetryCounter
Sleeping 8000ms before retry #3...
09:17:36.033WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
Possibly transient ZooKeeper exception:
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for
/hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
09:17:36.033ERRORorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
ZooKeeper delete failed after 3 retries
09:17:36.033WARNorg.apache.hadoop.hbase.regionserver.HRegionServer
Failed deleting my ephemeral node
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for
/hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:133)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1196)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1185)
at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1132)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:897)
at java.lang.Thread.run(Thread.java:662)
09:17:36.035INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
stopping server etp-hdfs-n1-sg.passport.local,60020,1414102905372;
zookeeper connection closed.
09:17:36.035INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
regionserver60020 exiting
09:17:36.051ERROR
org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine
Region server exiting
java.lang.RuntimeException: HRegionServer Aborted
at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:65)
at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:84)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:4106)
09:17:36.139INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook
Shutdown hook starting; hbase.shutdown.hook=true;
fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@727f3b8a
09:17:36.140INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
STOPPED: Shutdown hook
09:17:36.141INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook
Starting fs shutdown hook thread.
09:17:36.142INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook
Shutdown hook finished.
10:16:40.584INFOorg.apache.hadoop.hbase.util.VersionInfo
HBase 0.94.15-cdh4.7.0
10:16:40.608INFOorg.apache.hadoop.hbase.util.VersionInfo
Subversion file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.94.15-cdh4.7.0
-r Unknown
10:16:40.608INFOorg.apache.hadoop.hbase.util.VersionInfo
Compiled by jenkins on Wed May 28 10:48:53 PDT 2014
10:16:40.912INFOorg.apache.hadoop.hbase.util.ServerCommandLine
vmName=Java HotSpot(TM) 64-Bit Server VM, vmVendor=Sun Microsystems
Inc., vmVersion=20.6-b01
10:16:40.913INFOorg.apache.hadoop.hbase.util.ServerCommandLine
vmInputArguments=[-XX:OnOutOfMemoryError=kill, -9, %p, -Xmx1000m,
-Djava.net.preferIPv4Stack=true, -Xms1073741824, -Xmx1073741824,
-XX:+UseParNewGC, -XX:+UseConcMarkSweepGC,
-XX:-CMSConcurrentMTEnabled, -XX:CMSInitiatingOccupancyFraction=70,
-XX:+CMSParallelRemarkEnabled, -Dhbase.log.dir=/var/log/hbase,
-Dhbase.log.file=hbase-cmf-hbase1-REGIONSERVER-etp-hdfs-n1-sg.passport.local.log.out,
-Dhbase.home.dir=/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hbase,
-Dhbase.id.str=, -Dhbase.root.logger=INFO,RFA,
-Djava.library.path=/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hadoop/lib/native:/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hbase/lib/native/Linux-amd64-64,
-Dhbase.security.logger=INFO,DRFAS]
10:16:41.392INFOorg.apache.hadoop.ipc.HBaseServer
Starting Thread-2
10:16:41.393INFOorg.apache.hadoop.ipc.sdfHBaseServer
Starting Thread-2
this FATAL
09:17:20.633FATALorg.apache.hadoop.hbase.regionserver.HRegionServer
ABORTING region server
etp-hdfs-n1-sg.passport.local,60020,1414102905372:
regionserver:60020-0x1493f19431b0000
regionserver:60020-0x1493f19431b0000 received expired from ZooKeeper,
aborting
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
found on each RS
I didn' t see any problems with ZK or memory usage. The load is extremely
low, just few put/get in a second, you can see file sizes in logs.
What it could be how can i catch the root cause?
Re: Strange HBase failure
Posted by Serega Sheypak <se...@gmail.com>.
Ok, thanks, we'll check it.
2015-01-12 11:28 GMT+03:00 Esteban Gutierrez <es...@cloudera.com>:
> Hi Serega,
>
> Do you have enough resources allocated for each VM? Just some swapping on
> the VMs or the host can make things unstable. Also from the number of
> services on each VM sounds like your host should have at least 12GB of free
> RAM just for running things smoothly otherwise you might want to try with
> less VMs and with some RAM each.
>
> cheers,
> esteban.
>
>
>
> --
> Cloudera, Inc.
>
>
> On Sun, Jan 11, 2015 at 11:55 PM, Serega Sheypak <serega.sheypak@gmail.com
> >
> wrote:
>
> > Hi, HBase was down during 08:25 to 09:15
> > I was looking into logs, and thinking. I've tried to find something more
> > clever. than dummy restart.
> > We are using Cloudera distro, each of daemons run in it's own JVM.
> > I'll try to find CPU load logs.
> > There is really low load,
> > Finished memstore flush of ~7.7 K/7840,
> >
> > Flushed , sequenceid=229369, memsize=16.3 K
> >
> >
> > Completed major compaction of 4 file(s) in CF of
> >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > into 8bf8e92031834676b5d40b352120c5f2, size=76.6 M; total size for
> > store is 76.6 M
> >
> >
> > See there are less than 100 MB of data for 3 VMs. It's nothing.
> >
> >
> >
> > 2015-01-12 6:38 GMT+03:00 Ted Yu <yu...@gmail.com>:
> >
> > > Serega:
> > > Was the snippet of log from NODE01 ? Looks like NODE01 may have been
> > under
> > > heavy load - considering the number of daemons running on that node.
> > >
> > > Please check GC log.
> > >
> > > Cheers
> > >
> > > On Sun, Jan 11, 2015 at 6:57 PM, Shuai Lin <li...@gmail.com>
> > wrote:
> > >
> > > > From the log I see no log was produced during 08:25 to 09:15, why did
> > > this
> > > > happen?
> > > >
> > > > 08:25:06.274INFOorg.apache.
> > > > hadoop.hbase.regionserver.wal.HLog
> > > >
> > > > moving old hlog file
> > > >
> > > >
> > >
> >
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
> > > > whose highest sequenceid is 229359 to
> > > >
> > > >
> > >
> >
> /hbase/.oldlogs/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
> > > >
> > > > 09:15:52.020INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > Regards,
> > > > Shuai
> > > >
> > > > On Mon, Jan 12, 2015 at 3:47 AM, Serega Sheypak <
> > > serega.sheypak@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi, I have PoC HBase cluster running on 3 VM
> > > > > deployment schema is:
> > > > > NODE01 NN, SN, HMaster (HM), RegionServer (RS), Zookeeper server
> > (ZK),
> > > DN
> > > > > NODE02 RegionServer, DN
> > > > > NODE03 RegionServer, DN
> > > > >
> > > > > Suddenly ONLY HBase went offline, all services: HM RS
> > > > > HDFS was working, no alerts were there
> > > > > ZK server was working, no alerts there.
> > > > > VMWare didn't publish any alerts.
> > > > > Only restart of HBase service helped.
> > > > >
> > > > > We are using this:
> > > > >
> > >
> http://www.cloudera.com/content/cloudera/en/downloads/cdh/cdh-4-7-0.html
> > > > > hbase-0.94.15+113
> > > > >
> > > > > I made a deep dive into logs and found this stuff:
> > > > > 08:15:51.968INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > regionserver60020.periodicFlusher requesting flush for region
> > > > >
> > > > >
> > > >
> > >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > > > > after a delay of 3026
> > > > >
> > > > > 08:15:55.011INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > > >
> > > > > Bloom filter type for
> > > > >
> > > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb:
> > > > > ROW, CompoundBloomFilterWriter
> > > > >
> > > > > 08:15:55.012INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > > >
> > > > > Delete Family Bloom filter type for
> > > > >
> > > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb:
> > > > > CompoundBloomFilterWriter
> > > > >
> > > > > 08:15:55.035INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > > >
> > > > > General Bloom and NO DeleteFamily was added to HFile
> > > > >
> > > > >
> > > >
> > >
> >
> (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb)
> > > > >
> > > > > 08:15:55.035INFOorg.apache.hadoop.hbase.regionserver.Store
> > > > >
> > > > > Flushed , sequenceid=229362, memsize=7.7 K, into tmp file
> > > > >
> > > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb
> > > > >
> > > > >
> 08:15:55.053INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> > > > >
> > > > > Loaded ROW (CompoundBloomFilter) metadata for
> > > > > 8e68424066dc4c02a60ca57ec98128fb
> > > > >
> > > > >
> 08:15:55.072INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> > > > >
> > > > > Loaded ROW (CompoundBloomFilter) metadata for
> > > > > 8e68424066dc4c02a60ca57ec98128fb
> > > > >
> > > > > 08:15:55.073INFOorg.apache.hadoop.hbase.regionserver.Store
> > > > >
> > > > > Added
> > > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/CF/8e68424066dc4c02a60ca57ec98128fb,
> > > > > entries=8, sequenceid=229362, filesize=2.7 K
> > > > >
> > > > > 08:15:55.076INFOorg.apache.hadoop.hbase.regionserver.HRegion
> > > > >
> > > > > Finished memstore flush of ~7.7 K/7840, currentsize=0/0 for region
> > > > >
> > > > >
> > > >
> > >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > > > > in 80ms, sequenceid=229362, compaction requested=true
> > > > >
> > > > > 08:15:55.077INFOorg.apache.hadoop.hbase.regionserver.HRegion
> > > > >
> > > > > Starting compaction on CF in region
> > > > >
> > > > >
> > > >
> > >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > > > >
> > > > > 08:15:55.077INFOorg.apache.hadoop.hbase.regionserver.Store
> > > > >
> > > > > Starting compaction of 4 file(s) in CF of
> > > > >
> > > > >
> > > >
> > >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > > > > into
> > > > >
> > > >
> > >
> >
> tmpdir=hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp,
> > > > > seqid=229362, totalSize=76.6 M
> > > > >
> > > > > 08:15:55.096INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > > >
> > > > > Bloom filter type for
> > > > >
> > > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2:
> > > > > ROW, CompoundBloomFilterWriter
> > > > >
> > > > > 08:15:55.097INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > > >
> > > > > Delete Family Bloom filter type for
> > > > >
> > > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2:
> > > > > CompoundBloomFilterWriter
> > > > >
> > > > > 08:15:59.245INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > > >
> > > > > General Bloom and NO DeleteFamily was added to HFile
> > > > >
> > > > >
> > > >
> > >
> >
> (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2)
> > > > >
> > > > >
> 08:15:59.255INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> > > > >
> > > > > Loaded ROW (CompoundBloomFilter) metadata for
> > > > > 8bf8e92031834676b5d40b352120c5f2
> > > > >
> > > > > 08:15:59.255INFOorg.apache.hadoop.hbase.regionserver.Store
> > > > >
> > > > > Renaming compacted file at
> > > > >
> > > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2
> > > > > to
> > > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/CF/8bf8e92031834676b5d40b352120c5f2
> > > > >
> > > > >
> 08:15:59.266INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> > > > >
> > > > > Loaded ROW (CompoundBloomFilter) metadata for
> > > > > 8bf8e92031834676b5d40b352120c5f2
> > > > >
> > > > > 08:15:59.282INFOorg.apache.hadoop.hbase.regionserver.Store
> > > > >
> > > > > Completed major compaction of 4 file(s) in CF of
> > > > >
> > > > >
> > > >
> > >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > > > > into 8bf8e92031834676b5d40b352120c5f2, size=76.6 M; total size for
> > > > > store is 76.6 M
> > > > >
> > > > > 08:15:59.283INFO
> > > > > org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest
> > > > >
> > > > > completed compaction:
> > > > >
> > > > >
> > > >
> > >
> >
> regionName=epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.,
> > > > > storeName=CF, fileCount=4, fileSize=76.6 M, priority=3,
> > > > > time=8079772165508216; duration=4sec
> > > > >
> > > > > 08:25:06.274INFOorg.apache.hadoop.hbase.regionserver.wal.HLog
> > > > >
> > > > > Roll
> > > > >
> > > >
> > >
> >
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420860306123,
> > > > > entries=3, filesize=7313. for
> > > > >
> > > > >
> > > >
> > >
> >
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420863906224
> > > > >
> > > > > 08:25:06.274INFOorg.apache.hadoop.hbase.regionserver.wal.HLog
> > > > >
> > > > > moving old hlog file
> > > > >
> > > > >
> > > >
> > >
> >
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
> > > > > whose highest sequenceid is 229359 to
> > > > >
> > > > >
> > > >
> > >
> >
> /hbase/.oldlogs/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
> > > > >
> > > > > 09:15:52.020INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > regionserver60020.periodicFlusher requesting flush for region
> > > > > epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947.
> after
> > a
> > > > > delay of 14844
> > > > >
> > > > > 09:16:02.021INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > regionserver60020.periodicFlusher requesting flush for region
> > > > > epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947.
> after
> > a
> > > > > delay of 21848
> > > > >
> > > > > 09:16:06.880INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > > >
> > > > > Bloom filter type for
> > > > >
> > > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d:
> > > > > ROW, CompoundBloomFilterWriter
> > > > >
> > > > > 09:16:06.880INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > > >
> > > > > Delete Family Bloom filter type for
> > > > >
> > > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d:
> > > > > CompoundBloomFilterWriter
> > > > >
> > > > > 09:16:06.914INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > > >
> > > > > General Bloom and NO DeleteFamily was added to HFile
> > > > >
> > > > >
> > > >
> > >
> >
> (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d)
> > > > >
> > > > > 09:16:06.914INFOorg.apache.hadoop.hbase.regionserver.Store
> > > > >
> > > > > Flushed , sequenceid=229369, memsize=16.3 K, into tmp file
> > > > >
> > > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d
> > > > >
> > > > >
> 09:16:06.932INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> > > > >
> > > > > Loaded ROW (CompoundBloomFilter) metadata for
> > > > > a148cdce49964965b7b6f4728881427d
> > > > >
> > > > >
> 09:16:06.952INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> > > > >
> > > > > Loaded ROW (CompoundBloomFilter) metadata for
> > > > > a148cdce49964965b7b6f4728881427d
> > > > >
> > > > > 09:16:06.952INFOorg.apache.hadoop.hbase.regionserver.Store
> > > > >
> > > > > Added
> > > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/CF/a148cdce49964965b7b6f4728881427d,
> > > > > entries=16, sequenceid=229369, filesize=4.5 K
> > > > >
> > > > > 09:16:06.955INFOorg.apache.hadoop.hbase.regionserver.HRegion
> > > > >
> > > > > Finished memstore flush of ~16.3 K/16648, currentsize=0/0 for
> region
> > > > > epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947. in
> > > > > 90ms, sequenceid=229369, compaction requested=false
> > > > >
> > > > > 09:16:55.986INFOorg.apache.zookeeper.ClientCnxn
> > > > >
> > > > > Client session timed out, have not heard from server in 40003ms for
> > > > > sessionid 0x1493f19431b0005, closing socket connection and
> attempting
> > > > > reconnect
> > > > >
> > > > > 09:16:56.552INFOorg.apache.zookeeper.ClientCnxn
> > > > >
> > > > > Client session timed out, have not heard from server in 40020ms for
> > > > > sessionid 0x1493f19431b0000, closing socket connection and
> attempting
> > > > > reconnect
> > > > >
> > > > > 09:16:57.535INFOorg.apache.zookeeper.ClientCnxn
> > > > >
> > > > > Opening socket connection to server
> > > > > etp-hdfs-n1-sg.passport.local/172.16.160.8:2181. Will not attempt
> to
> > > > > authenticate using SASL (java.lang.SecurityException: Unable to
> > locate
> > > > > a login configuration)
> > > > >
> > > > > 09:16:57.550INFOorg.apache.zookeeper.ClientCnxn
> > > > >
> > > > > Socket connection established to
> > > > > etp-hdfs-n1-sg.passport.local/172.16.160.8:2181, initiating
> session
> > > > >
> > > > > 09:16:58.448INFOorg.apache.zookeeper.ClientCnxn
> > > > >
> > > > > Opening socket connection to server
> > > > > etp-hdfs-n1-sg.passport.local/172.16.160.8:2181. Will not attempt
> to
> > > > > authenticate using SASL (java.lang.SecurityException: Unable to
> > locate
> > > > > a login configuration)
> > > > >
> > > > > 09:16:58.449INFOorg.apache.zookeeper.ClientCnxn
> > > > >
> > > > > Socket connection established to
> > > > > etp-hdfs-n1-sg.passport.local/172.16.160.8:2181, initiating
> session
> > > > >
> > > > > 09:17:20.581INFOorg.apache.zookeeper.ClientCnxn
> > > > >
> > > > > Unable to reconnect to ZooKeeper service, session 0x1493f19431b0005
> > > > > has expired, closing socket connection
> > > > >
> > > > > 09:17:20.582INFO
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
> > > > >
> > > > > This client just lost it's session with ZooKeeper, will
> automatically
> > > > > reconnect when needed.
> > > > >
> > > > > 09:17:20.582INFO
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
> > > > >
> > > > > ZK session expired. This disconnect could have been caused by a
> > > > > network partition or a long-running GC pause, either way it's
> > > > > recommended that you verify your environment.
> > > > >
> > > > > 09:17:20.582INFOorg.apache.zookeeper.ClientCnxn
> > > > >
> > > > > EventThread shut down
> > > > >
> > > > > 09:17:20.584INFOorg.apache.zookeeper.ClientCnxn
> > > > >
> > > > > Unable to reconnect to ZooKeeper service, session 0x1493f19431b0000
> > > > > has expired, closing socket connection
> > > > >
> > > > > 09:17:20.633FATALorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > ABORTING region server
> > > > > etp-hdfs-n1-sg.passport.local,60020,1414102905372:
> > > > > regionserver:60020-0x1493f19431b0000
> > > > > regionserver:60020-0x1493f19431b0000 received expired from
> ZooKeeper,
> > > > > aborting
> > > > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > > > KeeperErrorCode = Session expired
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> > > > > at
> > > > >
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> > > > >
> > > > > 09:17:20.733FATALorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > RegionServer abort: loaded coprocessors are: []
> > > > >
> > > > > 09:17:20.736INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > Dump of metrics: requestsPerSecond=0, numberOfOnlineRegions=3,
> > > > > numberOfStores=3, hlogFileCount=2, hlogFileSizeMB=365,
> > > > > numberOfStorefiles=4, storefileIndexSizeMB=0, rootIndexSizeKB=2,
> > > > > totalStaticIndexSizeKB=650, totalStaticBloomSizeKB=256,
> > > > > memstoreSizeMB=0, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0,
> > > > > readRequestsCount=279141, writeRequestsCount=91793,
> > > > > compactionQueueSize=0, flushQueueSize=0, usedHeapMB=455,
> > > > > maxHeapMB=1015, blockCacheSizeMB=2.66, blockCacheFreeMB=251.26,
> > > > > blockCacheCount=7, blockCacheHitCount=596347,
> > > > > blockCacheMissCount=3360967, blockCacheEvictedCount=36743,
> > > > > blockCacheHitRatio=15%, blockCacheHitCachingRatio=93%,
> > > > > hdfsBlocksLocalityIndex=100, slowHLogAppendCount=0,
> > > > > fsReadLatencyHistogramMean=330098.85,
> > > > > fsReadLatencyHistogramCount=2970773.00,
> > > > > fsReadLatencyHistogramMedian=312261.50,
> > > > > fsReadLatencyHistogram75th=357708.50,
> > > > > fsReadLatencyHistogram95th=3147993.00,
> > > > > fsReadLatencyHistogram99th=4431904.00,
> > > > > fsReadLatencyHistogram999th=4431904.00,
> > > > > fsPreadLatencyHistogramMean=3245622.44,
> > > > > fsPreadLatencyHistogramCount=36742.00,
> > > > > fsPreadLatencyHistogramMedian=3180486.50,
> > > > > fsPreadLatencyHistogram75th=3733068.50,
> > > > > fsPreadLatencyHistogram95th=6424537.75,
> > > > > fsPreadLatencyHistogram99th=24206578.00,
> > > > > fsPreadLatencyHistogram999th=24206578.00,
> > > > > fsWriteLatencyHistogramMean=356278.05,
> > > > > fsWriteLatencyHistogramCount=2983888.00,
> > > > > fsWriteLatencyHistogramMedian=538147.00,
> > > > > fsWriteLatencyHistogram75th=570668.00,
> > > > > fsWriteLatencyHistogram95th=570668.00,
> > > > > fsWriteLatencyHistogram99th=570668.00,
> > > > > fsWriteLatencyHistogram999th=570668.00
> > > > >
> > > > > 09:17:20.749INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > STOPPED: regionserver:60020-0x1493f19431b0000
> > > > > regionserver:60020-0x1493f19431b0000 received expired from
> ZooKeeper,
> > > > > aborting
> > > > >
> > > > > 09:17:20.750INFOorg.apache.zookeeper.ClientCnxn
> > > > >
> > > > > EventThread shut down
> > > > >
> > > > > 09:17:20.755INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > Stopping server on 60020
> > > > >
> > > > > 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > IPC Server handler 0 on 60020: exiting
> > > > >
> > > > > 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > PRI IPC Server handler 1 on 60020: exiting
> > > > >
> > > > > 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > PRI IPC Server handler 0 on 60020: exiting
> > > > >
> > > > > 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > Stopping IPC Server listener on 60020
> > > > >
> > > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > PRI IPC Server handler 3 on 60020: exiting
> > > > >
> > > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > PRI IPC Server handler 8 on 60020: exiting
> > > > >
> > > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > PRI IPC Server handler 4 on 60020: exiting
> > > > >
> > > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > IPC Server handler 3 on 60020: exiting
> > > > >
> > > > > 09:17:20.757INFOorg.apache.hadoop.ipc.
> > > > > HBaseServer
> > > > >
> > > > > PRI IPC Server handler 6 on 60020: exiting
> > > > >
> > > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > IPC Server handler 2 on 60020: exiting
> > > > >
> > > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > IPC Server handler 6 on 60020: exiting
> > > > >
> > > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > PRI IPC Server handler 5 on 60020: exiting
> > > > >
> > > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > PRI IPC Server handler 7 on 60020: exiting
> > > > >
> > > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > PRI IPC Server handler 2 on 60020: exiting
> > > > >
> > > > > 09:17:20.757INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker
> > > > >
> > > > > Sending interrupt to stop the worker thread
> > > > >
> > > > > 09:17:20.758INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > Stopping infoServer
> > > > >
> > > > > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > IPC Server handler 4 on 60020: exiting
> > > > >
> > > > > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > Stopping IPC Server Responder
> > > > >
> > > > > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > IPC Server handler 7 on 60020: exiting
> > > > >
> > > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > IPC Server handler 8 on 60020: exiting
> > > > >
> > > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > IPC Server handler 1 on 60020: exiting
> > > > >
> > > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > PRI IPC Server handler 9 on 60020: exiting
> > > > >
> > > > > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > IPC Server handler 5 on 60020: exiting
> > > > >
> > > > > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > Stopping IPC Server Responder
> > > > >
> > > > > 09:17:20.761INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > IPC Server handler 9 on 60020: exiting
> > > > >
> > > > > 09:17:20.763INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker
> > > > >
> > > > > SplitLogWorker interrupted while waiting for task, exiting:
> > > > > java.lang.InterruptedException
> > > > >
> > > > > 09:17:20.767INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > REPL IPC Server handler 0 on 60020: exiting
> > > > >
> > > > > 09:17:20.767INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker
> > > > >
> > > > > SplitLogWorker etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > > exiting
> > > > >
> > > > > 09:17:20.767INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > REPL IPC Server handler 2 on 60020: exiting
> > > > >
> > > > > 09:17:20.768INFOorg.mortbay.log
> > > > >
> > > > > Stopped SelectChannelConnector@0.0.0.0:60030
> > > > >
> > > > > 09:17:20.768INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > REPL IPC Server handler 1 on 60020: exiting
> > > > >
> > > > > 09:17:20.908INFO
> > > > >
> > >
> org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager
> > > > >
> > > > > Stopping RegionServerSnapshotManager abruptly.
> > > > >
> > > > > 09:17:20.908INFO
> > > > >
> org.apache.hadoop.hbase.regionserver.HRegionServer$CompactionChecker
> > > > >
> > > > > regionserver60020.compactionChecker exiting
> > > > >
> > > > > 09:17:20.908INFOorg.apache.hadoop.hbase.regionserver.LogRoller
> > > > >
> > > > > LogRoller exiting.
> > > > >
> > > > >
> 09:17:20.909INFOorg.apache.hadoop.hbase.regionserver.MemStoreFlusher
> > > > >
> > > > > regionserver60020.cacheFlusher exiting
> > > > >
> > > > > 09:17:20.920INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > aborting server etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > > > >
> > > > > 09:17:20.924INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > Waiting on 3 regions to close
> > > > >
> > > > > 09:17:20.943INFOorg.apache.hadoop.hbase.regionserver.Store
> > > > >
> > > > > Closed CF
> > > > >
> > > > > 09:17:20.949INFOorg.apache.hadoop.hbase.regionserver.HRegion
> > > > >
> > > > > Closed
> epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947.
> > > > >
> > > > > 09:17:20.951INFOorg.apache.hadoop.hbase.regionserver.Store
> > > > >
> > > > > Closed info
> > > > >
> > > > > 09:17:20.953INFOorg.apache.hadoop.hbase.regionserver.HRegion
> > > > >
> > > > > Closed .META.,,1.1028785192
> > > > >
> > > > > 09:17:20.956INFOorg.apache.hadoop.hbase.regionserver.Store
> > > > >
> > > > > Closed CF
> > > > >
> > > > > 09:17:20.957ERRORorg.apache.hadoop.hbase.regionserver.HRegion
> > > > >
> > > > > Memstore size is 13376
> > > > >
> > > > > 09:17:20.957INFOorg.apache.hadoop.hbase.regionserver.HRegion
> > > > >
> > > > > Closed
> > > > >
> > > >
> > >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > > > >
> > > > > 09:17:21.125INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > stopping server etp-hdfs-n1-sg.passport.local,60020,1414102905372;
> > all
> > > > > regions closed.
> > > > >
> > > > > 09:17:21.126INFOorg.apache.hadoop.hbase.regionserver.wal.HLog
> > > > >
> > > > > regionserver60020.logSyncer exiting
> > > > >
> > > > > 09:17:21.240INFOorg.apache.hadoop.hbase.regionserver.Leases
> > > > >
> > > > > regionserver60020 closing leases
> > > > >
> > > > > 09:17:21.241INFOorg.apache.hadoop.hbase.regionserver.Leases
> > > > >
> > > > > regionserver60020 closed leases
> > > > >
> > > > > 09:17:22.021INFO
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer$PeriodicMemstoreFlusher
> > > > >
> > > > > regionserver60020.periodicFlusher exiting
> > > > >
> > > > >
> > 09:17:22.031WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> > > > >
> > > > > Possibly transient ZooKeeper exception:
> > > > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > > > KeeperErrorCode = Session expired for
> > > > > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > > > >
> > > > > 09:17:22.031INFOorg.apache.hadoop.hbase.util.RetryCounter
> > > > >
> > > > > Sleeping 2000ms before retry #1...
> > > > >
> > > > > 09:17:23.958INFOorg.apache.hadoop.hbase.regionserver.Leases
> > > > >
> > > > > regionserver60020.leaseChecker closing leases
> > > > >
> > > > > 09:17:23.959INFOorg.apache.hadoop.hbase.regionserver.Leases
> > > > >
> > > > > regionserver60020.leaseChecker closed leases
> > > > >
> > > > >
> > 09:17:24.032WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> > > > >
> > > > > Possibly transient ZooKeeper exception:
> > > > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > > > KeeperErrorCode = Session expired for
> > > > > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > > > >
> > > > > 09:17:24.032INFOorg.apache.hadoop.hbase.util.RetryCounter
> > > > >
> > > > > Sleeping 4000ms before retry #2...
> > > > >
> > > > >
> > 09:17:28.032WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> > > > >
> > > > > Possibly transient ZooKeeper exception:
> > > > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > > > KeeperErrorCode = Session expired for
> > > > > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > > > >
> > > > > 09:17:28.033INFOorg.apache.hadoop.hbase.util.RetryCounter
> > > > >
> > > > > Sleeping 8000ms before retry #3...
> > > > >
> > > > >
> > 09:17:36.033WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> > > > >
> > > > > Possibly transient ZooKeeper exception:
> > > > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > > > KeeperErrorCode = Session expired for
> > > > > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > > > >
> > > > >
> > 09:17:36.033ERRORorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> > > > >
> > > > > ZooKeeper delete failed after 3 retries
> > > > >
> > > > > 09:17:36.033WARNorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > Failed deleting my ephemeral node
> > > > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > > > KeeperErrorCode = Session expired for
> > > > > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > > > > at
> > > > >
> org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> > > > > at
> > > > >
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> > > > > at
> org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:133)
> > > > > at
> > > > >
> org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1196)
> > > > > at
> > > > >
> org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1185)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1132)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:897)
> > > > > at java.lang.Thread.run(Thread.java:662)
> > > > >
> > > > > 09:17:36.035INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > stopping server etp-hdfs-n1-sg.passport.local,60020,1414102905372;
> > > > > zookeeper connection closed.
> > > > >
> > > > > 09:17:36.035INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > regionserver60020 exiting
> > > > >
> > > > > 09:17:36.051ERROR
> > > > > org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine
> > > > >
> > > > > Region server exiting
> > > > > java.lang.RuntimeException: HRegionServer Aborted
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:65)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:84)
> > > > > at
> org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:4106)
> > > > >
> > > > > 09:17:36.139INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook
> > > > >
> > > > > Shutdown hook starting; hbase.shutdown.hook=true;
> > > > >
> > > > >
> > > >
> > >
> >
> fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@727f3b8a
> > > > >
> > > > > 09:17:36.140INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > STOPPED: Shutdown hook
> > > > >
> > > > > 09:17:36.141INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook
> > > > >
> > > > > Starting fs shutdown hook thread.
> > > > >
> > > > > 09:17:36.142INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook
> > > > >
> > > > > Shutdown hook finished.
> > > > >
> > > > > 10:16:40.584INFOorg.apache.hadoop.hbase.util.VersionInfo
> > > > >
> > > > > HBase 0.94.15-cdh4.7.0
> > > > >
> > > > > 10:16:40.608INFOorg.apache.hadoop.hbase.util.VersionInfo
> > > > >
> > > > > Subversion
> > > > >
> > > >
> > >
> >
> file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.94.15-cdh4.7.0
> > > > > -r Unknown
> > > > >
> > > > > 10:16:40.608INFOorg.apache.hadoop.hbase.util.VersionInfo
> > > > >
> > > > > Compiled by jenkins on Wed May 28 10:48:53 PDT 2014
> > > > >
> > > > > 10:16:40.912INFOorg.apache.hadoop.hbase.util.ServerCommandLine
> > > > >
> > > > > vmName=Java HotSpot(TM) 64-Bit Server VM, vmVendor=Sun Microsystems
> > > > > Inc., vmVersion=20.6-b01
> > > > >
> > > > > 10:16:40.913INFOorg.apache.hadoop.hbase.util.ServerCommandLine
> > > > >
> > > > > vmInputArguments=[-XX:OnOutOfMemoryError=kill, -9, %p, -Xmx1000m,
> > > > > -Djava.net.preferIPv4Stack=true, -Xms1073741824, -Xmx1073741824,
> > > > > -XX:+UseParNewGC, -XX:+UseConcMarkSweepGC,
> > > > > -XX:-CMSConcurrentMTEnabled, -XX:CMSInitiatingOccupancyFraction=70,
> > > > > -XX:+CMSParallelRemarkEnabled, -Dhbase.log.dir=/var/log/hbase,
> > > > >
> > > > >
> > > >
> > >
> >
> -Dhbase.log.file=hbase-cmf-hbase1-REGIONSERVER-etp-hdfs-n1-sg.passport.local.log.out,
> > > > >
> > > > >
> > > >
> > >
> >
> -Dhbase.home.dir=/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hbase,
> > > > > -Dhbase.id.str=, -Dhbase.root.logger=INFO,RFA,
> > > > >
> > > > >
> > > >
> > >
> >
> -Djava.library.path=/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hadoop/lib/native:/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hbase/lib/native/Linux-amd64-64,
> > > > > -Dhbase.security.logger=INFO,DRFAS]
> > > > >
> > > > > 10:16:41.392INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > Starting Thread-2
> > > > >
> > > > > 10:16:41.393INFOorg.apache.hadoop.ipc.sdfHBaseServer
> > > > >
> > > > > Starting Thread-2
> > > > >
> > > > >
> > > > > this FATAL
> > > > >
> > > > > 09:17:20.633FATALorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > ABORTING region server
> > > > > etp-hdfs-n1-sg.passport.local,60020,1414102905372:
> > > > > regionserver:60020-0x1493f19431b0000
> > > > > regionserver:60020-0x1493f19431b0000 received expired from
> ZooKeeper,
> > > > > aborting
> > > > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > > > KeeperErrorCode = Session expired
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> > > > > at
> > > > >
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> > > > >
> > > > > found on each RS
> > > > > I didn' t see any problems with ZK or memory usage. The load is
> > > extremely
> > > > > low, just few put/get in a second, you can see file sizes in logs.
> > > > > What it could be how can i catch the root cause?
> > > > >
> > > >
> > >
> >
>
Re: Strange HBase failure
Posted by Esteban Gutierrez <es...@cloudera.com>.
Hi Serega,
Do you have enough resources allocated for each VM? Just some swapping on
the VMs or the host can make things unstable. Also from the number of
services on each VM sounds like your host should have at least 12GB of free
RAM just for running things smoothly otherwise you might want to try with
less VMs and with some RAM each.
cheers,
esteban.
--
Cloudera, Inc.
On Sun, Jan 11, 2015 at 11:55 PM, Serega Sheypak <se...@gmail.com>
wrote:
> Hi, HBase was down during 08:25 to 09:15
> I was looking into logs, and thinking. I've tried to find something more
> clever. than dummy restart.
> We are using Cloudera distro, each of daemons run in it's own JVM.
> I'll try to find CPU load logs.
> There is really low load,
> Finished memstore flush of ~7.7 K/7840,
>
> Flushed , sequenceid=229369, memsize=16.3 K
>
>
> Completed major compaction of 4 file(s) in CF of
>
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> into 8bf8e92031834676b5d40b352120c5f2, size=76.6 M; total size for
> store is 76.6 M
>
>
> See there are less than 100 MB of data for 3 VMs. It's nothing.
>
>
>
> 2015-01-12 6:38 GMT+03:00 Ted Yu <yu...@gmail.com>:
>
> > Serega:
> > Was the snippet of log from NODE01 ? Looks like NODE01 may have been
> under
> > heavy load - considering the number of daemons running on that node.
> >
> > Please check GC log.
> >
> > Cheers
> >
> > On Sun, Jan 11, 2015 at 6:57 PM, Shuai Lin <li...@gmail.com>
> wrote:
> >
> > > From the log I see no log was produced during 08:25 to 09:15, why did
> > this
> > > happen?
> > >
> > > 08:25:06.274INFOorg.apache.
> > > hadoop.hbase.regionserver.wal.HLog
> > >
> > > moving old hlog file
> > >
> > >
> >
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
> > > whose highest sequenceid is 229359 to
> > >
> > >
> >
> /hbase/.oldlogs/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
> > >
> > > 09:15:52.020INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > >
> > > Regards,
> > > Shuai
> > >
> > > On Mon, Jan 12, 2015 at 3:47 AM, Serega Sheypak <
> > serega.sheypak@gmail.com>
> > > wrote:
> > >
> > > > Hi, I have PoC HBase cluster running on 3 VM
> > > > deployment schema is:
> > > > NODE01 NN, SN, HMaster (HM), RegionServer (RS), Zookeeper server
> (ZK),
> > DN
> > > > NODE02 RegionServer, DN
> > > > NODE03 RegionServer, DN
> > > >
> > > > Suddenly ONLY HBase went offline, all services: HM RS
> > > > HDFS was working, no alerts were there
> > > > ZK server was working, no alerts there.
> > > > VMWare didn't publish any alerts.
> > > > Only restart of HBase service helped.
> > > >
> > > > We are using this:
> > > >
> > http://www.cloudera.com/content/cloudera/en/downloads/cdh/cdh-4-7-0.html
> > > > hbase-0.94.15+113
> > > >
> > > > I made a deep dive into logs and found this stuff:
> > > > 08:15:51.968INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > regionserver60020.periodicFlusher requesting flush for region
> > > >
> > > >
> > >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > > > after a delay of 3026
> > > >
> > > > 08:15:55.011INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > >
> > > > Bloom filter type for
> > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb:
> > > > ROW, CompoundBloomFilterWriter
> > > >
> > > > 08:15:55.012INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > >
> > > > Delete Family Bloom filter type for
> > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb:
> > > > CompoundBloomFilterWriter
> > > >
> > > > 08:15:55.035INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > >
> > > > General Bloom and NO DeleteFamily was added to HFile
> > > >
> > > >
> > >
> >
> (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb)
> > > >
> > > > 08:15:55.035INFOorg.apache.hadoop.hbase.regionserver.Store
> > > >
> > > > Flushed , sequenceid=229362, memsize=7.7 K, into tmp file
> > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb
> > > >
> > > > 08:15:55.053INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> > > >
> > > > Loaded ROW (CompoundBloomFilter) metadata for
> > > > 8e68424066dc4c02a60ca57ec98128fb
> > > >
> > > > 08:15:55.072INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> > > >
> > > > Loaded ROW (CompoundBloomFilter) metadata for
> > > > 8e68424066dc4c02a60ca57ec98128fb
> > > >
> > > > 08:15:55.073INFOorg.apache.hadoop.hbase.regionserver.Store
> > > >
> > > > Added
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/CF/8e68424066dc4c02a60ca57ec98128fb,
> > > > entries=8, sequenceid=229362, filesize=2.7 K
> > > >
> > > > 08:15:55.076INFOorg.apache.hadoop.hbase.regionserver.HRegion
> > > >
> > > > Finished memstore flush of ~7.7 K/7840, currentsize=0/0 for region
> > > >
> > > >
> > >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > > > in 80ms, sequenceid=229362, compaction requested=true
> > > >
> > > > 08:15:55.077INFOorg.apache.hadoop.hbase.regionserver.HRegion
> > > >
> > > > Starting compaction on CF in region
> > > >
> > > >
> > >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > > >
> > > > 08:15:55.077INFOorg.apache.hadoop.hbase.regionserver.Store
> > > >
> > > > Starting compaction of 4 file(s) in CF of
> > > >
> > > >
> > >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > > > into
> > > >
> > >
> >
> tmpdir=hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp,
> > > > seqid=229362, totalSize=76.6 M
> > > >
> > > > 08:15:55.096INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > >
> > > > Bloom filter type for
> > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2:
> > > > ROW, CompoundBloomFilterWriter
> > > >
> > > > 08:15:55.097INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > >
> > > > Delete Family Bloom filter type for
> > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2:
> > > > CompoundBloomFilterWriter
> > > >
> > > > 08:15:59.245INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > >
> > > > General Bloom and NO DeleteFamily was added to HFile
> > > >
> > > >
> > >
> >
> (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2)
> > > >
> > > > 08:15:59.255INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> > > >
> > > > Loaded ROW (CompoundBloomFilter) metadata for
> > > > 8bf8e92031834676b5d40b352120c5f2
> > > >
> > > > 08:15:59.255INFOorg.apache.hadoop.hbase.regionserver.Store
> > > >
> > > > Renaming compacted file at
> > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2
> > > > to
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/CF/8bf8e92031834676b5d40b352120c5f2
> > > >
> > > > 08:15:59.266INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> > > >
> > > > Loaded ROW (CompoundBloomFilter) metadata for
> > > > 8bf8e92031834676b5d40b352120c5f2
> > > >
> > > > 08:15:59.282INFOorg.apache.hadoop.hbase.regionserver.Store
> > > >
> > > > Completed major compaction of 4 file(s) in CF of
> > > >
> > > >
> > >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > > > into 8bf8e92031834676b5d40b352120c5f2, size=76.6 M; total size for
> > > > store is 76.6 M
> > > >
> > > > 08:15:59.283INFO
> > > > org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest
> > > >
> > > > completed compaction:
> > > >
> > > >
> > >
> >
> regionName=epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.,
> > > > storeName=CF, fileCount=4, fileSize=76.6 M, priority=3,
> > > > time=8079772165508216; duration=4sec
> > > >
> > > > 08:25:06.274INFOorg.apache.hadoop.hbase.regionserver.wal.HLog
> > > >
> > > > Roll
> > > >
> > >
> >
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420860306123,
> > > > entries=3, filesize=7313. for
> > > >
> > > >
> > >
> >
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420863906224
> > > >
> > > > 08:25:06.274INFOorg.apache.hadoop.hbase.regionserver.wal.HLog
> > > >
> > > > moving old hlog file
> > > >
> > > >
> > >
> >
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
> > > > whose highest sequenceid is 229359 to
> > > >
> > > >
> > >
> >
> /hbase/.oldlogs/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
> > > >
> > > > 09:15:52.020INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > regionserver60020.periodicFlusher requesting flush for region
> > > > epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947. after
> a
> > > > delay of 14844
> > > >
> > > > 09:16:02.021INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > regionserver60020.periodicFlusher requesting flush for region
> > > > epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947. after
> a
> > > > delay of 21848
> > > >
> > > > 09:16:06.880INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > >
> > > > Bloom filter type for
> > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d:
> > > > ROW, CompoundBloomFilterWriter
> > > >
> > > > 09:16:06.880INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > >
> > > > Delete Family Bloom filter type for
> > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d:
> > > > CompoundBloomFilterWriter
> > > >
> > > > 09:16:06.914INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > >
> > > > General Bloom and NO DeleteFamily was added to HFile
> > > >
> > > >
> > >
> >
> (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d)
> > > >
> > > > 09:16:06.914INFOorg.apache.hadoop.hbase.regionserver.Store
> > > >
> > > > Flushed , sequenceid=229369, memsize=16.3 K, into tmp file
> > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d
> > > >
> > > > 09:16:06.932INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> > > >
> > > > Loaded ROW (CompoundBloomFilter) metadata for
> > > > a148cdce49964965b7b6f4728881427d
> > > >
> > > > 09:16:06.952INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> > > >
> > > > Loaded ROW (CompoundBloomFilter) metadata for
> > > > a148cdce49964965b7b6f4728881427d
> > > >
> > > > 09:16:06.952INFOorg.apache.hadoop.hbase.regionserver.Store
> > > >
> > > > Added
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/CF/a148cdce49964965b7b6f4728881427d,
> > > > entries=16, sequenceid=229369, filesize=4.5 K
> > > >
> > > > 09:16:06.955INFOorg.apache.hadoop.hbase.regionserver.HRegion
> > > >
> > > > Finished memstore flush of ~16.3 K/16648, currentsize=0/0 for region
> > > > epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947. in
> > > > 90ms, sequenceid=229369, compaction requested=false
> > > >
> > > > 09:16:55.986INFOorg.apache.zookeeper.ClientCnxn
> > > >
> > > > Client session timed out, have not heard from server in 40003ms for
> > > > sessionid 0x1493f19431b0005, closing socket connection and attempting
> > > > reconnect
> > > >
> > > > 09:16:56.552INFOorg.apache.zookeeper.ClientCnxn
> > > >
> > > > Client session timed out, have not heard from server in 40020ms for
> > > > sessionid 0x1493f19431b0000, closing socket connection and attempting
> > > > reconnect
> > > >
> > > > 09:16:57.535INFOorg.apache.zookeeper.ClientCnxn
> > > >
> > > > Opening socket connection to server
> > > > etp-hdfs-n1-sg.passport.local/172.16.160.8:2181. Will not attempt to
> > > > authenticate using SASL (java.lang.SecurityException: Unable to
> locate
> > > > a login configuration)
> > > >
> > > > 09:16:57.550INFOorg.apache.zookeeper.ClientCnxn
> > > >
> > > > Socket connection established to
> > > > etp-hdfs-n1-sg.passport.local/172.16.160.8:2181, initiating session
> > > >
> > > > 09:16:58.448INFOorg.apache.zookeeper.ClientCnxn
> > > >
> > > > Opening socket connection to server
> > > > etp-hdfs-n1-sg.passport.local/172.16.160.8:2181. Will not attempt to
> > > > authenticate using SASL (java.lang.SecurityException: Unable to
> locate
> > > > a login configuration)
> > > >
> > > > 09:16:58.449INFOorg.apache.zookeeper.ClientCnxn
> > > >
> > > > Socket connection established to
> > > > etp-hdfs-n1-sg.passport.local/172.16.160.8:2181, initiating session
> > > >
> > > > 09:17:20.581INFOorg.apache.zookeeper.ClientCnxn
> > > >
> > > > Unable to reconnect to ZooKeeper service, session 0x1493f19431b0005
> > > > has expired, closing socket connection
> > > >
> > > > 09:17:20.582INFO
> > > >
> > >
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
> > > >
> > > > This client just lost it's session with ZooKeeper, will automatically
> > > > reconnect when needed.
> > > >
> > > > 09:17:20.582INFO
> > > >
> > >
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
> > > >
> > > > ZK session expired. This disconnect could have been caused by a
> > > > network partition or a long-running GC pause, either way it's
> > > > recommended that you verify your environment.
> > > >
> > > > 09:17:20.582INFOorg.apache.zookeeper.ClientCnxn
> > > >
> > > > EventThread shut down
> > > >
> > > > 09:17:20.584INFOorg.apache.zookeeper.ClientCnxn
> > > >
> > > > Unable to reconnect to ZooKeeper service, session 0x1493f19431b0000
> > > > has expired, closing socket connection
> > > >
> > > > 09:17:20.633FATALorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > ABORTING region server
> > > > etp-hdfs-n1-sg.passport.local,60020,1414102905372:
> > > > regionserver:60020-0x1493f19431b0000
> > > > regionserver:60020-0x1493f19431b0000 received expired from ZooKeeper,
> > > > aborting
> > > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > > KeeperErrorCode = Session expired
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
> > > > at
> > > >
> > >
> >
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> > > > at
> > > > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> > > >
> > > > 09:17:20.733FATALorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > RegionServer abort: loaded coprocessors are: []
> > > >
> > > > 09:17:20.736INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > Dump of metrics: requestsPerSecond=0, numberOfOnlineRegions=3,
> > > > numberOfStores=3, hlogFileCount=2, hlogFileSizeMB=365,
> > > > numberOfStorefiles=4, storefileIndexSizeMB=0, rootIndexSizeKB=2,
> > > > totalStaticIndexSizeKB=650, totalStaticBloomSizeKB=256,
> > > > memstoreSizeMB=0, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0,
> > > > readRequestsCount=279141, writeRequestsCount=91793,
> > > > compactionQueueSize=0, flushQueueSize=0, usedHeapMB=455,
> > > > maxHeapMB=1015, blockCacheSizeMB=2.66, blockCacheFreeMB=251.26,
> > > > blockCacheCount=7, blockCacheHitCount=596347,
> > > > blockCacheMissCount=3360967, blockCacheEvictedCount=36743,
> > > > blockCacheHitRatio=15%, blockCacheHitCachingRatio=93%,
> > > > hdfsBlocksLocalityIndex=100, slowHLogAppendCount=0,
> > > > fsReadLatencyHistogramMean=330098.85,
> > > > fsReadLatencyHistogramCount=2970773.00,
> > > > fsReadLatencyHistogramMedian=312261.50,
> > > > fsReadLatencyHistogram75th=357708.50,
> > > > fsReadLatencyHistogram95th=3147993.00,
> > > > fsReadLatencyHistogram99th=4431904.00,
> > > > fsReadLatencyHistogram999th=4431904.00,
> > > > fsPreadLatencyHistogramMean=3245622.44,
> > > > fsPreadLatencyHistogramCount=36742.00,
> > > > fsPreadLatencyHistogramMedian=3180486.50,
> > > > fsPreadLatencyHistogram75th=3733068.50,
> > > > fsPreadLatencyHistogram95th=6424537.75,
> > > > fsPreadLatencyHistogram99th=24206578.00,
> > > > fsPreadLatencyHistogram999th=24206578.00,
> > > > fsWriteLatencyHistogramMean=356278.05,
> > > > fsWriteLatencyHistogramCount=2983888.00,
> > > > fsWriteLatencyHistogramMedian=538147.00,
> > > > fsWriteLatencyHistogram75th=570668.00,
> > > > fsWriteLatencyHistogram95th=570668.00,
> > > > fsWriteLatencyHistogram99th=570668.00,
> > > > fsWriteLatencyHistogram999th=570668.00
> > > >
> > > > 09:17:20.749INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > STOPPED: regionserver:60020-0x1493f19431b0000
> > > > regionserver:60020-0x1493f19431b0000 received expired from ZooKeeper,
> > > > aborting
> > > >
> > > > 09:17:20.750INFOorg.apache.zookeeper.ClientCnxn
> > > >
> > > > EventThread shut down
> > > >
> > > > 09:17:20.755INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > Stopping server on 60020
> > > >
> > > > 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > IPC Server handler 0 on 60020: exiting
> > > >
> > > > 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > PRI IPC Server handler 1 on 60020: exiting
> > > >
> > > > 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > PRI IPC Server handler 0 on 60020: exiting
> > > >
> > > > 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > Stopping IPC Server listener on 60020
> > > >
> > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > PRI IPC Server handler 3 on 60020: exiting
> > > >
> > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > PRI IPC Server handler 8 on 60020: exiting
> > > >
> > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > PRI IPC Server handler 4 on 60020: exiting
> > > >
> > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > IPC Server handler 3 on 60020: exiting
> > > >
> > > > 09:17:20.757INFOorg.apache.hadoop.ipc.
> > > > HBaseServer
> > > >
> > > > PRI IPC Server handler 6 on 60020: exiting
> > > >
> > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > IPC Server handler 2 on 60020: exiting
> > > >
> > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > IPC Server handler 6 on 60020: exiting
> > > >
> > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > PRI IPC Server handler 5 on 60020: exiting
> > > >
> > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > PRI IPC Server handler 7 on 60020: exiting
> > > >
> > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > PRI IPC Server handler 2 on 60020: exiting
> > > >
> > > > 09:17:20.757INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker
> > > >
> > > > Sending interrupt to stop the worker thread
> > > >
> > > > 09:17:20.758INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > Stopping infoServer
> > > >
> > > > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > IPC Server handler 4 on 60020: exiting
> > > >
> > > > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > Stopping IPC Server Responder
> > > >
> > > > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > IPC Server handler 7 on 60020: exiting
> > > >
> > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > IPC Server handler 8 on 60020: exiting
> > > >
> > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > IPC Server handler 1 on 60020: exiting
> > > >
> > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > PRI IPC Server handler 9 on 60020: exiting
> > > >
> > > > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > IPC Server handler 5 on 60020: exiting
> > > >
> > > > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > Stopping IPC Server Responder
> > > >
> > > > 09:17:20.761INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > IPC Server handler 9 on 60020: exiting
> > > >
> > > > 09:17:20.763INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker
> > > >
> > > > SplitLogWorker interrupted while waiting for task, exiting:
> > > > java.lang.InterruptedException
> > > >
> > > > 09:17:20.767INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > REPL IPC Server handler 0 on 60020: exiting
> > > >
> > > > 09:17:20.767INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker
> > > >
> > > > SplitLogWorker etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > exiting
> > > >
> > > > 09:17:20.767INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > REPL IPC Server handler 2 on 60020: exiting
> > > >
> > > > 09:17:20.768INFOorg.mortbay.log
> > > >
> > > > Stopped SelectChannelConnector@0.0.0.0:60030
> > > >
> > > > 09:17:20.768INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > REPL IPC Server handler 1 on 60020: exiting
> > > >
> > > > 09:17:20.908INFO
> > > >
> > org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager
> > > >
> > > > Stopping RegionServerSnapshotManager abruptly.
> > > >
> > > > 09:17:20.908INFO
> > > > org.apache.hadoop.hbase.regionserver.HRegionServer$CompactionChecker
> > > >
> > > > regionserver60020.compactionChecker exiting
> > > >
> > > > 09:17:20.908INFOorg.apache.hadoop.hbase.regionserver.LogRoller
> > > >
> > > > LogRoller exiting.
> > > >
> > > > 09:17:20.909INFOorg.apache.hadoop.hbase.regionserver.MemStoreFlusher
> > > >
> > > > regionserver60020.cacheFlusher exiting
> > > >
> > > > 09:17:20.920INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > aborting server etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > > >
> > > > 09:17:20.924INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > Waiting on 3 regions to close
> > > >
> > > > 09:17:20.943INFOorg.apache.hadoop.hbase.regionserver.Store
> > > >
> > > > Closed CF
> > > >
> > > > 09:17:20.949INFOorg.apache.hadoop.hbase.regionserver.HRegion
> > > >
> > > > Closed epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947.
> > > >
> > > > 09:17:20.951INFOorg.apache.hadoop.hbase.regionserver.Store
> > > >
> > > > Closed info
> > > >
> > > > 09:17:20.953INFOorg.apache.hadoop.hbase.regionserver.HRegion
> > > >
> > > > Closed .META.,,1.1028785192
> > > >
> > > > 09:17:20.956INFOorg.apache.hadoop.hbase.regionserver.Store
> > > >
> > > > Closed CF
> > > >
> > > > 09:17:20.957ERRORorg.apache.hadoop.hbase.regionserver.HRegion
> > > >
> > > > Memstore size is 13376
> > > >
> > > > 09:17:20.957INFOorg.apache.hadoop.hbase.regionserver.HRegion
> > > >
> > > > Closed
> > > >
> > >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > > >
> > > > 09:17:21.125INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > stopping server etp-hdfs-n1-sg.passport.local,60020,1414102905372;
> all
> > > > regions closed.
> > > >
> > > > 09:17:21.126INFOorg.apache.hadoop.hbase.regionserver.wal.HLog
> > > >
> > > > regionserver60020.logSyncer exiting
> > > >
> > > > 09:17:21.240INFOorg.apache.hadoop.hbase.regionserver.Leases
> > > >
> > > > regionserver60020 closing leases
> > > >
> > > > 09:17:21.241INFOorg.apache.hadoop.hbase.regionserver.Leases
> > > >
> > > > regionserver60020 closed leases
> > > >
> > > > 09:17:22.021INFO
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer$PeriodicMemstoreFlusher
> > > >
> > > > regionserver60020.periodicFlusher exiting
> > > >
> > > >
> 09:17:22.031WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> > > >
> > > > Possibly transient ZooKeeper exception:
> > > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > > KeeperErrorCode = Session expired for
> > > > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > > >
> > > > 09:17:22.031INFOorg.apache.hadoop.hbase.util.RetryCounter
> > > >
> > > > Sleeping 2000ms before retry #1...
> > > >
> > > > 09:17:23.958INFOorg.apache.hadoop.hbase.regionserver.Leases
> > > >
> > > > regionserver60020.leaseChecker closing leases
> > > >
> > > > 09:17:23.959INFOorg.apache.hadoop.hbase.regionserver.Leases
> > > >
> > > > regionserver60020.leaseChecker closed leases
> > > >
> > > >
> 09:17:24.032WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> > > >
> > > > Possibly transient ZooKeeper exception:
> > > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > > KeeperErrorCode = Session expired for
> > > > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > > >
> > > > 09:17:24.032INFOorg.apache.hadoop.hbase.util.RetryCounter
> > > >
> > > > Sleeping 4000ms before retry #2...
> > > >
> > > >
> 09:17:28.032WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> > > >
> > > > Possibly transient ZooKeeper exception:
> > > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > > KeeperErrorCode = Session expired for
> > > > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > > >
> > > > 09:17:28.033INFOorg.apache.hadoop.hbase.util.RetryCounter
> > > >
> > > > Sleeping 8000ms before retry #3...
> > > >
> > > >
> 09:17:36.033WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> > > >
> > > > Possibly transient ZooKeeper exception:
> > > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > > KeeperErrorCode = Session expired for
> > > > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > > >
> > > >
> 09:17:36.033ERRORorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> > > >
> > > > ZooKeeper delete failed after 3 retries
> > > >
> > > > 09:17:36.033WARNorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > Failed deleting my ephemeral node
> > > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > > KeeperErrorCode = Session expired for
> > > > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > > > at
> > > > org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> > > > at
> > > > org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> > > > at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:133)
> > > > at
> > > > org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1196)
> > > > at
> > > > org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1185)
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1132)
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:897)
> > > > at java.lang.Thread.run(Thread.java:662)
> > > >
> > > > 09:17:36.035INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > stopping server etp-hdfs-n1-sg.passport.local,60020,1414102905372;
> > > > zookeeper connection closed.
> > > >
> > > > 09:17:36.035INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > regionserver60020 exiting
> > > >
> > > > 09:17:36.051ERROR
> > > > org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine
> > > >
> > > > Region server exiting
> > > > java.lang.RuntimeException: HRegionServer Aborted
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:65)
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:84)
> > > > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:4106)
> > > >
> > > > 09:17:36.139INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook
> > > >
> > > > Shutdown hook starting; hbase.shutdown.hook=true;
> > > >
> > > >
> > >
> >
> fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@727f3b8a
> > > >
> > > > 09:17:36.140INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > STOPPED: Shutdown hook
> > > >
> > > > 09:17:36.141INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook
> > > >
> > > > Starting fs shutdown hook thread.
> > > >
> > > > 09:17:36.142INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook
> > > >
> > > > Shutdown hook finished.
> > > >
> > > > 10:16:40.584INFOorg.apache.hadoop.hbase.util.VersionInfo
> > > >
> > > > HBase 0.94.15-cdh4.7.0
> > > >
> > > > 10:16:40.608INFOorg.apache.hadoop.hbase.util.VersionInfo
> > > >
> > > > Subversion
> > > >
> > >
> >
> file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.94.15-cdh4.7.0
> > > > -r Unknown
> > > >
> > > > 10:16:40.608INFOorg.apache.hadoop.hbase.util.VersionInfo
> > > >
> > > > Compiled by jenkins on Wed May 28 10:48:53 PDT 2014
> > > >
> > > > 10:16:40.912INFOorg.apache.hadoop.hbase.util.ServerCommandLine
> > > >
> > > > vmName=Java HotSpot(TM) 64-Bit Server VM, vmVendor=Sun Microsystems
> > > > Inc., vmVersion=20.6-b01
> > > >
> > > > 10:16:40.913INFOorg.apache.hadoop.hbase.util.ServerCommandLine
> > > >
> > > > vmInputArguments=[-XX:OnOutOfMemoryError=kill, -9, %p, -Xmx1000m,
> > > > -Djava.net.preferIPv4Stack=true, -Xms1073741824, -Xmx1073741824,
> > > > -XX:+UseParNewGC, -XX:+UseConcMarkSweepGC,
> > > > -XX:-CMSConcurrentMTEnabled, -XX:CMSInitiatingOccupancyFraction=70,
> > > > -XX:+CMSParallelRemarkEnabled, -Dhbase.log.dir=/var/log/hbase,
> > > >
> > > >
> > >
> >
> -Dhbase.log.file=hbase-cmf-hbase1-REGIONSERVER-etp-hdfs-n1-sg.passport.local.log.out,
> > > >
> > > >
> > >
> >
> -Dhbase.home.dir=/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hbase,
> > > > -Dhbase.id.str=, -Dhbase.root.logger=INFO,RFA,
> > > >
> > > >
> > >
> >
> -Djava.library.path=/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hadoop/lib/native:/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hbase/lib/native/Linux-amd64-64,
> > > > -Dhbase.security.logger=INFO,DRFAS]
> > > >
> > > > 10:16:41.392INFOorg.apache.hadoop.ipc.HBaseServer
> > > >
> > > > Starting Thread-2
> > > >
> > > > 10:16:41.393INFOorg.apache.hadoop.ipc.sdfHBaseServer
> > > >
> > > > Starting Thread-2
> > > >
> > > >
> > > > this FATAL
> > > >
> > > > 09:17:20.633FATALorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > ABORTING region server
> > > > etp-hdfs-n1-sg.passport.local,60020,1414102905372:
> > > > regionserver:60020-0x1493f19431b0000
> > > > regionserver:60020-0x1493f19431b0000 received expired from ZooKeeper,
> > > > aborting
> > > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > > KeeperErrorCode = Session expired
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
> > > > at
> > > >
> > >
> >
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> > > > at
> > > > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> > > >
> > > > found on each RS
> > > > I didn' t see any problems with ZK or memory usage. The load is
> > extremely
> > > > low, just few put/get in a second, you can see file sizes in logs.
> > > > What it could be how can i catch the root cause?
> > > >
> > >
> >
>
Re: Strange HBase failure
Posted by Serega Sheypak <se...@gmail.com>.
Hi, HBase was down during 08:25 to 09:15
I was looking into logs, and thinking. I've tried to find something more
clever. than dummy restart.
We are using Cloudera distro, each of daemons run in it's own JVM.
I'll try to find CPU load logs.
There is really low load,
Finished memstore flush of ~7.7 K/7840,
Flushed , sequenceid=229369, memsize=16.3 K
Completed major compaction of 4 file(s) in CF of
epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
into 8bf8e92031834676b5d40b352120c5f2, size=76.6 M; total size for
store is 76.6 M
See there are less than 100 MB of data for 3 VMs. It's nothing.
2015-01-12 6:38 GMT+03:00 Ted Yu <yu...@gmail.com>:
> Serega:
> Was the snippet of log from NODE01 ? Looks like NODE01 may have been under
> heavy load - considering the number of daemons running on that node.
>
> Please check GC log.
>
> Cheers
>
> On Sun, Jan 11, 2015 at 6:57 PM, Shuai Lin <li...@gmail.com> wrote:
>
> > From the log I see no log was produced during 08:25 to 09:15, why did
> this
> > happen?
> >
> > 08:25:06.274INFOorg.apache.
> > hadoop.hbase.regionserver.wal.HLog
> >
> > moving old hlog file
> >
> >
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
> > whose highest sequenceid is 229359 to
> >
> >
> /hbase/.oldlogs/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
> >
> > 09:15:52.020INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > Regards,
> > Shuai
> >
> > On Mon, Jan 12, 2015 at 3:47 AM, Serega Sheypak <
> serega.sheypak@gmail.com>
> > wrote:
> >
> > > Hi, I have PoC HBase cluster running on 3 VM
> > > deployment schema is:
> > > NODE01 NN, SN, HMaster (HM), RegionServer (RS), Zookeeper server (ZK),
> DN
> > > NODE02 RegionServer, DN
> > > NODE03 RegionServer, DN
> > >
> > > Suddenly ONLY HBase went offline, all services: HM RS
> > > HDFS was working, no alerts were there
> > > ZK server was working, no alerts there.
> > > VMWare didn't publish any alerts.
> > > Only restart of HBase service helped.
> > >
> > > We are using this:
> > >
> http://www.cloudera.com/content/cloudera/en/downloads/cdh/cdh-4-7-0.html
> > > hbase-0.94.15+113
> > >
> > > I made a deep dive into logs and found this stuff:
> > > 08:15:51.968INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > >
> > > regionserver60020.periodicFlusher requesting flush for region
> > >
> > >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > > after a delay of 3026
> > >
> > > 08:15:55.011INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > >
> > > Bloom filter type for
> > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb:
> > > ROW, CompoundBloomFilterWriter
> > >
> > > 08:15:55.012INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > >
> > > Delete Family Bloom filter type for
> > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb:
> > > CompoundBloomFilterWriter
> > >
> > > 08:15:55.035INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > >
> > > General Bloom and NO DeleteFamily was added to HFile
> > >
> > >
> >
> (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb)
> > >
> > > 08:15:55.035INFOorg.apache.hadoop.hbase.regionserver.Store
> > >
> > > Flushed , sequenceid=229362, memsize=7.7 K, into tmp file
> > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb
> > >
> > > 08:15:55.053INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> > >
> > > Loaded ROW (CompoundBloomFilter) metadata for
> > > 8e68424066dc4c02a60ca57ec98128fb
> > >
> > > 08:15:55.072INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> > >
> > > Loaded ROW (CompoundBloomFilter) metadata for
> > > 8e68424066dc4c02a60ca57ec98128fb
> > >
> > > 08:15:55.073INFOorg.apache.hadoop.hbase.regionserver.Store
> > >
> > > Added
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/CF/8e68424066dc4c02a60ca57ec98128fb,
> > > entries=8, sequenceid=229362, filesize=2.7 K
> > >
> > > 08:15:55.076INFOorg.apache.hadoop.hbase.regionserver.HRegion
> > >
> > > Finished memstore flush of ~7.7 K/7840, currentsize=0/0 for region
> > >
> > >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > > in 80ms, sequenceid=229362, compaction requested=true
> > >
> > > 08:15:55.077INFOorg.apache.hadoop.hbase.regionserver.HRegion
> > >
> > > Starting compaction on CF in region
> > >
> > >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > >
> > > 08:15:55.077INFOorg.apache.hadoop.hbase.regionserver.Store
> > >
> > > Starting compaction of 4 file(s) in CF of
> > >
> > >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > > into
> > >
> >
> tmpdir=hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp,
> > > seqid=229362, totalSize=76.6 M
> > >
> > > 08:15:55.096INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > >
> > > Bloom filter type for
> > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2:
> > > ROW, CompoundBloomFilterWriter
> > >
> > > 08:15:55.097INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > >
> > > Delete Family Bloom filter type for
> > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2:
> > > CompoundBloomFilterWriter
> > >
> > > 08:15:59.245INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > >
> > > General Bloom and NO DeleteFamily was added to HFile
> > >
> > >
> >
> (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2)
> > >
> > > 08:15:59.255INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> > >
> > > Loaded ROW (CompoundBloomFilter) metadata for
> > > 8bf8e92031834676b5d40b352120c5f2
> > >
> > > 08:15:59.255INFOorg.apache.hadoop.hbase.regionserver.Store
> > >
> > > Renaming compacted file at
> > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2
> > > to
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/CF/8bf8e92031834676b5d40b352120c5f2
> > >
> > > 08:15:59.266INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> > >
> > > Loaded ROW (CompoundBloomFilter) metadata for
> > > 8bf8e92031834676b5d40b352120c5f2
> > >
> > > 08:15:59.282INFOorg.apache.hadoop.hbase.regionserver.Store
> > >
> > > Completed major compaction of 4 file(s) in CF of
> > >
> > >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > > into 8bf8e92031834676b5d40b352120c5f2, size=76.6 M; total size for
> > > store is 76.6 M
> > >
> > > 08:15:59.283INFO
> > > org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest
> > >
> > > completed compaction:
> > >
> > >
> >
> regionName=epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.,
> > > storeName=CF, fileCount=4, fileSize=76.6 M, priority=3,
> > > time=8079772165508216; duration=4sec
> > >
> > > 08:25:06.274INFOorg.apache.hadoop.hbase.regionserver.wal.HLog
> > >
> > > Roll
> > >
> >
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420860306123,
> > > entries=3, filesize=7313. for
> > >
> > >
> >
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420863906224
> > >
> > > 08:25:06.274INFOorg.apache.hadoop.hbase.regionserver.wal.HLog
> > >
> > > moving old hlog file
> > >
> > >
> >
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
> > > whose highest sequenceid is 229359 to
> > >
> > >
> >
> /hbase/.oldlogs/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
> > >
> > > 09:15:52.020INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > >
> > > regionserver60020.periodicFlusher requesting flush for region
> > > epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947. after a
> > > delay of 14844
> > >
> > > 09:16:02.021INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > >
> > > regionserver60020.periodicFlusher requesting flush for region
> > > epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947. after a
> > > delay of 21848
> > >
> > > 09:16:06.880INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > >
> > > Bloom filter type for
> > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d:
> > > ROW, CompoundBloomFilterWriter
> > >
> > > 09:16:06.880INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > >
> > > Delete Family Bloom filter type for
> > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d:
> > > CompoundBloomFilterWriter
> > >
> > > 09:16:06.914INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > >
> > > General Bloom and NO DeleteFamily was added to HFile
> > >
> > >
> >
> (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d)
> > >
> > > 09:16:06.914INFOorg.apache.hadoop.hbase.regionserver.Store
> > >
> > > Flushed , sequenceid=229369, memsize=16.3 K, into tmp file
> > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d
> > >
> > > 09:16:06.932INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> > >
> > > Loaded ROW (CompoundBloomFilter) metadata for
> > > a148cdce49964965b7b6f4728881427d
> > >
> > > 09:16:06.952INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> > >
> > > Loaded ROW (CompoundBloomFilter) metadata for
> > > a148cdce49964965b7b6f4728881427d
> > >
> > > 09:16:06.952INFOorg.apache.hadoop.hbase.regionserver.Store
> > >
> > > Added
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/CF/a148cdce49964965b7b6f4728881427d,
> > > entries=16, sequenceid=229369, filesize=4.5 K
> > >
> > > 09:16:06.955INFOorg.apache.hadoop.hbase.regionserver.HRegion
> > >
> > > Finished memstore flush of ~16.3 K/16648, currentsize=0/0 for region
> > > epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947. in
> > > 90ms, sequenceid=229369, compaction requested=false
> > >
> > > 09:16:55.986INFOorg.apache.zookeeper.ClientCnxn
> > >
> > > Client session timed out, have not heard from server in 40003ms for
> > > sessionid 0x1493f19431b0005, closing socket connection and attempting
> > > reconnect
> > >
> > > 09:16:56.552INFOorg.apache.zookeeper.ClientCnxn
> > >
> > > Client session timed out, have not heard from server in 40020ms for
> > > sessionid 0x1493f19431b0000, closing socket connection and attempting
> > > reconnect
> > >
> > > 09:16:57.535INFOorg.apache.zookeeper.ClientCnxn
> > >
> > > Opening socket connection to server
> > > etp-hdfs-n1-sg.passport.local/172.16.160.8:2181. Will not attempt to
> > > authenticate using SASL (java.lang.SecurityException: Unable to locate
> > > a login configuration)
> > >
> > > 09:16:57.550INFOorg.apache.zookeeper.ClientCnxn
> > >
> > > Socket connection established to
> > > etp-hdfs-n1-sg.passport.local/172.16.160.8:2181, initiating session
> > >
> > > 09:16:58.448INFOorg.apache.zookeeper.ClientCnxn
> > >
> > > Opening socket connection to server
> > > etp-hdfs-n1-sg.passport.local/172.16.160.8:2181. Will not attempt to
> > > authenticate using SASL (java.lang.SecurityException: Unable to locate
> > > a login configuration)
> > >
> > > 09:16:58.449INFOorg.apache.zookeeper.ClientCnxn
> > >
> > > Socket connection established to
> > > etp-hdfs-n1-sg.passport.local/172.16.160.8:2181, initiating session
> > >
> > > 09:17:20.581INFOorg.apache.zookeeper.ClientCnxn
> > >
> > > Unable to reconnect to ZooKeeper service, session 0x1493f19431b0005
> > > has expired, closing socket connection
> > >
> > > 09:17:20.582INFO
> > >
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
> > >
> > > This client just lost it's session with ZooKeeper, will automatically
> > > reconnect when needed.
> > >
> > > 09:17:20.582INFO
> > >
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
> > >
> > > ZK session expired. This disconnect could have been caused by a
> > > network partition or a long-running GC pause, either way it's
> > > recommended that you verify your environment.
> > >
> > > 09:17:20.582INFOorg.apache.zookeeper.ClientCnxn
> > >
> > > EventThread shut down
> > >
> > > 09:17:20.584INFOorg.apache.zookeeper.ClientCnxn
> > >
> > > Unable to reconnect to ZooKeeper service, session 0x1493f19431b0000
> > > has expired, closing socket connection
> > >
> > > 09:17:20.633FATALorg.apache.hadoop.hbase.regionserver.HRegionServer
> > >
> > > ABORTING region server
> > > etp-hdfs-n1-sg.passport.local,60020,1414102905372:
> > > regionserver:60020-0x1493f19431b0000
> > > regionserver:60020-0x1493f19431b0000 received expired from ZooKeeper,
> > > aborting
> > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > KeeperErrorCode = Session expired
> > > at
> > >
> >
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
> > > at
> > >
> >
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
> > > at
> > >
> >
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> > > at
> > > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> > >
> > > 09:17:20.733FATALorg.apache.hadoop.hbase.regionserver.HRegionServer
> > >
> > > RegionServer abort: loaded coprocessors are: []
> > >
> > > 09:17:20.736INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > >
> > > Dump of metrics: requestsPerSecond=0, numberOfOnlineRegions=3,
> > > numberOfStores=3, hlogFileCount=2, hlogFileSizeMB=365,
> > > numberOfStorefiles=4, storefileIndexSizeMB=0, rootIndexSizeKB=2,
> > > totalStaticIndexSizeKB=650, totalStaticBloomSizeKB=256,
> > > memstoreSizeMB=0, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0,
> > > readRequestsCount=279141, writeRequestsCount=91793,
> > > compactionQueueSize=0, flushQueueSize=0, usedHeapMB=455,
> > > maxHeapMB=1015, blockCacheSizeMB=2.66, blockCacheFreeMB=251.26,
> > > blockCacheCount=7, blockCacheHitCount=596347,
> > > blockCacheMissCount=3360967, blockCacheEvictedCount=36743,
> > > blockCacheHitRatio=15%, blockCacheHitCachingRatio=93%,
> > > hdfsBlocksLocalityIndex=100, slowHLogAppendCount=0,
> > > fsReadLatencyHistogramMean=330098.85,
> > > fsReadLatencyHistogramCount=2970773.00,
> > > fsReadLatencyHistogramMedian=312261.50,
> > > fsReadLatencyHistogram75th=357708.50,
> > > fsReadLatencyHistogram95th=3147993.00,
> > > fsReadLatencyHistogram99th=4431904.00,
> > > fsReadLatencyHistogram999th=4431904.00,
> > > fsPreadLatencyHistogramMean=3245622.44,
> > > fsPreadLatencyHistogramCount=36742.00,
> > > fsPreadLatencyHistogramMedian=3180486.50,
> > > fsPreadLatencyHistogram75th=3733068.50,
> > > fsPreadLatencyHistogram95th=6424537.75,
> > > fsPreadLatencyHistogram99th=24206578.00,
> > > fsPreadLatencyHistogram999th=24206578.00,
> > > fsWriteLatencyHistogramMean=356278.05,
> > > fsWriteLatencyHistogramCount=2983888.00,
> > > fsWriteLatencyHistogramMedian=538147.00,
> > > fsWriteLatencyHistogram75th=570668.00,
> > > fsWriteLatencyHistogram95th=570668.00,
> > > fsWriteLatencyHistogram99th=570668.00,
> > > fsWriteLatencyHistogram999th=570668.00
> > >
> > > 09:17:20.749INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > >
> > > STOPPED: regionserver:60020-0x1493f19431b0000
> > > regionserver:60020-0x1493f19431b0000 received expired from ZooKeeper,
> > > aborting
> > >
> > > 09:17:20.750INFOorg.apache.zookeeper.ClientCnxn
> > >
> > > EventThread shut down
> > >
> > > 09:17:20.755INFOorg.apache.hadoop.ipc.HBaseServer
> > >
> > > Stopping server on 60020
> > >
> > > 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
> > >
> > > IPC Server handler 0 on 60020: exiting
> > >
> > > 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
> > >
> > > PRI IPC Server handler 1 on 60020: exiting
> > >
> > > 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
> > >
> > > PRI IPC Server handler 0 on 60020: exiting
> > >
> > > 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
> > >
> > > Stopping IPC Server listener on 60020
> > >
> > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > >
> > > PRI IPC Server handler 3 on 60020: exiting
> > >
> > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > >
> > > PRI IPC Server handler 8 on 60020: exiting
> > >
> > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > >
> > > PRI IPC Server handler 4 on 60020: exiting
> > >
> > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > >
> > > IPC Server handler 3 on 60020: exiting
> > >
> > > 09:17:20.757INFOorg.apache.hadoop.ipc.
> > > HBaseServer
> > >
> > > PRI IPC Server handler 6 on 60020: exiting
> > >
> > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > >
> > > IPC Server handler 2 on 60020: exiting
> > >
> > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > >
> > > IPC Server handler 6 on 60020: exiting
> > >
> > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > >
> > > PRI IPC Server handler 5 on 60020: exiting
> > >
> > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > >
> > > PRI IPC Server handler 7 on 60020: exiting
> > >
> > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > >
> > > PRI IPC Server handler 2 on 60020: exiting
> > >
> > > 09:17:20.757INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker
> > >
> > > Sending interrupt to stop the worker thread
> > >
> > > 09:17:20.758INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > >
> > > Stopping infoServer
> > >
> > > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> > >
> > > IPC Server handler 4 on 60020: exiting
> > >
> > > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> > >
> > > Stopping IPC Server Responder
> > >
> > > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> > >
> > > IPC Server handler 7 on 60020: exiting
> > >
> > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > >
> > > IPC Server handler 8 on 60020: exiting
> > >
> > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > >
> > > IPC Server handler 1 on 60020: exiting
> > >
> > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > >
> > > PRI IPC Server handler 9 on 60020: exiting
> > >
> > > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> > >
> > > IPC Server handler 5 on 60020: exiting
> > >
> > > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> > >
> > > Stopping IPC Server Responder
> > >
> > > 09:17:20.761INFOorg.apache.hadoop.ipc.HBaseServer
> > >
> > > IPC Server handler 9 on 60020: exiting
> > >
> > > 09:17:20.763INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker
> > >
> > > SplitLogWorker interrupted while waiting for task, exiting:
> > > java.lang.InterruptedException
> > >
> > > 09:17:20.767INFOorg.apache.hadoop.ipc.HBaseServer
> > >
> > > REPL IPC Server handler 0 on 60020: exiting
> > >
> > > 09:17:20.767INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker
> > >
> > > SplitLogWorker etp-hdfs-n1-sg.passport.local,60020,1414102905372
> exiting
> > >
> > > 09:17:20.767INFOorg.apache.hadoop.ipc.HBaseServer
> > >
> > > REPL IPC Server handler 2 on 60020: exiting
> > >
> > > 09:17:20.768INFOorg.mortbay.log
> > >
> > > Stopped SelectChannelConnector@0.0.0.0:60030
> > >
> > > 09:17:20.768INFOorg.apache.hadoop.ipc.HBaseServer
> > >
> > > REPL IPC Server handler 1 on 60020: exiting
> > >
> > > 09:17:20.908INFO
> > >
> org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager
> > >
> > > Stopping RegionServerSnapshotManager abruptly.
> > >
> > > 09:17:20.908INFO
> > > org.apache.hadoop.hbase.regionserver.HRegionServer$CompactionChecker
> > >
> > > regionserver60020.compactionChecker exiting
> > >
> > > 09:17:20.908INFOorg.apache.hadoop.hbase.regionserver.LogRoller
> > >
> > > LogRoller exiting.
> > >
> > > 09:17:20.909INFOorg.apache.hadoop.hbase.regionserver.MemStoreFlusher
> > >
> > > regionserver60020.cacheFlusher exiting
> > >
> > > 09:17:20.920INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > >
> > > aborting server etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > >
> > > 09:17:20.924INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > >
> > > Waiting on 3 regions to close
> > >
> > > 09:17:20.943INFOorg.apache.hadoop.hbase.regionserver.Store
> > >
> > > Closed CF
> > >
> > > 09:17:20.949INFOorg.apache.hadoop.hbase.regionserver.HRegion
> > >
> > > Closed epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947.
> > >
> > > 09:17:20.951INFOorg.apache.hadoop.hbase.regionserver.Store
> > >
> > > Closed info
> > >
> > > 09:17:20.953INFOorg.apache.hadoop.hbase.regionserver.HRegion
> > >
> > > Closed .META.,,1.1028785192
> > >
> > > 09:17:20.956INFOorg.apache.hadoop.hbase.regionserver.Store
> > >
> > > Closed CF
> > >
> > > 09:17:20.957ERRORorg.apache.hadoop.hbase.regionserver.HRegion
> > >
> > > Memstore size is 13376
> > >
> > > 09:17:20.957INFOorg.apache.hadoop.hbase.regionserver.HRegion
> > >
> > > Closed
> > >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > >
> > > 09:17:21.125INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > >
> > > stopping server etp-hdfs-n1-sg.passport.local,60020,1414102905372; all
> > > regions closed.
> > >
> > > 09:17:21.126INFOorg.apache.hadoop.hbase.regionserver.wal.HLog
> > >
> > > regionserver60020.logSyncer exiting
> > >
> > > 09:17:21.240INFOorg.apache.hadoop.hbase.regionserver.Leases
> > >
> > > regionserver60020 closing leases
> > >
> > > 09:17:21.241INFOorg.apache.hadoop.hbase.regionserver.Leases
> > >
> > > regionserver60020 closed leases
> > >
> > > 09:17:22.021INFO
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer$PeriodicMemstoreFlusher
> > >
> > > regionserver60020.periodicFlusher exiting
> > >
> > > 09:17:22.031WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> > >
> > > Possibly transient ZooKeeper exception:
> > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > KeeperErrorCode = Session expired for
> > > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > >
> > > 09:17:22.031INFOorg.apache.hadoop.hbase.util.RetryCounter
> > >
> > > Sleeping 2000ms before retry #1...
> > >
> > > 09:17:23.958INFOorg.apache.hadoop.hbase.regionserver.Leases
> > >
> > > regionserver60020.leaseChecker closing leases
> > >
> > > 09:17:23.959INFOorg.apache.hadoop.hbase.regionserver.Leases
> > >
> > > regionserver60020.leaseChecker closed leases
> > >
> > > 09:17:24.032WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> > >
> > > Possibly transient ZooKeeper exception:
> > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > KeeperErrorCode = Session expired for
> > > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > >
> > > 09:17:24.032INFOorg.apache.hadoop.hbase.util.RetryCounter
> > >
> > > Sleeping 4000ms before retry #2...
> > >
> > > 09:17:28.032WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> > >
> > > Possibly transient ZooKeeper exception:
> > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > KeeperErrorCode = Session expired for
> > > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > >
> > > 09:17:28.033INFOorg.apache.hadoop.hbase.util.RetryCounter
> > >
> > > Sleeping 8000ms before retry #3...
> > >
> > > 09:17:36.033WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> > >
> > > Possibly transient ZooKeeper exception:
> > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > KeeperErrorCode = Session expired for
> > > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > >
> > > 09:17:36.033ERRORorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> > >
> > > ZooKeeper delete failed after 3 retries
> > >
> > > 09:17:36.033WARNorg.apache.hadoop.hbase.regionserver.HRegionServer
> > >
> > > Failed deleting my ephemeral node
> > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > KeeperErrorCode = Session expired for
> > > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > > at
> > > org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> > > at
> > > org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> > > at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
> > > at
> > >
> >
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:133)
> > > at
> > > org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1196)
> > > at
> > > org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1185)
> > > at
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1132)
> > > at
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:897)
> > > at java.lang.Thread.run(Thread.java:662)
> > >
> > > 09:17:36.035INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > >
> > > stopping server etp-hdfs-n1-sg.passport.local,60020,1414102905372;
> > > zookeeper connection closed.
> > >
> > > 09:17:36.035INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > >
> > > regionserver60020 exiting
> > >
> > > 09:17:36.051ERROR
> > > org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine
> > >
> > > Region server exiting
> > > java.lang.RuntimeException: HRegionServer Aborted
> > > at
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:65)
> > > at
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:84)
> > > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> > > at
> > >
> >
> org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
> > > at
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:4106)
> > >
> > > 09:17:36.139INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook
> > >
> > > Shutdown hook starting; hbase.shutdown.hook=true;
> > >
> > >
> >
> fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@727f3b8a
> > >
> > > 09:17:36.140INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > >
> > > STOPPED: Shutdown hook
> > >
> > > 09:17:36.141INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook
> > >
> > > Starting fs shutdown hook thread.
> > >
> > > 09:17:36.142INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook
> > >
> > > Shutdown hook finished.
> > >
> > > 10:16:40.584INFOorg.apache.hadoop.hbase.util.VersionInfo
> > >
> > > HBase 0.94.15-cdh4.7.0
> > >
> > > 10:16:40.608INFOorg.apache.hadoop.hbase.util.VersionInfo
> > >
> > > Subversion
> > >
> >
> file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.94.15-cdh4.7.0
> > > -r Unknown
> > >
> > > 10:16:40.608INFOorg.apache.hadoop.hbase.util.VersionInfo
> > >
> > > Compiled by jenkins on Wed May 28 10:48:53 PDT 2014
> > >
> > > 10:16:40.912INFOorg.apache.hadoop.hbase.util.ServerCommandLine
> > >
> > > vmName=Java HotSpot(TM) 64-Bit Server VM, vmVendor=Sun Microsystems
> > > Inc., vmVersion=20.6-b01
> > >
> > > 10:16:40.913INFOorg.apache.hadoop.hbase.util.ServerCommandLine
> > >
> > > vmInputArguments=[-XX:OnOutOfMemoryError=kill, -9, %p, -Xmx1000m,
> > > -Djava.net.preferIPv4Stack=true, -Xms1073741824, -Xmx1073741824,
> > > -XX:+UseParNewGC, -XX:+UseConcMarkSweepGC,
> > > -XX:-CMSConcurrentMTEnabled, -XX:CMSInitiatingOccupancyFraction=70,
> > > -XX:+CMSParallelRemarkEnabled, -Dhbase.log.dir=/var/log/hbase,
> > >
> > >
> >
> -Dhbase.log.file=hbase-cmf-hbase1-REGIONSERVER-etp-hdfs-n1-sg.passport.local.log.out,
> > >
> > >
> >
> -Dhbase.home.dir=/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hbase,
> > > -Dhbase.id.str=, -Dhbase.root.logger=INFO,RFA,
> > >
> > >
> >
> -Djava.library.path=/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hadoop/lib/native:/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hbase/lib/native/Linux-amd64-64,
> > > -Dhbase.security.logger=INFO,DRFAS]
> > >
> > > 10:16:41.392INFOorg.apache.hadoop.ipc.HBaseServer
> > >
> > > Starting Thread-2
> > >
> > > 10:16:41.393INFOorg.apache.hadoop.ipc.sdfHBaseServer
> > >
> > > Starting Thread-2
> > >
> > >
> > > this FATAL
> > >
> > > 09:17:20.633FATALorg.apache.hadoop.hbase.regionserver.HRegionServer
> > >
> > > ABORTING region server
> > > etp-hdfs-n1-sg.passport.local,60020,1414102905372:
> > > regionserver:60020-0x1493f19431b0000
> > > regionserver:60020-0x1493f19431b0000 received expired from ZooKeeper,
> > > aborting
> > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > KeeperErrorCode = Session expired
> > > at
> > >
> >
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
> > > at
> > >
> >
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
> > > at
> > >
> >
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> > > at
> > > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> > >
> > > found on each RS
> > > I didn' t see any problems with ZK or memory usage. The load is
> extremely
> > > low, just few put/get in a second, you can see file sizes in logs.
> > > What it could be how can i catch the root cause?
> > >
> >
>
Re: Strange HBase failure
Posted by Ted Yu <yu...@gmail.com>.
Serega:
Was the snippet of log from NODE01 ? Looks like NODE01 may have been under
heavy load - considering the number of daemons running on that node.
Please check GC log.
Cheers
On Sun, Jan 11, 2015 at 6:57 PM, Shuai Lin <li...@gmail.com> wrote:
> From the log I see no log was produced during 08:25 to 09:15, why did this
> happen?
>
> 08:25:06.274INFOorg.apache.
> hadoop.hbase.regionserver.wal.HLog
>
> moving old hlog file
>
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
> whose highest sequenceid is 229359 to
>
> /hbase/.oldlogs/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
>
> 09:15:52.020INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
>
> Regards,
> Shuai
>
> On Mon, Jan 12, 2015 at 3:47 AM, Serega Sheypak <se...@gmail.com>
> wrote:
>
> > Hi, I have PoC HBase cluster running on 3 VM
> > deployment schema is:
> > NODE01 NN, SN, HMaster (HM), RegionServer (RS), Zookeeper server (ZK), DN
> > NODE02 RegionServer, DN
> > NODE03 RegionServer, DN
> >
> > Suddenly ONLY HBase went offline, all services: HM RS
> > HDFS was working, no alerts were there
> > ZK server was working, no alerts there.
> > VMWare didn't publish any alerts.
> > Only restart of HBase service helped.
> >
> > We are using this:
> > http://www.cloudera.com/content/cloudera/en/downloads/cdh/cdh-4-7-0.html
> > hbase-0.94.15+113
> >
> > I made a deep dive into logs and found this stuff:
> > 08:15:51.968INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > regionserver60020.periodicFlusher requesting flush for region
> >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > after a delay of 3026
> >
> > 08:15:55.011INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> >
> > Bloom filter type for
> >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb:
> > ROW, CompoundBloomFilterWriter
> >
> > 08:15:55.012INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> >
> > Delete Family Bloom filter type for
> >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb:
> > CompoundBloomFilterWriter
> >
> > 08:15:55.035INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> >
> > General Bloom and NO DeleteFamily was added to HFile
> >
> >
> (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb)
> >
> > 08:15:55.035INFOorg.apache.hadoop.hbase.regionserver.Store
> >
> > Flushed , sequenceid=229362, memsize=7.7 K, into tmp file
> >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb
> >
> > 08:15:55.053INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> >
> > Loaded ROW (CompoundBloomFilter) metadata for
> > 8e68424066dc4c02a60ca57ec98128fb
> >
> > 08:15:55.072INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> >
> > Loaded ROW (CompoundBloomFilter) metadata for
> > 8e68424066dc4c02a60ca57ec98128fb
> >
> > 08:15:55.073INFOorg.apache.hadoop.hbase.regionserver.Store
> >
> > Added
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/CF/8e68424066dc4c02a60ca57ec98128fb,
> > entries=8, sequenceid=229362, filesize=2.7 K
> >
> > 08:15:55.076INFOorg.apache.hadoop.hbase.regionserver.HRegion
> >
> > Finished memstore flush of ~7.7 K/7840, currentsize=0/0 for region
> >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > in 80ms, sequenceid=229362, compaction requested=true
> >
> > 08:15:55.077INFOorg.apache.hadoop.hbase.regionserver.HRegion
> >
> > Starting compaction on CF in region
> >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> >
> > 08:15:55.077INFOorg.apache.hadoop.hbase.regionserver.Store
> >
> > Starting compaction of 4 file(s) in CF of
> >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > into
> >
> tmpdir=hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp,
> > seqid=229362, totalSize=76.6 M
> >
> > 08:15:55.096INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> >
> > Bloom filter type for
> >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2:
> > ROW, CompoundBloomFilterWriter
> >
> > 08:15:55.097INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> >
> > Delete Family Bloom filter type for
> >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2:
> > CompoundBloomFilterWriter
> >
> > 08:15:59.245INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> >
> > General Bloom and NO DeleteFamily was added to HFile
> >
> >
> (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2)
> >
> > 08:15:59.255INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> >
> > Loaded ROW (CompoundBloomFilter) metadata for
> > 8bf8e92031834676b5d40b352120c5f2
> >
> > 08:15:59.255INFOorg.apache.hadoop.hbase.regionserver.Store
> >
> > Renaming compacted file at
> >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2
> > to
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/CF/8bf8e92031834676b5d40b352120c5f2
> >
> > 08:15:59.266INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> >
> > Loaded ROW (CompoundBloomFilter) metadata for
> > 8bf8e92031834676b5d40b352120c5f2
> >
> > 08:15:59.282INFOorg.apache.hadoop.hbase.regionserver.Store
> >
> > Completed major compaction of 4 file(s) in CF of
> >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > into 8bf8e92031834676b5d40b352120c5f2, size=76.6 M; total size for
> > store is 76.6 M
> >
> > 08:15:59.283INFO
> > org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest
> >
> > completed compaction:
> >
> >
> regionName=epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.,
> > storeName=CF, fileCount=4, fileSize=76.6 M, priority=3,
> > time=8079772165508216; duration=4sec
> >
> > 08:25:06.274INFOorg.apache.hadoop.hbase.regionserver.wal.HLog
> >
> > Roll
> >
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420860306123,
> > entries=3, filesize=7313. for
> >
> >
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420863906224
> >
> > 08:25:06.274INFOorg.apache.hadoop.hbase.regionserver.wal.HLog
> >
> > moving old hlog file
> >
> >
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
> > whose highest sequenceid is 229359 to
> >
> >
> /hbase/.oldlogs/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
> >
> > 09:15:52.020INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > regionserver60020.periodicFlusher requesting flush for region
> > epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947. after a
> > delay of 14844
> >
> > 09:16:02.021INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > regionserver60020.periodicFlusher requesting flush for region
> > epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947. after a
> > delay of 21848
> >
> > 09:16:06.880INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> >
> > Bloom filter type for
> >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d:
> > ROW, CompoundBloomFilterWriter
> >
> > 09:16:06.880INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> >
> > Delete Family Bloom filter type for
> >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d:
> > CompoundBloomFilterWriter
> >
> > 09:16:06.914INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> >
> > General Bloom and NO DeleteFamily was added to HFile
> >
> >
> (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d)
> >
> > 09:16:06.914INFOorg.apache.hadoop.hbase.regionserver.Store
> >
> > Flushed , sequenceid=229369, memsize=16.3 K, into tmp file
> >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d
> >
> > 09:16:06.932INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> >
> > Loaded ROW (CompoundBloomFilter) metadata for
> > a148cdce49964965b7b6f4728881427d
> >
> > 09:16:06.952INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> >
> > Loaded ROW (CompoundBloomFilter) metadata for
> > a148cdce49964965b7b6f4728881427d
> >
> > 09:16:06.952INFOorg.apache.hadoop.hbase.regionserver.Store
> >
> > Added
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/CF/a148cdce49964965b7b6f4728881427d,
> > entries=16, sequenceid=229369, filesize=4.5 K
> >
> > 09:16:06.955INFOorg.apache.hadoop.hbase.regionserver.HRegion
> >
> > Finished memstore flush of ~16.3 K/16648, currentsize=0/0 for region
> > epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947. in
> > 90ms, sequenceid=229369, compaction requested=false
> >
> > 09:16:55.986INFOorg.apache.zookeeper.ClientCnxn
> >
> > Client session timed out, have not heard from server in 40003ms for
> > sessionid 0x1493f19431b0005, closing socket connection and attempting
> > reconnect
> >
> > 09:16:56.552INFOorg.apache.zookeeper.ClientCnxn
> >
> > Client session timed out, have not heard from server in 40020ms for
> > sessionid 0x1493f19431b0000, closing socket connection and attempting
> > reconnect
> >
> > 09:16:57.535INFOorg.apache.zookeeper.ClientCnxn
> >
> > Opening socket connection to server
> > etp-hdfs-n1-sg.passport.local/172.16.160.8:2181. Will not attempt to
> > authenticate using SASL (java.lang.SecurityException: Unable to locate
> > a login configuration)
> >
> > 09:16:57.550INFOorg.apache.zookeeper.ClientCnxn
> >
> > Socket connection established to
> > etp-hdfs-n1-sg.passport.local/172.16.160.8:2181, initiating session
> >
> > 09:16:58.448INFOorg.apache.zookeeper.ClientCnxn
> >
> > Opening socket connection to server
> > etp-hdfs-n1-sg.passport.local/172.16.160.8:2181. Will not attempt to
> > authenticate using SASL (java.lang.SecurityException: Unable to locate
> > a login configuration)
> >
> > 09:16:58.449INFOorg.apache.zookeeper.ClientCnxn
> >
> > Socket connection established to
> > etp-hdfs-n1-sg.passport.local/172.16.160.8:2181, initiating session
> >
> > 09:17:20.581INFOorg.apache.zookeeper.ClientCnxn
> >
> > Unable to reconnect to ZooKeeper service, session 0x1493f19431b0005
> > has expired, closing socket connection
> >
> > 09:17:20.582INFO
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
> >
> > This client just lost it's session with ZooKeeper, will automatically
> > reconnect when needed.
> >
> > 09:17:20.582INFO
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
> >
> > ZK session expired. This disconnect could have been caused by a
> > network partition or a long-running GC pause, either way it's
> > recommended that you verify your environment.
> >
> > 09:17:20.582INFOorg.apache.zookeeper.ClientCnxn
> >
> > EventThread shut down
> >
> > 09:17:20.584INFOorg.apache.zookeeper.ClientCnxn
> >
> > Unable to reconnect to ZooKeeper service, session 0x1493f19431b0000
> > has expired, closing socket connection
> >
> > 09:17:20.633FATALorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > ABORTING region server
> > etp-hdfs-n1-sg.passport.local,60020,1414102905372:
> > regionserver:60020-0x1493f19431b0000
> > regionserver:60020-0x1493f19431b0000 received expired from ZooKeeper,
> > aborting
> > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > KeeperErrorCode = Session expired
> > at
> >
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
> > at
> >
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
> > at
> >
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> > at
> > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> >
> > 09:17:20.733FATALorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > RegionServer abort: loaded coprocessors are: []
> >
> > 09:17:20.736INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > Dump of metrics: requestsPerSecond=0, numberOfOnlineRegions=3,
> > numberOfStores=3, hlogFileCount=2, hlogFileSizeMB=365,
> > numberOfStorefiles=4, storefileIndexSizeMB=0, rootIndexSizeKB=2,
> > totalStaticIndexSizeKB=650, totalStaticBloomSizeKB=256,
> > memstoreSizeMB=0, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0,
> > readRequestsCount=279141, writeRequestsCount=91793,
> > compactionQueueSize=0, flushQueueSize=0, usedHeapMB=455,
> > maxHeapMB=1015, blockCacheSizeMB=2.66, blockCacheFreeMB=251.26,
> > blockCacheCount=7, blockCacheHitCount=596347,
> > blockCacheMissCount=3360967, blockCacheEvictedCount=36743,
> > blockCacheHitRatio=15%, blockCacheHitCachingRatio=93%,
> > hdfsBlocksLocalityIndex=100, slowHLogAppendCount=0,
> > fsReadLatencyHistogramMean=330098.85,
> > fsReadLatencyHistogramCount=2970773.00,
> > fsReadLatencyHistogramMedian=312261.50,
> > fsReadLatencyHistogram75th=357708.50,
> > fsReadLatencyHistogram95th=3147993.00,
> > fsReadLatencyHistogram99th=4431904.00,
> > fsReadLatencyHistogram999th=4431904.00,
> > fsPreadLatencyHistogramMean=3245622.44,
> > fsPreadLatencyHistogramCount=36742.00,
> > fsPreadLatencyHistogramMedian=3180486.50,
> > fsPreadLatencyHistogram75th=3733068.50,
> > fsPreadLatencyHistogram95th=6424537.75,
> > fsPreadLatencyHistogram99th=24206578.00,
> > fsPreadLatencyHistogram999th=24206578.00,
> > fsWriteLatencyHistogramMean=356278.05,
> > fsWriteLatencyHistogramCount=2983888.00,
> > fsWriteLatencyHistogramMedian=538147.00,
> > fsWriteLatencyHistogram75th=570668.00,
> > fsWriteLatencyHistogram95th=570668.00,
> > fsWriteLatencyHistogram99th=570668.00,
> > fsWriteLatencyHistogram999th=570668.00
> >
> > 09:17:20.749INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > STOPPED: regionserver:60020-0x1493f19431b0000
> > regionserver:60020-0x1493f19431b0000 received expired from ZooKeeper,
> > aborting
> >
> > 09:17:20.750INFOorg.apache.zookeeper.ClientCnxn
> >
> > EventThread shut down
> >
> > 09:17:20.755INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > Stopping server on 60020
> >
> > 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > IPC Server handler 0 on 60020: exiting
> >
> > 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > PRI IPC Server handler 1 on 60020: exiting
> >
> > 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > PRI IPC Server handler 0 on 60020: exiting
> >
> > 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > Stopping IPC Server listener on 60020
> >
> > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > PRI IPC Server handler 3 on 60020: exiting
> >
> > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > PRI IPC Server handler 8 on 60020: exiting
> >
> > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > PRI IPC Server handler 4 on 60020: exiting
> >
> > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > IPC Server handler 3 on 60020: exiting
> >
> > 09:17:20.757INFOorg.apache.hadoop.ipc.
> > HBaseServer
> >
> > PRI IPC Server handler 6 on 60020: exiting
> >
> > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > IPC Server handler 2 on 60020: exiting
> >
> > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > IPC Server handler 6 on 60020: exiting
> >
> > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > PRI IPC Server handler 5 on 60020: exiting
> >
> > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > PRI IPC Server handler 7 on 60020: exiting
> >
> > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > PRI IPC Server handler 2 on 60020: exiting
> >
> > 09:17:20.757INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker
> >
> > Sending interrupt to stop the worker thread
> >
> > 09:17:20.758INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > Stopping infoServer
> >
> > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > IPC Server handler 4 on 60020: exiting
> >
> > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > Stopping IPC Server Responder
> >
> > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > IPC Server handler 7 on 60020: exiting
> >
> > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > IPC Server handler 8 on 60020: exiting
> >
> > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > IPC Server handler 1 on 60020: exiting
> >
> > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > PRI IPC Server handler 9 on 60020: exiting
> >
> > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > IPC Server handler 5 on 60020: exiting
> >
> > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > Stopping IPC Server Responder
> >
> > 09:17:20.761INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > IPC Server handler 9 on 60020: exiting
> >
> > 09:17:20.763INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker
> >
> > SplitLogWorker interrupted while waiting for task, exiting:
> > java.lang.InterruptedException
> >
> > 09:17:20.767INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > REPL IPC Server handler 0 on 60020: exiting
> >
> > 09:17:20.767INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker
> >
> > SplitLogWorker etp-hdfs-n1-sg.passport.local,60020,1414102905372 exiting
> >
> > 09:17:20.767INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > REPL IPC Server handler 2 on 60020: exiting
> >
> > 09:17:20.768INFOorg.mortbay.log
> >
> > Stopped SelectChannelConnector@0.0.0.0:60030
> >
> > 09:17:20.768INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > REPL IPC Server handler 1 on 60020: exiting
> >
> > 09:17:20.908INFO
> > org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager
> >
> > Stopping RegionServerSnapshotManager abruptly.
> >
> > 09:17:20.908INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer$CompactionChecker
> >
> > regionserver60020.compactionChecker exiting
> >
> > 09:17:20.908INFOorg.apache.hadoop.hbase.regionserver.LogRoller
> >
> > LogRoller exiting.
> >
> > 09:17:20.909INFOorg.apache.hadoop.hbase.regionserver.MemStoreFlusher
> >
> > regionserver60020.cacheFlusher exiting
> >
> > 09:17:20.920INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > aborting server etp-hdfs-n1-sg.passport.local,60020,1414102905372
> >
> > 09:17:20.924INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > Waiting on 3 regions to close
> >
> > 09:17:20.943INFOorg.apache.hadoop.hbase.regionserver.Store
> >
> > Closed CF
> >
> > 09:17:20.949INFOorg.apache.hadoop.hbase.regionserver.HRegion
> >
> > Closed epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947.
> >
> > 09:17:20.951INFOorg.apache.hadoop.hbase.regionserver.Store
> >
> > Closed info
> >
> > 09:17:20.953INFOorg.apache.hadoop.hbase.regionserver.HRegion
> >
> > Closed .META.,,1.1028785192
> >
> > 09:17:20.956INFOorg.apache.hadoop.hbase.regionserver.Store
> >
> > Closed CF
> >
> > 09:17:20.957ERRORorg.apache.hadoop.hbase.regionserver.HRegion
> >
> > Memstore size is 13376
> >
> > 09:17:20.957INFOorg.apache.hadoop.hbase.regionserver.HRegion
> >
> > Closed
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> >
> > 09:17:21.125INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > stopping server etp-hdfs-n1-sg.passport.local,60020,1414102905372; all
> > regions closed.
> >
> > 09:17:21.126INFOorg.apache.hadoop.hbase.regionserver.wal.HLog
> >
> > regionserver60020.logSyncer exiting
> >
> > 09:17:21.240INFOorg.apache.hadoop.hbase.regionserver.Leases
> >
> > regionserver60020 closing leases
> >
> > 09:17:21.241INFOorg.apache.hadoop.hbase.regionserver.Leases
> >
> > regionserver60020 closed leases
> >
> > 09:17:22.021INFO
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer$PeriodicMemstoreFlusher
> >
> > regionserver60020.periodicFlusher exiting
> >
> > 09:17:22.031WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> >
> > Possibly transient ZooKeeper exception:
> > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > KeeperErrorCode = Session expired for
> > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> >
> > 09:17:22.031INFOorg.apache.hadoop.hbase.util.RetryCounter
> >
> > Sleeping 2000ms before retry #1...
> >
> > 09:17:23.958INFOorg.apache.hadoop.hbase.regionserver.Leases
> >
> > regionserver60020.leaseChecker closing leases
> >
> > 09:17:23.959INFOorg.apache.hadoop.hbase.regionserver.Leases
> >
> > regionserver60020.leaseChecker closed leases
> >
> > 09:17:24.032WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> >
> > Possibly transient ZooKeeper exception:
> > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > KeeperErrorCode = Session expired for
> > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> >
> > 09:17:24.032INFOorg.apache.hadoop.hbase.util.RetryCounter
> >
> > Sleeping 4000ms before retry #2...
> >
> > 09:17:28.032WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> >
> > Possibly transient ZooKeeper exception:
> > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > KeeperErrorCode = Session expired for
> > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> >
> > 09:17:28.033INFOorg.apache.hadoop.hbase.util.RetryCounter
> >
> > Sleeping 8000ms before retry #3...
> >
> > 09:17:36.033WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> >
> > Possibly transient ZooKeeper exception:
> > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > KeeperErrorCode = Session expired for
> > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> >
> > 09:17:36.033ERRORorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> >
> > ZooKeeper delete failed after 3 retries
> >
> > 09:17:36.033WARNorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > Failed deleting my ephemeral node
> > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > KeeperErrorCode = Session expired for
> > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > at
> > org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> > at
> > org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> > at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
> > at
> >
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:133)
> > at
> > org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1196)
> > at
> > org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1185)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1132)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:897)
> > at java.lang.Thread.run(Thread.java:662)
> >
> > 09:17:36.035INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > stopping server etp-hdfs-n1-sg.passport.local,60020,1414102905372;
> > zookeeper connection closed.
> >
> > 09:17:36.035INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > regionserver60020 exiting
> >
> > 09:17:36.051ERROR
> > org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine
> >
> > Region server exiting
> > java.lang.RuntimeException: HRegionServer Aborted
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:65)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:84)
> > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> > at
> >
> org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:4106)
> >
> > 09:17:36.139INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook
> >
> > Shutdown hook starting; hbase.shutdown.hook=true;
> >
> >
> fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@727f3b8a
> >
> > 09:17:36.140INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > STOPPED: Shutdown hook
> >
> > 09:17:36.141INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook
> >
> > Starting fs shutdown hook thread.
> >
> > 09:17:36.142INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook
> >
> > Shutdown hook finished.
> >
> > 10:16:40.584INFOorg.apache.hadoop.hbase.util.VersionInfo
> >
> > HBase 0.94.15-cdh4.7.0
> >
> > 10:16:40.608INFOorg.apache.hadoop.hbase.util.VersionInfo
> >
> > Subversion
> >
> file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.94.15-cdh4.7.0
> > -r Unknown
> >
> > 10:16:40.608INFOorg.apache.hadoop.hbase.util.VersionInfo
> >
> > Compiled by jenkins on Wed May 28 10:48:53 PDT 2014
> >
> > 10:16:40.912INFOorg.apache.hadoop.hbase.util.ServerCommandLine
> >
> > vmName=Java HotSpot(TM) 64-Bit Server VM, vmVendor=Sun Microsystems
> > Inc., vmVersion=20.6-b01
> >
> > 10:16:40.913INFOorg.apache.hadoop.hbase.util.ServerCommandLine
> >
> > vmInputArguments=[-XX:OnOutOfMemoryError=kill, -9, %p, -Xmx1000m,
> > -Djava.net.preferIPv4Stack=true, -Xms1073741824, -Xmx1073741824,
> > -XX:+UseParNewGC, -XX:+UseConcMarkSweepGC,
> > -XX:-CMSConcurrentMTEnabled, -XX:CMSInitiatingOccupancyFraction=70,
> > -XX:+CMSParallelRemarkEnabled, -Dhbase.log.dir=/var/log/hbase,
> >
> >
> -Dhbase.log.file=hbase-cmf-hbase1-REGIONSERVER-etp-hdfs-n1-sg.passport.local.log.out,
> >
> >
> -Dhbase.home.dir=/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hbase,
> > -Dhbase.id.str=, -Dhbase.root.logger=INFO,RFA,
> >
> >
> -Djava.library.path=/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hadoop/lib/native:/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hbase/lib/native/Linux-amd64-64,
> > -Dhbase.security.logger=INFO,DRFAS]
> >
> > 10:16:41.392INFOorg.apache.hadoop.ipc.HBaseServer
> >
> > Starting Thread-2
> >
> > 10:16:41.393INFOorg.apache.hadoop.ipc.sdfHBaseServer
> >
> > Starting Thread-2
> >
> >
> > this FATAL
> >
> > 09:17:20.633FATALorg.apache.hadoop.hbase.regionserver.HRegionServer
> >
> > ABORTING region server
> > etp-hdfs-n1-sg.passport.local,60020,1414102905372:
> > regionserver:60020-0x1493f19431b0000
> > regionserver:60020-0x1493f19431b0000 received expired from ZooKeeper,
> > aborting
> > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > KeeperErrorCode = Session expired
> > at
> >
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
> > at
> >
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
> > at
> >
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> > at
> > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> >
> > found on each RS
> > I didn' t see any problems with ZK or memory usage. The load is extremely
> > low, just few put/get in a second, you can see file sizes in logs.
> > What it could be how can i catch the root cause?
> >
>
Re: Strange HBase failure
Posted by Shuai Lin <li...@gmail.com>.
>From the log I see no log was produced during 08:25 to 09:15, why did this
happen?
08:25:06.274INFOorg.apache.
hadoop.hbase.regionserver.wal.HLog
moving old hlog file
/hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
whose highest sequenceid is 229359 to
/hbase/.oldlogs/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
09:15:52.020INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
Regards,
Shuai
On Mon, Jan 12, 2015 at 3:47 AM, Serega Sheypak <se...@gmail.com>
wrote:
> Hi, I have PoC HBase cluster running on 3 VM
> deployment schema is:
> NODE01 NN, SN, HMaster (HM), RegionServer (RS), Zookeeper server (ZK), DN
> NODE02 RegionServer, DN
> NODE03 RegionServer, DN
>
> Suddenly ONLY HBase went offline, all services: HM RS
> HDFS was working, no alerts were there
> ZK server was working, no alerts there.
> VMWare didn't publish any alerts.
> Only restart of HBase service helped.
>
> We are using this:
> http://www.cloudera.com/content/cloudera/en/downloads/cdh/cdh-4-7-0.html
> hbase-0.94.15+113
>
> I made a deep dive into logs and found this stuff:
> 08:15:51.968INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
>
> regionserver60020.periodicFlusher requesting flush for region
>
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> after a delay of 3026
>
> 08:15:55.011INFOorg.apache.hadoop.hbase.regionserver.StoreFile
>
> Bloom filter type for
>
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb:
> ROW, CompoundBloomFilterWriter
>
> 08:15:55.012INFOorg.apache.hadoop.hbase.regionserver.StoreFile
>
> Delete Family Bloom filter type for
>
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb:
> CompoundBloomFilterWriter
>
> 08:15:55.035INFOorg.apache.hadoop.hbase.regionserver.StoreFile
>
> General Bloom and NO DeleteFamily was added to HFile
>
> (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb)
>
> 08:15:55.035INFOorg.apache.hadoop.hbase.regionserver.Store
>
> Flushed , sequenceid=229362, memsize=7.7 K, into tmp file
>
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb
>
> 08:15:55.053INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
>
> Loaded ROW (CompoundBloomFilter) metadata for
> 8e68424066dc4c02a60ca57ec98128fb
>
> 08:15:55.072INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
>
> Loaded ROW (CompoundBloomFilter) metadata for
> 8e68424066dc4c02a60ca57ec98128fb
>
> 08:15:55.073INFOorg.apache.hadoop.hbase.regionserver.Store
>
> Added
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/CF/8e68424066dc4c02a60ca57ec98128fb,
> entries=8, sequenceid=229362, filesize=2.7 K
>
> 08:15:55.076INFOorg.apache.hadoop.hbase.regionserver.HRegion
>
> Finished memstore flush of ~7.7 K/7840, currentsize=0/0 for region
>
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> in 80ms, sequenceid=229362, compaction requested=true
>
> 08:15:55.077INFOorg.apache.hadoop.hbase.regionserver.HRegion
>
> Starting compaction on CF in region
>
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
>
> 08:15:55.077INFOorg.apache.hadoop.hbase.regionserver.Store
>
> Starting compaction of 4 file(s) in CF of
>
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> into
> tmpdir=hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp,
> seqid=229362, totalSize=76.6 M
>
> 08:15:55.096INFOorg.apache.hadoop.hbase.regionserver.StoreFile
>
> Bloom filter type for
>
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2:
> ROW, CompoundBloomFilterWriter
>
> 08:15:55.097INFOorg.apache.hadoop.hbase.regionserver.StoreFile
>
> Delete Family Bloom filter type for
>
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2:
> CompoundBloomFilterWriter
>
> 08:15:59.245INFOorg.apache.hadoop.hbase.regionserver.StoreFile
>
> General Bloom and NO DeleteFamily was added to HFile
>
> (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2)
>
> 08:15:59.255INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
>
> Loaded ROW (CompoundBloomFilter) metadata for
> 8bf8e92031834676b5d40b352120c5f2
>
> 08:15:59.255INFOorg.apache.hadoop.hbase.regionserver.Store
>
> Renaming compacted file at
>
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2
> to
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/CF/8bf8e92031834676b5d40b352120c5f2
>
> 08:15:59.266INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
>
> Loaded ROW (CompoundBloomFilter) metadata for
> 8bf8e92031834676b5d40b352120c5f2
>
> 08:15:59.282INFOorg.apache.hadoop.hbase.regionserver.Store
>
> Completed major compaction of 4 file(s) in CF of
>
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> into 8bf8e92031834676b5d40b352120c5f2, size=76.6 M; total size for
> store is 76.6 M
>
> 08:15:59.283INFO
> org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest
>
> completed compaction:
>
> regionName=epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.,
> storeName=CF, fileCount=4, fileSize=76.6 M, priority=3,
> time=8079772165508216; duration=4sec
>
> 08:25:06.274INFOorg.apache.hadoop.hbase.regionserver.wal.HLog
>
> Roll
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420860306123,
> entries=3, filesize=7313. for
>
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420863906224
>
> 08:25:06.274INFOorg.apache.hadoop.hbase.regionserver.wal.HLog
>
> moving old hlog file
>
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
> whose highest sequenceid is 229359 to
>
> /hbase/.oldlogs/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
>
> 09:15:52.020INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
>
> regionserver60020.periodicFlusher requesting flush for region
> epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947. after a
> delay of 14844
>
> 09:16:02.021INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
>
> regionserver60020.periodicFlusher requesting flush for region
> epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947. after a
> delay of 21848
>
> 09:16:06.880INFOorg.apache.hadoop.hbase.regionserver.StoreFile
>
> Bloom filter type for
>
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d:
> ROW, CompoundBloomFilterWriter
>
> 09:16:06.880INFOorg.apache.hadoop.hbase.regionserver.StoreFile
>
> Delete Family Bloom filter type for
>
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d:
> CompoundBloomFilterWriter
>
> 09:16:06.914INFOorg.apache.hadoop.hbase.regionserver.StoreFile
>
> General Bloom and NO DeleteFamily was added to HFile
>
> (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d)
>
> 09:16:06.914INFOorg.apache.hadoop.hbase.regionserver.Store
>
> Flushed , sequenceid=229369, memsize=16.3 K, into tmp file
>
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d
>
> 09:16:06.932INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
>
> Loaded ROW (CompoundBloomFilter) metadata for
> a148cdce49964965b7b6f4728881427d
>
> 09:16:06.952INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
>
> Loaded ROW (CompoundBloomFilter) metadata for
> a148cdce49964965b7b6f4728881427d
>
> 09:16:06.952INFOorg.apache.hadoop.hbase.regionserver.Store
>
> Added
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/CF/a148cdce49964965b7b6f4728881427d,
> entries=16, sequenceid=229369, filesize=4.5 K
>
> 09:16:06.955INFOorg.apache.hadoop.hbase.regionserver.HRegion
>
> Finished memstore flush of ~16.3 K/16648, currentsize=0/0 for region
> epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947. in
> 90ms, sequenceid=229369, compaction requested=false
>
> 09:16:55.986INFOorg.apache.zookeeper.ClientCnxn
>
> Client session timed out, have not heard from server in 40003ms for
> sessionid 0x1493f19431b0005, closing socket connection and attempting
> reconnect
>
> 09:16:56.552INFOorg.apache.zookeeper.ClientCnxn
>
> Client session timed out, have not heard from server in 40020ms for
> sessionid 0x1493f19431b0000, closing socket connection and attempting
> reconnect
>
> 09:16:57.535INFOorg.apache.zookeeper.ClientCnxn
>
> Opening socket connection to server
> etp-hdfs-n1-sg.passport.local/172.16.160.8:2181. Will not attempt to
> authenticate using SASL (java.lang.SecurityException: Unable to locate
> a login configuration)
>
> 09:16:57.550INFOorg.apache.zookeeper.ClientCnxn
>
> Socket connection established to
> etp-hdfs-n1-sg.passport.local/172.16.160.8:2181, initiating session
>
> 09:16:58.448INFOorg.apache.zookeeper.ClientCnxn
>
> Opening socket connection to server
> etp-hdfs-n1-sg.passport.local/172.16.160.8:2181. Will not attempt to
> authenticate using SASL (java.lang.SecurityException: Unable to locate
> a login configuration)
>
> 09:16:58.449INFOorg.apache.zookeeper.ClientCnxn
>
> Socket connection established to
> etp-hdfs-n1-sg.passport.local/172.16.160.8:2181, initiating session
>
> 09:17:20.581INFOorg.apache.zookeeper.ClientCnxn
>
> Unable to reconnect to ZooKeeper service, session 0x1493f19431b0005
> has expired, closing socket connection
>
> 09:17:20.582INFO
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
>
> This client just lost it's session with ZooKeeper, will automatically
> reconnect when needed.
>
> 09:17:20.582INFO
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
>
> ZK session expired. This disconnect could have been caused by a
> network partition or a long-running GC pause, either way it's
> recommended that you verify your environment.
>
> 09:17:20.582INFOorg.apache.zookeeper.ClientCnxn
>
> EventThread shut down
>
> 09:17:20.584INFOorg.apache.zookeeper.ClientCnxn
>
> Unable to reconnect to ZooKeeper service, session 0x1493f19431b0000
> has expired, closing socket connection
>
> 09:17:20.633FATALorg.apache.hadoop.hbase.regionserver.HRegionServer
>
> ABORTING region server
> etp-hdfs-n1-sg.passport.local,60020,1414102905372:
> regionserver:60020-0x1493f19431b0000
> regionserver:60020-0x1493f19431b0000 received expired from ZooKeeper,
> aborting
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired
> at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
> at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
> at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
>
> 09:17:20.733FATALorg.apache.hadoop.hbase.regionserver.HRegionServer
>
> RegionServer abort: loaded coprocessors are: []
>
> 09:17:20.736INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
>
> Dump of metrics: requestsPerSecond=0, numberOfOnlineRegions=3,
> numberOfStores=3, hlogFileCount=2, hlogFileSizeMB=365,
> numberOfStorefiles=4, storefileIndexSizeMB=0, rootIndexSizeKB=2,
> totalStaticIndexSizeKB=650, totalStaticBloomSizeKB=256,
> memstoreSizeMB=0, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0,
> readRequestsCount=279141, writeRequestsCount=91793,
> compactionQueueSize=0, flushQueueSize=0, usedHeapMB=455,
> maxHeapMB=1015, blockCacheSizeMB=2.66, blockCacheFreeMB=251.26,
> blockCacheCount=7, blockCacheHitCount=596347,
> blockCacheMissCount=3360967, blockCacheEvictedCount=36743,
> blockCacheHitRatio=15%, blockCacheHitCachingRatio=93%,
> hdfsBlocksLocalityIndex=100, slowHLogAppendCount=0,
> fsReadLatencyHistogramMean=330098.85,
> fsReadLatencyHistogramCount=2970773.00,
> fsReadLatencyHistogramMedian=312261.50,
> fsReadLatencyHistogram75th=357708.50,
> fsReadLatencyHistogram95th=3147993.00,
> fsReadLatencyHistogram99th=4431904.00,
> fsReadLatencyHistogram999th=4431904.00,
> fsPreadLatencyHistogramMean=3245622.44,
> fsPreadLatencyHistogramCount=36742.00,
> fsPreadLatencyHistogramMedian=3180486.50,
> fsPreadLatencyHistogram75th=3733068.50,
> fsPreadLatencyHistogram95th=6424537.75,
> fsPreadLatencyHistogram99th=24206578.00,
> fsPreadLatencyHistogram999th=24206578.00,
> fsWriteLatencyHistogramMean=356278.05,
> fsWriteLatencyHistogramCount=2983888.00,
> fsWriteLatencyHistogramMedian=538147.00,
> fsWriteLatencyHistogram75th=570668.00,
> fsWriteLatencyHistogram95th=570668.00,
> fsWriteLatencyHistogram99th=570668.00,
> fsWriteLatencyHistogram999th=570668.00
>
> 09:17:20.749INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
>
> STOPPED: regionserver:60020-0x1493f19431b0000
> regionserver:60020-0x1493f19431b0000 received expired from ZooKeeper,
> aborting
>
> 09:17:20.750INFOorg.apache.zookeeper.ClientCnxn
>
> EventThread shut down
>
> 09:17:20.755INFOorg.apache.hadoop.ipc.HBaseServer
>
> Stopping server on 60020
>
> 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
>
> IPC Server handler 0 on 60020: exiting
>
> 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
>
> PRI IPC Server handler 1 on 60020: exiting
>
> 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
>
> PRI IPC Server handler 0 on 60020: exiting
>
> 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
>
> Stopping IPC Server listener on 60020
>
> 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
>
> PRI IPC Server handler 3 on 60020: exiting
>
> 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
>
> PRI IPC Server handler 8 on 60020: exiting
>
> 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
>
> PRI IPC Server handler 4 on 60020: exiting
>
> 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
>
> IPC Server handler 3 on 60020: exiting
>
> 09:17:20.757INFOorg.apache.hadoop.ipc.
> HBaseServer
>
> PRI IPC Server handler 6 on 60020: exiting
>
> 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
>
> IPC Server handler 2 on 60020: exiting
>
> 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
>
> IPC Server handler 6 on 60020: exiting
>
> 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
>
> PRI IPC Server handler 5 on 60020: exiting
>
> 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
>
> PRI IPC Server handler 7 on 60020: exiting
>
> 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
>
> PRI IPC Server handler 2 on 60020: exiting
>
> 09:17:20.757INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker
>
> Sending interrupt to stop the worker thread
>
> 09:17:20.758INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
>
> Stopping infoServer
>
> 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
>
> IPC Server handler 4 on 60020: exiting
>
> 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
>
> Stopping IPC Server Responder
>
> 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
>
> IPC Server handler 7 on 60020: exiting
>
> 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
>
> IPC Server handler 8 on 60020: exiting
>
> 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
>
> IPC Server handler 1 on 60020: exiting
>
> 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
>
> PRI IPC Server handler 9 on 60020: exiting
>
> 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
>
> IPC Server handler 5 on 60020: exiting
>
> 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
>
> Stopping IPC Server Responder
>
> 09:17:20.761INFOorg.apache.hadoop.ipc.HBaseServer
>
> IPC Server handler 9 on 60020: exiting
>
> 09:17:20.763INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker
>
> SplitLogWorker interrupted while waiting for task, exiting:
> java.lang.InterruptedException
>
> 09:17:20.767INFOorg.apache.hadoop.ipc.HBaseServer
>
> REPL IPC Server handler 0 on 60020: exiting
>
> 09:17:20.767INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker
>
> SplitLogWorker etp-hdfs-n1-sg.passport.local,60020,1414102905372 exiting
>
> 09:17:20.767INFOorg.apache.hadoop.ipc.HBaseServer
>
> REPL IPC Server handler 2 on 60020: exiting
>
> 09:17:20.768INFOorg.mortbay.log
>
> Stopped SelectChannelConnector@0.0.0.0:60030
>
> 09:17:20.768INFOorg.apache.hadoop.ipc.HBaseServer
>
> REPL IPC Server handler 1 on 60020: exiting
>
> 09:17:20.908INFO
> org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager
>
> Stopping RegionServerSnapshotManager abruptly.
>
> 09:17:20.908INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer$CompactionChecker
>
> regionserver60020.compactionChecker exiting
>
> 09:17:20.908INFOorg.apache.hadoop.hbase.regionserver.LogRoller
>
> LogRoller exiting.
>
> 09:17:20.909INFOorg.apache.hadoop.hbase.regionserver.MemStoreFlusher
>
> regionserver60020.cacheFlusher exiting
>
> 09:17:20.920INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
>
> aborting server etp-hdfs-n1-sg.passport.local,60020,1414102905372
>
> 09:17:20.924INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
>
> Waiting on 3 regions to close
>
> 09:17:20.943INFOorg.apache.hadoop.hbase.regionserver.Store
>
> Closed CF
>
> 09:17:20.949INFOorg.apache.hadoop.hbase.regionserver.HRegion
>
> Closed epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947.
>
> 09:17:20.951INFOorg.apache.hadoop.hbase.regionserver.Store
>
> Closed info
>
> 09:17:20.953INFOorg.apache.hadoop.hbase.regionserver.HRegion
>
> Closed .META.,,1.1028785192
>
> 09:17:20.956INFOorg.apache.hadoop.hbase.regionserver.Store
>
> Closed CF
>
> 09:17:20.957ERRORorg.apache.hadoop.hbase.regionserver.HRegion
>
> Memstore size is 13376
>
> 09:17:20.957INFOorg.apache.hadoop.hbase.regionserver.HRegion
>
> Closed
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
>
> 09:17:21.125INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
>
> stopping server etp-hdfs-n1-sg.passport.local,60020,1414102905372; all
> regions closed.
>
> 09:17:21.126INFOorg.apache.hadoop.hbase.regionserver.wal.HLog
>
> regionserver60020.logSyncer exiting
>
> 09:17:21.240INFOorg.apache.hadoop.hbase.regionserver.Leases
>
> regionserver60020 closing leases
>
> 09:17:21.241INFOorg.apache.hadoop.hbase.regionserver.Leases
>
> regionserver60020 closed leases
>
> 09:17:22.021INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer$PeriodicMemstoreFlusher
>
> regionserver60020.periodicFlusher exiting
>
> 09:17:22.031WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
>
> Possibly transient ZooKeeper exception:
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired for
> /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
>
> 09:17:22.031INFOorg.apache.hadoop.hbase.util.RetryCounter
>
> Sleeping 2000ms before retry #1...
>
> 09:17:23.958INFOorg.apache.hadoop.hbase.regionserver.Leases
>
> regionserver60020.leaseChecker closing leases
>
> 09:17:23.959INFOorg.apache.hadoop.hbase.regionserver.Leases
>
> regionserver60020.leaseChecker closed leases
>
> 09:17:24.032WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
>
> Possibly transient ZooKeeper exception:
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired for
> /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
>
> 09:17:24.032INFOorg.apache.hadoop.hbase.util.RetryCounter
>
> Sleeping 4000ms before retry #2...
>
> 09:17:28.032WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
>
> Possibly transient ZooKeeper exception:
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired for
> /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
>
> 09:17:28.033INFOorg.apache.hadoop.hbase.util.RetryCounter
>
> Sleeping 8000ms before retry #3...
>
> 09:17:36.033WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
>
> Possibly transient ZooKeeper exception:
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired for
> /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
>
> 09:17:36.033ERRORorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
>
> ZooKeeper delete failed after 3 retries
>
> 09:17:36.033WARNorg.apache.hadoop.hbase.regionserver.HRegionServer
>
> Failed deleting my ephemeral node
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired for
> /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
> at
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:133)
> at
> org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1196)
> at
> org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1185)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1132)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:897)
> at java.lang.Thread.run(Thread.java:662)
>
> 09:17:36.035INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
>
> stopping server etp-hdfs-n1-sg.passport.local,60020,1414102905372;
> zookeeper connection closed.
>
> 09:17:36.035INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
>
> regionserver60020 exiting
>
> 09:17:36.051ERROR
> org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine
>
> Region server exiting
> java.lang.RuntimeException: HRegionServer Aborted
> at
> org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:65)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:84)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at
> org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:4106)
>
> 09:17:36.139INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook
>
> Shutdown hook starting; hbase.shutdown.hook=true;
>
> fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@727f3b8a
>
> 09:17:36.140INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
>
> STOPPED: Shutdown hook
>
> 09:17:36.141INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook
>
> Starting fs shutdown hook thread.
>
> 09:17:36.142INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook
>
> Shutdown hook finished.
>
> 10:16:40.584INFOorg.apache.hadoop.hbase.util.VersionInfo
>
> HBase 0.94.15-cdh4.7.0
>
> 10:16:40.608INFOorg.apache.hadoop.hbase.util.VersionInfo
>
> Subversion
> file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.94.15-cdh4.7.0
> -r Unknown
>
> 10:16:40.608INFOorg.apache.hadoop.hbase.util.VersionInfo
>
> Compiled by jenkins on Wed May 28 10:48:53 PDT 2014
>
> 10:16:40.912INFOorg.apache.hadoop.hbase.util.ServerCommandLine
>
> vmName=Java HotSpot(TM) 64-Bit Server VM, vmVendor=Sun Microsystems
> Inc., vmVersion=20.6-b01
>
> 10:16:40.913INFOorg.apache.hadoop.hbase.util.ServerCommandLine
>
> vmInputArguments=[-XX:OnOutOfMemoryError=kill, -9, %p, -Xmx1000m,
> -Djava.net.preferIPv4Stack=true, -Xms1073741824, -Xmx1073741824,
> -XX:+UseParNewGC, -XX:+UseConcMarkSweepGC,
> -XX:-CMSConcurrentMTEnabled, -XX:CMSInitiatingOccupancyFraction=70,
> -XX:+CMSParallelRemarkEnabled, -Dhbase.log.dir=/var/log/hbase,
>
> -Dhbase.log.file=hbase-cmf-hbase1-REGIONSERVER-etp-hdfs-n1-sg.passport.local.log.out,
>
> -Dhbase.home.dir=/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hbase,
> -Dhbase.id.str=, -Dhbase.root.logger=INFO,RFA,
>
> -Djava.library.path=/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hadoop/lib/native:/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hbase/lib/native/Linux-amd64-64,
> -Dhbase.security.logger=INFO,DRFAS]
>
> 10:16:41.392INFOorg.apache.hadoop.ipc.HBaseServer
>
> Starting Thread-2
>
> 10:16:41.393INFOorg.apache.hadoop.ipc.sdfHBaseServer
>
> Starting Thread-2
>
>
> this FATAL
>
> 09:17:20.633FATALorg.apache.hadoop.hbase.regionserver.HRegionServer
>
> ABORTING region server
> etp-hdfs-n1-sg.passport.local,60020,1414102905372:
> regionserver:60020-0x1493f19431b0000
> regionserver:60020-0x1493f19431b0000 received expired from ZooKeeper,
> aborting
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired
> at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
> at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
> at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
>
> found on each RS
> I didn' t see any problems with ZK or memory usage. The load is extremely
> low, just few put/get in a second, you can see file sizes in logs.
> What it could be how can i catch the root cause?
>