You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Rajeshbabu Chintaguntla (Jira)" <ji...@apache.org> on 2022/11/25 06:01:00 UTC

[jira] [Work started] (HBASE-27509) Possible region gets stuck in CLOSING state

     [ https://issues.apache.org/jira/browse/HBASE-27509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HBASE-27509 started by Rajeshbabu Chintaguntla.
-------------------------------------------------------
> Possible region gets stuck in CLOSING state
> -------------------------------------------
>
>                 Key: HBASE-27509
>                 URL: https://issues.apache.org/jira/browse/HBASE-27509
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 2.3.4
>            Reporter: Rajeshbabu Chintaguntla
>            Assignee: Rajeshbabu Chintaguntla
>            Priority: Critical
>             Fix For: 3.0.0-alpha-4
>
>
> There is a possible chance of region gets stuck in closing state could be because of race between the flush and close or some where the readlock acquired on the region is not getting released.
> {noformat}
> "MemStoreFlusher.1" #236 prio=5 os_prio=0 tid=0x00005639266a4000 nid=0x296e waiting on condition [0x00007fdc48a63000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0x00007fdf42dde850> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>     at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>     at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2397)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:610)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:579)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:67)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:359)
> "MemStoreFlusher.0" #234 prio=5 os_prio=0 tid=0x00005639266a2800 nid=0x296d waiting on condition [0x00007fdc48b64000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0x00007fdf42dde850> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>     at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>     at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2397)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:610)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:579)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:67)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:359)
> {noformat} 
> {noformat}
> "RS_CLOSE_REGION-regionserver/sl73tskrnsqln00107:16020-0" #6337 daemon prio=5 os_prio=0 tid=0x00007fdc05448800 nid=0x15d1 waiting on condition [0x00007fdc1befd000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0x00007fdf42dde850> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>     at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
>     at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1662)
>     at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1591)
>     - locked <0x00007fdf42ddf358> (a java.lang.Object)
>     at org.apache.hadoop.hbase.regionserver.handler.UnassignRegionHandler.process(UnassignRegionHandler.java:114)
>     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750)
> {noformat}
> From one of the region server logs flushed has started and replay edits of flush added then close requested then it got stuck in the transition and no further processing of the requests on the region.
> {noformat}
>   2022-11-23 05:51:00,503 INFO  [MemStoreFlusher.1] regionserver.DefaultStoreFlusher: Flushed memstore data size=232.14 MB at sequenceid=29163855 (bloomFilter=true), to=hdfs://OCEDEV/apps/hbase/data/data/default/hbase_perf_wl2_1000m/8f342bca97c115f8bce460a998e4afbc/.tmp/cf1/c20f27a7407643b18558331d95f7a67f
>   2022-11-23 05:51:00,530 INFO  [MemStoreFlusher.1] regionserver.HStore: Added hdfs://OCEDEV/apps/hbase/data/data/default/hbase_perf_wl2_1000m/8f342bca97c115f8bce460a998e4afbc/cf1/c20f27a7407643b18558331d95f7a67f, entries=231413, sequenceid=29163855, filesize=233.7 M
>   2022-11-23 05:51:00,536 INFO  [MemStoreFlusher.1] regionserver.HRegion: Finished flush of dataSize ~232.14 MB/243420836, heapSize ~256.00 MB/268439136, currentSize=719.06 KB/736320 for 8f342bca97c115f8bce460a998e4afbc in 2146ms, sequenceid=29163855, compaction requested=true
>   2022-11-23 05:51:00,541 INFO  [MemStoreFlusher.1] regionserver.HRegion: Flushing d92c7546be62225859dd641aa88992ea 1/1 column families, dataSize=232.62 MB heapSize=256.53 MB
>   2022-11-23 05:51:00,645 INFO  [MemStoreFlusher.0] regionserver.DefaultStoreFlusher: Flushed memstore data size=232.15 MB at sequenceid=29037517 (bloomFilter=true), to=hdfs://OCEDEV/apps/hbase/data/data/default/hbase_perf_wl2_1000m/7beadd786ffc23edc074238c873f800b/.tmp/cf1/eb542141078e4f879128498d01517e98
>   2022-11-23 05:51:00,705 INFO  [MemStoreFlusher.0] regionserver.HStore: Added hdfs://OCEDEV/apps/hbase/data/data/default/hbase_perf_wl2_1000m/7beadd786ffc23edc074238c873f800b/cf1/eb542141078e4f879128498d01517e98, entries=231402, sequenceid=29037517, filesize=233.7 M
>   2022-11-23 05:51:00,706 INFO  [MemStoreFlusher.0] regionserver.HRegion: Finished flush of dataSize ~232.15 MB/243424206, heapSize ~256.01 MB/268442768, currentSize=935.82 KB/958277 for 7beadd786ffc23edc074238c873f800b in 2324ms, sequenceid=29037517, compaction requested=true
>   2022-11-23 05:51:00,708 INFO  [MemStoreFlusher.0] regionserver.HRegion: Flushing 6c6694dce02a13f8109ecc3dd70009d5 1/1 column families, dataSize=232.68 MB heapSize=256.60 MB
>   2022-11-23 05:51:00,771 INFO  [RS_CLOSE_REGION-regionserver/sl73tskrnsqln00107:16020-0] handler.UnassignRegionHandler: Close 8f342bca97c115f8bce460a998e4afbc
> {noformat}
> {noformat}
> 2022-11-23 07:37:06,887 WARN  [RpcServer.default.FPBQ.Fifo.handler=28,queue=8,port=16020] regionserver.HRegion: Region is too busy to allow lock acquisition.
> org.apache.hadoop.hbase.RegionTooBusyException: Failed to obtain lock; regionName=hbase_perf_wl2_1000m,user8104,1668039117539.8f342bca97c115f8bce460a998e4afbc., server=sl73tskrnsqln00107.visa.com,16020,1669131710569
> 	at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:8726)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:8705)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:8610)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3161)
> 	at org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2985)
> 	at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45517)
> 	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:393)
> 	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)