You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2014/09/15 20:42:35 UTC
[jira] [Commented] (HBASE-11145) UNEXPECTED!!! when HLog sync:
Queue full
[ https://issues.apache.org/jira/browse/HBASE-11145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134289#comment-14134289 ]
stack commented on HBASE-11145:
-------------------------------
I just saw this in trunk build https://builds.apache.org/job/HBase-TRUNK/5506/testReport/org.apache.hadoop.hbase.master/TestRollingRestart/testBasicRollingRestart/. Here is log snippet:
{code}
...
2014-09-15 17:32:48,061 INFO [asf907.gq1.ygridcore.net,57548,1410802364544-org.apache.hadoop.hbase.master.handler.DisableTableHandler$BulkDisabler-16] master.RegionStateStore(207): Updating row tableRestart,t\xE6\xAD\xCA3,1410802367309.27875e0987d3cef64f1ed9d4e7d27984. with state=PENDING_CLOSE
2014-09-15 17:32:48,061 INFO [asf907.gq1.ygridcore.net,57548,1410802364544-org.apache.hadoop.hbase.master.handler.DisableTableHandler$BulkDisabler-17] master.RegionStateStore(207): Updating row tableRestart,vK\xA0\xF6B,1410802367312.078a7e49e3cd6d0fdf4b41d90fb7c533. with state=PENDING_CLOSE
2014-09-15 17:32:48,063 ERROR [RS_OPEN_META-asf907:57548-0.append-pool13-t1] wal.FSHLog$RingBufferEventHandler(1883): UNEXPECTED!!! syncFutures.length=5
java.lang.IllegalStateException: Queue full
at java.util.AbstractQueue.add(AbstractQueue.java:98)
at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.offer(FSHLog.java:1215)
at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1879)
at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1798)
at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
2014-09-15 17:32:48,065 DEBUG [asf907.gq1.ygridcore.net,57548,1410802364544-org.apache.hadoop.hbase.master.handler.DisableTableHandler$BulkDisabler-18] master.AssignmentManager(1268): Starting unassign of tableRestart,w\xB0\x94"Q,1410802367316.3d23208c70ac74d7c22ad61a6d9db6f5. (offlining), current state: {3d23208c70ac74d7c22ad61a6d9db6f5 state=OPEN, ts=1410802367906, server=asf907.gq1.ygridcore.net,54953,1410802364919}
2014-09-15 17:32:48,065 INFO [PriorityRpcServer.handler=7,queue=0,port=54953] regionserver.RSRpcServices(929): Close 26dfe46c64d29c8dfe378efdbe31d81f, moving to null
2014-09-15 17:32:48,065 DEBUG [asf907.gq1.ygridcore.net,57548,1410802364544-org.apache.hadoop.hbase.master.handler.DisableTableHandler$BulkDisabler-19] master.AssignmentManager(1268): Starting unassign of tableRestart,y\x15\x87N`,1410802367319.d1983f1b70a305ffd02421b5d6ae4946. (offlining), current state: {d1983f1b70a305ffd02421b5d6ae4946 state=OPEN, ts=1410802367949, server=asf907.gq1.ygridcore.net,53414,1410802365106}
2014-09-15 17:32:48,065 INFO [asf907.gq1.ygridcore.net,57548,1410802364544-org.apache.hadoop.hbase.master.handler.DisableTableHandler$BulkDisabler-18] master.RegionStates(951): Transition {3d23208c70ac74d7c22ad61a6d9db6f5 state=OPEN, ts=1410802367906, server=asf907.gq1.ygridcore.net,54953,1410802364919} to {3d23208c70ac74d7c22ad61a6d9db6f5 state=PENDING_CLOSE, ts=1410802368065, server=asf907.gq1.ygridcore.net,54953,1410802364919}
2014-09-15 17:32:48,065 INFO [PriorityRpcServer.handler=4,queue=0,port=55026] regionserver.RSRpcServices(929): Close 1a8f62b0ca9197d9bbb923f02198eccb, moving to null
2014-09-15 17:32:48,066 INFO [asf907.gq1.ygridcore.net,57548,1410802364544-org.apache.hadoop.hbase.master.handler.DisableTableHandler$BulkDisabler-18] master.RegionStateStore(207): Updating row tableRestart,w\xB0\x94"Q,1410802367316.3d23208c70ac74d7c22ad61a6d9db6f5. with state=PENDING_CLOSE
2014-09-15 17:32:48,065 INFO [asf907.gq1.ygridcore.net,57548,1410802364544-org.apache.hadoop.hbase.master.handler.DisableTableHandler$BulkDisabler-19] master.RegionStates(951): Transition {d1983f1b70a305ffd02421b5d6ae4946 state=OPEN, ts=1410802367949, server=asf907.gq1.ygridcore.net,53414,1410802365106} to {d1983f1b70a305ffd02421b5d6ae4946 state=PENDING_CLOSE, ts=1410802368065, server=asf907.gq1.ygridcore.net,53414,1410802365106}
2014-09-15 17:32:48,067 INFO [asf907.gq1.ygridcore.net,57548,1410802364544-org.apache.hadoop.hbase.master.handler.DisableTableHandler$BulkDisabler-19] master.RegionStateStore(207): Updating row tableRestart,y\x15\x87N`,1410802367319.d1983f1b70a305ffd02421b5d6ae4946. with state=PENDING_CLOSE
2014-09-15 17:32:48,067 INFO [PriorityRpcServer.handler=5,queue=0,port=55026] regionserver.RSRpcServices(929): Close a6b10ca180218b7fab4c70ca2be7cffc, moving to null
2014-09-15 17:32:48,067 ERROR [RS_OPEN_META-asf907:57548-0.append-pool13-t1] wal.FSHLog$RingBufferEventHandler(1883): UNEXPECTED!!! syncFutures.length=5
java.lang.IllegalStateException: Queue full
at java.util.AbstractQueue.add(AbstractQueue.java:98)
at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.offer(FSHLog.java:1215)
at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1879)
at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1798)
at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
2014-09-15 17:32:48,067 DEBUG [Thread-1] client.HBaseAdmin(995): Sleeping= 100ms, waiting for all regions to be disabled in tableRestart
2014-09-15 17:32:48,068 DEBUG [RS_CLOSE_REGION-asf907:55026-0] handler.CloseRegionHandler(90): Processing close of tableRestart,k$\x07\x95\xCA,1410802367278.a6b10ca180218b7fab4c70ca2be7cffc.
2014-09-15 17:32:48,067 ERROR [RS_OPEN_META-asf907:57548-0.append-pool13-t1] wal.FSHLog$RingBufferEventHandler(1883): UNEXPECTED!!! syncFutures.length=5
java.lang.ArrayIndexOutOfBoundsException: 5
at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1839)
at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1798)
at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
2014-09-15 17:32:48,069 DEBUG [RS_CLOSE_REGION-asf907:55026-0] regionserve
...
{code}
We so the queue full and then index out of bounds dance.
Let me do as [~anoop.hbase] suggests.
> UNEXPECTED!!! when HLog sync: Queue full
> ----------------------------------------
>
> Key: HBASE-11145
> URL: https://issues.apache.org/jira/browse/HBASE-11145
> Project: HBase
> Issue Type: Bug
> Reporter: Anoop Sam John
> Assignee: stack
> Priority: Critical
> Fix For: 0.99.1
>
> Attachments: 11145.txt
>
>
> Got the below Exceptions Log in case of a write heavy test
> {code}
> 2014-05-07 11:29:56,417 ERROR [main.append-pool1-t1] wal.FSHLog$RingBufferEventHandler(1882): UNEXPECTED!!!
> java.lang.IllegalStateException: Queue full
> at java.util.AbstractQueue.add(Unknown Source)
> at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.offer(FSHLog.java:1227)
> at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1878)
> at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1)
> at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> 2014-05-07 11:29:56,418 ERROR [main.append-pool1-t1] wal.FSHLog$RingBufferEventHandler(1882): UNEXPECTED!!!
> java.lang.ArrayIndexOutOfBoundsException: 5
> at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1838)
> at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1)
> at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> 2014-05-07 11:29:56,419 ERROR [main.append-pool1-t1] wal.FSHLog$RingBufferEventHandler(1882): UNEXPECTED!!!
> java.lang.ArrayIndexOutOfBoundsException: 6
> at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1838)
> at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1)
> at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> 2014-05-07 11:29:56,419 ERROR [main.append-pool1-t1] wal.FSHLog$RingBufferEventHandler(1882): UNEXPECTED!!!
> java.lang.ArrayIndexOutOfBoundsException: 7
> at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1838)
> at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1)
> at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> {code}
> In FSHLog$SyncRunner.offer we do BlockingQueue.add() which throws Exception as it is full. The problem is the below shown catch() we do not do any cleanup.
> {code}
> this.syncRunners[index].offer(sequence, this.syncFutures, this.syncFuturesCount);
> attainSafePoint(sequence);
> this.syncFuturesCount = 0;
> } catch (Throwable t) {
> LOG.error("UNEXPECTED!!!", t);
> }
> {code}
> syncFuturesCount is not getting reset to 0 and so the subsequent onEvent() handling throws ArrayIndexOutOfBoundsException.
> I think we should do the below
> 1. Handle the Exception and call cleanupOutstandingSyncsOnException() as in other cases of Exception handling
> 2. Instead of BlockingQueue.add() use offer() (?)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)