You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2014/09/15 20:42:35 UTC

[jira] [Commented] (HBASE-11145) UNEXPECTED!!! when HLog sync: Queue full

    [ https://issues.apache.org/jira/browse/HBASE-11145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134289#comment-14134289 ] 

stack commented on HBASE-11145:
-------------------------------

I just saw this in trunk build https://builds.apache.org/job/HBase-TRUNK/5506/testReport/org.apache.hadoop.hbase.master/TestRollingRestart/testBasicRollingRestart/.  Here is log snippet:

{code}
...
2014-09-15 17:32:48,061 INFO  [asf907.gq1.ygridcore.net,57548,1410802364544-org.apache.hadoop.hbase.master.handler.DisableTableHandler$BulkDisabler-16] master.RegionStateStore(207): Updating row tableRestart,t\xE6\xAD\xCA3,1410802367309.27875e0987d3cef64f1ed9d4e7d27984. with state=PENDING_CLOSE
2014-09-15 17:32:48,061 INFO  [asf907.gq1.ygridcore.net,57548,1410802364544-org.apache.hadoop.hbase.master.handler.DisableTableHandler$BulkDisabler-17] master.RegionStateStore(207): Updating row tableRestart,vK\xA0\xF6B,1410802367312.078a7e49e3cd6d0fdf4b41d90fb7c533. with state=PENDING_CLOSE
2014-09-15 17:32:48,063 ERROR [RS_OPEN_META-asf907:57548-0.append-pool13-t1] wal.FSHLog$RingBufferEventHandler(1883): UNEXPECTED!!! syncFutures.length=5
java.lang.IllegalStateException: Queue full
	at java.util.AbstractQueue.add(AbstractQueue.java:98)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.offer(FSHLog.java:1215)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1879)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1798)
	at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:724)
2014-09-15 17:32:48,065 DEBUG [asf907.gq1.ygridcore.net,57548,1410802364544-org.apache.hadoop.hbase.master.handler.DisableTableHandler$BulkDisabler-18] master.AssignmentManager(1268): Starting unassign of tableRestart,w\xB0\x94"Q,1410802367316.3d23208c70ac74d7c22ad61a6d9db6f5. (offlining), current state: {3d23208c70ac74d7c22ad61a6d9db6f5 state=OPEN, ts=1410802367906, server=asf907.gq1.ygridcore.net,54953,1410802364919}
2014-09-15 17:32:48,065 INFO  [PriorityRpcServer.handler=7,queue=0,port=54953] regionserver.RSRpcServices(929): Close 26dfe46c64d29c8dfe378efdbe31d81f, moving to null
2014-09-15 17:32:48,065 DEBUG [asf907.gq1.ygridcore.net,57548,1410802364544-org.apache.hadoop.hbase.master.handler.DisableTableHandler$BulkDisabler-19] master.AssignmentManager(1268): Starting unassign of tableRestart,y\x15\x87N`,1410802367319.d1983f1b70a305ffd02421b5d6ae4946. (offlining), current state: {d1983f1b70a305ffd02421b5d6ae4946 state=OPEN, ts=1410802367949, server=asf907.gq1.ygridcore.net,53414,1410802365106}
2014-09-15 17:32:48,065 INFO  [asf907.gq1.ygridcore.net,57548,1410802364544-org.apache.hadoop.hbase.master.handler.DisableTableHandler$BulkDisabler-18] master.RegionStates(951): Transition {3d23208c70ac74d7c22ad61a6d9db6f5 state=OPEN, ts=1410802367906, server=asf907.gq1.ygridcore.net,54953,1410802364919} to {3d23208c70ac74d7c22ad61a6d9db6f5 state=PENDING_CLOSE, ts=1410802368065, server=asf907.gq1.ygridcore.net,54953,1410802364919}
2014-09-15 17:32:48,065 INFO  [PriorityRpcServer.handler=4,queue=0,port=55026] regionserver.RSRpcServices(929): Close 1a8f62b0ca9197d9bbb923f02198eccb, moving to null
2014-09-15 17:32:48,066 INFO  [asf907.gq1.ygridcore.net,57548,1410802364544-org.apache.hadoop.hbase.master.handler.DisableTableHandler$BulkDisabler-18] master.RegionStateStore(207): Updating row tableRestart,w\xB0\x94"Q,1410802367316.3d23208c70ac74d7c22ad61a6d9db6f5. with state=PENDING_CLOSE
2014-09-15 17:32:48,065 INFO  [asf907.gq1.ygridcore.net,57548,1410802364544-org.apache.hadoop.hbase.master.handler.DisableTableHandler$BulkDisabler-19] master.RegionStates(951): Transition {d1983f1b70a305ffd02421b5d6ae4946 state=OPEN, ts=1410802367949, server=asf907.gq1.ygridcore.net,53414,1410802365106} to {d1983f1b70a305ffd02421b5d6ae4946 state=PENDING_CLOSE, ts=1410802368065, server=asf907.gq1.ygridcore.net,53414,1410802365106}
2014-09-15 17:32:48,067 INFO  [asf907.gq1.ygridcore.net,57548,1410802364544-org.apache.hadoop.hbase.master.handler.DisableTableHandler$BulkDisabler-19] master.RegionStateStore(207): Updating row tableRestart,y\x15\x87N`,1410802367319.d1983f1b70a305ffd02421b5d6ae4946. with state=PENDING_CLOSE
2014-09-15 17:32:48,067 INFO  [PriorityRpcServer.handler=5,queue=0,port=55026] regionserver.RSRpcServices(929): Close a6b10ca180218b7fab4c70ca2be7cffc, moving to null
2014-09-15 17:32:48,067 ERROR [RS_OPEN_META-asf907:57548-0.append-pool13-t1] wal.FSHLog$RingBufferEventHandler(1883): UNEXPECTED!!! syncFutures.length=5
java.lang.IllegalStateException: Queue full
	at java.util.AbstractQueue.add(AbstractQueue.java:98)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.offer(FSHLog.java:1215)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1879)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1798)
	at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:724)
2014-09-15 17:32:48,067 DEBUG [Thread-1] client.HBaseAdmin(995): Sleeping= 100ms, waiting for all regions to be disabled in tableRestart
2014-09-15 17:32:48,068 DEBUG [RS_CLOSE_REGION-asf907:55026-0] handler.CloseRegionHandler(90): Processing close of tableRestart,k$\x07\x95\xCA,1410802367278.a6b10ca180218b7fab4c70ca2be7cffc.
2014-09-15 17:32:48,067 ERROR [RS_OPEN_META-asf907:57548-0.append-pool13-t1] wal.FSHLog$RingBufferEventHandler(1883): UNEXPECTED!!! syncFutures.length=5
java.lang.ArrayIndexOutOfBoundsException: 5
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1839)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1798)
	at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:724)
2014-09-15 17:32:48,069 DEBUG [RS_CLOSE_REGION-asf907:55026-0] regionserve
...
{code}

We so the queue full and then index out of bounds dance.

Let me do as [~anoop.hbase] suggests.

> UNEXPECTED!!! when HLog sync: Queue full
> ----------------------------------------
>
>                 Key: HBASE-11145
>                 URL: https://issues.apache.org/jira/browse/HBASE-11145
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Anoop Sam John
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.99.1
>
>         Attachments: 11145.txt
>
>
> Got the below Exceptions Log in case of a write heavy test
> {code}
> 2014-05-07 11:29:56,417 ERROR [main.append-pool1-t1] wal.FSHLog$RingBufferEventHandler(1882): UNEXPECTED!!!
> java.lang.IllegalStateException: Queue full
>  at java.util.AbstractQueue.add(Unknown Source)
>  at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.offer(FSHLog.java:1227)
>  at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1878)
>  at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1)
>  at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>  at java.lang.Thread.run(Unknown Source)
> 2014-05-07 11:29:56,418 ERROR [main.append-pool1-t1] wal.FSHLog$RingBufferEventHandler(1882): UNEXPECTED!!!
> java.lang.ArrayIndexOutOfBoundsException: 5
>  at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1838)
>  at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1)
>  at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>  at java.lang.Thread.run(Unknown Source)
> 2014-05-07 11:29:56,419 ERROR [main.append-pool1-t1] wal.FSHLog$RingBufferEventHandler(1882): UNEXPECTED!!!
> java.lang.ArrayIndexOutOfBoundsException: 6
>  at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1838)
>  at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1)
>  at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>  at java.lang.Thread.run(Unknown Source)
> 2014-05-07 11:29:56,419 ERROR [main.append-pool1-t1] wal.FSHLog$RingBufferEventHandler(1882): UNEXPECTED!!!
> java.lang.ArrayIndexOutOfBoundsException: 7
>  at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1838)
>  at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1)
>  at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>  at java.lang.Thread.run(Unknown Source)
>  {code}
> In FSHLog$SyncRunner.offer we do BlockingQueue.add() which throws Exception as it is full. The problem is the below shown catch() we do not do any cleanup.
> {code}
> this.syncRunners[index].offer(sequence, this.syncFutures, this.syncFuturesCount);
>         attainSafePoint(sequence);
>         this.syncFuturesCount = 0;
>       } catch (Throwable t) {
>         LOG.error("UNEXPECTED!!!", t);
>       }
> {code}
> syncFuturesCount is not getting reset to 0 and so the subsequent onEvent() handling throws ArrayIndexOutOfBoundsException.
> I think we should do the below 
> 1. Handle the Exception and call cleanupOutstandingSyncsOnException() as in other cases of Exception handling
> 2. Instead of BlockingQueue.add() use offer() (?)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)