Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2016/12/23 08:08:58 UTC

[jira] [Comment Edited] (DRILL-5156) Bit-Client thread finds closed allocator in TestDrillbitResilience unit test

    [ https://issues.apache.org/jira/browse/DRILL-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15772269#comment-15772269 ] 

Paul Rogers edited comment on DRILL-5156 at 12/23/16 8:08 AM:
--------------------------------------------------------------

The problem appears to be a bug in {{BootStrapContext}}, which creates two thread pools but never shuts them down. The two pools are for the "BitClient-n" and "BitServer-n" threads. During close, the {{BootStrapContext.close()}} method closes the allocator but leaves the threads running.

Since they are left running, the BitClient thread attempts to use the (now closed) allocator and triggers the {{IllegalStateException}}. This behavior is easy to see by setting the {{IllegalStateException}} breakpoint given in the issue description. Leave the thread stopped at that breakpoint. The rest of the Drillbit shuts down around the suspended thread, showing that the Drillbit did not wait for the thread.

The fix is simple:

{code}
  public void close() {
    // Stop the "BitClient-n" threads so they cannot touch the allocator once it is closed.
    try {
      loop2.shutdownGracefully(0, 0, TimeUnit.SECONDS);
    } catch (Exception e) {
      logger.warn("Failure During Bit-Client shutdown.", e);
    }
    // Stop the "BitServer-n" threads as well.
    try {
      loop.shutdownGracefully(0, 0, TimeUnit.SECONDS);
    } catch (Exception e) {
      logger.warn("Failure During Bit-Server shutdown.", e);
    }
    ...
{code}

After this fix, the test case runs fine with no {{IllegalStateException}}s.
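
The ordering behind the fix (stop the worker threads first, then close the resource they use) can be illustrated with a small, self-contained sketch. This is not Drill code: it assumes plain {{java.util.concurrent}} executors as stand-ins for the two event loop groups and an {{AtomicBoolean}} as a stand-in for the allocator, and every name in it ({{ShutdownOrderSketch}}, {{submitClientWork}}, and so on) is hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a context that owns two thread pools and a shared resource.
class ShutdownOrderSketch implements AutoCloseable {
  // Stand-ins for the pools behind the "BitClient-n" and "BitServer-n" threads.
  private final ExecutorService clientPool = Executors.newSingleThreadExecutor();
  private final ExecutorService serverPool = Executors.newSingleThreadExecutor();
  // Stand-in for the allocator: false means "closed".
  private final AtomicBoolean allocatorOpen = new AtomicBoolean(true);
  // Counts tasks that ran after the allocator stand-in was closed.
  private final AtomicInteger closedAllocatorHits = new AtomicInteger();

  // Queue work on the "client" thread that touches the allocator stand-in.
  void submitClientWork() {
    clientPool.execute(() -> {
      if (!allocatorOpen.get()) {
        closedAllocatorHits.incrementAndGet();
      }
    });
  }

  @Override
  public void close() {
    // 1. Stop the worker threads and wait for them to exit.
    stop(clientPool);
    stop(serverPool);
    // 2. Only then close the resource they may have been using.
    allocatorOpen.set(false);
  }

  private static void stop(ExecutorService pool) {
    pool.shutdown(); // no new tasks; already-queued tasks still run
    try {
      if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
        pool.shutdownNow(); // give up waiting and interrupt the workers
      }
    } catch (InterruptedException e) {
      pool.shutdownNow();
      Thread.currentThread().interrupt();
    }
  }

  boolean workersStopped() {
    return clientPool.isTerminated() && serverPool.isTerminated();
  }

  boolean allocatorOpen() {
    return allocatorOpen.get();
  }

  int closedAllocatorHits() {
    return closedAllocatorHits.get();
  }
}
```

In this sketch {{close()}} blocks until both pools have terminated, so no queued task can observe the closed allocator stand-in. Netty's {{shutdownGracefully(...)}} differs in that it returns a future rather than blocking, so a caller that needs the same guarantee would have to wait on that future; whether the actual fix does so is not visible in the snippet above.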



> Bit-Client thread finds closed allocator in TestDrillbitResilience unit test
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-5156
>                 URL: https://issues.apache.org/jira/browse/DRILL-5156
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>
> An RPC thread attempts to access a closed allocator during the {{TestDrillbitResilience}} unit test.
> Set a Java exception breakpoint for {{IllegalStateException}}. Run the {{TestDrillbitResilience}} unit tests.
> You will see quite a few exceptions, including the following in a thread called BitClient-1:
> {code}
> RootAllocator(BaseAllocator).assertOpen() line 109
> RootAllocator(BaseAllocator).buffer(int) line 191
> DrillByteBufAllocator.buffer(int) line 49
> DrillByteBufAllocator.ioBuffer(int) line 64
> AdaptiveRecvByteBufAllocator$HandleImpl.allocate(ByteBufAllocator) line 104
> NioSocketChannel$NioSocketChannelUnsafe(...).read() line 117
> ...
> NioEventLoop.run() line 354
> {code}
> The test continues (then fails for some other reason), which is why this is marked as minor. Still, it seems odd that the client thread should attempt to access a closed allocator.
> At this point, it is not clear how we got into this state. The test itself is waiting for a response from the server in the {{failsAfterMSorterSorting}} test.


