You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Daniel Barclay <db...@maprtech.com> on 2015/07/10 22:10:02 UTC

Threads left after Drillbit shutdown (in dev./unit tests)

Is Drill terminating threads correctly?

In running jstack on a JVM running a dev. test run that ended up hung
after getting about three test timeout errors, I see that there are
409 threads.

Although 138 of those are not-unexpected ShutdownHook threads (since
many tests are run in one VM), there are:
- 138 "WorkManager.StatusThread" threads (hmm.... 138 again)
-   7 "Client-1" threads
-   4 "UserServer-1" threads
-  21 "BitClient-1" threads
-   4 "BitClient-2" threads
-   3 "BitClient-3" threads
-   8 "BitServer-1" threads
-   8 "BitServer-2" threads
-   7 "BitServer-3" threads
-   7 "BitServer-4" threads
-   7 "BitServer-5" threads
-   6 "BitServer-6" threads
-   6 "BitServer-7" threads
-   6 "BitServer-8" threads
-   5 "BitServer-9" threads
-   5 "BitServer-10" threads
(Other thread names have only 1 or 2 occurrences.)

Regarding the 4 for the number of "UserServer-1" threads:  Three test
methods had timeout failures plus one got hung.


Here's the tail end of the output from the test running, including
all the timeout errors and including the hang (except for repeated
query-results data lines).



dbarclay@dev-linux2 ~/work/git/incubator-drill $ time mvn install

<TRIMMED>

Running org.apache.drill.exec.physical.impl.TestDistributedFragmentRun
Running org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeOneEntryRun
Running org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#twoBitOneExchangeTwoEntryRun
Running org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeTwoEntryRun
Running org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeTwoEntryRunLogical
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 48.117 sec - in org.apache.drill.exec.physical.impl.TestDistributedFragmentRun
Running org.apache.drill.exec.physical.impl.TestBroadcastExchange
Running org.apache.drill.exec.physical.impl.TestBroadcastExchange#TestSingleBroadcastExchangeWithTwoScans
00:44:34.017 [globalEventExecutor-1-523] ERROR o.a.z.server.NIOServerCnxnFactory - Thread Thread[globalEventExecutor-1-523,5,main] died
java.lang.AssertionError: null
	at io.netty.util.concurrent.AbstractScheduledEventExecutor.pollScheduledTask(AbstractScheduledEventExecutor.java:83) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
	at io.netty.util.concurrent.GlobalEventExecutor.fetchFromScheduledTaskQueue(GlobalEventExecutor.java:110) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
	at io.netty.util.concurrent.GlobalEventExecutor.takeTask(GlobalEventExecutor.java:95) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
	at io.netty.util.concurrent.GlobalEventExecutor$TaskRunner.run(GlobalEventExecutor.java:226) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
	at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_72]
Running org.apache.drill.exec.physical.impl.TestBroadcastExchange#TestMultipleSendLocationBroadcastExchange
10000
Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 111.599 sec <<< FAILURE! - in org.apache.drill.exec.physical.impl.TestBroadcastExchange
TestSingleBroadcastExchangeWithTwoScans(org.apache.drill.exec.physical.impl.TestBroadcastExchange)  Time elapsed: 50.063 sec  <<< ERROR!
java.lang.Exception: test timed out after 50000 milliseconds
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:503)
	at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:254)
	at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:32)
	at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:31)
	at org.apache.drill.exec.rpc.BasicServer.close(BasicServer.java:218)
	at com.google.common.io.Closeables.close(Closeables.java:77)
	at com.google.common.io.Closeables.closeQuietly(Closeables.java:108)
	at org.apache.drill.exec.rpc.data.DataConnectionCreator.close(DataConnectionCreator.java:70)
	at com.google.common.io.Closeables.close(Closeables.java:77)
	at com.google.common.io.Closeables.closeQuietly(Closeables.java:108)
	at org.apache.drill.exec.service.ServiceEngine.close(ServiceEngine.java:88)
	at com.google.common.io.Closeables.close(Closeables.java:77)
	at com.google.common.io.Closeables.closeQuietly(Closeables.java:108)
	at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:288)
	at org.apache.drill.exec.physical.impl.TestBroadcastExchange.TestSingleBroadcastExchangeWithTwoScans(TestBroadcastExchange.java:62)

TestMultipleSendLocationBroadcastExchange(org.apache.drill.exec.physical.impl.TestBroadcastExchange)  Time elapsed: 50.014 sec  <<< ERROR!
java.lang.Exception: test timed out after 50000 milliseconds
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:503)
	at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:254)
	at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:32)
	at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:31)
	at org.apache.drill.exec.rpc.BasicServer.close(BasicServer.java:218)
	at org.apache.drill.exec.rpc.user.UserServer.close(UserServer.java:283)
	at com.google.common.io.Closeables.close(Closeables.java:77)
	at com.google.common.io.Closeables.closeQuietly(Closeables.java:108)
	at org.apache.drill.exec.service.ServiceEngine.close(ServiceEngine.java:87)
	at com.google.common.io.Closeables.close(Closeables.java:77)
	at com.google.common.io.Closeables.closeQuietly(Closeables.java:108)
	at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:288)
	at org.apache.drill.exec.physical.impl.TestBroadcastExchange.TestMultipleSendLocationBroadcastExchange(TestBroadcastExchange.java:88)

Running org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender
Running org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender#testPartitionSenderCostToThreads
Running org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender#testAlgorithm
ok	summary
true	planner.slice_target updated.
Total rows returned : 1.  Returned in 38ms.
Jul 10, 2015 12:47:20 AM org.apache.calcite.sql.validate.SqlValidatorException <init>
SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Table 'dfs./home/dbarclay/work/git/incubator-drill/exec/java-exec/target/junit5218680774082947123/junit6147831434075535799' not found
Jul 10, 2015 12:47:20 AM org.apache.calcite.runtime.CalciteException <init>
SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, column 64 to line 1, column 66: Table 'dfs./home/dbarclay/work/git/incubator-drill/exec/java-exec/target/junit5218680774082947123/junit6147831434075535799' not found
Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 63.904 sec <<< FAILURE! - in org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender
testPartitionSenderCostToThreads(org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender)  Time elapsed: 50.023 sec  <<< ERROR!
java.lang.Exception: test timed out after 50000 milliseconds
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:503)
	at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:254)
	at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:32)
	at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:31)
	at org.apache.drill.exec.rpc.BasicServer.close(BasicServer.java:218)
	at org.apache.drill.exec.rpc.user.UserServer.close(UserServer.java:283)
	at com.google.common.io.Closeables.close(Closeables.java:77)
	at com.google.common.io.Closeables.closeQuietly(Closeables.java:108)
	at org.apache.drill.exec.service.ServiceEngine.close(ServiceEngine.java:87)
	at com.google.common.io.Closeables.close(Closeables.java:77)
	at com.google.common.io.Closeables.closeQuietly(Closeables.java:108)
	at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:288)
	at org.apache.drill.BaseTestQuery.closeClient(BaseTestQuery.java:239)
	at org.apache.drill.BaseTestQuery.updateTestCluster(BaseTestQuery.java:126)
	at org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender.testPartitionSenderCostToThreads(TestPartitionSender.java:154)

Running org.apache.drill.exec.physical.impl.xsort.TestSimpleExternalSort
Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0 sec - in org.apache.drill.exec.physical.impl.xsort.TestSimpleExternalSort
Running org.apache.drill.exec.physical.impl.svremover.TestSVRemover
Running org.apache.drill.exec.physical.impl.svremover.TestSVRemover#testSelectionVectorRemoval
blue	red	green
2147483647	9223372036854775807	2147483647
2147483647	9223372036854775807	2147483647

<TRIMMED>

2147483647	9223372036854775807	2147483647
Total rows returned : 50.  Returned in 181ms.
Running org.apache.drill.exec.physical.impl.svremover.TestSVRemover#testSVRWithNoFilter
blue	red	green
true	-9223372036854775808	-2147483648
false	9223372036854775807	null

<TRIMMED>

true	-9223372036854775808	-2147483648
false	9223372036854775807	null
Total rows returned : 100.  Returned in 54ms.
   C-c C-c
real	708m38.962s
user	11m3.332s
sys	1m8.068s
[130]dbarclay@dev-linux2 ~/work/git/incubator-drill $





Daniel
-- 
Daniel Barclay
MapR Technologies

Re: Threads left after Drillbit shutdown (in dev./unit tests)

Posted by Hanifi Gunes <hg...@maprtech.com>.
Is there any way to re-produce this at a smaller scale? Have you tried
failing a couple of tests and dumping threads?

-Hanifi
Thanks

On Fri, Jul 10, 2015 at 1:10 PM, Daniel Barclay <db...@maprtech.com>
wrote:

> Is Drill terminating threads correctly?
>
> In running jstack on a JVM running a dev. test run that ended up hung
> after getting about three test timeout errors, I see that there are
> 409 threads.
>
> Although 138 of those are not-unexpected ShutdownHook threads (since
> many tests are run in one VM), there are:
> - 138 "WorkManager.StatusThread" threads (hmm.... 138 again)
> -   7 "Client-1" threads
> -   4 "UserServer-1" threads
> -  21 "BitClient-1" threads
> -   4 "BitClient-2" threads
> -   3 "BitClient-3" threads
> -   8 "BitServer-1" threads
> -   8 "BitServer-2" threads
> -   7 "BitServer-3" threads
> -   7 "BitServer-4" threads
> -   7 "BitServer-5" threads
> -   6 "BitServer-6" threads
> -   6 "BitServer-7" threads
> -   6 "BitServer-8" threads
> -   5 "BitServer-9" threads
> -   5 "BitServer-10" threads
> (Other thread names have only 1 or 2 occurrences.)
>
> Regarding the 4 for the number of "UserServer-1" threads:  Three test
> methods had timeout failures plus one got hung.
>
>
> Here's the tail end of the output from the test running, including
> all the timeout errors and including the hang (except for repeated
> query-results data lines).
>
>
>
> dbarclay@dev-linux2 ~/work/git/incubator-drill $ time mvn install
>
> <TRIMMED>
>
> Running org.apache.drill.exec.physical.impl.TestDistributedFragmentRun
> Running
> org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeOneEntryRun
> Running
> org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#twoBitOneExchangeTwoEntryRun
> Running
> org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeTwoEntryRun
> Running
> org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeTwoEntryRunLogical
> Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 48.117 sec
> - in org.apache.drill.exec.physical.impl.TestDistributedFragmentRun
> Running org.apache.drill.exec.physical.impl.TestBroadcastExchange
> Running
> org.apache.drill.exec.physical.impl.TestBroadcastExchange#TestSingleBroadcastExchangeWithTwoScans
> 00:44:34.017 [globalEventExecutor-1-523] ERROR
> o.a.z.server.NIOServerCnxnFactory - Thread
> Thread[globalEventExecutor-1-523,5,main] died
> java.lang.AssertionError: null
>         at
> io.netty.util.concurrent.AbstractScheduledEventExecutor.pollScheduledTask(AbstractScheduledEventExecutor.java:83)
> ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.util.concurrent.GlobalEventExecutor.fetchFromScheduledTaskQueue(GlobalEventExecutor.java:110)
> ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.util.concurrent.GlobalEventExecutor.takeTask(GlobalEventExecutor.java:95)
> ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.util.concurrent.GlobalEventExecutor$TaskRunner.run(GlobalEventExecutor.java:226)
> ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
> ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_72]
> Running
> org.apache.drill.exec.physical.impl.TestBroadcastExchange#TestMultipleSendLocationBroadcastExchange
> 10000
> Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 111.599
> sec <<< FAILURE! - in
> org.apache.drill.exec.physical.impl.TestBroadcastExchange
> TestSingleBroadcastExchangeWithTwoScans(org.apache.drill.exec.physical.impl.TestBroadcastExchange)
> Time elapsed: 50.063 sec  <<< ERROR!
> java.lang.Exception: test timed out after 50000 milliseconds
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:503)
>         at
> io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:254)
>         at
> io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:32)
>         at
> io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:31)
>         at
> org.apache.drill.exec.rpc.BasicServer.close(BasicServer.java:218)
>         at com.google.common.io.Closeables.close(Closeables.java:77)
>         at com.google.common.io
> .Closeables.closeQuietly(Closeables.java:108)
>         at
> org.apache.drill.exec.rpc.data.DataConnectionCreator.close(DataConnectionCreator.java:70)
>         at com.google.common.io.Closeables.close(Closeables.java:77)
>         at com.google.common.io
> .Closeables.closeQuietly(Closeables.java:108)
>         at
> org.apache.drill.exec.service.ServiceEngine.close(ServiceEngine.java:88)
>         at com.google.common.io.Closeables.close(Closeables.java:77)
>         at com.google.common.io
> .Closeables.closeQuietly(Closeables.java:108)
>         at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:288)
>         at
> org.apache.drill.exec.physical.impl.TestBroadcastExchange.TestSingleBroadcastExchangeWithTwoScans(TestBroadcastExchange.java:62)
>
> TestMultipleSendLocationBroadcastExchange(org.apache.drill.exec.physical.impl.TestBroadcastExchange)
> Time elapsed: 50.014 sec  <<< ERROR!
> java.lang.Exception: test timed out after 50000 milliseconds
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:503)
>         at
> io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:254)
>         at
> io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:32)
>         at
> io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:31)
>         at
> org.apache.drill.exec.rpc.BasicServer.close(BasicServer.java:218)
>         at
> org.apache.drill.exec.rpc.user.UserServer.close(UserServer.java:283)
>         at com.google.common.io.Closeables.close(Closeables.java:77)
>         at com.google.common.io
> .Closeables.closeQuietly(Closeables.java:108)
>         at
> org.apache.drill.exec.service.ServiceEngine.close(ServiceEngine.java:87)
>         at com.google.common.io.Closeables.close(Closeables.java:77)
>         at com.google.common.io
> .Closeables.closeQuietly(Closeables.java:108)
>         at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:288)
>         at
> org.apache.drill.exec.physical.impl.TestBroadcastExchange.TestMultipleSendLocationBroadcastExchange(TestBroadcastExchange.java:88)
>
> Running
> org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender
> Running
> org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender#testPartitionSenderCostToThreads
> Running
> org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender#testAlgorithm
> ok      summary
> true    planner.slice_target updated.
> Total rows returned : 1.  Returned in 38ms.
> Jul 10, 2015 12:47:20 AM
> org.apache.calcite.sql.validate.SqlValidatorException <init>
> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Table
> 'dfs./home/dbarclay/work/git/incubator-drill/exec/java-exec/target/junit5218680774082947123/junit6147831434075535799'
> not found
> Jul 10, 2015 12:47:20 AM org.apache.calcite.runtime.CalciteException <init>
> SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1,
> column 64 to line 1, column 66: Table
> 'dfs./home/dbarclay/work/git/incubator-drill/exec/java-exec/target/junit5218680774082947123/junit6147831434075535799'
> not found
> Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 63.904 sec
> <<< FAILURE! - in
> org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender
> testPartitionSenderCostToThreads(org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender)
> Time elapsed: 50.023 sec  <<< ERROR!
> java.lang.Exception: test timed out after 50000 milliseconds
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:503)
>         at
> io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:254)
>         at
> io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:32)
>         at
> io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:31)
>         at
> org.apache.drill.exec.rpc.BasicServer.close(BasicServer.java:218)
>         at
> org.apache.drill.exec.rpc.user.UserServer.close(UserServer.java:283)
>         at com.google.common.io.Closeables.close(Closeables.java:77)
>         at com.google.common.io
> .Closeables.closeQuietly(Closeables.java:108)
>         at
> org.apache.drill.exec.service.ServiceEngine.close(ServiceEngine.java:87)
>         at com.google.common.io.Closeables.close(Closeables.java:77)
>         at com.google.common.io
> .Closeables.closeQuietly(Closeables.java:108)
>         at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:288)
>         at
> org.apache.drill.BaseTestQuery.closeClient(BaseTestQuery.java:239)
>         at
> org.apache.drill.BaseTestQuery.updateTestCluster(BaseTestQuery.java:126)
>         at
> org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender.testPartitionSenderCostToThreads(TestPartitionSender.java:154)
>
> Running org.apache.drill.exec.physical.impl.xsort.TestSimpleExternalSort
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0 sec - in
> org.apache.drill.exec.physical.impl.xsort.TestSimpleExternalSort
> Running org.apache.drill.exec.physical.impl.svremover.TestSVRemover
> Running
> org.apache.drill.exec.physical.impl.svremover.TestSVRemover#testSelectionVectorRemoval
> blue    red     green
> 2147483647        9223372036854775807     2147483647
> 2147483647        9223372036854775807     2147483647
>
> <TRIMMED>
>
> 2147483647        9223372036854775807     2147483647
> Total rows returned : 50.  Returned in 181ms.
> Running
> org.apache.drill.exec.physical.impl.svremover.TestSVRemover#testSVRWithNoFilter
> blue    red     green
> true    -9223372036854775808    -2147483648
> false   9223372036854775807     null
>
> <TRIMMED>
>
> true    -9223372036854775808    -2147483648
> false   9223372036854775807     null
> Total rows returned : 100.  Returned in 54ms.
>   C-c C-c
> real    708m38.962s
> user    11m3.332s
> sys     1m8.068s
> [130]dbarclay@dev-linux2 ~/work/git/incubator-drill $
>
>
>
>
>
> Daniel
> --
> Daniel Barclay
> MapR Technologies
>