You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Daniel Barclay <db...@maprtech.com> on 2015/07/10 22:10:02 UTC
Threads left after Drillbit shutdown (in dev./unit tests)
Is Drill terminating threads correctly?
In running jstack on a JVM running a dev. test run that ended up hung
after getting about three test timeout errors, I see that there are
409 threads.
Although 138 of those are not-unexpected ShutdownHook threads (since
many tests are run in one VM), there are:
- 138 "WorkManager.StatusThread" threads (hmm.... 138 again)
- 7 "Client-1" threads
- 4 "UserServer-1" threads
- 21 "BitClient-1" threads
- 4 "BitClient-2" threads
- 3 "BitClient-3" threads
- 8 "BitServer-1" threads
- 8 "BitServer-2" threads
- 7 "BitServer-3" threads
- 7 "BitServer-4" threads
- 7 "BitServer-5" threads
- 6 "BitServer-6" threads
- 6 "BitServer-7" threads
- 6 "BitServer-8" threads
- 5 "BitServer-9" threads
- 5 "BitServer-10" threads
(Other thread names have only 1 or 2 occurrences.)
Regarding the 4 for the number of "UserServer-1" threads: Three test
methods had timeout failures plus one got hung.
Here's the tail end of the output from the test running, including
all the timeout errors and including the hang (except for repeated
query-results data lines).
dbarclay@dev-linux2 ~/work/git/incubator-drill $ time mvn install
<TRIMMED>
Running org.apache.drill.exec.physical.impl.TestDistributedFragmentRun
Running org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeOneEntryRun
Running org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#twoBitOneExchangeTwoEntryRun
Running org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeTwoEntryRun
Running org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeTwoEntryRunLogical
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 48.117 sec - in org.apache.drill.exec.physical.impl.TestDistributedFragmentRun
Running org.apache.drill.exec.physical.impl.TestBroadcastExchange
Running org.apache.drill.exec.physical.impl.TestBroadcastExchange#TestSingleBroadcastExchangeWithTwoScans
00:44:34.017 [globalEventExecutor-1-523] ERROR o.a.z.server.NIOServerCnxnFactory - Thread Thread[globalEventExecutor-1-523,5,main] died
java.lang.AssertionError: null
at io.netty.util.concurrent.AbstractScheduledEventExecutor.pollScheduledTask(AbstractScheduledEventExecutor.java:83) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
at io.netty.util.concurrent.GlobalEventExecutor.fetchFromScheduledTaskQueue(GlobalEventExecutor.java:110) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
at io.netty.util.concurrent.GlobalEventExecutor.takeTask(GlobalEventExecutor.java:95) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
at io.netty.util.concurrent.GlobalEventExecutor$TaskRunner.run(GlobalEventExecutor.java:226) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_72]
Running org.apache.drill.exec.physical.impl.TestBroadcastExchange#TestMultipleSendLocationBroadcastExchange
10000
Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 111.599 sec <<< FAILURE! - in org.apache.drill.exec.physical.impl.TestBroadcastExchange
TestSingleBroadcastExchangeWithTwoScans(org.apache.drill.exec.physical.impl.TestBroadcastExchange) Time elapsed: 50.063 sec <<< ERROR!
java.lang.Exception: test timed out after 50000 milliseconds
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:254)
at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:32)
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:31)
at org.apache.drill.exec.rpc.BasicServer.close(BasicServer.java:218)
at com.google.common.io.Closeables.close(Closeables.java:77)
at com.google.common.io.Closeables.closeQuietly(Closeables.java:108)
at org.apache.drill.exec.rpc.data.DataConnectionCreator.close(DataConnectionCreator.java:70)
at com.google.common.io.Closeables.close(Closeables.java:77)
at com.google.common.io.Closeables.closeQuietly(Closeables.java:108)
at org.apache.drill.exec.service.ServiceEngine.close(ServiceEngine.java:88)
at com.google.common.io.Closeables.close(Closeables.java:77)
at com.google.common.io.Closeables.closeQuietly(Closeables.java:108)
at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:288)
at org.apache.drill.exec.physical.impl.TestBroadcastExchange.TestSingleBroadcastExchangeWithTwoScans(TestBroadcastExchange.java:62)
TestMultipleSendLocationBroadcastExchange(org.apache.drill.exec.physical.impl.TestBroadcastExchange) Time elapsed: 50.014 sec <<< ERROR!
java.lang.Exception: test timed out after 50000 milliseconds
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:254)
at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:32)
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:31)
at org.apache.drill.exec.rpc.BasicServer.close(BasicServer.java:218)
at org.apache.drill.exec.rpc.user.UserServer.close(UserServer.java:283)
at com.google.common.io.Closeables.close(Closeables.java:77)
at com.google.common.io.Closeables.closeQuietly(Closeables.java:108)
at org.apache.drill.exec.service.ServiceEngine.close(ServiceEngine.java:87)
at com.google.common.io.Closeables.close(Closeables.java:77)
at com.google.common.io.Closeables.closeQuietly(Closeables.java:108)
at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:288)
at org.apache.drill.exec.physical.impl.TestBroadcastExchange.TestMultipleSendLocationBroadcastExchange(TestBroadcastExchange.java:88)
Running org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender
Running org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender#testPartitionSenderCostToThreads
Running org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender#testAlgorithm
ok summary
true planner.slice_target updated.
Total rows returned : 1. Returned in 38ms.
Jul 10, 2015 12:47:20 AM org.apache.calcite.sql.validate.SqlValidatorException <init>
SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Table 'dfs./home/dbarclay/work/git/incubator-drill/exec/java-exec/target/junit5218680774082947123/junit6147831434075535799' not found
Jul 10, 2015 12:47:20 AM org.apache.calcite.runtime.CalciteException <init>
SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, column 64 to line 1, column 66: Table 'dfs./home/dbarclay/work/git/incubator-drill/exec/java-exec/target/junit5218680774082947123/junit6147831434075535799' not found
Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 63.904 sec <<< FAILURE! - in org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender
testPartitionSenderCostToThreads(org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender) Time elapsed: 50.023 sec <<< ERROR!
java.lang.Exception: test timed out after 50000 milliseconds
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:254)
at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:32)
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:31)
at org.apache.drill.exec.rpc.BasicServer.close(BasicServer.java:218)
at org.apache.drill.exec.rpc.user.UserServer.close(UserServer.java:283)
at com.google.common.io.Closeables.close(Closeables.java:77)
at com.google.common.io.Closeables.closeQuietly(Closeables.java:108)
at org.apache.drill.exec.service.ServiceEngine.close(ServiceEngine.java:87)
at com.google.common.io.Closeables.close(Closeables.java:77)
at com.google.common.io.Closeables.closeQuietly(Closeables.java:108)
at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:288)
at org.apache.drill.BaseTestQuery.closeClient(BaseTestQuery.java:239)
at org.apache.drill.BaseTestQuery.updateTestCluster(BaseTestQuery.java:126)
at org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender.testPartitionSenderCostToThreads(TestPartitionSender.java:154)
Running org.apache.drill.exec.physical.impl.xsort.TestSimpleExternalSort
Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0 sec - in org.apache.drill.exec.physical.impl.xsort.TestSimpleExternalSort
Running org.apache.drill.exec.physical.impl.svremover.TestSVRemover
Running org.apache.drill.exec.physical.impl.svremover.TestSVRemover#testSelectionVectorRemoval
blue red green
2147483647 9223372036854775807 2147483647
2147483647 9223372036854775807 2147483647
<TRIMMED>
2147483647 9223372036854775807 2147483647
Total rows returned : 50. Returned in 181ms.
Running org.apache.drill.exec.physical.impl.svremover.TestSVRemover#testSVRWithNoFilter
blue red green
true -9223372036854775808 -2147483648
false 9223372036854775807 null
<TRIMMED>
true -9223372036854775808 -2147483648
false 9223372036854775807 null
Total rows returned : 100. Returned in 54ms.
C-c C-c
real 708m38.962s
user 11m3.332s
sys 1m8.068s
[130]dbarclay@dev-linux2 ~/work/git/incubator-drill $
Daniel
--
Daniel Barclay
MapR Technologies
Re: Threads left after Drillbit shutdown (in dev./unit tests)
Posted by Hanifi Gunes <hg...@maprtech.com>.
Is there any way to re-produce this at a smaller scale? Have you tried
failing a couple of tests and dumping threads?
-Hanifi
Thanks
On Fri, Jul 10, 2015 at 1:10 PM, Daniel Barclay <db...@maprtech.com>
wrote:
> Is Drill terminating threads correctly?
>
> In running jstack on a JVM running a dev. test run that ended up hung
> after getting about three test timeout errors, I see that there are
> 409 threads.
>
> Although 138 of those are not-unexpected ShutdownHook threads (since
> many tests are run in one VM), there are:
> - 138 "WorkManager.StatusThread" threads (hmm.... 138 again)
> - 7 "Client-1" threads
> - 4 "UserServer-1" threads
> - 21 "BitClient-1" threads
> - 4 "BitClient-2" threads
> - 3 "BitClient-3" threads
> - 8 "BitServer-1" threads
> - 8 "BitServer-2" threads
> - 7 "BitServer-3" threads
> - 7 "BitServer-4" threads
> - 7 "BitServer-5" threads
> - 6 "BitServer-6" threads
> - 6 "BitServer-7" threads
> - 6 "BitServer-8" threads
> - 5 "BitServer-9" threads
> - 5 "BitServer-10" threads
> (Other thread names have only 1 or 2 occurrences.)
>
> Regarding the 4 for the number of "UserServer-1" threads: Three test
> methods had timeout failures plus one got hung.
>
>
> Here's the tail end of the output from the test running, including
> all the timeout errors and including the hang (except for repeated
> query-results data lines).
>
>
>
> dbarclay@dev-linux2 ~/work/git/incubator-drill $ time mvn install
>
> <TRIMMED>
>
> Running org.apache.drill.exec.physical.impl.TestDistributedFragmentRun
> Running
> org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeOneEntryRun
> Running
> org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#twoBitOneExchangeTwoEntryRun
> Running
> org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeTwoEntryRun
> Running
> org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeTwoEntryRunLogical
> Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 48.117 sec
> - in org.apache.drill.exec.physical.impl.TestDistributedFragmentRun
> Running org.apache.drill.exec.physical.impl.TestBroadcastExchange
> Running
> org.apache.drill.exec.physical.impl.TestBroadcastExchange#TestSingleBroadcastExchangeWithTwoScans
> 00:44:34.017 [globalEventExecutor-1-523] ERROR
> o.a.z.server.NIOServerCnxnFactory - Thread
> Thread[globalEventExecutor-1-523,5,main] died
> java.lang.AssertionError: null
> at
> io.netty.util.concurrent.AbstractScheduledEventExecutor.pollScheduledTask(AbstractScheduledEventExecutor.java:83)
> ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.util.concurrent.GlobalEventExecutor.fetchFromScheduledTaskQueue(GlobalEventExecutor.java:110)
> ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.util.concurrent.GlobalEventExecutor.takeTask(GlobalEventExecutor.java:95)
> ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.util.concurrent.GlobalEventExecutor$TaskRunner.run(GlobalEventExecutor.java:226)
> ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
> ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
> at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_72]
> Running
> org.apache.drill.exec.physical.impl.TestBroadcastExchange#TestMultipleSendLocationBroadcastExchange
> 10000
> Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 111.599
> sec <<< FAILURE! - in
> org.apache.drill.exec.physical.impl.TestBroadcastExchange
> TestSingleBroadcastExchangeWithTwoScans(org.apache.drill.exec.physical.impl.TestBroadcastExchange)
> Time elapsed: 50.063 sec <<< ERROR!
> java.lang.Exception: test timed out after 50000 milliseconds
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:503)
> at
> io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:254)
> at
> io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:32)
> at
> io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:31)
> at
> org.apache.drill.exec.rpc.BasicServer.close(BasicServer.java:218)
> at com.google.common.io.Closeables.close(Closeables.java:77)
> at com.google.common.io
> .Closeables.closeQuietly(Closeables.java:108)
> at
> org.apache.drill.exec.rpc.data.DataConnectionCreator.close(DataConnectionCreator.java:70)
> at com.google.common.io.Closeables.close(Closeables.java:77)
> at com.google.common.io
> .Closeables.closeQuietly(Closeables.java:108)
> at
> org.apache.drill.exec.service.ServiceEngine.close(ServiceEngine.java:88)
> at com.google.common.io.Closeables.close(Closeables.java:77)
> at com.google.common.io
> .Closeables.closeQuietly(Closeables.java:108)
> at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:288)
> at
> org.apache.drill.exec.physical.impl.TestBroadcastExchange.TestSingleBroadcastExchangeWithTwoScans(TestBroadcastExchange.java:62)
>
> TestMultipleSendLocationBroadcastExchange(org.apache.drill.exec.physical.impl.TestBroadcastExchange)
> Time elapsed: 50.014 sec <<< ERROR!
> java.lang.Exception: test timed out after 50000 milliseconds
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:503)
> at
> io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:254)
> at
> io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:32)
> at
> io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:31)
> at
> org.apache.drill.exec.rpc.BasicServer.close(BasicServer.java:218)
> at
> org.apache.drill.exec.rpc.user.UserServer.close(UserServer.java:283)
> at com.google.common.io.Closeables.close(Closeables.java:77)
> at com.google.common.io
> .Closeables.closeQuietly(Closeables.java:108)
> at
> org.apache.drill.exec.service.ServiceEngine.close(ServiceEngine.java:87)
> at com.google.common.io.Closeables.close(Closeables.java:77)
> at com.google.common.io
> .Closeables.closeQuietly(Closeables.java:108)
> at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:288)
> at
> org.apache.drill.exec.physical.impl.TestBroadcastExchange.TestMultipleSendLocationBroadcastExchange(TestBroadcastExchange.java:88)
>
> Running
> org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender
> Running
> org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender#testPartitionSenderCostToThreads
> Running
> org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender#testAlgorithm
> ok summary
> true planner.slice_target updated.
> Total rows returned : 1. Returned in 38ms.
> Jul 10, 2015 12:47:20 AM
> org.apache.calcite.sql.validate.SqlValidatorException <init>
> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Table
> 'dfs./home/dbarclay/work/git/incubator-drill/exec/java-exec/target/junit5218680774082947123/junit6147831434075535799'
> not found
> Jul 10, 2015 12:47:20 AM org.apache.calcite.runtime.CalciteException <init>
> SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1,
> column 64 to line 1, column 66: Table
> 'dfs./home/dbarclay/work/git/incubator-drill/exec/java-exec/target/junit5218680774082947123/junit6147831434075535799'
> not found
> Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 63.904 sec
> <<< FAILURE! - in
> org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender
> testPartitionSenderCostToThreads(org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender)
> Time elapsed: 50.023 sec <<< ERROR!
> java.lang.Exception: test timed out after 50000 milliseconds
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:503)
> at
> io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:254)
> at
> io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:32)
> at
> io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:31)
> at
> org.apache.drill.exec.rpc.BasicServer.close(BasicServer.java:218)
> at
> org.apache.drill.exec.rpc.user.UserServer.close(UserServer.java:283)
> at com.google.common.io.Closeables.close(Closeables.java:77)
> at com.google.common.io
> .Closeables.closeQuietly(Closeables.java:108)
> at
> org.apache.drill.exec.service.ServiceEngine.close(ServiceEngine.java:87)
> at com.google.common.io.Closeables.close(Closeables.java:77)
> at com.google.common.io
> .Closeables.closeQuietly(Closeables.java:108)
> at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:288)
> at
> org.apache.drill.BaseTestQuery.closeClient(BaseTestQuery.java:239)
> at
> org.apache.drill.BaseTestQuery.updateTestCluster(BaseTestQuery.java:126)
> at
> org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender.testPartitionSenderCostToThreads(TestPartitionSender.java:154)
>
> Running org.apache.drill.exec.physical.impl.xsort.TestSimpleExternalSort
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0 sec - in
> org.apache.drill.exec.physical.impl.xsort.TestSimpleExternalSort
> Running org.apache.drill.exec.physical.impl.svremover.TestSVRemover
> Running
> org.apache.drill.exec.physical.impl.svremover.TestSVRemover#testSelectionVectorRemoval
> blue red green
> 2147483647 9223372036854775807 2147483647
> 2147483647 9223372036854775807 2147483647
>
> <TRIMMED>
>
> 2147483647 9223372036854775807 2147483647
> Total rows returned : 50. Returned in 181ms.
> Running
> org.apache.drill.exec.physical.impl.svremover.TestSVRemover#testSVRWithNoFilter
> blue red green
> true -9223372036854775808 -2147483648
> false 9223372036854775807 null
>
> <TRIMMED>
>
> true -9223372036854775808 -2147483648
> false 9223372036854775807 null
> Total rows returned : 100. Returned in 54ms.
> C-c C-c
> real 708m38.962s
> user 11m3.332s
> sys 1m8.068s
> [130]dbarclay@dev-linux2 ~/work/git/incubator-drill $
>
>
>
>
>
> Daniel
> --
> Daniel Barclay
> MapR Technologies
>