Posted to issues@hive.apache.org by "Birger Brunswiek (JIRA)" <ji...@apache.org> on 2017/06/23 12:03:00 UTC

[jira] [Updated] (HIVE-16949) Leak of threads from Get-Input-Paths thread pool when more than 1 used in query

     [ https://issues.apache.org/jira/browse/HIVE-16949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Birger Brunswiek updated HIVE-16949:
------------------------------------
    Description: 
Commit 7f1c29ebe, which was part of HIVE-15881, introduced a thread pool that is not shut down once its work has completed. This leaks threads: they are never reclaimed by the GC. When queries spanning multiple partitions are run, the number of threads grows and never shrinks. On my machine, hiveserver2 gets slower and slower once 10k threads are reached.

Thread pools should be [shut down automatically|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]. I am not sure why this is not the case. I would add a _pool.shutdown()_ just [after the pool has completed its work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137] to make sure the threads are really shut down. This, however, would only fix normal operation. There are other exit points, namely through exceptions, which would still lead to the same thread leak.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. This prevents the thread pool from being spawned [\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118] [\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].
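
A minimal Java sketch of the fix proposed above; it is not the actual Hive code. It assumes a fixed-size pool, which keeps its idle worker threads alive until it is explicitly shut down, so the shutdown call sits in a finally block to cover both the normal return and the exception exit points. The class name InputPathListing, the methods listPaths and resolve, and the /warehouse/ prefix are hypothetical stand-ins for illustration. The single-thread branch mirrors the workaround: with one listing thread no pool is created at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class InputPathListing {

    // Hypothetical per-partition work; an empty name simulates a failure.
    static String resolve(String partition) {
        if (partition.isEmpty()) {
            throw new IllegalArgumentException("empty partition name");
        }
        return "/warehouse/" + partition;
    }

    // Resolves all partitions, using a pool only when numThreads > 1.
    static List<String> listPaths(List<String> partitions, int numThreads) {
        List<String> result = new ArrayList<>();
        if (numThreads <= 1) {
            // No pool is spawned, so there is nothing that could leak.
            for (String p : partitions) {
                result.add(resolve(p));
            }
            return result;
        }

        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String p : partitions) {
                Callable<String> task = () -> resolve(p);
                futures.add(pool.submit(task));
            }
            for (Future<String> f : futures) {
                result.add(f.get()); // rethrows task failures as ExecutionException
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while listing paths", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("listing failed", e.getCause());
        } finally {
            // Runs on success and on every exception path, so the worker
            // threads are released once already-submitted tasks finish.
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(listPaths(Arrays.asList("ds=1", "ds=2"), 2));
    }
}
```

Using shutdownNow() in the finally block instead would additionally try to cancel tasks that are still queued or running after a failure; which of the two fits better depends on whether the remaining listing work is safe to abandon.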



  was:
Commit 7f1c29ebe, which was part of HIVE-15881, introduced a thread pool that is not shut down once its work has completed. This leaks threads: they are never reclaimed by the GC. When queries spanning multiple partitions are run, the number of threads grows and never shrinks. On my machine, hiveserver2 gets slower and slower once 10k threads are reached.

Thread pools should be [shut down automatically|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]. I am not sure why this is not the case. I would add a _pool.shutdown()_ just [after the pool has completed its work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137] to make sure the threads are really shut down.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. This prevents the thread pool from being spawned [\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118] [\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].




> Leak of threads from Get-Input-Paths thread pool when more than 1 used in query
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-16949
>                 URL: https://issues.apache.org/jira/browse/HIVE-16949
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>            Reporter: Birger Brunswiek
>
> Commit 7f1c29ebe, which was part of HIVE-15881, introduced a thread pool that is not shut down once its work has completed. This leaks threads: they are never reclaimed by the GC. When queries spanning multiple partitions are run, the number of threads grows and never shrinks. On my machine, hiveserver2 gets slower and slower once 10k threads are reached.
> Thread pools should be [shut down automatically|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]. I am not sure why this is not the case. I would add a _pool.shutdown()_ just [after the pool has completed its work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137] to make sure the threads are really shut down. This, however, would only fix normal operation. There are other exit points, namely through exceptions, which would still lead to the same thread leak.
> My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. This prevents the thread pool from being spawned [\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118] [\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].


