Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2020/07/30 01:16:00 UTC

[jira] [Resolved] (SPARK-32010) Thread leaks in pinned thread mode

     [ https://issues.apache.org/jira/browse/SPARK-32010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-32010.
----------------------------------
    Fix Version/s: 3.1.0
       Resolution: Fixed

Issue resolved by pull request 28968
[https://github.com/apache/spark/pull/28968]

> Thread leaks in pinned thread mode
> ----------------------------------
>
>                 Key: SPARK-32010
>                 URL: https://issues.apache.org/jira/browse/SPARK-32010
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 3.1.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>             Fix For: 3.1.0
>
>
> SPARK-22340 introduced a pinned thread mode which guarantees that each Python thread is synced to its own JVM thread.
> However, it looks like the JVM threads are not finished even after the Python threads finish. This can be observed via YourKit by running multiple jobs with multiple threads at the same time.
> The easiest reproducer is:
> {code}
> PYSPARK_PIN_THREAD=true ./bin/pyspark
> {code}
> {code}
> >>> from threading import Thread
> >>> Thread(target=lambda: spark.range(1000).collect()).start()
> >>> Thread(target=lambda: spark.range(1000).collect()).start()
> >>> Thread(target=lambda: spark.range(1000).collect()).start()
> >>> spark._jvm._gateway_client.deque
> deque([<py4j.clientserver.ClientServerConnection object at 0x119f7aba8>, <py4j.clientserver.ClientServerConnection object at 0x119fc9b70>, <py4j.clientserver.ClientServerConnection object at 0x119fc9e10>, <py4j.clientserver.ClientServerConnection object at 0x11a015358>, <py4j.clientserver.ClientServerConnection object at 0x119fc00f0>])
> >>> Thread(target=lambda: spark.range(1000).collect()).start()
> >>> spark._jvm._gateway_client.deque
> deque([<py4j.clientserver.ClientServerConnection object at 0x119f7aba8>, <py4j.clientserver.ClientServerConnection object at 0x119fc9b70>, <py4j.clientserver.ClientServerConnection object at 0x119fc9e10>, <py4j.clientserver.ClientServerConnection object at 0x11a015358>, <py4j.clientserver.ClientServerConnection object at 0x119fc08d0>, <py4j.clientserver.ClientServerConnection object at 0x119fc00f0>])
> {code}
> The connections don't get closed, and they keep the JVM threads running.
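
For reference, one way to observe the leak from the Python side is to ask the JVM for its live thread count through the py4j gateway and compare it before and after the Python threads have exited. A minimal sketch, assuming a pyspark shell started with PYSPARK_PIN_THREAD=true as above (java.lang.Thread.activeCount() is a plain JVM call reached via spark._jvm, not a PySpark API):

{code}
# Run a few short jobs on separate Python threads and watch the JVM thread count.
from threading import Thread

def run_job():
    # In pinned thread mode each collect() is served by a dedicated JVM thread.
    spark.range(1000).collect()

before = spark._jvm.java.lang.Thread.activeCount()

threads = [Thread(target=run_job) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

after = spark._jvm.java.lang.Thread.activeCount()

# With the leak described above, 'after' stays higher than 'before'
# even though every Python thread has finished.
print(before, after)
{code}

Thread counts can fluctuate for unrelated reasons, so this is only a rough signal; the deque of ClientServerConnection objects quoted above is the more direct evidence. As of Spark 3.1.0, pyspark.InheritableThread is the recommended replacement for threading.Thread when pinned thread mode is enabled.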



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org