Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2019/01/30 02:46:00 UTC

[jira] [Commented] (SPARK-26776) Reduce Py4J communication cost in PySpark's execution barrier check

    [ https://issues.apache.org/jira/browse/SPARK-26776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755602#comment-16755602 ] 

Apache Spark commented on SPARK-26776:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/23690

> Reduce Py4J communication cost in PySpark's execution barrier check
> -------------------------------------------------------------------
>
>                 Key: SPARK-26776
>                 URL: https://issues.apache.org/jira/browse/SPARK-26776
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.0, 3.0.0
>            Reporter: Hyukjin Kwon
>            Priority: Minor
>
> While investigating flaky tests, I came across the following traceback:
> {code}
>       File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/rdd.py", line 2512, in __init__
>         self.is_barrier = prev._is_barrier() or isFromBarrier
>       File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/rdd.py", line 2412, in _is_barrier
>         return self._jrdd.rdd().isBarrier()
>       File "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1286, in __call__
>         answer, self.gateway_client, self.target_id, self.name)
>       File "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py", line 342, in get_return_value
>         return OUTPUT_CONVERTER[type](answer[2:], gateway_client)
>       File "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 2492, in <lambda>
>         lambda target_id, gateway_client: JavaObject(target_id, gateway_client))
>       File "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1324, in __init__
>         ThreadSafeFinalizer.add_finalizer(key, value)
>       File "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.10.8.1-src.zip/py4j/finalizer.py", line 43, in add_finalizer
>         cls.finalizers[id] = weak_ref
>       File "/usr/lib64/pypy-2.5.1/lib-python/2.7/threading.py", line 216, in __exit__
>         self.release()
>       File "/usr/lib64/pypy-2.5.1/lib-python/2.7/threading.py", line 208, in release
>         self.__block.release()
>     error: release unlocked lock
> {code}
> I assume this might not be directly related to the test itself, but I noticed that prev._is_barrier() goes through Py4J on every call.
> Access via Py4J is expensive, and IMHO it makes the test flaky.
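
One way to picture the cost described above is a minimal sketch that counts simulated JVM round trips. It is not the change made in the linked pull request and it does not use PySpark or Py4J: FakeJavaRDD, SketchRDD and SketchPipelinedRDD are hypothetical stand-ins invented for illustration. The sketch assumes two simple ideas: test the local isFromBarrier flag first so that Python's short-circuiting "or" can skip the remote call, and answer later _is_barrier() queries from the value already stored on the Python side.

```python
class FakeJavaRDD:
    """Hypothetical stand-in for a Py4J proxy object.

    Each isBarrier() call is counted as one simulated JVM round trip.
    """

    def __init__(self, barrier=False):
        self.barrier = barrier
        self.calls = 0

    def isBarrier(self):
        self.calls += 1
        return self.barrier


class SketchRDD:
    """Hypothetical base RDD: every barrier check goes to the 'JVM'."""

    def __init__(self, jrdd):
        self._jrdd = jrdd

    def _is_barrier(self):
        return self._jrdd.isBarrier()


class SketchPipelinedRDD(SketchRDD):
    """Hypothetical pipelined RDD that avoids needless round trips."""

    def __init__(self, prev, isFromBarrier=False):
        super().__init__(prev._jrdd)
        # Local flag first: when it is True, "or" short-circuits and
        # prev._is_barrier() (the potentially remote call) never runs.
        self.is_barrier = isFromBarrier or prev._is_barrier()

    def _is_barrier(self):
        # Reuse the value computed in __init__ instead of asking again.
        return self.is_barrier


jrdd = FakeJavaRDD(barrier=False)
base = SketchRDD(jrdd)

from_barrier = SketchPipelinedRDD(base, isFromBarrier=True)  # short-circuits
chained = SketchPipelinedRDD(from_barrier)                   # uses stored flag
plain = SketchPipelinedRDD(base)                             # one round trip

print("simulated JVM round trips:", jrdd.calls)
```

In this sketch only the last construction reaches the simulated JVM; the first two are resolved entirely in Python. Whether the same ordering is appropriate in PySpark itself depends on details of rdd.py that the traceback above does not show.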



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
