Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/07/27 03:19:00 UTC
[jira] [Created] (SPARK-28536) Reduce shuffle partitions in Python UDF tests in SQLQueryTestSuite
Hyukjin Kwon created SPARK-28536:
------------------------------------
Summary: Reduce shuffle partitions in Python UDF tests in SQLQueryTestSuite
Key: SPARK-28536
URL: https://issues.apache.org/jira/browse/SPARK-28536
Project: Spark
Issue Type: Test
Components: PySpark, SQL
Affects Versions: 3.0.0
Reporter: Hyukjin Kwon
Currently, some SQL tests with Python UDFs take a long time.
On my local machine:
{code:java}
[info] SQLQueryTestSuite:
[info] - udf/udf-window.sql - Scala UDF (58 seconds, 558 milliseconds)
[info] - udf/udf-window.sql - Regular Python UDF (58 seconds, 371 milliseconds)
[info] - udf/udf-window.sql - Scalar Pandas UDF (1 minute, 8 seconds){code}
In Jenkins, it currently takes up to 9 minutes.
In Python UDF tests, the number of shuffle partitions has a considerable effect on testing time, because running the UDFs requires forking external processes and communicating with them. We should reduce the number of shuffle partitions used in these tests.
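As a rough illustration of why the partition count matters, here is a minimal sketch of a linear cost model. It assumes one post-shuffle task per shuffle partition and a fixed per-task cost for talking to an external Python process. The function names, the 50 ms figure, the count of 10 shuffles, and the reduced value of 4 are all hypothetical and not measured. Only the setting name `spark.sql.shuffle.partitions` and its default of 200 come from Spark itself.

```python
# Hypothetical cost model: the per-task overhead is an illustrative
# assumption, not a measurement from SQLQueryTestSuite.

# Real Spark SQL setting; its default value is 200.
SHUFFLE_PARTITIONS_KEY = "spark.sql.shuffle.partitions"
DEFAULT_SHUFFLE_PARTITIONS = 200


def python_udf_overhead_ms(num_shuffle_partitions, num_shuffles, per_task_overhead_ms):
    """Estimate fixed overhead, assuming one task per shuffle partition
    per shuffle and a constant cost per task for the external process."""
    tasks = num_shuffle_partitions * num_shuffles
    return tasks * per_task_overhead_ms


def reduced_conf(num_partitions):
    """Build a config override that a test session could apply."""
    return {SHUFFLE_PARTITIONS_KEY: str(num_partitions)}


if __name__ == "__main__":
    # Assumed inputs: 10 shuffles in a test file, 50 ms overhead per task.
    before = python_udf_overhead_ms(DEFAULT_SHUFFLE_PARTITIONS, 10, 50)
    after = python_udf_overhead_ms(4, 10, 50)
    print("overhead with default partitions (ms):", before)
    print("overhead with 4 partitions (ms):", after)
    print("override:", reduced_conf(4))
```

Under these assumptions the fixed overhead scales linearly with the partition count, so lowering the setting for small test inputs removes most of it without changing query results.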