Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/07/27 03:19:00 UTC

[jira] [Created] (SPARK-28536) Reduce shuffle partitions in Python UDF tests in SQLQueryTestSuite

Hyukjin Kwon created SPARK-28536:
------------------------------------

             Summary: Reduce shuffle partitions in Python UDF tests in SQLQueryTestSuite
                 Key: SPARK-28536
                 URL: https://issues.apache.org/jira/browse/SPARK-28536
             Project: Spark
          Issue Type: Test
          Components: PySpark, SQL
    Affects Versions: 3.0.0
            Reporter: Hyukjin Kwon


Currently, some SQL tests with Python UDFs take a long time.

On my local machine:


{code:java}
[info] SQLQueryTestSuite:
[info] - udf/udf-window.sql - Scala UDF (58 seconds, 558 milliseconds)
[info] - udf/udf-window.sql - Regular Python UDF (58 seconds, 371 milliseconds)
[info] - udf/udf-window.sql - Scalar Pandas UDF (1 minute, 8 seconds){code}

and it currently takes up to 9 minutes in Jenkins.



In Python UDF tests, the number of shuffle partitions considerably affects testing time, because running them requires forking external processes and communicating with them. We should reduce the number of shuffle partitions.
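
As a rough illustration of the knob involved (not the actual change to SQLQueryTestSuite), the number of shuffle partitions is controlled by the spark.sql.shuffle.partitions setting, which defaults to 200. The sketch below assumes a local session; the value 4 and the object name ShufflePartitionsSketch are hypothetical and chosen only for illustration:

{code:scala}
import org.apache.spark.sql.SparkSession

// Hypothetical sketch: run a small aggregation with fewer shuffle partitions.
object ShufflePartitionsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName("shuffle-partitions-sketch")
      // Assumed value for illustration; the default is 200.
      .config("spark.sql.shuffle.partitions", "4")
      .getOrCreate()

    import spark.implicits._

    // groupBy introduces a shuffle, so the setting above bounds the
    // number of post-shuffle partitions (and therefore tasks).
    val df = Seq((1, "a"), (2, "b"), (1, "c")).toDF("k", "v")
    val grouped = df.groupBy("k").count()
    println(grouped.rdd.getNumPartitions)

    spark.stop()
  }
}
{code}

The same setting can also be changed on an existing session with spark.conf.set("spark.sql.shuffle.partitions", "4"), which may be more convenient when only some test cases need the lower value.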



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
