Posted to commits@spark.apache.org by do...@apache.org on 2019/07/27 17:47:22 UTC

[spark] branch master updated: [SPARK-28536][SQL][PYTHON][TESTS] Reduce shuffle partitions in Python UDF tests in SQLQueryTestSuite

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 8ce1ae5  [SPARK-28536][SQL][PYTHON][TESTS] Reduce shuffle partitions in Python UDF tests in SQLQueryTestSuite
8ce1ae5 is described below

commit 8ce1ae52db189f751066648e91e1727318932e35
Author: HyukjinKwon <gu...@apache.org>
AuthorDate: Sat Jul 27 10:46:35 2019 -0700

    [SPARK-28536][SQL][PYTHON][TESTS] Reduce shuffle partitions in Python UDF tests in SQLQueryTestSuite
    
    ## What changes were proposed in this pull request?
    
    In Python UDF tests, the number of shuffle partitions has a considerable effect on test time, because running the UDFs requires forking external processes and communicating with them. This patch sets the number of shuffle partitions to 4 for UDF test cases.
    
    **Before:**
    
    ![image](https://user-images.githubusercontent.com/6477701/61989374-465c0080-b069-11e9-9936-b386d0cccf7a.png)
    
    **After (with 4 shuffle partitions):**
    
    ![Screen Shot 2019-07-27 at 10 43 34 AM](https://user-images.githubusercontent.com/9700541/61997757-743a4880-b05b-11e9-9180-8d0976bda3bd.png)
    
    ## How was this patch tested?
    
    Manually tested in my local environment.
    
    **Before:**
    
    ```
    [info] SQLQueryTestSuite:
    [info] - udf/udf-window.sql - Scala UDF (58 seconds, 558 milliseconds)
    [info] - udf/udf-window.sql - Regular Python UDF (58 seconds, 371 milliseconds)
    [info] - udf/udf-window.sql - Scalar Pandas UDF (1 minute, 8 seconds)
    ```
    
    **After:**
    
    ```
    [info] SQLQueryTestSuite:
    [info] - udf/udf-window.sql - Scala UDF (14 seconds, 690 milliseconds)
    [info] - udf/udf-window.sql - Regular Python UDF (10 seconds, 467 milliseconds)
    [info] - udf/udf-window.sql - Scalar Pandas UDF (10 seconds, 895 milliseconds)
    ```
    
    Closes #25271 from HyukjinKwon/SPARK-28536.
    
    Authored-by: HyukjinKwon <gu...@apache.org>
    Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
 sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
index e4052b7..726b806 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
@@ -279,6 +279,10 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext {
 
     testCase match {
       case udfTestCase: UDFTest =>
+        // In Python UDF tests, the number of shuffle partitions matters considerably in
+        // the testing time because it requires to fork and communicate between external
+        // processes.
+        localSparkSession.conf.set(SQLConf.SHUFFLE_PARTITIONS.key, 4)
         registerTestUDF(udfTestCase.udf, localSparkSession)
       case _ =>
     }
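
For readers who want to try the same setting outside the test suite, the following is a minimal sketch and not part of the commit. It assumes a local Spark installation on the classpath; the object name, the app name, the `local[2]` master, and the sample query are hypothetical and chosen only for illustration. The string key `spark.sql.shuffle.partitions` is the one that `SQLConf.SHUFFLE_PARTITIONS.key` resolves to.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical standalone example; not taken from SQLQueryTestSuite.
object ShufflePartitionsSketch {
  def main(args: Array[String]): Unit = {
    // Assumed local session, for illustration only.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("shuffle-partitions-sketch")
      .getOrCreate()
    try {
      // Same setting the patch applies through SQLConf.SHUFFLE_PARTITIONS.key.
      spark.conf.set("spark.sql.shuffle.partitions", "4")

      // groupBy forces a shuffle, so the setting above applies to this query.
      val counts = spark.range(0, 1000)
        .groupBy((col("id") % 10).as("bucket"))
        .count()

      // Read the setting back and run the query once.
      println(spark.conf.get("spark.sql.shuffle.partitions"))
      println(counts.collect().length)
    } finally {
      spark.stop()
    }
  }
}
```

The sketch sets the value on the session's runtime configuration, as the patch does with `localSparkSession.conf.set`, so it only affects queries run through that session.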

