Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2018/12/03 08:17:21 UTC

[GitHub] spark pull request #23203: [SPARK-26252][PYTHON] Add support to run specific...

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/23203

    [SPARK-26252][PYTHON] Add support to run specific unittests and/or doctests in python/run-tests script

    ## What changes were proposed in this pull request?
    
    This PR proposes adding a developer option, `--testnames`, to our testing script to allow running a specific set of unittests and doctests.
    
    
    **1. Run unittests in a class**
    
    ```
    ./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests'
    Running PySpark tests. Output is in /.../spark/python/unit-tests.log
    Will test against the following Python executables: ['python2.7', 'pypy']
    Will test the following Python tests: ['pyspark.sql.tests.test_arrow ArrowTests']
    Starting test(python2.7): pyspark.sql.tests.test_arrow ArrowTests
    Starting test(pypy): pyspark.sql.tests.test_arrow ArrowTests
    Finished test(python2.7): pyspark.sql.tests.test_arrow ArrowTests (14s)
    Finished test(pypy): pyspark.sql.tests.test_arrow ArrowTests (14s) ... 22 tests were skipped
    Tests passed in 14 seconds
    
    Skipped tests in pyspark.sql.tests.test_arrow ArrowTests with pypy:
        test_createDataFrame_column_name_encoding (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
        test_createDataFrame_does_not_modify_input (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
        test_createDataFrame_fallback_disabled (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
        test_createDataFrame_fallback_enabled (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped
    ...
    ```
    
    **2. Run a single unittest in a class.**
    
    ```
    ./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion'
    Running PySpark tests. Output is in /.../spark/python/unit-tests.log
    Will test against the following Python executables: ['python2.7', 'pypy']
    Will test the following Python tests: ['pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion']
    Starting test(pypy): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion
    Starting test(python2.7): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion
    Finished test(pypy): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion (0s) ... 1 tests were skipped
    Finished test(python2.7): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion (8s)
    Tests passed in 8 seconds
    
    Skipped tests in pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion with pypy:
        test_null_conversion (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
    ```
    
    **3. Run doctests in a single PySpark module.**
    
    ```
    ./run-tests --testnames 'pyspark.sql.dataframe'
    Running PySpark tests. Output is in /.../spark/python/unit-tests.log
    Will test against the following Python executables: ['python2.7', 'pypy']
    Will test the following Python tests: ['pyspark.sql.dataframe']
    Starting test(pypy): pyspark.sql.dataframe
    Starting test(python2.7): pyspark.sql.dataframe
    Finished test(python2.7): pyspark.sql.dataframe (47s)
    Finished test(pypy): pyspark.sql.dataframe (48s)
    Tests passed in 48 seconds
    ```
    
    Of course, you can mix them:
    
    ```
    ./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests,pyspark.sql.dataframe'
    Running PySpark tests. Output is in /.../spark/python/unit-tests.log
    Will test against the following Python executables: ['python2.7', 'pypy']
    Will test the following Python tests: ['pyspark.sql.tests.test_arrow ArrowTests', 'pyspark.sql.dataframe']
    Starting test(pypy): pyspark.sql.dataframe
    Starting test(pypy): pyspark.sql.tests.test_arrow ArrowTests
    Starting test(python2.7): pyspark.sql.dataframe
    Starting test(python2.7): pyspark.sql.tests.test_arrow ArrowTests
    Finished test(pypy): pyspark.sql.tests.test_arrow ArrowTests (0s) ... 22 tests were skipped
    Finished test(python2.7): pyspark.sql.tests.test_arrow ArrowTests (18s)
    Finished test(python2.7): pyspark.sql.dataframe (50s)
    Finished test(pypy): pyspark.sql.dataframe (52s)
    Tests passed in 52 seconds
    
    Skipped tests in pyspark.sql.tests.test_arrow ArrowTests with pypy:
        test_createDataFrame_column_name_encoding (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
        test_createDataFrame_does_not_modify_input (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
        test_createDataFrame_fallback_disabled (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
    ```
    
    You can also use all of the other options (except `--modules`, which will be ignored):
    
    ```
    ./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion' --python-executables=python
    Running PySpark tests. Output is in /.../spark/python/unit-tests.log
    Will test against the following Python executables: ['python']
    Will test the following Python tests: ['pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion']
    Starting test(python): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion
    Finished test(python): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion (12s)
    Tests passed in 12 seconds
    ```
    
    See help below:
    
    ```
     ./run-tests --help
    Usage: run-tests [options]
    
    Options:
    ...
      Developer Options:
        --testnames=TESTNAMES
                            A comma-separated list of specific modules, classes
                            and functions of doctest or unittest to test. For
                            example, 'pyspark.sql.foo' to run the module as
                            unittests or doctests, 'pyspark.sql.tests FooTests' to
                            run the specific class of unittests,
                            'pyspark.sql.tests FooTests.test_foo' to run the
                            specific unittest in the class. '--modules' option is
                            ignored if they are given.
    ```
    
    I intentionally grouped it as a developer option to be more conservative.
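
    For context, a minimal optparse-style sketch of how such a grouped developer option could be declared; the option name and help text mirror the output above, but the actual wiring in `run-tests.py` may differ:
    
    ```
    from optparse import OptionGroup, OptionParser
    
    parser = OptionParser(prog="run-tests")
    group = OptionGroup(parser, "Developer Options")
    group.add_option(
        "--testnames", type="string", default=None,
        help="A comma-separated list of specific modules, classes and functions "
             "of doctest or unittest to test. '--modules' is ignored if this is given.")
    parser.add_option_group(group)
    
    # Example: each comma-separated entry becomes one test invocation.
    opts, _ = parser.parse_args(["--testnames", "pyspark.sql.tests.test_arrow ArrowTests"])
    if opts.testnames is not None:
        testnames = [t.strip() for t in opts.testnames.split(",")]
    ```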
    
    ## How was this patch tested?
    
    Manually tested. Negative tests were also done.
    
    ```
    $ ./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion1' --python-executables=python
    ...
    AttributeError: type object 'ArrowTests' has no attribute 'test_null_conversion1'
    ...
    ```
    
    ```
    ./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowT' --python-executables=python
    ...
    AttributeError: 'module' object has no attribute 'ArrowT'
    ...
    ```
    
    ```
     ./run-tests --testnames 'pyspark.sql.tests.test_ar' --python-executables=python
    ...
    /.../python2.7: No module named pyspark.sql.tests.test_ar
    ```
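
    For comparison only (not necessarily the mechanism the script uses internally), the same `module Class.method` addressing can be exercised with plain `unittest`, which reports a similar error for an unknown name. A minimal sketch, assuming `pyspark` is importable in the current environment:
    
    ```
    import unittest
    
    # Hypothetical standalone check: resolve a dotted test address with the
    # standard TestLoader. An unknown class or method name surfaces as an
    # error, much like the AttributeError outputs shown above.
    loader = unittest.TestLoader()
    suite = loader.loadTestsFromName(
        "pyspark.sql.tests.test_arrow.ArrowTests.test_null_conversion")
    unittest.TextTestRunner(verbosity=2).run(suite)
    ```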

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-26252

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23203.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23203
    
----
commit 44c622bf17ab642ef372d9a534b5bfc18c98a0da
Author: Hyukjin Kwon <gu...@...>
Date:   2018-12-03T08:02:35Z

    Add support to run specific unittests and/or doctests in python/run-tests script

----


---



[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23203
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5656/
    Test PASSed.


---



[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23203
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #23203: [SPARK-26252][PYTHON] Add support to run specific...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23203#discussion_r238868565
  
    --- Diff: python/run-tests.py ---
    @@ -93,17 +93,18 @@ def run_individual_python_test(target_dir, test_name, pyspark_python):
             "pyspark-shell"
         ]
         env["PYSPARK_SUBMIT_ARGS"] = " ".join(spark_args)
    -
    -    LOGGER.info("Starting test(%s): %s", pyspark_python, test_name)
    +    str_test_name = " ".join(test_name)
    +    LOGGER.info("Starting test(%s): %s", pyspark_python, str_test_name)
         start_time = time.time()
         try:
             per_test_output = tempfile.TemporaryFile()
             retcode = subprocess.Popen(
    -            [os.path.join(SPARK_HOME, "bin/pyspark"), test_name],
    --- End diff --
    
    Just a thought, could you leave `test_name` as a string and then change this line to `[os.path.join(SPARK_HOME, "bin/pyspark")] + test_name.split(),`? I think it would be a little simpler and wouldn't need `str_test_name`, wdyt?
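
    For illustration, a rough self-contained sketch of that suggestion; the helper `run_one_test` below is hypothetical and condensed, and in the actual script this logic lives inside `run_individual_python_test`:
    
    ```
    import os
    import subprocess
    import tempfile
    
    SPARK_HOME = os.environ.get("SPARK_HOME", ".")  # assumption for this sketch
    
    def run_one_test(test_name, env=None):
        # test_name stays a single string, e.g.
        # "pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion",
        # and is split only when building the command line, so no separate
        # str_test_name variable is needed.
        per_test_output = tempfile.TemporaryFile()
        retcode = subprocess.Popen(
            [os.path.join(SPARK_HOME, "bin/pyspark")] + test_name.split(),
            stdout=per_test_output, stderr=per_test_output, env=env).wait()
        return retcode
    ```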


---



[GitHub] spark pull request #23203: [SPARK-26252][PYTHON] Add support to run specific...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23203#discussion_r238887812
  
    --- Diff: python/run-tests.py ---
    @@ -93,17 +93,18 @@ def run_individual_python_test(target_dir, test_name, pyspark_python):
             "pyspark-shell"
         ]
         env["PYSPARK_SUBMIT_ARGS"] = " ".join(spark_args)
    -
    -    LOGGER.info("Starting test(%s): %s", pyspark_python, test_name)
    +    str_test_name = " ".join(test_name)
    +    LOGGER.info("Starting test(%s): %s", pyspark_python, str_test_name)
         start_time = time.time()
         try:
             per_test_output = tempfile.TemporaryFile()
             retcode = subprocess.Popen(
    -            [os.path.join(SPARK_HOME, "bin/pyspark"), test_name],
    --- End diff --
    
    Oh, yeah. Looks like that's going to reduce the diff. Let me try.


---



[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/23203
  
    Yeah, I will update it as well after this one gets merged.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/23203
  
    I used to run PySpark tests via `python python/pyspark/sql/dataframe.py`, after setting `export PYTHONPATH="$(find "${SPARK_HOME}"/python/lib/ -name 'py4j-*-src.zip' -type f | uniq)":"${SPARK_HOME}"/python`.
    
    I'm happy to see an easier way to do it, though I'm not very familiar with these scripts. Thanks for doing it!



---



[GitHub] spark pull request #23203: [SPARK-26252][PYTHON] Add support to run specific...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23203#discussion_r238173745
  
    --- Diff: python/run-tests-with-coverage ---
    @@ -50,8 +50,6 @@ export SPARK_CONF_DIR="$COVERAGE_DIR/conf"
     # This environment variable enables the coverage.
     export COVERAGE_PROCESS_START="$FWDIR/.coveragerc"
     
    -# If you'd like to run a specific unittest class, you could do such as
    -# SPARK_TESTING=1 ../bin/pyspark pyspark.sql.tests VectorizedUDFTests
     ./run-tests "$@"
    --- End diff --
    
    BTW, it works with the coverage script as well; manually tested.


---



[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23203
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23203
  
    **[Test build #99599 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99599/testReport)** for PR 23203 at commit [`44c622b`](https://github.com/apache/spark/commit/44c622bf17ab642ef372d9a534b5bfc18c98a0da).


---



[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23203
  
    **[Test build #99697 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99697/testReport)** for PR 23203 at commit [`bd23e01`](https://github.com/apache/spark/commit/bd23e01078deb90bcdba654ff82047603a462b2e).


---



[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/23203
  
    Merged to master.


---



[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23203
  
    **[Test build #99697 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99697/testReport)** for PR 23203 at commit [`bd23e01`](https://github.com/apache/spark/commit/bd23e01078deb90bcdba654ff82047603a462b2e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23203
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5746/
    Test PASSed.


---



[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/23203
  
    I haven't looked closely at the changes yet, but I think it should be very useful. Thanks @HyukjinKwon


---



[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23203
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99697/
    Test PASSed.


---



[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23203
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99599/
    Test PASSed.


---



[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/23203
  
    cc @cloud-fan, @dongjoon-hyun, @icexelloss, @BryanCutler, @viirya (whom I talked with about this before).


---



[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23203
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23203
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #23203: [SPARK-26252][PYTHON] Add support to run specific...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/23203


---



[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23203
  
    **[Test build #99599 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99599/testReport)** for PR 23203 at commit [`44c622b`](https://github.com/apache/spark/commit/44c622bf17ab642ef372d9a534b5bfc18c98a0da).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/23203
  
    Thank you @cloud-fan, @viirya, @srowen, and @BryanCutler.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org