Posted to reviews@spark.apache.org by ueshin <gi...@git.apache.org> on 2018/02/05 10:18:50 UTC

[GitHub] spark pull request #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with re...

GitHub user ueshin opened a pull request:

    https://github.com/apache/spark/pull/20507

    [SPARK-23334][SQL][PYTHON] Fix pandas_udf with return type StringType() to handle str type properly in Python 2.

    ## What changes were proposed in this pull request?
    
    In Python 2, when a `pandas_udf` returns string values created inside the UDF as plain `str` (e.g. with `".."`), execution fails. E.g.,
    
    ```python
    from pyspark.sql.functions import pandas_udf, col
    import pandas as pd
    
    df = spark.range(10)
    str_f = pandas_udf(lambda x: pd.Series(["%s" % i for i in x]), "string")
    df.select(str_f(col('id'))).show()
    ```
    
    raises the following exception:
    
    ```
    ...
    
    java.lang.AssertionError: assertion failed: Invalid schema from pandas_udf: expected StringType, got BinaryType
    	at scala.Predef$.assert(Predef.scala:170)
    	at org.apache.spark.sql.execution.python.ArrowEvalPythonExec$$anon$2.<init>(ArrowEvalPythonExec.scala:93)
    
    ...
    ```
    
    It seems that pyarrow ignores the `type` parameter of `pa.Array.from_pandas()` and treats the data as binary type when the requested type is string but the values are `str` rather than `unicode` in Python 2.
    
    This PR adds a workaround for this case.
    
    ## How was this patch tested?
    
    Tested with a newly added test and the existing tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-23334

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20507.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20507
    
----
commit 47b88734b91a7f9a4335bc3c667640eb4600b8e1
Author: Takuya UESHIN <ue...@...>
Date:   2018-02-05T09:30:20Z

    Fix pandas_udf with return type StringType() to handle str type properly.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with return ty...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20507
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/604/
    Test PASSed.


---


[GitHub] spark issue #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with return ty...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20507
  
    **[Test build #87069 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87069/testReport)** for PR 20507 at commit [`06ae568`](https://github.com/apache/spark/commit/06ae568df2088652754c2df66d2f78c8fbdac48d).


---


[GitHub] spark pull request #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with re...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20507#discussion_r165972212
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -3920,6 +3920,14 @@ def test_vectorized_udf_null_string(self):
             res = df.select(str_f(col('str')))
             self.assertEquals(df.collect(), res.collect())
     
    +    def test_vectorized_udf_string_in_udf(self):
    +        from pyspark.sql.functions import pandas_udf, col
    +        import pandas as pd
    +        df = self.spark.range(10)
    +        str_f = pandas_udf(lambda x: pd.Series(["%s" % i for i in x]), StringType())
    --- End diff --
    
    Not a big deal. How about `pd.Series(map(str, x))`?
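As a small check of the suggestion above, the following sketch compares the two ways of stringifying the ids. It uses a plain `range` as a stand-in for the pandas Series the UDF receives, and the variable names are illustrative only. Note that `map` returns a lazy iterator in Python 3, so it is wrapped in `list` here.

```python
# Stand-in for the values of the 'id' column produced by spark.range(10).
ids = range(10)

# Original form used in the test: %-formatting each element.
formatted = ["%s" % i for i in ids]

# Suggested form: map str over the elements (wrapped in list for Python 3).
mapped = list(map(str, ids))

# Both forms yield the same strings for integer input.
same = formatted == mapped
```

Both forms produce plain `str` values in Python 2, so either one still exercises the failing case the test targets.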


---


[GitHub] spark issue #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with return ty...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/20507
  
    > It seems that pyarrow ignores the `type` parameter of `pa.Array.from_pandas()` and treats the data as binary type when the requested type is string but the values are `str` rather than `unicode` in Python 2.
    
    @BryanCutler Btw, do you think this is a bug in pyarrow in Python 2?


---


[GitHub] spark issue #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with return ty...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20507
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87063/
    Test PASSed.


---


[GitHub] spark pull request #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with re...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20507#discussion_r165968902
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -3920,6 +3920,14 @@ def test_vectorized_udf_null_string(self):
             res = df.select(str_f(col('str')))
             self.assertEquals(df.collect(), res.collect())
     
    +    def test_vectorized_udf_string_in_udf(self):
    +        from pyspark.sql.functions import pandas_udf, col
    +        import pandas as pd
    +        df = self.spark.range(10)
    +        str_f = pandas_udf(lambda x: pd.Series(["%s" % i for i in x]), StringType())
    +        res = df.select(str_f(col('id')))
    --- End diff --
    
    How about variable names 'expected' and 'actual'?


---


[GitHub] spark issue #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with return ty...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20507
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with return ty...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20507
  
    Merged to master and branch-2.3.


---


[GitHub] spark issue #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with return ty...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20507
  
    **[Test build #87083 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87083/testReport)** for PR 20507 at commit [`b3d5209`](https://github.com/apache/spark/commit/b3d5209b26322329d7e4ba1fd1b1457f86b44a8a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with return ty...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20507
  
    **[Test build #87083 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87083/testReport)** for PR 20507 at commit [`b3d5209`](https://github.com/apache/spark/commit/b3d5209b26322329d7e4ba1fd1b1457f86b44a8a).


---


[GitHub] spark pull request #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with re...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20507


---


[GitHub] spark issue #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with return ty...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20507
  
    Merged build finished. Test PASSed.


---


[GitHub] spark pull request #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with re...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20507#discussion_r166018470
  
    --- Diff: python/pyspark/serializers.py ---
    @@ -230,6 +230,9 @@ def create_array(s, t):
                 s = _check_series_convert_timestamps_internal(s.fillna(0), timezone)
                 # TODO: need cast after Arrow conversion, ns values cause error with pandas 0.19.2
                 return pa.Array.from_pandas(s, mask=mask).cast(t, safe=False)
    +        elif t is not None and pa.types.is_string(t) and sys.version < '3':
    +            # TODO: need decode before converting to Arrow in Python 2
    +            return pa.Array.from_pandas(s.str.decode('utf-8'), mask=mask, type=t)
    --- End diff --
    
    @ueshin, actually, how about `s.apply(lambda v: v.decode("utf-8") if isinstance(v, str) else v)` so that non-ASCII `unicode` values such as `u"아"` are allowed too? I was worried about performance, so I ran a simple perf test against `s.str.decode('utf-8')` to be sure. It actually seems fine.
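The per-element decode proposed above can be sketched in Python 3 terms, where `bytes` plays the role of Python 2 `str` and `str` plays the role of Python 2 `unicode`. This is only an illustration under that assumption, not the code merged into `serializers.py`; the helper name `to_text` is hypothetical, and a plain list stands in for the pandas Series.

```python
# Python 3 analogue: `bytes` ~ Python 2 `str`, `str` ~ Python 2 `unicode`.

def to_text(value, encoding="utf-8"):
    """Decode byte strings; pass text, None, and other values through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Mixed input: a UTF-8 byte string, already-decoded non-ASCII text, and a null.
values = ["아".encode("utf-8"), "아", None]

# Only the byte string is decoded; the other two elements are left as they are.
decoded = [to_text(v) for v in values]
```

With pandas, the same function could be passed to `Series.apply`, which is the shape of the suggestion in the comment. The point of the `isinstance` check is that values which are already text are never decoded a second time.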


---


[GitHub] spark issue #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with return ty...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20507
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87083/
    Test PASSed.


---


[GitHub] spark pull request #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with re...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20507#discussion_r165980594
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -3920,6 +3920,14 @@ def test_vectorized_udf_null_string(self):
             res = df.select(str_f(col('str')))
             self.assertEquals(df.collect(), res.collect())
     
    +    def test_vectorized_udf_string_in_udf(self):
    +        from pyspark.sql.functions import pandas_udf, col
    +        import pandas as pd
    +        df = self.spark.range(10)
    +        str_f = pandas_udf(lambda x: pd.Series(["%s" % i for i in x]), StringType())
    +        res = df.select(str_f(col('id')))
    --- End diff --
    
    Sure, I'll update it.


---


[GitHub] spark issue #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with return ty...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20507
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/592/
    Test PASSed.


---


[GitHub] spark issue #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with return ty...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/20507
  
    Sorry, I've been travelling, but I'll try to look into this soon on the Arrow side to see if it is a bug in pyarrow. The workaround here seems fine to me.


---


[GitHub] spark pull request #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with re...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20507#discussion_r166168248
  
    --- Diff: python/pyspark/serializers.py ---
    @@ -230,6 +230,9 @@ def create_array(s, t):
                 s = _check_series_convert_timestamps_internal(s.fillna(0), timezone)
                 # TODO: need cast after Arrow conversion, ns values cause error with pandas 0.19.2
                 return pa.Array.from_pandas(s, mask=mask).cast(t, safe=False)
    +        elif t is not None and pa.types.is_string(t) and sys.version < '3':
    +            # TODO: need decode before converting to Arrow in Python 2
    +            return pa.Array.from_pandas(s.str.decode('utf-8'), mask=mask, type=t)
    --- End diff --
    
    Good catch! I'll take it. Thanks!


---


[GitHub] spark pull request #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with re...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20507#discussion_r165980572
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -3920,6 +3920,14 @@ def test_vectorized_udf_null_string(self):
             res = df.select(str_f(col('str')))
             self.assertEquals(df.collect(), res.collect())
     
    +    def test_vectorized_udf_string_in_udf(self):
    +        from pyspark.sql.functions import pandas_udf, col
    +        import pandas as pd
    +        df = self.spark.range(10)
    +        str_f = pandas_udf(lambda x: pd.Series(["%s" % i for i in x]), StringType())
    --- End diff --
    
    Sounds good. I'll take it.


---


[GitHub] spark issue #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with return ty...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20507
  
    **[Test build #87069 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87069/testReport)** for PR 20507 at commit [`06ae568`](https://github.com/apache/spark/commit/06ae568df2088652754c2df66d2f78c8fbdac48d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with return ty...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20507
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with return ty...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20507
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with return ty...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20507
  
    **[Test build #87063 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87063/testReport)** for PR 20507 at commit [`47b8873`](https://github.com/apache/spark/commit/47b88734b91a7f9a4335bc3c667640eb4600b8e1).


---


[GitHub] spark issue #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with return ty...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/20507
  
    also cc @cloud-fan @gatorsmile @sameeragarwal 


---


[GitHub] spark issue #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with return ty...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20507
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with return ty...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20507
  
    **[Test build #87063 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87063/testReport)** for PR 20507 at commit [`47b8873`](https://github.com/apache/spark/commit/47b88734b91a7f9a4335bc3c667640eb4600b8e1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with return ty...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20507
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87069/
    Test PASSed.


---


[GitHub] spark issue #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with return ty...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/20507
  
    Thanks! @HyukjinKwon @BryanCutler


---


[GitHub] spark issue #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with return ty...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/20507
  
    I made https://issues.apache.org/jira/browse/ARROW-2101 to track the issue in Arrow


---


[GitHub] spark issue #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with return ty...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20507
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/586/
    Test PASSed.


---


[GitHub] spark issue #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with return ty...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20507
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with return ty...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/20507
  
    cc @BryanCutler @icexelloss @HyukjinKwon 
    Could you help me double-check this?
    Since this seems to happen only in a Python 2 environment, Jenkins will skip the tests.
    Also, let me know if you know of a better workaround.


---
