Posted to reviews@spark.apache.org by ueshin <gi...@git.apache.org> on 2018/02/05 08:31:18 UTC

[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...

GitHub user ueshin opened a pull request:

    https://github.com/apache/spark/pull/20506

    [SPARK-23290][SQL][PYTHON] Use datetime.date for date type when converting Spark DataFrame to Pandas DataFrame.

    ## What changes were proposed in this pull request?
    
    In #18664, there was a change in how `DateType` is being returned to users ([line 1968 in dataframe.py](https://github.com/apache/spark/pull/18664/files#diff-6fc344560230bf0ef711bb9b5573f1faR1968)). This can cause client code which works in Spark 2.2 to fail.
    See [SPARK-23290](https://issues.apache.org/jira/browse/SPARK-23290?focusedCommentId=16350917&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16350917) for an example.
    
    This PR changes the conversion to use `datetime.date` for the date type, as Spark 2.2 does.
    
    ## How was this patch tested?
    
    Modified tests to fit the new behavior, and ran the existing tests.
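
A minimal sketch of the kind of conversion described above, assuming pandas is installed. The function name `convert_date_columns` and its `date_columns` parameter are hypothetical; this is not the helper added in `python/pyspark/sql/types.py`. It only illustrates turning `datetime64` columns into columns of `datetime.date` objects with pandas' `Series.dt.date` accessor.

```python
import datetime

import pandas as pd


def convert_date_columns(pdf, date_columns):
    """Return a copy of pdf whose named datetime64 columns hold datetime.date objects.

    Hypothetical helper for illustration only; not Spark's implementation.
    """
    result = pdf.copy()
    for name in date_columns:
        # Series.dt.date yields Python datetime.date objects (object dtype).
        result[name] = result[name].dt.date
    return result


# Example input: column "d" is datetime64, as a date column would be
# after an Arrow-based conversion according to the discussion in this thread.
pdf = pd.DataFrame({
    "d": pd.to_datetime(["2012-01-01", "2012-02-03"]),
    "x": [1, 2],
})
converted = convert_date_columns(pdf, ["d"])
```

Client code written against Spark 2.2 that expects `datetime.date` values could then compare `converted["d"].iloc[0]` with `datetime.date(2012, 1, 1)` directly.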


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-23290

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20506.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20506
    
----
commit 223d0a06a755d3ceb59664b37a87af82f61f2ae4
Author: Takuya UESHIN <ue...@...>
Date:   2018-02-05T06:52:43Z

    Use datetime.date for date type when converting Spark DataFrame to Pandas DataFrame.

commit 57ab41b90dbdace4dc5ce71421c42cfff27d061c
Author: Takuya UESHIN <ue...@...>
Date:   2018-02-05T07:49:36Z

    Modify a test for date type.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20506
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87071/


[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20506#discussion_r166191974
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1694,6 +1694,21 @@ def from_arrow_schema(arrow_schema):
              for field in arrow_schema])
     
     
    +def _correct_date_of_dataframe_from_arrow(pdf, schema):
    --- End diff --
    
    Sure. I'll update it.


[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20506
  
    **[Test build #87062 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87062/testReport)** for PR 20506 at commit [`57ab41b`](https://github.com/apache/spark/commit/57ab41b90dbdace4dc5ce71421c42cfff27d061c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20506#discussion_r165980562
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1694,6 +1694,21 @@ def from_arrow_schema(arrow_schema):
              for field in arrow_schema])
     
     
    +def _correct_date_of_dataframe_from_arrow(pdf, schema):
    +    """ Correct date type value to use datetime.date.
    +
    +    Pandas DataFrame created from PyArrow uses datetime64[ns] for date type values, but we should
    +    use datetime.date to keep backward compatibility.
    --- End diff --
    
    Shall we say something like "to match the behavior when Arrow optimization is disabled"?


[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20506
  
    I originally thought similarly, but after another look it seems better to keep it consistent with what Pandas does for now. FYI, `datetime.date` seems to map to `object` in Pandas:
    
    ```
    >>> pd.Series([datetime.date(2012,1,1)])
    0    2012-01-01
    dtype: object
    ```
    
    and it looks like it needs an explicit conversion:
    
    ```
    >>> pd.Series([pd.Timestamp(datetime.date(2012,1,1))])
    0   2012-01-01
    dtype: datetime64[ns]
    ```
    
    Given that `datetime.datetime` and `datetime.date` are not directly comparable, it seems to make sense to have a different type, at least for now. I think we can even merge it into master and then research the past discussions within Pandas after 2.3.0.
    
    I have been reading related Pandas dev discussions since yesterday, and it seems we should go with `object`. For example, see `https://github.com/pandas-dev/pandas/issues/6932#issuecomment-41084598` and `https://github.com/pandas-dev/pandas/issues/4338` (I left the links in code blocks to avoid messing up links to other repos).
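
To illustrate the point above that `datetime.datetime` and `datetime.date` are not directly comparable, here is a minimal sketch using only the Python standard library. The helper name `try_less_than` is hypothetical and is not part of the PR.

```python
import datetime


def try_less_than(left, right):
    """Return left < right, or the exception type name if the comparison fails."""
    try:
        return left < right
    except TypeError as exc:
        return type(exc).__name__


a_date = datetime.date(2012, 1, 1)
a_datetime = datetime.datetime(2012, 1, 1)

# Ordering a date against a datetime raises TypeError.
mixed_order = try_less_than(a_date, a_datetime)

# Equality between the two does not raise; it evaluates to False.
mixed_equal = a_date == a_datetime

# Converting the datetime to a date first makes the values comparable.
same_type_equal = a_date == a_datetime.date()
```

This is one reason a column of `datetime.date` objects and a `datetime64[ns]` column are not interchangeable for client code that compares values against `datetime.date` literals.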


[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/20506
  
    A late +1 from me, since it seems like Pandas needs an explicit conversion to get to datetime64 and doesn't directly support `datetime.date`.


[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20506
  
    LGTM, merging to master/2.3!


[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20506
  
    **[Test build #87092 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87092/testReport)** for PR 20506 at commit [`f151cdf`](https://github.com/apache/spark/commit/f151cdf492959d928025a51cabe9c4ba7a395460).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20506
  
    **[Test build #87071 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87071/testReport)** for PR 20506 at commit [`ebdbd8c`](https://github.com/apache/spark/commit/ebdbd8c4a06a4da52fc61b1dc98d6e2f2facdf9c).


[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20506
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20506
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/613/


[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20506
  
    **[Test build #87062 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87062/testReport)** for PR 20506 at commit [`57ab41b`](https://github.com/apache/spark/commit/57ab41b90dbdace4dc5ce71421c42cfff27d061c).


[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20506#discussion_r166067222
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -2020,8 +2021,6 @@ def _to_corrected_pandas_type(dt):
             return np.int32
         elif type(dt) == FloatType:
             return np.float32
    -    elif type(dt) == DateType:
    -        return 'datetime64[ns]'
    --- End diff --
    
    I thought we were considering the interpretation of DateType as object as a bug, similar to how FloatType was being interpreted as float64?


[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20506
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20506
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/585/


[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20506
  
    **[Test build #87092 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87092/testReport)** for PR 20506 at commit [`f151cdf`](https://github.com/apache/spark/commit/f151cdf492959d928025a51cabe9c4ba7a395460).


[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20506


[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20506
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/594/


[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20506
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87062/


[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20506
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20506#discussion_r166179612
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -2020,8 +2021,6 @@ def _to_corrected_pandas_type(dt):
             return np.int32
         elif type(dt) == FloatType:
             return np.float32
    -    elif type(dt) == DateType:
    -        return 'datetime64[ns]'
    --- End diff --
    
    +1, I feel it was a bug. Maybe we can merge this to branch-2.3 only and update the migration guide in the master branch?


[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20506
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20506
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20506
  
    **[Test build #87071 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87071/testReport)** for PR 20506 at commit [`ebdbd8c`](https://github.com/apache/spark/commit/ebdbd8c4a06a4da52fc61b1dc98d6e2f2facdf9c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20506
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/20506
  
    cc @BryanCutler @icexelloss @HyukjinKwon @cloud-fan @gatorsmile 


[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20506#discussion_r166189014
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1694,6 +1694,21 @@ def from_arrow_schema(arrow_schema):
              for field in arrow_schema])
     
     
    +def _correct_date_of_dataframe_from_arrow(pdf, schema):
    --- End diff --
    
    To be consistent with other methods in this file, how about `_check_dataframe_convert_date`?


[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20506
  
    @ueshin Can you send a new PR for 2.3? It conflicts. Thanks!


[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20506
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87092/


[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20506#discussion_r166192233
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -4062,18 +4062,42 @@ def test_vectorized_udf_unsupported_types(self):
                 with self.assertRaisesRegexp(Exception, 'Unsupported data type'):
                     df.select(f(col('map'))).collect()
     
    -    def test_vectorized_udf_null_date(self):
    +    def test_vectorized_udf_dates(self):
    --- End diff --
    
    Maybe `ArrowTests.test_toPandas_arrow_toggle`:
    
    https://github.com/apache/spark/blob/ebdbd8c4a06a4da52fc61b1dc98d6e2f2facdf9c/python/pyspark/sql/tests.py#L3461-L3464
    
    ?
    
    In addition, I'll modify it to compare the result against the expected Pandas DataFrame.


[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20506
  
    @HyukjinKwon SGTM!


[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20506#discussion_r166189478
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -4062,18 +4062,42 @@ def test_vectorized_udf_unsupported_types(self):
                 with self.assertRaisesRegexp(Exception, 'Unsupported data type'):
                     df.select(f(col('map'))).collect()
     
    -    def test_vectorized_udf_null_date(self):
    +    def test_vectorized_udf_dates(self):
    --- End diff --
    
    Shall we have a new test to directly verify that `toPandas` works?


[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/20506
  
    Thanks! @HyukjinKwon @BryanCutler @cloud-fan 


[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20506#discussion_r165987965
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1694,6 +1694,21 @@ def from_arrow_schema(arrow_schema):
              for field in arrow_schema])
     
     
    +def _correct_date_of_dataframe_from_arrow(pdf, schema):
    +    """ Correct date type value to use datetime.date.
    +
    +    Pandas DataFrame created from PyArrow uses datetime64[ns] for date type values, but we should
    +    use datetime.date to keep backward compatibility.
    --- End diff --
    
    Maybe we don't need to mention backward compatibility here. I'll update it.

