Posted to reviews@spark.apache.org by ueshin <gi...@git.apache.org> on 2018/02/05 08:31:18 UTC
[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...
GitHub user ueshin opened a pull request:
https://github.com/apache/spark/pull/20506
[SPARK-23290][SQL][PYTHON] Use datetime.date for date type when converting Spark DataFrame to Pandas DataFrame.
## What changes were proposed in this pull request?
In #18664, there was a change in how `DateType` is being returned to users ([line 1968 in dataframe.py](https://github.com/apache/spark/pull/18664/files#diff-6fc344560230bf0ef711bb9b5573f1faR1968)). This can cause client code which works in Spark 2.2 to fail.
See [SPARK-23290](https://issues.apache.org/jira/browse/SPARK-23290?focusedCommentId=16350917&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16350917) for an example.
This PR changes the conversion to use `datetime.date` for date type values, as Spark 2.2 does.
## How was this patch tested?
Modified tests to fit the new behavior, and ran the existing tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ueshin/apache-spark issues/SPARK-23290
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20506.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20506
----
commit 223d0a06a755d3ceb59664b37a87af82f61f2ae4
Author: Takuya UESHIN <ue...@...>
Date: 2018-02-05T06:52:43Z
Use datetime.date for date type when converting Spark DataFrame to Pandas DataFrame.
commit 57ab41b90dbdace4dc5ce71421c42cfff27d061c
Author: Takuya UESHIN <ue...@...>
Date: 2018-02-05T07:49:36Z
Modify a test for date type.
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20506
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87071/
Test PASSed.
---
[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20506#discussion_r166191974
--- Diff: python/pyspark/sql/types.py ---
@@ -1694,6 +1694,21 @@ def from_arrow_schema(arrow_schema):
for field in arrow_schema])
+def _correct_date_of_dataframe_from_arrow(pdf, schema):
--- End diff --
Sure. I'll update it.
---
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20506
**[Test build #87062 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87062/testReport)** for PR 20506 at commit [`57ab41b`](https://github.com/apache/spark/commit/57ab41b90dbdace4dc5ce71421c42cfff27d061c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20506#discussion_r165980562
--- Diff: python/pyspark/sql/types.py ---
@@ -1694,6 +1694,21 @@ def from_arrow_schema(arrow_schema):
for field in arrow_schema])
+def _correct_date_of_dataframe_from_arrow(pdf, schema):
+    """ Correct date type value to use datetime.date.
+
+    Pandas DataFrame created from PyArrow uses datetime64[ns] for date type values, but we should
+    use datetime.date to keep backward compatibility.
--- End diff --
Shall we say something like "to match the behavior when Arrow optimization is disabled"?
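The conversion this docstring describes can be sketched with pandas alone. This is only an illustration under assumptions: `convert_date_columns` and its list-of-names argument are hypothetical, whereas the function under review takes a schema.

```python
import datetime

import pandas as pd


def convert_date_columns(pdf, date_columns):
    """Return a copy of pdf whose named datetime64 columns hold datetime.date objects.

    Hypothetical helper for illustration; the function under review takes a
    schema instead of a list of column names.
    """
    out = pdf.copy()
    for name in date_columns:
        # Series.dt.date yields an object-dtype Series of datetime.date values.
        out[name] = out[name].dt.date
    return out


# Stand-in for a frame produced through Arrow: dates arrive as datetime64 values.
arrow_like = pd.DataFrame({
    "id": [1, 2],
    "day": pd.to_datetime(["2012-01-01", "2012-01-02"]),
})
converted = convert_date_columns(arrow_like, ["day"])
```

After the call, `converted["day"]` has dtype `object` and its elements are plain `datetime.date` values, which is what the non-Arrow path is described as returning.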
---
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20506
I originally thought similarly, but after another look it seems better to keep this consistent with what Pandas does for now. FYI, it seems `datetime.date` maps to `object` in Pandas:
```
>>> import datetime
>>> import pandas as pd
>>> pd.Series([datetime.date(2012,1,1)])
0    2012-01-01
dtype: object
```
and it looks like it needs an explicit conversion:
```
>>> pd.Series([pd.Timestamp(datetime.date(2012,1,1))])
0   2012-01-01
dtype: datetime64[ns]
```
Given that `datetime.datetime` and `datetime.date` are not directly comparable, it seems to make sense to have a different type, at least for now. I think we can even go with it in master and then research the past discussions within Pandas after 2.3.0.
I have been reading related discussions with the Pandas devs since yesterday, and it seems we should go with `object`. For example, see `https://github.com/pandas-dev/pandas/issues/6932#issuecomment-41084598` and `https://github.com/pandas-dev/pandas/issues/4338` (I left the links in code blocks to avoid messing up links to other repos).
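The point about `datetime.datetime` and `datetime.date` not being directly comparable can be checked with the standard library alone. A small sketch; the variable names are illustrative only:

```python
import datetime

day = datetime.date(2012, 1, 1)
moment = datetime.datetime(2012, 1, 1)

# Equality between the two types is False, even for "the same" day.
same = (moment == day)

# Ordering comparisons are rejected with a TypeError.
try:
    moment < day
    ordering_error = None
except TypeError as exc:
    ordering_error = exc

# An explicit conversion makes the comparison well defined.
same_after_conversion = (moment.date() == day)
```

Client code that compares column values against `datetime.date` literals therefore behaves differently depending on which of the two types the column holds.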
---
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/20506
A late +1 from me, since it seems Pandas needs an explicit conversion to get to datetime64 and doesn't directly support `datetime.date`.
---
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20506
LGTM, merging to master/2.3!
---
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20506
**[Test build #87092 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87092/testReport)** for PR 20506 at commit [`f151cdf`](https://github.com/apache/spark/commit/f151cdf492959d928025a51cabe9c4ba7a395460).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20506
**[Test build #87071 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87071/testReport)** for PR 20506 at commit [`ebdbd8c`](https://github.com/apache/spark/commit/ebdbd8c4a06a4da52fc61b1dc98d6e2f2facdf9c).
---
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20506
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20506
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/613/
Test PASSed.
---
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20506
**[Test build #87062 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87062/testReport)** for PR 20506 at commit [`57ab41b`](https://github.com/apache/spark/commit/57ab41b90dbdace4dc5ce71421c42cfff27d061c).
---
[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...
Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/20506#discussion_r166067222
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -2020,8 +2021,6 @@ def _to_corrected_pandas_type(dt):
         return np.int32
     elif type(dt) == FloatType:
         return np.float32
-    elif type(dt) == DateType:
-        return 'datetime64[ns]'
--- End diff --
I thought we were considering the interpretation of DateType as object as a bug, similar to how FloatType was being interpreted as float64?
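For context, the dtype correction under discussion can be pictured as a lookup from Spark type to NumPy dtype. A minimal sketch, assuming plain string type names (`"int"`, `"float"`, `"date"`) and a hypothetical `correct_dtypes` helper in place of the PySpark type classes; only the two mappings visible in the diff above are included:

```python
import datetime

import numpy as np
import pandas as pd

# Hypothetical stand-in for Spark SQL types, keyed by name; not the PySpark API.
_CORRECTED_DTYPES = {
    "int": np.int32,
    "float": np.float32,
    # "date" is deliberately absent, so date columns keep dtype object.
}


def correct_dtypes(pdf, type_names):
    """Cast columns that have a corrected dtype; leave all other columns alone."""
    out = pdf.copy()
    for column, type_name in type_names.items():
        dtype = _CORRECTED_DTYPES.get(type_name)
        if dtype is not None:
            out[column] = out[column].astype(dtype)
    return out


pdf = pd.DataFrame({
    "f": [1.5, 2.5],  # pandas infers float64 here
    "d": [datetime.date(2012, 1, 1), datetime.date(2012, 1, 2)],
})
corrected = correct_dtypes(pdf, {"f": "float", "d": "date"})
```

With the date entry removed from the lookup, the float column is narrowed to `float32` while the date column is left as `object`, which is the effect of the two deleted lines in the diff.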
---
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20506
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20506
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/585/
Test PASSed.
---
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20506
**[Test build #87092 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87092/testReport)** for PR 20506 at commit [`f151cdf`](https://github.com/apache/spark/commit/f151cdf492959d928025a51cabe9c4ba7a395460).
---
[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/20506
---
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20506
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/594/
Test PASSed.
---
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20506
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87062/
Test PASSed.
---
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20506
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/20506#discussion_r166179612
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -2020,8 +2021,6 @@ def _to_corrected_pandas_type(dt):
         return np.int32
     elif type(dt) == FloatType:
         return np.float32
-    elif type(dt) == DateType:
-        return 'datetime64[ns]'
--- End diff --
+1, I feel it was a bug. Maybe we can merge this to branch-2.3 only and update the migration guide in the master branch?
---
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20506
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20506
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20506
**[Test build #87071 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87071/testReport)** for PR 20506 at commit [`ebdbd8c`](https://github.com/apache/spark/commit/ebdbd8c4a06a4da52fc61b1dc98d6e2f2facdf9c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20506
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/20506
cc @BryanCutler @icexelloss @HyukjinKwon @cloud-fan @gatorsmile
---
[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/20506#discussion_r166189014
--- Diff: python/pyspark/sql/types.py ---
@@ -1694,6 +1694,21 @@ def from_arrow_schema(arrow_schema):
for field in arrow_schema])
+def _correct_date_of_dataframe_from_arrow(pdf, schema):
--- End diff --
To be consistent with other methods in this file, how about `_check_dataframe_convert_date`?
---
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20506
@ueshin can you send a new PR for 2.3? This one conflicts, thanks!
---
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20506
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87092/
Test PASSed.
---
[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20506#discussion_r166192233
--- Diff: python/pyspark/sql/tests.py ---
@@ -4062,18 +4062,42 @@ def test_vectorized_udf_unsupported_types(self):
with self.assertRaisesRegexp(Exception, 'Unsupported data type'):
df.select(f(col('map'))).collect()
- def test_vectorized_udf_null_date(self):
+ def test_vectorized_udf_dates(self):
--- End diff --
Maybe `ArrowTests.test_toPandas_arrow_toggle`:
https://github.com/apache/spark/blob/ebdbd8c4a06a4da52fc61b1dc98d6e2f2facdf9c/python/pyspark/sql/tests.py#L3461-L3464
?
In addition, I'll modify it to compare the result against its expected Pandas DataFrame.
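One way such a check could look, sketched with pandas alone. `from_arrow` is a stand-in for what an Arrow-based conversion might return, and `frames_match` is a hypothetical wrapper, not part of the Spark test suite:

```python
import datetime

import pandas as pd
from pandas.testing import assert_frame_equal


def frames_match(left, right):
    """Return True when assert_frame_equal accepts the two frames."""
    try:
        # The default check_dtype=True also compares column dtypes.
        assert_frame_equal(left, right)
    except AssertionError:
        return False
    return True


# Expected result: an object column of datetime.date values.
expected = pd.DataFrame({
    "date": [datetime.date(2012, 1, 1), datetime.date(2012, 1, 2)],
})

# Stand-in for an uncorrected Arrow result: the same days as datetime64 values.
from_arrow = pd.DataFrame({"date": pd.to_datetime(["2012-01-01", "2012-01-02"])})

uncorrected_matches = frames_match(from_arrow, expected)

# Converting the column back to datetime.date makes the frames equal.
corrected = from_arrow.assign(date=from_arrow["date"].dt.date)
corrected_matches = frames_match(corrected, expected)
```

Comparing against an explicit expected frame catches a dtype regression that a value-only comparison between the Arrow and non-Arrow results could miss if both paths changed together.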
---
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20506
@HyukjinKwon SGTM!
---
[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/20506#discussion_r166189478
--- Diff: python/pyspark/sql/tests.py ---
@@ -4062,18 +4062,42 @@ def test_vectorized_udf_unsupported_types(self):
with self.assertRaisesRegexp(Exception, 'Unsupported data type'):
df.select(f(col('map'))).collect()
- def test_vectorized_udf_null_date(self):
+ def test_vectorized_udf_dates(self):
--- End diff --
Shall we add a new test to directly verify that `toPandas` works?
---
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/20506
Thanks! @HyukjinKwon @BryanCutler @cloud-fan
---
[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20506#discussion_r165987965
--- Diff: python/pyspark/sql/types.py ---
@@ -1694,6 +1694,21 @@ def from_arrow_schema(arrow_schema):
for field in arrow_schema])
+def _correct_date_of_dataframe_from_arrow(pdf, schema):
+    """ Correct date type value to use datetime.date.
+
+    Pandas DataFrame created from PyArrow uses datetime64[ns] for date type values, but we should
+    use datetime.date to keep backward compatibility.
--- End diff --
Maybe we don't need to mention backward compatibility here. I'll update it.
---