Posted to reviews@spark.apache.org by icexelloss <gi...@git.apache.org> on 2018/02/07 22:49:03 UTC
[GitHub] spark pull request #20537: [SPARK-23314][PYTHON] Add ambiguous=False when lo...
GitHub user icexelloss opened a pull request:
https://github.com/apache/spark/pull/20537
[SPARK-23314][PYTHON] Add ambiguous=False when localizing tz-naive timestamps to deal with dst
## What changes were proposed in this pull request?
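When calling `tz_localize` on a tz-naive timestamp, pandas raises an exception if the timestamp is ambiguous, i.e., it falls in the hour that repeats when daylight saving time ends (e.g., 2015-11-01 01:30:00 in America/New_York occurs once in EDT and again in EST). This PR fixes the issue by passing `ambiguous=False` to `tz_localize`, which resolves such timestamps to standard time and matches the default behavior of pytz's `localize` (`is_dst=False`).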
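A minimal sketch reproducing the problem and the fix, assuming a recent pandas with the America/New_York zone available. The exception class raised for the ambiguous case differs across pandas versions, so the snippet catches `Exception` broadly rather than naming a specific type.

```python
import datetime

import pandas as pd

# 01:30 on 2015-11-01 happens twice in America/New_York (first EDT, then EST).
naive = pd.Series([datetime.datetime(2015, 11, 1, 1, 30, 0)])

try:
    # Default ambiguous='raise': pandas refuses to guess which 01:30 is meant.
    naive.dt.tz_localize("America/New_York")
    raised = False
except Exception as exc:  # exact exception class depends on the pandas version
    raised = True
    print(type(exc).__name__, exc)

# ambiguous=False picks the standard-time (EST, UTC-5) instance, like pytz's default.
fixed = naive.dt.tz_localize("America/New_York", ambiguous=False).dt.tz_convert("UTC")
print(fixed[0])  # 2015-11-01 06:30:00+00:00
```

The conversion to UTC mirrors the `.dt.tz_convert('UTC')` step in the patched function; only the `ambiguous` argument changes.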
## How was this patch tested?
Added `test_timestamp_dst` to `python/pyspark/sql/tests.py`.
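The test added here runs inside PySpark's test suite; the following is only a hypothetical, pandas-only sketch of one property such a test can check: a tz-naive wall-clock time in the repeated hour should survive a round trip through UTC unchanged. `to_utc` and `from_utc` are illustrative helpers, not PySpark functions.

```python
import datetime
import unittest

import pandas as pd


def to_utc(s, timezone):
    # Hypothetical helper: interpret naive wall-clock times in `timezone`, store as UTC.
    return s.dt.tz_localize(timezone, ambiguous=False).dt.tz_convert("UTC")


def from_utc(s, timezone):
    # Hypothetical helper: convert UTC back to naive wall-clock times in `timezone`.
    return s.dt.tz_convert(timezone).dt.tz_localize(None)


class TimestampDstTest(unittest.TestCase):
    def test_timestamp_dst(self):
        # 00:30 (EDT) and 02:30 (EST) are unambiguous; 01:30 occurs twice on this date.
        naive = pd.Series([
            datetime.datetime(2015, 11, 1, 0, 30),
            datetime.datetime(2015, 11, 1, 1, 30),
            datetime.datetime(2015, 11, 1, 2, 30),
        ])
        roundtrip = from_utc(to_utc(naive, "America/New_York"), "America/New_York")
        pd.testing.assert_series_equal(roundtrip, naive)
```

It can be run with `python -m unittest` or any unittest-compatible runner.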
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/icexelloss/spark SPARK-23314
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20537.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20537
----
commit 6435feffdc056a8744848e367a585d32e8734b5f
Author: Li Jin <ic...@...>
Date: 2018-02-07T22:38:19Z
Add ambiguous=False when localizing tz-naive timestamps to deal with dst
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20537
**[Test build #87290 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87290/testReport)** for PR 20537 at commit [`23abfb0`](https://github.com/apache/spark/commit/23abfb0e01f98dc4bfbd3fb9f04e487ec9af052c).
---
[GitHub] spark pull request #20537: [SPARK-23314][PYTHON] Add ambiguous=False when lo...
Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/20537#discussion_r166974650
--- Diff: python/pyspark/sql/types.py ---
@@ -1730,7 +1730,28 @@ def _check_series_convert_timestamps_internal(s, timezone):
# TODO: handle nested timestamps, such as ArrayType(TimestampType())?
if is_datetime64_dtype(s.dtype):
tz = timezone or 'tzlocal()'
- return s.dt.tz_localize(tz).dt.tz_convert('UTC')
+ """
+ tz_localize with ambiguous=False has the same behavior of pytz.localize
+ >>> import datetime
+ >>> import pandas as pd
+ >>> import pytz
+ >>>
+ >>> t = datetime.datetime(2015, 11, 1, 1, 23, 24)
+ >>> ts = pd.Series([t])
+ >>> tz = pytz.timezone('America/New_York')
+ >>>
+ >>> ts.dt.tz_localize(tz, ambiguous=False)
+ >>> 0 2015-11-01 01:23:24-05:00
+ >>> dtype: datetime64[ns, America/New_York]
+ >>>
+ >>> ts.dt.tz_localize(tz, ambiguous=True)
+ >>> 0 2015-11-01 01:23:24-04:00
+ >>> dtype: datetime64[ns, America/New_York]
+ >>>
+ >>> str(tz.localize(t))
+ >>> '2015-11-01 01:23:24-05:00'
--- End diff --
Yeah, let me clean up the format.
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20537
**[Test build #87223 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87223/testReport)** for PR 20537 at commit [`2c1a258`](https://github.com/apache/spark/commit/2c1a2582c04a5b9cb7d011892343ca0a07ddb854).
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/20537
@HyukjinKwon no worries. Rebased.
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:
https://github.com/apache/spark/pull/20537
thanks!
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20537
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20537
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/20537
cc @felixcheung @BryanCutler @ueshin @HyukjinKwon
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20537
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/721/
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20537
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/776/
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20537
**[Test build #87178 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87178/testReport)** for PR 20537 at commit [`6435fef`](https://github.com/apache/spark/commit/6435feffdc056a8744848e367a585d32e8734b5f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20537
**[Test build #87222 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87222/testReport)** for PR 20537 at commit [`f6b5d28`](https://github.com/apache/spark/commit/f6b5d2868c3ca7c8c2cc2bfb6e7a06ce7c01998c).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20537
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20537
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #20537: [SPARK-23314][PYTHON] Add ambiguous=False when lo...
Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/20537#discussion_r166810415
--- Diff: python/pyspark/sql/types.py ---
@@ -1730,7 +1730,28 @@ def _check_series_convert_timestamps_internal(s, timezone):
# TODO: handle nested timestamps, such as ArrayType(TimestampType())?
if is_datetime64_dtype(s.dtype):
tz = timezone or 'tzlocal()'
- return s.dt.tz_localize(tz).dt.tz_convert('UTC')
+ """
+ tz_localize with ambiguous=False has the same behavior of pytz.localize
--- End diff --
I'm not sure we want this doctest.
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20537
Sorry, @icexelloss. Mind resolving the conflict?
---
[GitHub] spark pull request #20537: [SPARK-23314][PYTHON] Add ambiguous=False when lo...
Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/20537#discussion_r166811270
--- Diff: python/pyspark/sql/types.py ---
@@ -1730,7 +1730,28 @@ def _check_series_convert_timestamps_internal(s, timezone):
# TODO: handle nested timestamps, such as ArrayType(TimestampType())?
if is_datetime64_dtype(s.dtype):
tz = timezone or 'tzlocal()'
- return s.dt.tz_localize(tz).dt.tz_convert('UTC')
+ """
+ tz_localize with ambiguous=False has the same behavior of pytz.localize
+ >>> import datetime
+ >>> import pandas as pd
+ >>> import pytz
+ >>>
+ >>> t = datetime.datetime(2015, 11, 1, 1, 23, 24)
+ >>> ts = pd.Series([t])
+ >>> tz = pytz.timezone('America/New_York')
+ >>>
+ >>> ts.dt.tz_localize(tz, ambiguous=False)
+ >>> 0 2015-11-01 01:23:24-05:00
+ >>> dtype: datetime64[ns, America/New_York]
+ >>>
+ >>> ts.dt.tz_localize(tz, ambiguous=True)
+ >>> 0 2015-11-01 01:23:24-04:00
+ >>> dtype: datetime64[ns, America/New_York]
+ >>>
+ >>> str(tz.localize(t))
+ >>> '2015-11-01 01:23:24-05:00'
+ """
+ return s.dt.tz_localize(tz, ambiguous=False).dt.tz_convert('UTC')
--- End diff --
I think for a `pd.Series`, `ambiguous` takes an ndarray. Can you also add a `pandas_udf` test case?
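To illustrate the point about `ambiguous` on a `pd.Series`: a sketch, assuming a recent pandas, showing that a boolean array sets the choice per element (True selects DST, False selects standard time), while a scalar `False`, as used in this patch, applies the same choice to every ambiguous entry. Unambiguous entries ignore the flag either way.

```python
import numpy as np
import pandas as pd

s = pd.Series(pd.to_datetime([
    "2015-11-01 00:30",  # unambiguous (EDT)
    "2015-11-01 01:30",  # ambiguous
    "2015-11-01 01:30",  # ambiguous
]))

# Per-element flags: True -> DST (EDT, UTC-4), False -> standard time (EST, UTC-5).
# The flag for the unambiguous first entry has no effect.
per_element = s.dt.tz_localize("America/New_York", ambiguous=np.array([True, True, False]))

# A scalar bool applies the same choice to every ambiguous entry.
scalar = s.dt.tz_localize("America/New_York", ambiguous=False)

print([str(x) for x in per_element])
print([str(x) for x in scalar])
```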
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20537
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/719/
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20537
Merged build finished. Test FAILed.
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20537
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/725/
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20537
**[Test build #87290 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87290/testReport)** for PR 20537 at commit [`23abfb0`](https://github.com/apache/spark/commit/23abfb0e01f98dc4bfbd3fb9f04e487ec9af052c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20537
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87264/
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20537
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/679/
---
[GitHub] spark pull request #20537: [SPARK-23314][PYTHON] Add ambiguous=False when lo...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20537#discussion_r167133597
--- Diff: python/pyspark/sql/types.py ---
@@ -1744,8 +1744,27 @@ def _check_series_convert_timestamps_internal(s, timezone):
from pandas.api.types import is_datetime64_dtype, is_datetime64tz_dtype
# TODO: handle nested timestamps, such as ArrayType(TimestampType())?
if is_datetime64_dtype(s.dtype):
+ # tz_localize with ambiguous=False has the same behavior of pytz.localize
+ # >>> import datetime
+ # >>> import pandas as pd
+ # >>> import pytz
+ # >>>
+ # >>> t = datetime.datetime(2015, 11, 1, 1, 23, 24)
+ # >>> ts = pd.Series([t])
+ # >>> tz = pytz.timezone('America/New_York')
+ # >>>
+ # >>> ts.dt.tz_localize(tz, ambiguous=False)
+ # 0 2015-11-01 01:23:24-05:00
+ # dtype: datetime64[ns, America/New_York]
+ # >>>
+ # >>> ts.dt.tz_localize(tz, ambiguous=True)
+ # 0 2015-11-01 01:23:24-04:00
+ # dtype: datetime64[ns, America/New_York]
+ # >>>
+ # >>> str(tz.localize(t))
+ # '2015-11-01 01:23:24-05:00'
--- End diff --
@icexelloss, I get that this is good to know, but shall we describe it in prose? The comment is formatted like a doctest, but it is actually just a comment.
It would be nicer to have a plain explanation in the comments rather than doctest format.
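As a sketch of the prose-style comment suggested here (the helper name `_localize_naive_series` is hypothetical, and `pytz` is assumed to be installed), the explanation can live in ordinary comments, while a quick check confirms that pandas with `ambiguous=False` and pytz's default `localize` agree on the ambiguous timestamp from the diff:

```python
import datetime

import pandas as pd
import pytz


def _localize_naive_series(s, timezone):
    # tz_localize(..., ambiguous=False) resolves a wall-clock time that occurs twice
    # (the repeated hour when DST ends) to its standard-time instance, which is the
    # later of the two. This matches pytz's tz.localize(dt), whose default is
    # is_dst=False, so timestamps in that hour no longer raise an error.
    return s.dt.tz_localize(timezone, ambiguous=False).dt.tz_convert("UTC")


t = datetime.datetime(2015, 11, 1, 1, 23, 24)
tz = pytz.timezone("America/New_York")

from_pandas = _localize_naive_series(pd.Series([t]), tz)[0]
from_pytz = tz.localize(t).astimezone(pytz.utc)
print(from_pandas, from_pytz)
```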
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20537
Merged to master and branch-2.3.
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20537
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20537
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20537
**[Test build #87223 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87223/testReport)** for PR 20537 at commit [`2c1a258`](https://github.com/apache/spark/commit/2c1a2582c04a5b9cb7d011892343ca0a07ddb854).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #20537: [SPARK-23314][PYTHON] Add ambiguous=False when lo...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20537#discussion_r166826644
--- Diff: python/pyspark/sql/types.py ---
@@ -1730,7 +1730,28 @@ def _check_series_convert_timestamps_internal(s, timezone):
# TODO: handle nested timestamps, such as ArrayType(TimestampType())?
if is_datetime64_dtype(s.dtype):
tz = timezone or 'tzlocal()'
- return s.dt.tz_localize(tz).dt.tz_convert('UTC')
+ """
+ tz_localize with ambiguous=False has the same behavior of pytz.localize
+ >>> import datetime
+ >>> import pandas as pd
+ >>> import pytz
+ >>>
+ >>> t = datetime.datetime(2015, 11, 1, 1, 23, 24)
+ >>> ts = pd.Series([t])
+ >>> tz = pytz.timezone('America/New_York')
+ >>>
+ >>> ts.dt.tz_localize(tz, ambiguous=False)
+ >>> 0 2015-11-01 01:23:24-05:00
+ >>> dtype: datetime64[ns, America/New_York]
+ >>>
+ >>> ts.dt.tz_localize(tz, ambiguous=True)
+ >>> 0 2015-11-01 01:23:24-04:00
+ >>> dtype: datetime64[ns, America/New_York]
+ >>>
+ >>> str(tz.localize(t))
+ >>> '2015-11-01 01:23:24-05:00'
--- End diff --
Hm, this one seems a bit weird. Shouldn't it be `... '2015-11-01 01:23:24-05:00'`?
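For reference on the formatting point: in a real doctest only input lines carry the prompt, and the expected output follows on its own line without one. A sketch using the standard-library `doctest` module; `localize_example` is a hypothetical function, and `pd.Timestamp` stands in for the Series example so the expected output is a single stable string.

```python
import doctest


def localize_example():
    """
    Input lines start with the prompt; expected output lines do not.

    >>> import pandas as pd
    >>> ts = pd.Timestamp("2015-11-01 01:23:24")
    >>> str(ts.tz_localize("America/New_York", ambiguous=False))
    '2015-11-01 01:23:24-05:00'
    >>> str(ts.tz_localize("America/New_York", ambiguous=True))
    '2015-11-01 01:23:24-04:00'
    """


# Collect and run the examples in the docstring; failures are reported on stdout.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
results = [runner.run(test) for test in finder.find(localize_example)]
failed = sum(r.failed for r in results)
attempted = sum(r.attempted for r in results)
print(f"{attempted} examples, {failed} failures")
```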
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20537
**[Test build #87178 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87178/testReport)** for PR 20537 at commit [`6435fef`](https://github.com/apache/spark/commit/6435feffdc056a8744848e367a585d32e8734b5f).
---
[GitHub] spark pull request #20537: [SPARK-23314][PYTHON] Add ambiguous=False when lo...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/20537
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20537
**[Test build #87221 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87221/testReport)** for PR 20537 at commit [`304666a`](https://github.com/apache/spark/commit/304666ad089d497d666de25476955da52aae5395).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20537
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20537
**[Test build #87222 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87222/testReport)** for PR 20537 at commit [`f6b5d28`](https://github.com/apache/spark/commit/f6b5d2868c3ca7c8c2cc2bfb6e7a06ce7c01998c).
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20537
**[Test build #87221 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87221/testReport)** for PR 20537 at commit [`304666a`](https://github.com/apache/spark/commit/304666ad089d497d666de25476955da52aae5395).
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20537
**[Test build #87227 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87227/testReport)** for PR 20537 at commit [`94ec45e`](https://github.com/apache/spark/commit/94ec45e735aad92e019dec302811b8a5bfeb0644).
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:
https://github.com/apache/spark/pull/20537
Great, thanks.
The fix is actually just two lines. LGTM.
@hyukjinkwon, could you help merge this to 2.3 ASAP?
---
[GitHub] spark pull request #20537: [SPARK-23314][PYTHON] Add ambiguous=False when lo...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20537#discussion_r166826468
--- Diff: python/pyspark/sql/tests.py ---
@@ -3638,6 +3638,21 @@ def test_createDataFrame_with_int_col_names(self):
self.assertEqual(pdf_col_names, df.columns)
self.assertEqual(pdf_col_names, df_arrow.columns)
+ def test_timestamp_dst(self):
+ """
+ SPARK-23314: Test daylight saving time
+ """
--- End diff --
Shall we just make this a comment (to follow what most of the other tests do)?
---
[GitHub] spark pull request #20537: [SPARK-23314][PYTHON] Add ambiguous=False when lo...
Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/20537#discussion_r166974367
--- Diff: python/pyspark/sql/types.py ---
@@ -1730,7 +1730,28 @@ def _check_series_convert_timestamps_internal(s, timezone):
# TODO: handle nested timestamps, such as ArrayType(TimestampType())?
if is_datetime64_dtype(s.dtype):
tz = timezone or 'tzlocal()'
- return s.dt.tz_localize(tz).dt.tz_convert('UTC')
+ """
+ tz_localize with ambiguous=False has the same behavior of pytz.localize
--- End diff --
Oh, definitely not a doctest. Let me change it to comments.
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20537
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87223/
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20537
**[Test build #87227 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87227/testReport)** for PR 20537 at commit [`94ec45e`](https://github.com/apache/spark/commit/94ec45e735aad92e019dec302811b8a5bfeb0644).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20537
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20537
**[Test build #87264 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87264/testReport)** for PR 20537 at commit [`0357a2b`](https://github.com/apache/spark/commit/0357a2b14d2590e44a7cb1ce5327448f191cc801).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20537
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87227/
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20537
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20537
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87178/
Test PASSed.
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20537
LGTM too
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/20537
I think this PR is ready. I encountered another issue with the non-Arrow path and filed SPARK-23360. However, that seems to be a different bug than the one here.
---
[GitHub] spark pull request #20537: [SPARK-23314][PYTHON] Add ambiguous=False when lo...
Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/20537#discussion_r167266079
--- Diff: python/pyspark/sql/types.py ---
@@ -1744,8 +1744,27 @@ def _check_series_convert_timestamps_internal(s, timezone):
     from pandas.api.types import is_datetime64_dtype, is_datetime64tz_dtype
     # TODO: handle nested timestamps, such as ArrayType(TimestampType())?
     if is_datetime64_dtype(s.dtype):
+        # tz_localize with ambiguous=False has the same behavior of pytz.localize
+        # >>> import datetime
+        # >>> import pandas as pd
+        # >>> import pytz
+        # >>>
+        # >>> t = datetime.datetime(2015, 11, 1, 1, 23, 24)
+        # >>> ts = pd.Series([t])
+        # >>> tz = pytz.timezone('America/New_York')
+        # >>>
+        # >>> ts.dt.tz_localize(tz, ambiguous=False)
+        # 0 2015-11-01 01:23:24-05:00
+        # dtype: datetime64[ns, America/New_York]
+        # >>>
+        # >>> ts.dt.tz_localize(tz, ambiguous=True)
+        # 0 2015-11-01 01:23:24-04:00
+        # dtype: datetime64[ns, America/New_York]
+        # >>>
+        # >>> str(tz.localize(t))
+        # '2015-11-01 01:23:24-05:00'
--- End diff --
I added a comment to explain this.
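The sketch below is a minimal, runnable version of the behavior the comment describes. It assumes a reasonably recent pandas. It also passes the timezone as the string `"America/New_York"` instead of a `pytz` object, which is a choice made for this sketch, so `pytz` is not imported.

On 2015-11-01 the wall-clock time 01:23:24 occurs twice in New York. As a result:
- The default `ambiguous='raise'` fails.
- `ambiguous=False` picks standard time (UTC-5). This matches `pytz`'s `localize`, whose `is_dst` argument defaults to `False`.
- `ambiguous=True` picks daylight saving time (UTC-4).

```python
import datetime

import pandas as pd

# On 2015-11-01 clocks in America/New_York fall back from EDT (UTC-4) to
# EST (UTC-5), so the wall-clock hour 01:00-02:00 occurs twice.
t = datetime.datetime(2015, 11, 1, 1, 23, 24)
ts = pd.Series([t])
tz = "America/New_York"

# Default ambiguous='raise': pandas refuses to pick one of the two instants.
# The exception class differs across pandas versions, so catch broadly.
try:
    ts.dt.tz_localize(tz)
    raised = False
except Exception:
    raised = True

# ambiguous=False: treat the wall-clock time as standard time (EST).
est = ts.dt.tz_localize(tz, ambiguous=False)
# ambiguous=True: treat it as daylight saving time (EDT).
edt = ts.dt.tz_localize(tz, ambiguous=True)

one_hour = datetime.timedelta(hours=1)
print(raised)                           # True
print(est[0].utcoffset() / one_hour)    # -5.0
print(edt[0].utcoffset() / one_hour)    # -4.0
```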
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20537
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20537
**[Test build #87264 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87264/testReport)** for PR 20537 at commit [`0357a2b`](https://github.com/apache/spark/commit/0357a2b14d2590e44a7cb1ce5327448f191cc801).
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20537
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87222/
Test FAILed.
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20537
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20537
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/754/
Test PASSed.
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/20537
Thanks everyone for the review!
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20537
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/20537
This is a pretty important bug fix that we should try to get into Spark 2.3. Thanks @felixcheung for reporting this!
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20537
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87221/
Test FAILed.
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20537
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/720/
Test PASSed.
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20537
Merged build finished. Test FAILed.
---
[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20537
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87290/
Test PASSed.
---
[GitHub] spark pull request #20537: [SPARK-23314][PYTHON] Add ambiguous=False when lo...
Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/20537#discussion_r166975718
--- Diff: python/pyspark/sql/types.py ---
@@ -1730,7 +1730,28 @@ def _check_series_convert_timestamps_internal(s, timezone):
     # TODO: handle nested timestamps, such as ArrayType(TimestampType())?
     if is_datetime64_dtype(s.dtype):
         tz = timezone or 'tzlocal()'
-        return s.dt.tz_localize(tz).dt.tz_convert('UTC')
+        """
+        tz_localize with ambiguous=False has the same behavior of pytz.localize
+        >>> import datetime
+        >>> import pandas as pd
+        >>> import pytz
+        >>>
+        >>> t = datetime.datetime(2015, 11, 1, 1, 23, 24)
+        >>> ts = pd.Series([t])
+        >>> tz = pytz.timezone('America/New_York')
+        >>>
+        >>> ts.dt.tz_localize(tz, ambiguous=False)
+        >>> 0 2015-11-01 01:23:24-05:00
+        >>> dtype: datetime64[ns, America/New_York]
+        >>>
+        >>> ts.dt.tz_localize(tz, ambiguous=True)
+        >>> 0 2015-11-01 01:23:24-04:00
+        >>> dtype: datetime64[ns, America/New_York]
+        >>>
+        >>> str(tz.localize(t))
+        >>> '2015-11-01 01:23:24-05:00'
+        """
+        return s.dt.tz_localize(tz, ambiguous=False).dt.tz_convert('UTC')
--- End diff --
Yes, I will create a new one for `pandas_udf`.
It seems `ambiguous=False` is undocumented in the method docs. @jreback, can you please confirm this usage is correct?
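The changed line can be exercised on its own, as in the sketch below. It assumes a reasonably recent pandas. The wrapper name `localize_naive_to_utc` is hypothetical, not a Spark or pandas API. The wrapper applies the new expression from this revision to a Series with two values:
- An ambiguous fall-back timestamp, which is resolved to standard time before conversion to UTC.
- An ordinary summer timestamp, which is converted with its usual EDT offset.

```python
import datetime

import pandas as pd


def localize_naive_to_utc(s, timezone):
    """Hypothetical stand-in for the changed line in
    _check_series_convert_timestamps_internal: interpret tz-naive values as
    wall-clock times in `timezone`, then convert them to UTC.

    ambiguous=False resolves repeated (fall-back) wall-clock times to
    standard time instead of raising, as the default ambiguous='raise' would.
    """
    return s.dt.tz_localize(timezone, ambiguous=False).dt.tz_convert("UTC")


s = pd.Series([
    datetime.datetime(2015, 11, 1, 1, 23, 24),  # ambiguous: occurs twice
    datetime.datetime(2015, 7, 1, 12, 0, 0),    # unambiguous: EDT (UTC-4)
])
utc = localize_naive_to_utc(s, "America/New_York")
print(utc.dt.strftime("%Y-%m-%d %H:%M:%S").tolist())
# ['2015-11-01 06:23:24', '2015-07-01 16:00:00']
```

For values that are not ambiguous, `ambiguous=False` has no effect. Only the repeated hour is affected.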
---