Posted to reviews@spark.apache.org by ueshin <gi...@git.apache.org> on 2018/02/09 13:51:58 UTC
[GitHub] spark pull request #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezon...
GitHub user ueshin opened a pull request:
https://github.com/apache/spark/pull/20559
[WIP][SPARK-23360][SQL][PYTHON] Get local timezone from environment via pytz, or dateutil.
## What changes were proposed in this pull request?
Currently we use `tzlocal()` to get the Python local timezone, but it sometimes causes unexpected behavior, such as incorrect handling of daylight saving time (DST).
I changed how the Python local timezone is obtained: if the timezone is specified in the `TZ` environment variable, it is used via pytz; otherwise, the system timezone file is read via dateutil.
## How was this patch tested?
Added a regression test and modified some existing tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ueshin/apache-spark issues/SPARK-23360/master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20559.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20559
----
commit e87bd7639e546e24be1bf7a781ccd1571ad71964
Author: Takuya UESHIN <ue...@...>
Date: 2018-02-09T13:43:44Z
Get local timezone from environment via pytz, or dateutil.
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20559
**[Test build #87286 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87286/testReport)** for PR 20559 at commit [`a082e8c`](https://github.com/apache/spark/commit/a082e8c66265906ed54dbc4594ab1d534ca5c4c4).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezon...
Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/20559#discussion_r167318107
--- Diff: python/pyspark/sql/tests.py ---
@@ -4124,7 +4126,7 @@ def test_vectorized_udf_timestamps(self):
data = [(0, datetime(1969, 1, 1, 1, 1, 1)),
(1, datetime(2012, 2, 2, 2, 2, 2)),
(2, None),
- (3, datetime(2100, 3, 3, 3, 3, 3))]
+ (3, datetime(2100, 4, 4, 4, 4, 4))]
--- End diff --
Just wondering if changing these values made a difference somewhere?
---
[GitHub] spark issue #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20559
Merged build finished. Test FAILed.
---
[GitHub] spark issue #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20559
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87278/
---
[GitHub] spark issue #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from ...
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/20559
@BryanCutler It seems that pandas handles `tzlocal()` differently from other timezones, and I suspect it might handle DST incorrectly.
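Why DST matters here: the same wall-clock time maps to a different UTC offset depending on whether DST is in effect, and 2015-10-31 22:30 (the timestamp used in this PR's regression test) falls only hours before US DST ended at 02:00 on 2015-11-01. The stdlib-only sketch below does not use pandas. The helper name `utc_epoch_of_local` is hypothetical, it assumes a Unix platform because `time.tzset()` is Unix-only, and it uses a POSIX TZ rule string for US Pacific time so no tz database is needed. It shows the offset change that a correct conversion has to reproduce.

```python
import calendar
import os
import time

# POSIX rule for US Pacific time: UTC-8 standard, DST from the 2nd Sunday of
# March to the 1st Sunday of November (transitions at 02:00 local by default).
US_PACIFIC = 'PST8PDT,M3.2.0,M11.1.0'


def utc_epoch_of_local(wall_clock, tz):
    """Interpret a naive (Y, M, D, h, m, s) tuple as local time in `tz` and
    return the corresponding UTC epoch seconds. Restores TZ afterwards."""
    saved = os.environ.get('TZ')
    os.environ['TZ'] = tz
    time.tzset()
    try:
        # weekday/yearday are ignored by mktime; tm_isdst=-1 lets the C library
        # decide from the rules whether DST applies at this wall-clock time.
        return int(time.mktime(tuple(wall_clock) + (0, 0, -1)))
    finally:
        if saved is None:
            os.environ.pop('TZ', None)
        else:
            os.environ['TZ'] = saved
        time.tzset()


if __name__ == '__main__':
    before = utc_epoch_of_local((2015, 10, 31, 22, 30, 0), US_PACIFIC)  # PDT, UTC-7
    after = utc_epoch_of_local((2015, 11, 1, 12, 0, 0), US_PACIFIC)     # PST, UTC-8
    print(time.gmtime(before), time.gmtime(after))
```

A timezone implementation that ignores or misapplies the DST rule would shift one of these two instants by an hour.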
---
[GitHub] spark issue #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20559
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20559
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezon...
Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/20559#discussion_r167246316
--- Diff: python/pyspark/sql/types.py ---
@@ -1709,6 +1709,15 @@ def _check_dataframe_convert_date(pdf, schema):
return pdf
+def _get_local_timezone():
+ """ Get local timezone from environment vi pytz, or dateutil. """
+ from pyspark.sql.utils import require_minimum_pandas_version
+ require_minimum_pandas_version()
+
+ import os
+ return os.environ.get('TZ', 'dateutil/:')
--- End diff --
I don't really understand how "dateutil/:" works. Could you add some comments explaining it?
---
[GitHub] spark issue #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20559
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20559
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20559
retest this please
---
[GitHub] spark issue #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20559
**[Test build #87278 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87278/testReport)** for PR 20559 at commit [`e20e9fd`](https://github.com/apache/spark/commit/e20e9fdba7fd2a4db059fe8016f0a0a60f3dd71d).
---
[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20559
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/773/
---
[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20559
Merged to master and branch-2.3.
---
[GitHub] spark pull request #20559: [SPARK-23360][SQL][PYTHON] Get local timezone fro...
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20559#discussion_r167394718
--- Diff: python/pyspark/sql/types.py ---
@@ -1766,15 +1781,13 @@ def _check_series_convert_timestamps_localize(s, from_timezone, to_timezone):
import pandas as pd
from pandas.api.types import is_datetime64tz_dtype, is_datetime64_dtype
- from_tz = from_timezone or 'tzlocal()'
- to_tz = to_timezone or 'tzlocal()'
+ from_tz = from_timezone or _get_local_timezone()
+ to_tz = to_timezone or _get_local_timezone()
# TODO: handle nested timestamps, such as ArrayType(TimestampType())?
if is_datetime64tz_dtype(s.dtype):
return s.dt.tz_convert(to_tz).dt.tz_localize(None)
elif is_datetime64_dtype(s.dtype) and from_tz != to_tz:
- # `s.dt.tz_localize('tzlocal()')` doesn't work properly when including NaT.
- return s.apply(lambda ts: ts.tz_localize(from_tz).tz_convert(to_tz).tz_localize(None)
- if ts is not pd.NaT else pd.NaT)
+ return s.dt.tz_localize(from_tz).dt.tz_convert(to_tz).dt.tz_localize(None)
--- End diff --
Good catch! I'll revert this.
---
[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20559
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87286/
---
[GitHub] spark issue #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20559
**[Test build #87278 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87278/testReport)** for PR 20559 at commit [`e20e9fd`](https://github.com/apache/spark/commit/e20e9fdba7fd2a4db059fe8016f0a0a60f3dd71d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #20559: [SPARK-23360][SQL][PYTHON] Get local timezone fro...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20559#discussion_r167391409
--- Diff: python/pyspark/sql/types.py ---
@@ -1766,15 +1781,13 @@ def _check_series_convert_timestamps_localize(s, from_timezone, to_timezone):
import pandas as pd
from pandas.api.types import is_datetime64tz_dtype, is_datetime64_dtype
- from_tz = from_timezone or 'tzlocal()'
- to_tz = to_timezone or 'tzlocal()'
+ from_tz = from_timezone or _get_local_timezone()
+ to_tz = to_timezone or _get_local_timezone()
# TODO: handle nested timestamps, such as ArrayType(TimestampType())?
if is_datetime64tz_dtype(s.dtype):
return s.dt.tz_convert(to_tz).dt.tz_localize(None)
elif is_datetime64_dtype(s.dtype) and from_tz != to_tz:
- # `s.dt.tz_localize('tzlocal()')` doesn't work properly when including NaT.
- return s.apply(lambda ts: ts.tz_localize(from_tz).tz_convert(to_tz).tz_localize(None)
- if ts is not pd.NaT else pd.NaT)
+ return s.dt.tz_localize(from_tz).dt.tz_convert(to_tz).dt.tz_localize(None)
--- End diff --
@ueshin, is it safe to remove `if ts is not pd.NaT else pd.NaT`? It seems there is still a small chance that NaT causes problems with `tzlocal()`:
https://github.com/pandas-dev/pandas/blob/0.19.x/pandas/tslib.pyx#L1760
https://github.com/pandas-dev/pandas/blob/0.19.x/pandas/tslib.pyx#L54
https://github.com/dateutil/dateutil/blob/2.6.1/dateutil/tz/tz.py#L1362
https://github.com/dateutil/dateutil/blob/2.6.1/dateutil/tz/tz.py#L1408
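The guard in question makes the per-element conversion skip missing values before any timezone code runs. As a stdlib-only analogue (not pandas: `None` stands in for `pd.NaT`, fixed-offset `datetime.timezone` objects stand in for real zones, and the helper names are hypothetical), the guarded localize, convert, and drop-tzinfo chain looks like this:

```python
from datetime import datetime, timedelta, timezone


def convert_naive(ts, from_tz, to_tz):
    """Treat naive `ts` as wall-clock time in `from_tz`, convert to `to_tz`,
    then drop tzinfo again. Missing values pass through untouched, mirroring
    `... if ts is not pd.NaT else pd.NaT` in the original code."""
    if ts is None:
        return None
    # replace(tzinfo=...) ~ tz_localize, astimezone ~ tz_convert,
    # replace(tzinfo=None) ~ tz_localize(None)
    return ts.replace(tzinfo=from_tz).astimezone(to_tz).replace(tzinfo=None)


def convert_series(values, from_tz, to_tz):
    """Element-wise conversion, analogous to `s.apply(lambda ts: ...)`."""
    return [convert_naive(ts, from_tz, to_tz) for ts in values]


if __name__ == '__main__':
    pdt = timezone(timedelta(hours=-7))  # fixed UTC-7 offset (US Pacific daylight time)
    print(convert_series([datetime(2015, 10, 31, 22, 30), None], pdt, timezone.utc))
```

The vectorized `s.dt.tz_localize(...)` form avoids a Python-level loop, while the explicit guard keeps missing-value handling independent of how a particular timezone object treats NaT.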
---
[GitHub] spark issue #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from ...
Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/20559
Thanks @ueshin for the quick patch!
---
[GitHub] spark issue #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from ...
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/20559
cc @icexelloss @BryanCutler @felixcheung @HyukjinKwon @cloud-fan
---
[GitHub] spark issue #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20559
**[Test build #87260 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87260/testReport)** for PR 20559 at commit [`e87bd76`](https://github.com/apache/spark/commit/e87bd7639e546e24be1bf7a781ccd1571ad71964).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20559
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/772/
---
[GitHub] spark pull request #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezon...
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20559#discussion_r167390815
--- Diff: python/pyspark/sql/types.py ---
@@ -1709,6 +1709,15 @@ def _check_dataframe_convert_date(pdf, schema):
return pdf
+def _get_local_timezone():
+ """ Get local timezone from environment vi pytz, or dateutil. """
--- End diff --
I modified the docstring and added a comment. Is it clear now?
---
[GitHub] spark issue #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20559
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20559
**[Test build #87287 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87287/testReport)** for PR 20559 at commit [`a082e8c`](https://github.com/apache/spark/commit/a082e8c66265906ed54dbc4594ab1d534ca5c4c4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezon...
Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/20559#discussion_r167322999
--- Diff: python/pyspark/sql/types.py ---
@@ -1709,6 +1709,15 @@ def _check_dataframe_convert_date(pdf, schema):
return pdf
+def _get_local_timezone():
+ """ Get local timezone from environment vi pytz, or dateutil. """
--- End diff --
Did you mean "from environment via pytz"? The 'TZ' environment variable is read by the `datetime` module; does pytz do anything with it?
---
[GitHub] spark issue #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20559
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/751/
---
[GitHub] spark pull request #20559: [SPARK-23360][SQL][PYTHON] Get local timezone fro...
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20559#discussion_r167394720
--- Diff: python/pyspark/sql/tests.py ---
@@ -2867,6 +2867,35 @@ def test_create_dataframe_required_pandas_not_found(self):
"d": [pd.Timestamp.now().date()]})
self.spark.createDataFrame(pdf)
+ # Regression test for SPARK-23360
+ @unittest.skipIf(not _have_pandas, _pandas_requirement_message)
+ def test_create_dateframe_from_pandas_with_dst(self):
+ import pandas as pd
+ from datetime import datetime
+
+ pdf = pd.DataFrame({'time': [datetime(2015, 10, 31, 22, 30)]})
+
+ df = self.spark.createDataFrame(pdf)
+ self.assertPandasEqual(pdf, df.toPandas())
+
+ orig_env_tz = os.environ.get('TZ', None)
+ orig_session_tz = self.spark.conf.get('spark.sql.session.timeZone')
+ try:
+ tz = 'America/Los_Angeles'
+ os.environ['TZ'] = tz
+ time.tzset()
+ self.spark.conf.set('spark.sql.session.timeZone', tz)
+
+ df = self.spark.createDataFrame(pdf)
+ df.show()
--- End diff --
Oops, I should've removed it. Thanks!
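The regression test saves `orig_env_tz` and `orig_session_tz` before switching `TZ` and calling `time.tzset()`, which suggests both are restored afterwards (that part of the test is not quoted in the diff). A minimal sketch of the `TZ` half of that save/restore pattern as a context manager follows; the name `local_timezone` is hypothetical, it assumes a Unix platform for `time.tzset()`, and restoring `spark.sql.session.timeZone` would be analogous but is not shown.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def local_timezone(tz):
    """Temporarily set the process timezone via the TZ environment variable.

    Saves the previous TZ value (or its absence), calls time.tzset() so the
    `time` module picks up the change, and restores everything on exit, even
    if the body raises.
    """
    orig_env_tz = os.environ.get('TZ', None)
    os.environ['TZ'] = tz
    time.tzset()
    try:
        yield
    finally:
        if orig_env_tz is None:
            os.environ.pop('TZ', None)
        else:
            os.environ['TZ'] = orig_env_tz
        time.tzset()
```

Usage: `with local_timezone('America/Los_Angeles'): ...` wraps the body of the test, so a failing assertion cannot leak the modified timezone into later tests.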
---
[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20559
**[Test build #87286 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87286/testReport)** for PR 20559 at commit [`a082e8c`](https://github.com/apache/spark/commit/a082e8c66265906ed54dbc4594ab1d534ca5c4c4).
---
[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20559
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87287/
---
[GitHub] spark pull request #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezon...
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20559#discussion_r167389710
--- Diff: python/pyspark/sql/tests.py ---
@@ -4124,7 +4126,7 @@ def test_vectorized_udf_timestamps(self):
data = [(0, datetime(1969, 1, 1, 1, 1, 1)),
(1, datetime(2012, 2, 2, 2, 2, 2)),
(2, None),
- (3, datetime(2100, 3, 3, 3, 3, 3))]
+ (3, datetime(2100, 4, 4, 4, 4, 4))]
--- End diff --
I'll revert it.
---
[GitHub] spark issue #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20559
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/764/
---
[GitHub] spark pull request #20559: [SPARK-23360][SQL][PYTHON] Get local timezone fro...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20559#discussion_r167391496
--- Diff: python/pyspark/sql/tests.py ---
@@ -2867,6 +2867,35 @@ def test_create_dataframe_required_pandas_not_found(self):
"d": [pd.Timestamp.now().date()]})
self.spark.createDataFrame(pdf)
+ # Regression test for SPARK-23360
+ @unittest.skipIf(not _have_pandas, _pandas_requirement_message)
+ def test_create_dateframe_from_pandas_with_dst(self):
+ import pandas as pd
+ from datetime import datetime
+
+ pdf = pd.DataFrame({'time': [datetime(2015, 10, 31, 22, 30)]})
+
+ df = self.spark.createDataFrame(pdf)
+ self.assertPandasEqual(pdf, df.toPandas())
+
+ orig_env_tz = os.environ.get('TZ', None)
+ orig_session_tz = self.spark.conf.get('spark.sql.session.timeZone')
+ try:
+ tz = 'America/Los_Angeles'
+ os.environ['TZ'] = tz
+ time.tzset()
+ self.spark.conf.set('spark.sql.session.timeZone', tz)
+
+ df = self.spark.createDataFrame(pdf)
+ df.show()
--- End diff --
Gentle reminder about this. I guess it's there for debugging purposes? :)
---
[GitHub] spark pull request #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezon...
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20559#discussion_r167389709
--- Diff: python/pyspark/sql/types.py ---
@@ -1709,6 +1709,15 @@ def _check_dataframe_convert_date(pdf, schema):
return pdf
+def _get_local_timezone():
+ """ Get local timezone from environment vi pytz, or dateutil. """
+ from pyspark.sql.utils import require_minimum_pandas_version
+ require_minimum_pandas_version()
+
+ import os
+ return os.environ.get('TZ', 'dateutil/:')
--- End diff --
Sure, I'll add some comments.
---
[GitHub] spark pull request #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezon...
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20559#discussion_r167389708
--- Diff: python/pyspark/sql/types.py ---
@@ -1709,6 +1709,15 @@ def _check_dataframe_convert_date(pdf, schema):
return pdf
+def _get_local_timezone():
+ """ Get local timezone from environment vi pytz, or dateutil. """
+ from pyspark.sql.utils import require_minimum_pandas_version
+ require_minimum_pandas_version()
--- End diff --
Actually, it isn't needed here. I'll remove it. Thanks.
---
[GitHub] spark pull request #20559: [SPARK-23360][SQL][PYTHON] Get local timezone fro...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/20559
---
[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20559
Merged build finished. Test FAILed.
---
[GitHub] spark issue #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20559
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87260/
---
[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20559
**[Test build #87287 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87287/testReport)** for PR 20559 at commit [`a082e8c`](https://github.com/apache/spark/commit/a082e8c66265906ed54dbc4594ab1d534ca5c4c4).
---
[GitHub] spark pull request #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezon...
Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/20559#discussion_r167246084
--- Diff: python/pyspark/sql/types.py ---
@@ -1709,6 +1709,15 @@ def _check_dataframe_convert_date(pdf, schema):
return pdf
+def _get_local_timezone():
+ """ Get local timezone from environment vi pytz, or dateutil. """
+ from pyspark.sql.utils import require_minimum_pandas_version
+ require_minimum_pandas_version()
--- End diff --
Why do we need this here?
---
[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20559
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20559
**[Test build #87260 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87260/testReport)** for PR 20559 at commit [`e87bd76`](https://github.com/apache/spark/commit/e87bd7639e546e24be1bf7a781ccd1571ad71964).
---