Posted to reviews@spark.apache.org by icexelloss <gi...@git.apache.org> on 2018/02/07 22:49:03 UTC

[GitHub] spark pull request #20537: [SPARK-23314][PYTHON] Add ambiguous=False when lo...

GitHub user icexelloss opened a pull request:

    https://github.com/apache/spark/pull/20537

    [SPARK-23314][PYTHON] Add ambiguous=False when localizing tz-naive timestamps to deal with dst

    ## What changes were proposed in this pull request?
    When localizing a tz-naive timestamp with `tz_localize`, pandas throws an exception if the timestamp is ambiguous because it falls in the repeated hour at the end of daylight saving time (for example, 2015-11-01 01:30:00 in America/New_York). This PR fixes the issue by setting `ambiguous=False` when calling `tz_localize`, which matches the default behavior of pytz.
    
    ## How was this patch tested?
    Add `test_timestamp_dst`
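
A minimal sketch of the failure and the fix described above, assuming a recent pandas and the `America/New_York` zone. `localize_to_utc` is a hypothetical helper that mirrors the patched line; it is not the function in `types.py`. The exception class raised for an ambiguous time depends on the pandas version, so the sketch catches `Exception`.

```python
import datetime

import pandas as pd

# 01:30 on 2015-11-01 happens twice in New York: clocks fall back
# from 02:00 EDT to 01:00 EST, so the naive wall time is ambiguous.
naive = pd.Series([datetime.datetime(2015, 11, 1, 1, 30, 0)])


def localize_to_utc(s, tz, ambiguous="raise"):
    """Hypothetical helper: localize a tz-naive series, then convert to UTC."""
    return s.dt.tz_localize(tz, ambiguous=ambiguous).dt.tz_convert("UTC")


# Default behavior (ambiguous="raise"): pandas refuses to guess.
try:
    localize_to_utc(naive, "America/New_York")
    default_raises = False
except Exception:  # exact class varies across pandas versions
    default_raises = True

# ambiguous=False picks the non-DST (standard time) interpretation.
fixed = localize_to_utc(naive, "America/New_York", ambiguous=False)
```

With `ambiguous=False` the timestamp is read as 01:30 EST (UTC-5), so the UTC value is 06:30 on the same date.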


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/icexelloss/spark SPARK-23314

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20537.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20537
    
----
commit 6435feffdc056a8744848e367a585d32e8734b5f
Author: Li Jin <ic...@...>
Date:   2018-02-07T22:38:19Z

    Add ambiguous=False when localizing tz-naive timestamps to deal with dst

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    **[Test build #87290 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87290/testReport)** for PR 20537 at commit [`23abfb0`](https://github.com/apache/spark/commit/23abfb0e01f98dc4bfbd3fb9f04e487ec9af052c).


---



[GitHub] spark pull request #20537: [SPARK-23314][PYTHON] Add ambiguous=False when lo...

Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20537#discussion_r166974650
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1730,7 +1730,28 @@ def _check_series_convert_timestamps_internal(s, timezone):
         # TODO: handle nested timestamps, such as ArrayType(TimestampType())?
         if is_datetime64_dtype(s.dtype):
             tz = timezone or 'tzlocal()'
    -        return s.dt.tz_localize(tz).dt.tz_convert('UTC')
    +        """
    +        tz_localize with ambiguous=False has the same behavior of pytz.localize
    +        >>> import datetime
    +        >>> import pandas as pd
    +        >>> import pytz
    +        >>>
    +        >>> t = datetime.datetime(2015, 11, 1, 1, 23, 24)
    +        >>> ts = pd.Series([t])
    +        >>> tz = pytz.timezone('America/New_York')
    +        >>>
    +        >>> ts.dt.tz_localize(tz, ambiguous=False)
    +        >>> 0   2015-11-01 01:23:24-05:00
    +        >>> dtype: datetime64[ns, America/New_York]
    +        >>>
    +        >>> ts.dt.tz_localize(tz, ambiguous=True)
    +        >>> 0   2015-11-01 01:23:24-04:00
    +        >>> dtype: datetime64[ns, America/New_York]
    +        >>>
    +        >>> str(tz.localize(t))
    +        >>> '2015-11-01 01:23:24-05:00'
    --- End diff --
    
    Yeah, let me clean up the format.
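
The two offsets in the diff above (-04:00 and -05:00) are the two possible readings of the same wall-clock time. A stdlib-only sketch of that ambiguity, assuming Python 3.9+ with `zoneinfo` and an available time zone database; it uses neither pandas nor pytz and is not part of the patch.

```python
import datetime
from zoneinfo import ZoneInfo

tz = ZoneInfo("America/New_York")
naive = datetime.datetime(2015, 11, 1, 1, 23, 24)

# fold=0 selects the first occurrence of the repeated hour (still DST),
# fold=1 selects the second occurrence (standard time).
first = naive.replace(tzinfo=tz, fold=0)
second = naive.replace(tzinfo=tz, fold=1)

first_offset = first.utcoffset()
second_offset = second.utcoffset()
```

The standard-time reading (`fold=1` here) corresponds to what `ambiguous=False` selects in the diff.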


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    **[Test build #87223 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87223/testReport)** for PR 20537 at commit [`2c1a258`](https://github.com/apache/spark/commit/2c1a2582c04a5b9cb7d011892343ca0a07ddb854).


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    @HyukjinKwon no worries. Rebased.


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    thanks!


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    cc @felixcheung @BryanCutler @ueshin @HyukjinKwon 


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/721/
    Test PASSed.


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/776/
    Test PASSed.


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    **[Test build #87178 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87178/testReport)** for PR 20537 at commit [`6435fef`](https://github.com/apache/spark/commit/6435feffdc056a8744848e367a585d32e8734b5f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    **[Test build #87222 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87222/testReport)** for PR 20537 at commit [`f6b5d28`](https://github.com/apache/spark/commit/f6b5d2868c3ca7c8c2cc2bfb6e7a06ce7c01998c).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20537: [SPARK-23314][PYTHON] Add ambiguous=False when lo...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20537#discussion_r166810415
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1730,7 +1730,28 @@ def _check_series_convert_timestamps_internal(s, timezone):
         # TODO: handle nested timestamps, such as ArrayType(TimestampType())?
         if is_datetime64_dtype(s.dtype):
             tz = timezone or 'tzlocal()'
    -        return s.dt.tz_localize(tz).dt.tz_convert('UTC')
    +        """
    +        tz_localize with ambiguous=False has the same behavior of pytz.localize
    --- End diff --
    
    I'm not sure we want this doctest.


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Sorry, @icexelloss. Mind resolving the conflict?


---



[GitHub] spark pull request #20537: [SPARK-23314][PYTHON] Add ambiguous=False when lo...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20537#discussion_r166811270
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1730,7 +1730,28 @@ def _check_series_convert_timestamps_internal(s, timezone):
         # TODO: handle nested timestamps, such as ArrayType(TimestampType())?
         if is_datetime64_dtype(s.dtype):
             tz = timezone or 'tzlocal()'
    -        return s.dt.tz_localize(tz).dt.tz_convert('UTC')
    +        """
    +        tz_localize with ambiguous=False has the same behavior of pytz.localize
    +        >>> import datetime
    +        >>> import pandas as pd
    +        >>> import pytz
    +        >>>
    +        >>> t = datetime.datetime(2015, 11, 1, 1, 23, 24)
    +        >>> ts = pd.Series([t])
    +        >>> tz = pytz.timezone('America/New_York')
    +        >>>
    +        >>> ts.dt.tz_localize(tz, ambiguous=False)
    +        >>> 0   2015-11-01 01:23:24-05:00
    +        >>> dtype: datetime64[ns, America/New_York]
    +        >>>
    +        >>> ts.dt.tz_localize(tz, ambiguous=True)
    +        >>> 0   2015-11-01 01:23:24-04:00
    +        >>> dtype: datetime64[ns, America/New_York]
    +        >>>
    +        >>> str(tz.localize(t))
    +        >>> '2015-11-01 01:23:24-05:00'
    +        """
    +        return s.dt.tz_localize(tz, ambiguous=False).dt.tz_convert('UTC')
    --- End diff --
    
    I think for a `pd.Series`, `ambiguous` takes an ndarray. Can you also add a `pandas_udf` test case?
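
A short sketch of the two forms of `ambiguous` raised in this comment, assuming a recent pandas and NumPy. The boolean-array form sets the DST flag per element; the scalar form used in the diff appears to be applied to every element. Variable names here are illustrative only.

```python
import numpy as np
import pandas as pd

# Two copies of the same ambiguous wall-clock time.
s = pd.Series(pd.to_datetime(["2015-11-01 01:23:24", "2015-11-01 01:23:24"]))

# Boolean array: one flag per element (True = DST, False = standard time).
per_element = s.dt.tz_localize(
    "America/New_York", ambiguous=np.array([True, False])
)

# Scalar bool, as in the diff: the same choice for the whole series.
scalar = s.dt.tz_localize("America/New_York", ambiguous=False)

per_element_hours = [ts.utcoffset().total_seconds() / 3600 for ts in per_element]
scalar_hours = [ts.utcoffset().total_seconds() / 3600 for ts in scalar]
```

A per-element array is only useful when the caller knows which occurrence each row refers to; a conversion layer that has no such information can only apply one fixed choice.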


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/719/
    Test PASSed.


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/725/
    Test PASSed.


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    **[Test build #87290 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87290/testReport)** for PR 20537 at commit [`23abfb0`](https://github.com/apache/spark/commit/23abfb0e01f98dc4bfbd3fb9f04e487ec9af052c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87264/
    Test PASSed.


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/679/
    Test PASSed.


---



[GitHub] spark pull request #20537: [SPARK-23314][PYTHON] Add ambiguous=False when lo...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20537#discussion_r167133597
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1744,8 +1744,27 @@ def _check_series_convert_timestamps_internal(s, timezone):
         from pandas.api.types import is_datetime64_dtype, is_datetime64tz_dtype
         # TODO: handle nested timestamps, such as ArrayType(TimestampType())?
         if is_datetime64_dtype(s.dtype):
    +        # tz_localize with ambiguous=False has the same behavior of pytz.localize
    +        # >>> import datetime
    +        # >>> import pandas as pd
    +        # >>> import pytz
    +        # >>>
    +        # >>> t = datetime.datetime(2015, 11, 1, 1, 23, 24)
    +        # >>> ts = pd.Series([t])
    +        # >>> tz = pytz.timezone('America/New_York')
    +        # >>>
    +        # >>> ts.dt.tz_localize(tz, ambiguous=False)
    +        # 0   2015-11-01 01:23:24-05:00
    +        # dtype: datetime64[ns, America/New_York]
    +        # >>>
    +        # >>> ts.dt.tz_localize(tz, ambiguous=True)
    +        # 0   2015-11-01 01:23:24-04:00
    +        # dtype: datetime64[ns, America/New_York]
    +        # >>>
    +        # >>> str(tz.localize(t))
    +        # '2015-11-01 01:23:24-05:00'
    --- End diff --
    
    @icexelloss, I get that this is good to know, but shall we describe it in prose? This comment is formatted like a doctest, but it is actually just a comment.
    
    It would be nicer to have a plain explanation in the comments rather than doctest formatting.


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Merged to master and branch-2.3.


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    **[Test build #87223 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87223/testReport)** for PR 20537 at commit [`2c1a258`](https://github.com/apache/spark/commit/2c1a2582c04a5b9cb7d011892343ca0a07ddb854).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20537: [SPARK-23314][PYTHON] Add ambiguous=False when lo...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20537#discussion_r166826644
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1730,7 +1730,28 @@ def _check_series_convert_timestamps_internal(s, timezone):
         # TODO: handle nested timestamps, such as ArrayType(TimestampType())?
         if is_datetime64_dtype(s.dtype):
             tz = timezone or 'tzlocal()'
    -        return s.dt.tz_localize(tz).dt.tz_convert('UTC')
    +        """
    +        tz_localize with ambiguous=False has the same behavior of pytz.localize
    +        >>> import datetime
    +        >>> import pandas as pd
    +        >>> import pytz
    +        >>>
    +        >>> t = datetime.datetime(2015, 11, 1, 1, 23, 24)
    +        >>> ts = pd.Series([t])
    +        >>> tz = pytz.timezone('America/New_York')
    +        >>>
    +        >>> ts.dt.tz_localize(tz, ambiguous=False)
    +        >>> 0   2015-11-01 01:23:24-05:00
    +        >>> dtype: datetime64[ns, America/New_York]
    +        >>>
    +        >>> ts.dt.tz_localize(tz, ambiguous=True)
    +        >>> 0   2015-11-01 01:23:24-04:00
    +        >>> dtype: datetime64[ns, America/New_York]
    +        >>>
    +        >>> str(tz.localize(t))
    +        >>> '2015-11-01 01:23:24-05:00'
    --- End diff --
    
    Hm .. this one seems a bit weird. Shouldn't it be `... '2015-11-01 01:23:24-05:00'`?
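
For context on the formatting point: in doctest format only source lines carry the `>>>` prompt, and expected output follows without a prefix. A stdlib-only sketch using the `doctest` module; the snippet strings and the `run_snippet` helper are made up for illustration and do not come from the patch.

```python
import doctest

# Expected output is written without a prompt.
WELL_FORMED = """
>>> import datetime
>>> est = datetime.timezone(datetime.timedelta(hours=-5))
>>> str(datetime.datetime(2015, 11, 1, 1, 23, 24, tzinfo=est))
'2015-11-01 01:23:24-05:00'
"""

# Output prefixed with '>>>' is parsed as another statement, so the
# preceding example has no expected output to match against.
MALFORMED = """
>>> stamp = '2015-11-01 01:23:24-05:00'
>>> stamp
>>> '2015-11-01 01:23:24-05:00'
"""


def run_snippet(text, name):
    """Hypothetical helper: parse and run one doctest-formatted string."""
    test = doctest.DocTestParser().get_doctest(text, {}, name, "<sketch>", 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Swallow the failure report; only the counts are of interest here.
    return runner.run(test, out=lambda report: None)


good = run_snippet(WELL_FORMED, "well_formed")
bad = run_snippet(MALFORMED, "malformed")
```

The well-formed snippet runs without failures, while the malformed one reports failures because the echoed values have no matching expected output.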


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    **[Test build #87178 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87178/testReport)** for PR 20537 at commit [`6435fef`](https://github.com/apache/spark/commit/6435feffdc056a8744848e367a585d32e8734b5f).


---



[GitHub] spark pull request #20537: [SPARK-23314][PYTHON] Add ambiguous=False when lo...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20537


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    **[Test build #87221 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87221/testReport)** for PR 20537 at commit [`304666a`](https://github.com/apache/spark/commit/304666ad089d497d666de25476955da52aae5395).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    **[Test build #87222 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87222/testReport)** for PR 20537 at commit [`f6b5d28`](https://github.com/apache/spark/commit/f6b5d2868c3ca7c8c2cc2bfb6e7a06ce7c01998c).


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    **[Test build #87221 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87221/testReport)** for PR 20537 at commit [`304666a`](https://github.com/apache/spark/commit/304666ad089d497d666de25476955da52aae5395).


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    **[Test build #87227 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87227/testReport)** for PR 20537 at commit [`94ec45e`](https://github.com/apache/spark/commit/94ec45e735aad92e019dec302811b8a5bfeb0644).


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Great, thanks.
    The fix is actually just two lines. LGTM.
    
    @hyukjinkwon could you help merge this ASAP to 2.3?



---



[GitHub] spark pull request #20537: [SPARK-23314][PYTHON] Add ambiguous=False when lo...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20537#discussion_r166826468
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -3638,6 +3638,21 @@ def test_createDataFrame_with_int_col_names(self):
             self.assertEqual(pdf_col_names, df.columns)
             self.assertEqual(pdf_col_names, df_arrow.columns)
     
    +    def test_timestamp_dst(self):
    +        """
    +        SPARK-23314: Test daylight saving time
    +        """
    --- End diff --
    
    Shall we just leave this as a comment (just to follow the majority)?


---



[GitHub] spark pull request #20537: [SPARK-23314][PYTHON] Add ambiguous=False when lo...

Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20537#discussion_r166974367
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1730,7 +1730,28 @@ def _check_series_convert_timestamps_internal(s, timezone):
         # TODO: handle nested timestamps, such as ArrayType(TimestampType())?
         if is_datetime64_dtype(s.dtype):
             tz = timezone or 'tzlocal()'
    -        return s.dt.tz_localize(tz).dt.tz_convert('UTC')
    +        """
    +        tz_localize with ambiguous=False has the same behavior of pytz.localize
    --- End diff --
    
    Oh, definitely not a doctest. Let me change it to comments.


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87223/
    Test PASSed.


---



[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    **[Test build #87227 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87227/testReport)** for PR 20537 at commit [`94ec45e`](https://github.com/apache/spark/commit/94ec45e735aad92e019dec302811b8a5bfeb0644).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    **[Test build #87264 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87264/testReport)** for PR 20537 at commit [`0357a2b`](https://github.com/apache/spark/commit/0357a2b14d2590e44a7cb1ce5327448f191cc801).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87227/




[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87178/




[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    LGTM too




[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    I think this PR is ready. I encountered another issue with the non-Arrow path and filed SPARK-23360. However, that seems to be a different bug than the one here.




[GitHub] spark pull request #20537: [SPARK-23314][PYTHON] Add ambiguous=False when lo...

Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20537#discussion_r167266079
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1744,8 +1744,27 @@ def _check_series_convert_timestamps_internal(s, timezone):
         from pandas.api.types import is_datetime64_dtype, is_datetime64tz_dtype
         # TODO: handle nested timestamps, such as ArrayType(TimestampType())?
         if is_datetime64_dtype(s.dtype):
    +        # tz_localize with ambiguous=False has the same behavior of pytz.localize
    +        # >>> import datetime
    +        # >>> import pandas as pd
    +        # >>> import pytz
    +        # >>>
    +        # >>> t = datetime.datetime(2015, 11, 1, 1, 23, 24)
    +        # >>> ts = pd.Series([t])
    +        # >>> tz = pytz.timezone('America/New_York')
    +        # >>>
    +        # >>> ts.dt.tz_localize(tz, ambiguous=False)
    +        # 0   2015-11-01 01:23:24-05:00
    +        # dtype: datetime64[ns, America/New_York]
    +        # >>>
    +        # >>> ts.dt.tz_localize(tz, ambiguous=True)
    +        # 0   2015-11-01 01:23:24-04:00
    +        # dtype: datetime64[ns, America/New_York]
    +        # >>>
    +        # >>> str(tz.localize(t))
    +        # '2015-11-01 01:23:24-05:00'
    --- End diff --
    
    I added a comment to explain this.
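
The behavior described in the comment above can be reproduced with a short script. This is a minimal sketch, not the code from the patch: `localize_to_utc` is a hypothetical helper name that mirrors the one-line change in the diff, and the exact exception type raised by the default `ambiguous='raise'` setting is assumed to vary with the pandas version, so it is caught broadly here.

```python
import datetime

import pandas as pd

# 01:23:24 on 2015-11-01 happens twice in New York: once in daylight time
# (-04:00) and once more in standard time (-05:00) after clocks fall back.
naive = pd.Series([datetime.datetime(2015, 11, 1, 1, 23, 24)])
tz = "America/New_York"

# With the default ambiguous='raise', pandas refuses to guess.
# The concrete exception class depends on the pandas version (assumption),
# so this sketch catches Exception rather than a specific type.
try:
    naive.dt.tz_localize(tz)
    raised_by_default = False
except Exception:
    raised_by_default = True

# ambiguous=False picks the non-DST (standard time) reading,
# ambiguous=True picks the DST reading.
as_standard = naive.dt.tz_localize(tz, ambiguous=False)
as_dst = naive.dt.tz_localize(tz, ambiguous=True)


def localize_to_utc(series, timezone):
    """Hypothetical helper mirroring the changed line in the diff."""
    return series.dt.tz_localize(timezone, ambiguous=False).dt.tz_convert("UTC")


utc = localize_to_utc(naive, tz)

print("default raises:", raised_by_default)
print("ambiguous=False:", as_standard.iloc[0])
print("ambiguous=True: ", as_dst.iloc[0])
print("as UTC:         ", utc.iloc[0])
```

Choosing the standard-time reading means the two wall-clock occurrences of the repeated hour both map to the later UTC instant, which matches what `pytz`'s `localize` does by default according to the comment in the diff.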




[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    **[Test build #87264 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87264/testReport)** for PR 20537 at commit [`0357a2b`](https://github.com/apache/spark/commit/0357a2b14d2590e44a7cb1ce5327448f191cc801).




[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87222/




[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/754/




[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Thanks, everyone, for the review!




[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    This is a pretty important bug fix that we should try to get into Spark 2.3. Thanks @felixcheung for reporting this!




[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87221/




[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/720/




[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #20537: [SPARK-23314][PYTHON] Add ambiguous=False when localizin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20537
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87290/




[GitHub] spark pull request #20537: [SPARK-23314][PYTHON] Add ambiguous=False when lo...

Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20537#discussion_r166975718
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1730,7 +1730,28 @@ def _check_series_convert_timestamps_internal(s, timezone):
         # TODO: handle nested timestamps, such as ArrayType(TimestampType())?
         if is_datetime64_dtype(s.dtype):
             tz = timezone or 'tzlocal()'
    -        return s.dt.tz_localize(tz).dt.tz_convert('UTC')
    +        """
    +        tz_localize with ambiguous=False has the same behavior of pytz.localize
    +        >>> import datetime
    +        >>> import pandas as pd
    +        >>> import pytz
    +        >>>
    +        >>> t = datetime.datetime(2015, 11, 1, 1, 23, 24)
    +        >>> ts = pd.Series([t])
    +        >>> tz = pytz.timezone('America/New_York')
    +        >>>
    +        >>> ts.dt.tz_localize(tz, ambiguous=False)
    +        >>> 0   2015-11-01 01:23:24-05:00
    +        >>> dtype: datetime64[ns, America/New_York]
    +        >>>
    +        >>> ts.dt.tz_localize(tz, ambiguous=True)
    +        >>> 0   2015-11-01 01:23:24-04:00
    +        >>> dtype: datetime64[ns, America/New_York]
    +        >>>
    +        >>> str(tz.localize(t))
    +        >>> '2015-11-01 01:23:24-05:00'
    +        """
    +        return s.dt.tz_localize(tz, ambiguous=False).dt.tz_convert('UTC')
    --- End diff --
    
    Yes, I will create a new one for `pandas_udf`.
    
    It seems `ambiguous=False` is undocumented in the method doc. @jreback, can you please confirm this usage is correct?
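
For comparison with the scalar form asked about above, the sketch below sets it next to two spellings of the `ambiguous` argument that the pandas documentation does describe: a boolean array with one flag per row, and the string `'NaT'`. It assumes a pandas version that accepts a scalar bool for `Series.dt.tz_localize`; the variable names are illustrative only and do not come from the patch.

```python
import datetime

import numpy as np
import pandas as pd

# One ambiguous wall-clock time (01:23:24 repeats when DST ends) and one
# unambiguous time later the same morning.
naive = pd.Series([
    datetime.datetime(2015, 11, 1, 1, 23, 24),
    datetime.datetime(2015, 11, 1, 3, 0, 0),
])
tz = "America/New_York"

# Scalar form used in the patch (assumed to be accepted by this pandas version).
scalar_form = naive.dt.tz_localize(tz, ambiguous=False)

# Boolean-array form: one flag per row, False meaning "treat as non-DST".
# The flag is only consulted for rows whose wall-clock time is ambiguous.
array_form = naive.dt.tz_localize(tz, ambiguous=np.zeros(len(naive), dtype=bool))

# 'NaT' form: ambiguous rows become missing values instead of being resolved.
nat_form = naive.dt.tz_localize(tz, ambiguous="NaT")

print(scalar_form)
print(array_form)
print(nat_form)
```

If the scalar is accepted, it should behave like the all-`False` array, so either spelling resolves the repeated hour to standard time. The `'NaT'` option is the alternative when silently picking one reading is not acceptable.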

