Posted to reviews@spark.apache.org by ueshin <gi...@git.apache.org> on 2018/02/09 13:51:58 UTC

[GitHub] spark pull request #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezon...

GitHub user ueshin opened a pull request:

    https://github.com/apache/spark/pull/20559

    [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from environment via pytz, or dateutil.

    ## What changes were proposed in this pull request?
    
    Currently we use `tzlocal()` to get the Python local timezone, but it sometimes causes unexpected behavior.
    This change gets the Python local timezone via pytz if the timezone is specified in the `TZ` environment variable, or otherwise from the timezone file via dateutil.
    
    ## How was this patch tested?
    
    Modified some tests and ran the existing tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-23360/master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20559.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20559
    
----
commit e87bd7639e546e24be1bf7a781ccd1571ad71964
Author: Takuya UESHIN <ue...@...>
Date:   2018-02-09T13:43:44Z

    Get local timezone from environment vi pytz, or dateutil.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20559
  
    **[Test build #87286 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87286/testReport)** for PR 20559 at commit [`a082e8c`](https://github.com/apache/spark/commit/a082e8c66265906ed54dbc4594ab1d534ca5c4c4).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezon...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20559#discussion_r167318107
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -4124,7 +4126,7 @@ def test_vectorized_udf_timestamps(self):
             data = [(0, datetime(1969, 1, 1, 1, 1, 1)),
                     (1, datetime(2012, 2, 2, 2, 2, 2)),
                     (2, None),
    -                (3, datetime(2100, 3, 3, 3, 3, 3))]
    +                (3, datetime(2100, 4, 4, 4, 4, 4))]
    --- End diff --
    
    Just wondering if changing these values made a difference somewhere?


---



[GitHub] spark issue #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20559
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20559
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87278/
    Test PASSed.


---



[GitHub] spark issue #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from ...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/20559
  
    @BryanCutler It seems that pandas handles `tzlocal()` differently from other timezones, and it might handle DST incorrectly, I guess.


---



[GitHub] spark issue #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20559
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20559
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezon...

Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20559#discussion_r167246316
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1709,6 +1709,15 @@ def _check_dataframe_convert_date(pdf, schema):
         return pdf
     
     
    +def _get_local_timezone():
    +    """ Get local timezone from environment vi pytz, or dateutil. """
    +    from pyspark.sql.utils import require_minimum_pandas_version
    +    require_minimum_pandas_version()
    +
    +    import os
    +    return os.environ.get('TZ', 'dateutil/:')
    --- End diff --
    
    I don't really understand how "dateutil/:" works. Could you add some comments explaining it?
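
A note on the question above, offered as a hedged sketch rather than as the patch itself. As far as the pandas and dateutil sources linked elsewhere in this thread show, pandas treats a timezone string prefixed with `dateutil/` as a request to resolve the remainder through dateutil instead of pytz, and dateutil's `gettz(':')` looks at the system timezone files before falling back to `tzlocal()`. The helper names below (`get_local_timezone`, `split_resolver`) are hypothetical; they only mimic the string handling and do not call pandas or dateutil.

```python
import os


def get_local_timezone(environ=None):
    """Return the TZ value if it is set, else the 'dateutil/:' fallback string."""
    env = os.environ if environ is None else environ
    return env.get('TZ', 'dateutil/:')


def split_resolver(tz_name):
    """Hypothetical helper: report which library would resolve the name.

    Mirrors the 'dateutil/' prefix convention only; it does not
    construct a tzinfo object.
    """
    prefix = 'dateutil/'
    if tz_name.startswith(prefix):
        return 'dateutil', tz_name[len(prefix):]
    return 'pytz', tz_name


# With TZ set, the zone name is used as-is; without it, the fallback applies.
with_tz = split_resolver(get_local_timezone({'TZ': 'America/Los_Angeles'}))
without_tz = split_resolver(get_local_timezone({}))
```

Here `with_tz` is `('pytz', 'America/Los_Angeles')` and `without_tz` is `('dateutil', ':')`, which is the case where dateutil would consult the system timezone file.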


---



[GitHub] spark issue #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20559
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20559
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20559
  
    retest this please


---



[GitHub] spark issue #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20559
  
    **[Test build #87278 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87278/testReport)** for PR 20559 at commit [`e20e9fd`](https://github.com/apache/spark/commit/e20e9fdba7fd2a4db059fe8016f0a0a60f3dd71d).


---



[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20559
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/773/
    Test PASSed.


---



[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20559
  
    Merged to master and branch-2.3.


---



[GitHub] spark pull request #20559: [SPARK-23360][SQL][PYTHON] Get local timezone fro...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20559#discussion_r167394718
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1766,15 +1781,13 @@ def _check_series_convert_timestamps_localize(s, from_timezone, to_timezone):
     
         import pandas as pd
         from pandas.api.types import is_datetime64tz_dtype, is_datetime64_dtype
    -    from_tz = from_timezone or 'tzlocal()'
    -    to_tz = to_timezone or 'tzlocal()'
    +    from_tz = from_timezone or _get_local_timezone()
    +    to_tz = to_timezone or _get_local_timezone()
         # TODO: handle nested timestamps, such as ArrayType(TimestampType())?
         if is_datetime64tz_dtype(s.dtype):
             return s.dt.tz_convert(to_tz).dt.tz_localize(None)
         elif is_datetime64_dtype(s.dtype) and from_tz != to_tz:
    -        # `s.dt.tz_localize('tzlocal()')` doesn't work properly when including NaT.
    -        return s.apply(lambda ts: ts.tz_localize(from_tz).tz_convert(to_tz).tz_localize(None)
    -                       if ts is not pd.NaT else pd.NaT)
    +        return s.dt.tz_localize(from_tz).dt.tz_convert(to_tz).dt.tz_localize(None)
    --- End diff --
    
    Good catch! I'll revert this.


---



[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20559
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87286/
    Test FAILed.


---



[GitHub] spark issue #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20559
  
    **[Test build #87278 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87278/testReport)** for PR 20559 at commit [`e20e9fd`](https://github.com/apache/spark/commit/e20e9fdba7fd2a4db059fe8016f0a0a60f3dd71d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20559: [SPARK-23360][SQL][PYTHON] Get local timezone fro...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20559#discussion_r167391409
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1766,15 +1781,13 @@ def _check_series_convert_timestamps_localize(s, from_timezone, to_timezone):
     
         import pandas as pd
         from pandas.api.types import is_datetime64tz_dtype, is_datetime64_dtype
    -    from_tz = from_timezone or 'tzlocal()'
    -    to_tz = to_timezone or 'tzlocal()'
    +    from_tz = from_timezone or _get_local_timezone()
    +    to_tz = to_timezone or _get_local_timezone()
         # TODO: handle nested timestamps, such as ArrayType(TimestampType())?
         if is_datetime64tz_dtype(s.dtype):
             return s.dt.tz_convert(to_tz).dt.tz_localize(None)
         elif is_datetime64_dtype(s.dtype) and from_tz != to_tz:
    -        # `s.dt.tz_localize('tzlocal()')` doesn't work properly when including NaT.
    -        return s.apply(lambda ts: ts.tz_localize(from_tz).tz_convert(to_tz).tz_localize(None)
    -                       if ts is not pd.NaT else pd.NaT)
    +        return s.dt.tz_localize(from_tz).dt.tz_convert(to_tz).dt.tz_localize(None)
    --- End diff --
    
    @ueshin, is it safe to remove `if ts is not pd.NaT else pd.NaT`? It seems there is still a small possibility of ending up with `tzlocal()`:
    
    https://github.com/pandas-dev/pandas/blob/0.19.x/pandas/tslib.pyx#L1760
    https://github.com/pandas-dev/pandas/blob/0.19.x/pandas/tslib.pyx#L54
    https://github.com/dateutil/dateutil/blob/2.6.1/dateutil/tz/tz.py#L1362
    https://github.com/dateutil/dateutil/blob/2.6.1/dateutil/tz/tz.py#L1408
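
To make the concern above concrete, here is a minimal standard-library analogue of the element-wise conversion that the diff removes. It is a sketch under stated assumptions, not the PySpark code: `None` stands in for `pd.NaT`, fixed UTC offsets stand in for named timezones, and `convert_naive` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone


def convert_naive(ts, from_tz, to_tz):
    """Reinterpret a naive timestamp in from_tz and express it in to_tz.

    None stands in for pd.NaT and is passed through untouched, which is
    the guard the original lambda provided.
    """
    if ts is None:
        return None
    return ts.replace(tzinfo=from_tz).astimezone(to_tz).replace(tzinfo=None)


utc = timezone.utc
minus_eight = timezone(timedelta(hours=-8))  # fixed offset, assumed for illustration
values = [datetime(2018, 2, 9, 13, 0), None]
converted = [convert_naive(v, utc, minus_eight) for v in values]
```

The missing value survives the conversion only because of the explicit guard; the question in this comment is whether the vectorized `s.dt.tz_localize(...)` path is equally safe when the resolved timezone turns out to be `tzlocal()`.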


---



[GitHub] spark issue #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from ...

Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on the issue:

    https://github.com/apache/spark/pull/20559
  
    Thanks @ueshin for the quick patch!


---



[GitHub] spark issue #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from ...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/20559
  
    cc @icexelloss @BryanCutler @felixcheung @HyukjinKwon @cloud-fan 


---



[GitHub] spark issue #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20559
  
    **[Test build #87260 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87260/testReport)** for PR 20559 at commit [`e87bd76`](https://github.com/apache/spark/commit/e87bd7639e546e24be1bf7a781ccd1571ad71964).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20559
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/772/
    Test PASSed.


---



[GitHub] spark pull request #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezon...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20559#discussion_r167390815
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1709,6 +1709,15 @@ def _check_dataframe_convert_date(pdf, schema):
         return pdf
     
     
    +def _get_local_timezone():
    +    """ Get local timezone from environment vi pytz, or dateutil. """
    --- End diff --
    
    I modified the docstring and added a comment. Does the comment make it clear?


---



[GitHub] spark issue #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20559
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20559
  
    **[Test build #87287 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87287/testReport)** for PR 20559 at commit [`a082e8c`](https://github.com/apache/spark/commit/a082e8c66265906ed54dbc4594ab1d534ca5c4c4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezon...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20559#discussion_r167322999
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1709,6 +1709,15 @@ def _check_dataframe_convert_date(pdf, schema):
         return pdf
     
     
    +def _get_local_timezone():
    +    """ Get local timezone from environment vi pytz, or dateutil. """
    --- End diff --
    
    Did you mean "from environment via pytz"? The 'TZ' environment var is read by the `datetime` module; does pytz do anything with this?


---



[GitHub] spark issue #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20559
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/751/
    Test PASSed.


---



[GitHub] spark pull request #20559: [SPARK-23360][SQL][PYTHON] Get local timezone fro...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20559#discussion_r167394720
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2867,6 +2867,35 @@ def test_create_dataframe_required_pandas_not_found(self):
                                         "d": [pd.Timestamp.now().date()]})
                     self.spark.createDataFrame(pdf)
     
    +    # Regression test for SPARK-23360
    +    @unittest.skipIf(not _have_pandas, _pandas_requirement_message)
    +    def test_create_dateframe_from_pandas_with_dst(self):
    +        import pandas as pd
    +        from datetime import datetime
    +
    +        pdf = pd.DataFrame({'time': [datetime(2015, 10, 31, 22, 30)]})
    +
    +        df = self.spark.createDataFrame(pdf)
    +        self.assertPandasEqual(pdf, df.toPandas())
    +
    +        orig_env_tz = os.environ.get('TZ', None)
    +        orig_session_tz = self.spark.conf.get('spark.sql.session.timeZone')
    +        try:
    +            tz = 'America/Los_Angeles'
    +            os.environ['TZ'] = tz
    +            time.tzset()
    +            self.spark.conf.set('spark.sql.session.timeZone', tz)
    +
    +            df = self.spark.createDataFrame(pdf)
    +            df.show()
    --- End diff --
    
    Oops, I should've removed it. Thanks!
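
The regression test quoted above saves the `TZ` environment variable, overrides it, calls `time.tzset()`, and is cut off before the restore step. One way to package that save/override/restore pattern is a context manager, sketched below. The name `process_timezone` is hypothetical, the sketch leaves out the `spark.sql.session.timeZone` setting that the test also changes, and `time.tzset()` is only available on Unix, hence the `hasattr` check.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def process_timezone(tz):
    """Temporarily set the TZ environment variable, then restore it."""
    orig = os.environ.get('TZ')
    try:
        os.environ['TZ'] = tz
        if hasattr(time, 'tzset'):  # not available on Windows
            time.tzset()
        yield
    finally:
        # Restore the previous state, removing TZ if it was unset before.
        if orig is None:
            os.environ.pop('TZ', None)
        else:
            os.environ['TZ'] = orig
        if hasattr(time, 'tzset'):
            time.tzset()


before = os.environ.get('TZ')
with process_timezone('America/Los_Angeles'):
    inside = os.environ['TZ']
after = os.environ.get('TZ')
```

Putting the restore in a `finally` clause means the original `TZ` value comes back even if an assertion inside the block fails, so one test cannot leak its timezone into the next.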


---



[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20559
  
    **[Test build #87286 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87286/testReport)** for PR 20559 at commit [`a082e8c`](https://github.com/apache/spark/commit/a082e8c66265906ed54dbc4594ab1d534ca5c4c4).


---



[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20559
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87287/
    Test PASSed.


---



[GitHub] spark pull request #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezon...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20559#discussion_r167389710
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -4124,7 +4126,7 @@ def test_vectorized_udf_timestamps(self):
             data = [(0, datetime(1969, 1, 1, 1, 1, 1)),
                     (1, datetime(2012, 2, 2, 2, 2, 2)),
                     (2, None),
    -                (3, datetime(2100, 3, 3, 3, 3, 3))]
    +                (3, datetime(2100, 4, 4, 4, 4, 4))]
    --- End diff --
    
    I'll revert it.


---



[GitHub] spark issue #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20559
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/764/
    Test PASSed.


---



[GitHub] spark pull request #20559: [SPARK-23360][SQL][PYTHON] Get local timezone fro...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20559#discussion_r167391496
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2867,6 +2867,35 @@ def test_create_dataframe_required_pandas_not_found(self):
                                         "d": [pd.Timestamp.now().date()]})
                     self.spark.createDataFrame(pdf)
     
    +    # Regression test for SPARK-23360
    +    @unittest.skipIf(not _have_pandas, _pandas_requirement_message)
    +    def test_create_dateframe_from_pandas_with_dst(self):
    +        import pandas as pd
    +        from datetime import datetime
    +
    +        pdf = pd.DataFrame({'time': [datetime(2015, 10, 31, 22, 30)]})
    +
    +        df = self.spark.createDataFrame(pdf)
    +        self.assertPandasEqual(pdf, df.toPandas())
    +
    +        orig_env_tz = os.environ.get('TZ', None)
    +        orig_session_tz = self.spark.conf.get('spark.sql.session.timeZone')
    +        try:
    +            tz = 'America/Los_Angeles'
    +            os.environ['TZ'] = tz
    +            time.tzset()
    +            self.spark.conf.set('spark.sql.session.timeZone', tz)
    +
    +            df = self.spark.createDataFrame(pdf)
    +            df.show()
    --- End diff --
    
    Gentle reminder about this one. It seems to be left in for debugging purposes, I guess? :)


---



[GitHub] spark pull request #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezon...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20559#discussion_r167389709
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1709,6 +1709,15 @@ def _check_dataframe_convert_date(pdf, schema):
         return pdf
     
     
    +def _get_local_timezone():
    +    """ Get local timezone from environment vi pytz, or dateutil. """
    +    from pyspark.sql.utils import require_minimum_pandas_version
    +    require_minimum_pandas_version()
    +
    +    import os
    +    return os.environ.get('TZ', 'dateutil/:')
    --- End diff --
    
    Sure, I'll add some comments.


---



[GitHub] spark pull request #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezon...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20559#discussion_r167389708
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1709,6 +1709,15 @@ def _check_dataframe_convert_date(pdf, schema):
         return pdf
     
     
    +def _get_local_timezone():
    +    """ Get local timezone from environment vi pytz, or dateutil. """
    +    from pyspark.sql.utils import require_minimum_pandas_version
    +    require_minimum_pandas_version()
    --- End diff --
    
    Actually, it isn't needed here. I'll remove it. Thanks.


---



[GitHub] spark pull request #20559: [SPARK-23360][SQL][PYTHON] Get local timezone fro...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20559


---



[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20559
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20559
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87260/
    Test FAILed.


---



[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20559
  
    **[Test build #87287 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87287/testReport)** for PR 20559 at commit [`a082e8c`](https://github.com/apache/spark/commit/a082e8c66265906ed54dbc4594ab1d534ca5c4c4).


---



[GitHub] spark pull request #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezon...

Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20559#discussion_r167246084
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1709,6 +1709,15 @@ def _check_dataframe_convert_date(pdf, schema):
         return pdf
     
     
    +def _get_local_timezone():
    +    """ Get local timezone from environment vi pytz, or dateutil. """
    +    from pyspark.sql.utils import require_minimum_pandas_version
    +    require_minimum_pandas_version()
    --- End diff --
    
    Why do we need this here?


---



[GitHub] spark issue #20559: [SPARK-23360][SQL][PYTHON] Get local timezone from envir...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20559
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20559: [WIP][SPARK-23360][SQL][PYTHON] Get local timezone from ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20559
  
    **[Test build #87260 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87260/testReport)** for PR 20559 at commit [`e87bd76`](https://github.com/apache/spark/commit/e87bd7639e546e24be1bf7a781ccd1571ad71964).


---
