Posted to commits@spark.apache.org by gu...@apache.org on 2020/10/21 00:25:09 UTC

[spark] branch branch-3.0 updated: [SPARK-33189][PYTHON][TESTS] Add env var to tests for legacy nested timestamps in pyarrow

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 5e33155  [SPARK-33189][PYTHON][TESTS] Add env var to tests for legacy nested timestamps in pyarrow
5e33155 is described below

commit 5e331553726f838f2f14788c135f3497319b4714
Author: Bryan Cutler <cu...@gmail.com>
AuthorDate: Wed Oct 21 09:13:33 2020 +0900

    [SPARK-33189][PYTHON][TESTS] Add env var to tests for legacy nested timestamps in pyarrow
    
    ### What changes were proposed in this pull request?
    
    Add an environment variable `PYARROW_IGNORE_TIMEZONE` to pyspark tests in run-tests.py to use legacy nested timestamp behavior. This means that when converting Arrow to pandas, nested timestamps with timezones will have the timezone localized during conversion.
    
    ### Why are the changes needed?
    
    The default behavior was changed in PyArrow 2.0.0 to propagate timezone information. Using the environment variable enables testing with newer versions of pyarrow until the issue can be fixed in SPARK-32285.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No
    
    ### How was this patch tested?
    
    Existing tests
    
    Closes #30111 from BryanCutler/arrow-enable-legacy-nested-timestamps-SPARK-33189.
    
    Authored-by: Bryan Cutler <cu...@gmail.com>
    Signed-off-by: HyukjinKwon <gu...@apache.org>
    (cherry picked from commit 47a6568265525002021c1e5cfa4330f5b1a91469)
    Signed-off-by: HyukjinKwon <gu...@apache.org>
---
 .github/workflows/build_and_test.yml | 4 ++--
 python/run-tests.py                  | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 649ce95..d1203c6 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -149,7 +149,7 @@ jobs:
       # PyArrow is not supported in PyPy yet, see ARROW-2651.
       # TODO(SPARK-32247): scipy installation with PyPy fails for an unknown reason.
       run: |
-        python2.7 -m pip install numpy 'pyarrow<2.0.0' pandas scipy xmlrunner
+        python2.7 -m pip install numpy 'pyarrow<3.0.0' pandas scipy xmlrunner
         python2.7 -m pip list
         # PyPy does not have xmlrunner
         pypy3 -m pip install numpy pandas
@@ -157,7 +157,7 @@ jobs:
     - name: Install Python packages (Python 3.8)
       if: contains(matrix.modules, 'pyspark') || (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-'))
       run: |
-        python3.8 -m pip install numpy 'pyarrow<2.0.0' pandas scipy xmlrunner
+        python3.8 -m pip install numpy 'pyarrow<3.0.0' pandas scipy xmlrunner
         python3.8 -m pip list
     # SparkR
     - name: Install R 4.0
diff --git a/python/run-tests.py b/python/run-tests.py
index 9fb0983..a647f13 100755
--- a/python/run-tests.py
+++ b/python/run-tests.py
@@ -76,6 +76,8 @@ def run_individual_python_test(target_dir, test_name, pyspark_python):
         'PYSPARK_PYTHON': which(pyspark_python),
         'PYSPARK_DRIVER_PYTHON': which(pyspark_python),
         'PYSPARK_ROW_FIELD_SORTING_ENABLED': 'true',
+        # Preserve legacy nested timezone behavior for pyarrow>=2, remove after SPARK-32285
+        'PYARROW_IGNORE_TIMEZONE': '1',
     })
 
     # Create a unique temp directory under 'target/' for each run. The TMPDIR variable is

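For context, the change to python/run-tests.py simply adds one more key to the environment dict passed to each test subprocess. A minimal, hypothetical sketch of that pattern (`build_test_env` is an illustrative name, not the real function; the actual code in `run_individual_python_test` sets many more keys):

```python
import os

def build_test_env(base=None):
    """Hypothetical trimmed-down version of the env setup in
    python/run-tests.py: copy a base environment and add the flags
    each PySpark test subprocess needs."""
    env = dict(base if base is not None else os.environ)
    env.update({
        'SPARK_TESTING': '1',
        'PYSPARK_ROW_FIELD_SORTING_ENABLED': 'true',
        # Preserve legacy nested timestamp behavior for pyarrow>=2.0.0;
        # to be removed once SPARK-32285 is fixed.
        'PYARROW_IGNORE_TIMEZONE': '1',
    })
    return env

if __name__ == '__main__':
    env = build_test_env({})
    print(env['PYARROW_IGNORE_TIMEZONE'])
```

Because `PYARROW_IGNORE_TIMEZONE` is read by pyarrow at import time, it must be present in the subprocess environment before the test process imports pyarrow, which is why it is set here rather than inside individual tests.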

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org