Posted to github@arrow.apache.org by "AlenkaF (via GitHub)" <gi...@apache.org> on 2023/03/08 10:40:11 UTC

[GitHub] [arrow] AlenkaF opened a new pull request, #34498: GH-34404: [Python] Failing tests because pandas.Index can now store all numeric dtypes (not only 64bit versions)

AlenkaF opened a new pull request, #34498:
URL: https://github.com/apache/arrow/pull/34498

   ### Rationale for this change
   Several tests are failing in the nightly build (https://github.com/ursacomputing/crossbow/actions/runs/4277727973/jobs/7446784501).
   
   ### What changes are included in this PR?
   Because pandas changed which dtypes `Index` supports, tests that expect `int64` rather than `int32` fail with the dev version of pandas. The failing tests are updated to match the new pandas behaviour.
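
A minimal sketch of the behaviour change described above (not code from the PR): it builds an `Index` from an `int32` array and checks which dtype the index ends up with. Reading the major version out of `pd.__version__` with `split` is an illustrative shortcut assumed here, and the sample values are made up.

```python
import numpy as np
import pandas as pd

# Build an Index from 32-bit integers.
idx = pd.Index(np.array([1, 2, 3], dtype="int32"))

# Assumption for illustration: the major version is the first dotted component.
pandas_major = int(pd.__version__.split(".")[0])

# pandas >= 2.0 keeps int32; earlier releases upcast the index to int64.
expected = "int32" if pandas_major >= 2 else "int64"

print(pd.__version__, idx.dtype, expected)
```

Any test that hard-codes `int64` for such an index therefore passes on older pandas and fails on pandas 2.0 and later.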


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche merged pull request #34498: GH-34404: [Python] Failing tests because pandas.Index can now store all numeric dtypes (not only 64bit versions)

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche merged PR #34498:
URL: https://github.com/apache/arrow/pull/34498




[GitHub] [arrow] github-actions[bot] commented on pull request #34498: GH-34404: [Python] Failing tests because pandas.Index can now store all numeric dtypes (not only 64bit versions)

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34498:
URL: https://github.com/apache/arrow/pull/34498#issuecomment-1460846916

   Revision: a6cd6ad4fc1388e396a8573734b242914809422b
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-d00ad7f02e](https://github.com/ursacomputing/crossbow/branches/all?query=actions-d00ad7f02e)
   
   |Task|Status|
   |----|------|
   |test-conda-python-3.7-pandas-1.0|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-d00ad7f02e-github-test-conda-python-3.7-pandas-1.0)](https://github.com/ursacomputing/crossbow/actions/runs/4368331825/jobs/7640768814)|
   |test-conda-python-3.7-pandas-latest|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-d00ad7f02e-github-test-conda-python-3.7-pandas-latest)](https://github.com/ursacomputing/crossbow/actions/runs/4368332214/jobs/7640769396)|
   |test-conda-python-3.8-pandas-latest|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-d00ad7f02e-github-test-conda-python-3.8-pandas-latest)](https://github.com/ursacomputing/crossbow/actions/runs/4368331089/jobs/7640766915)|
   |test-conda-python-3.8-pandas-nightly|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-d00ad7f02e-github-test-conda-python-3.8-pandas-nightly)](https://github.com/ursacomputing/crossbow/actions/runs/4368330740/jobs/7640766318)|
   |test-conda-python-3.9-pandas-upstream_devel|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-d00ad7f02e-github-test-conda-python-3.9-pandas-upstream_devel)](https://github.com/ursacomputing/crossbow/actions/runs/4368331381/jobs/7640767644)|




[GitHub] [arrow] github-actions[bot] commented on pull request #34498: GH-34404: [Python] Failing tests because pandas.Index can now store all numeric dtypes (not only 64bit versions)

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34498:
URL: https://github.com/apache/arrow/pull/34498#issuecomment-1459964892

   * Closes: #34404




[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #34498: GH-34404: [Python] Failing tests because pandas.Index can now store all numeric dtypes (not only 64bit versions)

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #34498:
URL: https://github.com/apache/arrow/pull/34498#discussion_r1130550569


##########
python/pyarrow/tests/parquet/test_dataset.py:
##########
@@ -735,8 +735,15 @@ def _partition_test_for_filesystem(fs, base_path, use_legacy_dataset=True):
                    .reset_index(drop=True)
                    .reindex(columns=result_df.columns))
 
-    expected_df['foo'] = pd.Categorical(df['foo'], categories=foo_keys)
-    expected_df['bar'] = pd.Categorical(df['bar'], categories=bar_keys)
+    if use_legacy_dataset or Version(pd.__version__) < Version("2.0.0"):
+        expected_df['foo'] = pd.Categorical(df['foo'], categories=foo_keys)
+        expected_df['bar'] = pd.Categorical(df['bar'], categories=bar_keys)
+    else:
+        # With pandas 2.0.0 Index can store all numeric dtypes (not just
+        # int64/uint64/float64). Using astype() to create a categorical
+        # column preserves original dtype (int32)
+        expected_df['foo'] = expected_df['foo'].astype("category")
+        expected_df['bar'] = expected_df['bar'].astype("category")

Review Comment:
   This new way might work for all pandas versions? (then the if/else is not needed)



##########
python/pyarrow/tests/test_compute.py:
##########
@@ -1934,22 +1934,48 @@ def _check_datetime_components(timestamps, timezone=None):
         [iso_year, iso_week, iso_day],
         fields=iso_calendar_fields)
 
-    assert pc.year(tsa).equals(pa.array(ts.dt.year))
+    year = ts.dt.year
+    month = ts.dt.month
+    day = ts.dt.day
+    dayofweek = ts.dt.dayofweek
+    dayofyear = ts.dt.dayofyear
+    quarter = ts.dt.quarter
+    hour = ts.dt.hour
+    minute = ts.dt.minute
+    second = ts.dt.second.values
+    microsecond = ts.dt.microsecond
+    nanosecond = ts.dt.nanosecond
+    if Version(pd.__version__) >= Version("2.0.0"):
+        # Casting is required because pandas with 2.0.0 various numeric
+        # date/time attributes have dtype int32 (previously int64)

Review Comment:
   Here too, it should do no harm to always cast to int64 above, without the extra `if` block (if the values are already int64, the cast is a no-op).
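
To illustrate the suggestion above, here is a small sketch of an unconditional cast. The sample timestamps and the names `ts`, `year` and `year64` are assumptions for illustration, not taken from the test.

```python
import pandas as pd

ts = pd.Series(pd.to_datetime(["2023-03-08 10:40:11", "2021-12-31 23:59:59"]))

# The dtype of the component depends on the installed pandas version
# (int32 with pandas 2.0, int64 before, per the PR discussion).
year = ts.dt.year

# Casting unconditionally gives int64 either way; the values are unchanged.
year64 = year.astype("int64")

print(year.dtype, "->", year64.dtype)
```

With this approach the expected values have the same dtype on every pandas version, so no version check is needed.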





[GitHub] [arrow] kou commented on pull request #34498: GH-34404: [Python] Failing tests because pandas.Index can now store all numeric dtypes (not only 64bit versions)

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on PR #34498:
URL: https://github.com/apache/arrow/pull/34498#issuecomment-1460844491

   @github-actions crossbow submit *pandas*




[GitHub] [arrow] AlenkaF commented on a diff in pull request #34498: GH-34404: [Python] Failing tests because pandas.Index can now store all numeric dtypes (not only 64bit versions)

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on code in PR #34498:
URL: https://github.com/apache/arrow/pull/34498#discussion_r1131231811


##########
python/pyarrow/tests/parquet/test_dataset.py:
##########
@@ -735,8 +735,15 @@ def _partition_test_for_filesystem(fs, base_path, use_legacy_dataset=True):
                    .reset_index(drop=True)
                    .reindex(columns=result_df.columns))
 
-    expected_df['foo'] = pd.Categorical(df['foo'], categories=foo_keys)
-    expected_df['bar'] = pd.Categorical(df['bar'], categories=bar_keys)
+    if use_legacy_dataset or Version(pd.__version__) < Version("2.0.0"):
+        expected_df['foo'] = pd.Categorical(df['foo'], categories=foo_keys)
+        expected_df['bar'] = pd.Categorical(df['bar'], categories=bar_keys)
+    else:
+        # With pandas 2.0.0 Index can store all numeric dtypes (not just
+        # int64/uint64/float64). Using astype() to create a categorical
+        # column preserves original dtype (int32)
+        expected_df['foo'] = expected_df['foo'].astype("category")
+        expected_df['bar'] = expected_df['bar'].astype("category")

Review Comment:
   Unfortunately it doesn't: on older versions of pandas (and in the legacy dataset; I don't know why, and didn't think it was worth investigating) the `foo` value type in `result_df` is `int64`, but `.astype("category")` would define the type of `foo` in `expected_df` as `int32`.
   
   It is just the opposite in newer versions of pandas: the `foo` value type in `result_df` is `int32`, but `pd.Categorical` defines the type of `foo` in `expected_df` as `int64`.
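
The two ways of building the categorical column can be compared with a small sketch. It assumes the keys are plain Python ints and the column holds `int32` values; the data below is made up and is not the data used in the test.

```python
import numpy as np
import pandas as pd

foo = pd.Series(np.array([0, 1, 0, 1], dtype="int32"))
foo_keys = [0, 1]  # assumed to be plain Python ints

# Categories built from a list of Python ints.
explicit = pd.Categorical(foo, categories=foo_keys)

# Categories inferred from the column itself.
via_astype = foo.astype("category")

print("pd.Categorical:", explicit.categories.dtype)
print("astype:        ", via_astype.cat.categories.dtype)
```

Printing both dtypes on the pandas versions under test shows which construction matches `result_df` in each case.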
   
   





[GitHub] [arrow] ursabot commented on pull request #34498: GH-34404: [Python] Failing tests because pandas.Index can now store all numeric dtypes (not only 64bit versions)

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #34498:
URL: https://github.com/apache/arrow/pull/34498#issuecomment-1464469614

   Benchmark runs are scheduled for baseline = 9baefea1bca62e390219bd321b5915b4fba99279 and contender = 71f3c568af8fe8a6f886ffddb4318b728046a01a. 71f3c568af8fe8a6f886ffddb4318b728046a01a is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/092aecde4ca04d3a96c7c66f04a3fd0c...7eb85a38dc1f484c932b68360f31d38f/)
   [Finished :arrow_down:0.24% :arrow_up:0.12%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/104525e9c5a6411f8cf04bfb65610142...ed8881cee9a24b299969352c262532b2/)
   [Finished :arrow_down:2.55% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/7b5b2796149941f0806163c301d8c052...c9a553a60d5b470389944c9c22b206e5/)
   [Finished :arrow_down:0.38% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/60f1e4f102ee42768a89a5f452b77c06...0750c6e8e1cd408ea9ee5ab7c3274503/)
   Buildkite builds:
   [Finished] [`71f3c568` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2509)
   [Finished] [`71f3c568` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2539)
   [Finished] [`71f3c568` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2507)
   [Finished] [`71f3c568` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2530)
   [Finished] [`9baefea1` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2508)
   [Finished] [`9baefea1` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2538)
   [Finished] [`9baefea1` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2506)
   [Finished] [`9baefea1` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2529)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   

