Posted to github@arrow.apache.org by "AlenkaF (via GitHub)" <gi...@apache.org> on 2023/03/09 15:49:15 UTC

[GitHub] [arrow] AlenkaF commented on a diff in pull request #34498: GH-34404: [Python] Failing tests because pandas.Index can now store all numeric dtypes (not only 64bit versions)

AlenkaF commented on code in PR #34498:
URL: https://github.com/apache/arrow/pull/34498#discussion_r1131231811


##########
python/pyarrow/tests/parquet/test_dataset.py:
##########
@@ -735,8 +735,15 @@ def _partition_test_for_filesystem(fs, base_path, use_legacy_dataset=True):
                    .reset_index(drop=True)
                    .reindex(columns=result_df.columns))
 
-    expected_df['foo'] = pd.Categorical(df['foo'], categories=foo_keys)
-    expected_df['bar'] = pd.Categorical(df['bar'], categories=bar_keys)
+    if use_legacy_dataset or Version(pd.__version__) < Version("2.0.0"):
+        expected_df['foo'] = pd.Categorical(df['foo'], categories=foo_keys)
+        expected_df['bar'] = pd.Categorical(df['bar'], categories=bar_keys)
+    else:
+        # With pandas 2.0.0 Index can store all numeric dtypes (not just
+        # int64/uint64/float64). Using astype() to create a categorical
+        # column preserves original dtype (int32)
+        expected_df['foo'] = expected_df['foo'].astype("category")
+        expected_df['bar'] = expected_df['bar'].astype("category")

Review Comment:
   Unfortunately it doesn't: on older versions of pandas (and with the legacy dataset; I don't know why and didn't think it was worth investigating) the `foo` value type in `result_df` is `int64`, but `.astype("category")` would define the type of `foo` in `expected_df` as `int32`.
   
   It is just the opposite in newer versions of pandas: the `foo` value type in `result_df` is `int32`, but `pd.Categorical` defines the type of `foo` in `expected_df` as `int64`.
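
A minimal sketch of the two ways of building the categorical column, for checking the category dtype on a given pandas version. The names `values` and `keys` are hypothetical stand-ins for `df['foo']` and `foo_keys`, and the data is made up for illustration; it is not taken from the test itself.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for an int32 column such as df['foo'].
values = pd.Series(np.array([0, 1, 0, 1], dtype="int32"), name="foo")
# Hypothetical stand-in for foo_keys: plain Python ints.
keys = [0, 1]

# Categories are built from the explicit list of keys.
via_constructor = pd.Categorical(values, categories=keys)
# Categories are inferred from the column's own values.
via_astype = values.astype("category")

# The printed dtypes depend on the installed pandas version,
# which is the point of the version check in the diff.
print(pd.__version__)
print("pd.Categorical:     ", via_constructor.categories.dtype)
print("astype('category'): ", via_astype.cat.categories.dtype)
```

Both objects hold the same categories and codes; only the dtype of the categories index can differ, which is what makes the comparison against `result_df` fail when the wrong construction is used.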
   
   


