Posted to github@arrow.apache.org by "AlenkaF (via GitHub)" <gi...@apache.org> on 2023/03/06 08:27:20 UTC

[GitHub] [arrow] AlenkaF commented on issue #34449: [Python] `to_parquet` fails with a category field backed by pyarrow string

AlenkaF commented on issue #34449:
URL: https://github.com/apache/arrow/issues/34449#issuecomment-1455695427

   Thank you for reporting. Yes, this is a known bug in pyarrow. It is tracked in https://github.com/apache/arrow/issues/33727, and a fix is in progress in https://github.com/apache/arrow/pull/34289.
   
   Running your example on the linked PR branch, the write succeeds and the categorical column round-trips:
   
   ```python
   >>> import pandas as pd
   >>> df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": list("caaab")})
   >>> df["y"] = df["y"].astype(pd.StringDtype("pyarrow"))
   >>> df["y"] = df["y"].astype("category")
   >>> df.to_parquet("test-arrow.parquet")
   
   >>> pd.read_parquet("test-arrow.parquet")
      x  y
   0  1  c
   1  2  a
   2  3  a
   3  4  a
   4  5  b
   >>> pd.read_parquet("test-arrow.parquet").dtypes
   x       int64
   y    category
   dtype: object
   >>> pd.read_parquet("test-arrow.parquet").dtypes.y
   CategoricalDtype(categories=['a', 'b', 'c'], ordered=False)
   ```
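
Until a pyarrow release containing the fix is available, one possible workaround is to rebuild the categories as plain object-dtype strings before writing. The sketch below is an assumption based on the linked issue (which concerns categories that are string-typed rather than object-typed); it has not been verified against an affected pyarrow version, and the helper name `with_object_categories` is hypothetical.

```python
import pandas as pd


def with_object_categories(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose string-dtype categories are converted to object dtype.

    Hypothetical helper: assumes column names are unique and that the failure
    only affects categoricals whose categories use pandas' StringDtype.
    """
    out = df.copy()
    for name in out.columns:
        col = out[name]
        if not isinstance(col.dtype, pd.CategoricalDtype):
            continue
        categories = col.cat.categories
        if isinstance(categories.dtype, pd.StringDtype):
            # Pass an Index (not a list) so the object dtype is kept as given.
            out[name] = col.cat.rename_categories(categories.astype(object))
    return out
```

With the example above, the call to try would be `with_object_categories(df).to_parquet("test-arrow.parquet")`. The values and category order are unchanged; only the dtype backing the categories differs.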


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
