Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/14 16:30:15 UTC

[GitHub] [arrow] wjones127 commented on issue #12416: Parquet Partition issues with Int64 Null

wjones127 commented on issue #12416:
URL: https://github.com/apache/arrow/issues/12416#issuecomment-1039292700


   Hi, the problem here likely isn't the partitioning read, but the conversion to pandas. From [the docs](https://arrow.apache.org/docs/python/pandas.html#nullable-types):
   
   > In Arrow all data types are nullable, meaning they support storing missing values. In pandas, however, not all data types have support for missing data. Most notably, the default integer data types do not, and will get casted to float when missing values are introduced. Therefore, when an Arrow array or table gets converted to pandas, integer columns will become float when missing values are present:
   
   There is a workaround using `types_mapper` in [that section of the docs](https://arrow.apache.org/docs/python/pandas.html#nullable-types), so it is probably worth reading.
   
   If you do find there is an issue with the inferred partitioning schema, you can manually pass the partitioning schema. This should be available in version 4.0.1: https://arrow.apache.org/docs/4.0/python/dataset.html#different-partitioning-schemes
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org