Posted to github@arrow.apache.org by "yan-hic (via GitHub)" <gi...@apache.org> on 2023/06/18 17:51:48 UTC

[GitHub] [arrow] yan-hic commented on issue #11967: Parquet schema / data type for entire null object DataFrame columns

yan-hic commented on issue #11967:
URL: https://github.com/apache/arrow/issues/11967#issuecomment-1596221798

   > still needs to choose some physical type for the column in the Parquet file. And by default, Arrow uses INT32 for the physical type.
   @jorisvandenbossche can that default be changed in `pyarrow`? I want to use STRING instead.
   The issue we are facing: when saving a dataset (i.e. multiple Parquet files), one file may coincidentally have all nulls in a column for which other files contain strings. The result is an inconsistent schema across the dataset (INT32 vs. STRING), which raises errors when reading with BigQuery, for instance.
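   A common workaround (a sketch, not an official recommendation; the column names and file path here are hypothetical) is to pass an explicit schema when converting to Arrow, so the logical type is pinned to string even when every value in that file is null:

   ```python
   import pandas as pd
   import pyarrow as pa
   import pyarrow.parquet as pq

   # Hypothetical DataFrame whose "tag" column happens to be all nulls in this file.
   df = pd.DataFrame({"id": [1, 2], "tag": [None, None]})

   # Without an explicit schema, the all-null column gets Arrow's null type and
   # the Parquet writer must pick an arbitrary physical type for it. Passing a
   # schema pins the logical type, so every file in the dataset agrees.
   schema = pa.schema([("id", pa.int64()), ("tag", pa.string())])
   table = pa.Table.from_pandas(df, schema=schema, preserve_index=False)
   pq.write_table(table, "part-0.parquet")

   print(pq.read_schema("part-0.parquet").field("tag").type)  # string
   ```

   The same schema object can be reused for every file in the dataset, which keeps BigQuery (or any schema-merging reader) happy.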


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org