
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/07/15 00:15:13 UTC

[GitHub] [arrow] wesm edited a comment on pull request #7545: ARROW-9139: [Python] Switch parquet.read_table to use new datasets API by default

wesm edited a comment on pull request #7545:
URL: https://github.com/apache/arrow/pull/7545#issuecomment-658465856


   I think the rationale is that the memory and performance savings related to materializing the partition columns are most significant with string data. So it's definitely beneficial to return string partition columns as dictionary types.
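
To make the memory argument concrete, here is a rough back-of-the-envelope sketch in plain Python. It is not pyarrow code and not how the datasets API computes anything: the helper names `dense_nbytes` and `dictionary_nbytes` are hypothetical, the sample partition values are made up, and the size model (int32 offsets for dense strings, int32 indices for the dictionary form, validity bitmaps ignored) is an assumption loosely based on the Arrow columnar layout.

```python
def dense_nbytes(values):
    """Approximate size of a dense string column (assumed layout):
    concatenated UTF-8 bytes plus one int32 offset per value, plus one extra."""
    data = sum(len(v.encode("utf-8")) for v in values)
    offsets = 4 * (len(values) + 1)
    return data + offsets


def dictionary_nbytes(values):
    """Approximate size of the same column when dictionary-encoded (assumed layout):
    one int32 index per row plus a dense column holding only the unique values."""
    unique = list(dict.fromkeys(values))  # unique values, first-seen order
    indices = 4 * len(values)
    return indices + dense_nbytes(unique)


# Hypothetical partition column: three directories, 1000 rows read from each,
# so every row repeats its directory's partition value.
partition_values = (
    ["2020-07-13"] * 1000 + ["2020-07-14"] * 1000 + ["2020-07-15"] * 1000
)

print("dense:     ", dense_nbytes(partition_values), "bytes")
print("dictionary:", dictionary_nbytes(partition_values), "bytes")
```

Under the same assumptions, a 32-bit integer partition column would gain little, since the index array alone is already as large as the dense values, which is in line with the savings being most significant for strings.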
   
   IMHO, if a change between dictionary and dense turns out to be required post-1.0.0, it is not the end of the world, so I'm OK with either merging this as is or changing all partition types to be dictionary.

