Posted to github@arrow.apache.org by "0x26res (via GitHub)" <gi...@apache.org> on 2023/02/03 12:36:16 UTC

[GitHub] [arrow] 0x26res commented on issue #32067: [Python] Switch default and deprecate use_legacy_dataset=True in ParquetDataset

0x26res commented on issue #32067:
URL: https://github.com/apache/arrow/issues/32067#issuecomment-1415814636

   Now that this change is effective in 11.0, we get this warning when loading data with `use_legacy_dataset=True`.
   
   ```
   FutureWarning: Passing 'use_legacy_dataset=True' to get the legacy behaviour is deprecated as of pyarrow 11.0.0, and the legacy implementation will be removed in a future version.
   ```
   
   I'm in the process of migrating to `use_legacy_dataset=False`, but I was wondering what differences to expect between the two implementations. Is this documented somewhere?
   
   I have noticed one significant difference in behaviour. The legacy implementation complains if the Parquet files have heterogeneous schemas. The new implementation tries to convert all files to the schema of the first file it finds (or to the `schema` argument when provided).
   
   Are there other differences to expect?

