Posted to github@arrow.apache.org by "westonpace (via GitHub)" <gi...@apache.org> on 2023/05/15 18:35:41 UTC

[GitHub] [arrow] westonpace commented on issue #35569: python - read multiple parquets that have different schema?

westonpace commented on issue #35569:
URL: https://github.com/apache/arrow/issues/35569#issuecomment-1548361801

   Which version of pyarrow are you using? Any schema evolution support is going to be provided by the newer datasets API (`pyarrow.dataset`) and will probably not be added to `parquet.ParquetDataset`.
   
   Do you get an error with:
   
   ```
   import pyarrow.dataset  # the dataset submodule must be imported explicitly
   pyarrow.dataset.dataset("bucket/folder", filesystem=s3_src, partitioning="hive")
   ```
   
   Is it possible to manually specify a schema that includes all of the fields (even if some of those fields are missing in some files)? If so, do you still get this error?

