Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/17 14:36:16 UTC
[GitHub] [arrow-datafusion] alamb commented on issue #1527: Error reading Parquet files after schema evolution
alamb commented on issue #1527:
URL: https://github.com/apache/arrow-datafusion/issues/1527#issuecomment-1014610661
Thanks for the report @capkurmagati -- I am not sure whether your use case ever worked; if it did, this is a bug.
Regardless, as @tustvold mentions, we have basically the same use case in IOx, where some parquet files contain only a subset of the unified schema and we pad the remaining columns with NULLs.
This picture might help: https://github.com/influxdata/influxdb_iox/blob/f3f6f335a93d2910a5cc55e12662dfda82143701/query/src/provider/adapter.rs#L45-L72
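As a rough illustration of the padding idea, here is a minimal std-only sketch. It is not the IOx code and does not use the Arrow API: `Batch` and `adapt_to_schema` are hypothetical names, and columns are simplified to nullable integers. Each column of the unified schema is taken from the file's batch when present and otherwise filled with NULLs (`None`) of the right length.

```rust
/// Hypothetical stand-in for a record batch read from one parquet file:
/// an ordered list of (column name, nullable values) plus the row count.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    columns: Vec<(String, Vec<Option<i64>>)>,
    num_rows: usize,
}

/// Produce a batch whose columns follow `unified_schema` order.
/// Columns present in `batch` are copied as-is; columns the file does not
/// have are padded with `None` (NULL) for every row.
fn adapt_to_schema(batch: &Batch, unified_schema: &[&str]) -> Batch {
    let columns = unified_schema
        .iter()
        .map(|name| {
            let values = batch
                .columns
                .iter()
                .find(|(n, _)| n.as_str() == *name)
                .map(|(_, v)| v.clone())
                // Column is missing from this file: fill with NULLs.
                .unwrap_or_else(|| vec![None; batch.num_rows]);
            (name.to_string(), values)
        })
        .collect();
    Batch {
        columns,
        num_rows: batch.num_rows,
    }
}
```

With a file written before column `b` was added, `adapt_to_schema(&old_file, &["a", "b"])` would return column `a` unchanged and column `b` as all NULLs, so batches from old and new files share one schema.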
We would be happy to contribute this to DataFusion / the file reader. @capkurmagati is there any chance you can write an end-to-end test (i.e. create the two parquet files you refer to above)? If so, bringing in the `SchemaAdapter` stream would be pretty straightforward.