Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/17 14:36:16 UTC

[GitHub] [arrow-datafusion] alamb commented on issue #1527: Error reading Parquet files after schema evolution

alamb commented on issue #1527:
URL: https://github.com/apache/arrow-datafusion/issues/1527#issuecomment-1014610661


   Thanks for the report @capkurmagati -- I am not sure whether your use case ever worked (if it did, this is a bug).
   
   Regardless, as @tustvold mentions, we basically have the same use case in IOx, where some parquet files contain only a subset of the unified schema and we pad the remaining columns with NULLs.
   
   This picture might help: https://github.com/influxdata/influxdb_iox/blob/f3f6f335a93d2910a5cc55e12662dfda82143701/query/src/provider/adapter.rs#L45-L72
   
   We would be happy to contribute this to DataFusion / the file reader. @capkurmagati, is there any chance you can write an end-to-end test (i.e. create the two parquet files you refer to above)? If so, bringing in the `SchemaAdapter` stream would be pretty straightforward.
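
To make the NULL-padding idea concrete, here is a minimal sketch in plain Rust. It is not the IOx `SchemaAdapter` stream and does not use the Arrow or DataFusion APIs: the `Batch` type alias and the `adapt_batch` function are hypothetical names invented for illustration, and columns are modeled as `Vec<Option<i64>>`, with `None` standing in for NULL.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a record batch: column name -> nullable values.
/// (Hypothetical type for illustration; real code would use Arrow arrays.)
type Batch = HashMap<String, Vec<Option<i64>>>;

/// Map a batch holding a subset of the columns onto the unified schema.
/// Columns are emitted in unified-schema order; any column missing from
/// the input is filled with NULLs (`None`) of the same row count.
fn adapt_batch(unified_schema: &[&str], batch: &Batch) -> Vec<(String, Vec<Option<i64>>)> {
    // Assumption: all columns within one batch have the same length.
    let num_rows = batch.values().next().map_or(0, |col| col.len());

    unified_schema
        .iter()
        .map(|name| {
            let column = match batch.get(*name) {
                // Column exists in this file: pass it through unchanged.
                Some(values) => values.clone(),
                // Column was added later: pad with NULLs.
                None => vec![None; num_rows],
            };
            (name.to_string(), column)
        })
        .collect()
}

fn main() {
    // An "old" file written before column `b` was added to the schema.
    let mut old_file: Batch = HashMap::new();
    old_file.insert("a".to_string(), vec![Some(1), Some(2)]);

    let adapted = adapt_batch(&["a", "b"], &old_file);
    for (name, column) in &adapted {
        println!("{name}: {column:?}");
    }
}
```

A real adapter would presumably do the same thing per record batch in a stream, and would also need to handle column types, which this sketch ignores by fixing every column to `i64`.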


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org