Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/06/28 17:38:36 UTC

[GitHub] [arrow-rs] alamb commented on issue #4459: Regression in parquet `42.0.0`: Bad parquet column indexes for All Null Columns, resulting in `Parquet error: StructArrayReader out of sync` on read

alamb commented on issue #4459:
URL: https://github.com/apache/arrow-rs/issues/4459#issuecomment-1611827803

   FYI: we found this in our internal testing. I will post the symptoms here to help anyone else who comes across this:
   
   In IOx, a query like the following failed on read with `Parquet error: StructArrayReader out of sync`:
   
   
   ```
   $ datafusion-cli -c "SELECT col, time FROM 'data.parquet' WHERE 1684850057953220316 <= time::bigint"
   DataFusion CLI v27.0.0
   Arrow error: External error: Arrow: Parquet argument error: Parquet error: StructArrayReader out of sync in read_records, expected 0 skipped, got 11
   ```
   
   The workaround in DataFusion is to disable the page index:
   
   ```
   ❯ set datafusion.execution.parquet.enable_page_index = false;
   ```
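   
   For anyone applying this, here is a rough sketch of how the workaround and the query above might be combined in a single interactive `datafusion-cli` session. It assumes the same `data.parquet` file and column names as the failing example. The setting applies only to the current session.
   
   ```sql
   -- Disable the Parquet page index for this session (the workaround above).
   set datafusion.execution.parquet.enable_page_index = false;
   
   -- Re-run the query that previously failed; file and column names are
   -- taken from the example above.
   SELECT col, time FROM 'data.parquet' WHERE 1684850057953220316 <= time::bigint;
   ```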

