Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/09 20:58:31 UTC

[GitHub] [arrow-datafusion] houqp edited a comment on pull request #1392: Fix index out of bounds for stats on nested fields

houqp edited a comment on pull request #1392:
URL: https://github.com/apache/arrow-datafusion/pull/1392#issuecomment-989633614


   Sorry for the late reply. @andrei-ionescu, the error you are hitting is caused by the issue I described in https://github.com/apache/arrow-datafusion/pull/1392#issuecomment-985333246. Fundamentally, it comes from a difference in how Arrow and Parquet handle nested struct fields.
   
   @lst-codes, managing stats in a nested data structure could fix the problem. However, inspired by https://github.com/apache/arrow/pull/11704, I think it would be more efficient to resolve the nested column key path during planning by traversing the `Expr::GetIndexedField` expression, and then load only the corresponding Parquet column stats into memory. This way, we can skip columns that the query does not access.
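
   To illustrate the traversal idea, here is a rough sketch rather than working DataFusion code. The `Expr` enum below is a hypothetical, simplified stand-in (the real `Expr` has many more variants and its `GetIndexedField` key is not a plain `String`), and `resolve_column_path` is a made-up helper name. It walks nested field accesses down to the root column and returns the key path root-first, which could then be used to look up only the matching Parquet column stats.

```rust
// Hypothetical, simplified stand-in for DataFusion's `Expr`.
// Assumption: keys are plain strings; the real enum differs.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// A top-level column reference, e.g. `a`.
    Column(String),
    /// A nested field access, e.g. `expr["key"]`.
    GetIndexedField { expr: Box<Expr>, key: String },
    /// Any other expression; it does not resolve to a column path.
    Literal(i64),
}

/// Resolve a chain of nested field accesses to a root-first key path,
/// e.g. `a["b"]["c"]` becomes `["a", "b", "c"]`.
/// Returns `None` if the chain does not end in a column reference.
fn resolve_column_path(expr: &Expr) -> Option<Vec<String>> {
    let mut path = Vec::new();
    let mut current = expr;
    loop {
        match current {
            Expr::GetIndexedField { expr, key } => {
                // Keys are collected innermost-last, so push and reverse later.
                path.push(key.clone());
                current = &**expr;
            }
            Expr::Column(name) => {
                path.push(name.clone());
                path.reverse();
                return Some(path);
            }
            Expr::Literal(_) => return None,
        }
    }
}
```

   With something like this, the planner would only need stats for the resolved paths, and every other Parquet column could be skipped. How the path maps onto Parquet's leaf column names is left out here and would need to follow the Arrow/Parquet schema mapping.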

