Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/09/21 15:53:27 UTC

[GitHub] [arrow] emkornfield edited a comment on pull request #8177: ARROW-8494: [C++][Parquet] Full support for reading mixed list and structs

emkornfield edited a comment on pull request #8177:
URL: https://github.com/apache/arrow/pull/8177#issuecomment-696205222


   > Just for the record, apart from FixedSizeList, is there anything remaining for full nested Parquet -> Arrow reading?
   
   We need to support LargeList and Map at the schema-inference level, which should be a smaller change (I'm working on a PR). There are a few other JIRAs still open about benchmarking and randomized testing. Past that, there are some open JIRAs about performance improvements:
   *  Computing all offsets/bitmaps together (the JIRA covers a non-vectorized approach). I would expect this to start showing performance improvements for deeply nested structures containing lists.
   *  Using the bitmap-based code that was removed from this PR. For non-list types I think it can be a big performance win (potentially another 20% on our benchmarks) on all platforms, and for at least shallowly nested lists I expect it to be better on native Intel.
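To make "computing offsets/bitmaps together" concrete, here is a minimal sketch, not the Arrow implementation. It assumes a single nullable list of nullable int32 (Parquet schema `optional group my_list (LIST) { repeated group list { optional int32 element; } }`, so max definition level 3 and max repetition level 1) and standard Parquet level semantics. It does one scalar (non-vectorized) pass over the definition/repetition levels and fills the list offsets, the list validity bitmap and the element validity bitmap at the same time. The names `DecodedList`, `AppendBit`, `DecodeLevels` and the `k*` constants are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical thresholds for the assumed schema:
//   optional group my_list (LIST) { repeated group list { optional int32 element; } }
constexpr int16_t kListDefined = 1;   // def >= 1: the list itself is non-null
constexpr int16_t kElementSlot = 2;   // def >= 2: this level contributes an element slot
constexpr int16_t kElementValue = 3;  // def == 3: that element is non-null

struct DecodedList {
  std::vector<int32_t> offsets{0};     // Arrow-style offsets, length = num_lists + 1
  std::vector<uint8_t> list_validity;  // LSB-first bitmap, one bit per list
  std::vector<uint8_t> elem_validity;  // LSB-first bitmap, one bit per element
  int64_t num_lists = 0;
  int64_t num_elements = 0;
};

// Appends one bit at position `index` of an LSB-first bitmap, growing it by a byte as needed.
inline void AppendBit(std::vector<uint8_t>& bitmap, int64_t index, bool value) {
  if (index % 8 == 0) bitmap.push_back(0);
  if (value) bitmap[static_cast<size_t>(index / 8)] |= static_cast<uint8_t>(1u << (index % 8));
}

// Single pass over the levels. Assumes the input starts at a record boundary
// (rep_levels[0] == 0) and is well formed (rep == 1 implies def >= kElementSlot).
DecodedList DecodeLevels(const std::vector<int16_t>& def_levels,
                         const std::vector<int16_t>& rep_levels) {
  assert(def_levels.size() == rep_levels.size());
  DecodedList out;
  for (size_t i = 0; i < def_levels.size(); ++i) {
    const int16_t def = def_levels[i];
    const int16_t rep = rep_levels[i];
    if (rep == 0) {
      // rep == 0 starts a new top-level list; it begins empty (end offset == start offset).
      AppendBit(out.list_validity, out.num_lists, def >= kListDefined);
      ++out.num_lists;
      out.offsets.push_back(static_cast<int32_t>(out.num_elements));
    }
    if (def >= kElementSlot) {
      // This level is an element of the current list (null or not); extend the list by one.
      AppendBit(out.elem_validity, out.num_elements, def >= kElementValue);
      ++out.num_elements;
      out.offsets.back() = static_cast<int32_t>(out.num_elements);
    }
  }
  return out;
}
```

For example, the records `[[1, null], null, [], [2]]` have definition levels `{3, 2, 0, 1, 3}` and repetition levels `{0, 1, 0, 0, 0}`, which decode to offsets `{0, 2, 2, 2, 3}`, list validity bits 1,0,1,1 and element validity bits 1,0,1. The sketch uses 32-bit offsets as in Arrow's List type; a LargeList variant would use 64-bit offsets instead.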
   
   There is also an unrelated bug on the write side, https://github.com/apache/arrow/pull/8219, which I asked @wesm to review (it is based on some changes in this PR).
   
   

