Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/18 01:06:18 UTC

[GitHub] [arrow] nevi-me commented on pull request #9240: ARROW-10766: [Rust] [Parquet] Compute nested list definitions

nevi-me commented on pull request #9240:
URL: https://github.com/apache/arrow/pull/9240#issuecomment-761919066


   Hi everyone interested in the Parquet writer.
   
   This PR effectively gives us the ability to compute how to write arbitrarily nested types. As a side effect, nested lists can now also be written.
   There are a few places I still need to tidy up, but they depend on the Arrow reader (ARROW-10391), which unfortunately might be a lot of work on its own. I'm a bit worried that I might have to rework a fair share of the writer to handle nesting correctly; I've already seen instances where we don't have enough information to arrive at the correct solution.
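   For readers less familiar with what "computing definitions" means here, the sketch below shows Dremel-style definition and repetition levels for one column of type nullable list<nullable int32>. It is a simplified illustration, not the code in this PR's levels.rs. It assumes the standard 3-level Parquet LIST layout (optional group col (LIST) { repeated group list { optional int32 element; } }), so the max definition level is 3 and the max repetition level is 1. It handles only a single level of nesting and uses Vec<Option<...>> rather than Arrow offset buffers and validity bitmaps. The function name compute_levels is hypothetical.

```rust
/// Hypothetical sketch: compute (definition levels, repetition levels, values)
/// for a nullable list<nullable int32> column, assuming the 3-level LIST layout:
///   optional group col (LIST) { repeated group list { optional int32 element; } }
/// Max definition level = 3, max repetition level = 1.
fn compute_levels(rows: &[Option<Vec<Option<i32>>>]) -> (Vec<i16>, Vec<i16>, Vec<i32>) {
    let mut def_levels = Vec::new();
    let mut rep_levels = Vec::new();
    let mut values = Vec::new();

    for row in rows {
        match row {
            // The list itself is null: a single slot defined only to level 0.
            None => {
                def_levels.push(0);
                rep_levels.push(0);
            }
            // The list is present but empty: a single slot defined to level 1.
            Some(list) if list.is_empty() => {
                def_levels.push(1);
                rep_levels.push(0);
            }
            Some(list) => {
                for (i, item) in list.iter().enumerate() {
                    // The first item starts a new record (rep 0);
                    // later items repeat at the list's level (rep 1).
                    rep_levels.push(if i == 0 { 0 } else { 1 });
                    match item {
                        // List slot exists but the element is null.
                        None => def_levels.push(2),
                        // Fully defined element: only these produce values.
                        Some(v) => {
                            def_levels.push(3);
                            values.push(*v);
                        }
                    }
                }
            }
        }
    }
    (def_levels, rep_levels, values)
}

fn main() {
    let rows = vec![
        Some(vec![Some(1), None, Some(2)]),
        None,
        Some(vec![]),
        Some(vec![Some(3)]),
    ];
    let (def, rep, vals) = compute_levels(&rows);
    println!("def  = {:?}", def); // [3, 2, 3, 0, 1, 3]
    println!("rep  = {:?}", rep); // [0, 1, 1, 0, 0, 0]
    println!("vals = {:?}", vals); // [1, 2, 3]
}
```

   Note that null and empty lists still emit one level entry each but no value, which is why the value count (3) is smaller than the level count (6). Nesting a struct or another list adds one more level per optional or repeated ancestor, which is where the edge cases mentioned in this comment come from.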
   
   I'll open JIRAs as I go along.
   
   For reviewers, please note:
   
   This has taken me a few months of weekends to get right; I've iterated over various solutions to arrive here.
   The implementation is not optimal (I haven't benchmarked the latest version), but I'm confident that it's correct.
   The extensive tests in levels.rs will allow us to refactor with some confidence.
   
   I've spent far too long on this, so I no longer have fresh eyes on it. I worked through all the edge cases I could think of with lists and structs. I've documented them, but I'll review the doc comments and add more detail where I still feel they're lacking.
   
   Thank you ❤️


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org