Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/06/01 15:55:50 UTC

[GitHub] [arrow] bkietz commented on pull request #7181: ARROW-8799: [C++][Parquet] NestedListReader needs to handle empty item batches

bkietz commented on pull request #7181:
URL: https://github.com/apache/arrow/pull/7181#issuecomment-636937045


   @emkornfield @wesm In adding a unit test, I've become uncertain about the `ColumnReader` contract and whether my solution upholds it:
   
   - [ColumnReader::NextBatch's doc comment](https://github.com/apache/arrow/blob/6716bbd/cpp/src/parquet/arrow/reader.h#L254-L255) states that when no data remains, null should be yielded (which I read as: a null `ChunkedArray`).
   - Instead, the tests [assert that the chunked array contains a single null chunk](https://github.com/apache/arrow/blob/6716bbd25ead03ad4774c8d1caa612a8f66e853c/cpp/src/parquet/arrow/arrow_reader_writer_test.cc#L502-L504).
   - When reading into a dictionary, `LeafReader` does neither of these and instead yields an empty `ChunkedArray` (for which `NestedListReader` on master is unprepared, causing ARROW-8799).
   
   If modifying `NestedListReader` as I have here is unsatisfactory, I could instead change `TransferDictionary` to ensure `LeafReader` yields `ChunkedArray{nullptr}` when it's out of data. What do you think?
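
   To make the three end-of-data shapes above concrete, here is a minimal sketch that tells them apart. It uses hypothetical stand-in types (`Array`, `ChunkedArray`, `EndOfData`, `Classify`, `IsExhausted`), not the real Arrow classes, and it assumes that "empty `ChunkedArray`" means a chunked array with zero chunks.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT the Arrow types.
struct Array {
  long length = 0;
};

struct ChunkedArray {
  std::vector<std::shared_ptr<Array>> chunks;
};

// The shapes a reader might hand back from a NextBatch-style call.
enum class EndOfData {
  kNullChunkedArray,  // the output pointer itself is null (my reading of the doc comment)
  kSingleNullChunk,   // one chunk, and that chunk is null (what the tests assert)
  kNoChunks,          // zero chunks (assumed meaning of "empty" in the dictionary case)
  kHasData            // anything else: there is still data to consume
};

EndOfData Classify(const std::shared_ptr<ChunkedArray>& out) {
  if (out == nullptr) return EndOfData::kNullChunkedArray;
  if (out->chunks.empty()) return EndOfData::kNoChunks;
  if (out->chunks.size() == 1 && out->chunks[0] == nullptr) {
    return EndOfData::kSingleNullChunk;
  }
  return EndOfData::kHasData;
}

// A consumer that tolerates all three conventions treats each as "no more data".
bool IsExhausted(const std::shared_ptr<ChunkedArray>& out) {
  return Classify(out) != EndOfData::kHasData;
}
```

   A consumer written against `IsExhausted` would not care which convention the producer follows; the alternative is to pick one convention and make every reader conform to it.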


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org