Posted to dev@arrow.apache.org by "Wes McKinney (Jira)" <ji...@apache.org> on 2020/03/28 00:52:00 UTC

[jira] [Created] (ARROW-8250) [C++] Add "random access" / slice read API to RecordBatchFileReader

Wes McKinney created ARROW-8250:
-----------------------------------

             Summary: [C++] Add "random access" / slice read API to RecordBatchFileReader
                 Key: ARROW-8250
                 URL: https://issues.apache.org/jira/browse/ARROW-8250
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Wes McKinney
             Fix For: 1.0.0


If you want to read only a small section of a file, there is currently no easy way to determine which record batches need "rehydrating" (deserializing).

I would propose the following:

* A way to cheaply read all of the RecordBatch metadata (and cache it, so this doesn't have to be done more than once) without deserializing the record batch data structures themselves
* Based on that metadata, you can then determine the range of batches that need to be rehydrated, and slice them accordingly to produce the Table of interest
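
As a rough sketch of the second step, assuming the cached metadata gives us the row count of each record batch: the helper below maps a requested row range onto the batches that overlap it, along with the offset and length to slice from each. The names BatchSlice and PlanSliceRead are hypothetical and are not part of the existing Arrow C++ API; this only illustrates the bookkeeping, not how the reader would actually be implemented.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical plan entry: which batch to rehydrate and which rows to keep.
struct BatchSlice {
  int batch_index;  // position of the record batch in the file
  int64_t offset;   // first row to keep, relative to the start of the batch
  int64_t length;   // number of rows to keep from this batch
};

// Hypothetical helper: given per-batch row counts (assumed to come from
// cached metadata), return the batches overlapping rows
// [start, start + length) and the slice to take from each.
std::vector<BatchSlice> PlanSliceRead(const std::vector<int64_t>& batch_num_rows,
                                      int64_t start, int64_t length) {
  std::vector<BatchSlice> plan;
  if (start < 0 || length <= 0) return plan;

  const int64_t end = start + length;
  int64_t batch_begin = 0;  // global row index where the current batch starts
  for (int i = 0; i < static_cast<int>(batch_num_rows.size()); ++i) {
    if (batch_begin >= end) break;  // past the requested range, stop early
    const int64_t batch_end = batch_begin + batch_num_rows[i];
    const int64_t lo = std::max(start, batch_begin);
    const int64_t hi = std::min(end, batch_end);
    if (lo < hi) {
      plan.push_back({i, lo - batch_begin, hi - lo});
    }
    batch_begin = batch_end;
  }
  return plan;
}
```

For a file with batches of 100, 50 and 200 rows, asking for 60 rows starting at row 120 would only touch the second and third batches; the first batch never needs to be deserialized. A range that runs past the end of the file is simply truncated to the rows that exist.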

This functionality could also be lifted into the Feather read APIs.


