Posted to jira@arrow.apache.org by "Lubo Slivka (Jira)" <ji...@apache.org> on 2022/03/18 13:22:00 UTC

[jira] [Created] (ARROW-15969) [Python] Add conversion from RecordBatchFileReader to RecordBatchReader

Lubo Slivka created ARROW-15969:
-----------------------------------

             Summary: [Python] Add conversion from RecordBatchFileReader to RecordBatchReader
                 Key: ARROW-15969
                 URL: https://issues.apache.org/jira/browse/ARROW-15969
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
            Reporter: Lubo Slivka


The suggested improvement is to introduce a conversion/adapter so that all batches from a RecordBatchFileReader can be read one by one, in a single pass, through the RecordBatchReader interface.

Perhaps a new instance method RecordBatchFileReader.to_reader()? This would follow the example of, for instance, pyarrow.flight.MetadataRecordBatchReader, which also has to_reader().

*Motivation*

Record batches serialized in the IPC file format can be read using RecordBatchFileReader. The interface of this reader is incompatible with RecordBatchReader: it exposes batches by index rather than as a sequential stream.

This affects, for instance, the Flight RPC DoGet, where it is not possible to send out all data efficiently (e.g. fully in C++) using pyarrow.flight.RecordBatchStream. There may also be other use cases where client code wants to read data batch by batch transparently, without caring about the serialization format.

Further background is here: [https://lists.apache.org/thread/b9jwk103fgxfo4kct12t00ymdft7bklb]

 


