Posted to jira@arrow.apache.org by "Lubo Slivka (Jira)" <ji...@apache.org> on 2022/03/18 13:22:00 UTC
[jira] [Created] (ARROW-15969) [Python] Add conversion from RecordBatchFileReader to RecordBatchReader
Lubo Slivka created ARROW-15969:
-----------------------------------
Summary: [Python] Add conversion from RecordBatchFileReader to RecordBatchReader
Key: ARROW-15969
URL: https://issues.apache.org/jira/browse/ARROW-15969
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Lubo Slivka
The suggested improvement is to introduce a conversion/adapter so that all batches from a RecordBatchFileReader can be read one by one, in a single pass, through the RecordBatchReader interface.
Perhaps a new instance method, RecordBatchFileReader.to_reader()? This would follow the precedent of, for instance, pyarrow.flight.MetadataRecordBatchReader, which also has to_reader().
*Motivation*
Record batches serialized in the IPC file format can be read using RecordBatchFileReader. The interface of this reader is incompatible with RecordBatchReader.
This impacts, for instance, the Flight RPC DoGet, where it is not possible to send out all data efficiently (e.g. fully in C++) using pyarrow.flight.RecordBatchStream. However, there may be other use cases where client code wants to read data batch by batch transparently, without caring about the serialization format.
Further background is here: [https://lists.apache.org/thread/b9jwk103fgxfo4kct12t00ymdft7bklb]