Posted to jira@arrow.apache.org by "Lubo Slivka (Jira)" <ji...@apache.org> on 2022/03/18 13:27:00 UTC

[jira] [Updated] (ARROW-15969) [Python] Add conversion from RecordBatchFileReader to RecordBatchReader

     [ https://issues.apache.org/jira/browse/ARROW-15969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lubo Slivka updated ARROW-15969:
--------------------------------
    Description: 
The suggested improvement is to introduce a conversion/adapter so that all batches from RecordBatchFileReader can be read one-by-one using RecordBatchReader.

Perhaps a new instance method, RecordBatchFileReader.to_reader()? This would follow the precedent of, for instance, pyarrow.flight.MetadataRecordBatchReader, which also has to_reader().

*Motivation*

Record batches serialized in the IPC file format can be read using RecordBatchFileReader. The interface of this reader is incompatible with RecordBatchReader.

This affects, for instance, Flight RPC DoGet, where it is not possible to send out all data efficiently (e.g. fully in C++) using pyarrow.flight.RecordBatchStream. There may also be other use cases where client code wants to read data batch-by-batch transparently, without caring about the serialization format.

Further background is here: [https://lists.apache.org/thread/b9jwk103fgxfo4kct12t00ymdft7bklb]

  was:
The suggested improvement is to introduce a conversion/adapter so that all batches from RecordBatchFileReader can be read one-by-one, once using RecordBatchReader.

Perhaps a new instance method, RecordBatchFileReader.to_reader()? This would follow the precedent of, for instance, pyarrow.flight.MetadataRecordBatchReader, which also has to_reader().

*Motivation*

Record batches serialized in the IPC file format can be read using RecordBatchFileReader. The interface of this reader is incompatible with RecordBatchReader.

This affects, for instance, Flight RPC DoGet, where it is not possible to send out all data efficiently (e.g. fully in C++) using pyarrow.flight.RecordBatchStream. There may also be other use cases where client code wants to read data batch-by-batch transparently, without caring about the serialization format.

Further background is here: [https://lists.apache.org/thread/b9jwk103fgxfo4kct12t00ymdft7bklb]


> [Python] Add conversion from RecordBatchFileReader to RecordBatchReader
> -----------------------------------------------------------------------
>
>                 Key: ARROW-15969
>                 URL: https://issues.apache.org/jira/browse/ARROW-15969
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Lubo Slivka
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)