Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2018/03/14 16:09:00 UTC

[jira] [Updated] (ARROW-2307) [Python] Unable to read arrow stream containing 0 record batches

     [ https://issues.apache.org/jira/browse/ARROW-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-2307:
--------------------------------
    Component/s:     (was: C)

> [Python] Unable to read arrow stream containing 0 record batches
> ----------------------------------------------------------------
>
>                 Key: ARROW-2307
>                 URL: https://issues.apache.org/jira/browse/ARROW-2307
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.8.0
>            Reporter: Benjamin Duffield
>            Assignee: Wes McKinney
>            Priority: Major
>
> I'm using the Java Arrow library to create an Arrow stream with the stream writer.
>
> Sometimes I have nothing to serialize, so I don't write any record batches. The stream then consists of just a schema message:
> {code}
> <SCHEMA>
> <EOS [optional]: int32>
> {code}
> I can deserialize this stream correctly with the Java stream reader, but reading it with Python fails. This code:
> {code}
> import pyarrow as pa
> # ...
> reader = pa.open_stream(stream)
> df = reader.read_all().to_pandas()
> {code}
> produces
> {code}
>   File "ipc.pxi", line 307, in pyarrow.lib._RecordBatchReader.read_all
>   File "error.pxi", line 77, in pyarrow.lib.check_status
> ArrowInvalid: Must pass at least one record batch
> {code}
> That is, we hit the check at https://github.com/apache/arrow/blob/apache-arrow-0.8.0/cpp/src/arrow/table.cc#L284
> Our current workaround is to always serialize at least one record batch, even if it is empty. However, it would be better to either support streams without record batches, or explicitly disallow them and make the Java behaviour match.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)