Posted to jira@arrow.apache.org by "Tim Cooijmans (Jira)" <ji...@apache.org> on 2021/03/17 20:46:00 UTC

[jira] [Commented] (ARROW-2579) [Python] Appending to streamable table file format doesn't seem to work

    [ https://issues.apache.org/jira/browse/ARROW-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17303712#comment-17303712 ] 

Tim Cooijmans commented on ARROW-2579:
--------------------------------------

I found this issue through a Google search. As of this writing, it is not clear how to append from Python, because FileOutputStream does not seem to be exposed by pyarrow.

(A naive use of pyarrow's OSFile fails because it rejects the "ab" file mode, expecting either read or write, and a naive use of the cat utility results in only the most recent data being readable from the file.)

> [Python] Appending to streamable table file format doesn't seem to work
> -----------------------------------------------------------------------
>
>                 Key: ARROW-2579
>                 URL: https://issues.apache.org/jira/browse/ARROW-2579
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.9.0
>            Reporter: Rob Ambalu
>            Priority: Major
>
> As far as I can tell, appending to a streaming file format isn’t currently supported. Is that right?
> RecordBatchStreamWriter always writes the schema up front, and it doesn’t look like a schema is expected mid-file (assuming I’m doing this append test correctly). This is the error I hit when I try to read this file back into Python:
>  Traceback (most recent call last):
>   File "/home/ra7293/rba_arrow_mmap.py", line 9, in <module>
>     table = reader.read_all()
>   File "ipc.pxi", line 302, in pyarrow.lib._RecordBatchReader.read_all
>   File "error.pxi", line 79, in pyarrow.lib.check_status
> pyarrow.lib.ArrowIOError: Message not expected type: record batch, was: 1
>  
> This reader script works fine if I write once / don’t append.
> Seeing as the IO interfaces support Append, streaming should support it as well. (If for whatever reason this can’t be supported, RecordBatchStreamWriter should throw if configured with an OutputStreamer that is attempting to append.)
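
The failure described above can be illustrated with a toy model. This is not Arrow's wire format or pyarrow code: messages are plain tuples, and write_stream and read_all are hypothetical stand-ins for the writer and reader. The numeric message types are assumed to follow the MessageHeader union in Arrow's Message.fbs, where 1 is Schema and 3 is RecordBatch, which would explain the "was: 1" in the traceback.

```python
from enum import IntEnum


class MessageType(IntEnum):
    # Assumed to mirror Arrow's MessageHeader union (Schema = 1, RecordBatch = 3).
    SCHEMA = 1
    RECORD_BATCH = 3


def write_stream(schema, batches):
    """Toy writer: one schema message up front, then the record batches."""
    messages = [(MessageType.SCHEMA, schema)]
    messages += [(MessageType.RECORD_BATCH, batch) for batch in batches]
    return messages


def read_all(messages):
    """Toy reader: accepts a schema only as the first message."""
    if not messages or messages[0][0] is not MessageType.SCHEMA:
        raise IOError("stream must start with a schema")
    rows = []
    for kind, payload in messages[1:]:
        if kind is not MessageType.RECORD_BATCH:
            raise IOError(
                f"Message not expected type: record batch, was: {int(kind)}"
            )
        rows.extend(payload)
    return rows


first = write_stream(["x"], [[1, 2]])
second = write_stream(["x"], [[3]])

# Appending a second stream puts a schema message in the middle of the first.
try:
    read_all(first + second)
    error = None
except IOError as exc:
    error = str(exc)
```

In this model a single stream reads back fine, while the appended one fails on the second schema message, matching the shape of the reported error.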



--
This message was sent by Atlassian Jira
(v8.3.4#803005)