Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2018/08/21 16:21:00 UTC

[jira] [Resolved] (ARROW-3098) [Python] BufferReader doesn't adhere to the seek protocol

     [ https://issues.apache.org/jira/browse/ARROW-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-3098.
---------------------------------
    Resolution: Fixed

Issue resolved by pull request 2454
[https://github.com/apache/arrow/pull/2454]

> [Python] BufferReader doesn't adhere to the seek protocol
> ---------------------------------------------------------
>
>                 Key: ARROW-3098
>                 URL: https://issues.apache.org/jira/browse/ARROW-3098
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.10.0
>            Reporter: Björn Andersson
>            Assignee: Antoine Pitrou
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.11.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> I have a script that creates a Parquet file, writes it to a {{BufferOutputStream}}, and then wraps the resulting buffer in a {{BufferReader}}, intending to pass it to code that accepts a file-like object and uploads it elsewhere. However, that code determines the file size by seeking to the end of the file, e.g.:
> {code:python}
> reader.seek(0, 2)
> size = reader.tell()
> reader.seek(0)
> {code}
>
> But when I do that, the following exception is raised:
>
> {code}
> pyarrow/io.pxi:209: in pyarrow.lib.NativeFile.seek
> ???
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> > ???
> E pyarrow.lib.ArrowIOError: position out of bounds
> {code}
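For reference, the Python file-object convention (see {{io.IOBase.seek}}) is that {{whence}} values 0, 1 and 2 ({{io.SEEK_SET}}, {{io.SEEK_CUR}}, {{io.SEEK_END}}) position the stream relative to the start, the current position and the end, and that {{seek()}} returns the new absolute position. Seeking to exactly the end, as {{seek(0, 2)}} does, is therefore valid; {{io.BytesIO}} even allows positions past the end, where a later {{read()}} simply returns {{b''}}. The class below is a hypothetical, minimal sketch of a reader that follows these rules ({{MinimalBufferReader}} is not a pyarrow API, and this is not how {{BufferReader}} itself was fixed).

```python
import io


class MinimalBufferReader:
    """Hypothetical read-only, seekable reader over a bytes object."""

    def __init__(self, data):
        self._data = bytes(data)
        self._pos = 0

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        # Resolve the target relative to the start, current position, or end.
        if whence == io.SEEK_SET:
            target = offset
        elif whence == io.SEEK_CUR:
            target = self._pos + offset
        elif whence == io.SEEK_END:
            target = len(self._data) + offset
        else:
            raise ValueError(f"invalid whence ({whence})")
        # Positions at or past the end are accepted; this sketch rejects
        # only negative resulting positions.
        if target < 0:
            raise ValueError("negative seek position")
        self._pos = target
        # Like io objects, return the new absolute position.
        return self._pos

    def read(self, size=-1):
        # At or past the end, reading yields b'' instead of raising.
        if self._pos >= len(self._data):
            return b''
        if size is None or size < 0:
            end = len(self._data)
        else:
            end = min(self._pos + size, len(self._data))
        chunk = self._data[self._pos:end]
        self._pos = end
        return chunk


# The size check from the report, against the sketch:
reader = MinimalBufferReader(b'hello')
reader.seek(0, io.SEEK_END)  # same as seek(0, 2)
size = reader.tell()         # 5
reader.seek(0)
```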
> For comparison, wrapping the bytes in an {{io.BytesIO}} instead works:
> {code:python}
> import io
>
> import pyarrow as pa
>
>
> def test_arrow_output_stream():
>     output = pa.BufferOutputStream()
>     output.write(b'hello')
>     reader = pa.BufferReader(output.getvalue())
>     reader.seek(0, 2)
>     assert reader.tell() == 5
>
>
> def test_python_io_stream():
>     output = pa.BufferOutputStream()
>     output.write(b'hello')
>     buffer = io.BytesIO(output.getvalue().to_pybytes())
>     reader = io.BufferedRandom(buffer)
>     reader.seek(0, 2)
>     assert reader.tell() == 5
> {code}
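The size check described in the issue can be written as a small helper that relies only on {{seek()}} and {{tell()}} and restores the caller's position afterwards. {{stream_size}} is a hypothetical name, not part of pyarrow; the sketch is exercised with {{io.BytesIO}}, and per this issue a {{pa.BufferReader}} should only be expected to behave the same way in releases that include the fix (Fix For: 0.11.0).

```python
import io


def stream_size(f):
    """Return the total size of a seekable file-like object (hypothetical helper).

    Remembers the current position, jumps to the end with
    whence=io.SEEK_END, reads the offset, then restores the original
    position even if something goes wrong.
    """
    original = f.tell()
    try:
        f.seek(0, io.SEEK_END)
        return f.tell()
    finally:
        f.seek(original, io.SEEK_SET)


buf = io.BytesIO(b'hello')
buf.read(2)
print(stream_size(buf))  # 5
print(buf.tell())        # 2
```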


