Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2020/12/21 16:35:00 UTC

[jira] [Comment Edited] (ARROW-11000) [Python] Enable random access reading for Python file objects (if supported)

    [ https://issues.apache.org/jira/browse/ARROW-11000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17252954#comment-17252954 ] 

Joris Van den Bossche edited comment on ARROW-11000 at 12/21/20, 4:34 PM:
--------------------------------------------------------------------------

It might also be an issue specific to {{PyFileSystem}} handlers, because after adding some print statements to {{PyReadableFile::ReadAt}}, that method is clearly called for a plain Python file object:

{code}
In [3]: with open("test.parquet", "rb") as f:
   ...:     pq.read_table(f)
   ...: 
Calling PyReadableFile::ReadAt
Called seek successfully
....
{code}
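
A similar check can be sketched from the Python side without recompiling. The snippet below is only an illustration under stated assumptions: {{LoggingFile}} and {{read_at}} are hypothetical names, not part of pyarrow, and {{read_at}} merely mimics the "lock, seek, read" pattern that {{ReadAt}} is described as using. It records which calls reach the underlying file object when several threads read at different offsets.

```python
import io
import threading


class LoggingFile:
    """Hypothetical wrapper: records every seek() and read() on a binary file."""

    def __init__(self, raw):
        self._raw = raw
        self.calls = []  # list of (method name, argument) tuples

    def seek(self, offset, whence=io.SEEK_SET):
        self.calls.append(("seek", offset))
        return self._raw.seek(offset, whence)

    def read(self, nbytes=-1):
        self.calls.append(("read", nbytes))
        return self._raw.read(nbytes)

    def __getattr__(self, name):
        # Delegate everything else (tell, seekable, closed, ...) to the wrapped file.
        return getattr(self._raw, name)


def read_at(f, lock, position, nbytes):
    """Hypothetical analogue of a locked ReadAt: seek and read under one lock."""
    with lock:
        f.seek(position)
        return f.read(nbytes)


# An in-memory file stands in for a real Parquet file here.
f = LoggingFile(io.BytesIO(b"0123456789"))
lock = threading.Lock()
results = {}


def worker(position):
    results[position] = read_at(f, lock, position, 3)


threads = [threading.Thread(target=worker, args=(p,)) for p in (0, 4, 7)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # bytes read at each offset
print(f.calls)  # each seek is immediately followed by its read
```

If a file object returned by a filesystem handler were wrapped the same way, an empty or seek-free {{calls}} list would suggest that the random-access path is not being used. Whether pyarrow accepts such a wrapper as-is is an assumption that would need checking.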


was (Author: jorisvandenbossche):
It might also be an issue specific to {{PyFileSystem}} handlers. Because when adding a print in {{PyReadableFile::ReadAt}}, this is clearly called for a plain python file object:

{code}
In [3]: with open("test.parquet", "rb") as f:
   ...:     pq.read_table(f)
   ...: 
Calling PyReadableFile::ReadAt
Called seek successfully
....
{code}

> [Python] Enable random access reading for Python file objects (if supported)
> ----------------------------------------------------------------------------
>
>                 Key: ARROW-11000
>                 URL: https://issues.apache.org/jira/browse/ARROW-11000
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Priority: Major
>
> {{arrow::py::PyReadableFile::ReadAt}} is documented as thread-safe (it puts a lock on the underlying Python file) and should thus allow random access in parallel code (for example, reading a subset, e.g. a column, of a Parquet file). 
> However, based on experimentation, this doesn't seem to work (e.g. with an s3fs filesystem to read a specific Parquet column


