Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2020/11/02 15:36:00 UTC

[jira] [Comment Edited] (ARROW-10433) [Python] pyarrow doesn't work with s3fs>=0.5

    [ https://issues.apache.org/jira/browse/ARROW-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17224739#comment-17224739 ] 

Joris Van den Bossche edited comment on ARROW-10433 at 11/2/20, 3:35 PM:
-------------------------------------------------------------------------

Ah, I suppose the problems you are starting to see with pyarrow 2.0 have the same underlying cause as the Windows path issues I was debugging in ARROW-10462. The change is that fsspec stopped subclassing pyarrow.filesystem.FileSystem when pyarrow >= 2.0 is installed (because we now handle fsspec filesystems better, and because pyarrow.filesystem is deprecated).

But the code path that you are moving in the PR only runs when the passed filesystem is _not_ a pyarrow filesystem subclass. So with pyarrow 1.0 (where fsspec-based filesystems subclassed pyarrow.filesystem.FileSystem), this code was never run in practice.
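
To illustrate why that branch only became reachable with pyarrow 2.0, here is a minimal sketch of a subclass-based dispatch. All class and function names below are hypothetical stand-ins, not pyarrow's or fsspec's actual code; the only thing taken from the discussion is the idea that the wrapping branch is skipped whenever the passed object already subclasses the legacy filesystem base class.

```python
class LegacyFileSystem:
    """Hypothetical stand-in for pyarrow.filesystem.FileSystem."""


class OldStyleFsspecFS(LegacyFileSystem):
    """Stand-in for an fsspec filesystem that still subclasses the legacy base
    (the situation described for pyarrow 1.0)."""


class NewStyleFsspecFS:
    """Stand-in for an fsspec filesystem that no longer subclasses the legacy
    base (the situation described for pyarrow >= 2.0)."""


class WrappedFileSystem(LegacyFileSystem):
    """Hypothetical adapter applied to objects that are not legacy subclasses."""

    def __init__(self, fs):
        self.fs = fs


def ensure_filesystem(fs):
    """Return fs unchanged if it is a legacy subclass, otherwise wrap it."""
    if isinstance(fs, LegacyFileSystem):
        # Taken for old-style fsspec filesystems: the wrapping code never runs.
        return fs
    # Only reached once fsspec filesystems stopped subclassing the legacy base.
    return WrappedFileSystem(fs)


old_fs = OldStyleFsspecFS()
new_fs = NewStyleFsspecFS()
old_result = ensure_filesystem(old_fs)
new_result = ensure_filesystem(new_fs)
```

In this sketch `old_result` is the original object, while `new_result` is a wrapper, so any bug in the wrapping branch would only show up for the second kind of filesystem.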

I am still trying to see where this is not covered by our tests, though.



> [Python] pyarrow doesn't work with s3fs>=0.5
> --------------------------------------------
>
>                 Key: ARROW-10433
>                 URL: https://issues.apache.org/jira/browse/ARROW-10433
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 2.0.0
>            Reporter: Marius van Niekerk
>            Priority: Major
>
> s3fs has moved to using asyncio underneath.
> Unfortunately, pyarrow relies on internal private methods,
> as seen here:
> https://github.com/apache/arrow/blob/478286658055bb91737394c2065b92a7e92fb0c1/python/pyarrow/filesystem.py#L412
> This _ls has been changed to an asynchronous coroutine in more recent versions of s3fs.
>  
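
The failure mode described in the report can be sketched without s3fs or pyarrow. The classes below are hypothetical stand-ins; they only assume what the report states, namely that a private `_ls` method changed from a regular method into a coroutine. A caller that still invokes it synchronously gets back a coroutine object instead of a listing.

```python
import asyncio
import inspect


class SyncFS:
    """Hypothetical filesystem whose private _ls is a regular method."""

    def _ls(self, path):
        return [f"{path}/a.parquet", f"{path}/b.parquet"]


class AsyncFS:
    """Hypothetical filesystem whose private _ls has become a coroutine."""

    async def _ls(self, path):
        return [f"{path}/a.parquet", f"{path}/b.parquet"]


def list_directory(fs, path):
    """Caller that assumes _ls is synchronous and returns a list."""
    return fs._ls(path)


sync_result = list_directory(SyncFS(), "bucket/data")

# Same call against the async variant: no listing, just an un-awaited coroutine.
async_result = list_directory(AsyncFS(), "bucket/data")
is_coroutine = inspect.iscoroutine(async_result)

# The listing only materializes once the coroutine is actually run.
awaited_result = asyncio.run(async_result)
```

Relying on a public, synchronous method of the filesystem object instead of a private one would avoid this class of breakage, since private methods carry no compatibility guarantee.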


