Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/06/10 13:39:48 UTC

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7395: ARROW-9089: [Python] A PyFileSystem handler for fsspec-based filesystems

jorisvandenbossche commented on a change in pull request #7395:
URL: https://github.com/apache/arrow/pull/7395#discussion_r438129306



##########
File path: python/pyarrow/tests/test_fs.py
##########
@@ -324,6 +350,14 @@ def hdfs(request, hdfs_connection):
         pytest.lazy_fixture('py_localfs'),
         id='PyFileSystem(ProxyHandler(LocalFileSystem()))'
     ),
+    pytest.param(
+        pytest.lazy_fixture('py_fsspec_localfs'),
+        id='PyFileSystem(FSSpecHandler(fsspec.LocalFileSystem()))'
+    ),
+    # pytest.param(
+    #     pytest.lazy_fixture('py_fsspec_memoryfs'),
+    #     id='PyFileSystem(FSSpecHandler(fsspec.filesystem("memory")))'

Review comment:
       I still need to clean this up before merging. I added these parameters for testing, but several tests fail because neither the in-memory filesystem nor s3fs fully follows the fsspec spec, so not all "generic" tests pass (for example, creating or removing nested directories).
   
   I opened issues for the in-memory filesystem: https://github.com/intake/filesystem_spec/issues/314, https://github.com/intake/filesystem_spec/issues/313, and there is an existing one for s3fs: https://github.com/dask/s3fs/issues/245
   
   That said, I suppose that in practice, when it comes to reading files (e.g. in the dataset API or in `parquet.read_table`), those limitations won't necessarily be a problem.



