Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2020/12/14 13:41:00 UTC
[jira] [Commented] (ARROW-10872) [Python] pyarrow.fs.HadoopFileSystem cannot access Azure Data Lake (ADLS)
[ https://issues.apache.org/jira/browse/ARROW-10872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17249002#comment-17249002 ]
Joris Van den Bossche commented on ARROW-10872:
-----------------------------------------------
[~jjgalvez] thanks a lot for the report!
It's difficult for me to test whether your suggestion would work (and for other Arrow developers as well, since we often don't have a Hadoop or Azure filesystem at our disposal to test against). Would you be able to try your suggestion yourself and see if that works for you? A PR would then also be very welcome.
cc [~kszucs]
> [Python] pyarrow.fs.HadoopFileSystem cannot access Azure Data Lake (ADLS)
> -------------------------------------------------------------------------
>
> Key: ARROW-10872
> URL: https://issues.apache.org/jira/browse/ARROW-10872
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 2.0.0
> Reporter: Juan Galvez
> Priority: Major
>
> It's not possible to open an `abfs://` or `abfss://` URI with pyarrow.fs.HadoopFileSystem.
> Using HadoopFileSystem.from_uri(path) does not work: libhdfs throws an error saying that the authority is invalid (I checked that this is because the authority string is empty).
> Note that the legacy pyarrow.hdfs.HadoopFileSystem interface works, for example with either of:
> * pyarrow.hdfs.HadoopFileSystem(host="abfs://xxx@xxx.dfs.core.windows.net")
> * pyarrow.hdfs.connect(host="abfs://xxx@xxx.dfs.core.windows.net")
> I believe the new interface should work too when the full URI is passed as "host" to the `pyarrow.fs.HadoopFileSystem` constructor. However, the constructor wrongly prepends "hdfs://" to it: [https://github.com/apache/arrow/blob/25c736d48dc289f457e74d15d05db65f6d539447/python/pyarrow/_hdfs.pyx#L64]
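
To make the reported behaviour concrete, here is a minimal sketch of the kind of host normalization the description suggests. It is not the actual pyarrow code: both function names are hypothetical, and the rule "leave anything that already has a scheme alone" is only one possible reading of the suggestion, which has not been tested against a real Hadoop or ADLS setup.

```python
def prepend_hdfs_always(host: str) -> str:
    """Hypothetical stand-in for the behaviour described in the report:
    the scheme "hdfs://" is put in front of whatever host was passed."""
    return "hdfs://" + host


def prepend_hdfs_if_no_scheme(host: str) -> str:
    """Hypothetical alternative: only add "hdfs://" when the host carries
    no scheme of its own, so abfs:// and abfss:// URIs pass through.

    Assumption: "default" is kept as-is, since it is the special value
    that asks libhdfs to use the filesystem configured in core-site.xml.
    """
    if host == "default" or "://" in host:
        return host
    return "hdfs://" + host


adls_host = "abfs://xxx@xxx.dfs.core.windows.net"

# The unconditional prefix produces a doubled scheme.
print(prepend_hdfs_always(adls_host))

# The conditional prefix leaves the ADLS URI untouched,
# while a bare host name still gets the hdfs:// scheme.
print(prepend_hdfs_if_no_scheme(adls_host))
print(prepend_hdfs_if_no_scheme("namenode"))
```

Whether libhdfs then accepts the untouched `abfs://` host through the new interface the same way it does through the legacy one would still need to be confirmed by someone with access to an ADLS account.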