Posted to github@arrow.apache.org by "WillDyson (via GitHub)" <gi...@apache.org> on 2023/06/02 14:56:29 UTC

[GitHub] [arrow] WillDyson commented on issue #26807: [Python] pyarrow.fs.HadoopFileSystem cannot access Azure Data Lake (ADLS)

WillDyson commented on issue #26807:
URL: https://github.com/apache/arrow/issues/26807#issuecomment-1573870542

   ABFS URIs take the following form:
   abfs://<container_name>@<account_name>.dfs.core.windows.net
   
   The sanitisation done in the `from_uri` method appears to strip the container name, changing the URI to:
   abfs://<account_name>.dfs.core.windows.net
   
   This can be seen in the error returned – it is missing the container name.
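
To make the lossy step concrete, here is a small Python sketch using only the standard library's `urllib.parse`. It is not the Arrow code path. It only illustrates, under the assumption that the URI is rebuilt from the scheme and host alone, how the container name (which sits in the user-info part of the authority) disappears. `data` and `bogus` are the placeholder names used elsewhere in this comment.

```python
from urllib.parse import urlsplit

uri = "abfs://data@bogus.dfs.core.windows.net"
parts = urlsplit(uri)

# The container name occupies the user-info slot of the URI authority.
container = parts.username   # "data"
host = parts.hostname        # "bogus.dfs.core.windows.net"

# Assumption for illustration: a sanitisation step that keeps only scheme + host.
sanitised = f"{parts.scheme}://{host}"

print(sanitised)  # abfs://bogus.dfs.core.windows.net
```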
   
   CC: [hdfs.cc](https://github.com/apache/arrow/blob/7ca7724139d3b04161369ffce04cf53e74eec54c/cpp/src/arrow/filesystem/hdfs.cc#L367) (I am not familiar with this codebase, so I may have picked the wrong code path)
   
   A similar exception can be found using the Java client:
   
   ```
   scala> FileSystem.get(new URI("abfs://bogus.dfs.core.windows.net"), new Configuration())
   23/06/02 14:50:26 WARN fs.FileSystem: Failed to initialize fileystem abfs://bogus.dfs.core.windows.net: abfs://bogus.dfs.core.windows.net has invalid authority.
   org.apache.hadoop.fs.azurebfs.contracts.exceptions.InvalidUriAuthorityException: abfs://bogus.dfs.core.windows.net has invalid authority.
     at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.authorityParts(AzureBlobFileSystemStore.java:334)
     at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(AzureBlobFileSystemStore.java:202)
     at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:195)
     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3452)
     at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:162)
     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3557)
     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3504)
     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:522)
     ... 59 elided
   
   ```
   
   Interestingly, this all appears to happen before a connection to Azure is attempted, so you may not need an ADLS Gen2 container to reproduce this particular issue.
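
Because the failure is raised while the URI is being parsed, the same condition can be checked locally without any Azure credentials. Below is a rough Python sketch of such a check. `has_valid_abfs_authority` is a hypothetical helper that only approximates the rule implied by the exception above (the authority must look like `<container_name>@<account_name>...`); it is not Hadoop's actual validation logic.

```python
from urllib.parse import urlsplit

def has_valid_abfs_authority(uri: str) -> bool:
    """Return True if the authority has the form <container>@<account host>."""
    authority = urlsplit(uri).netloc
    container, sep, account = authority.partition("@")
    # Require an '@' with non-empty text on both sides.
    return bool(sep) and bool(container) and bool(account)

print(has_valid_abfs_authority("abfs://bogus.dfs.core.windows.net"))       # False
print(has_valid_abfs_authority("abfs://data@bogus.dfs.core.windows.net"))  # True
```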
   
   If we include a valid authority, the FileSystem is returned:
   
   ```
   scala> FileSystem.get(new URI("abfs://data@bogus.dfs.core.windows.net"), new Configuration())
   res0: org.apache.hadoop.fs.FileSystem = AzureBlobFileSystem{uri=abfs://data@bogus.dfs.core.windows.net, user='wdyson', primaryUserGroup='wdyson'[fs.azure.capability.readahead.safe]}
   ```
   
   The wrapper around libhdfs should be modified to retain the container name before the @.
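
As a sketch of the intended behaviour (in Python rather than the C++ of hdfs.cc, and not the actual Arrow implementation), a URI-normalising step could keep the whole authority, including any user-info before the `@`, instead of only the host. `normalise_fs_uri` is a hypothetical name chosen for this example.

```python
from urllib.parse import urlsplit

def normalise_fs_uri(uri: str) -> str:
    """Strip path, query and fragment but keep the full authority."""
    parts = urlsplit(uri)
    # netloc still contains "<container>@" if it was present; hostname would not.
    return f"{parts.scheme}://{parts.netloc}"

print(normalise_fs_uri("abfs://data@bogus.dfs.core.windows.net/some/dir"))
# abfs://data@bogus.dfs.core.windows.net
```

URIs without user-info, such as a plain `hdfs://host:port/path`, pass through this sketch with the host and port intact.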

