Posted to hdfs-dev@hadoop.apache.org by "Thomas Newton (Jira)" <ji...@apache.org> on 2023/01/18 14:17:00 UTC
[jira] [Created] (HDFS-16894) Expose `listStatus(Path path, String startFrom)` on `AzureBlobFileSystem`
Thomas Newton created HDFS-16894:
------------------------------------
Summary: Expose `listStatus(Path path, String startFrom)` on `AzureBlobFileSystem`
Key: HDFS-16894
URL: https://issues.apache.org/jira/browse/HDFS-16894
Project: Hadoop HDFS
Issue Type: Improvement
Components: fs/azure
Affects Versions: 3.3.4, 3.3.2
Reporter: Thomas Newton
Fix For: 3.3.2
When working with Azure Blob Storage, listing operations can often be quite slow, even on storage accounts with the hierarchical namespace enabled.
This can be mitigated by listing only a specific subset of directories using a function like [https://hadoop.apache.org/docs/r3.3.4/api/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.html#listStatus-org.apache.hadoop.fs.Path-java.lang.String-org.apache.hadoop.fs.azurebfs.utils.TracingContext-], which accepts a `startFrom` argument and lists all files in order starting from there.
I'm wondering if we could add a method to `AzureBlobFileSystem`, something like:
```
public FileStatus[] listStatus(final Path f, final String startFrom) throws IOException
```
This would expose functionality that already exists on the underlying `AzureBlobFileSystemStore`. My understanding from reading a bit of the code is that users should mainly be dealing with `AzureBlobFileSystem` instances, which also seem easier to use to me, hence the benefit of exposing it on `AzureBlobFileSystem`.
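To make the intended `startFrom` behaviour concrete, here is a minimal, self-contained sketch. It is not Hadoop code: the class `StartFromListingSketch`, its methods, and the file names are hypothetical, and it assumes that "in order" means lexicographic order and that the `startFrom` entry itself is included, neither of which I have confirmed against the store implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical in-memory model of one directory; not part of Hadoop.
class StartFromListingSketch {
    // Child names kept in sorted order, standing in for an ordered listing.
    private final TreeSet<String> children = new TreeSet<>();

    void add(String name) {
        children.add(name);
    }

    // Full listing: returns every entry in the directory.
    List<String> listStatus() {
        return new ArrayList<>(children);
    }

    // Listing with startFrom: returns only entries at or after startFrom.
    // ASSUMPTION: inclusive bound and lexicographic (String natural) order.
    List<String> listStatus(String startFrom) {
        return new ArrayList<>(children.tailSet(startFrom, true));
    }

    public static void main(String[] args) {
        StartFromListingSketch dir = new StartFromListingSketch();
        // Made-up, zero-padded names so that sorted order matches numeric order.
        dir.add("0001.json");
        dir.add("0002.json");
        dir.add("0003.json");
        // Skips everything that sorts before "0002.json".
        System.out.println(dir.listStatus("0002.json"));
    }
}
```

The point of the overload is that a caller who only needs the tail of a large directory does not have to fetch and discard the entries before `startFrom`.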
I'm very unfamiliar with Java, but I'm told that keeping strictly to interfaces is strongly preferred. However, I can see some examples already on `AzureBlobFileSystem` that do not belong to any interface (e.g. `breakLease`), so I'm hoping it's acceptable to add a method like the one I described for just this one `FileSystem` implementation.
The specific motivation for this is to unblock [https://github.com/delta-io/delta/issues/1568]
I would be willing to contribute this if maintainers think the plan is reasonable.