Posted to common-dev@hadoop.apache.org by "Thomas Marquardt (JIRA)" <ji...@apache.org> on 2018/09/01 00:42:00 UTC

[jira] [Reopened] (HADOOP-15547) WASB: improve listStatus performance

     [ https://issues.apache.org/jira/browse/HADOOP-15547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Marquardt reopened HADOOP-15547:
---------------------------------------

Reactivating for branch-2 backport.

> WASB: improve listStatus performance
> ------------------------------------
>
>                 Key: HADOOP-15547
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15547
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/azure
>    Affects Versions: 2.9.1, 3.0.2
>            Reporter: Thomas Marquardt
>            Assignee: Thomas Marquardt
>            Priority: Major
>             Fix For: 3.1.1
>
>         Attachments: HADOOP-15547-004.patch, HADOOP-15547-004.patch, HADOOP-15547.001.patch, HADOOP-15547.002.patch, HADOOP-15547.003.patch
>
>
> The WASB implementation of FileSystem.listStatus is very slow because it removes duplicates with a quadratic, O(n^2), algorithm, and it uses too much memory because of the extra conversion from BlobListItem to FileMetadata to FileStatus. It takes over 30 minutes to list 700,000 files.
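The duplicate-removal cost described in the issue can be illustrated with a minimal Java sketch. It assumes listing entries are identified by plain path strings; the class ListingDedup and its methods are hypothetical and are neither the WASB code nor the HADOOP-15547 patch. The first method checks every entry against an ArrayList with contains(), a linear scan per entry and therefore quadratic overall. The second uses a LinkedHashSet, which gives expected constant-time membership checks while keeping first-seen order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration only: not taken from hadoop-azure.
final class ListingDedup {

    // Quadratic: ArrayList.contains scans the whole result list for each
    // input path, so n distinct paths cost n(n-1)/2 comparisons.
    static List<String> dedupQuadratic(List<String> paths) {
        List<String> result = new ArrayList<>();
        for (String p : paths) {
            if (!result.contains(p)) {
                result.add(p);
            }
        }
        return result;
    }

    // Expected linear: LinkedHashSet does hash-based membership checks and
    // preserves insertion order, so the output matches dedupQuadratic.
    static List<String> dedupLinear(List<String> paths) {
        return new ArrayList<>(new LinkedHashSet<>(paths));
    }

    public static void main(String[] args) {
        List<String> listing = List.of("dir/a", "dir/b", "dir/a", "dir/c", "dir/b");
        System.out.println(dedupQuadratic(listing)); // prints [dir/a, dir/b, dir/c]
        System.out.println(dedupLinear(listing));    // prints [dir/a, dir/b, dir/c]
    }
}
```

For scale, under this sketch's assumptions: with n = 700,000 distinct paths, the quadratic version performs n(n-1)/2, roughly 2.45 x 10^11 string comparisons, while the hash-based version performs about 700,000 hash lookups.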



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
