Posted to common-issues@hadoop.apache.org by "Steve Loughran (Jira)" <ji...@apache.org> on 2021/03/16 11:18:00 UTC
[jira] [Commented] (HADOOP-17428) ABFS: Implementation for getContentSummary
[ https://issues.apache.org/jira/browse/HADOOP-17428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302431#comment-17302431 ]
Steve Loughran commented on HADOOP-17428:
-----------------------------------------
It turns out that Hive does use this for some of its calculations of the size of unmanaged tables, so its performance *does* matter, at least until/unless we can move Hive off this. Apparently Spark uses it too.
Personally, I don't think they should be using it as it is doing an expensive treewalk, but they probably aren't aware of its cost.
Could abfs do some of the treewalk in parallel? I think for s3a I'd go to the deep listing (listFiles(path, recursive=true)) and make up some directory count number.
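To illustrate the deep-listing idea, here is a rough sketch, not the s3a or abfs implementation. It assumes a recursive listing returns only files with their lengths, modelled here as a plain Map instead of the Hadoop FileSystem API, and it infers a directory count from the distinct parent paths of those files. The class and method names (DeepListingSummary, Summary, summarize) are hypothetical, and counting the root directory itself is an assumption.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DeepListingSummary {

    /** Hypothetical result holder: file count, inferred directory count, bytes used. */
    static final class Summary {
        final long fileCount;
        final long directoryCount;
        final long totalLength;

        Summary(long fileCount, long directoryCount, long totalLength) {
            this.fileCount = fileCount;
            this.directoryCount = directoryCount;
            this.totalLength = totalLength;
        }
    }

    /**
     * Builds a summary from a flat (deep) listing of files.
     *
     * @param root  directory being summarized, e.g. "/data"
     * @param files file path to length, as a recursive listing might return
     */
    static Summary summarize(String root, Map<String, Long> files) {
        // Strip trailing slashes; the filesystem root "/" becomes "".
        String base = root;
        while (base.endsWith("/")) {
            base = base.substring(0, base.length() - 1);
        }
        String prefix = base + "/";

        Set<String> dirs = new HashSet<>();
        dirs.add(base); // assumption: the root directory itself is counted

        long fileCount = 0;
        long totalLength = 0;
        for (Map.Entry<String, Long> entry : files.entrySet()) {
            String path = entry.getKey();
            if (!path.startsWith(prefix)) {
                continue; // not under the root being summarized
            }
            fileCount++;
            totalLength += entry.getValue();

            // Record every ancestor directory strictly below the root.
            int slash = path.lastIndexOf('/');
            while (slash > base.length()) {
                dirs.add(path.substring(0, slash));
                slash = path.lastIndexOf('/', slash - 1);
            }
        }
        return new Summary(fileCount, dirs.size(), totalLength);
    }

    public static void main(String[] args) {
        Map<String, Long> listing = Map.of(
                "/data/a/b/f1", 10L,
                "/data/a/f2", 20L,
                "/data/f3", 30L);
        Summary s = summarize("/data", listing);
        System.out.println(s.fileCount + " files, "
                + s.directoryCount + " directories, "
                + s.totalLength + " bytes");
    }
}
```

Because only files appear in such a listing, empty directories are never seen, which is why the directory count here is an estimate and not an exact figure.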
> ABFS: Implementation for getContentSummary
> ------------------------------------------
>
> Key: HADOOP-17428
> URL: https://issues.apache.org/jira/browse/HADOOP-17428
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 3.3.0
> Reporter: Sumangala Patki
> Assignee: Sumangala Patki
> Priority: Major
>
> Adds implementation for HDFS method getContentSummary, which takes in a Path argument and returns details such as file/directory count and space utilized under that path.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)