You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by "Hadoop QA (JIRA)" <ji...@apache.org> on 2014/05/28 22:27:02 UTC
[jira] [Commented] (HADOOP-10634) Add recursive list apis to FileSystem to give implementations an opportunity for optimization

    [ https://issues.apache.org/jira/browse/HADOOP-10634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011531#comment-14011531 ] 

Hadoop QA commented on HADOOP-10634:
------------------------------------

{color:green}+1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12647191/HADOOP-10634.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 3 new or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in hadoop-common-project/hadoop-common.

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3979//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3979//console

This message is automatically generated.

> Add recursive list apis to FileSystem to give implementations an opportunity for optimization
> ---------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-10634
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10634
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>    Affects Versions: 2.4.0
>            Reporter: Sumit Kumar
>         Attachments: HADOOP-10634.patch
>
>
> Currently different code flows in hadoop use recursive listing to discover files/folders in a given path. For example in FileInputFormat (both mapreduce and mapred implementations) this is done while calculating splits. They however do this by doing listing level by level. That means to discover files in /foo/bar means they do listing at /foo/bar first to get the immediate children, then make the same call on all immediate children for /foo/bar to discover their immediate children and so on. This doesn't scale well for fs implementations like s3 because every listStatus call ends up being a webservice call to s3. In cases where large number of files are considered for input, this makes getSplits() call slow. 
> This patch adds a new set of recursive list apis that give opportunity to the s3 fs implementation to optimize. The behavior remains the same for other implementations (that is a default implementation is provided for other fs so they don't have to implement anything new). However for s3 it provides a simple change (as shown in the patch) to improve listing performance.



--
This message was sent by Atlassian JIRA
(v6.2#6252)