Posted to common-dev@hadoop.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/03/14 21:57:45 UTC

[jira] Resolved: (HADOOP-79) listFiles optimization

     [ http://issues.apache.org/jira/browse/HADOOP-79?page=all ]
     
Doug Cutting resolved HADOOP-79:
--------------------------------

    Fix Version: 0.1
     Resolution: Fixed
      Assign To: Konstantin Shvachko

This looks fine to me.  I simplified FSDirectory.isDir() a bit more & committed this.

Did you find this to be a bottleneck in benchmarks?  BTW, I have had some success profiling Hadoop daemons using Sun's built-in sampling profiler.  I simply set HADOOP_OPTS to '-agentlib:hprof=cpu=samples,interval=20' before starting a daemon.  Then, when I stop that daemon, it dumps profile data to a text file.

And, finally, yes, DFSFileInfo could re-use the length field for both purposes.  But this class is only used for interchange, right?  So making it smaller will only make RPCs a bit faster; it won't save a lot of memory.
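
For illustration, re-using the length field could look roughly like the sketch below. FileInfoSketch and its accessors are hypothetical names, not the actual DFSFileInfo code, and the sketch assumes a directory reports a file length of 0 and a plain file's contents size equals its file length.

```java
/** Hypothetical interchange record with a single length field (not the real DFSFileInfo). */
class FileInfoSketch {
    private final String path;
    private final long length;   // file length for a file, contents size for a directory
    private final boolean isDir;

    FileInfoSketch(String path, long length, boolean isDir) {
        this.path = path;
        this.length = length;
        this.isDir = isDir;
    }

    String getPath() { return path; }

    boolean isDir() { return isDir; }

    /** File length; assumed to be 0 for a directory. */
    long getLen() { return isDir ? 0L : length; }

    /** Contents size; assumed to equal the file length for a plain file. */
    long getContentsLen() { return length; }

    public static void main(String[] args) {
        FileInfoSketch file = new FileInfoSketch("/data/part-0", 1024L, false);
        FileInfoSketch dir = new FileInfoSketch("/data", 4096L, true);
        System.out.println(file.getPath() + " len=" + file.getLen()
            + " contentsLen=" + file.getContentsLen());
        System.out.println(dir.getPath() + " len=" + dir.getLen()
            + " contentsLen=" + dir.getContentsLen());
    }
}
```

The saving is one long per entry on the wire, which matches the point above that this mostly affects RPC size rather than memory.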

> listFiles optimization
> ----------------------
>
>          Key: HADOOP-79
>          URL: http://issues.apache.org/jira/browse/HADOOP-79
>      Project: Hadoop
>         Type: Improvement
>   Components: dfs
>     Reporter: Konstantin Shvachko
>     Assignee: Konstantin Shvachko
>      Fix For: 0.1
>  Attachments: DFSFileInfo.patch
>
> In FSDirectory.getListing() looking at line
> listing[i] = new DFSFileInfo(curName, cur.computeFileLength(), cur.computeContentsLength(), isDir(curName));
> 1. computeContentsLength() itself calls computeFileLength(), so the file length
> is calculated twice.
> 2. isDir() is looking for the INode (starting from the rootDir) that has actually been obtained
> just two lines above, note that the tree is locked by that time.
> I propose a simple optimization for this, see attachment.
> 3. A related question: Why does DFSFileInfo need two separate fields, len for file length and
> contentsLen for directory contents size? It looks like these fields are mutually exclusive,
> and we could use just one, interpreting it one way or the other depending on the value of isDir.
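
Points 1 and 2 above can be sketched as follows. This is a minimal illustration, not the attached patch: Node, Info, childrenLength() and the fileLengthCalls counter are hypothetical stand-ins for the real FSDirectory internals. The idea is to compute the file length once and to ask the node already in hand whether it is a directory, instead of looking it up again from the root.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ListingSketch {
    /** Hypothetical, simplified stand-in for a namespace tree node. */
    static class Node {
        final long fileLen;                // assumed 0 for directories
        final Map<String, Node> children;  // null for plain files
        int fileLengthCalls = 0;           // instrumentation for this sketch only

        Node(long fileLen, boolean dir) {
            this.fileLen = fileLen;
            this.children = dir ? new TreeMap<>() : null;
        }

        boolean isDir() { return children != null; }

        long computeFileLength() { fileLengthCalls++; return fileLen; }

        /** Total length of everything below this node, excluding the node itself. */
        long childrenLength() {
            long total = 0;
            if (children != null) {
                for (Node c : children.values()) total += c.computeContentsLength();
            }
            return total;
        }

        long computeContentsLength() { return computeFileLength() + childrenLength(); }
    }

    /** Hypothetical interchange record, loosely modelled on DFSFileInfo. */
    static class Info {
        final String name; final long len; final long contentsLen; final boolean isDir;
        Info(String name, long len, long contentsLen, boolean isDir) {
            this.name = name; this.len = len; this.contentsLen = contentsLen; this.isDir = isDir;
        }
    }

    static List<Info> list(Node dir) {
        List<Info> listing = new ArrayList<>();
        for (Map.Entry<String, Node> e : dir.children.entrySet()) {
            Node cur = e.getValue();
            long len = cur.computeFileLength();             // computed once per entry
            long contentsLen = len + cur.childrenLength();  // reuses len instead of recomputing it
            listing.add(new Info(e.getKey(), len, contentsLen, cur.isDir())); // no lookup from the root
        }
        return listing;
    }

    /** Builds a small tree: file "a" (10 bytes) and directory "d" holding file "b" (5 bytes). */
    static Node sampleTree() {
        Node root = new Node(0, true);
        root.children.put("a", new Node(10, false));
        Node d = new Node(0, true);
        d.children.put("b", new Node(5, false));
        root.children.put("d", d);
        return root;
    }

    public static void main(String[] args) {
        for (Info i : list(sampleTree())) {
            System.out.println(i.name + " len=" + i.len
                + " contentsLen=" + i.contentsLen + " isDir=" + i.isDir);
        }
    }
}
```

In this sketch each listed entry triggers exactly one computeFileLength() call on itself, and the directory test is a field check on cur rather than a path lookup.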

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira