Posted to common-dev@hadoop.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/03/14 21:57:45 UTC
[jira] Resolved: (HADOOP-79) listFiles optimization
[ http://issues.apache.org/jira/browse/HADOOP-79?page=all ]
Doug Cutting resolved HADOOP-79:
--------------------------------
Fix Version: 0.1
Resolution: Fixed
Assign To: Konstantin Shvachko
This looks fine to me. I simplified FSDirectory.isDir() a bit more and committed this.
Did you find this to be a bottleneck in benchmarks? BTW, I have had some success profiling Hadoop daemons using Sun's built-in sampling profiler. I simply set HADOOP_OPTS to '-agentlib:hprof=cpu=samples,interval=20' before starting a daemon. Then, when I stop that daemon, it dumps profile data to a text file.
And, finally, yes, DFSFileInfo could re-use the length field for both purposes. But this class is only used for interchange, right? If so, making it smaller will only make RPCs a bit faster and won't save much memory.
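To make the single-field idea concrete, here is a minimal sketch. FileInfoSketch and its accessors are hypothetical names invented for illustration, not the actual DFSFileInfo code, and it assumes a directory's own file length is always 0.

```java
// Hypothetical sketch only; this is not the actual DFSFileInfo class.
final class FileInfoSketch {
    private final String path;
    private final long len;      // file length for a file, contents size for a directory
    private final boolean isDir;

    FileInfoSketch(String path, long len, boolean isDir) {
        this.path = path;
        this.len = len;
        this.isDir = isDir;
    }

    String getPath() { return path; }

    boolean isDir() { return isDir; }

    // Assumption: a directory has no blocks of its own, so its file length is 0.
    long getLen() { return isDir ? 0L : len; }

    // For a file this equals the file length; for a directory it is the subtree size.
    long getContentsLen() { return len; }
}
```

With this layout one long is dropped from each serialized entry, which only matters for the size of the RPC response.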
> listFiles optimization
> ----------------------
>
> Key: HADOOP-79
> URL: http://issues.apache.org/jira/browse/HADOOP-79
> Project: Hadoop
> Type: Improvement
> Components: dfs
> Reporter: Konstantin Shvachko
> Assignee: Konstantin Shvachko
> Fix For: 0.1
> Attachments: DFSFileInfo.patch
>
> In FSDirectory.getListing(), look at the line
> listing[i] = new DFSFileInfo(curName, cur.computeFileLength(), cur.computeContentsLength(), isDir(curName));
> 1. computeContentsLength() itself calls computeFileLength(), so the file length
> is calculated twice.
> 2. isDir() searches from rootDir for the INode that was already obtained
> just two lines above; note that the tree is locked by that time.
> I propose a simple optimization for this; see the attachment.
> 3. A related question: Why does DFSFileInfo need two separate fields, len for file length and
> contentsLen for directory contents size? It looks like these fields are mutually exclusive,
> and we can use just one, interpreting it one way or the other depending on the value of isDir.
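Points 1 and 2 of the description can be illustrated with a small sketch. This is not the attached patch, whose contents are not shown here. ListingSketch, Node, Entry, and describe() are hypothetical names, and the call counter exists only to make the saving visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins; these are not the Hadoop classes.
final class ListingSketch {

    static final class Node {
        final String name;
        final boolean dir;
        final long blockBytes;                    // bytes in this node's own blocks
        final List<Node> children = new ArrayList<Node>();
        int fileLengthCalls = 0;                  // instrumentation for the sketch only

        Node(String name, boolean dir, long blockBytes) {
            this.name = name;
            this.dir = dir;
            this.blockBytes = blockBytes;
        }

        boolean isDir() { return dir; }

        long computeFileLength() {
            fileLengthCalls++;
            return blockBytes;
        }

        // As the description notes, the contents length includes the node's own file length.
        long computeContentsLength() {
            long total = computeFileLength();
            for (Node child : children) {
                total += child.computeContentsLength();
            }
            return total;
        }
    }

    static final class Entry {
        final String name;
        final long len;
        final long contentsLen;
        final boolean isDir;

        Entry(String name, long len, long contentsLen, boolean isDir) {
            this.name = name;
            this.len = len;
            this.contentsLen = contentsLen;
            this.isDir = isDir;
        }
    }

    // Builds one listing entry from a node the caller already holds.
    static Entry describe(Node cur) {
        long fileLen = cur.computeFileLength();   // computed once and reused below
        long contentsLen = fileLen;
        for (Node child : cur.children) {
            contentsLen += child.computeContentsLength();
        }
        // isDir is asked of the node in hand, with no second lookup from the root.
        return new Entry(cur.name, fileLen, contentsLen, cur.isDir());
    }
}
```

In this sketch the node's own length is computed once per entry, and the directory check needs no path lookup because the caller already has the node.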
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira