Posted to common-issues@hadoop.apache.org by "Todd Lipcon (Commented) (JIRA)" <ji...@apache.org> on 2012/02/14 02:33:00 UTC

[jira] [Commented] (HADOOP-6502) DistributedFileSystem#listStatus is very slow when listing a directory with a size of 1300

    [ https://issues.apache.org/jira/browse/HADOOP-6502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207442#comment-13207442 ] 

Todd Lipcon commented on HADOOP-6502:
-------------------------------------

I believe this is still an issue in trunk, since the protobufs are still tunneled over a Writable-based mechanism. I see the following trace in an IPC benchmark I'm working on:
{code}
"IPC Client (1065524847) connection to /127.0.0.1:12345 from todd" daemon prio=10 tid=0x000000000250e000 nid=0x3dba runnable [0x00007f96164f0000]
   java.lang.Thread.State: RUNNABLE
        at java.util.zip.ZipFile.getEntry(Native Method)
        at java.util.zip.ZipFile.getEntry(ZipFile.java:166)
        - locked <0x00000007840bb5b0> (a java.util.jar.JarFile)
        at java.util.jar.JarFile.getEntry(JarFile.java:223)
        at java.util.jar.JarFile.getJarEntry(JarFile.java:206)
        at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:771)
        at sun.misc.URLClassPath.getResource(URLClassPath.java:185)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:209)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
        - locked <0x0000000784000150> (a sun.misc.Launcher$AppClassLoader)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
        - locked <0x0000000784000150> (a sun.misc.Launcher$AppClassLoader)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1162)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:89)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
        at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:835)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:762)
{code}
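
The trace shows each received response going through ReflectionUtils.setJobConf into Configuration.getClassByName and then Class.forName, which walks the jars on the classpath. One possible way to avoid that repeated cost is to remember failed lookups as well as successful ones. The following is only a minimal sketch of that idea, not the attached patch: the class NegativeClassCache, its method loadOrNull, and the forNameCalls counter are hypothetical names made up for illustration and are not part of Hadoop.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper (not a Hadoop class): caches both hits and misses of class lookups. */
final class NegativeClassCache {
    /** Marker type stored for names that failed to load; ConcurrentHashMap rejects null values. */
    private static final class Missing { }

    private static final Class<?> MISSING = Missing.class;

    private final Map<String, Class<?>> cache = new ConcurrentHashMap<>();

    /** Counts real Class.forName calls so the effect of the cache can be observed. */
    final AtomicInteger forNameCalls = new AtomicInteger();

    /** Returns the named class, or null if it cannot be found on the classpath. */
    Class<?> loadOrNull(String name) {
        Class<?> c = cache.get(name);
        if (c == null) {
            // First lookup for this name. Concurrent callers may race and each call
            // forName once, which is harmless because they all store the same result.
            forNameCalls.incrementAndGet();
            try {
                c = Class.forName(name);
            } catch (ClassNotFoundException e) {
                c = MISSING; // remember the miss so the classpath is not scanned again
            }
            cache.put(name, c);
        }
        return c == MISSING ? null : c;
    }
}
```

With a cache like this, checking 1300 times for a class that is absent from the classpath triggers a single Class.forName call instead of 1300. The sketch ignores class loader identity and never evicts entries, both of which a real fix would need to consider.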
                
> DistributedFileSystem#listStatus is very slow when listing a directory with a size of 1300
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6502
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6502
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.20.0
>            Reporter: Hairong Kuang
>            Priority: Critical
>         Attachments: 6502.patch, 6502_v2.patch
>
>
> When listing a directory of around 1300 children, listStatus takes hundreds of milliseconds. The slowdown is caused by the change made in HADOOP-4187. The return value of listStatus is an array of FileStatus. When each element of the array is deserialized, ReflectionUtils#newInstance(Class<T>, Configuration) is called, which calls setConf, which in turn calls setJobConf. setJobConf checks whether JobConf is on the classpath by calling Configuration#getClassByName. Configuration#getClassByName tries to optimize the lookup with a cached map, but because JobConf is not on the classpath, it never enters the cache. Every check therefore ends up calling Class#forName, which is very expensive. Deserializing an array of 1300 entries requires calling Class#forName 1300 times!
