Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2019/03/12 15:32:27 UTC

[GitHub] [hadoop] sahilTakiar opened a new pull request #595: HDFS-14304: High lock contention on hdfsHashMutex in libhdfs

URL: https://github.com/apache/hadoop/pull/595
 
 
   [HDFS-14304|https://issues.apache.org/jira/browse/HDFS-14304] describes the issue. The TL;DR is that libhdfs caches JNI `jclass` objects, which is a recommended practice (see https://www.ibm.com/developerworks/library/j-jni/index.html#notc). However, the current implementation stores all of the `jclass` objects in a single global hash table. Since libhdfs is designed to be used by multiple threads concurrently, that hash table is protected by a single mutex (`hdfsHashMutex`). For short-lived Java method calls, acquiring that mutex on every invocation adds significant performance overhead.
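   For context, the hot path today looks roughly like the sketch below. This is a simplified model, not the exact libhdfs code: the real cache is the `htable` mentioned above, while the helper here uses a trivial linear scan and illustrative names, but the locking shape is the same.

   ```c
   /*
    * Simplified model of the current pattern: a process-wide class cache
    * guarded by one mutex (hdfsHashMutex). The linear-scan "table" and the
    * lookupClass helper are illustrative stand-ins for the real hash table.
    */
   #include <jni.h>
   #include <pthread.h>
   #include <string.h>

   #define MAX_CACHED_CLASSES 32

   struct cachedClass {
       const char *name;   /* class name (assumed to be a string literal) */
       jclass cls;         /* JNI global reference */
   };

   static pthread_mutex_t hdfsHashMutex = PTHREAD_MUTEX_INITIALIZER;
   static struct cachedClass classCache[MAX_CACHED_CLASSES];
   static int numCachedClasses;

   static jclass lookupClass(JNIEnv *env, const char *className)
   {
       jclass cls = NULL;
       int i;

       /* Every method invocation pays for this lock, even on a cache hit. */
       pthread_mutex_lock(&hdfsHashMutex);
       for (i = 0; i < numCachedClasses; i++) {
           if (strcmp(classCache[i].name, className) == 0) {
               cls = classCache[i].cls;
               break;
           }
       }
       if (cls == NULL && numCachedClasses < MAX_CACHED_CLASSES) {
           jclass local = (*env)->FindClass(env, className);
           if (local != NULL) {
               cls = (*env)->NewGlobalRef(env, local);
               (*env)->DeleteLocalRef(env, local);
               classCache[numCachedClasses].name = className;
               classCache[numCachedClasses].cls = cls;
               numCachedClasses++;
           }
       }
       pthread_mutex_unlock(&hdfsHashMutex);
       return cls;
   }
   ```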
   
   I first considered using a RW lock (`pthread_rwlock_t`) to fix this, but after some prototyping I found that it doesn't address the real problem: acquiring a read lock still goes through a lock acquisition on every call, so the per-call overhead remains.
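   The read side of that prototype looked roughly like the sketch below (illustrative, not the actual prototype code). Even with no writers present, the `pthread_rwlock_rdlock` call is still an atomic update of shared lock state on every invocation:

   ```c
   /*
    * Rough sketch of the rejected pthread_rwlock_t variant. Readers no longer
    * serialize against each other, but every call still performs a lock
    * acquisition (an atomic update of the rwlock's shared reader count),
    * so the hot-path overhead remains.
    */
   #include <pthread.h>

   static pthread_rwlock_t hdfsHashRwlock = PTHREAD_RWLOCK_INITIALIZER;

   static void lookupClassReadSide(void)
   {
       pthread_rwlock_rdlock(&hdfsHashRwlock);   /* still an atomic op per call */
       /* ... hash-table lookup of the cached jclass would go here ... */
       pthread_rwlock_unlock(&hdfsHashRwlock);
   }
   ```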
   
   Instead, after some digging, I landed on an approach similar to https://stackoverflow.com/questions/10617735/in-jni-how-do-i-cache-the-class-methodid-and-fieldids-per-ibms-performance-r
   
   Instead of using a hash table to lazily load and store all the `jclass` objects, this patch allocates every necessary `jclass` object up front when the JVM is loaded (the classes are loaded in `getGlobalJNIEnv`). This removes the need for the hash table and its associated locking entirely, which is why the patch deletes the `htable.h` file.
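   A minimal sketch of the eager-loading approach is below. The helper names (`cacheClass`, `initCachedClasses`) and the two example classes are illustrative only; the actual list of cached classes lives in `jclasses.h`:

   ```c
   /*
    * Minimal sketch of eager jclass caching: resolve each class once while
    * the JVM is being set up and promote it to a JNI global reference.
    * Because the cached values are written exactly once, before any worker
    * thread calls into libhdfs, readers need no locking at all.
    */
   #include <jni.h>

   /* Two of the ~18 classes libhdfs needs; illustrative subset only. */
   static jclass JC_FileSystem;
   static jclass JC_Path;

   static int cacheClass(JNIEnv *env, const char *name, jclass *out)
   {
       jclass local = (*env)->FindClass(env, name);
       if (local == NULL) {
           return -1;   /* a ClassNotFoundException is pending */
       }
       *out = (*env)->NewGlobalRef(env, local);
       (*env)->DeleteLocalRef(env, local);
       return (*out == NULL) ? -1 : 0;
   }

   /* Called once from getGlobalJNIEnv, right after the JVM is created. */
   static int initCachedClasses(JNIEnv *env)
   {
       if (cacheClass(env, "org/apache/hadoop/fs/FileSystem", &JC_FileSystem))
           return -1;
       if (cacheClass(env, "org/apache/hadoop/fs/Path", &JC_Path))
           return -1;
       return 0;
   }
   ```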
   
   One could argue that lazily loading the `jclass` objects (which is what the current code does) is preferable to eagerly loading them (which is what this patch does). However, after doing some analysis I don't think that argument holds. Consider the following:
   * There are only 18 `jclass` objects that need to be used throughout libhdfs
   * The eager loading is done only once per process, during JVM load time (loading 18 `jclass` objects should not contribute significantly to startup costs)
   
   Another potential concern is the space required to hold all the `jclass` objects eagerly, compared to storing them in the `htable` on demand.
   
   Consider that the current `htable` used by `jni_helper` is initially allocated with 4096 `htable_pair` entries, where each `htable_pair` consists of two pointers; with 8-byte pointers that comes to about 64 kilobytes.
   
   On the other hand, a single `jclass` object should conservatively take up ~250 bytes:
   * [JOL|https://openjdk.java.net/projects/code-tools/jol/] shows that a single `Class` object takes up 96 bytes
   * Some runtime analysis shows that a `Class` object also has to store its own name, whose length is of course variable but is conservatively estimated at 100 bytes
   ** The longest class name we store is 50 characters, which equates to about 100 bytes in Java (2 bytes per character)
   * The remaining ~50 bytes cover whatever additional overhead a `jclass` needs in order to reference the Java `Class` object (I'm assuming there are some extra pointers somewhere, but I don't know enough about the JNI implementation to say exactly what they are)
   
   With 18 `jclass` objects that is about 4.5 kilobytes, significantly less than the size of an empty `htable`. So overall, this patch should actually decrease the memory footprint of libhdfs.
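   For reference, the back-of-the-envelope arithmetic behind those numbers (assuming 64-bit pointers):

   ```c
   /*
    * Current htable (initial allocation):
    *   4096 htable_pair entries * 2 pointers * 8 bytes = 65,536 bytes ~= 64 KB
    *
    * Eagerly cached jclass objects (conservative per-class estimate):
    *     96 bytes  (java.lang.Class instance, per JOL)
    *  + 100 bytes  (class-name string: ~50 chars * 2 bytes/char)
    *  +  50 bytes  (assumed JNI/global-reference bookkeeping)
    *  ~= 250 bytes per class
    *
    * 18 classes * ~250 bytes ~= 4,500 bytes ~= 4.5 KB
    */
   ```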
   
   Now, some more details about the actual implementation:
   * `jclasses.h` contains all the `jclass` objects that are used throughout libhdfs
   * I changed `invokeMethod` in `jni_helper` to take in an existing `jclass` and deleted all the hash table lookup code (see the signature sketch after this list)
   * `findClassAndInvokeMethod` is equivalent to the old implementation of `invokeMethod` - it is currently just used in test code
   * Deleted `htable` and all associated code: test classes, mutexes, etc.
   * Fixed a few other checkstyle issues
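   To make the `invokeMethod` change concrete, here is a rough sketch of the before/after shape of the declarations. The exact parameter lists are the ones in `jni_helper.h` in the patch; the `MethType` definition below is reproduced only to keep the sketch self-contained:

   ```c
   #include <jni.h>

   /* Mirrors the method-type enum in jni_helper.h. */
   typedef enum { STATIC, INSTANCE } MethType;

   /* Before (now findClassAndInvokeMethod, used only by test code): the class
    * name is resolved through the mutex-protected hash table on every call. */
   jthrowable findClassAndInvokeMethod(JNIEnv *env, jvalue *retval,
           MethType methType, jobject instObj, const char *className,
           const char *methName, const char *methSignature, ...);

   /* After: the caller passes the eagerly cached jclass directly, so there is
    * no lookup and no lock on the hot path. */
   jthrowable invokeMethod(JNIEnv *env, jvalue *retval, MethType methType,
           jobject instObj, jclass cls, const char *methName,
           const char *methSignature, ...);
   ```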

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org