Posted to yarn-dev@hadoop.apache.org by "Yingqi Lu (JIRA)" <ji...@apache.org> on 2015/10/20 23:11:27 UTC

[jira] [Created] (YARN-4282) JVM reuse in Yarn

Yingqi Lu created YARN-4282:
-------------------------------

Summary: JVM reuse in Yarn
Key: YARN-4282
URL: https://issues.apache.org/jira/browse/YARN-4282
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Yingqi Lu

Dear All,

Recently, we identified an issue in Yarn when running MapReduce: a significant amount of time is spent in libjvm.so, most of it in compilation.

Attached is a flame graph (visual call graph) of a query that ran for about 8 minutes. Most of the yellow bars represent libjvm.so functions, while Java functions are colored red. The data show that more than 40% of the overall execution time is spent in compilation itself, yet looking inside the JVMs shows that a lot of code still ran in interpreted mode. Ideally, everything would run as compiled code over and over again. In reality, however, mappers and reducers have long since exited before the benefits of compilation kick in. In other words, we pay the performance cost of both compilation and interpretation.

The JVM reuse feature in MapReduce 1.0 addressed this issue, but it was removed in Yarn. We are currently tuning a number of JVM parameters to minimize the performance impact, but we think it would still be good to open a discussion here and look for a more permanent solution, since the issue is tied to the nature of how Yarn works.

Have any of you seen this issue before, and is there already an ongoing project to address it?

The data for this graph were collected across the entire system with multiple JVMs running. The workload we used is BigBench (https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench).

Thanks,
Yingqi Lu

1. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)