Posted to issues@spark.apache.org by "Misha Dmitriev (JIRA)" <ji...@apache.org> on 2018/07/16 22:49:00 UTC
[jira] [Created] (SPARK-24827) Some memory waste in History Server by strings in AccumulableInfo objects
Misha Dmitriev created SPARK-24827:
--------------------------------------
Summary: Some memory waste in History Server by strings in AccumulableInfo objects
Key: SPARK-24827
URL: https://issues.apache.org/jira/browse/SPARK-24827
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 2.2.2
Reporter: Misha Dmitriev
I've analyzed a heap dump of the Spark History Server with jxray ([http://www.jxray.com]) and found that 42% of the heap is wasted on duplicate strings. The biggest sources of such strings are the {{name}} and {{value}} data fields of {{AccumulableInfo}} objects:
{code:java}
7. Duplicate Strings: overhead 42.1%
Total strings  Unique strings  Duplicate values  Overhead
13,732,278     729,234         354,032           867,177K (42.1%)
Expensive data fields:
318,421K (15.4%), 3669685 / 100% dup strings (8 unique), 3669685 dup backing arrays:
↖org.apache.spark.scheduler.AccumulableInfo.name
178,994K (8.7%), 3674403 / 99% dup strings (35640 unique), 3674403 dup backing arrays:
↖scala.Some.x
168,601K (8.2%), 3401960 / 92% dup strings (175826 unique), 3401960 dup backing arrays:
↖org.apache.spark.scheduler.AccumulableInfo.value{code}
That is, 15.4% of the heap is wasted by {{AccumulableInfo.name}} and 8.2% is wasted by {{AccumulableInfo.value}}.
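The duplication above arises because each event-log line is parsed independently, so the same accumulator name or value ends up as many distinct String objects, each with its own backing char array. A minimal Java illustration of this effect (the accumulator name used here is just an example):

```java
public class DuplicateStringsDemo {
    public static void main(String[] args) {
        // Simulate parsing the same accumulator name from two event-log
        // lines: each parse yields a distinct String with its own backing array.
        String name1 = new StringBuilder("internal.metrics.executorRunTime").toString();
        String name2 = new StringBuilder("internal.metrics.executorRunTime").toString();

        System.out.println(name1.equals(name2)); // true  - same characters
        System.out.println(name1 == name2);      // false - two copies on the heap

        // Interning collapses the copies into one canonical instance.
        System.out.println(name1.intern() == name2.intern()); // true
    }
}
```

With millions of such objects and only a handful of unique names (8 unique out of 3.6M above), interning eliminates nearly all of that overhead.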
It turns out that the problem has been partially addressed in Spark 2.3+, e.g.
[https://github.com/apache/spark/blob/b045315e5d87b7ea3588436053aaa4d5a7bd103f/core/src/main/scala/org/apache/spark/status/LiveEntity.scala#L590]
However, this code has two minor problems:
# Strings for {{AccumulableInfo.value}} are not interned in the above code, only {{AccumulableInfo.name}}.
# For interning, the {{weakIntern(String)}} method uses a Guava interner ({{stringInterner = Interners.newWeakInterner[String]()}}). This is an older, less efficient way of interning strings. Since the JDK 7 updates released several years ago, the built-in JVM {{String.intern()}} method has been considerably more efficient, both in terms of CPU and memory.
It is therefore suggested to add interning for {{value}} and replace the Guava interner with {{String.intern()}}.
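The suggested change could be sketched as follows (in Java for illustration; the class and constructor are hypothetical stand-ins for Spark's {{LiveEntity}} code, not the real API):

```java
import java.util.Optional;

// Hedged sketch of the suggested fix: intern BOTH the accumulator name
// and its value, using String.intern() instead of a Guava weak interner.
class AccumulableView {
    final String name;
    final Optional<String> value;

    AccumulableView(String name, Optional<String> value) {
        // String.intern() uses the JVM's native string table, replacing
        // the Guava Interners.newWeakInterner[String]() approach.
        this.name = (name == null) ? null : name.intern();
        this.value = value.map(String::intern); // value is now interned too
    }
}

public class InternSketch {
    public static void main(String[] args) {
        AccumulableView a = new AccumulableView(
                new String("shuffle write time"), Optional.of(new String("120")));
        AccumulableView b = new AccumulableView(
                new String("shuffle write time"), Optional.of(new String("120")));
        // After interning, equal strings share one canonical instance.
        System.out.println(a.name == b.name);               // true
        System.out.println(a.value.get() == b.value.get()); // true
    }
}
```

Because the JVM's string table holds interned strings weakly, canonical instances become collectable once no live object references them, so the memory behavior is comparable to the Guava weak interner without the extra map overhead.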
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org