Posted to issues@spark.apache.org by "shimingfei (JIRA)" <ji...@apache.org> on 2015/05/06 03:41:59 UTC
[jira] [Created] (SPARK-7389) Tachyon integration improvement
shimingfei created SPARK-7389:
---------------------------------
Summary: Tachyon integration improvement
Key: SPARK-7389
URL: https://issues.apache.org/jira/browse/SPARK-7389
Project: Spark
Issue Type: Improvement
Components: Block Manager
Reporter: shimingfei
Two main changes:
1. Add two methods, putValues and getValues, to ExternalBlockManager, because an implementation may not want to rely on putBytes and getBytes.
2. Improve the Tachyon integration.
Currently, when putting data into Tachyon, Spark first serializes all the data in a partition into a single ByteBuffer and then writes it to Tachyon. This uses a lot of memory and increases GC overhead.
When getting data from Tachyon, getValues depends on getBytes, which likewise reads all the data into an on-heap byte array, resulting in high memory usage.
This PR changes both functions to read and write data as streams, which reduces memory usage.
In our testing with very large data sizes, this patch reduced GC time by about 30% and full GC time by about 70%, and reduced total execution time by about 10%.
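The two changes can be illustrated with a minimal Java sketch built on several assumptions. BlockStore, FileBlockStore, writeValues, readValues, the method signatures and the flag-per-value encoding are all hypothetical and are not Spark's ExternalBlockManager or Tachyon's API. A local directory stands in for Tachyon, and plain Java serialization stands in for Spark's configurable serializer. The base class gives putValues/getValues default implementations on top of putBytes/getBytes, which corresponds to the buffered approach described above. The subclass overrides both methods to stream values one at a time instead.

```java
import java.io.*;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.NoSuchElementException;

// HYPOTHETICAL, simplified stand-in for an external block store such as
// Spark's ExternalBlockManager; names and signatures are illustrative only.
abstract class BlockStore {
    abstract void putBytes(String blockId, ByteBuffer bytes) throws IOException;
    abstract ByteBuffer getBytes(String blockId) throws IOException;

    // Default putValues: serialize the whole partition into one on-heap
    // array first, then hand it to putBytes (the buffered approach).
    void putValues(String blockId, Iterator<? extends Serializable> values) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            writeValues(out, values);
        }
        putBytes(blockId, ByteBuffer.wrap(bos.toByteArray()));
    }

    // Default getValues: copy the whole block into one on-heap array, then decode.
    Iterator<Object> getValues(String blockId) throws IOException {
        ByteBuffer buf = getBytes(blockId);
        byte[] arr = new byte[buf.remaining()];
        buf.get(arr);
        return readValues(new ObjectInputStream(new ByteArrayInputStream(arr)));
    }

    // HYPOTHETICAL encoding: each value is preceded by a "true" flag; a final "false" ends the block.
    static void writeValues(ObjectOutputStream out, Iterator<? extends Serializable> values)
            throws IOException {
        while (values.hasNext()) {
            out.writeBoolean(true);
            out.writeObject(values.next());
            out.reset(); // drop the stream's back-references so written values can be GC'd
        }
        out.writeBoolean(false);
    }

    // Lazily decodes one value at a time; closes the stream when the end flag is read.
    static Iterator<Object> readValues(ObjectInputStream in) {
        return new Iterator<Object>() {
            private Boolean more = null; // null means the next flag has not been read yet

            public boolean hasNext() {
                if (more == null) {
                    try {
                        more = in.readBoolean();
                        if (!more) in.close();
                    } catch (IOException e) { throw new UncheckedIOException(e); }
                }
                return more;
            }

            public Object next() {
                if (!hasNext()) throw new NoSuchElementException();
                try {
                    Object value = in.readObject();
                    more = null;
                    return value;
                } catch (IOException e) { throw new UncheckedIOException(e); }
                catch (ClassNotFoundException e) { throw new IllegalStateException(e); }
            }
        };
    }
}

// HYPOTHETICAL file-backed store; a local directory stands in for Tachyon.
class FileBlockStore extends BlockStore {
    private final Path dir;

    FileBlockStore(Path dir) { this.dir = dir; }

    void putBytes(String blockId, ByteBuffer bytes) throws IOException {
        byte[] arr = new byte[bytes.remaining()];
        bytes.get(arr);
        Files.write(dir.resolve(blockId), arr);
    }

    ByteBuffer getBytes(String blockId) throws IOException {
        return ByteBuffer.wrap(Files.readAllBytes(dir.resolve(blockId)));
    }

    // Streaming putValues: values pass through a small buffer straight to the
    // file, so the partition is never materialized as one byte array.
    @Override
    void putValues(String blockId, Iterator<? extends Serializable> values) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(Files.newOutputStream(dir.resolve(blockId))))) {
            writeValues(out, values);
        }
    }

    // Streaming getValues: decodes values lazily from the file.
    @Override
    Iterator<Object> getValues(String blockId) throws IOException {
        return readValues(new ObjectInputStream(
                new BufferedInputStream(Files.newInputStream(dir.resolve(blockId)))));
    }
}
```

Both paths use the same on-disk format, so a block written by the streaming putValues can still be decoded from the bytes returned by getBytes. The out.reset() call matters for memory. Without it, ObjectOutputStream keeps a reference to every object it has written, which would defeat much of the benefit of streaming. A production version would also need to close the input stream if a caller stops iterating before reaching the end of the block.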
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)