Posted to issues@spark.apache.org by "Shixiong Zhu (JIRA)" <ji...@apache.org> on 2017/02/17 23:18:44 UTC

[jira] [Updated] (SPARK-19525) Enable Compression of RDD Checkpoints

     [ https://issues.apache.org/jira/browse/SPARK-19525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu updated SPARK-19525:
---------------------------------
    Summary: Enable Compression of RDD Checkpoints  (was: Enable Compression of Spark Streaming Checkpoints)

> Enable Compression of RDD Checkpoints
> -------------------------------------
>
>                 Key: SPARK-19525
>                 URL: https://issues.apache.org/jira/browse/SPARK-19525
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.1.0
>            Reporter: Aaditya Ramesh
>
> In our testing, compressing partitions with Snappy while writing them to checkpoints on HDFS improved performance significantly and also reduced the variability of the checkpointing operation. Checkpointing time was reduced by 3X, and variability by 2X, for data sets with a compressed size of approximately 1 GB.
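
The idea described above, compressing a partition's serialized bytes before they reach the checkpoint file, can be sketched in a few lines. This is only an illustration under stated assumptions, not Spark's implementation: `zlib` from the Python standard library stands in for Snappy, an in-memory buffer stands in for an HDFS output stream, and the names `write_checkpoint` and `read_checkpoint` are hypothetical.

```python
import io
import pickle
import zlib


def write_checkpoint(records, out, compress=True):
    """Serialize one partition's records and write them to `out`.

    Hypothetical helper: zlib is a stand-in for Snappy, and `out` is any
    binary file-like object standing in for an HDFS output stream.
    Returns the number of bytes written.
    """
    payload = pickle.dumps(list(records))
    if compress:
        payload = zlib.compress(payload)
    # A one-byte flag tells the reader whether it must decompress.
    out.write(b"\x01" if compress else b"\x00")
    out.write(payload)
    return 1 + len(payload)


def read_checkpoint(inp):
    """Read back the records written by write_checkpoint."""
    flag = inp.read(1)
    payload = inp.read()
    if flag == b"\x01":
        payload = zlib.decompress(payload)
    return pickle.loads(payload)


if __name__ == "__main__":
    # Repetitive records, so the compressed form is noticeably smaller.
    partition = [("key-%d" % (i % 10), "value" * 20) for i in range(1000)]
    plain, packed = io.BytesIO(), io.BytesIO()
    print("uncompressed bytes:", write_checkpoint(partition, plain, compress=False))
    print("compressed bytes:  ", write_checkpoint(partition, packed, compress=True))
```

Fewer bytes written to the file system is the plausible source of the reported speedup, at the cost of some CPU time for compression. The sketch measures only size, not time.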



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
