Posted to issues@spark.apache.org by "András Barják (JIRA)" <ji...@apache.org> on 2014/10/06 09:42:34 UTC

[jira] [Commented] (SPARK-2418) Custom checkpointing with an external function as parameter

    [ https://issues.apache.org/jira/browse/SPARK-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160072#comment-14160072 ] 

András Barják commented on SPARK-2418:
--------------------------------------

Hi, I would be happy if someone could comment on my pull request. I don't mind if it gets rejected, but I would like to know the reason so that I can come up with a better solution that fits the official Spark core vision.

We really need this feature at our company so that we can save and checkpoint RDDs in a custom way without having to reload them.
I am not sure I understand how this relates to the pluggable interfaces. Could you please explain how you imagine solving this issue?

> Custom checkpointing with an external function as parameter
> -----------------------------------------------------------
>
>                 Key: SPARK-2418
>                 URL: https://issues.apache.org/jira/browse/SPARK-2418
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.0.0
>            Reporter: András Barják
>
> If a job consists of many shuffle-heavy transformations, the current resilience model might be unsatisfactory. In our current use case we need a persistent checkpoint that lets us save our RDDs on disk in a custom location and load them back even if the driver dies. (Other possible use cases: storing the checkpointed data in various formats such as SequenceFile, CSV, Parquet, or MySQL.)
> After talking to [~pwendell] at the Spark Summit 2014 we concluded that a checkpoint where one can customize the saving and RDD reloading behavior can be a good solution. I am open to further suggestions if you have better ideas about how to make checkpointing more flexible.
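
To make the idea in the description concrete, here is a minimal sketch in plain Python of a checkpoint whose saving and reloading behavior is passed in as functions. It is not Spark code and not the API proposed in the pull request: the names checkpoint_or_restore, save_json_lines and load_json_lines are hypothetical, and a plain list stands in for an RDD. It assumes that a restarted driver can see the same checkpoint path.

```python
import json
import os
from typing import Callable, List, TypeVar

T = TypeVar("T")


def checkpoint_or_restore(
    path: str,
    compute: Callable[[], T],
    save: Callable[[T, str], None],
    load: Callable[[str], T],
) -> T:
    """Hypothetical helper, not a Spark API.

    If a checkpoint already exists at `path`, reload it with the
    caller-supplied `load` function. Otherwise run `compute`, persist
    the result with the caller-supplied `save` function, and return it.
    """
    if os.path.exists(path):
        # A previous run (for example before a driver crash) already
        # wrote the checkpoint, so the expensive computation is skipped.
        return load(path)

    data = compute()
    tmp_path = path + ".tmp"
    save(data, tmp_path)
    # Rename only after the write finished, so a crash in the middle of
    # saving never leaves a partial file at the final location.
    os.replace(tmp_path, path)
    return data


def save_json_lines(records: List[dict], path: str) -> None:
    """Example save function: one JSON document per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def load_json_lines(path: str) -> List[dict]:
    """Example load function matching save_json_lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]


if __name__ == "__main__":
    import tempfile

    def expensive_job() -> List[dict]:
        # Stand-in for a chain of shuffle-heavy transformations.
        return [{"word": w, "count": len(w)} for w in ["spark", "rdd"]]

    location = os.path.join(tempfile.mkdtemp(), "wordcounts.jsonl")
    print(checkpoint_or_restore(location, expensive_job, save_json_lines, load_json_lines))
```

Because the storage format lives entirely in the two functions passed in, switching to another format would mean supplying a different save/load pair while the checkpoint logic stays the same.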



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
