Posted to issues@spark.apache.org by "Andrew Or (JIRA)" <ji...@apache.org> on 2015/06/24 23:51:05 UTC

[jira] [Commented] (SPARK-7292) Provide operator to truncate lineage without persisting RDD's

    [ https://issues.apache.org/jira/browse/SPARK-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600229#comment-14600229 ] 

Andrew Or commented on SPARK-7292:
----------------------------------

Posted a short design doc. I'm going to go ahead and implement it and report back if I find anything unexpected in the process.

> Provide operator to truncate lineage without persisting RDD's
> -------------------------------------------------------------
>
>                 Key: SPARK-7292
>                 URL: https://issues.apache.org/jira/browse/SPARK-7292
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: Patrick Wendell
>            Assignee: Andrew Or
>         Attachments: SPARK-7292-design.pdf
>
>
> Checkpointing exists in Spark to truncate a lineage chain. I've heard requests from some users to allow truncation of lineage in a way that is "cheap" and doesn't serialize and persist the RDD. This is possible if the user is willing to forgo fault tolerance for that RDD (for instance, for shorter-running jobs or ones that use a small number of machines). It's pretty easy to allow this, so we should look into it for Spark 1.5.
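To make the trade-off in the description concrete, here is a minimal, hypothetical Python model. It is not Spark's API and not the design in the attached doc. Every name in it is invented for illustration: `ToyRDD`, `truncate_lineage`, `BlockLostError`, and the `store` dict.

In the model, each node recomputes its data from its parent, which is its lineage. "Cheap" truncation materializes the data once into volatile in-memory storage and drops the parent link. Nothing is written to reliable storage. As a result, a lost block can no longer be recomputed.

```python
from typing import Callable, Dict, List, Optional


class BlockLostError(RuntimeError):
    """Hypothetical error: a truncated node's in-memory block is gone."""


class ToyRDD:
    """Hypothetical stand-in for an RDD: data is recomputed from the parent chain."""

    def __init__(self, compute: Callable[[], List[int]],
                 parent: Optional["ToyRDD"] = None) -> None:
        self._compute = compute
        self.parent = parent  # lineage link used for recomputation

    def map(self, f: Callable[[int], int]) -> "ToyRDD":
        # The child recomputes by pulling the parent's data and applying f.
        return ToyRDD(lambda: [f(x) for x in self.collect()], parent=self)

    def collect(self) -> List[int]:
        return self._compute()

    def lineage_depth(self) -> int:
        # Count how many ancestors must be walked to recompute this node.
        depth, node = 0, self.parent
        while node is not None:
            depth += 1
            node = node.parent
        return depth

    def truncate_lineage(self, store: Dict[str, List[int]], key: str) -> "ToyRDD":
        # Materialize once into a volatile in-memory store (a stand-in for
        # executor memory); no serialization to reliable storage happens here.
        store[key] = self.collect()

        def read() -> List[int]:
            if key not in store:
                # No parent to fall back on: this is the fault-tolerance cost.
                raise BlockLostError(f"block {key!r} lost and lineage was truncated")
            return list(store[key])

        return ToyRDD(read, parent=None)
```

Usage: build a long chain, truncate it, then simulate losing the block.

```python
base = ToyRDD(lambda: [1, 2, 3])
rdd = base
for _ in range(100):
    rdd = rdd.map(lambda x: x + 1)

store = {}
short = rdd.truncate_lineage(store, "rdd-100")
short.collect()   # [101, 102, 103], with lineage_depth() == 0
store.clear()     # e.g. the machine holding the block goes away
# short.collect() now raises BlockLostError
```

In this model, the original 100-step chain can still be recomputed from `base`. The truncated node cannot be recomputed. That is the "forgo fault tolerance" condition the issue describes.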



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
