Posted to dev@tinkerpop.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/10/26 15:22:00 UTC

[jira] [Commented] (TINKERPOP-2081) PersistedOutputRDD materialises rdd lazily with Spark 2.x

    [ https://issues.apache.org/jira/browse/TINKERPOP-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16665290#comment-16665290 ] 

ASF GitHub Bot commented on TINKERPOP-2081:
-------------------------------------------

artem-aliev opened a new pull request #973: TINKERPOP-2081: Fix PersistedOutputRDD to eager persist RDD
URL: https://github.com/apache/tinkerpop/pull/973
 
 
   Call the rdd.count() action to trigger the caching.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> PersistedOutputRDD materialises rdd lazily with Spark 2.x
> ---------------------------------------------------------
>
>                 Key: TINKERPOP-2081
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2081
>             Project: TinkerPop
>          Issue Type: Bug
>    Affects Versions: 3.3.4
>            Reporter: Artem Aliev
>            Priority: Major
>
> PersistedOutputRDD does not actually persist the RDD in Spark memory; it only marks it for lazy caching in the future. It looks like caching was eager in Spark 1.6, but in Spark 2.0 it is lazy.
> Lazy caching looks wrong for this case: the source graph could be changed after the snapshot is created, and the snapshot should not be affected by those changes.
> The fix itself is simple: PersistedOutputRDD should call any Spark action, for example count(), to trigger eager caching.
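To illustrate why the order of persist() and the first action matters, here is a minimal Java sketch. It does not use Spark or TinkerPop: LazyCacheModel, ToyRdd and snapshotThenMutate are hypothetical names for a toy model in which persist() only records the intent to cache (as the issue describes for Spark 2.x) and the first action decides what gets cached. If the source changes after persist() but before any action, the change leaks into the "snapshot"; running an action such as count() right after persist() freezes it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Toy model only (NOT the Spark or TinkerPop API): a lazily cached dataset
// whose contents are recomputed from a mutable source until it is cached.
public class LazyCacheModel {

    // Hypothetical stand-in for an RDD.
    static final class ToyRdd {
        private final Supplier<List<String>> compute; // reads the current source
        private boolean markedForCaching = false;
        private List<String> cached = null;           // set by the first action after persist()

        ToyRdd(Supplier<List<String>> compute) {
            this.compute = compute;
        }

        // Models the lazy persist() described in the issue: only marks for caching.
        ToyRdd persist() {
            markedForCaching = true;
            return this;
        }

        // An action: forces materialization (and caching, if marked).
        long count() {
            return collect().size();
        }

        List<String> collect() {
            if (cached != null) {
                return cached;
            }
            List<String> data = new ArrayList<>(compute.get()); // copy of the source right now
            if (markedForCaching) {
                cached = data;
            }
            return data;
        }
    }

    // Takes a "snapshot", optionally forces it with an action, then mutates the source.
    static List<String> snapshotThenMutate(boolean eager) {
        List<String> source = new ArrayList<>(Arrays.asList("v1", "v2"));
        ToyRdd snapshot = new ToyRdd(() -> source).persist();
        if (eager) {
            snapshot.count(); // the proposed fix: an action right after persist()
        }
        source.add("v3");     // the source graph changes after the snapshot was "created"
        return snapshot.collect();
    }

    public static void main(String[] args) {
        System.out.println("lazy:  " + snapshotThenMutate(false)); // lazy:  [v1, v2, v3]
        System.out.println("eager: " + snapshotThenMutate(true));  // eager: [v1, v2]
    }
}
```

In real Spark code the analogous pattern is calling persist(storageLevel) on the RDD followed by an action such as count(); this sketch only models that ordering and is not the code from pull request #973.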



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)