Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/07/19 11:20:20 UTC

[jira] [Resolved] (SPARK-16478) strongly connected components doesn't cache returned RDD

     [ https://issues.apache.org/jira/browse/SPARK-16478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-16478.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.1.0

Issue resolved by pull request 14137
[https://github.com/apache/spark/pull/14137]

> strongly connected components doesn't cache returned RDD
> --------------------------------------------------------
>
>                 Key: SPARK-16478
>                 URL: https://issues.apache.org/jira/browse/SPARK-16478
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>    Affects Versions: 1.6.2
>            Reporter: Michał Wesołowski
>             Fix For: 2.1.0
>
>
> The Strongly Connected Components algorithm caches its intermediate RDDs but does not cache the one that it returns. When the graph is large relative to available memory, taking an action on the returned RDD forces the whole RDD to be computed from scratch, which takes much more time than StronglyConnectedComponents alone.
> I managed to reproduce the issue on the Databricks platform. [Here|https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/4889410027417133/3634650767364730/3117184429335832/latest.html] is the notebook.
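
For readers on an affected version, the following is a minimal caller-side sketch of persisting the returned graph. It is not the change made in pull request 14137, and it does not avoid the first full computation described above; it only aims to let later actions reuse the result. The object name SccCacheSketch, the helper sccPersisted, the toy edge list, and the local master setting are all assumptions made for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}

object SccCacheSketch {

  // Hypothetical helper (not part of GraphX): run SCC, then mark the returned
  // graph for persistence and materialize it once.
  def sccPersisted(graph: Graph[Int, Int], numIter: Int): Graph[VertexId, Int] = {
    val scc = graph.stronglyConnectedComponents(numIter)
    scc.cache()           // persist vertices and edges at the default storage level
    scc.vertices.count()  // first action: pays for the full lineage once
    scc.edges.count()
    scc
  }

  // Builds a toy graph (illustrative data only) and counts distinct component ids.
  def countComponents(sc: SparkContext): Long = {
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, 0), Edge(2L, 3L, 0), Edge(3L, 1L, 0), Edge(3L, 4L, 0)))
    val graph = Graph.fromEdges(edges, defaultValue = 0)
    val scc = sccPersisted(graph, numIter = 10)
    // Later actions like this one should read the persisted vertices
    // instead of recomputing the whole algorithm.
    scc.vertices.map(_._2).distinct().count()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("scc-cache-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      println(s"distinct components: ${countComponents(sc)}")
    } finally {
      sc.stop()
    }
  }
}
```

If the persisted graph does not fit in memory, evicted partitions would still be recomputed on demand, so this sketch is most useful when the result itself is small compared with the intermediate state.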



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
