Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/07/19 11:21:21 UTC

[jira] [Updated] (SPARK-16478) strongly connected components doesn't cache returned RDD

     [ https://issues.apache.org/jira/browse/SPARK-16478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-16478:
------------------------------
      Assignee: Michał Wesołowski
      Priority: Minor  (was: Major)
    Issue Type: Improvement  (was: Bug)

> strongly connected components doesn't cache returned RDD
> --------------------------------------------------------
>
>                 Key: SPARK-16478
>                 URL: https://issues.apache.org/jira/browse/SPARK-16478
>             Project: Spark
>          Issue Type: Improvement
>          Components: GraphX
>    Affects Versions: 1.6.2
>            Reporter: Michał Wesołowski
>            Assignee: Michał Wesołowski
>            Priority: Minor
>             Fix For: 2.1.0
>
>
> The Strongly Connected Components algorithm caches its intermediate RDDs but does not cache the one it returns. When the graph is large compared to available memory, taking an action on the returned RDD forces the whole RDD to be computed from scratch, which takes much longer than running StronglyConnectedComponents alone.
> I managed to reproduce the issue on the Databricks platform. [Here|https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/4889410027417133/3634650767364730/3117184429335832/latest.html] is the notebook.
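
For readers hitting this on an affected version, the following is a minimal caller-side sketch of the behaviour described above, not the fix that was merged for this issue. It assumes a local Spark setup; the toy edge list, the object name SccCacheSketch, and the iteration count are hypothetical and only there for illustration. The idea is to call cache() on the graph returned by stronglyConnectedComponents so that later actions can reuse the materialized result instead of replaying the lineage.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Graph, VertexId}

// Hypothetical example object; not part of Spark or of the reporter's notebook.
object SccCacheSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("scc-cache-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Toy graph (assumption): a cycle 1 -> 2 -> 3 -> 1 plus a tail edge 3 -> 4.
      val edges = sc.parallelize(Seq((1L, 2L), (2L, 3L), (3L, 1L), (3L, 4L)))
      val graph: Graph[Int, Int] = Graph.fromEdgeTuples(edges, defaultValue = 1)

      // Mark the returned graph as cached on the caller side.
      // Each vertex attribute is the lowest vertex id in its component.
      val scc: Graph[VertexId, Int] =
        graph.stronglyConnectedComponents(numIter = 10).cache()

      // The first action still pays for the full computation and fills the cache.
      val vertexCount = scc.vertices.count()

      // Later actions can read the cached vertices instead of recomputing them,
      // provided the cached partitions still fit in memory.
      val componentCount = scc.vertices.map(_._2).distinct().count()

      println(s"vertices=$vertexCount components=$componentCount")
    } finally {
      sc.stop()
    }
  }
}
```

Note that cache() only helps from the second action onward, and on a graph that is large relative to executor memory the cached partitions may be evicted, so this is a mitigation rather than a guarantee.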



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org