Posted to issues@spark.apache.org by "Ankur Dave (JIRA)" <ji...@apache.org> on 2014/06/06 04:06:01 UTC

[jira] [Resolved] (SPARK-1988) Enable storing edges out-of-core

     [ https://issues.apache.org/jira/browse/SPARK-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur Dave resolved SPARK-1988.
-------------------------------

    Resolution: Fixed

This is mitigated by SPARK-1991: the user can increase the number of edge partitions until each edge partition individually fits in memory, then set the storage level of the edges to MEMORY_AND_DISK so that partitions which do not fit in memory at the same time are kept on disk.
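
As a rough illustration of the sizing step, the sketch below estimates how many edge partitions are needed for each one to fit in a given memory budget. It is plain arithmetic, not GraphX code. The 20 bytes per edge comes from the issue description; the function names and the 2 GB per-partition budget are hypothetical, and the estimate assumes edges are spread evenly across partitions and ignores any per-partition overhead.

```python
BYTES_PER_EDGE = 20  # per-edge size quoted in the issue description


def edge_bytes(num_edges, bytes_per_edge=BYTES_PER_EDGE):
    """Lower bound on the memory needed to hold all edges."""
    return num_edges * bytes_per_edge


def min_edge_partitions(num_edges, partition_budget_bytes,
                        bytes_per_edge=BYTES_PER_EDGE):
    """Smallest partition count such that evenly spread edges fit the budget.

    partition_budget_bytes is a hypothetical per-partition memory budget.
    """
    total = edge_bytes(num_edges, bytes_per_edge)
    # Integer ceiling division avoids floating-point rounding on large inputs.
    return -(-total // partition_budget_bytes)


total = edge_bytes(20_000_000_000)                           # 400 GB lower bound
partitions = min_edge_partitions(20_000_000_000, 2 * 10**9)  # assumed 2 GB budget
print(total, partitions)
```

In practice the partition count would likely need to be higher than this lower bound, since edges are rarely distributed perfectly evenly.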

> Enable storing edges out-of-core
> --------------------------------
>
>                 Key: SPARK-1988
>                 URL: https://issues.apache.org/jira/browse/SPARK-1988
>             Project: Spark
>          Issue Type: Improvement
>          Components: GraphX
>            Reporter: Ankur Dave
>            Assignee: Ankur Dave
>            Priority: Minor
>
> A graph's edges are usually the largest component of the graph, and a cluster may not have enough memory to hold them. For example, a graph with 20 billion edges requires at least 400 GB of memory, because each edge takes 20 bytes.
> GraphX only ever accesses the edges using full table scans or cluster scans using the clustered index on source vertex ID. The edges are therefore amenable to being stored on disk. EdgePartition should provide the option of storing edges on disk transparently and streaming through them as needed.
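
The two access patterns described above can be sketched against a flat file of fixed-size edge records. This is only an illustration of the idea, not the EdgePartition implementation: the 20-byte record layout (64-bit source ID, 64-bit destination ID, 32-bit attribute) and all function names are assumptions, and the clustered index is approximated by a binary search over a file sorted by source vertex ID.

```python
import os
import struct
import tempfile

# Hypothetical on-disk layout: source ID (int64), destination ID (int64),
# attribute (int32), little-endian with no padding, i.e. 20 bytes per edge.
EDGE = struct.Struct("<qqi")


def write_edges(path, edges):
    """Write edges sorted by source ID so equal sources are contiguous."""
    with open(path, "wb") as f:
        for edge in sorted(edges):
            f.write(EDGE.pack(*edge))


def full_scan(path):
    """Stream every edge from disk, one record at a time."""
    with open(path, "rb") as f:
        while True:
            record = f.read(EDGE.size)
            if len(record) < EDGE.size:
                break
            yield EDGE.unpack(record)


def cluster_scan(path, src):
    """Yield only the edges whose source ID equals src."""
    count = os.path.getsize(path) // EDGE.size
    with open(path, "rb") as f:
        def source_at(i):
            f.seek(i * EDGE.size)
            return EDGE.unpack(f.read(EDGE.size))[0]

        # Binary search for the first record with source ID >= src.
        lo, hi = 0, count
        while lo < hi:
            mid = (lo + hi) // 2
            if source_at(mid) < src:
                lo = mid + 1
            else:
                hi = mid

        # Stream forward until the source ID changes or the file ends.
        f.seek(lo * EDGE.size)
        while True:
            record = f.read(EDGE.size)
            if len(record) < EDGE.size:
                break
            edge = EDGE.unpack(record)
            if edge[0] != src:
                break
            yield edge
```

Both scans hold only one record in memory at a time, which is the property that makes this access pattern suitable for on-disk storage.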


