Posted to issues@spark.apache.org by "Philipp Claßen (JIRA)" <ji...@apache.org> on 2016/04/30 20:15:13 UTC

[jira] [Created] (SPARK-15042) ConnectedComponents fails to compute graph with 200 vertices (but long paths)

Philipp Claßen created SPARK-15042:
--------------------------------------

             Summary: ConnectedComponents fails to compute graph with 200 vertices (but long paths)
                 Key: SPARK-15042
                 URL: https://issues.apache.org/jira/browse/SPARK-15042
             Project: Spark
          Issue Type: Bug
          Components: GraphX
    Affects Versions: 1.6.1
         Environment: Local cluster (1 instance) running on Arch Linux
Scala 2.11.7, Java 1.8.0_92
            Reporter: Philipp Claßen


ConnectedComponents runs for a very long time and eventually fails with an OutOfMemoryError when computing this graph: {code}{ (i, i+1) | i <- { 1..200 } }{code}

If you generate the example graph, e.g., with this bash command

{code}
for i in {1..200} ; do echo "$i $(($i+1))" ; done > input.graph
{code}

... then you should be able to reproduce the problem in the spark-shell by running:

{code}
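import org.apache.spark.graphx._
import org.apache.spark.graphx.lib._

// Load the edge list; each "src dst" line becomes one edge.
val graph = GraphLoader.edgeListFile(sc, "input.graph").cache()

// This is the call that runs for a very long time and eventually fails.
ConnectedComponents.run(graph)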
{code}
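
As a sanity check on the input (a sketch, not part of the original report): the edge list should contain 200 edges that form a single chain over the vertex ids 1 to 201, so a correct run would be expected to find exactly one connected component. The loop below is a POSIX sh equivalent of the bash command above, for shells without brace expansion; it assumes only the standard tools wc, head, tail, tr and sort.

```shell
# POSIX sh equivalent of the bash one-liner: writes edges "i i+1" for i = 1..200.
i=1
while [ "$i" -le 200 ]; do
  echo "$i $((i + 1))"
  i=$((i + 1))
done > input.graph

# Number of edges (lines) in the file.
wc -l < input.graph

# First and last edge of the chain.
head -n 1 input.graph
tail -n 1 input.graph

# Number of distinct vertex ids across both columns.
tr ' ' '\n' < input.graph | sort -n -u | wc -l
```

This should report 200 edges, a first edge of "1 2", a last edge of "200 201", and 201 distinct vertex ids.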

For additional information, here is a link to my related question on Stack Overflow:
http://stackoverflow.com/q/36892272/783510

One comment so far was that the number of skipped tasks grows exponentially.



