Posted to user@spark.apache.org by nitinkak001 <ni...@gmail.com> on 2014/11/24 17:19:07 UTC

Connected Components running for a long time and failing eventually

I am trying to run connected components on a graph generated by reading an
edge file. It runs for a long time (3-4 hours) and then eventually fails.
I can't find any error in the log file. The file I am testing it on has 27M
rows (edges). Is there something obviously wrong with the code?

I tested the same code with an input of around 1000 rows and it works just fine.
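For context, the Spark documentation for `GraphLoader.edgeListFile` describes its input as one edge per line: a source ID and a target ID, two integers separated by whitespace, with lines that begin with `#` skipped; `canonicalOrientation = true` orients edges in the positive direction (source ID < target ID). The plain-Scala sketch below needs no Spark and is not GraphX's own code; the names `EdgeLineCheck`, `parse`, `Skipped`, `EdgeIds` and `Malformed` are hypothetical. It classifies sample lines against that documented format, which may be a quick way to spot input lines that would not load as edges.

```scala
import scala.util.Try

// Hypothetical checker (not GraphX code): classifies a line against the format
// documented for GraphLoader.edgeListFile, i.e. "srcId dstId" separated by
// whitespace, with empty lines and lines starting with '#' skipped.
object EdgeLineCheck {
  sealed trait Parsed
  case object Skipped extends Parsed                          // empty or '#' comment line
  final case class EdgeIds(src: Long, dst: Long) extends Parsed
  final case class Malformed(line: String) extends Parsed     // would not parse as an edge

  def parse(line: String, canonicalOrientation: Boolean = false): Parsed =
    if (line.isEmpty || line.charAt(0) == '#') Skipped
    else {
      val fields = line.split("\\s+")
      // Need at least two fields, and both must be valid Long IDs.
      val ids: Option[(Long, Long)] =
        if (fields.length < 2) None
        else for {
          s <- Try(fields(0).toLong).toOption
          d <- Try(fields(1).toLong).toOption
        } yield (s, d)
      ids match {
        // Canonical orientation: swap so that the smaller ID is the source.
        case Some((s, d)) if canonicalOrientation && s > d => EdgeIds(d, s)
        case Some((s, d))                                  => EdgeIds(s, d)
        case None                                          => Malformed(line)
      }
    }

  def main(args: Array[String]): Unit = {
    // Sample lines: a comment, space- and tab-separated edges,
    // a Ctrl-A separated line, and a line with a single field.
    val sample = Seq("# comment", "1 2", "5\t3", "7\u00018", "9")
    sample.map(parse(_, canonicalOrientation = true)).foreach(println)
  }
}
```

A line whose fields are separated by a non-whitespace character such as Ctrl-A (`\u0001`, the default field delimiter for Hive text tables) comes back as `Malformed`, because `split("\\s+")` leaves it as a single field.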

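import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.GraphLoader

object ConnectedComponentsTest {
  def main(args: Array[String]): Unit = {
    // Comma-separated list of input paths (two Snappy-compressed Hive output files)
    val inputFile =
      "/user/hive/warehouse/spark_poc.db/window_compare_output_subset/000000_0.snappy," +
        "/user/hive/warehouse/spark_poc.db/window_compare_output_subset/000001_0.snappy"
    val conf = new SparkConf().setAppName("ConnectedComponentsTest")
    val sc = new SparkContext(conf)
    // Load the edge list; canonicalOrientation = true orients every edge so srcId < dstId
    val graph = GraphLoader.edgeListFile(sc, inputFile, canonicalOrientation = true)
    // Label every vertex with the lowest vertex ID in its connected component
    val cc = graph.connectedComponents().vertices
    cc.saveAsTextFile("/user/kakn/output")
    sc.stop()
  }
}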
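GraphX documents `connectedComponents` as labelling each connected component with the ID of its lowest-numbered vertex, and computes it iteratively on top of its Pregel API. The sketch below is a single-machine illustration of that min-label propagation idea on an in-memory edge list, not the GraphX implementation; `MinLabelCC` and `components` are hypothetical names. It also counts synchronous rounds, to illustrate that the number of rounds depends on the graph's structure: along a chain, the smallest label advances only one hop per round.

```scala
// Single-machine sketch of min-label propagation (not GraphX code).
object MinLabelCC {
  // Returns (vertex -> component label, number of rounds run).
  def components(edges: Seq[(Long, Long)]): (Map[Long, Long], Int) = {
    // Treat edges as undirected: each endpoint is a neighbour of the other.
    val neighbours: Map[Long, Seq[Long]] =
      edges
        .flatMap { case (a, b) => Seq(a -> b, b -> a) }
        .groupBy(_._1)
        .map { case (v, pairs) => v -> pairs.map(_._2) }

    // Initially every vertex is labelled with its own ID.
    var labels: Map[Long, Long] = neighbours.keys.map(v => v -> v).toMap
    var rounds = 0
    var changed = true
    while (changed) {
      // One synchronous round: each vertex takes the smallest label
      // among its own and its neighbours' labels from the previous round.
      val next = labels.map { case (v, l) =>
        v -> (l +: neighbours(v).map(labels)).min
      }
      changed = next != labels
      labels = next
      rounds += 1
    }
    (labels, rounds)
  }

  def main(args: Array[String]): Unit = {
    // A chain 1-2-3-4 plus a separate edge 10-11.
    val edges = Seq(1L -> 2L, 2L -> 3L, 3L -> 4L, 10L -> 11L)
    val (labels, rounds) = components(edges)
    println(labels.toSeq.sorted.mkString(", "))
    println(s"rounds = $rounds")
  }
}
```

For the edges in `main`, vertices 1 to 4 end up with label 1 and vertices 10 and 11 with label 10. The loop stops after 4 rounds: 3 rounds in which labels change, plus one that confirms nothing changed.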



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Connected-Components-running-for-a-long-time-and-failing-eventually-tp19659.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
