Posted to user@spark.apache.org by "Buttler, David" <bu...@llnl.gov> on 2014/11/11 02:51:43 UTC

inconsistent edge counts in GraphX

Hi,
I am building a graph from a large CSV file.  Each record contains a couple of nodes and about 10 edges.  When I try to load a large portion of the graph, using multiple partitions, I get inconsistent results in the number of edges between different runs.  However, if I use a single partition, or a small portion of the CSV file (say 1000 rows), then I get a consistent number of edges.  Is there anything I should be aware of as to why this could be happening in GraphX?

Thanks,
Dave


Re: inconsistent edge counts in GraphX

Posted by Ankur Dave <an...@gmail.com>.
At 2014-11-11 01:51:43 +0000, "Buttler, David" <bu...@llnl.gov> wrote:
> I am building a graph from a large CSV file.  Each record contains a couple of nodes and about 10 edges.  When I try to load a large portion of the graph, using multiple partitions, I get inconsistent results in the number of edges between different runs.  However, if I use a single partition, or a small portion of the CSV file (say 1000 rows), then I get a consistent number of edges.  Is there anything I should be aware of as to why this could be happening in GraphX?

Is it possible there's some nondeterminism in the way you're reading the file? It would be helpful if you could post the code you're using to load the graph.
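
One pattern that can produce run-to-run differences like this, offered as an assumption since the loading code has not been posted, is deriving vertex IDs from something that depends on partitioning or encounter order (a counter, a row position) rather than from the node's own key. The sketch below is plain Python with no Spark involved, and only illustrates the idea: IDs come from a stable hash of the node name, so the edge count is the same regardless of the order in which records are processed. The record layout ("src,dst1,dst2,...") and all function names are hypothetical.

```python
import hashlib


def stable_vertex_id(name: str) -> int:
    """Map a node name to a 64-bit ID that depends only on the name itself."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # Keep the first 8 bytes as a signed 64-bit value (GraphX vertex IDs are Longs).
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


def edges_from_record(line: str):
    """Hypothetical layout: 'src,dst1,dst2,...' -> list of (src_id, dst_id) pairs."""
    fields = [f.strip() for f in line.split(",") if f.strip()]
    if not fields:
        return []  # skip blank lines
    src, targets = fields[0], fields[1:]
    return [(stable_vertex_id(src), stable_vertex_id(t)) for t in targets]


def count_edges(lines):
    """Count distinct edges; the result does not depend on input order."""
    edges = set()
    for line in lines:
        edges.update(edges_from_record(line))
    return len(edges)


records = ["a,b,c", "b,c", "c,a"]
forward = count_edges(records)
backward = count_edges(reversed(records))
print(forward, backward)  # both counts should match
```

Note that this counts distinct edges, whereas GraphX keeps parallel edges, so the same order-independence check on a real graph would compare the raw edge counts of two loads instead. If the counts still differ between runs with key-derived IDs, the nondeterminism is more likely in how the file itself is read or split.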

Ankur
