Posted to issues@spark.apache.org by "Glenn Strycker (JIRA)" <ji...@apache.org> on 2014/05/20 22:12:38 UTC

[jira] [Closed] (SPARK-1885) GraphX reduce function not working properly -- returns only 1 element

     [ https://issues.apache.org/jira/browse/SPARK-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Glenn Strycker closed SPARK-1885.
---------------------------------

    Resolution: Fixed

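Not a bug: the user needed to use reduceByKey, not reduce. reduce combines every element of an RDD into a single value, so getting back 1 element is the expected behavior.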
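
To illustrate the difference, here is a minimal sketch using plain Scala collections instead of a live Spark context. The helper reduceByKeyLocal and the sample pairs are hypothetical, introduced only for this example; on a pair RDD the corresponding call would be rdd.reduceByKey(_ + _).

```scala
object ReduceVsReduceByKey {
  // Hypothetical local stand-in for reduceByKey on a pair RDD:
  // group the pairs by key, then combine the values inside each group.
  def reduceByKeyLocal[K, V](pairs: Seq[(K, V)])(f: (V, V) => V): Map[K, V] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }

  // Sample data (assumed for illustration): (source vertex id, 1) per edge.
  val pairs: Seq[(Long, Int)] =
    Seq(1L -> 1, 1L -> 1, 1L -> 1, 2L -> 1, 2L -> 1, 3L -> 1)

  // reduce folds the whole collection into one value.
  val total: Int = pairs.map(_._2).reduce(_ + _)

  // The per-key variant keeps one combined value for each distinct key.
  val perKey: Map[Long, Int] = reduceByKeyLocal(pairs)(_ + _)

  def main(args: Array[String]): Unit = {
    println(s"reduce      -> $total")
    println(s"reduceByKey -> $perKey")
  }
}
```

Here reduce yields the single value 6, while the per-key version yields one entry per source vertex (1 -> 3, 2 -> 2, 3 -> 1).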

> GraphX reduce function not working properly -- returns only 1 element
> ---------------------------------------------------------------------
>
>                 Key: SPARK-1885
>                 URL: https://issues.apache.org/jira/browse/SPARK-1885
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Glenn Strycker
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When .reduce() is performed on an EdgeRDD of length n, the result is only 1 element value instead of n.
> Given a graph object "orig_graph", I have the following output for the edge list:
> scala> orig_graph.edges.collect
> = Array(Edge(1,4,1), Edge(1,5,1), Edge(1,7,1), Edge(2,5,1), Edge(2,6,1), Edge(3,5,1), Edge(3,6,1), Edge(3,7,1), Edge(4,1,1), Edge(5,1,1), Edge(5,2,1), Edge(5,3,1), Edge(6,2,1), Edge(6,3,1), Edge(7,1,1), Edge(7,3,1))
> When I apply a map function (with _.copy() commands added, as per https://issues.apache.org/jira/browse/SPARK-1188), it looks ok:
> scala> orig_graph.edges.map(_.copy()).flatMap(edge => Seq(edge) ).map(edge => (Edge(edge.copy().srcId, edge.copy().dstId, edge.copy().attr), 1)).collect
> = Array((Edge(1,4,1),1), (Edge(1,5,1),1), (Edge(1,7,1),1), (Edge(2,5,1),1), (Edge(2,6,1),1), (Edge(3,5,1),1), (Edge(3,6,1),1), (Edge(3,7,1),1), (Edge(4,1,1),1), (Edge(5,1,1),1), (Edge(5,2,1),1), (Edge(5,3,1),1), (Edge(6,2,1),1), (Edge(6,3,1),1), (Edge(7,1,1),1), (Edge(7,3,1),1))
> BUT NOW, when I run the following, I only get 1 element returned:
> scala> orig_graph.edges.map(_.copy()).flatMap(edge => Seq(edge) ).map(edge => (Edge(edge.copy().srcId, edge.copy().dstId, edge.copy().attr), 1)).reduce( (A,B) => { if (A._1.dstId == B._1.srcId) (Edge(A._1.srcId, B._1.dstId, 2), 1) else if (A._1.srcId == B._1.dstId) (Edge(B._1.srcId, A._1.dstId, 2), 1) else (Edge(0, 0, A._1.srcId.toInt*A._1.dstId.toInt*B._1.srcId.toInt*B._1.dstId.toInt), A._1.srcId.toInt*A._1.dstId.toInt*B._1.srcId.toInt*B._1.dstId.toInt) } )
> = (Edge(0,0,0),0)
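
The reported result can be reproduced without Spark. The sketch below applies the same combining function to the 16 edges from the report, using a local left-to-right fold. SimpleEdge is a hypothetical stand-in for GraphX's Edge, and the left-to-right order is an assumption: on an RDD the order in which elements are combined depends on partitioning, and reduce expects a commutative and associative function.

```scala
object ReduceTrace {
  // Hypothetical local stand-in for GraphX's Edge, for illustration only.
  final case class SimpleEdge(srcId: Long, dstId: Long, attr: Int)

  // The 16 edges listed in the report, each with attribute 1.
  val edges: Seq[SimpleEdge] = Seq(
    (1, 4), (1, 5), (1, 7), (2, 5), (2, 6), (3, 5), (3, 6), (3, 7),
    (4, 1), (5, 1), (5, 2), (5, 3), (6, 2), (6, 3), (7, 1), (7, 3)
  ).map { case (s, d) => SimpleEdge(s.toLong, d.toLong, 1) }

  // Same logic as the function passed to reduce in the report.
  def combine(a: (SimpleEdge, Int), b: (SimpleEdge, Int)): (SimpleEdge, Int) = {
    val (ea, eb) = (a._1, b._1)
    if (ea.dstId == eb.srcId) (SimpleEdge(ea.srcId, eb.dstId, 2), 1)
    else if (ea.srcId == eb.dstId) (SimpleEdge(eb.srcId, ea.dstId, 2), 1)
    else {
      val p = ea.srcId.toInt * ea.dstId.toInt * eb.srcId.toInt * eb.dstId.toInt
      (SimpleEdge(0L, 0L, p), p)
    }
  }

  // Pairwise combination collapses all 16 elements into one.
  val result: (SimpleEdge, Int) = edges.map(e => (e, 1)).reduceLeft(combine)

  def main(args: Array[String]): Unit = println(result)
}
```

In this local run the first combination, Edge(1,4,1) with Edge(1,5,1), falls into the final branch and produces Edge(0,0,20). No input edge has a vertex id of 0, so every later step also takes the final branch with a product of 0, and the fold ends at (SimpleEdge(0,0,0),0), matching the output in the report.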



--
This message was sent by Atlassian JIRA
(v6.2#6252)