Posted to dev@spark.apache.org by GlennStrycker <gl...@gmail.com> on 2014/05/16 22:28:21 UTC

reduce only removes duplicates, cannot be arbitrary function

I am attempting to write a map-reduce job on a graph object that takes an edge
list and returns a new edge list.  Unfortunately, I find that the current
signature is

def reduce(f: (T, T) => T): T

not

def reduce(f: (T1, T2) => T3): T3
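(A side note on why the signature is shaped that way: reduce has to combine partial results with each other, so the output type must feed back in as an input type, which forces (T, T) => T. A heterogeneous combination like (T1, T2) => T3 is what aggregate/fold-style operations cover instead. A minimal plain-Scala sketch of the two shapes, with no Spark involved and a stand-in Edge class rather than the GraphX one:)

```scala
// Stand-in for org.apache.spark.graphx.Edge, just for this sketch.
case class Edge(srcId: Long, dstId: Long, attr: Int)

val pairs = List((Edge(1, 4, 0), 1), (Edge(7, 3, 0), 1))

// reduce-style: both arguments and the result share one type, (Edge, Int).
val reduced =
  pairs.reduce((a, b) => (Edge(a._1.srcId, b._1.dstId, a._1.dstId.toInt), 1))
// reduced == (Edge(1, 3, 4), 1)

// aggregate/fold-style: accumulate into a *different* type (here a List[Edge]),
// which is the (T1, T2) => T3 shape the reduce signature cannot express.
val collected =
  pairs.foldLeft(List.empty[Edge])((acc, p) => acc :+ p._1)
// collected == List(Edge(1, 4, 0), Edge(7, 3, 0))
```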


I see this because the following two commands give different results for the
final number, which should be the same (tempMappedRDD is a MappedRDD of the
form (Edge, 1), and I found that the A and B here are (1,4) and (7,3)):

tempMappedRDD.reduce( (A,B) => (Edge(A._1.srcId, A._1.dstId, A._1.dstId.toInt), 1) )  // (Edge(1,4,4),1)
tempMappedRDD.reduce( (A,B) => (Edge(A._1.srcId, B._1.dstId, A._1.dstId.toInt), 1) )  // (Edge(1,3,3),1)

Why is the third field above a '3' in the second line, and not a '4'?  Does it
have something to do with toInt?

The really weird thing is that this happens only for A, since the following
commands work correctly:

tempMappedRDD.reduce( (A,B) => (Edge(B._1.srcId, B._1.dstId, B._1.dstId.toInt), 1) )  // (Edge(7,3,3),1)
tempMappedRDD.reduce( (A,B) => (Edge(B._1.srcId, A._1.dstId, B._1.dstId.toInt), 1) )  // (Edge(7,4,3),1)
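(One thing worth noting: RDD.reduce's documentation requires the function to be commutative and associative, because partition-level partial results can be combined in any order. The function above is neither, so order-sensitive results like these are consistent with that contract rather than a bug. A plain-Scala sketch, using the two pairs mentioned above with a stand-in Edge class, showing that simply swapping the argument order of this function changes the answer:)

```scala
// Stand-in for org.apache.spark.graphx.Edge, just for this sketch.
case class Edge(srcId: Long, dstId: Long, attr: Int)

val a = (Edge(1, 4, 0), 1)
val b = (Edge(7, 3, 0), 1)

// The combining function from the second command above.  It is not
// commutative: f(a, b) != f(b, a), so an engine that is free to combine
// partial results in either order can legitimately return either value.
def f(x: (Edge, Int), y: (Edge, Int)): (Edge, Int) =
  (Edge(x._1.srcId, y._1.dstId, x._1.dstId.toInt), 1)

val leftToRight = f(a, b)  // (Edge(1, 3, 4), 1)
val rightToLeft = f(b, a)  // (Edge(7, 4, 3), 1)
```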




--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/reduce-only-removes-duplicates-cannot-be-arbitrary-function-tp6606.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.