Posted to user@spark.apache.org by HARIPRIYA AYYALASOMAYAJULA <ah...@gmail.com> on 2014/11/07 00:38:30 UTC
job works well on small data set but fails on large data set
Hello all,
I am running the following operations:
val part1 = mapOutput.toArray.flatten
val part2 = sc.parallelize(part1)
val reduceOutput = part2.combineByKey(
  (v) => (v, 1),
  (acc: (Double, Int), v) => (acc._1 + v, acc._2 + 1),
  (acc1: (Double, Int), acc2: (Double, Int)) => (acc1._1 + acc2._1, acc1._2 + acc2._2)
)
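(For reference, the three combineByKey arguments compute a per-key (sum, count) pair. A minimal local sketch of that logic, without Spark and using a hypothetical 4-string key type as described below, behaves like:)

```scala
// Local sketch (no Spark) of what the three combineByKey functions
// above compute together: a per-key (sum, count) pair.
object CombineSketch {
  type Key = (String, String, String, String)

  def combine(pairs: Seq[(Key, Double)]): Map[Key, (Double, Int)] =
    pairs.foldLeft(Map.empty[Key, (Double, Int)]) { case (acc, (k, v)) =>
      acc.get(k) match {
        case None             => acc + (k -> (v, 1))              // createCombiner
        case Some((sum, cnt)) => acc + (k -> (sum + v, cnt + 1))  // mergeValue
      }
    }

  def main(args: Array[String]): Unit = {
    val k: Key = ("a", "b", "c", "d")
    println(combine(Seq(k -> 1.0, k -> 3.0))(k))  // (4.0, 2)
  }
}
```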
Here mapOutput is the output of a map function and consists of tuples (x, y),
where y is a Double value and x is a tuple of 4 strings. When I used Float
instead of Double, it worked on the small data set but failed on the large
file.
I changed it to Double; on the large file it works up to the point where I
produce mapOutput, but when I include the remaining part, it fails.
Can someone please help me understand where I am going wrong?
Thank you for your time.
--
Regards,
Haripriya Ayyalasomayajula
Graduate Student
Department of Computer Science
University of Houston
Contact : 650-796-7112