You are viewing a plain text version of this content. The canonical link for it is here.

Posted to users@zeppelin.apache.org by Anqi Chen <ax...@icloud.com> on 2017/05/04 00:07:09 UTC

Unable to set aggregate user defined case classes

Hi,

I’m observing a weird behavior in zeppelin %spark. I have the following in my paragraph:

case class Test(a: Int, b: Int)
val a = Test(1,2)
val b = Test(1,2)
val c = Test(2,3)
val l = List(a,b,c)
val rdd = spark.sparkContext.parallelize(l)
rdd.map(v => (1, v)).aggregateByKey(scala.collection.mutable.HashSet.empty[Test])((result, item) => result + item, (result1, result2) => result1 ++ result2).collect()

I would expect the result to be Array((1, Set(Test(1,2), Test(2,3)))), however, I’m actually seeing Array(1, Set(Test(1,2), Test(1,2), Test(2,3)))). Why is spark unable to set aggregate on my case class? Is this a zeppelin issue or spark problem?

I have confirmed that doing Set(a,b,c) in scala REPL returns back Set(Test(1,2), Test(2,3)), as expected.

Thanks,
Anqi