Posted to user@spark.apache.org by Adrian Mocanu <am...@verticalscope.com> on 2014/03/24 17:44:59 UTC

remove duplicates

I have a DStream like this:
..RDD[a,b],RDD[b,c]..

Is there a way to remove duplicates across the entire DStream? That is, I would like the output to be one of the following (after removing one of the b's):
..RDD[a],RDD[b,c]..  or ..RDD[a,b],RDD[c]..

Thanks
-Adrian