Posted to user@spark.apache.org by Darren Hoo <da...@gmail.com> on 2015/03/20 18:37:00 UTC
Can the distinct transform be applied to a DStream?
val aDstream = ...
val distinctStream = aDstream.transform(_.distinct())
But the elements in distinctStream are not distinct.
Am I using it wrong?
Re: Can the distinct transform be applied to a DStream?
Posted by Dean Wampler <de...@gmail.com>.
aDstream.transform(_.distinct()) makes the elements distinct only within
each RDD of the DStream, not across the whole DStream. Is that what
you're seeing?
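
To make the per-RDD behaviour concrete, here is a minimal Spark-free sketch in plain Scala. It models a stream as a sequence of batches, which is a simplification and not the DStream API. The names DistinctDemo, perBatchDistinct and globalDistinct are hypothetical, chosen only for illustration.

```scala
// Spark-free model: a "stream" is just a sequence of batches.
// All names here are hypothetical and not part of the Spark API.
object DistinctDemo {
  type Batch = Seq[String]

  // Analogue of transform(_.distinct()): duplicates are removed
  // inside each batch only, so batches never see each other.
  def perBatchDistinct(stream: Seq[Batch]): Seq[Batch] =
    stream.map(_.distinct)

  // Global variant: carry the set of elements seen so far and emit,
  // per batch, only the elements that appeared in no earlier batch.
  def globalDistinct(stream: Seq[Batch]): Seq[Batch] = {
    val (_, out) =
      stream.foldLeft((Set.empty[String], Vector.empty[Batch])) {
        case ((seen, acc), batch) =>
          val fresh = batch.distinct.filterNot(seen)
          (seen ++ fresh, acc :+ fresh)
      }
    out
  }

  def main(args: Array[String]): Unit = {
    val stream = Seq(Seq("a", "b", "a"), Seq("b", "c", "c"))
    println(perBatchDistinct(stream)) // "b" still shows up in both batches
    println(globalDistinct(stream))   // "b" shows up in the first batch only
  }
}
```

The global variant has to carry state from one batch to the next. In Spark Streaming itself that usually means a stateful operation such as updateStateByKey, which also requires checkpointing to be enabled.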
Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com
Re: Can the distinct transform be applied to a DStream?
Posted by Akhil Das <ak...@sigmoidanalytics.com>.
What do you mean by "not distinct"?
It works for me:
[image: Inline image 1]
Code:
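import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkContext, SparkConf}

// sc is assumed to be an existing SparkContext (e.g. the one spark-shell provides)
val ssc = new StreamingContext(sc, Seconds(1))
// Each 1-second batch contains the lines of files newly added to this directory
val data = ssc.textFileStream("/home/akhld/mobi/localcluster/spark-1/sigmoid/")
// Remove duplicate lines within each batch's RDD
val dist = data.transform(_.distinct())
dist.print()
ssc.start()
ssc.awaitTermination()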
Thanks
Best Regards