You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Darren Hoo <da...@gmail.com> on 2015/03/20 18:37:00 UTC

can distinct transform applied on DStream?

val aDstream = ...

val distinctStream = aDstream.transform(_.distinct())

but the elements in distinctStream  are not distinct.

Did I use it wrong?

Re: can distinct transform applied on DStream?

Posted by Dean Wampler <de...@gmail.com>.
aDstream.transform(_.distinct())  will only make the elements of each RDD
in the DStream distinct, not for the whole DStream globally. Is that what
you're seeing?

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com

On Fri, Mar 20, 2015 at 10:37 AM, Darren Hoo <da...@gmail.com> wrote:

> val aDstream = ...
>
> val distinctStream = aDstream.transform(_.distinct())
>
> but the elements in distinctStream  are not distinct.
>
> Did I use it wrong?
>

Re: can distinct transform applied on DStream?

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
What do you mean not distinct?

It does works for me:
[image: Inline image 1]

Code:

import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkContext, SparkConf}

val ssc = new StreamingContext(sc, Seconds(1))

val data =
ssc.textFileStream("/home/akhld/mobi/localcluster/spark-1/sigmoid/")
val dist = data.transform(_.distinct())


dist.print()

ssc.start()
ssc.awaitTermination()






Thanks
Best Regards

On Fri, Mar 20, 2015 at 11:07 PM, Darren Hoo <da...@gmail.com> wrote:

> val aDstream = ...
>
> val distinctStream = aDstream.transform(_.distinct())
>
> but the elements in distinctStream  are not distinct.
>
> Did I use it wrong?
>