Posted to user@spark.apache.org by SK <sk...@gmail.com> on 2014/11/11 22:19:53 UTC

groupBy for DStream

Hi.

1) I don't see a groupBy() method on a DStream object, and I'm not sure why
it isn't supported. Currently I am using filter() to separate out the
different groups. I would like to know if there is a way to convert a
DStream into a regular RDD so that I can apply RDD methods like groupBy.


2) The count() method on a DStream returns a DStream[Long] instead of a
simple Long (as RDD's count() does). How can I extract the simple Long
count value? I tried dstream(0) but got a compilation error saying it does
not take parameters. I also tried dstream[0], which also resulted in a
compilation error. I am not able to use the head() or take(0) methods on a
DStream either.

thanks





Re: groupBy for DStream

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
1. Use foreachRDD on the DStream; inside it you get an ordinary RDD, on
which you can call groupBy().
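
For example, something like this (a minimal sketch; the socket source,
host/port, and the "key,payload" record format are just assumptions for
illustration):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("GroupByExample").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Hypothetical source: lines of "key,payload" from a socket.
    val lines = ssc.socketTextStream("localhost", 9999)

    lines.foreachRDD { rdd =>
      // Inside foreachRDD this is an ordinary RDD, so groupBy() is available.
      val grouped = rdd.groupBy(line => line.split(",")(0))
      grouped.collect().foreach { case (key, records) =>
        println(s"$key: ${records.size} record(s)")
      }
    }

    ssc.start()
    ssc.awaitTermination()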

2. DStream.count() returns a new DStream in which each RDD has a single
element, generated by counting each RDD of this DStream.
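
So to get at the per-batch Long, operate on each of those single-element
RDDs. A sketch, reusing the hypothetical `lines` stream from the first
example:

    lines.count().foreachRDD { rdd =>
      // Each RDD produced by count() holds exactly one Long:
      // the number of elements in that batch.
      val n: Long = rdd.first()
      println(s"batch count = $n")
    }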

Thanks
Best Regards


Re: groupBy for DStream

Posted by Sean Owen <so...@cloudera.com>.
A DStream is a sequence of RDDs. Just groupBy each RDD.
Likewise, count() does not return a count over all history; it returns the
count of each RDD in the stream, not a single overall count.

You can head() or take() each RDD in the stream, but it doesn't make as
much sense to talk about the first element of the entire stream; it may be
long gone before the streaming operation even started.
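
For example, to peek at the first few elements of each batch (a sketch;
assume `lines` is any DStream[String]):

    lines.foreachRDD { rdd =>
      // These are the first elements of this batch's RDD,
      // not of the stream as a whole.
      rdd.take(5).foreach(println)
    }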
