Posted to user@spark.apache.org by salemi <al...@udo.edu> on 2014/09/02 23:54:49 UTC

Spark Streaming - how to implement multiple calculation using the same data set

Hi,

I am planning to use an incoming DStream and calculate different measures from
the same stream.

I was able to calculate the individual measures separately; now I have
to merge them, and Spark Streaming doesn't support outer join yet.


handlingTimePerWorker: (workerId, handlingTime)
filesProcessedCountPerWorker: (workerId, filesProcessedCount)

Is there a design pattern that allows using each RDD in the DStream to
calculate the measures for a worker and save the attributes in the same
object (Worker)?
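One way to sketch this per batch: treat the two keyed result sets with full-outer-join semantics and fold them into a single Worker object per workerId. The names Worker and mergeMeasures below are illustrative assumptions; inside Spark this logic would run on each RDD via transform or foreachRDD, but it is shown here on plain Scala maps so the merge itself is clear.

```scala
// Hypothetical Worker holding both measures for one workerId.
case class Worker(workerId: String, handlingTime: Double, filesProcessedCount: Long)

// Full-outer-join semantics over the two keyed result sets of one batch:
// a worker missing from either side gets a zero default for that measure.
def mergeMeasures(handlingTimes: Map[String, Double],
                  fileCounts: Map[String, Long]): Map[String, Worker] = {
  val allIds = handlingTimes.keySet ++ fileCounts.keySet
  allIds.map { id =>
    id -> Worker(id,
                 handlingTimes.getOrElse(id, 0.0),
                 fileCounts.getOrElse(id, 0L))
  }.toMap
}
```

On a real DStream pair, the same shape would come from cogroup (or, in later Spark versions, fullOuterJoin) on the two keyed streams.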






---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Spark Streaming - how to implement multiple calculation using the same data set

Posted by Alireza Salemi <al...@udo.edu>.
Tobias,

That was what I was planning to do, but my technical lead is of the opinion
that we should somehow process each message only once and calculate all the
measures for the worker.

Is there an existing solution out there for that?
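One way to satisfy that constraint could be a single pass over each batch that accumulates all measures at once, instead of computing them separately and joining. The Event and WorkerStats names below are assumptions for illustration; in Spark the same shape would map onto events.map(e => e.workerId -> (e.handlingTime, 1L)).reduceByKey(...), while here it is simulated on a plain Scala collection.

```scala
// Hypothetical input record and per-worker accumulator.
case class Event(workerId: String, handlingTime: Double)
case class WorkerStats(totalHandlingTime: Double, filesProcessed: Long)

// One pass over a batch: each event contributes to both measures at once,
// so no later join between separately computed streams is needed.
def statsPerWorker(events: Seq[Event]): Map[String, WorkerStats] =
  events.groupBy(_.workerId).map { case (id, es) =>
    id -> WorkerStats(es.map(_.handlingTime).sum, es.size.toLong)
  }
```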

Thanks,
Ali

> Hi,
>
> On Wed, Sep 3, 2014 at 6:54 AM, salemi <al...@udo.edu> wrote:
>
>> I was able to calculate the individual measures separately; now I
>> have
>> to merge them, and Spark Streaming doesn't support outer join yet.
>>
>
> Can't you assign some dummy key (e.g., index) before your processing and
> then join on that key using a function from
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.dstream.PairDStreamFunctions
> ?
>
> Tobias
>





Re: Spark Streaming - how to implement multiple calculation using the same data set

Posted by Tobias Pfeiffer <tg...@preferred.jp>.
Hi,

On Wed, Sep 3, 2014 at 6:54 AM, salemi <al...@udo.edu> wrote:

> I was able to calculate the individual measures separately; now I have
> to merge them, and Spark Streaming doesn't support outer join yet.
>

Can't you assign some dummy key (e.g., index) before your processing and
then join on that key using a function from
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.dstream.PairDStreamFunctions
?

Tobias
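
Tobias's suggestion, sketched on one batch: key both result sets by workerId and join them on that key. On real DStreams this would be handlingTimes.join(fileCounts) from PairDStreamFunctions; the generic joinByKey helper below is a plain-Scala assumption that shows the inner-join semantics.

```scala
// Inner join of two keyed sequences, as one batch of a keyed-DStream join
// would behave: only keys present on both sides survive.
def joinByKey[K, A, B](left: Seq[(K, A)], right: Seq[(K, B)]): Seq[(K, (A, B))] = {
  val rightMap = right.toMap
  left.collect { case (k, a) if rightMap.contains(k) => (k, (a, rightMap(k))) }
}
```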