Posted to user@spark.apache.org by salemi <al...@udo.edu> on 2014/09/02 23:54:49 UTC
Spark Streaming - how to implement multiple calculations using the same data set
Hi,
I am planning to use an incoming DStream and calculate different measures from
the same stream.
I was able to calculate the individual measures separately, and now I have
to merge them, but Spark Streaming doesn't support outer join yet.
handlingTimePerWorker: (workerId, handlingTime)
filesProcessedCountPerWorker: (workerId, filesProcessedCount)
Is there a design pattern that allows using each RDD in the DStream to
calculate the measures per worker and save the attributes in the same
object (Worker)?
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-how-to-implement-multiple-calculation-using-the-same-data-set-tp13306.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
Re: Spark Streaming - how to implement multiple calculations using the same data set
Posted by Alireza Salemi <al...@udo.edu>.
Tobias,
That was what I was planning to do, but my technical lead is of the opinion
that we should somehow process each message only once and calculate all the
measures for the worker in a single pass.
I was wondering if there is a solution out there for that?
Thanks,
Ali
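One common shape for "process each message only once" is to fold every measure into a single per-worker accumulator in one aggregation. A sketch on plain Scala collections, where Event and Worker are assumed names standing in for the real record types:

```scala
// Assumed input record and combined result object; one aggregation
// computes every measure, so each message is touched only once.
case class Event(workerId: String, handlingTime: Long, fileProcessed: Boolean)
case class Worker(workerId: String, handlingTime: Long, filesProcessedCount: Int)

// Group one batch by worker and derive both measures from the same pass
// over each worker's events.
def workersFor(batch: Seq[Event]): Map[String, Worker] =
  batch.groupBy(_.workerId).map { case (id, events) =>
    id -> Worker(id, events.map(_.handlingTime).sum, events.count(_.fileProcessed))
  }
```

On a DStream, the analogous pattern would be keying each event by workerId and combining per key (e.g. with a combine-by-key style aggregation inside each batch), emitting one Worker per key per interval.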
> Hi,
>
> On Wed, Sep 3, 2014 at 6:54 AM, salemi <al...@udo.edu> wrote:
>
>> I was able to calculate the individual measures separately, and now I
>> have to merge them, but Spark Streaming doesn't support outer join yet.
>>
>>
>
> Can't you assign some dummy key (e.g., index) before your processing and
> then join on that key using a function from
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.dstream.PairDStreamFunctions
> ?
>
> Tobias
>
Re: Spark Streaming - how to implement multiple calculations using the same data set
Posted by Tobias Pfeiffer <tg...@preferred.jp>.
Hi,
On Wed, Sep 3, 2014 at 6:54 AM, salemi <al...@udo.edu> wrote:
> I was able to calculate the individual measures separately, and now I have
> to merge them, but Spark Streaming doesn't support outer join yet.
>
Can't you assign some dummy key (e.g., index) before your processing and
then join on that key using a function from
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.dstream.PairDStreamFunctions
?
Tobias
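For reference, the per-batch semantics of the keyed join Tobias points to (PairDStreamFunctions.join) look roughly like this, sketched on plain maps, an inner join that keeps only keys present on both sides:

```scala
// Inner join by key: keep only the keys that appear in both inputs,
// pairing their values. Keys on only one side are dropped.
def joinByKey[K, A, B](left: Map[K, A], right: Map[K, B]): Map[K, (A, B)] =
  left.keySet.intersect(right.keySet)
    .map(k => k -> (left(k), right(k)))
    .toMap
```

Because one-sided keys are dropped, merging measure streams that may not cover every worker in every batch still needs outer-join-style handling on top of this.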