Posted to user@spark.apache.org by Aniket Bhatnagar <an...@gmail.com> on 2014/09/01 14:25:00 UTC

[Streaming] Triggering an action in absence of data

Hi all

I am struggling to implement a use case in which I need to trigger an action
when no data has been received for X amount of time. I haven't been able to
figure out an easy way to do this. No state/foreach methods get called when
no data has arrived. I thought of generating a 'tick' DStream that emits an
arbitrary object and then union/group the tick stream with the data stream
to detect that data hasn't arrived for X amount of time. However, since my
data DStream is paired (key-value tuples) and I use the updateStateByKey
method to process it, I can't group/union it with the tick stream(s)
without knowing all keys in advance.

My second idea was to push data from the DStream to an actor and let the
actor (one per key) manage the state and the data-absent cases. However,
there is no way to run an actor continuously for all data belonging to a key
or a partition.

I am stuck now and can't think of anything else that would solve this use
case. Has anyone else run into a similar issue? Any thoughts on how the use
case could be implemented in Spark Streaming?

Thanks,
Aniket

Re: [Streaming] Triggering an action in absence of data

Posted by Tobias Pfeiffer <tg...@preferred.jp>.
Hi,


On Mon, Sep 1, 2014 at 9:25 PM, Aniket Bhatnagar <aniket.bhatnagar@gmail.com> wrote:
>
> No state/foreach methods get called when no data has arrived.
>

Have you double-checked this? I am pretty sure that, for example,
foreachRDD gets called (with an empty RDD) even when there was no data
received.

Tobias
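
To make the point in the reply concrete: if the per-batch callbacks do run when a batch is empty, then "no data for X amount of time" can be approximated by counting consecutive empty batches per key. The following is only a sketch of that idea, not something posted in the thread. It is a plain Python function in the (new_values, previous_state) shape that updateStateByKey takes in PySpark, and it assumes, as far as I know correctly, that the update function is invoked in every batch for each key that already has state, with an empty list when that key received nothing. The names MAX_IDLE_BATCHES, update_idle_state and is_idle are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical threshold: number of consecutive empty batches treated as
# "no data for X". With a batch interval of B seconds, X is about
# MAX_IDLE_BATCHES * B.
MAX_IDLE_BATCHES = 3

# Per-key state: (last value seen, consecutive batches without data).
State = Tuple[Optional[str], int]


def update_idle_state(new_values: List[str], prev: Optional[State]) -> State:
    """Update function in the (new_values, previous_state) shape."""
    if new_values:
        # Data arrived in this batch: keep the latest value, reset the counter.
        return (new_values[-1], 0)
    # No data for this key in this batch: count one more idle batch.
    last_value, idle = prev if prev is not None else (None, 0)
    return (last_value, idle + 1)


def is_idle(state: State) -> bool:
    """True once a key has gone MAX_IDLE_BATCHES batches without data."""
    return state[1] >= MAX_IDLE_BATCHES
```

With a paired DStream called, say, pairs (an assumed name), this could be wired up roughly as pairs.updateStateByKey(update_idle_state).filter(lambda kv: is_idle(kv[1])), followed by whatever action should fire for idle keys. Checkpointing has to be enabled for stateful operations, and a key can only be reported as idle after it has been seen at least once, so this does not cover keys that never send any data.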