Posted to user@spark.apache.org by Tathagata Das <td...@databricks.com> on 2016/01/04 22:55:57 UTC

Re: Batch together RDDs for Streaming output, without delaying execution of map or transform functions

You could enforce the evaluation of the transformed DStream by putting a
dummy output operation on it, and then do the windowing.

// Dummy output operation: forces the transformation to be evaluated every batch
transformedDStream.foreachRDD { _.count() }
// Windowed output: runs only when the window fires
transformedDStream.window(...).foreachRDD { rdd => ... }
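
For reference, a fuller sketch of the same idea applied to the numbers in the original question (1-second batches, a write every 60 seconds). The socket source, the upper-casing map, and the output prefix are placeholders assumed for illustration; they are not from the original poster's job.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WindowedOutputSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("WindowedOutputSketch")
      .setMaster("local[2]") // assumption: local test run
    val ssc = new StreamingContext(conf, Seconds(1)) // 1-second batch interval

    // Assumption: a text socket source stands in for the real receiver.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Placeholder for the time-sensitive transformation.
    val transformed = lines.map(_.toUpperCase)

    // Dummy output operation: registers a job on every 1-second batch,
    // so the transformation is evaluated without waiting for the window.
    transformed.foreachRDD { rdd =>
      rdd.count()
      ()
    }

    // Windowed output: window length and slide are both 60 seconds,
    // so each write covers the previous 60 batches exactly once.
    transformed
      .window(Seconds(60), Seconds(60))
      .saveAsTextFiles("/tmp/windowed-output/batch") // hypothetical path prefix

    ssc.start()
    ssc.awaitTermination()
  }
}
```

As far as I know, window() persists the parent DStream's RDDs, so the 60-second write should reuse the already computed batches rather than recompute them, but that is worth verifying on your Spark version.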

On Thu, Dec 31, 2015 at 5:54 AM, Ewan Leith <ew...@realitymine.com>
wrote:

> Yeah it’s awkward, the transforms being done are fairly time sensitive, so
> I don’t want them to wait 60 seconds or more.
>
>
>
> I might have to move the code from a transform into a custom receiver
> instead, so they’ll be processed outside the window length. A buffered
> writer is a good idea too, thanks.
>
>
>
> Thanks,
>
> Ewan
>
>
>
> *From:* Ashic Mahtab [mailto:ashic@live.com]
> *Sent:* 31 December 2015 13:50
> *To:* Ewan Leith <ew...@realitymine.com>; Apache Spark <
> user@spark.apache.org>
> *Subject:* RE: Batch together RDDs for Streaming output, without delaying
> execution of map or transform functions
>
>
>
> Hi Ewan,
>
> Transforms are definitions of what needs to be done - they don't execute
> until an action is triggered. For what you want, I think you might need to
> have an action that writes out RDDs to some sort of buffered writer.
>
>
>
> -Ashic.
> ------------------------------
>
> From: ewan.leith@realitymine.com
> To: user@spark.apache.org
> Subject: Batch together RDDs for Streaming output, without delaying
> execution of map or transform functions
> Date: Thu, 31 Dec 2015 11:35:37 +0000
>
> Hi all,
>
>
>
> I’m sure this must have been solved already, but I can’t see anything
> obvious.
>
>
>
> Using Spark Streaming, I’m trying to execute a transform function on a
> DStream at short batch intervals (e.g. 1 second), but only write the
> resulting data to disk using saveAsTextFiles in a larger batch after a
> longer delay (say 60 seconds).
>
>
>
> I thought the ReceiverInputDStream window function might be a good help
> here, but instead, applying it to a transformed DStream causes the
> transform function to only execute at the end of the window too.
>
>
>
> Has anyone got a solution to this?
>
>
>
> Thanks,
>
> Ewan