Posted to dev@spark.apache.org by Yogs <ma...@gmail.com> on 2015/09/30 09:55:02 UTC

CQs on WindowedStream created on running StreamingContext

Hi,

We intend to run ad hoc windowed continuous queries on Spark Streaming data.
The queries could be registered/deregistered dynamically or submitted
through the command line. Currently Spark Streaming doesn’t allow adding
any new inputs, transformations, or output operations after a
StreamingContext has been started. But making the following code changes in
DStream.scala allows me to create a window on a DStream even after the
StreamingContext has started (i.e., while it is in
StreamingContextState.ACTIVE).

1) In DStream.validateAtInit()
Allowed adding new inputs, transformations, and output operations after
starting a streaming context
2) In DStream.persist()
Allowed changing the storage level of a DStream after the streaming context
has started
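For reference, the guard being relaxed lives in DStream.validateAtInit(). A simplified sketch of the modification (loosely based on the Spark 1.5-era sources, not the exact upstream code) might look like:

```scala
// Sketch of a relaxed validateAtInit() in DStream.scala (simplified).
// The stock implementation also throws IllegalStateException for ACTIVE.
private def validateAtInit(): Unit = {
  ssc.getState() match {
    case StreamingContextState.INITIALIZED =>
      // OK: the DStream graph can still be modified before start()
    case StreamingContextState.ACTIVE =>
      // Modified: previously threw "Adding new inputs, transformations,
      // and output operations after starting a context is not supported";
      // now permitted, to allow ad hoc windows on a running context
    case StreamingContextState.STOPPED =>
      throw new IllegalStateException(
        "Adding new inputs, transformations, and output operations after " +
        "stopping a context is not supported")
  }
}
```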

Ultimately the window API just does a slice on the parent DStream and
returns allRDDsInWindow.
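Concretely, WindowedDStream.compute in the Spark sources does roughly the following (a simplified sketch, not the verbatim upstream code):

```scala
// Simplified sketch of WindowedDStream.compute: the window is just a
// slice over the parent DStream's RDDs for the window interval, unioned.
override def compute(validTime: Time): Option[RDD[T]] = {
  val currentWindow = new Interval(
    validTime - windowDuration + parent.slideDuration, validTime)
  val rddsInWindow = parent.slice(currentWindow)
  Some(ssc.sc.union(rddsInWindow))
}
```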
We create DataFrames out of the RDDs from this particular WindowedDStream,
and evaluate queries on those DataFrames.
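To illustrate the intended usage, here is a minimal sketch of registering such an ad hoc query on a running context (the names lines, sqlContext, and the Event case class are hypothetical, and the foreachRDD registration itself is only possible once the validateAtInit check is relaxed):

```scala
// Hypothetical sketch: an ad hoc windowed query on a running
// StreamingContext. Assumes ssc is already ACTIVE and lines is an
// existing DStream[Event].
case class Event(user: String, value: Long)

val windowed = lines.window(Seconds(60), Seconds(10))

windowed.foreachRDD { rdd =>
  import sqlContext.implicits._
  val df = rdd.toDF()                // DataFrame over this window's data
  df.registerTempTable("events")     // Spark 1.5-era temp-table API
  sqlContext.sql("SELECT user, SUM(value) FROM events GROUP BY user").show()
}
```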

1) Do you see any challenges or consequences with this approach?
2) Will these on-the-fly created WindowedDStreams be accounted for properly
in runtime and memory management?
3) What is the reason we do not allow creating new windows while in the
StreamingContextState.ACTIVE state?
4) Does it make sense to add our own implementation of WindowedDStream in
this case?

- Yogesh

Re: CQs on WindowedStream created on running StreamingContext

Posted by Yogesh Mahajan <ma...@gmail.com>.
Does anyone know about this? TD?

-yogesh

> On 30-Sep-2015, at 1:25 pm, Yogs <ma...@gmail.com> wrote:
> [original message quoted above]

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org