Posted to user@hive.apache.org by Mich Talebzadeh <mi...@gmail.com> on 2016/05/04 20:45:51 UTC

Spark Streaming, Batch interval, Windows length and Sliding Interval settings

Hi,

Just wanted opinions on this.

In Spark Streaming, the parameter n in

val ssc = new StreamingContext(sparkConf, Seconds(n))

defines the batch (or sample) interval for the incoming streams.

In addition there is the window length:

// window length - the duration of the window, which must be a multiple
// of the batch interval n in StreamingContext(sparkConf, Seconds(n))

val windowLength = L

And finally the sliding interval:
// sliding interval - the interval at which the window operation is
// performed

val slidingInterval = I

OK, so as given, the windowLength L must be a multiple of n, and the
slidingInterval has to be consistent with both to ensure that we can
capture the head and tail of the window.

So as a heuristic, for a batch interval of, say, 10 seconds, I set the
window length to 3 times that (30 seconds) and make the
slidingInterval = batch interval = 10 seconds.

Obviously these are subjective and depend on what is being measured.
However, I believe having slidingInterval = batch interval makes sense?
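
For reference, the heuristic above could be wired up roughly as follows.
This is only a sketch: the socket source on localhost:9999, the word-count
logic and the object name WindowedWordCount are assumptions made for
illustration and are not part of the setup described here.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WindowedWordCount {
  def main(args: Array[String]): Unit = {
    // Values taken from the heuristic: window = 3 x batch, slide = batch.
    val batchInterval   = Seconds(10)
    val windowLength    = Seconds(30)
    val slidingInterval = Seconds(10)

    // local[2] is assumed for a local test run: one thread for the
    // receiver and one for processing.
    val sparkConf = new SparkConf()
      .setAppName("WindowedWordCount")
      .setMaster("local[2]")
    val ssc = new StreamingContext(sparkConf, batchInterval)

    // Hypothetical input: lines of text arriving on a local socket.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Count words over the last 30 seconds, recomputed every 10 seconds.
    val counts = lines
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, windowLength, slidingInterval)

    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

With these values every output covers the three most recent batches, and
a new output appears after each batch.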

Appreciate any views on this.

Thanks,

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com

Re: Spark Streaming, Batch interval, Windows length and Sliding Interval settings

Posted by Mich Talebzadeh <mi...@gmail.com>.
Thanks Ryan for the correction. Posted to the wrong user list :(







RE: Spark Streaming, Batch interval, Windows length and Sliding Interval settings

Posted by Ryan Harris <Ry...@zionsbancorp.com>.
This is really outside the scope of Hive and would probably be better addressed by the Spark community. However, I can say that this very much depends on your use case.

Take a look at this discussion if you haven't already:
https://groups.google.com/forum/embed/#!topic/spark-users/GQoxJHAAtX4

Generally speaking, the larger the batch window, the better the overall performance, but the streaming data output will be updated less frequently. You will likely run into problems if you set the batch window below 0.5 seconds, or if the batch window is shorter than the time it takes to run the task.
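
To illustrate the second problem, here is a toy model (plain Scala, no Spark) of what happens when each batch takes longer to process than the batch window. It assumes a single queue in which a batch can only start once the previous one has finished; the object and method names are made up for this sketch, and real Spark scheduling is more involved.

```scala
object BatchBacklog {
  /** Scheduling delay in milliseconds for each of the first `batches` batches,
    * assuming a constant processing time and strictly sequential processing.
    */
  def schedulingDelays(batchMs: Long, processingMs: Long, batches: Int): Vector[Long] = {
    var previousFinish = 0L
    (0 until batches).toVector.map { i =>
      // Batch i is complete, and so ready to process, at the end of its interval.
      val available = (i + 1) * batchMs
      // It cannot start before the previous batch has finished.
      val start = math.max(available, previousFinish)
      previousFinish = start + processingMs
      start - available
    }
  }
}
```

In this model a 1 second batch window with 1.2 seconds of processing per batch falls a further 0.2 seconds behind on every batch, whereas 0.8 seconds of processing never builds up any delay.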

Beyond that, the window length and sliding interval need to be multiples of the batch window, but will depend entirely on your reporting requirements.
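
A quick sanity check for the "multiples of the batch window" rule might look like the sketch below. It is plain Scala working in whole seconds; the object name WindowSettingsCheck and the wording of the messages are assumptions for illustration only.

```scala
object WindowSettingsCheck {
  /** Returns one message per setting that is not a positive multiple
    * of the batch interval; an empty list means the settings are consistent.
    */
  def problems(batchSec: Long, windowSec: Long, slideSec: Long): List[String] = {
    require(batchSec > 0, "batch interval must be positive")

    def notMultiple(valueSec: Long): Boolean =
      valueSec <= 0 || valueSec % batchSec != 0

    List(
      notMultiple(windowSec) ->
        s"window length ${windowSec}s is not a positive multiple of the ${batchSec}s batch interval",
      notMultiple(slideSec) ->
        s"sliding interval ${slideSec}s is not a positive multiple of the ${batchSec}s batch interval"
    ).collect { case (true, message) => message }
  }
}
```

For example, a 10 second batch with a 25 second window yields one message, while 10/30/10 yields none.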

It would be perfectly reasonable to have:
batch window = 30 secs
window length = 1 hour
sliding interval = 5 mins

In that case, you'd be creating an output every 5 minutes, aggregating data that you collected every 30 seconds over the previous 1-hour period.
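
The arithmetic behind that example can be spelled out with a small helper. This is a plain Scala sketch; the case class Settings and its member names are hypothetical.

```scala
object WindowArithmetic {
  final case class Settings(batchSec: Long, windowSec: Long, slideSec: Long) {
    require(batchSec > 0, "batch interval must be positive")
    require(windowSec % batchSec == 0, "window length must be a multiple of the batch interval")
    require(slideSec % batchSec == 0, "sliding interval must be a multiple of the batch interval")

    /** Number of batches aggregated into each windowed output. */
    def batchesPerWindow: Long = windowSec / batchSec

    /** Number of new batches that arrive between two consecutive outputs. */
    def batchesPerSlide: Long = slideSec / batchSec

    /** Number of batches shared by two consecutive windows. */
    def sharedBatches: Long = math.max(0L, batchesPerWindow - batchesPerSlide)
  }
}
```

With a 30 second batch, a 1 hour window and a 5 minute slide, each output aggregates 120 batches, 10 of which are new since the previous output and 110 of which were already part of it.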

Could you set the batch window to 5 mins? Possibly, depending on the data source. But perhaps you are already using that source on a more frequent basis elsewhere, or maybe you only have a 1 min buffer on the source data. There are lots of possibilities, which is why there is flexibility and no hard-and-fast rule.

If you were trying to create continuously streaming output as fast as possible, then you would probably (almost always) set your sliding interval = batch window and then shrink the batch window as far as possible.

More documentation here:
https://databricks.gitbooks.io/databricks-spark-reference-applications/content/logs_analyzer/chapter1/windows.html




Re: Spark Streaming, Batch interval, Windows length and Sliding Interval settings

Posted by Mich Talebzadeh <mi...@gmail.com>.
Any ideas/experience on this?



