Posted to dev@spark.apache.org by Yogesh Mahajan <ym...@snappydata.io> on 2018/01/31 18:24:51 UTC

Max number of streams supported ?

Hi,

Is there a theoretical limit on the number of input streams I can register on a
streaming context? Looking at the code, there is no limit on the number of
InputDStreams you can add to a DStreamGraph; it just adds them to an
ArrayBuffer. What overhead should I expect if I have hundreds of
InputDStreams in my application?

Similarly, for Structured Streaming, would there be any limit on the number
of streaming sources I can have?

Thanks in advance.

Re: Max number of streams supported ?

Posted by Yogesh Mahajan <ym...@snappydata.io>.
Thanks Michael, TD for the quick replies. They were helpful. I will let you
know the numbers (limits) I find in my experiments.


Re: Max number of streams supported ?

Posted by Tathagata Das <ta...@gmail.com>.
Just to clarify a subtle difference between DStreams and Structured
Streaming. Multiple input streams in a DStreamGraph are likely to mean they
are all being processed/computed in the same way, as there can be only one
streaming query / context active in the StreamingContext. However, in the
case of Structured Streaming, there can be any number of independent
streaming queries (i.e. different computations), and each streaming query
can have any number of separate input sources. So Michael's comment that
"each stream will have a thread on the driver" is correct when there are many
independent queries with different computations running simultaneously.
However, if all your streams need to be processed in the same way, then it's
one streaming query with many inputs, and it will require one thread.
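
The two shapes described above can be sketched roughly as follows. This is
only an illustration, not code from the thread: it assumes PySpark, an
existing SparkSession passed in as `spark`, the built-in "rate" test source
and the "console" sink. The function names and the query names are made up
for the example.

```python
from functools import reduce


def union_sources(spark, n_sources):
    """Create n_sources rate streams and union them into one streaming DataFrame.

    Assumes n_sources >= 1; the "rate" source is used only as a stand-in input.
    """
    frames = [
        spark.readStream.format("rate").option("rowsPerSecond", 1).load()
        for _ in range(n_sources)
    ]
    # DataFrame.union requires matching schemas, which holds for rate sources.
    return reduce(lambda left, right: left.union(right), frames)


def start_single_query(spark, n_sources):
    """Many inputs, one computation: a single streaming query is started."""
    merged = union_sources(spark, n_sources)
    return merged.writeStream.format("console").queryName("merged").start()


def start_independent_queries(spark, n_sources):
    """One query per input: n_sources separate streaming queries are started."""
    queries = []
    for i in range(n_sources):
        df = spark.readStream.format("rate").option("rowsPerSecond", 1).load()
        queries.append(
            df.writeStream.format("console").queryName(f"q{i}").start()
        )
    return queries
```

With the first function there is one query to coordinate no matter how many
inputs are unioned; with the second, the number of queries grows with the
number of inputs.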

Hope this helps.

TD


Re: Max number of streams supported ?

Posted by Michael Armbrust <mi...@databricks.com>.
-dev +user


> Similarly for structured streaming, would there be any limit on the number of
> streaming sources I can have?
>

There is no fundamental limit, but each stream will have a thread on the
driver that is doing coordination of execution.  We comfortably run 20+
streams on a single cluster in production, but I have not pushed the
limits.  You'd want to test with your specific application.
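
When testing with a specific application, it may help to check how many
streaming queries are actually running on the driver. The sketch below is an
illustration only, assuming PySpark and an existing SparkSession passed in as
`spark`; the helper names are hypothetical.

```python
def summarize_queries(spark):
    """Return (name, is_active, status message) for each active streaming query."""
    return [
        (query.name, query.isActive, query.status["message"])
        for query in spark.streams.active
    ]


def stop_all(spark):
    """Stop every active streaming query and return how many were stopped."""
    active = list(spark.streams.active)  # snapshot before stopping anything
    for query in active:
        query.stop()
    return len(active)
```

Calling `summarize_queries(spark)` while ramping up the number of queries
gives a simple view of what the driver is coordinating at each step.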
