Posted to user@spark.apache.org by Brandon White <bw...@gmail.com> on 2015/07/31 22:52:44 UTC

What happens when you create more DStreams than nodes in the cluster?

One input DStream creates one receiver, and one receiver uses one
executor / node.

What happens if you create more DStreams than nodes in the cluster?

Say I have 30 DStreams on a 15 node cluster.

Will ~10 streams get assigned to ~10 executors / nodes while the other ~20
streams are queued for resources, or will the other streams just fail
and never run?

Re: What happens when you create more DStreams than nodes in the cluster?

Posted by Ashwin Giridharan <as...@gmail.com>.
@Brandon Each node can host multiple executors. For example, in a 15 node
cluster, if your NodeManager (in YARN) or its equivalent (in Mesos or
Standalone) runs on each of these nodes, and each node has enough resources
to host, say, 5 executors, then in total you can have 15*5 executors, and
each of these executors can host a DStream receiver.
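
To make the arithmetic above concrete, here is a minimal sketch in plain
Python (no Spark calls). The function name total_executor_slots is
hypothetical, and the figure of 5 executors per node is just the example
value from this reply, not something a real cluster reports this way.

```python
def total_executor_slots(nodes: int, executors_per_node: int) -> int:
    """Upper bound on executors if every node can host the same number.

    Hypothetical helper for illustration; real capacity depends on the
    memory and cores the resource manager can actually grant per node.
    """
    return nodes * executors_per_node


# Example values taken from the thread: 15 nodes, 5 executors per node,
# and 30 DStreams (one receiver each).
slots = total_executor_slots(15, 5)
receivers = 30

print(slots)               # 75
print(receivers <= slots)  # True: 30 receivers fit in 75 executor slots
```

Under these assumed numbers, 30 receivers fit comfortably, so the limit is
the number of executors (and cores), not the number of nodes.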

But be aware that each DStream receiver uses a dedicated core: "the
number of cores allocated to the Spark Streaming application must be more
than the number of receivers. Otherwise the system will receive data, but
not be able to process it."
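
The core rule quoted above can be sketched as a simple check. This is plain
Python for illustration only; can_process and cores_left_for_processing are
hypothetical names, and the core counts below are made-up example values
rather than anything from a real cluster.

```python
def cores_left_for_processing(total_cores: int, num_receivers: int) -> int:
    """Cores remaining after each receiver takes one dedicated core."""
    return total_cores - num_receivers


def can_process(total_cores: int, num_receivers: int) -> bool:
    """True only if the application has strictly more cores than receivers."""
    return total_cores > num_receivers


# 30 receivers with exactly 30 cores: data is received but nothing is
# left over to process it.
print(can_process(30, 30))                # False

# 30 receivers with 40 cores: 10 cores remain for processing.
print(can_process(40, 30))                # True
print(cores_left_for_processing(40, 30))  # 10
```

So with 30 DStreams, the application needs at least 31 cores in total
before any of the received data can be processed.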

Thanks,
Ashwin

On Fri, Jul 31, 2015 at 4:52 PM, Brandon White <bw...@gmail.com>
wrote:

> Since one input dstream creates one receiver and one receiver uses one
> executor / node.
>
> What happens if you create more Dstreams than nodes in the cluster?
>
> Say I have 30 Dstreams on a 15 node cluster.
>
> Will ~10 streams get assigned to ~10 executors / nodes then the other ~20
> streams will be queued for resources or will the other streams just fail
> and never run?
>


