Posted to user@spark.apache.org by Yan Fang <ya...@gmail.com> on 2014/07/10 20:59:52 UTC

How are the executors used in Spark Streaming in terms of receiver and driver program?

Hi all,

I am working to improve the parallelism of my Spark Streaming application,
but I have trouble understanding how the executors are used and how the
application is distributed.

1. In YARN, is one executor equal to one container?

2. I saw the statement that a streaming receiver runs on one worker machine
("note that each input DStream creates a single receiver (running on a
worker machine) that receives a single stream of data"). Does the "worker
machine" mean the executor or the physical machine? If I have more
receivers than executors, will it still work?

3. Is the executor that holds a receiver also used for other operations,
such as map and reduce, or is it fully occupied by the receiver? Similarly,
if I run in yarn-cluster mode, is the executor running the driver program
used for other operations too?

4. So if I have a driver program (cluster mode) and a streaming receiver,
do I have to have at least 2 executors, because the program and the
streaming receiver have to be on different executors?

Thank you. Sorry for having so many questions, but I do want to understand
how Spark Streaming distributes work in order to assign resources
reasonably. Thank you again.

Best,

Fang, Yan
yanfang724@gmail.com
+1 (206) 849-4108
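
For reference, here is a minimal sketch of the kind of setup these
questions are about, assuming a socket source (the host name, port, and
app name below are placeholders): each such input DStream creates exactly
one receiver, which runs as a long-lived task on some executor.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("OneReceiverSketch")
    val ssc = new StreamingContext(conf, Seconds(2))

    // One input DStream => one receiver pinning one task slot on an executor.
    val lines = ssc.socketTextStream("some-host", 9999)
    lines.print()

    ssc.start()
    ssc.awaitTermination()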

Re: How are the executors used in Spark Streaming in terms of receiver and driver program?

Posted by Yan Fang <ya...@gmail.com>.
Thank you, Tathagata. That explains it.

Fang, Yan
yanfang724@gmail.com
+1 (206) 849-4108



Re: How are the executors used in Spark Streaming in terms of receiver and driver program?

Posted by Tathagata Das <ta...@gmail.com>.
A task slot is equivalent to a core. So one core can only run one task at a
time.

TD
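
To make the equivalence concrete, here is a hedged sketch using standard
Spark configuration keys (the app name and the 4-core figure are only
examples): the number of tasks an executor can run concurrently is its core
count divided by spark.task.cpus, which defaults to 1.

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("TaskSlotsSketch")   // placeholder app name
      .set("spark.executor.cores", "4") // 4 cores => 4 task slots per executor
      .set("spark.task.cpus", "1")      // each task claims 1 core (the default)
    // Each executor can then run 4 tasks concurrently; a streaming receiver
    // is a long-running task, so it permanently pins one of those 4 slots.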



Re: How are the executors used in Spark Streaming in terms of receiver and driver program?

Posted by Yan Fang <ya...@gmail.com>.
Hi Tathagata,

Thank you. Is a task slot equivalent to a core? Or can one core actually
run multiple tasks at the same time?

Best,

Fang, Yan
yanfang724@gmail.com
+1 (206) 849-4108



Re: How are the executors used in Spark Streaming in terms of receiver and driver program?

Posted by Tathagata Das <ta...@gmail.com>.
The same executor can be used for both receiving and processing,
irrespective of the deployment mode (YARN, Spark standalone, etc.). It
boils down to the number of cores / task slots that the executor has. Each
receiver is like a long-running task, so each of them occupies a slot. If
there are free slots in the executor, then other tasks can run on them.

So if you are finding that the other tasks are not being run, check how
many cores/task slots the executor has and whether there are more task
slots than the number of input DStreams / receivers you are launching.

@Praveen, your answers were pretty much spot on, thanks for chipping in!
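
As an illustration of this slot arithmetic, here is a minimal sketch
assuming a socket source (host names, ports, and the executor counts in
the comments are placeholders): with 5 receivers, the application needs
strictly more than 5 task slots in total, or no processing tasks will
ever run.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("ReceiverSlotsSketch")
    val ssc = new StreamingContext(conf, Seconds(2))

    // 5 input DStreams => 5 long-running receiver tasks, each pinning a slot.
    val streams = (1 to 5).map(i => ssc.socketTextStream("some-host", 9000 + i))
    val unioned = ssc.union(streams)

    // These map/reduce tasks can only use slots the receivers are not holding:
    // e.g. 2 executors x 4 cores = 8 slots => 5 for receivers, 3 left for this.
    unioned.flatMap(_.split(" ")).map((_, 1L)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()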





Re: How are the executors used in Spark Streaming in terms of receiver and driver program?

Posted by Yan Fang <ya...@gmail.com>.
Hi Praveen,

Thank you for the answer. That's interesting, because if I bring up only
one executor for Spark Streaming, it seems only the receiver is working and
no other tasks are happening, judging by the log and the UI. Maybe it's
just because the receiving task eats all the resources, not because one
executor can only run one receiver?

Fang, Yan
yanfang724@gmail.com
+1 (206) 849-4108



Re: How are the executors used in Spark Streaming in terms of receiver and driver program?

Posted by Praveen Seluka <ps...@qubole.com>.
Here are my answers. But I am just getting started with Spark Streaming,
so please correct me if I am wrong.
1) Yes.
2) Receivers will run on executors. It's actually a job that's submitted,
where the number of tasks equals the number of receivers. An executor can
actually run more than one task at the same time. Hence you could have more
receivers than executors, but I think it's not recommended (see the sketch
below).
3) As said in 2, the executor where the receiver task is running can be
used for map/reduce tasks. In yarn-cluster mode, the driver program
actually runs as the application master (it lives in the first container
that's launched), and this is not an executor; hence it's not used for
other operations.
4) The driver runs in a separate container. I think the same executor can
be used for the receiver and the processing task as well (I am not very
sure about this part).
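
Building on point 2, here is a hedged sketch of a sanity check one could
run before ssc.start(), to confirm there are more task slots than
receivers. numReceivers is a placeholder for however many input DStreams
you create, sc is the StreamingContext's underlying SparkContext, and the
arithmetic assumes spark.task.cpus is left at its default of 1.

    val numReceivers = 5 // placeholder: the number of input DStreams created
    // getExecutorMemoryStatus also lists the driver's block manager, so
    // subtracting 1 is a heuristic for the executor count.
    val numExecutors = sc.getExecutorMemoryStatus.size - 1
    // Fall back to 1 core if spark.executor.cores is not set explicitly.
    val coresPerExecutor = sc.getConf.getInt("spark.executor.cores", 1)
    val totalSlots = numExecutors * coresPerExecutor
    require(totalSlots > numReceivers,
      s"Only $totalSlots task slots for $numReceivers receivers; " +
        "processing tasks would starve.")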

