Posted to user@spark.apache.org by kant kodali <ka...@gmail.com> on 2018/10/09 17:20:36 UTC

Spark on YARN not utilizing all the YARN containers available

Hi All,

I am using Spark 2.3.1 and using YARN as a cluster manager.

I currently have:

1) 6 YARN containers (executors = 6) with 4 executor cores for each container.
2) 6 Kafka partitions from one topic.
3) You can assume every other configuration is set to whatever the default
values are.

I spawned a simple streaming query and I see all the tasks get scheduled on
one YARN container. Am I missing any config?

Thanks!
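
For reference, a hedged sketch of how the setup above might be pinned down
explicitly, assuming a Structured Streaming query reading from Kafka (the
topic, bootstrap servers, and checkpoint path are placeholders, and the
spark-sql-kafka-0-10 package is assumed to be on the classpath). Note that on
YARN, spark.executor.instances defaults to 2 when dynamic allocation is
disabled, so the executor count is set explicitly:

    import org.apache.spark.sql.SparkSession

    // Sketch only: the executor settings mirror the numbers described above.
    val spark = SparkSession.builder()
      .appName("simple-streaming-query")
      .config("spark.executor.instances", "6")  // one executor per YARN container
      .config("spark.executor.cores", "4")
      .getOrCreate()

    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker-1:9092")  // placeholder
      .option("subscribe", "my-topic")                      // placeholder
      .load()

    stream.writeStream
      .format("console")
      .option("checkpointLocation", "/tmp/checkpoints/simple-query")  // placeholder
      .start()
      .awaitTermination()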

Re: Spark on YARN not utilizing all the YARN containers available

Posted by Gourav Sengupta <go...@gmail.com>.
Hi Dillon,

Yes, we can understand the number of executors that are running, but the
question is more around understanding the relation between YARN containers,
their persistence, and Spark executors.

Regards,
Gourav

On Wed, Oct 10, 2018 at 6:38 AM Dillon Dukek <di...@placed.com>
wrote:

> There is documentation here
> http://spark.apache.org/docs/latest/running-on-yarn.html about running
> spark on YARN. Like I said before you can use either the logs from the
> application or the Spark UI to understand how many executors are running at
> any given time. I don't think I can help much further without more
> information about the specific use case.
>
>
> On Tue, Oct 9, 2018 at 2:54 PM Gourav Sengupta <go...@gmail.com>
> wrote:
>
>> Hi Dillon,
>>
>> I do think that there is a setting available where in once YARN sets up
>> the containers then you do not deallocate them, I had used it previously in
>> HIVE, and it just saves processing time in terms of allocating containers.
>> That said I am still trying to understand how do we determine one YARN
>> container = one executor in SPARK.
>>
>> Regards,
>> Gourav
>>
>> On Tue, Oct 9, 2018 at 9:04 PM Dillon Dukek
>> <di...@placed.com.invalid> wrote:
>>
>>> I'm still not sure exactly what you are meaning by saying that you have
>>> 6 yarn containers. Yarn should just be aware of the total available
>>> resources in  your cluster and then be able to launch containers based on
>>> the executor requirements you set when you submit your job. If you can, I
>>> think it would be helpful to send me the command you're using to launch
>>> your spark process. You should also be able to use the logs and/or the
>>> spark UI to determine how many executors are running.
>>>
>>> On Tue, Oct 9, 2018 at 12:57 PM Gourav Sengupta <
>>> gourav.sengupta@gmail.com> wrote:
>>>
>>>> hi,
>>>>
>>>> may be I am not quite clear in my head on this one. But how do we know
>>>> that 1 yarn container = 1 executor?
>>>>
>>>> Regards,
>>>> Gourav Sengupta
>>>>
>>>> On Tue, Oct 9, 2018 at 8:53 PM Dillon Dukek
>>>> <di...@placed.com.invalid> wrote:
>>>>
>>>>> Can you send how you are launching your streaming process? Also what
>>>>> environment is this cluster running in (EMR, GCP, self managed, etc)?
>>>>>
>>>>> On Tue, Oct 9, 2018 at 10:21 AM kant kodali <ka...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I am using Spark 2.3.1 and using YARN as a cluster manager.
>>>>>>
>>>>>> I currently got
>>>>>>
>>>>>> 1) 6 YARN containers(executors=6) with 4 executor cores for each
>>>>>> container.
>>>>>> 2) 6 Kafka partitions from one topic.
>>>>>> 3) You can assume every other configuration is set to whatever the
>>>>>> default values are.
>>>>>>
>>>>>> Spawned a Simple Streaming Query and I see all the tasks get
>>>>>> scheduled on one YARN container. am I missing any config?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>

Re: Spark on YARN not utilizing all the YARN containers available

Posted by Dillon Dukek <di...@placed.com.INVALID>.
There is documentation here
http://spark.apache.org/docs/latest/running-on-yarn.html about running
Spark on YARN. Like I said before, you can use either the logs from the
application or the Spark UI to understand how many executors are running at
any given time. I don't think I can help much further without more
information about the specific use case.
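
In addition to the Spark UI, the same count can be read from inside the
application itself; a small sketch, assuming an active SparkSession:

    import org.apache.spark.sql.SparkSession

    // Sketch only: SparkStatusTracker lists the executors currently registered
    // with the driver; the driver itself appears as one of the entries.
    val spark = SparkSession.builder().getOrCreate()
    val infos = spark.sparkContext.statusTracker.getExecutorInfos
    println(s"Registered executors (driver included): ${infos.length}")
    infos.foreach(i => println(s"  ${i.host}:${i.port} runningTasks=${i.numRunningTasks}"))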


On Tue, Oct 9, 2018 at 2:54 PM Gourav Sengupta <go...@gmail.com>
wrote:

> Hi Dillon,
>
> I do think that there is a setting available where in once YARN sets up
> the containers then you do not deallocate them, I had used it previously in
> HIVE, and it just saves processing time in terms of allocating containers.
> That said I am still trying to understand how do we determine one YARN
> container = one executor in SPARK.
>
> Regards,
> Gourav
>
> On Tue, Oct 9, 2018 at 9:04 PM Dillon Dukek
> <di...@placed.com.invalid> wrote:
>
>> I'm still not sure exactly what you are meaning by saying that you have 6
>> yarn containers. Yarn should just be aware of the total available resources
>> in  your cluster and then be able to launch containers based on the
>> executor requirements you set when you submit your job. If you can, I think
>> it would be helpful to send me the command you're using to launch your
>> spark process. You should also be able to use the logs and/or the spark UI
>> to determine how many executors are running.
>>
>> On Tue, Oct 9, 2018 at 12:57 PM Gourav Sengupta <
>> gourav.sengupta@gmail.com> wrote:
>>
>>> hi,
>>>
>>> may be I am not quite clear in my head on this one. But how do we know
>>> that 1 yarn container = 1 executor?
>>>
>>> Regards,
>>> Gourav Sengupta
>>>
>>> On Tue, Oct 9, 2018 at 8:53 PM Dillon Dukek
>>> <di...@placed.com.invalid> wrote:
>>>
>>>> Can you send how you are launching your streaming process? Also what
>>>> environment is this cluster running in (EMR, GCP, self managed, etc)?
>>>>
>>>> On Tue, Oct 9, 2018 at 10:21 AM kant kodali <ka...@gmail.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I am using Spark 2.3.1 and using YARN as a cluster manager.
>>>>>
>>>>> I currently got
>>>>>
>>>>> 1) 6 YARN containers(executors=6) with 4 executor cores for each
>>>>> container.
>>>>> 2) 6 Kafka partitions from one topic.
>>>>> 3) You can assume every other configuration is set to whatever the
>>>>> default values are.
>>>>>
>>>>> Spawned a Simple Streaming Query and I see all the tasks get scheduled
>>>>> on one YARN container. am I missing any config?
>>>>>
>>>>> Thanks!
>>>>>
>>>>

Re: Spark on YARN not utilizing all the YARN containers available

Posted by Gourav Sengupta <go...@gmail.com>.
Hi Dillon,

I do think that there is a setting available wherein, once YARN sets up the
containers, they are not deallocated; I had used it previously in Hive, and
it just saves processing time in terms of allocating containers. That said,
I am still trying to understand how we determine that one YARN container =
one executor in Spark.
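
The closest Spark-side equivalent is probably dynamic allocation, which
controls whether idle executors, and with them their YARN containers, are
released; a hedged sketch with example values only:

    import org.apache.spark.sql.SparkSession

    // Sketch only: with dynamic allocation disabled, executors (and their YARN
    // containers) are requested once and kept for the lifetime of the application.
    val spark = SparkSession.builder()
      .config("spark.dynamicAllocation.enabled", "false")
      // If dynamic allocation is enabled instead, a floor of executors keeps
      // containers around and idle ones are only released after the timeout:
      // .config("spark.dynamicAllocation.enabled", "true")
      // .config("spark.shuffle.service.enabled", "true")
      // .config("spark.dynamicAllocation.minExecutors", "6")
      // .config("spark.dynamicAllocation.executorIdleTimeout", "300s")
      .getOrCreate()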

Regards,
Gourav

On Tue, Oct 9, 2018 at 9:04 PM Dillon Dukek <di...@placed.com.invalid>
wrote:

> I'm still not sure exactly what you are meaning by saying that you have 6
> yarn containers. Yarn should just be aware of the total available resources
> in  your cluster and then be able to launch containers based on the
> executor requirements you set when you submit your job. If you can, I think
> it would be helpful to send me the command you're using to launch your
> spark process. You should also be able to use the logs and/or the spark UI
> to determine how many executors are running.
>
> On Tue, Oct 9, 2018 at 12:57 PM Gourav Sengupta <go...@gmail.com>
> wrote:
>
>> hi,
>>
>> may be I am not quite clear in my head on this one. But how do we know
>> that 1 yarn container = 1 executor?
>>
>> Regards,
>> Gourav Sengupta
>>
>> On Tue, Oct 9, 2018 at 8:53 PM Dillon Dukek
>> <di...@placed.com.invalid> wrote:
>>
>>> Can you send how you are launching your streaming process? Also what
>>> environment is this cluster running in (EMR, GCP, self managed, etc)?
>>>
>>> On Tue, Oct 9, 2018 at 10:21 AM kant kodali <ka...@gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I am using Spark 2.3.1 and using YARN as a cluster manager.
>>>>
>>>> I currently got
>>>>
>>>> 1) 6 YARN containers(executors=6) with 4 executor cores for each
>>>> container.
>>>> 2) 6 Kafka partitions from one topic.
>>>> 3) You can assume every other configuration is set to whatever the
>>>> default values are.
>>>>
>>>> Spawned a Simple Streaming Query and I see all the tasks get scheduled
>>>> on one YARN container. am I missing any config?
>>>>
>>>> Thanks!
>>>>
>>>

Re: Spark on YARN not utilizing all the YARN containers available

Posted by Dillon Dukek <di...@placed.com.INVALID>.
I'm still not sure exactly what you mean by saying that you have 6 YARN
containers. YARN should just be aware of the total available resources in
your cluster and then be able to launch containers based on the executor
requirements you set when you submit your job. If you can, I think it would
be helpful to send me the command you're using to launch your Spark process.
You should also be able to use the logs and/or the Spark UI to determine how
many executors are running.
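
As a rough sketch of what those executor requirements translate to: each
requested executor becomes one YARN container, sized from the executor memory
plus overhead and the executor cores (the memory values below are examples
only):

    import org.apache.spark.sql.SparkSession

    // Sketch only: YARN sizes each executor container as roughly
    // spark.executor.memory + spark.executor.memoryOverhead (the overhead
    // defaults to max(384 MB, 10% of executor memory)), with
    // spark.executor.cores vcores per container.
    val spark = SparkSession.builder()
      .config("spark.executor.instances", "6")
      .config("spark.executor.cores", "4")
      .config("spark.executor.memory", "4g")            // example value
      .config("spark.executor.memoryOverhead", "512m")  // example value
      .getOrCreate()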

On Tue, Oct 9, 2018 at 12:57 PM Gourav Sengupta <go...@gmail.com>
wrote:

> hi,
>
> may be I am not quite clear in my head on this one. But how do we know
> that 1 yarn container = 1 executor?
>
> Regards,
> Gourav Sengupta
>
> On Tue, Oct 9, 2018 at 8:53 PM Dillon Dukek
> <di...@placed.com.invalid> wrote:
>
>> Can you send how you are launching your streaming process? Also what
>> environment is this cluster running in (EMR, GCP, self managed, etc)?
>>
>> On Tue, Oct 9, 2018 at 10:21 AM kant kodali <ka...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> I am using Spark 2.3.1 and using YARN as a cluster manager.
>>>
>>> I currently got
>>>
>>> 1) 6 YARN containers(executors=6) with 4 executor cores for each
>>> container.
>>> 2) 6 Kafka partitions from one topic.
>>> 3) You can assume every other configuration is set to whatever the
>>> default values are.
>>>
>>> Spawned a Simple Streaming Query and I see all the tasks get scheduled
>>> on one YARN container. am I missing any config?
>>>
>>> Thanks!
>>>
>>

Re: Spark on YARN not utilizing all the YARN containers available

Posted by Gourav Sengupta <go...@gmail.com>.
Hi,

Maybe I am not quite clear in my head on this one, but how do we know that
1 YARN container = 1 executor?

Regards,
Gourav Sengupta
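
In YARN mode the mapping is in fact one-to-one: each executor runs in its own
YARN container, and one additional container hosts the ApplicationMaster
(which also runs the driver in cluster deploy mode). A small sketch of the
resulting arithmetic, assuming the setup from the original post:

    // Sketch only: one YARN container per executor, plus one container for
    // the ApplicationMaster (which hosts the driver in cluster deploy mode).
    val executors = 6
    val applicationMasterContainers = 1
    val totalYarnContainers = executors + applicationMasterContainers
    println(s"Expected YARN containers: $totalYarnContainers")  // prints 7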

On Tue, Oct 9, 2018 at 8:53 PM Dillon Dukek <di...@placed.com.invalid>
wrote:

> Can you send how you are launching your streaming process? Also what
> environment is this cluster running in (EMR, GCP, self managed, etc)?
>
> On Tue, Oct 9, 2018 at 10:21 AM kant kodali <ka...@gmail.com> wrote:
>
>> Hi All,
>>
>> I am using Spark 2.3.1 and using YARN as a cluster manager.
>>
>> I currently got
>>
>> 1) 6 YARN containers(executors=6) with 4 executor cores for each
>> container.
>> 2) 6 Kafka partitions from one topic.
>> 3) You can assume every other configuration is set to whatever the
>> default values are.
>>
>> Spawned a Simple Streaming Query and I see all the tasks get scheduled on
>> one YARN container. am I missing any config?
>>
>> Thanks!
>>
>

Re: Spark on YARN not utilizing all the YARN containers available

Posted by Dillon Dukek <di...@placed.com.INVALID>.
Can you send how you are launching your streaming process? Also, what
environment is this cluster running in (EMR, GCP, self-managed, etc.)?

On Tue, Oct 9, 2018 at 10:21 AM kant kodali <ka...@gmail.com> wrote:

> Hi All,
>
> I am using Spark 2.3.1 and using YARN as a cluster manager.
>
> I currently got
>
> 1) 6 YARN containers(executors=6) with 4 executor cores for each
> container.
> 2) 6 Kafka partitions from one topic.
> 3) You can assume every other configuration is set to whatever the default
> values are.
>
> Spawned a Simple Streaming Query and I see all the tasks get scheduled on
> one YARN container. am I missing any config?
>
> Thanks!
>