Posted to user@spark.apache.org by Gokula Krishnan D <em...@gmail.com> on 2017/07/20 20:45:08 UTC

Spark on Cloudera Configuration (Scheduler Mode = FAIR)

Hello All,

We have a cluster with 50 executors, each with 4 cores, so at most 200 cores
are available.

I submitted a Spark application (JOB A) with scheduler.mode set to FAIR and
dynamic allocation enabled, and it got all of the available executors.

In the meantime, I submitted another Spark application (JOB B), also with
scheduler.mode set to FAIR and dynamic allocation enabled, but it got only
one executor.

Normally this situation occurs when one of the jobs runs with
scheduler.mode = FIFO.

1) Have you ever faced this issue, and if so, how did you overcome it?

I was under the impression that as soon as I submitted JOB B, the Spark
scheduler would release some resources from JOB A and share them with JOB B
in a round-robin fashion.

Appreciate your response!


Thanks & Regards,
Gokula Krishnan (Gokul)

Re: Spark on Cloudera Configuration (Scheduler Mode = FAIR)

Posted by Marcelo Vanzin <va...@cloudera.com>.
On Fri, Jul 21, 2017 at 5:00 AM, Gokula Krishnan D <em...@gmail.com> wrote:
> Is there any way to set up the scheduler mode at the Spark cluster level,
> besides at the application (SparkContext) level?

That's what the cluster (or resource) manager is for; e.g., configure
separate queues in YARN, each with a maximum amount of resources.
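
For concreteness, here is a rough sketch of what that could look like with the
YARN Fair Scheduler. The queue names and limits below are made up for
illustration, not taken from any real cluster; the point is only that each
queue is capped, so no single application can claim the whole cluster:

    <!-- fair-scheduler.xml (illustrative values only) -->
    <allocations>
      <queue name="etl">
        <maxResources>100000 mb,120 vcores</maxResources>
        <weight>2.0</weight>
        <schedulingPolicy>fair</schedulingPolicy>
      </queue>
      <queue name="adhoc">
        <maxResources>50000 mb,80 vcores</maxResources>
        <weight>1.0</weight>
        <schedulingPolicy>fair</schedulingPolicy>
      </queue>
    </allocations>

An application is then pointed at a queue at submission time, e.g. with
spark-submit --queue etl, or by setting spark.yarn.queue=etl.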

-- 
Marcelo

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: Spark on Cloudera Configuration (Scheduler Mode = FAIR)

Posted by Gokula Krishnan D <em...@gmail.com>.
Mark & Ayan, thanks for the inputs.

Is there any way to set up the scheduler mode at the Spark cluster level,
besides at the application (SparkContext) level?

Currently YARN is in FAIR mode, and we manually ensure that each Spark
application is also in FAIR mode. However, we have noticed that applications
do not release their resources as soon as their tasks are done when we enable
dynamic allocation and do not specify any explicit executor allocation.

At the moment, we specify the min and max executor allocation at the Spark
application level to ensure that all of our ETL Spark applications can run in
parallel without resource issues.
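
For reference, a minimal sketch of the dynamic-allocation settings being
discussed here, written as a SparkConf. The numbers are placeholders rather
than recommendations, and on YARN the external shuffle service must be
enabled for dynamic allocation to work:

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    val conf = new SparkConf()
      .setAppName("etl-job-a") // hypothetical application name
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true") // required for dynamic allocation on YARN
      .set("spark.dynamicAllocation.minExecutors", "2") // floor kept even when idle
      .set("spark.dynamicAllocation.maxExecutors", "25") // cap so one app cannot hold the whole cluster
      .set("spark.dynamicAllocation.executorIdleTimeout", "60s") // how long before idle executors are released

    val spark = SparkSession.builder().config(conf).getOrCreate()

The executorIdleTimeout setting governs how soon executors are released once
tasks finish; even then, an application only gives executors back when they go
idle, and is never forced to give them up unless YARN preempts the containers.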

It would be great if you could shed more light on how to set up preemption
within YARN and Spark.

Thanks & Regards,
Gokula Krishnan (Gokul)

On Thu, Jul 20, 2017 at 6:46 PM, ayan guha <gu...@gmail.com> wrote:

> Hi
>
> As Mark said, the scheduler mode works within an application, i.e. within a
> Spark session and Spark context. This is also clear if you consider where
> you set the configuration: in a SparkConf, which is used to build a context.
>
> If you are using YARN as the resource manager, however, you can configure
> YARN with the fair scheduler. If you do so, both of your applications will
> get "fair" treatment from YARN, i.e. they get resources in a round-robin
> manner. If you want your App A to give up resources while it is still using
> them, you need to set up preemption within YARN, and application priorities,
> so that preemption can kick in.
>
> HTH...
>
> Best, Ayan
>
> On Fri, Jul 21, 2017 at 7:11 AM, Mark Hamstra <ma...@clearstorydata.com>
> wrote:
>
>> The fair scheduler doesn't have anything to do with reallocating resources
>> across Applications.
>>
>> https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-across-applications
>> https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application
>>
>> On Thu, Jul 20, 2017 at 2:02 PM, Gokula Krishnan D <em...@gmail.com>
>> wrote:
>>
>>> Mark, Thanks for the response.
>>>
>>> Let me rephrase my statements.
>>>
>>> "I am submitting a Spark application(*Application*#A) with
>>> scheduler.mode as FAIR and dynamicallocation=true and it got all the
>>> available executors.
>>>
>>> In the meantime, submitting another Spark Application (*Application*
>>> # B) with the scheduler.mode as FAIR and dynamicallocation=true but it got
>>> only one executor. "
>>>
>>> Thanks & Regards,
>>> Gokula Krishnan* (Gokul)*
>>>
>>> On Thu, Jul 20, 2017 at 4:56 PM, Mark Hamstra <ma...@clearstorydata.com>
>>> wrote:
>>>
>>>> First, Executors are not allocated to Jobs, but rather to Applications.
>>>> If you run multiple Jobs within a single Application, then each of the
>>>> Tasks associated with Stages of those Jobs has the potential to run on any
>>>> of the Application's Executors. Second, once a Task starts running on an
>>>> Executor, it has to complete before another Task can be scheduled using the
>>>> prior Task's resources -- the fair scheduler is not preemptive of running
>>>> Tasks.
>>>>
>>>> On Thu, Jul 20, 2017 at 1:45 PM, Gokula Krishnan D <email2dgk@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello All,
>>>>>
>>>>> We have a cluster with 50 executors, each with 4 cores, so at most 200
>>>>> cores are available.
>>>>>
>>>>> I submitted a Spark application (JOB A) with scheduler.mode set to FAIR
>>>>> and dynamic allocation enabled, and it got all of the available
>>>>> executors.
>>>>>
>>>>> In the meantime, I submitted another Spark application (JOB B), also
>>>>> with scheduler.mode set to FAIR and dynamic allocation enabled, but it
>>>>> got only one executor.
>>>>>
>>>>> Normally this situation occurs when one of the jobs runs with
>>>>> scheduler.mode = FIFO.
>>>>>
>>>>> 1) Have you ever faced this issue, and if so, how did you overcome it?
>>>>>
>>>>> I was under the impression that as soon as I submitted JOB B, the Spark
>>>>> scheduler would release some resources from JOB A and share them with
>>>>> JOB B in a round-robin fashion.
>>>>>
>>>>> Appreciate your response!
>>>>>
>>>>>
>>>>> Thanks & Regards,
>>>>> Gokula Krishnan (Gokul)
>>>>>
>>>>
>>>>
>>>
>>
>
>
> --
> Best Regards,
> Ayan Guha
>

Re: Spark on Cloudera Configuration (Scheduler Mode = FAIR)

Posted by ayan guha <gu...@gmail.com>.
Hi

As Mark said, the scheduler mode works within an application, i.e. within a
Spark session and Spark context. This is also clear if you consider where you
set the configuration: in a SparkConf, which is used to build a context.

If you are using YARN as the resource manager, however, you can configure
YARN with the fair scheduler. If you do so, both of your applications will
get "fair" treatment from YARN, i.e. they get resources in a round-robin
manner. If you want your App A to give up resources while it is still using
them, you need to set up preemption within YARN, and application priorities,
so that preemption can kick in.
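
For anyone looking for the YARN side of that, the relevant knobs (as I
understand them; the values below are purely illustrative) are preemption in
yarn-site.xml plus per-queue settings in fair-scheduler.xml:

    <!-- yarn-site.xml: enable Fair Scheduler preemption -->
    <property>
      <name>yarn.scheduler.fair.preemption</name>
      <value>true</value>
    </property>

    <!-- fair-scheduler.xml: hypothetical queue -->
    <allocations>
      <queue name="etl">
        <!-- If this queue stays below half of its fair share for 30 seconds,
             YARN may preempt containers from other queues to make room. -->
        <fairSharePreemptionTimeout>30</fairSharePreemptionTimeout>
        <fairSharePreemptionThreshold>0.5</fairSharePreemptionThreshold>
      </queue>
    </allocations>

Keep in mind that preemption kills whole containers, i.e. whole Spark
executors, so any tasks running on them are re-executed elsewhere.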

HTH...

Best, Ayan

On Fri, Jul 21, 2017 at 7:11 AM, Mark Hamstra <ma...@clearstorydata.com>
wrote:

> The fair scheduler doesn't have anything to do with reallocating resources
> across Applications.
>
> https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-across-applications
> https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application
>
> On Thu, Jul 20, 2017 at 2:02 PM, Gokula Krishnan D <em...@gmail.com>
> wrote:
>
>> Mark, Thanks for the response.
>>
>> Let me rephrase my statements.
>>
>> "I am submitting a Spark application(*Application*#A) with
>> scheduler.mode as FAIR and dynamicallocation=true and it got all the
>> available executors.
>>
>> In the meantime, submitting another Spark Application (*Application*
>> # B) with the scheduler.mode as FAIR and dynamicallocation=true but it got
>> only one executor. "
>>
>> Thanks & Regards,
>> Gokula Krishnan* (Gokul)*
>>
>> On Thu, Jul 20, 2017 at 4:56 PM, Mark Hamstra <ma...@clearstorydata.com>
>> wrote:
>>
>>> First, Executors are not allocated to Jobs, but rather to Applications.
>>> If you run multiple Jobs within a single Application, then each of the
>>> Tasks associated with Stages of those Jobs has the potential to run on any
>>> of the Application's Executors. Second, once a Task starts running on an
>>> Executor, it has to complete before another Task can be scheduled using the
>>> prior Task's resources -- the fair scheduler is not preemptive of running
>>> Tasks.
>>>
>>> On Thu, Jul 20, 2017 at 1:45 PM, Gokula Krishnan D <em...@gmail.com>
>>> wrote:
>>>
>>>> Hello All,
>>>>
>>>> We have a cluster with 50 executors, each with 4 cores, so at most 200
>>>> cores are available.
>>>>
>>>> I submitted a Spark application (JOB A) with scheduler.mode set to FAIR
>>>> and dynamic allocation enabled, and it got all of the available
>>>> executors.
>>>>
>>>> In the meantime, I submitted another Spark application (JOB B), also
>>>> with scheduler.mode set to FAIR and dynamic allocation enabled, but it
>>>> got only one executor.
>>>>
>>>> Normally this situation occurs when one of the jobs runs with
>>>> scheduler.mode = FIFO.
>>>>
>>>> 1) Have you ever faced this issue, and if so, how did you overcome it?
>>>>
>>>> I was under the impression that as soon as I submitted JOB B, the Spark
>>>> scheduler would release some resources from JOB A and share them with
>>>> JOB B in a round-robin fashion.
>>>>
>>>> Appreciate your response!
>>>>
>>>>
>>>> Thanks & Regards,
>>>> Gokula Krishnan (Gokul)
>>>>
>>>
>>>
>>
>


-- 
Best Regards,
Ayan Guha

Re: Spark on Cloudera Configuration (Scheduler Mode = FAIR)

Posted by Mark Hamstra <ma...@clearstorydata.com>.
The fair scheduler doesn't have anything to do with reallocating resources
across Applications.

https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-across-applications
https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application

On Thu, Jul 20, 2017 at 2:02 PM, Gokula Krishnan D <em...@gmail.com>
wrote:

> Mark, Thanks for the response.
>
> Let me rephrase my statements.
>
> "I am submitting a Spark application(*Application*#A) with scheduler.mode
> as FAIR and dynamicallocation=true and it got all the available executors.
>
> In the meantime, submitting another Spark Application (*Application* # B)
> with the scheduler.mode as FAIR and dynamicallocation=true but it got only
> one executor. "
>
> Thanks & Regards,
> Gokula Krishnan* (Gokul)*
>
> On Thu, Jul 20, 2017 at 4:56 PM, Mark Hamstra <ma...@clearstorydata.com>
> wrote:
>
>> First, Executors are not allocated to Jobs, but rather to Applications.
>> If you run multiple Jobs within a single Application, then each of the
>> Tasks associated with Stages of those Jobs has the potential to run on any
>> of the Application's Executors. Second, once a Task starts running on an
>> Executor, it has to complete before another Task can be scheduled using the
>> prior Task's resources -- the fair scheduler is not preemptive of running
>> Tasks.
>>
>> On Thu, Jul 20, 2017 at 1:45 PM, Gokula Krishnan D <em...@gmail.com>
>> wrote:
>>
>>> Hello All,
>>>
>>> We have a cluster with 50 executors, each with 4 cores, so at most 200
>>> cores are available.
>>>
>>> I submitted a Spark application (JOB A) with scheduler.mode set to FAIR
>>> and dynamic allocation enabled, and it got all of the available executors.
>>>
>>> In the meantime, I submitted another Spark application (JOB B), also with
>>> scheduler.mode set to FAIR and dynamic allocation enabled, but it got only
>>> one executor.
>>>
>>> Normally this situation occurs when one of the jobs runs with
>>> scheduler.mode = FIFO.
>>>
>>> 1) Have you ever faced this issue, and if so, how did you overcome it?
>>>
>>> I was under the impression that as soon as I submitted JOB B, the Spark
>>> scheduler would release some resources from JOB A and share them with
>>> JOB B in a round-robin fashion.
>>>
>>> Appreciate your response!
>>>
>>>
>>> Thanks & Regards,
>>> Gokula Krishnan (Gokul)
>>>
>>
>>
>

Re: Spark on Cloudera Configuration (Scheduler Mode = FAIR)

Posted by Gokula Krishnan D <em...@gmail.com>.
Mark, Thanks for the response.

Let me rephrase my statements.

"I am submitting a Spark application(*Application*#A) with scheduler.mode
as FAIR and dynamicallocation=true and it got all the available executors.

In the meantime, submitting another Spark Application (*Application* # B)
with the scheduler.mode as FAIR and dynamicallocation=true but it got only
one executor. "

Thanks & Regards,
Gokula Krishnan* (Gokul)*

On Thu, Jul 20, 2017 at 4:56 PM, Mark Hamstra <ma...@clearstorydata.com>
wrote:

> First, Executors are not allocated to Jobs, but rather to Applications. If
> you run multiple Jobs within a single Application, then each of the Tasks
> associated with Stages of those Jobs has the potential to run on any of the
> Application's Executors. Second, once a Task starts running on an Executor,
> it has to complete before another Task can be scheduled using the prior
> Task's resources -- the fair scheduler is not preemptive of running Tasks.
>
> On Thu, Jul 20, 2017 at 1:45 PM, Gokula Krishnan D <em...@gmail.com>
> wrote:
>
>> Hello All,
>>
>> We have a cluster with 50 executors, each with 4 cores, so at most 200
>> cores are available.
>>
>> I submitted a Spark application (JOB A) with scheduler.mode set to FAIR
>> and dynamic allocation enabled, and it got all of the available executors.
>>
>> In the meantime, I submitted another Spark application (JOB B), also with
>> scheduler.mode set to FAIR and dynamic allocation enabled, but it got only
>> one executor.
>>
>> Normally this situation occurs when one of the jobs runs with
>> scheduler.mode = FIFO.
>>
>> 1) Have you ever faced this issue, and if so, how did you overcome it?
>>
>> I was under the impression that as soon as I submitted JOB B, the Spark
>> scheduler would release some resources from JOB A and share them with
>> JOB B in a round-robin fashion.
>>
>> Appreciate your response!
>>
>>
>> Thanks & Regards,
>> Gokula Krishnan (Gokul)
>>
>
>

Re: Spark on Cloudera Configuration (Scheduler Mode = FAIR)

Posted by Mark Hamstra <ma...@clearstorydata.com>.
First, Executors are not allocated to Jobs, but rather to Applications. If
you run multiple Jobs within a single Application, then each of the Tasks
associated with Stages of those Jobs has the potential to run on any of the
Application's Executors. Second, once a Task starts running on an Executor,
it has to complete before another Task can be scheduled using the prior
Task's resources -- the fair scheduler is not preemptive of running Tasks.
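
To illustrate what the fair scheduler does govern, here is a rough sketch of
fair scheduling across concurrent jobs inside one application. The pool name
and file path are hypothetical; the pools file follows the
conf/fairscheduler.xml.template shipped with Spark:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("multi-job-app") // hypothetical application name
      .config("spark.scheduler.mode", "FAIR")
      .config("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")
      .getOrCreate()
    val sc = spark.sparkContext

    // Jobs submitted from this thread are scheduled in the "reports" pool.
    // Other threads can use other pools, and the FAIR scheduler shares the
    // application's executors across them, but only between tasks; it never
    // preempts a task that is already running.
    sc.setLocalProperty("spark.scheduler.pool", "reports")
    spark.range(0, 1000000L).count() // this job's tasks run in the "reports" pool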

On Thu, Jul 20, 2017 at 1:45 PM, Gokula Krishnan D <em...@gmail.com>
wrote:

> Hello All,
>
> We have a cluster with 50 executors, each with 4 cores, so at most 200
> cores are available.
>
> I submitted a Spark application (JOB A) with scheduler.mode set to FAIR
> and dynamic allocation enabled, and it got all of the available executors.
>
> In the meantime, I submitted another Spark application (JOB B), also with
> scheduler.mode set to FAIR and dynamic allocation enabled, but it got only
> one executor.
>
> Normally this situation occurs when one of the jobs runs with
> scheduler.mode = FIFO.
>
> 1) Have you ever faced this issue, and if so, how did you overcome it?
>
> I was under the impression that as soon as I submitted JOB B, the Spark
> scheduler would release some resources from JOB A and share them with
> JOB B in a round-robin fashion.
>
> Appreciate your response!
>
>
> Thanks & Regards,
> Gokula Krishnan (Gokul)
>