Posted to user@spark.apache.org by Arun Luthra <ar...@gmail.com> on 2015/07/03 00:48:06 UTC

Re: Spark launching without all of the requested YARN resources

Thanks Sandy et al, I will try that. I like that I can choose the
minRegisteredResourcesRatio.
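
For anyone finding this thread later, here is a rough sketch of how the two
settings could be passed to spark-submit via its --conf flag. The helper
build_submit_args, the application name my_job.py and the value "1h" are
made up for illustration; "1h" assumes a Spark version that accepts time
strings for this property (older releases may expect milliseconds), so
check the configuration docs for your version.

```python
# Sketch only: build_submit_args is a hypothetical helper, not part of Spark.
def build_submit_args(app_path, conf):
    """Return a spark-submit argument list with one --conf flag per setting."""
    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", "{}={}".format(key, value)]
    args.append(app_path)  # other options (master, executors, ...) omitted
    return args


# The two settings named in Sandy's reply. The values are assumptions:
# a long wait plus a ratio of 1.0 means "hold off until everything registers".
conf = {
    "spark.scheduler.maxRegisteredResourcesWaitingTime": "1h",
    "spark.scheduler.minRegisteredResourcesRatio": "1.0",
}

args = build_submit_args("my_job.py", conf)
print(" ".join(args))
```

The same key/value pairs could also go in spark-defaults.conf or be set on a
SparkConf in the application.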

On Wed, Jun 24, 2015 at 11:04 AM, Sandy Ryza <sa...@cloudera.com>
wrote:

> Hi Arun,
>
> You can achieve this by
> setting spark.scheduler.maxRegisteredResourcesWaitingTime to some really
> high number and spark.scheduler.minRegisteredResourcesRatio to 1.0.
>
> -Sandy
>
> On Wed, Jun 24, 2015 at 2:21 AM, Steve Loughran <st...@hortonworks.com>
> wrote:
>
>>
>>  On 24 Jun 2015, at 05:55, canan chen <cc...@gmail.com> wrote:
>>
>>  Why do you want it to wait until all the resources are ready? Starting
>> as early as possible should let it complete earlier and increase the
>> utilization of resources.
>>
>> On Tue, Jun 23, 2015 at 10:34 PM, Arun Luthra <ar...@gmail.com>
>> wrote:
>>
>>> Sometimes if my Hortonworks yarn-enabled cluster is fairly busy, Spark
>>> (via spark-submit) will begin its processing even though it apparently did
>>> not get all of the requested resources; it is running very slowly.
>>>
>>>  Is there a way to force Spark/YARN to only begin when it has the full
>>> set of resources that I request?
>>>
>>>  Thanks,
>>> Arun
>>>
>>
>>
>>
>>  The "wait until there's space" launch policy is known as Gang
>> Scheduling, https://issues.apache.org/jira/browse/YARN-624 covers what
>> would be needed there.
>>
>>  1. It's not in YARN
>>
>>  2. For analytics workloads, it's not clear you benefit. You would wait
>> a very long time(*) for the requirements to be satisfied. The current YARN
>> scheduling and placement algorithms assume that you'd prefer "timely
>> container launch" to "extended wait for containers in the right place", and
>> expect algorithms to work in a degraded form with a reduced number of workers.
>>
>>  3. Where it really matters is long-lived applications where you need
>> some quorum of container-hosted processes, or if performance collapses
>> utterly below a threshold. Things like HBase on YARN are an example, but
>> Spark Streaming could be another.
>>
>>  In the absence of YARN support, it can be implemented in the
>> application by having the YARN-hosted application (here: Spark) get the
>> containers, start up a process on each one, but not actually start
>> accepting/performing work until a threshold of containers is reached or
>> some timeout has occurred.
>>
>>  If you wanted to do that in Spark, you could raise the idea on the
>> Spark dev lists and see what people think.
>>
>>  -Steve
>>
>>  (*) i.e. forever
>>
>
>
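
The application-level policy Steve describes (hold off on work until a
threshold of containers is reached or a timeout has occurred) can be
sketched as a single predicate. This is only an illustration of the idea,
not Spark's actual implementation; the function name ready_to_start and all
of its parameters are hypothetical.

```python
# Sketch only: ready_to_start and its parameters are hypothetical names.
def ready_to_start(registered, requested, min_ratio, elapsed_s, max_wait_s):
    """Decide whether the application should begin accepting work.

    registered -- containers that have started and reported in so far
    requested  -- containers the application asked for
    min_ratio  -- fraction of requested containers required before starting
    elapsed_s  -- seconds spent waiting so far
    max_wait_s -- give up waiting and start anyway after this many seconds
    """
    if requested <= 0:
        return True  # nothing was requested, so there is nothing to wait for
    if registered / requested >= min_ratio:
        return True  # threshold of containers reached
    return elapsed_s >= max_wait_s  # otherwise start only once the timeout hits


# A ratio of 1.0 with a long timeout waits for the full set of containers.
print(ready_to_start(7, 10, 1.0, 5, 3600))
print(ready_to_start(10, 10, 1.0, 5, 3600))
```

With a ratio of 1.0 and a very large timeout this behaves like "wait for
everything", which on a busy cluster could mean a long wait, as Steve's
footnote warns.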