Posted to user@spark.apache.org by James <al...@gmail.com> on 2015/03/14 09:49:19 UTC

How to avoid using some nodes while running a spark program on yarn

Hello,

I have a cluster running Spark on YARN. Currently some of its nodes are
running a Spark Streaming program, so their local space is not enough to
support other applications. I wonder whether it is possible to use a
blacklist to avoid using these nodes when running a new Spark program?

Alcaid

Re: How to avoid using some nodes while running a spark program on yarn

Posted by Ted Yu <yu...@gmail.com>.
Out of curiosity, I searched for 'capacity scheduler deadlock', which yielded
the following:

[YARN-3265] CapacityScheduler deadlock when computing absolute max avail
capacity (fix for trunk/branch-2)

[YARN-3251] Fix CapacityScheduler deadlock when computing absolute max
avail capacity (short term fix for 2.6.1)

YARN-2456 Possible livelock in CapacityScheduler when RM is recovering apps

Looks like the CapacityScheduler should become more stable in the upcoming
Hadoop 2.7.0 release.

Cheers

On Sat, Mar 14, 2015 at 4:25 AM, Simon Elliston Ball <
simon@simonellistonball.com> wrote:

> You won’t be able to use YARN labels on 2.2.0. However, you only need the
> labels if you want to map containers onto specific hardware. In your
> scenario, the capacity scheduler in YARN might be the best bet. You can
> set up separate queues for the streaming and other jobs to protect a
> percentage of cluster resources. You can then spread all jobs across the
> cluster while protecting the streaming jobs’ capacity (if your resource
> container sizes are granular enough).
>
> Simon
>
>
> On Mar 14, 2015, at 9:57 AM, James <al...@gmail.com> wrote:
>
> My Hadoop version is 2.2.0, and my Spark version is 1.2.0.
>
> 2015-03-14 17:22 GMT+08:00 Ted Yu <yu...@gmail.com>:
>
>> Which release of Hadoop are you using?
>>
>> Can you utilize the node labels feature?
>> See YARN-2492 and YARN-796
>>
>> Cheers
>>
>> On Sat, Mar 14, 2015 at 1:49 AM, James <al...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I have a cluster running Spark on YARN. Currently some of its nodes are
>>> running a Spark Streaming program, so their local space is not enough to
>>> support other applications. I wonder whether it is possible to use a
>>> blacklist to avoid using these nodes when running a new Spark program?
>>>
>>> Alcaid
>>>
>>
>>
>
>

Re: How to avoid using some nodes while running a spark program on yarn

Posted by Simon Elliston Ball <si...@simonellistonball.com>.
You won’t be able to use YARN labels on 2.2.0. However, you only need the labels if you want to map containers onto specific hardware. In your scenario, the capacity scheduler in YARN might be the best bet. You can set up separate queues for the streaming and other jobs to protect a percentage of cluster resources. You can then spread all jobs across the cluster while protecting the streaming jobs’ capacity (if your resource container sizes are granular enough).

Simon


> On Mar 14, 2015, at 9:57 AM, James <al...@gmail.com> wrote:
> 
> My Hadoop version is 2.2.0, and my Spark version is 1.2.0.
> 
> 2015-03-14 17:22 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
> Which release of Hadoop are you using?
> 
> Can you utilize the node labels feature?
> See YARN-2492 and YARN-796
> 
> Cheers
> 
> On Sat, Mar 14, 2015 at 1:49 AM, James <alcaid1801@gmail.com> wrote:
> Hello,
> 
> I have a cluster running Spark on YARN. Currently some of its nodes are running a Spark Streaming program, so their local space is not enough to support other applications. I wonder whether it is possible to use a blacklist to avoid using these nodes when running a new Spark program?
> 
> Alcaid
> 
> 
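
As a sketch of the capacity scheduler approach Simon describes above: the queue names and capacity percentages below are placeholders rather than values from this thread, and the exact property set should be checked against your Hadoop release. The queues are defined in capacity-scheduler.xml, and the new job is pointed at its queue with spark-submit's --queue option.

    <!-- capacity-scheduler.xml: split the cluster into two queues -->
    <property>
      <name>yarn.scheduler.capacity.root.queues</name>
      <value>streaming,batch</value>
    </property>
    <property>
      <!-- guarantee a share of cluster resources to the streaming jobs -->
      <name>yarn.scheduler.capacity.root.streaming.capacity</name>
      <value>40</value>
    </property>
    <property>
      <name>yarn.scheduler.capacity.root.batch.capacity</name>
      <value>60</value>
    </property>

    # submit the new Spark program to the batch queue
    # (the class and jar names are placeholders)
    spark-submit --master yarn-cluster --queue batch \
      --class com.example.MyApp my-app.jar

Note that capacity is protected per queue, not per node: batch containers can still land on any node while the streaming queue keeps its guaranteed share, which is the "spread all jobs across the cluster" behaviour Simon mentions.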


Re: How to avoid using some nodes while running a spark program on yarn

Posted by James <al...@gmail.com>.
My Hadoop version is 2.2.0, and my Spark version is 1.2.0.

2015-03-14 17:22 GMT+08:00 Ted Yu <yu...@gmail.com>:

> Which release of Hadoop are you using?
>
> Can you utilize the node labels feature?
> See YARN-2492 and YARN-796
>
> Cheers
>
> On Sat, Mar 14, 2015 at 1:49 AM, James <al...@gmail.com> wrote:
>
>> Hello,
>>
>> I have a cluster running Spark on YARN. Currently some of its nodes are
>> running a Spark Streaming program, so their local space is not enough to
>> support other applications. I wonder whether it is possible to use a
>> blacklist to avoid using these nodes when running a new Spark program?
>>
>> Alcaid
>>
>
>

Re: How to avoid using some nodes while running a spark program on yarn

Posted by Ted Yu <yu...@gmail.com>.
Which release of Hadoop are you using?

Can you utilize the node labels feature?
See YARN-2492 and YARN-796

Cheers

On Sat, Mar 14, 2015 at 1:49 AM, James <al...@gmail.com> wrote:

> Hello,
>
> I have a cluster running Spark on YARN. Currently some of its nodes are
> running a Spark Streaming program, so their local space is not enough to
> support other applications. I wonder whether it is possible to use a
> blacklist to avoid using these nodes when running a new Spark program?
>
> Alcaid
>
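
For readers on newer stacks: the node labels feature Ted mentions (YARN-796, YARN-2492) only became usable around Hadoop 2.6, and Spark gained matching settings (spark.yarn.am.nodeLabelExpression and spark.yarn.executor.nodeLabelExpression) in later releases, so it is not an option on the Hadoop 2.2.0 / Spark 1.2.0 combination in this thread. A minimal sketch, assuming the non-streaming nodes carry a YARN label named "general" (the label, class, and jar names are placeholders):

    # restrict the application master and executors to nodes carrying
    # the "general" label, keeping them off the streaming nodes
    spark-submit --master yarn --deploy-mode cluster \
      --conf spark.yarn.am.nodeLabelExpression=general \
      --conf spark.yarn.executor.nodeLabelExpression=general \
      --class com.example.MyApp my-app.jar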