Posted to dev@slider.apache.org by Shrijeet Paliwal <sh...@gmail.com> on 2015/03/23 18:15:16 UTC

Redundant container request from slider causing high load on busy cluster

Hello,

*Context:*

We were seeing very aggressive preemption by the Fair Scheduler, and 98% of
the preemption activity was triggered by the Slider queue's needs. The Slider
queue is a stable queue, i.e. its containers don't churn, and it has been
given a fair-share guarantee well above what it needs (a high weight, and a
min share double its steady-state needs). So it was puzzling to see it
triggering preemption. When I turned on debug logging for the Fair Scheduler,
I noticed the scheduler's demand-update thread reporting unusually high
demand from the Slider queue.

My initial thought was a bug in the scheduler, but I later concluded it is
Slider's problem: not in Slider's own code, but in the AMRMClient code it
uses. I can deterministically reproduce the issue on my laptop running a
pseudo-distributed yarn+slider setup. I traced it to an open issue:
https://issues.apache.org/jira/browse/YARN-3020.

*The problem:*

1. A region server fails for the first time. Slider notices and registers a
request with the RM via AMRMClient for a new container. At this point
AMRMClient caches the allocation request keyed by its 'Resource' (a data
structure with memory, cpu & priority).
(source: AMRMClientImpl.java; the cache is remoteRequestsTable)
2. A region server fails again. Slider notices and registers another request
with the RM via AMRMClient for one new container. AMRMClient finds the
matching Resource request in its cache (the memory, cpu and priority for an
RS obviously don't change) and adds +1 to the container count before putting
it over the wire. *NOTE*: Slider didn't need 2 containers, but ends up
receiving 2. When the containers are allocated, Slider keeps one and discards
the other.
3. As explained in YARN-3020, with each subsequent failure we ask for more
and more containers, when in reality we only ever need one (see the sketch
below).

For the Fair Scheduler this means demand keeps going up. It doesn't know
that Slider ends up discarding the surplus containers, so in order to satisfy
the demand it kills mercilessly. Needless to say, this isn't triggered only
by container failures; even flexing should trigger it.
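
To make the accumulation concrete, here is a minimal sketch against the
plain AMRMClient API (Hadoop 2.x). The resource values and the setup are
illustrative only, not Slider's actual code, and registration with the RM
is omitted:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class RequestAccumulationSketch {
  public static void main(String[] args) {
    AMRMClient<ContainerRequest> rm = AMRMClient.createAMRMClient();
    rm.init(new Configuration());
    rm.start();
    // (registration with the RM omitted; addContainerRequest only updates
    // the client's local remoteRequestsTable until the next heartbeat)

    // The RS resource profile never changes, so these are identical each time.
    Resource capability = Resource.newInstance(1024, 1); // memory MB, vcores
    Priority priority = Priority.newInstance(1);

    // First RS failure: ask for one replacement container.
    rm.addContainerRequest(new ContainerRequest(capability, null, null, priority));
    // remoteRequestsTable: (priority, ANY, capability) -> numContainers = 1

    // Second RS failure: same (priority, resource) key, so AMRMClient
    // increments the cached count instead of starting a fresh ask.
    rm.addContainerRequest(new ContainerRequest(capability, null, null, priority));
    // remoteRequestsTable: (priority, ANY, capability) -> numContainers = 2,
    // and nothing ever decrements it unless the client does so explicitly.

    rm.stop();
  }
}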

*The fix:*

Rumor is that AMRMClient doesn't have a bug and this is intended behaviour
(source: comments on YARN-3020). The claim is that on receiving a container,
the client should clear the corresponding cache entry by calling
'removeContainerRequest'. Slider isn't following the protocol correctly;
in Slider's defense, the protocol is not well defined.
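
If that's the protocol, the fix on our side would look roughly like the
sketch below: keep our own record of outstanding requests, and when a
container arrives, cancel the ask it satisfies (or release the container if
nothing is outstanding). The AMRMClient calls are the real API; the
surrounding class, 'pendingRequests' and 'launch' are hypothetical
bookkeeping, not existing Slider code.

import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

class AllocationHandler {
  private final AMRMClient<ContainerRequest> rm;
  // Requests issued but not yet satisfied (hypothetical bookkeeping).
  private final Queue<ContainerRequest> pendingRequests = new ArrayDeque<>();

  AllocationHandler(AMRMClient<ContainerRequest> rm) {
    this.rm = rm;
  }

  void requestContainer(ContainerRequest req) {
    pendingRequests.add(req);
    rm.addContainerRequest(req);
  }

  void onContainersAllocated(List<Container> containers) {
    for (Container c : containers) {
      ContainerRequest satisfied = pendingRequests.poll();
      if (satisfied != null) {
        // Decrement AMRMClient's cached ask so the RM stops seeing
        // demand we no longer have.
        rm.removeContainerRequest(satisfied);
        launch(c);
      } else {
        // Surplus container we never needed: hand it straight back.
        rm.releaseAssignedContainer(c.getId());
      }
    }
  }

  private void launch(Container c) {
    // Hypothetical: start the role instance on the container here.
  }
}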

Thoughts?
--
Shrijeet

Re: Redundant container request from slider causing high load on busy cluster

Posted by Steve Loughran <st...@hortonworks.com>.
OK, I understand the issue now and will pull it into the SLIDER-799/SLIDER-611 work. Tagged as a blocker for the 0.80-incubating release (i.e. along with the rest of that work).

We're already tracking placed requests from histories; I think we'll now need to track all the "open" requests as well, so that the role history logic can decide explicitly which request to cancel when a container is allocated or turns out to be surplus. This is particularly important now that Gour's work on restart/reconfig is going in: it suddenly becomes a lot more likely that there will be containers/requests with different resource allocations.
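
As a rough illustration of what that tracking could look like (purely a
sketch; OutstandingRequestTracker and its methods are hypothetical names,
not the actual role-history code):

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

class OutstandingRequestTracker {
  private final List<ContainerRequest> open = new ArrayList<ContainerRequest>();

  void requested(ContainerRequest req) {
    open.add(req);
  }

  // Decide which open request a newly allocated container satisfies, so the
  // caller can cancel exactly that one via removeContainerRequest.
  ContainerRequest satisfy(Resource allocated) {
    for (Iterator<ContainerRequest> it = open.iterator(); it.hasNext(); ) {
      ContainerRequest req = it.next();
      // With restart/reconfig in the picture, open requests may carry
      // different resource profiles, so match on capability rather than
      // assuming all open requests are interchangeable.
      if (req.getCapability().equals(allocated)) {
        it.remove();
        return req;
      }
    }
    return null; // nothing open matches: the container is surplus, release it
  }
}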



Re: Redundant container request from slider causing high load on busy cluster

Posted by Shrijeet Paliwal <sh...@gmail.com>.
If that floats the boat! Here:
https://issues.apache.org/jira/browse/SLIDER-828

--
Shrijeet

Re: Redundant container request from slider causing high load on busy cluster

Posted by Ted Yu <yu...@gmail.com>.
The JIRAs you mentioned are somewhat dormant.

You can open a Slider JIRA to track this issue.

Thanks


Re: Redundant container request from slider causing high load on busy cluster

Posted by Shrijeet Paliwal <sh...@gmail.com>.
More evidence:

Spark is also affected: https://issues.apache.org/jira/browse/SPARK-2687
One more relevant yarn jira: https://issues.apache.org/jira/browse/YARN-1902

--
Shrijeet

Re: Redundant container request from slider causing high load on busy cluster

Posted by Steve Loughran <st...@hortonworks.com>.
Hi

If you check out the develop branch from the ASF git repository, I believe it now contains a fix for this.

It also contains SLIDER-799 (AM-managed placement escalation) and all but one subtask
of SLIDER-611; that is, all the enhancements for placement planned for Slider 0.80-incubating.

Shrijeet, can you grab this branch, do a local build, and see if the problem you are seeing is now fixed?
