Posted to dev@openwhisk.apache.org by Sven Lange-Last <sv...@de.ibm.com> on 2019/07/04 08:36:04 UTC

OpenWhisk invoker overloads - "Rescheduling Run message"

When running Apache OpenWhisk with a high load of heterogeneous action 
invocations - i.e. high variance in memory / time limits and a high number of 
different namespaces / actions - single invokers are occasionally 
overloaded. With this post, I want to share my observations and 
conclusions with the community. As a consequence, I'm planning a series 
of pull requests that are meant to give more insight when such invoker 
overloads occur.

Please share your observations and conclusions as well. I'm looking 
forward to your feedback.

Invoker overloads can be inferred from the occurrence of the (in)famous 
"Rescheduling Run message" log entry on invokers [1]:

"Rescheduling Run message, too many message in the pool, freePoolSize: 0 
containers and 0 MB, busyPoolSize: 2 containers and 512 MB, 
maxContainersMemory 512 MB, userNamespace: namespace, action: 
ExecutableWhiskAction/namespace/package/action@0.0.1, needed memory: 256 
MB, waiting messages: 0"

Excess activations on overloaded invokers need to wait until one of the 
currently running actions completes. This can take up to the maximum 
action time limit, potentially causing unacceptable wait times for these 
excess activations.

In large OW installations, we at times see thousands of said log entries 
within a few days. Do other OW adopters share these observations?

I'm aware of two conditions that cause overloaded invokers by design:

1. A controller schedules an activation - but no acknowledgement is 
received from the invoker within the expected time because the invoker 
takes too long to complete the activation. Based on timer expiration, a 
forced acknowledgement removes said activation from the load balancer's 
memory book-keeping. With the released capacity, the load balancer 
schedules a new activation to the invoker - which may still be running the 
action that timed out in the controller before.

2. A controller cannot identify an invoker that has enough free capacity 
to schedule an activation - in particular, this can happen if the 
activation's memory limit is equal to the controller's shard memory size 
on the invoker. If there is at least one usable invoker, the load balancer 
will select a random invoker to schedule the activation. This situation is 
called overload in the load balancer code and will yield a book-keeping 
semaphore with a negative value (illustrated by the sketch below). 
Obviously, the randomly selected invoker cannot immediately process the 
scheduled activation.
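
To make condition 2 concrete, here is a minimal, self-contained sketch of how 
forcing an acquire drives a book-keeping counter negative. This is not the 
actual semaphore implementation used by the load balancer - the class and 
method names below are made up for illustration only.

class BookkeepingSemaphore(initialSlots: Int) {
  // Free slots of one controller's shard on one invoker (illustration only).
  private var free: Int = initialSlots

  // Regular scheduling path: only succeeds if enough slots are free.
  def tryAcquire(slots: Int): Boolean = synchronized {
    if (free >= slots) { free -= slots; true } else false
  }

  // Overload path: acquire unconditionally, allowing the counter to go negative.
  def forceAcquire(slots: Int): Unit = synchronized { free -= slots }

  def release(slots: Int): Unit = synchronized { free += slots }

  def availablePermits: Int = synchronized { free }
}

object OverloadExample extends App {
  // Hypothetical shard of 2048 MB on one invoker.
  val shard = new BookkeepingSemaphore(2048)
  shard.tryAcquire(1024) // two regular schedules fill the shard
  shard.tryAcquire(1024)

  // No invoker has room for the next 512 MB activation ("overload"):
  // the load balancer picks a random usable invoker and forces the acquire.
  if (!shard.tryAcquire(512)) shard.forceAcquire(512)

  println(s"free permits: ${shard.availablePermits}") // prints "free permits: -512"
}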

Did I miss other conditions that cause overloaded invokers by design?

I suspect that there are additional causes for overloaded invokers - 
design flaws in the controller / load balancer or even bugs. I suggest 
extending log messages and improving existing metrics / introducing new 
metrics to better understand what's going on with overloaded invokers. 
We need to be careful when extending log messages - we must neither 
considerably increase the log volume nor impact performance through 
additional operations. Overall, the goal is to eventually fix invoker 
overloads. 

I already opened a pull request (together with Sugandha) to add the action 
timeout limit to the invoker assignment message in the load balancer, plus 
some other changes [2]. Please check the PR for details.

I'm planning further pull requests in these areas:

* At times, we see a non-negligible number of forced acknowledgements in 
large OW installations. For this reason, I suggest extending the log messages 
in the processCompletion() method [3] in the common load balancer code to 
provide more diagnostic information when forced acknowledgements occur. In 
particular, I want to add more information about the invoker processing 
the activation and about the action itself. A metric reflecting forced 
acknowledgements also seems helpful.

* As discussed above, the load balancer will schedule an activation to a 
randomly selected usable invoker if it cannot find a usable invoker that 
has enough free user memory ("overload"). This can also be caused by 
fragmentation where all invokers are running activations with small memory 
limits and the activation to be scheduled has a very high memory limit. 
Even though the invoker pool may have plenty of free user memory in total, 
no single invoker may have enough free memory to fit a large activation.

For this reason, I'm planning to extend the schedule() method [4] in the 
ShardingContainerPoolBalancer to collect more information about scheduling 
that is logged afterwards: how many invokers were visited? Which minimum, 
average and maximum free memory did the usable invokers have that were not 
selected? (A rough sketch of such a statistics collection follows after 
this list.)

* When the "Rescheduling Run message" log entry on invokers [1] occurs, we 
don't know what's currently going on in busy and free container pools. I'm 
planning to extend the log message with more detail information about the 
pools to better understand the scheduling history of this invoker. We need 
to understand which activations currently occupy the invoker.
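
As a rough illustration of the diagnostic data I have in mind for the 
schedule() extension mentioned above, here is a small, self-contained sketch. 
It is not a patch against ShardingContainerPoolBalancer - the 
InvokerFreeMemory case class and all numbers are made up - but it shows how 
fragmentation can leave plenty of total free memory while no single invoker 
fits a large activation, and which summary statistics (visited invokers, 
minimum / average / maximum free memory) could be logged in that case.

final case class InvokerFreeMemory(invoker: Int, freeMemoryMB: Long)

object SchedulingDiagnostics extends App {
  // Ten invokers with 256 MB free each - 2560 MB free in total (made-up numbers).
  val invokers = (0 until 10).map(i => InvokerFreeMemory(i, 256L))
  val neededMemoryMB = 512L // memory limit of the activation to be scheduled

  // Walk the invokers like schedule() would and remember which ones could not take the action.
  val selected = invokers.find(_.freeMemoryMB >= neededMemoryMB)
  val notSelected = invokers.filter(_.freeMemoryMB < neededMemoryMB)

  // Summary statistics about the invokers that were visited but not selected.
  val freeMemories = notSelected.map(_.freeMemoryMB)
  println(s"selected invoker: $selected") // None - fragmentation despite 2560 MB free in total
  println(
    s"visited: ${invokers.size}, unselected invokers' free memory: " +
      s"min ${freeMemories.min} MB, avg ${freeMemories.sum / freeMemories.size} MB, max ${freeMemories.max} MB")
}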

Please let me know what you think.

[1] 
https://github.com/apache/incubator-openwhisk/blob/ee33c22224515d0161b5b595b0d45dbc14164c6e/core/invoker/src/main/scala/org/apache/openwhisk/core/containerpool/ContainerPool.scala#L175-L199
[2] https://github.com/apache/incubator-openwhisk/pull/4537
[3] 
https://github.com/apache/incubator-openwhisk/blob/ee33c22224515d0161b5b595b0d45dbc14164c6e/core/controller/src/main/scala/org/apache/openwhisk/core/loadBalancer/CommonLoadBalancer.scala#L217-L222
[4] 
https://github.com/apache/incubator-openwhisk/blob/ee33c22224515d0161b5b595b0d45dbc14164c6e/core/controller/src/main/scala/org/apache/openwhisk/core/loadBalancer/ShardingContainerPoolBalancer.scala#L377-L398


Mit freundlichen Grüßen / Regards,

Sven Lange-Last
Senior Software Engineer
IBM Cloud Functions
Apache OpenWhisk


E-mail: sven.lange-last@de.ibm.com
Find me on:  


Schoenaicher Str. 220
Boeblingen, 71032
Germany




IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, 
HRB 243294



Re: OpenWhisk invoker overloads - "Rescheduling Run message"

Posted by Tyson Norris <tn...@adobe.com.INVALID>.
> The PR you referenced above does not only improve this particular area 
> but also includes a lot of other changes - in particular, it adds a 
> different way of managing containers. Due to the PR's size and 
> complexity, it's very hard to understand and review... Would you be able 
> to split this PR up into smaller changes?

Indeed, I will split the PR ASAP. There are several changes there that are candidates for this, I think, so I will try to create a PR for each.
Thanks
Tyson
    


Re: OpenWhisk invoker overloads - "Rescheduling Run message"

Posted by Tyson Norris <tn...@adobe.com.INVALID>.
Hi Sven (and everyone!) - 
At long last, I worked through some tests for separating this feature (which changes how rescheduled Run messages are handled) out into an isolated PR. 
Please see https://github.com/apache/openwhisk/pull/4593

I plan to do some more load testing on this, but so far no problems.

Thanks
Tyson



Re: OpenWhisk invoker overloads - "Rescheduling Run message"

Posted by Sven Lange-Last <sv...@de.ibm.com>.
Hello Tyson,

regarding your feedback:

> Related to the "Rescheduling Run message", one problem we have 
> encountered in these cases is that the invoker becomes unstable due 
> ( I think) to a tight message loop, since the message that couldn't 
> run is immediately resent to the pool to be run, which fails again, 
> etc. We saw CPU getting pegged, and invoker eventually would crash.
> I have a PR related to cluster managed resources where, among other 
> things, this message looping is removed:
> https://github.com/apache/incubator-openwhisk/pull/4326/files#diff-726b36b3ab8c7cff0b93dead84311839L198
> 
> Instead of resending the message to the pool immediately, it just 
> waits in the runbuffer, and the runbuffer is processed in reaction 
> to any potential change in resources: NeedWork, ContainerRemoved, 
> etc. This may add delay to any buffered message(s), but seems to 
> avoid the catastrophic crash in our systems. 

From my point of view, your proposal to change the processing of rescheduled 
Run messages makes sense. However, the PR you referenced above does not only 
improve this particular area but also includes a lot of other changes - in 
particular, it adds a different way of managing containers. Due to the PR's 
size and complexity, it's very hard to understand and review... Would you be 
able to split this PR up into smaller changes?


Mit freundlichen Grüßen / Regards,

Sven Lange-Last




Re: Re: OpenWhisk invoker overloads - "Rescheduling Run message"

Posted by Rodric Rabbah <ro...@gmail.com>.
> Instead of resending the message to the pool immediately, it just waits
> in the runbuffer, and the runbuffer is processed in reaction to any
> potential change in resources: NeedWork, ContainerRemoved, etc. This may
> add delay to any buffered message(s), but seems to avoid the catastrophic
> crash in our systems.

This makes sense since, from my recollection, the rescheduling is really an
indication of something going badly at the container level. A better
solution might be to reschedule the request to another invoker under some
fairness criteria (but that is not easily guaranteed with the current
architecture).

(This is testing my memory but...) we used to see these in a previous
incarnation of the scheduler as a precursor to the docker daemon going out to
lunch. (Markus might remember better.)

-r

Re: Re: OpenWhisk invoker overloads - "Rescheduling Run message"

Posted by Tyson Norris <tn...@adobe.com.INVALID>.
Related to the "Rescheduling Run message", one problem we have encountered in these cases is that the invoker becomes unstable due (I think) to a tight message loop, since the message that couldn't run is immediately resent to the pool to be run, fails again, etc. We saw CPU getting pegged, and the invoker would eventually crash.
I have a PR related to cluster-managed resources where, among other things, this message looping is removed:
https://github.com/apache/incubator-openwhisk/pull/4326/files#diff-726b36b3ab8c7cff0b93dead84311839L198

Instead of resending the message to the pool immediately, it just waits in the runbuffer, and the runbuffer is processed in reaction to any potential change in resources: NeedWork, ContainerRemoved, etc. This may add delay to any buffered message(s), but seems to avoid the catastrophic crash in our systems. 
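
To illustrate the idea with a minimal, self-contained sketch (this is not the 
actual ContainerPool or the code in the PR - RunMessage, Resources and the 
method names are made up): buffered Run messages are only retried when a 
resource-related event arrives, instead of being resent to the pool in a 
tight loop.

import scala.collection.mutable

// Made-up stand-ins for a pending Run message and the pool's free resources.
final case class RunMessage(action: String, memoryMB: Long)
final case class Resources(freeMemoryMB: Long)

class RunBufferSketch(private var resources: Resources) {
  private val runBuffer = mutable.Queue.empty[RunMessage]

  // Try to start a Run message; if it does not fit, park it instead of looping.
  def schedule(msg: RunMessage): Unit =
    if (msg.memoryMB <= resources.freeMemoryMB) start(msg)
    else runBuffer.enqueue(msg) // no immediate re-send - wait for a resource change

  // Called on NeedWork / ContainerRemoved-like events: drain whatever fits now.
  def onResourceChange(updated: Resources): Unit = {
    resources = updated
    while (runBuffer.nonEmpty && runBuffer.head.memoryMB <= resources.freeMemoryMB)
      start(runBuffer.dequeue())
  }

  private def start(msg: RunMessage): Unit = {
    resources = Resources(resources.freeMemoryMB - msg.memoryMB)
    println(s"starting ${msg.action} (${msg.memoryMB} MB), free: ${resources.freeMemoryMB} MB")
  }
}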

Thanks
Tyson


Re: Re: OpenWhisk invoker overloads - "Rescheduling Run message"

Posted by Sven Lange-Last <sv...@de.ibm.com>.
Hello Dominic,

thanks for your detailed response.

I guess your understanding is right - just this small correction:

> So the main issue here is there are too many "Rescheduling Run" messages 
in invokers?

It's not the main issue to see these log entries in the invoker. This is 
just the indication that something is going wrong in the invoker - more 
activations are waiting to be processed than the ContainerPool can 
currently serve.

Actually, there are different reasons why "Rescheduling Run message" log 
entries can show up in the invoker:

1. Controllers send too many activations to an invoker.

2. In the invoker, the container pool sends a Run message to a container 
proxy but the container proxy fails to process it properly and hands it 
back to the container pool. Examples: a Run message arrives while the 
proxy is already removing the container; if concurrency>1, the proxy 
buffers Run messages and returns them in failure situations.

Although I'm not 100% sure, I see more indications for reason 1 in our 
logs than for reason 2.

Regarding hypothesis "#controllers * getInvokerSlot(invoker user memory 
size) > invoker user memory size": I can rule out this hypothesis in our 
environments. We have "#controllers * getInvokerSlot(invoker user memory 
size) = invoker user memory size". I provided PR [1] to be sure about 
that.

Regarding hypothesis "invoker simply pulls too many Run messages from 
MessageFeed". I think the part you described is perfectly right. The 
questions remains why controllers send too many Run messages or a Run 
message with an activation that is larger than free memory capacity 
currently available in the pool.

The load balancer has memory book-keeping for all of its invoker shards 
(memory size determined by getInvokerSlot()), so the load balancer is 
supposed to only schedule an activation to an invoker if the required 
memory does not exceed the controller's shard of that invoker. Even if 
the resulting Run messages arrive on the invoker in a different order, the 
invoker's shard free memory should be sufficient.

Do you see a considerable number of "Rescheduling Run message" log entries 
in your environments?

[1] https://github.com/apache/incubator-openwhisk/pull/4520


Mit freundlichen Grüßen / Regards,

Sven Lange-Last



Re: OpenWhisk invoker overloads - "Rescheduling Run message"

Posted by Dominic Kim <st...@gmail.com>.
First of all, I totally agree with adding more descriptive logs.

Let me rephrase your sentences to see whether I understood the issue
correctly.

So the main issue here is that there are too many "Rescheduling Run" messages
in invokers?
Since controllers are each given their own share of resources for all invokers
and they schedule activations based on their own semaphores, such messages are
not supposed to appear many times.
But it occasionally happens in real deployments, and you suspect this is
because of some flaws or bugs in the system.
Do I understand correctly?

If yes, two things just come up in my mind.

1. getInvokerSlot
https://github.com/apache/incubator-openwhisk/blob/master/core/controller/src/main/scala/org/apache/openwhisk/core/loadBalancer/ShardingContainerPoolBalancer.scala#L477

As you know, each invoker's memory is divided among all controllers, and if
the resulting share is less than the minimum action memory size, controllers
are given the minimum memory size instead.

private def getInvokerSlot(memory: ByteSize): ByteSize = {
  val newTreshold = if (memory / _clusterSize < MemoryLimit.minMemory) {
    logging.warn(
      this,
      s"registered controllers: ${_clusterSize}: the slots per invoker fall below the min memory of one action.")(
      TransactionId.loadbalancer)
    MemoryLimit.minMemory      // here
  } else {
    memory / _clusterSize
  }
  newTreshold
}

This can easily happen with small invokers in a large-scale deployment.
For example, if invokers have 10 GB of memory and there are 100 controllers,
each controller's share would only be 100 MB. Since that is below the minimum,
every controller is instead given 128 MB per invoker, so together they can
schedule 128 MB * 100 = 12.8 GB to a single invoker.
Clearly, that is more than the 10 GB the invoker can handle.
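
For illustration only, a tiny, self-contained sketch of the over-commit
arithmetic with the numbers from the example above (the constants are made up
and this is not the real MemoryLimit / getInvokerSlot code):

object ShardOverCommitExample extends App {
  // Made-up constants mirroring the example: 10 GB invokers, 100 controllers, 128 MB minimum.
  val invokerUserMemoryMB = 10 * 1024L
  val controllers = 100
  val minMemoryMB = 128L

  // Same clamping idea as getInvokerSlot: never hand out less than the minimum.
  val perControllerShareMB = math.max(invokerUserMemoryMB / controllers, minMemoryMB)

  // Worst case: every controller schedules its full share onto the same invoker.
  val totalSchedulableMB = perControllerShareMB * controllers

  println(s"share per controller: $perControllerShareMB MB") // 128 MB (clamped up from 102 MB)
  println(s"total schedulable:    $totalSchedulableMB MB")   // 12800 MB
  println(s"invoker user memory:  $invokerUserMemoryMB MB")  // 10240 MB
  println(s"over-commit:          ${totalSchedulableMB - invokerUserMemoryMB} MB") // 2560 MB
}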



2. MessageFeed

Currently, invokers have their own memory assigned and create containers
based on it.
But the MessageFeed does not take that memory into account.
https://github.com/apache/incubator-openwhisk/blob/master/common/scala/src/main/scala/org/apache/openwhisk/core/connector/MessageConsumer.scala#L106

It still uses "MaxPeek" to decide whether it will forward messages to the
InvokerReactive.
So if there is only 256 MB remaining in the invoker when the InvokerReactive
sends MessageFeed.Processed to the MessageFeed,
the MessageFeed will send one more message to the invoker regardless of the
remaining memory.
If that message requires a memory limit higher than 256 MB, it will induce a
"Rescheduling Run" message.


I hope this helps you to diagnose the issue.
And if I missed anything or got anything wrong, kindly let me know.

Best regards
Dominic




