Posted to dev@openwhisk.apache.org by OpenWhisk Team Slack <ra...@apache.org> on 2019/10/12 09:06:59 UTC
[slack-digest] [2019-10-11] #general

2019-10-11 18:21:00 UTC - Mayank Jha: 
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1570818060012300?thread_ts=1570818060.012300&cid=C3TPCAQG1
----
2019-10-11 22:58:43 UTC - Gohar Irfan Chaudhry: In the scenario where I'm running OpenWhisk using Docker Compose (and in the Makefile for that, if I've removed `--abort-on-container-exit`) will the invoker not be restarted if it dies?
I am trying to play with the fault tolerance of OpenWhisk and in my experiment, I kill the invoker while it is running a function and what I observe is that running `wsk activation get <ID>` gives me `The requested resource does not exist.` (tried waiting quite some time.) Is it the case that OpenWhisk, independently, does not handle an invoker dying and perhaps relies on Kubernetes to do that? (in my setup, I don't have Kubernetes yet)

Any help understanding this would be appreciated. Thanks. :slightly_smiling_face: @Rodric Rabbah @chetanm
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1570834723016200
----
2019-10-11 23:07:48 UTC - Rodric Rabbah: the invoker is the one that creates an activation record in most cases (sequences are done by the controller) - so if it’s killed before it has completed this operation, the current implementation does not retry the activation and a record is not created - so it is possible with the current semantics to have an activation id for which a record will not exist.

this would be a great area to improve on - with at least once semantics, it would be much better than the current best effort at most once semantics
+1 : Gohar Irfan Chaudhry
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1570835268017900
----
2019-10-11 23:08:19 UTC - Rodric Rabbah: do you know how much heap is available?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1570835299018000?thread_ts=1570818060.012300&cid=C3TPCAQG1
----
2019-10-11 23:09:07 UTC - Gohar Irfan Chaudhry: And if we have a Kubernetes setup, would that take care of the dying invoker and launch another one?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1570835347018700?thread_ts=1570835347.018700&cid=C3TPCAQG1
----
2019-10-11 23:10:13 UTC - Gohar Irfan Chaudhry: Or is it the case that with any setup, if the invoker dies then all its running functions (that have not completed or have not successfully sent back a `CompletionMessage` to the controller) just get "lost" and we don't report anything back to the user?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1570835413020400?thread_ts=1570835413.020400&cid=C3TPCAQG1
----
2019-10-11 23:12:57 UTC - Rodric Rabbah: That’s the current implementation.
+1 : Gohar Irfan Chaudhry
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1570835577020800?thread_ts=1570835413.020400&cid=C3TPCAQG1
----
2019-10-11 23:13:18 UTC - Rodric Rabbah: It should restart. 
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1570835598021200?thread_ts=1570835347.018700&cid=C3TPCAQG1
----
2019-10-11 23:14:16 UTC - Gohar Irfan Chaudhry: And the functions that the dead invoker was responsible for (which had not completed) would be rescheduled on the new invoker (or some other invoker)? So no functions are lost at all?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1570835656021400?thread_ts=1570835347.018700&cid=C3TPCAQG1
----
2019-10-11 23:20:15 UTC - Rodric Rabbah: in the current implementation, a message (to invoke a function) is saved to an event bus (Kafka for example). once an invoker pulls the message off the event bus, it is considered activated.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1570836015021800?thread_ts=1570835347.018700&cid=C3TPCAQG1
----
2019-10-11 23:20:52 UTC - Rodric Rabbah: so if the invoker is lost at any point thereafter before it writes the activation record, the activation is also “lost” (wsk activation get will always return 404)
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1570836052022000?thread_ts=1570835347.018700&cid=C3TPCAQG1
----
2019-10-11 23:21:28 UTC - Rodric Rabbah: a better implementation is to advance the cursor on the event bus only when the processing is fully done, and to retry activations when necessary
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1570836088022200?thread_ts=1570835347.018700&cid=C3TPCAQG1
----
2019-10-11 23:21:48 UTC - Rodric Rabbah: this would switch from at most once to at least once but is much better for reliability and fault tolerance
+1 : Gohar Irfan Chaudhry, Bill Zong
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1570836108022400?thread_ts=1570835347.018700&cid=C3TPCAQG1
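----
A minimal sketch of the two cursor strategies discussed in this thread, assuming a toy in-memory log in place of Kafka. `EventBus`, `InvokerCrash`, `run_invoker`, and `simulate` are hypothetical names chosen for illustration; none of them are OpenWhisk or Kafka APIs, and the real invoker is considerably more involved. The simulated invoker dies once while running activation `a2` and is then restarted from the committed cursor.

```python
from typing import Callable, Dict, List, Tuple


class EventBus:
    """Toy stand-in for a Kafka-like log: ordered messages plus one committed cursor."""

    def __init__(self, messages: List[str]) -> None:
        self.messages = messages
        self.committed = 0  # index of the next message a (re)started invoker reads

    def has_next(self) -> bool:
        return self.committed < len(self.messages)

    def peek(self) -> str:
        return self.messages[self.committed]

    def commit(self) -> None:
        self.committed += 1


class InvokerCrash(Exception):
    """Simulates the invoker process dying in the middle of an activation."""


def run_invoker(bus: EventBus, records: Dict[str, str],
                execute: Callable[[str], str], commit_first: bool) -> None:
    while bus.has_next():
        activation_id = bus.peek()
        if commit_first:
            bus.commit()  # at most once: cursor advances as soon as the message is pulled
        result = execute(activation_id)  # may raise InvokerCrash
        records[activation_id] = result  # write the activation record
        if not commit_first:
            bus.commit()  # at least once: cursor advances only after the record exists


def simulate(commit_first: bool) -> Tuple[Dict[str, str], Dict[str, int]]:
    bus = EventBus(["a1", "a2", "a3"])
    records: Dict[str, str] = {}
    attempts: Dict[str, int] = {}

    def execute(activation_id: str) -> str:
        attempts[activation_id] = attempts.get(activation_id, 0) + 1
        # Assumption for the demo: the invoker dies on its first attempt at a2 only.
        if activation_id == "a2" and attempts[activation_id] == 1:
            raise InvokerCrash(activation_id)
        return "ok"

    # Each loop iteration models a supervisor (re)starting the invoker,
    # which resumes from the committed cursor.
    while bus.has_next():
        try:
            run_invoker(bus, records, execute, commit_first)
        except InvokerCrash:
            pass
    return records, attempts


if __name__ == "__main__":
    print("commit on pull:   ", simulate(commit_first=True))
    print("commit after done:", simulate(commit_first=False))
```

Under these assumptions, committing on pull leaves `a2` with no record after the restart (the "lost" activation that makes `wsk activation get` return 404), while committing after the record is written gives every activation a record at the cost of running `a2` twice. That duplicate execution is the usual trade-off of at least once delivery, so actions would need to tolerate being re-run.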
----
2019-10-11 23:34:56 UTC - Mayank Jha: The default is set to 64m which worked on Jetson nano board which has similar ram
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1570836896022600?thread_ts=1570818060.012300&cid=C3TPCAQG1
----
2019-10-11 23:37:08 UTC - Rodric Rabbah: not entirely sure it’s related since the error is a stack overflow
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1570837028022800?thread_ts=1570818060.012300&cid=C3TPCAQG1
----
2019-10-11 23:39:18 UTC - Rodric Rabbah: @chetanm ?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1570837158023000?thread_ts=1570818060.012300&cid=C3TPCAQG1
----