Posted to dev@openwhisk.apache.org by OpenWhisk Team Slack <ra...@apache.org> on 2019/11/26 09:27:49 UTC

[slack-digest] [2019-11-25] #general

2019-11-25 05:10:08 UTC - chetanm: Which ContainerFactory are you using? From the response code it appears to be some error case. Does the controller report any healthy invokers at the `/invokers` endpoint?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574658608431200?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
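One way to do that check, as a sketch: it assumes the chart's default `openwhisk` namespace, an `owdev-controller-0` pod, and the controller's default port 8080, all of which may differ in your deployment; depending on the OpenWhisk version the endpoint may also require the system credentials.

```
# In one terminal: forward the controller port locally.
kubectl -n openwhisk port-forward owdev-controller-0 8080:8080

# In another terminal: ask the controller which invokers it sees
# and whether it considers them healthy.
curl -s http://localhost:8080/invokers
```
----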
2019-11-25 05:32:04 UTC - Ali Tariq: I am using the defaults; looking at values.yaml, the impl is `kubernetes`. I looked at the controller logs - it looks like the invoker is overloaded with requests (`[WebActionsApi] No invokers available [marker:controller_loadbalancer_error:20:2]`)
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574659924431400?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 05:36:08 UTC - Ali Tariq: What I don't understand is: shouldn't it simply queue up the extra requests and continue servicing them, instead of sending a `down for maintenance` response? ... Plus, from my custom logging, I know it's not servicing any requests right now - but for some reason it's overloaded.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574660168431900?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 05:36:57 UTC - chetanm: It's not overloaded; it looks like none of the invokers are found to be healthy. Need to check the invoker logs to see if it is able to send health pings.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574660217432100?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 05:43:35 UTC - Ali Tariq: I don't see any new log updates in the invoker for any new request I send (the request does come back as `down for maintenance` on the client side, though).
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574660615432300?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 05:48:08 UTC - chetanm: Yeah, it looks like it's currently tricky to debug this case and determine why the invokers are not being considered healthy.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574660888432800?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 05:49:07 UTC - chetanm: Check the controller-side logs for entries with sid_`invokerHealth`.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574660947433000?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
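For example, assuming the default `openwhisk` namespace and an `owdev-controller-0` pod (names vary with the Helm release):

```
# Filter the controller logs for the invoker-health transaction id
# to see the health state transitions the controller recorded.
kubectl -n openwhisk logs owdev-controller-0 | grep invokerHealth
```
----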
2019-11-25 05:49:35 UTC - chetanm: That may give some clue. We need to have a better way to debug through this situation.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574660975433200?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 06:36:38 UTC - Ali Tariq: Okay, I kept checking the invoker logs until the invoker became available again ... it turns out the cause is `java.lang.OutOfMemoryError: Java heap space`. Why would that be the case?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574663798433400?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 06:38:29 UTC - chetanm: How big are the responses from the actions?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574663909433800?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 06:39:04 UTC - Ali Tariq: the responses are just strings
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574663944434000?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 06:39:21 UTC - Ali Tariq: about 20 chars
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574663961434200?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 06:40:43 UTC - chetanm: Try increasing `invoker.jvmHeapMB`, which currently defaults to 512MB.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574664043434400?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 06:40:56 UTC - Ali Tariq: In the actions ... I connect to a remote server and send logging information, then simply return a finished string response.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574664056434600?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 06:41:40 UTC - Ali Tariq: Yeah ... but without knowing the issue, how much should I increase it to?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574664100434800?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 06:44:41 UTC - chetanm: Given that you increased the per-invoker concurrent container handling, the resource requirements would increase. The defaults are there more for basic development settings.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574664281435000?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 06:45:40 UTC - chetanm: Try setting it to 2048.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574664340435200?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 06:46:02 UTC - chetanm: That should give it more space to work with, and those are the defaults used for various test runs.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574664362435400?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
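A sketch of applying that change with Helm, assuming the release is named `owdev`, sits in the `openwhisk` namespace, and was installed from the local openwhisk-deploy-kube chart path (adjust all of these to your setup):

```
# Raise the invoker JVM heap from the 512 MB default to 2048 MB and
# roll the deployment; the chart may expect the value as a string.
helm upgrade owdev ./helm/openwhisk --namespace openwhisk \
  --reuse-values \
  --set invoker.jvmHeapMB=2048
```
----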
2019-11-25 07:15:47 UTC - Ali Tariq: Changing jvmHeapMB to `2048` didn't solve the problem, although I did not see any heap exceptions in the invoker logs this time. The attached snippet shows the Unhealthy transitions of the `invoker` from the controller's logs (sid_invokerHealth). It just states the state transitions; how can I find details on the cause of these transitions?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574666147435800?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 07:16:37 UTC - chetanm: Are all invoker pods up?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574666197436300?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
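One way to check, assuming the chart was installed into the `openwhisk` namespace:

```
# List the invoker pod(s) and watch for restarts or CrashLoopBackOff,
# which would explain the controller marking them unhealthy.
kubectl -n openwhisk get pods | grep invoker
```
----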
2019-11-25 07:18:46 UTC - Ali Tariq: They are up when I send a new burst of requests - I can see hundreds of new invokers running.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574666326436500?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 07:20:24 UTC - Ali Tariq: Not right now, because the invoker is no longer down - it only happens after I send in a new burst: it will service some chunk of those requests and go down. After some time (5-10 minutes), it's up again.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574666424436700?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 07:22:57 UTC - Ali Tariq: I just sent a burst of 800 requests, and I can see 924 invoker pods in the deployment. And the invoker is again down.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574666577436900?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 07:24:46 UTC - chetanm: That many invoker pods seems odd; they should be action pods. Invokers are by default configured to 1 for `KubernetesContainerFactory` mode <https://github.com/apache/openwhisk-deploy-kube/blob/master/helm/openwhisk/values.yaml#L275>
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574666686437100?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
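If more than one invoker is wanted, something like the following should work; the value path is my reading of that line of values.yaml and the release name/chart path are assumptions, so double-check them against your copy of the chart:

```
# Run two invokers with the KubernetesContainerFactory instead of the
# default single replica (key path assumed from values.yaml line 275).
helm upgrade owdev ./helm/openwhisk --namespace openwhisk \
  --reuse-values \
  --set invoker.containerFactory.kubernetes.replicaCount=2
```
----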
2019-11-25 07:28:30 UTC - Ali Tariq: I believe they are action pods, like you said (some are shown in the snippet). This is the main invoker pod: `owdev-invoker-0                                                 1/1     Running     0          38m` (not shown in the snippet).
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574666910437300?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 12:59:39 UTC - volo: Hey guys, is OpenWhisk installable on AWS ECS and Docker Swarm?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574686779438200
----
2019-11-25 14:44:48 UTC - volo: One more question - does OpenWhisk have some kind of management console?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574693088438700
----
2019-11-25 16:07:25 UTC - Rodric Rabbah: There are dashboards you can look at for the ops metrics. For example <https://user-images.githubusercontent.com/736614/69112434-0dd79c80-0a35-11ea-9749-761dedc95877.png>
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574698045439000
----
2019-11-25 16:07:33 UTC - Rodric Rabbah: Can you clarify what you mean by management console?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574698053439500
----
2019-11-25 18:18:29 UTC - Dave Grove: Currently the nginx certificate is hard-coded in templates/nginx-secret.yaml as the value associated with the `tls.crt` key.  You are right that the current value is expired (it actually expired on Oct 1, 2019).  I’ve been meaning to fix this for a while (I opened the issue on Oct 1, 2018, the last time I had to regenerate an expired cert).  I did make some progress on this last week; I hope to be able to finish it off relatively soon.  <https://github.com/apache/openwhisk-deploy-kube/issues/305>
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574705909439700?thread_ts=1574608357.430300&cid=C3TPCAQG1
----
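To see whether the certificate your own deployment is serving has expired, something like the following should work, assuming the secret created from templates/nginx-secret.yaml is named `owdev-nginx` (the name follows the Helm release, so adjust as needed):

```
# Decode the tls.crt from the nginx secret and print its expiry date.
kubectl -n openwhisk get secret owdev-nginx \
  -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -enddate
```
----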
2019-11-25 18:26:37 UTC - Dave Grove: A couple of things: you probably need to increase the number of invokers by changing line 275 in values.yaml, as Chetan suggested.  Pumping the log processing and input/output of 1000 concurrent actions through a single invoker is probably too much for it to handle.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574706397439900?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 18:30:53 UTC - Dave Grove: Also, setting `containerpool.userMemory` to a value of 281600m implies you have about 55GB of RAM available on each worker node to use for user actions.  If you don’t actually have that memory (or at least close to that memory), you are going to get into all sorts of resource-related problems.  The invoker will try to ask Kubernetes to create more containers, and Kubernetes will either refuse (because the resources aren’t there) or create them, but your worker node will thrash or start randomly OOM-killing containers because it doesn’t have the resources to actually run them.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574706653440100?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
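As a rough sizing sketch: with OpenWhisk's default 256 MB per-action memory limit, `userMemory` divided by 256 is roughly the number of action containers one invoker will try to keep running, so 281600m works out to about 1100 containers. A more conservative setting might look like the following; the `whisk.containerPool.userMemory` key path and the `owdev` release name are assumptions worth verifying against your chart:

```
# Size the user-container pool to memory the worker nodes actually have,
# e.g. 8192m -> roughly 32 concurrent 256 MB action containers.
helm upgrade owdev ./helm/openwhisk --namespace openwhisk \
  --reuse-values \
  --set whisk.containerPool.userMemory=8192m
```
----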