Posted to dev@openwhisk.apache.org by 蒋鹏程 <ji...@navercorp.com> on 2020/08/31 09:42:43 UTC

[Discussion]Runtime log collection issue on K8S

Hello guys:
​
Recently we have been trying to migrate OpenWhisk from Docker to K8S, but we ran into some problems. One of them is that performance dropped a lot after the migration (TPS went from thousands to dozens in our benchmarks). The reason we found is that the logs of every action container are fetched from the K8S API server after `run` finishes, and these log requests become very slow when many concurrent activations produce logs at once, which in turn leads to very low activation TPS.
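For reference, the per-activation fetch that goes through the K8S API server amounts to roughly the call below (a minimal sketch using the official Python Kubernetes client; the pod name, namespace, and container name are placeholders, not OpenWhisk's actual identifiers):

    # Sketch of the per-activation log fetch that hits the K8S API server.
    # Every activation triggers one such call; under high concurrency they
    # all funnel through the API server and become the bottleneck.
    from kubernetes import client, config

    config.load_incluster_config()      # the invoker runs inside the cluster
    core = client.CoreV1Api()

    def fetch_action_logs(pod_name: str, namespace: str = "openwhisk") -> str:
        return core.read_namespaced_pod_log(
            name=pod_name,
            namespace=namespace,
            container="user-action",    # assumed container name
            timestamps=True,
        )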
​
To work around this, we tried two approaches:

1. Mount a shared volume for all action pods and invoker pods, and have the invoker read action logs from the shared volume directly.

2. Run a sidecar container alongside each action container that reads the action logs and ships them to an external database (ElasticSearch in our case); the invokers then fetch the logs from that database.

Both approaches have problems: the shared volume needs several seconds to sync file contents, and ElasticSearch also takes a few seconds to store the action logs and return them to the invokers.
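To make the read path of the second approach concrete, the invoker-side lookup would be something like the sketch below (an 8.x-style Python Elasticsearch client call; the index name, field names, and keying log lines by activation id are assumptions, not an actual schema):

    # Sketch of the invoker fetching action logs from ElasticSearch.
    # The delay comes from waiting for the sidecar to ship the lines and for
    # ElasticSearch to make them searchable (index refresh interval).
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://elasticsearch:9200")

    def fetch_logs_from_es(activation_id: str) -> list[str]:
        resp = es.search(
            index="openwhisk-action-logs",
            query={"term": {"activationId": activation_id}},
            sort=[{"@timestamp": {"order": "asc"}}],
            size=1000,
        )
        return [hit["_source"]["message"] for hit in resp["hits"]["hits"]]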
​
We finally came up with another idea, which I think is preferable to the two above: make the action container capture the output of the user's code and return it in the response to the `run` request in a dedicated field such as `__OW_LOGS`, so that invokers can get action logs directly from the `run` response. This does improve activation TPS, but the method still has some flaws that need to be solved.
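As a rough illustration of the idea, a stripped-down run handler could look like this (only a sketch of the proposal, not the real OpenWhisk runtime proxy; the handler name and every field other than `__OW_LOGS` are made up for illustration):

    # Sketch: capture the user code's stdout/stderr during /run and return it
    # in the run response under __OW_LOGS instead of relying on K8S log APIs.
    import contextlib
    import io
    import json

    def run_handler(user_main, args: dict) -> str:
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf), contextlib.redirect_stderr(buf):
            result = user_main(args)            # invoke the user's action code
        return json.dumps({
            "result": result,                   # the action's normal result
            "__OW_LOGS": buf.getvalue().splitlines(),
        })

The invoker would then read `__OW_LOGS` straight out of the `run` response instead of calling the K8S log API at all, which is where the TPS improvement comes from.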
​
Any ideas about this? I'm also curious whether you have faced the same performance issue when running OpenWhisk on K8S, and how you resolved it.
​
Sincerely
Jiang PengCheng

Re: [Discussion]Runtime log collection issue on K8S

Posted by David P Grove <gr...@us.ibm.com>.

"蒋鹏程" <ji...@navercorp.com> wrote on 08/31/2020 05:42:43 AM:
>
> Recently we have been trying to migrate OpenWhisk from Docker to K8S,
> but we ran into some problems. One of them is that performance dropped a
> lot after the migration (TPS went from thousands to dozens in our
> benchmarks). The reason we found is that the logs of every action
> container are fetched from the K8S API server after `run` finishes, and
> these log requests become very slow when many concurrent activations
> produce logs at once, which in turn leads to very low activation TPS.

Yes. Using the Kubernetes API to retrieve activation logs in the invoker is
a performance disaster.  Only usable for development/debugging.

We do mention that in [1] and suggest some workarounds, but maybe this
needs to be highlighted elsewhere.

--dave

[1]
https://github.com/apache/openwhisk-deploy-kube/blob/master/docs/k8s-custom-build-cluster-scaleup.md

Re: [Discussion]Runtime log collection issue on K8S

Posted by 蒋鹏程 <ji...@navercorp.com>.
I use Logstash as the sidecar container to drain the action container's logs to Elastic.
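For illustration, the sidecar's job amounts to roughly the following (a rough Python stand-in for what the Logstash pipeline does here, using an 8.x-style Elasticsearch client; the log path, index name, and document shape are assumptions, and correlating lines back to individual activations is left out):

    # Rough stand-in for the Logstash sidecar: tail the action container's log
    # file and index each line into ElasticSearch as it appears.
    import time
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://elasticsearch:9200")
    LOG_PATH = "/var/log/action/user-action.log"   # assumed shared emptyDir path

    def tail_and_ship(pod_name: str) -> None:
        with open(LOG_PATH) as f:
            while True:
                line = f.readline()
                if not line:
                    time.sleep(0.2)                # wait for more output
                    continue
                es.index(
                    index="openwhisk-action-logs",
                    document={"pod": pod_name, "message": line.rstrip("\n")},
                )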

-----Original Message-----
From: "Rodric Rabbah"<ro...@gmail.com>
To: <de...@openwhisk.apache.org>;
Cc:
Sent: 2020/9/1 Tue 12:55 (GMT+08:00)
Subject: Re: [Discussion]Runtime log collection issue on K8S
 
Do you drain logs directly from memory to Elastic or do you use the
FileLogStore which then drains the logs to Elastic?

-r

On Mon, Aug 31, 2020 at 5:43 AM 蒋鹏程 <ji...@navercorp.com> wrote:

> [...]


Re: [Discussion]Runtime log collection issue on K8S

Posted by Rodric Rabbah <ro...@gmail.com>.
Do you drain logs directly from memory to Elastic or do you use the
FileLogStore which then drains the logs to Elastic?

-r

On Mon, Aug 31, 2020 at 5:43 AM 蒋鹏程 <ji...@navercorp.com> wrote:

> [...]