Posted to dev@openwhisk.apache.org by Cosmin Stanciu <st...@adobe.com.INVALID> on 2019/11/08 17:40:46 UTC

Re: Load testing ethos k8s

I think having a test that would give us a baseline of what is expected is great. We can use the exact same test to validate whatever proxy we want to put in the middle.
Personally, I wouldn’t ditch nginx just yet. It’s being used with much bigger loads than our 4 million requests a day, and if our configurations are to blame, we should try to fix them. I could take on the task of testing it by deploying a vanilla version and slowly adding all the other components to see where it breaks.

Thx,
Cosmin

From: Tyson Norris <tn...@adobe.com>
Date: Thursday, November 7, 2019 at 5:48 PM
To: Grp-bladerunner-dev <Gr...@adobe.com>
Subject: Re: Load testing ethos k8s

Hi –
I’ve had some good success with load testing k8s today without nginx (not testing capacity issues, just lots of users and lots of activations).
I’ve been wondering how to deal with nginx, so here is some brainstorming.

  *   Is fixing nginx an option? I’m not sure how to comment, so I will ask: does anyone think this is worth trying (beyond all the things we have already tried)?
  *   If not, what does caching and /apis look like?
     *   /apis I think can potentially look like one of these (non-exhaustive list):
        *   Contour IngressRoutes + a service for CRUD operations (sync?); that is, use the ethos contour ingress directly for routing /apis, by creating an API that does the CRUD, which is called from the apimgmt actions
        *   Custom envoy instance for /apis, where we stuff the users’ API data into a db, expose the db to envoy RDS, and configure the route to send to the controller web action endpoint
        *   Other?
     *   Caching
        *   Since caching is currently enabled inside actions per-response, it is not easy to do anything except route everything (all /api/v1/web and /apis) to a caching service. One question is whether it would be better to be explicit about this at the action config level instead of forcing devs to code this into their action? E.g. wsk action create --annotation cache 30s
        *   I googled for, and didn’t find, a caching filter for envoy. What about using a “standalone” cache like varnish? E.g. /api/v1/web -> varnish -> controller. There is some downside here where all requests route through extra hops, even if only a small portion use caching, but this is similar to what happens today (all web requests will hit redis to check cache), afaik (correct me?)
        *   There is an HTTP cache filter “in progress” for envoy here https://github.com/envoyproxy/envoy/pull/7198
        *   Would be interesting to know what the apigateway team is doing here, if anything?
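
To make the action-level caching idea above a bit more concrete, here is a minimal sketch, not an existing OpenWhisk feature. It assumes a hypothetical `cache` annotation holding a duration such as `30s`, and shows how a router could send only annotated actions through a cache (e.g. varnish) while everything else goes straight to the controller. The function names, the annotation key, and the upstream labels are all made up for illustration.

```python
import re

# Hypothetical duration format for a "cache" annotation: an integer plus a unit.
_DURATION = re.compile(r"^(\d+)(s|m|h)$")
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_cache_ttl(value: str) -> int:
    """Convert a duration string such as '30s' or '5m' into seconds."""
    match = _DURATION.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized cache duration: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


def pick_upstream(annotations: dict) -> str:
    """Return 'cache' only for actions that opted in with a positive TTL.

    Actions without the (hypothetical) 'cache' annotation skip the extra hop
    and go directly to the controller.
    """
    ttl = annotations.get("cache")
    if ttl is None or parse_cache_ttl(ttl) <= 0:
        return "controller"
    return "cache"
```

With something like this, only web actions that carry the annotation would pay for the extra hop, which addresses the downside of routing every request through the cache.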



Thanks

Tyson



From: Tyson Norris <tn...@adobe.com>
Date: Wednesday, November 6, 2019 at 2:52 PM
To: Grp-bladerunner-dev <Gr...@adobe.com>
Subject: Re: Load testing ethos k8s

I updated the EON issue with details simplified as much as possible to communicate the requirements and questions. Let me know if there are questions from the runtime side (or comment in the issue).

Thanks
Tyson

From: Tyson Norris <tn...@adobe.com>
Date: Wednesday, November 6, 2019 at 9:44 AM
To: Grp-bladerunner-dev <Gr...@adobe.com>
Subject: Load testing ethos k8s

Hi –
I’ve been able to get some reasonable load tests going against ethos k8s (with nginx removed, so no /apis and no caching support), where we end up with resourcequota exceeded errors.
There are still some issues to fix on our side (e.g. to make sure failed activations are always properly retried), but we can effectively push on the cluster node scaling issue now.

I asked about resourcequota updates in this issue https://git.corp.adobe.com/adobe-platform/k8s-infrastructure/issues/1859#issuecomment-2048596
Dharma mentioned inviting someone (thanks Misha!) to the Azure capacity planning meeting, but in the meantime, while resourcequota is fixed, I think we will only be able to test load that reaches scaling on a dedicated cluster.

I think we should update https://jira.corp.adobe.com/browse/EON-5854 to indicate the need for overprovisioning+scaling type of setup, as opposed to a fixed “20 nodes” size, so that we can begin to exercise cluster scaling issues.

In the meantime, we can continue to work on invoker issues related to these cases.
Thanks
Tyson

Re: Load testing ethos k8s

Posted by Cosmin Stanciu <st...@adobe.com.INVALID>.
Wrong mailing list that starts with "dev" :P Please disregard.