Posted to dev@openwhisk.apache.org by Markus Thoemmes <ma...@de.ibm.com> on 2018/07/13 17:29:12 UTC

Proposal on a future architecture of OpenWhisk

Hello OpenWhiskers,

I just published a proposal on a potential future architecture for OpenWhisk that aligns deployments with and without an underlying container orchestrator like Mesos or Kubernetes. It also incorporates some of the proposals that are already out there and tries to give a holistic view of where we want OpenWhisk to go in the near future. It's designed to keep the APIs stable but is very invasive in its changes under the hood.

This proposal is the outcome of a lot of discussions with fellow colleagues and community members. It is based on experience with the problems the current architecture has. Moreover it aims to remove friction with the deployment topologies on top of a container orchestrator.

Feedback is very very very welcome! The proposal has some gaps and generally does not go into much detail implementation-wise. I'd love to see all those gaps filled by the community!

Find the proposal here: https://cwiki.apache.org/confluence/display/OPENWHISK/OpenWhisk+future+architecture

Cheers,
Markus


Re: Proposal on a future architecture of OpenWhisk

Posted by TzuChiao Yeh <su...@gmail.com>.
Hi Markus,

Yes, I agree that storing activation records should be a separate
discussion. Piping activation records into a logging system (Elasticsearch,
Kibana) will be cool!

That wasn't quite what I was asking about now, though. Thanks for pointing
these out anyway; they look interesting.

I think there was some misunderstanding on my part. Originally, I considered some edge
cases where the invoker fails while responding back with the active-ack, and
there's currently no recovery/retry logic for that (hence the so-called best-effort).
Whether to support stronger execution guarantees may not need to be
discussed here now, but the mechanism will indeed be different depending on
whether we bypass Kafka or not.

Thanks for answering me anyway,
Tzuchiao


On Tue, Jul 17, 2018 at 4:49 PM Markus Thoemmes <ma...@de.ibm.com>
wrote:

> Hi Tzu-Chiao,
>
> great questions, although I'd relay those into a separate discussion. The
> design proposed does not intend to change the way we provide
> observability via persisting activation records. The controller takes
> that responsibility in the design.
>
> It is fair to open a discussion on what our future plans for the activation
> record itself are, though. There is a lot of work going on in
> that area currently, with Vadim implementing user-facing metrics (which can
> serve part of what activation records do) and James implementing
> different ActivationStores with the intention of eventually moving
> activation records to the logging system.
>
> Another angle here is that both of these solutions drop persistence of the
> activation result by default, since it is potentially a large blob.
> Persisting logs into CouchDB doesn't really scale either so there are a
> couple of LogStores to shift that burden away. What remains is largely a
> small, bounded record of some metrics per activation. I'll be happy to see
> a separate proposal + discussion on where we want to take this in the
> future :)
>
> Cheers,
> Markus
>
>

Re: Proposal on a future architecture of OpenWhisk

Posted by Rodric Rabbah <ro...@gmail.com>.
> On Jul 17, 2018, at 4:49 AM, Markus Thoemmes <ma...@de.ibm.com> wrote:
> 
> The design proposed does not intend to change the way we provide observability via persisting activation records.

It is worth considering how we can provide observability for activations in flight. As it stands today, as a user you only get to see that the action has finished (if we persist the record successfully). But given an activation id you cannot otherwise query the status: either the record exists, or it's not found.
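For illustration only, an in-flight status could be modelled along these lines (Scala sketch; the states and the lookup are assumptions, not part of the proposal):

    // Illustrative only: a status model that distinguishes in-flight activations
    // from finished ones, instead of today's "record exists / not found".
    object ActivationStatusSketch {
      sealed trait ActivationStatus
      case object Enqueued extends ActivationStatus            // accepted, waiting for capacity
      case object Running extends ActivationStatus             // dispatched to a container
      final case class Completed(success: Boolean) extends ActivationStatus
      case object NotFound extends ActivationStatus            // id unknown to the system

      // A controller could answer a hypothetical status query from its in-memory view.
      def status(id: String, inFlight: Map[String, ActivationStatus]): ActivationStatus =
        inFlight.getOrElse(id, NotFound)
    }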

-r

Re: Proposal on a future architecture of OpenWhisk

Posted by Markus Thoemmes <ma...@de.ibm.com>.
Hi Tzu-Chiao,

great questions, although I'd relay those into a separate discussion. The design proposed does not intend to change the way we provide observability via persisting activation records. The controller takes that responsibility in the design.

It is fair to open a discussion on what our future plans for the activation record itself are, though. There is a lot of work going on in that area currently, with Vadim implementing user-facing metrics (which can serve part of what activation records do) and James implementing different ActivationStores with the intention of eventually moving activation records to the logging system.

Another angle here is that both of these solutions drop persistence of the activation result by default, since it is potentially a large blob. Persisting logs into CouchDB doesn't really scale either so there are a couple of LogStores to shift that burden away. What remains is largely a small, bounded record of some metrics per activation. I'll be happy to see a separate proposal + discussion on where we want to take this in the future :)
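For illustration, such a small, bounded record could be as little as this (Scala sketch; the field names are assumptions, not the actual ActivationStore schema):

    // Sketch of a bounded per-activation metrics record, with result and logs offloaded.
    final case class ActivationMetricsRecord(
      activationId: String,
      namespace: String,
      action: String,
      startEpochMs: Long,   // when the activation started
      waitTimeMs: Long,     // time spent waiting for a container
      initTimeMs: Long,     // cold-start initialization time, 0 if warm
      durationMs: Long,     // user-code execution time
      statusCode: Int,      // 0 = success, non-zero = developer/system error
      memoryLimitMb: Int
    )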

Cheers,
Markus


Re: Proposal on a future architecture of OpenWhisk

Posted by TzuChiao Yeh <su...@gmail.com>.
Hi Markus,

Awesome work! Thanks for doing this.

One simple question here: since actions are now invoked directly via HTTP
calls, do we still persist activations (i.e. duplicate activations into some
storage)? Since we already provide "best-effort" invocation for users, I'm not
sure persistence is still worth doing. Or maybe we can provide some
guarantee options in the future?

Thanks,
Tzu-Chiao Yeh (@tz70s)


On Tue, Jul 17, 2018 at 12:42 AM Markus Thoemmes <ma...@de.ibm.com>
wrote:

> Hi Chetan,
>
> > Hi Thomas,
>
> It's Markus Thömmes/Thoemmes respectively :)
>
> > Is this round-robin routing per namespace + action name URL, or is
> > it for any URL? For e.g. if we have controllers c1-c3 and requests come
> > in the order a1, a2, a3, a1, which controller would be handling which action
> > here?
>
> It's for any URL. I'm not sure the general front-door (nginx in our case)
> supports keyed round-robin/least-connected. For sanity, I basically assume
> that every request can land on any controller with no control of how that
> might happen.
>
> Cheers,
> Markus
>
>

Re: Proposal on a future architecture of OpenWhisk

Posted by Markus Thoemmes <ma...@de.ibm.com>.
Hi Chetan,

> Hi Thomas,

It's Markus Thömmes/Thoemmes respectively :)

> Is this round-robin routing per namespace + action name URL, or is
> it for any URL? For e.g. if we have controllers c1-c3 and requests come
> in the order a1, a2, a3, a1, which controller would be handling which action
> here?

It's for any URL. I'm not sure the general front-door (nginx in our case) supports keyed round-robin/least-connected. For sanity, I basically assume that every request can land on any controller with no control of how that might happen.

Cheers,
Markus


Re: Proposal on a future architecture of OpenWhisk

Posted by Chetan Mehrotra <ch...@gmail.com>.
Hi Thomas,

Proposal looks good and consolidates various ideas discussed so far.
I will have a closer look. A quick query for now:

> Since the front-door schedules round-robin or least-connected

Is this round-robin routing per namespace + action name URL, or is
it for any URL? For e.g. if we have controllers c1-c3 and requests come
in the order a1, a2, a3, a1, which controller would be handling which action
here?
Chetan Mehrotra


On Fri, Jul 13, 2018 at 10:59 PM, Markus Thoemmes
<ma...@de.ibm.com> wrote:
> Hello OpenWhiskers,
>
> I just published a proposal on a potential future architecture for OpenWhisk that aligns deployments with and without an underlying container orchestrator like Mesos or Kubernetes. It also incorporates some of the proposals that are already out there and tries to give a holistic view of where we want OpenWhisk to go in the near future. It's designed to keep the APIs stable but is very invasive in its changes under the hood.
>
> This proposal is the outcome of a lot of discussions with fellow colleagues and community members. It is based on experience with the problems the current architecture has. Moreover it aims to remove friction with the deployment topologies on top of a container orchestrator.
>
> Feedback is very very very welcome! The proposal has some gaps and generally does not go into much detail implementation-wise. I'd love to see all those gaps filled by the community!
>
> Find the proposal here: https://cwiki.apache.org/confluence/display/OPENWHISK/OpenWhisk+future+architecture
>
> Cheers,
> Markus
>

Re: Proposal on a future architecture of OpenWhisk

Posted by Markus Thoemmes <ma...@de.ibm.com>.
I agree 100%!

In the face of intra-container concurrency, we should go all in and find a solution that works even there.

Another slight wrinkle: Usually, the log-forwarder (whoever that may be) needs to know which container belongs to which user to namespace logs accordingly. We cannot really set that on the container itself, because there are pre-warm containers. The mapping of container -> user is immutable though, since we don't reuse containers for different users (today). It would then be plausible to inform the log forwarder about the ContainerID -> user mapping to make it do the right thing.

Note that this specific piece of information cannot really be part of the log's own context since the user must not be able to change it.
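A minimal Scala sketch of how such an immutable ContainerID -> user mapping could be published to the log forwarder (all names here are assumptions, not an existing OpenWhisk API):

    // The mapping is written once when a (pre-warm) container is first assigned to a
    // user and is never mutated, since containers are not reused across users today.
    object ContainerOwnership {
      final case class ContainerId(asString: String)
      final case class Namespace(name: String)

      private var owners = Map.empty[ContainerId, Namespace]

      def assign(container: ContainerId, owner: Namespace): Unit = synchronized {
        // Only the first assignment counts; the mapping is immutable afterwards.
        if (!owners.contains(container)) owners += container -> owner
      }

      // The log forwarder looks the owner up by the container id taken from the log
      // path/labels, not from the (user-controlled) log payload.
      def ownerOf(container: ContainerId): Option[Namespace] =
        synchronized(owners.get(container))
    }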

@Tyson it seems like you already put quite a bit of thought into this. Could you turn this into a proposal of its own to discuss separately?

Cheers,
Markus

-----Tyson Norris <tn...@adobe.com.INVALID> wrote: -----

>To: "dev@openwhisk.apache.org" <de...@openwhisk.apache.org>
>From: Tyson Norris <tn...@adobe.com.INVALID>
>Date: 07/20/2018 06:24PM
>Subject: Re: Proposal on a future architecture of OpenWhisk
>
>On Logging, I think if you are considering enabling concurrent
>activation processing, you will find that the only approach to
>associating parsed logs with a specific activationId is to
>force the log output to be structured, and always include the
>activationId with every log message. This requires a change at the
>action container layer, but the simpler thing to do is to encourage
>action containers to provide a structured logging context that action
>developers can (and must) use to generate logs. 
>
>An example is nodejs container - for the time being, we are hijacking
>the stdout/stderr and injecting the activationId when any developer
>code writes to stdout/stderr (as console.log/console.error). This may
>not work as simply in all action containers, and isn’t great even in
>nodejs. 
>
>I would rather encourage action containers to provide a logging
>context, where action devs use: log.info, log.debug, etc, and this
>logging context does the needful to assert some structure to the log
>format. In general, many (most?) languages have conventions (slf4xyz,
>et al) for this already, and while you lose “random writes to
>stdout”, I haven’t seen this be an actual problem. 
>
>If you don’t deal with interleaved logs (typically because
>activations don’t run concurrently), then this is less of an issue,
>but regardless, writing log parsers is a solved problem that would
>still be good to offload to external (not in OW controller/invoker)
>systems (logstash, fluentd, splunk, etc). This obviously comes with a
>caveat that logs parsing will be delayed, but that is OK from my
>point of view, partly because most logs will never be viewed, and
>partly because the log ingest systems are mostly fast enough already
>to limit this delay to seconds or milliseconds.  
>
>Thanks
>Tyson
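A minimal Scala sketch of the structured, per-activation logging context described above (the interface and JSON shape are illustrative, not an agreed-upon format):

    // Every line carries the activationId so an external parser (fluentd, logstash,
    // splunk, ...) can attribute interleaved lines from concurrent activations.
    final case class ActivationLogger(activationId: String) {
      private def emit(level: String, message: String): Unit = {
        val escaped = message.replace("\\", "\\\\").replace("\"", "\\\"")
        println(s"""{"activationId":"$activationId","level":"$level","message":"$escaped"}""")
      }
      def info(message: String): Unit = emit("info", message)
      def debug(message: String): Unit = emit("debug", message)
      def error(message: String): Unit = emit("error", message)
    }

    // Hypothetical usage inside an action runtime:
    //   val log = ActivationLogger(activationId)
    //   log.info("starting image resize")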


Re: Proposal on a future architecture of OpenWhisk

Posted by Markus Thömmes <me...@googlemail.com.INVALID>.
Hi David,

the system is indeed dynamic and doesn't care if your workload falls
into the "heavy-load" bucket or the "light-load" bucket. In fact, it's
a form of work-stealing where an entity with no resources passes the
request to an entity with resources until that entity eventually
becomes overloaded itself.

I'm not sure the added latency here is of any importance, at least in
the high-utilization case (which is your concern if I understand
correctly). As soon as the container of controller0 no longer
suffices to serve all incoming requests concurrently, it will request
more resources immediately and return a 503 to controller1. The time
to create this container is far higher than the time it takes for the
503 to get back to controller1 so virtually no latency is added to the
request at all.

It's also important that as soon as the number of containers is >= the
number of controllers in the system, no more requests will need
proxying because every controller has at least 1 container.
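A minimal Scala sketch of that handshake, seen from a controller that does not yet own a container for the action (all names are illustrative; the HTTP calls are stubbed):

    object WorkStealingSketch {
      final case class Request(action: String, body: String)

      sealed trait ProxyResult
      case object Executed extends ProxyResult          // the owning controller had capacity
      case object OwnerOverloaded extends ProxyResult   // owner answered 503: containers already requested

      // Stubs standing in for the real HTTP calls / scheduler.
      def executeLocally(r: Request): Unit = println(s"run ${r.action} on an owned container")
      def proxyToOwner(r: Request): ProxyResult = OwnerOverloaded
      def requestContainersFromContainerManager(action: String): Unit = println(s"ask CM for $action")
      def enqueueUntilContainerArrives(r: Request): Unit = println(s"queue ${r.action}")

      def handle(r: Request, ownedContainersForAction: Int): Unit =
        if (ownedContainersForAction > 0) executeLocally(r)
        else proxyToOwner(r) match {
          case Executed => () // done; only one proxy hop of extra latency
          case OwnerOverloaded =>
            // The owner already asked for more containers; ask for our own as well and
            // hold the request until the ContainerManager hands us one.
            requestContainersFromContainerManager(r.action)
            enqueueUntilContainerArrives(r)
        }
    }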

I should definitely work on writing this whole protocol down and add
pictures and flow-diagrams. That should clarify even further.

Cheers,
Markus
On Wed, 25 Jul 2018 at 19:12, David Breitgand <DA...@il.ibm.com> wrote:
>
> Hi Markus,
>
> Sure, that makes sense, but I think the question is how to optimize a
> tradeoff between the higher variance in the invocation latency (in the
> heavily utilized case) and waste of containers (in the underutilized case).
>
> If we expect that the workloads will be volatile, switching between light
> utilization and heavy utilization, then maybe a solution should also be
> dynamic.
>
> More specifically:
> 1) Continuously determine a utilization condition by sampling incoming
> traffic
> 2) If utilization is low --> use proxying
> 3) If utilization is high --> switch proxying off
>
> If it's expected that the load will be moderate to high, I would just not
> use proxying at all and risk a bit of waste to gain performance.
>
> Maybe this also can be aligned with SLAs (which currently do not exist),
> but one can think of them as a thing of the future (in line with
> Rodric's post
> https://medium.com/openwhisk/security-and-serverless-functions-b97618430db6
> ), so that latency/capacity can be traded off differently for different
> actions subject to SLA.
>
> Cheers.
>
> -- david
>
>
>
>
> From:   "Markus Thömmes" <ma...@apache.org>
> To:     dev@openwhisk.apache.org
> Date:   25/07/2018 06:20 PM
> Subject:        Re: Proposal on a future architecture of OpenWhisk
>
>
>
> Hi David,
>
> note that only the first few requests in a "frequent-load" pattern
> would be proxied.
>
> Say you have 2 controllers, 1 container. controller0 owns that
> container. Heavy load comes in and hits two controllers. controller1
> will proxy the first few requests to controller0 because it doesn't
> yet own a container. controller0 will execute as much as it can and
> immediately realize it needs more containers, so it asks for them. The
> requests coming from controller1 will get rejected with a 503, stating
> that controller0 is overloaded itself and controller1 should wait
> because containers have already been requested.
>
> The ContainerManager distributes containers evenly, so even though
> controller1 hasn't asked for more containers just yet, it will get
> some due to the redistribution by the CM. It also starts asking for
> containers itself after it got the 503 from controller0.
>
> Does that make sense? This isn't in the proposal yet but has been
> discussed on this mail-thread. I'll aim to incorporate some of the
> discussions into the proposal sooner or later.
>
> Cheers,
> Markus
> On Wed, 25 Jul 2018 at 16:54, David Breitgand
> <DA...@il.ibm.com> wrote:
> >
> > >> Hope that clarifies
> >
> > Yes, it does, thanks. But I still have a question :)
> >
> > >> if you have N controllers in the system and M
> > containers but N > M and all controllers manage their containers
> > exclusively, you'll end up with controllers not having a container to
> > manage at all.
> >
> > I am not sure how you arrive at this condition. It can only happen if the
> > system is very under-utilized. Suppose there is no proxying. As you
> > write, as action invocations keep coming, every Controller will get at
> > least one request. So, it will at least once ask ContainerManager to
> > allocate a container.
> >
> > I agree that when action invocation is infrequent, it's better to have 1
> > container rather than N for that action.
> >
> > However, if an action invocation is frequent, it's the other way around:
> > you'd prefer having N containers rather than queueing or first going to
> > ContainerManager, getting pointed to a Controller that happens to have
> > all containers for that action busy, which will result in that Controller
> > again going to the ContainerManager and asking for a new container.
> >
> > So, how do we differentiate between the two cases? For infrequently
> > executed actions it saves containers, but for frequently executed actions,
> > proxying will result in a performance hit, won't it?
> >
> > Thanks.
> >
> > -- david
> >
> >
> >
> >
> > From:   "Markus Thömmes" <me...@googlemail.com.INVALID>
> > To:     dev@openwhisk.apache.org
> > Date:   25/07/2018 04:05 PM
> > Subject:        Re: Proposal on a future architecture of OpenWhisk
> >
> >
> >
> > Hi David,
> >
> > the problem is that if you have N controllers in the system and M
> > containers but N > M and all controllers manage their containers
> > exclusively, you'll end up with controllers not having a container to
> > manage at all.
> > There's valid, very slow workload that needs to create only 1
> > container, for example a slow cron trigger. Due to the round-robin
> > nature of our front-door, eventually all controllers will get one of
> > those requests at some point. Since they are by design not aware of
> > the containers because they are managed by another controller, they'd
> > end up asking for a newly created container. Given N controllers we'd
> > always create at least N containers for any action eventually. That is
> > wasteful.
> >
> > Instead, requests are proxied to a controller which we know manages a
> > container for the given action (the ContainerManager knows that) and
> > thereby bypass the need to create too many containers. If the load is
> > too much to be handled by the M containers, the controllers managing
> > those M will request new containers, which will get distributed to all
> > controllers. Eventually, given enough load, all controllers will have
> > containers to manage for each action.
> >
> > The ContainerManager only needs to know which controller has which
> > container. It does not need to know in which state these containers
> > are. If they are busy, the controller itself will request more
> > resources accordingly.
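A minimal Scala sketch of the bookkeeping this implies on the ContainerManager side, tracking only ownership, never busy/idle state (the names and the "fewest containers first" heuristic are illustrative assumptions):

    object ContainerManagerViewSketch {
      final case class ControllerId(i: Int)
      final case class ContainerId(s: String)

      // Ownership map: the only state the ContainerManager needs for an action.
      type Distribution = Map[ControllerId, Set[ContainerId]]

      // Assign a freshly created container to the controller that currently owns the
      // fewest, which is what "distributes containers evenly" amounts to here.
      def assign(dist: Distribution, controllers: Seq[ControllerId], c: ContainerId): Distribution = {
        val target = controllers.minBy(ctrl => dist.getOrElse(ctrl, Set.empty[ContainerId]).size)
        dist.updated(target, dist.getOrElse(target, Set.empty[ContainerId]) + c)
      }

      // If a controller owns nothing for the action, the CM can tell it which
      // controller to proxy to, because it knows the ownership at distribution time.
      def proxyTarget(dist: Distribution, me: ControllerId): Option[ControllerId] =
        if (dist.getOrElse(me, Set.empty[ContainerId]).nonEmpty) None
        else dist.collectFirst { case (ctrl, cs) if cs.nonEmpty => ctrl }
    }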
> >
> > Hope that clarifies
> >
> > Cheers,
> > Markus
> >
> > 2018-07-25 14:19 GMT+02:00 David Breitgand <DA...@il.ibm.com>:
> > > Hi Markus,
> > >
> > > I'd like to better understand the edge case.
> > >
> > > Citing from the wiki.
> > >
> > >>> Edge case: If an action only has a very small amount of containers
> > > (less than there are Controllers in the system), we have a problem
> with
> > > the method described above.
> > >
> > > Isn't there always at least one controller in the system? I think the
> > > problem is not the number of Controllers, but rather availability of
> > > prewarm containers that these Controllers control. If all containers
> of
> > > this Controller are busy at the moment, and concurrency level per
> > > container is 1 and the invocation hit this controller, it cannot
> execute
> > > the action immediately with one of its containers. Is that the problem
> > > that is being solved?
> > >
> > >>> Since the front-door schedules round-robin or least-connected, it's
> > > impossible to decide to which Controller the request needs to go to
> hit
> > > that has a container available.
> > > In this case, the other Controllers (who didn't get a container) act
> as
> > a
> > > proxy and send the request to a Controller that actually has a
> container
> > > (maybe even via HTTP redirect). The ContainerManager decides which
> > > Controllers will act as a proxy in this case, since its the instance
> > that
> > > distributes the containers.
> > >>>
> > >
> > > When reading your proposal, I was under the impression that
> ContainerManager
> > > only knows about existence of containers allocated to the Controllers
> > > (because they asked), but ContainerManager does not know about the
> state
> > > of these containers at every given moment (i.e., whether they are
> being
> > > busy with running some action or not). I don't see Controllers
> updating
> > > ContainerManager about this in your diagrams.
> > >
> > > Thanks.
> > >
> > > -- david
> > >
> > >
> > >
> > >
> > > From:   "Markus Thoemmes" <ma...@de.ibm.com>
> > > To:     dev@openwhisk.apache.org
> > > Date:   23/07/2018 02:21 PM
> > > Subject:        Re: Proposal on a future architecture of OpenWhisk
> > >
> > >
> > >
> > > Hi Dominic,
> > >
> > > let's see if I can clarify the specific points one by one.
> > >
> > >>1. Docker daemon performance issue.
> > >>
> > >>...
> > >>
> > >>That's the reason why I initially thought that a Warmed state would
> > >>be kept
> > >>for more than today's behavior.
> > >>Today, containers would stay in the Warmed state only for 50ms, so it
> > >>introduces PAUSE/RESUME in case action comes with the interval of
> > >>more than
> > >>50 ms such as 1 sec.
> > >>This will lead to more loads on Docker daemon.
> > >
> > > You're right that the docker daemon's throughput is indeed an issue.
> > >
> > > Please note that PAUSE/RESUME are not executed via the docker daemon in a
> > > performance-tuned environment but rather done via runc, which does not have such a
> > > throughput
> > > issue because it's not a daemon at all. PAUSE/RESUME latencies are
> ~10ms
> > > for each
> > > operation.
> > >
> > > Further, the duration of the pauseGrace is not related to the overall
> > > architecture at
> > > all. Rather, it's so narrow to safeguard against users stealing cycles
> > > from the vendor's infrastructure. It's also a configurable value, so you
> > > can tweak it as you want.
> > >
> > > The proposed architecture itself has no impact on the pauseGrace.
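For illustration, the pause-grace behaviour discussed here boils down to something like the following sketch (Scala; the 50ms value is the default mentioned in this thread, everything else is illustrative):

    object PauseGraceSketch {
      import scala.concurrent.duration._

      val pauseGrace: FiniteDuration = 50.milliseconds // configurable by the operator

      sealed trait ContainerState
      case object Warm extends ContainerState    // running, ready to accept work
      case object Paused extends ContainerState  // frozen via `runc pause`, cheap to resume (~10ms)

      // After an activation finishes, the container stays warm for the grace period and
      // is only paused (via runc, not the docker daemon) if nothing new arrives.
      def afterIdle(idleFor: FiniteDuration, state: ContainerState): ContainerState =
        state match {
          case Warm if idleFor >= pauseGrace => Paused
          case other                         => other
        }
    }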
> > >
> > >>
> > >>And if the state of containers is changing like today, the state in
> > >>ContainerManager would be frequently changing as well.
> > >>This may induce a synchronization issue among controllers and, among
> > >>ContainerManagers(in case there would be more than one
> > >>ContainerManager).
> > >
> > > The ContainerManager will NOT be informed about pause/unpause state
> > > changes and it
> > > doesn't need to. I agree that such a behavior would generate serious
> > load
> > > on the
> > > ContainerManager, but I think it's unnecessary.
> > >
> > >>2. Proxy case.
> > >>
> > >>...
> > >>
> > >>If it goes this way, ContainerManager should know all the status of
> > >>containers in all controllers to make a right decision and it's not
> > >>easy to
> > >>synchronize all the status of containers in controllers.
> > >>If it does not work like this, how can controller2 proxy requests to
> > >>controller1 without any information about controller1's status?
> > >
> > >
> > > The ContainerManager distributes a list of containers across all
> > > controllers.
> > > If it does not have enough containers at hand to give one to each
> > > controller,
> > > it instead tells controller2 to proxy to controller1, because the
> > > ContainerManager
> > > knows at distribution-time, that controller1 has such a container.
> > >
> > > No synchronization needed between controllers at all.
> > >
> > > If controller1 gets more requests than the single container can
> handle,
> > it
> > > will
> > > request more containers, so eventually controller2 will get its own.
> > >
> > > Please refer to
> > >
> >
> https://lists.apache.org/thread.html/84a7b8171b90719c2f7aab86bea48a7e7865874c4e54f082b0861380@%3Cdev.openwhisk.apache.org%3E
>
> >
> > >
> > > for more information on that protocol.
> > >
> > >
> > >>3. Intervention among multiple actions
> > >>
> > >>If the concurrency limit is 1, and the container lifecycle is managed
> > >>like
> > >>today, intervention among multiple actions can happen again.
> > >>For example, the maximum number of containers which can be created by
> > >>a
> > >>user is 2, and ActionA and ActionB invocation requests come
> > >>alternatively,
> > >>controllers will try to remove and recreate containers again and
> > >>again.
> > >>I used an example with a small number of max container limit for
> > >>simplicity, but it can happen with a higher limit as well.
> > >>
> > >>And though concurrency limit is more than 1 such as 3, it also can
> > >>happen
> > >>if actions come more quickly than the execution time of actions.
> > >
> > > The controller will never try to delete a container at all, nor does
> > > its pool of managed containers have a limit.
> > > If it doesn't have a container for ActionA it will request one from
> the
> > > ContainerManager.
> > > If it doesn't have one for ActionB it will request one from the
> > > ContainerManager.
> > >
> > > There will be 2 containers in the system and assuming that the
> > > ContainerManager has enough
> > > resources to keep those 2 containers alive, it will not delete them.
> > >
> > > The controllers by design cannot cause the behavior you're describing.
> > > The architecture is actually built around fixing this exact issue
> > > (eviction due to multiple heavy users in the system).
> > >
> > >>4. Is concurrency per container controlled by users in a per-action
> > >>based
> > >>way?
> > >>Let me clarify my question about concurrency limit.
> > >>
> > >>If concurrency per container limit is more than 1, there could be
> > >>multiple
> > >>actions being invoked at some point.
> > >>If the action requires high memory footprint such as 200MB or 150MB,
> > >>it can
> > >>crash if the sum of memory usage of concurrent actions exceeds the
> > >>container memory.
> > >>(In our case(here), some users are executing headless-chrome and
> > >>puppeteer
> > >>within actions, so it could happen under the similar situation.)
> > >>
> > >>So I initially thought concurrency per container is controlled by
> > >>users in
> > >>a per-action based way.
> > >>If concurrency per container is only configured by OW operators
> > >>statically,
> > >>some users may not be able to invoke their actions correctly in the
> > >>worst
> > >>case though operators increased the memory of the biggest container
> > >>type.
> > >>
> > >>And not only for this case, there could be some more reasons that
> > >>some
> > >>users just want to invoke their actions without per-container
> > >>concurrency
> > >>but the others want it for better throughput.
> > >>
> > >>So we may need some logic for users to take care of per-container
> > >>concurrency for each actions.
> > >
> > > Yes, the intention is to provide exactly what you're describing, maybe
> I
> > > worded it weirdly
> > > in my last response.
> > >
> > > This is not relevant for the architecture though.
> > >
> > >
> > >>5. Better to wait for the completion rather than creating a new
> > >>container.
> > >>According to the workload, it would be better to wait for the
> > >>previous
> > >>execution rather than creating a new container because it takes up to
> > >>500ms
> > >>~ 1s.
> > >>Even though the concurrency limit is more than 1, it still can happen
> > >>if
> > >>there is no logic to cumulate invocations and decide whether to
> > >>create a
> > >>new container or waiting for the existing container.
> > >
> > > The proposed asynchronous protocol between controller and
> > ContainerManager
> > > accomplishes this by design:
> > >
> > > If a controller does not have the resources to execute the current
> > > request, it requests those resources.
> > > The ContainerManager updates resources asynchronously.
> > > The Controller will schedule the outstanding request as soon as it
> gets
> > > resources for it. It does not care
> > > if those resources are  becoming free because another request finished
> > or
> > > because it got a fresh container
> > > from the ContainerManager. Requests will always be dispatched as soon
> as
> > > resources are free.
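A minimal Scala sketch of that controller-side behaviour (illustrative only; thread-safety and error handling omitted for brevity):

    object AsyncSchedulingSketch {
      import scala.collection.mutable

      final case class Request(action: String)

      private val pending = mutable.Queue.empty[Request]
      private var freeContainers = 0

      def submit(r: Request): Unit = { pending.enqueue(r); dispatch() }

      // Called when an activation finishes on one of our containers.
      def activationFinished(): Unit = { freeContainers += 1; dispatch() }

      // Called when the ContainerManager delivers a container we asked for earlier.
      def containerArrived(): Unit = { freeContainers += 1; dispatch() }

      // Outstanding requests are dispatched whenever capacity frees up, regardless of
      // whether it came from a finished activation or a freshly delivered container.
      private def dispatch(): Unit =
        while (freeContainers > 0 && pending.nonEmpty) {
          freeContainers -= 1
          val r = pending.dequeue()
          println(s"dispatching ${r.action}") // stand-in for the real HTTP call to the container
        }
    }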
> > >
> > >>6. HA of ContainerManager.
> > >>Since it is mandatory to deploy the system without any downtime to
> > >>use it
> > >>for production, we need to support HA of ContainerManager.
> > >>It means the state of ContainerManager should be replicated among
> > >>replicas.
> > >>(No matter which method we use between master/slave or clustering.)
> > >>
> > >>If ContainerManager knows about the status of each container, it
> > >>would not
> > >>be easy to support HA with its eventual consistent nature.
> > >>If it does only know which containers are assigned to which
> > >>controllers, it
> > >>cannot handle the edge case as I mentioned above.
> > >
> > > I agree, HA is mandatory. Since the ContainerManager operates only on
> > the
> > > container creation/deletion path,
> > > we can probably afford to persist its state into something like Redis.
> > If
> > > it crashes, the slave instance
> > > can take over immediately without any eventual-consistency concerns or
> > > downtime.
> > >
> > > Also note that a downtime in the ContainerManager will ONLY cause an
> > > impact on the ability to create containers.
> > > Workloads that already have containers created will continue to work
> > just
> > > fine.
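A minimal Scala sketch of persisting that ownership state so a standby instance can take over (the KeyValueStore trait is a stand-in for Redis or a similar store, not a real client API):

    object ContainerManagerHaSketch {
      trait KeyValueStore {
        def put(key: String, value: String): Unit
        def getAll(prefix: String): Map[String, String]
      }

      final case class Assignment(controller: String, containers: Set[String])

      // Persist on every create/delete; this sits on the (slow) container creation
      // path only, so the extra write is affordable.
      def persist(store: KeyValueStore, a: Assignment): Unit =
        store.put(s"cm/assignment/${a.controller}", a.containers.mkString(","))

      // A standby ContainerManager rebuilds its view from the store and continues,
      // with no eventual-consistency concerns between CM replicas.
      def recover(store: KeyValueStore): Map[String, Set[String]] =
        store.getAll("cm/assignment/").map { case (k, v) =>
          k.stripPrefix("cm/assignment/") -> v.split(",").filter(_.nonEmpty).toSet
        }
    }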
> > >
> > >
> > > Does that answer/mitigate your concerns?
> > >
> > > Cheers,
> > > Markus
> > >
> > >
> > >>To: dev@openwhisk.apache.org
> > >>From: Dominic Kim <st...@gmail.com>
> > >>Date: 07/23/2018 12:48PM
> > >>Subject: Re: Proposal on a future architecture of OpenWhisk
> > >>
> > >>Dear Markus.
> > >>
> > >>I may not correctly understand the direction of new architecture.
> > >>So let me describe my concerns in more details.
> > >>
> > >>Since that is a future architecture of OpenWhisk and requires many
> > >>breaking
> > >>changes, I think it should at least address all known issues.
> > >>So I focused on figuring out whether it handles all issues which are
> > >>reported in my proposal.
> > >>(
> > >>https://cwiki.apache.org/confluence/display/OPENWHISK/Autonomous+Container+Scheduling
> > >>)
> > >>
> > >>1. Docker daemon performance issue.
> > >>
> > >>The most critical issue is poor performance of docker daemon.
> > >>Since it is not inherently designed for high throughput or concurrent
> > >>processing, Docker daemon shows poor performance in comparison with
> > >>OW.
> > >>In OW(serverless) world, action execution can be finished within 5ms
> > >>~
> > >>10ms, but the Docker daemon shows 100 ~ 500ms latency.
> > >>Still, we can take advantage of Prewarm and Warmed containers, but
> > >>under
> > >>the situation where container creation/deletion/pausing/resuming
> > >>happen
> > >>frequently and the situation lasted for long-term, the requests are
> > >>delayed
> > >>and even the Docker daemon crashed.
> > >>So I think it is important to reduce the loads(requests) against the
> > >>Docker
> > >>daemon.
> > >>
> > >>That's the reason why I initially thought that a Warmed state would
> > >>be kept
> > >>for more than today's behavior.
> > >>Today, containers would stay in the Warmed state only for 50ms, so it
> > >>introduces PAUSE/RESUME in case action comes with the interval of
> > >>more than
> > >>50 ms such as 1 sec.
> > >>This will lead to more loads on Docker daemon.
> > >>
> > >>And if the state of containers is changing like today, the state in
> > >>ContainerManager would be frequently changing as well.
> > >>This may induce a synchronization issue among controllers and, among
> > >>ContainerManagers(in case there would be more than one
> > >>ContainerManager).
> > >>
> > >>So I think containers should be running for more than today's
> > >>pauseGrace
> > >>time.
> > >>With more than 1 concurrency limit per container, it would also be
> > >>better
> > >>to keep containers running(not paused) for more than 50ms.
> > >>
> > >>2. Proxy case.
> > >>
> > >>In the edge case where a container only exists in controller1, how
> > >>can
> > >>controller2 decide to proxy the request to controller1 rather than
> > >>just
> > >>creating its own container?
> > >>If it asks to ContainerManager, ContainerManager should know the
> > >>state of
> > >>the container in controller1.
> > >>If the container in controller1 is already busy, it would be better
> > >>to
> > >>create a new container in controller2 rather than proxying the
> > >>requests to
> > >>controller1.
> > >>
> > >>If it goes this way, ContainerManager should know all the status of
> > >>containers in all controllers to make a right decision and it's not
> > >>easy to
> > >>synchronize all the status of containers in controllers.
> > >>If it does not work like this, how can controller2 proxy requests to
> > >>controller1 without any information about controller1's status?
> > >>
> > >>3. Intervention among multiple actions
> > >>
> > >>If the concurrency limit is 1, and the container lifecycle is managed
> > >>like
> > >>today, intervention among multiple actions can happen again.
> > >>For example, the maximum number of containers which can be created by
> > >>a
> > >>user is 2, and ActionA and ActionB invocation requests come
> > >>alternatively,
> > >>controllers will try to remove and recreate containers again and
> > >>again.
> > >>I used an example with a small number of max container limit for
> > >>simplicity, but it can happen with a higher limit as well.
> > >>
> > >>And though concurrency limit is more than 1 such as 3, it also can
> > >>happen
> > >>if actions come more quickly than the execution time of actions.
> > >>
> > >>4. Is concurrency per container controlled by users in a per-action
> > >>based
> > >>way?
> > >>Let me clarify my question about concurrency limit.
> > >>
> > >>If concurrency per container limit is more than 1, there could be
> > >>multiple
> > >>actions being invoked at some point.
> > >>If the action requires high memory footprint such as 200MB or 150MB,
> > >>it can
> > >>crash if the sum of memory usage of concurrent actions exceeds the
> > >>container memory.
> > >>(In our case(here), some users are executing headless-chrome and
> > >>puppeteer
> > >>within actions, so it could happen under the similar situation.)
> > >>
> > >>So I initially thought concurrency per container is controlled by
> > >>users in
> > >>a per-action based way.
> > >>If concurrency per container is only configured by OW operators
> > >>statically,
> > >>some users may not be able to invoke their actions correctly in the
> > >>worst
> > >>case though operators increased the memory of the biggest container
> > >>type.
> > >>
> > >>And not only for this case, there could be some more reasons that
> > >>some
> > >>users just want to invoke their actions without per-container
> > >>concurrency
> > >>but the others want it for better throughput.
> > >>
> > >>So we may need some logic for users to take care of per-container
> > >>concurrency for each actions.
> > >>
> > >>5. Better to wait for the completion rather than creating a new
> > >>container.
> > >>According to the workload, it would be better to wait for the
> > >>previous
> > >>execution rather than creating a new container because it takes up to
> > >>500ms
> > >>~ 1s.
> > >>Even though the concurrency limit is more than 1, it still can happen
> > >>if
> > >>there is no logic to cumulate invocations and decide whether to
> > >>create a
> > >>new container or waiting for the existing container.
> > >>
> > >>
> > >>6. HA of ContainerManager.
> > >>Since it is mandatory to deploy the system without any downtime to
> > >>use it
> > >>for production, we need to support HA of ContainerManager.
> > >>It means the state of ContainerManager should be replicated among
> > >>replicas.
> > >>(No matter which method we use between master/slave or clustering.)
> > >>
> > >>If ContainerManager knows about the status of each container, it
> > >>would not
> > >>be easy to support HA with its eventual consistent nature.
> > >>If it does only know which containers are assigned to which
> > >>controllers, it
> > >>cannot handle the edge case as I mentioned above.
> > >>
> > >>
> > >>
> > >>Since many parts of the architecture are not addressed yet, I think
> > >>it
> > >>would be better to separate each part and discuss them in more depth.
> > >>But in the big picture, I think we need to figure out whether it can
> > >>handle
> > >>or at least alleviate all known issues or not first.
> > >>
> > >>
> > >>Best regards,
> > >>Dominic
> > >>
> > >>
> > >>2018-07-21 1:36 GMT+09:00 David P Grove <gr...@us.ibm.com>:
> > >>
> > >>>
> > >>>
> > >>> Tyson Norris <tn...@adobe.com.INVALID> wrote on 07/20/2018
> > >>12:24:07 PM:
> > >>> >
> > >>> > On Logging, I think if you are considering enabling concurrent
> > >>> > activation processing, you will find that the only approach to
> > >>> > associating parsed logs with a specific activationId is to
> > >>> > force the log output to be structured, and always include the
> > >>> > activationId with every log message. This requires a change at
> > >>the
> > >>> > action container layer, but the simpler thing to do is to
> > >>encourage
> > >>> > action containers to provide a structured logging context that
> > >>> > action developers can (and must) use to generate logs.
> > >>>
> > >>> Good point.  I agree that if there is concurrent activation
> > >>processing in
> > >>> the container, structured logging is the only sensible thing to do.
> > >>>
> > >>>
> > >>> --dave
> > >>>
> > >>
> > >
> > >
> > >
> > >
> > >
> > >
> >
> >
> >
> >
> >
>
>
>
>
>

Re: Proposal on a future architecture of OpenWhisk

Posted by David Breitgand <DA...@il.ibm.com>.
Hi Markus, 

Sure, that makes sense, but I think the question is how to optimize a 
tradeoff between the higher variance in the invocation latency (in the 
heavily utilized case) and waste of containers (in the underutilized case).

If we expect that the workloads will be volatile, switching between light 
utilization and heavy utilization, then maybe a solution should also be 
dynamic. 

More specifically:
1) Continuously determine a utilization condition by sampling incoming 
traffic 
2) If utilization is low --> use proxying
3) If utilization is high --> switch proxying off 

If it's expected that the load will be moderate to high, I would just not 
use proxying at all and risk a bit of waste to gain performance.
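A minimal Scala sketch of the 1)-3) policy above (the threshold and the sampling window are illustrative knobs, not proposed defaults):

    object DynamicProxyPolicySketch {
      final case class Sample(requestsPerSecond: Double, busyContainers: Int, totalContainers: Int)

      val utilizationThreshold = 0.7 // above this, skip proxying and create containers directly

      def utilization(s: Sample): Double =
        if (s.totalContainers == 0) 1.0 else s.busyContainers.toDouble / s.totalContainers

      // Re-evaluated continuously from sampled incoming traffic.
      def proxyingEnabled(recent: Seq[Sample]): Boolean =
        if (recent.isEmpty) true
        else {
          val avg = recent.map(utilization).sum / recent.size
          avg < utilizationThreshold
        }
    }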

Maybe this also can be aligned with SLAs (which currently do not exist), 
but one can think of them as a thing of the future (in line with
Rodric's post
https://medium.com/openwhisk/security-and-serverless-functions-b97618430db6
), so that latency/capacity can be traded off differently for different 
actions subject to SLA.

Cheers.

-- david 




From:   "Markus Thömmes" <ma...@apache.org>
To:     dev@openwhisk.apache.org
Date:   25/07/2018 06:20 PM
Subject:        Re: Proposal on a future architecture of OpenWhisk



Hi David,

note that only the first few requests in a "frequent-load" pattern
would be proxied.

Say you have 2 controllers, 1 container. controller0 owns that
container. Heavy load comes in and hits two controllers. controller1
will proxy the first few requests to controller0 because it doesn't
yet own a container. controller0 will execute as much as it can and
immediately realize it needs more containers, so it asks for them. The
requests coming from controller1 will get rejected with a 503, stating
that controller0 is overloaded itself and controller1 should wait
because containers have already been requested.

The ContainerManager distributes containers evenly, so even though
controller1 hasn't asked for more containers just yet, it will get
some due to the redistribution by the CM. It also starts asking for
containers itself after it got the 503 from controller0.

Does that make sense? This isn't in the proposal yet but has been
discussed on this mail-thread. I'll aim to incorporate some of the
discussions into the proposal sooner or later.

Cheers,
Markus
On Wed, 25 Jul 2018 at 16:54, David Breitgand
<DA...@il.ibm.com> wrote:
>
> >> Hope that clarifies
>
> Yes, it does, thanks. But I still have a question :)
>
> >> if you have N controllers in the system and M
> containers but N > M and all controllers manage their containers
> exclusively, you'll end up with controllers not having a container to
> manage at all.
>
> I am not sure how you arrive at this condition. It can only happen if the
> system is very under-utilized. Suppose there is no proxying. As you
> write, as action invocations keep coming, every Controller will get at
> least one request. So, it will at least once ask ContainerManager to
> allocate a container.
>
> I agree that when action invocation is infrequent, it's better to have 1
> container rather than N for that action.
>
> However, if an action invocation is frequent, it's the other way around:
> you'd prefer having N containers rather than queueing or first going to
> ContainerManager, getting pointed to a Controller that happens to have
> all containers for that action busy, which will result in that Controller
> again going to the ContainerManager and asking for a new container.
>
> So, how do we differentiate between the two cases? For infrequently
> executed actions it saves containers, but for frequently executed actions,
> proxying will result in a performance hit, won't it?
>
> Thanks.
>
> -- david
>
>
>
>
> From:   "Markus Thömmes" <me...@googlemail.com.INVALID>
> To:     dev@openwhisk.apache.org
> Date:   25/07/2018 04:05 PM
> Subject:        Re: Proposal on a future architecture of OpenWhisk
>
>
>
> Hi David,
>
> the problem is that if you have N controllers in the system and M
> containers but N > M and all controllers manage their containers
> exclusively, you'll end up with controllers not having a container to
> manage at all.
> There's valid, very slow workload that needs to create only 1
> container, for example a slow cron trigger. Due to the round-robin
> nature of our front-door, eventually all controllers will get one of
> those requests at some point. Since they are by design not aware of
> the containers because they are managed by another controller, they'd
> end up asking for a newly created container. Given N controllers we'd
> always create at least N containers for any action eventually. That is
> wasteful.
>
> Instead, requests are proxied to a controller which we know manages a
> container for the given action (the ContainerManager knows that) and
> thereby bypass the need to create too many containers. If the load is
> too much to be handled by the M containers, the controllers managing
> those M will request new containers, which will get distributed to all
> controllers. Eventually, given enough load, all controllers will have
> containers to manage for each action.
>
> The ContainerManager only needs to know which controller has which
> container. It does not need to know in which state these containers
> are. If they are busy, the controller itself will request more
> resources accordingly.
>
> Hope that clarifies
>
> Cheers,
> Markus
>
> 2018-07-25 14:19 GMT+02:00 David Breitgand <DA...@il.ibm.com>:
> > Hi Markus,
> >
> > I'd like to better understand the edge case.
> >
> > Citing from the wiki.
> >
> >>> Edge case: If an action only has a very small amount of containers
> > (less than there are Controllers in the system), we have a problem 
with
> > the method described above.
> >
> > Isn't there always at least one controller in the system? I think the
> > problem is not the number of Controllers, but rather availability of
> > prewarm containers that these Controllers control. If all containers 
of
> > this Controller are busy at the moment, and concurrency level per
> > container is 1 and the invocation hit this controller, it cannot 
execute
> > the action immediately with one of its containers. Is that the problem
> > that is being solved?
> >
> >>> Since the front-door schedules round-robin or least-connected, it's
> > impossible to decide to which Controller the request needs to go to 
hit
> > that has a container available.
> > In this case, the other Controllers (who didn't get a container) act 
as
> a
> > proxy and send the request to a Controller that actually has a 
container
> > (maybe even via HTTP redirect). The ContainerManager decides which
> > Controllers will act as a proxy in this case, since its the instance
> that
> > distributes the containers.
> >>>
> >
> > When reading your proposal, I was under the impression that
ContainerManager
> > only knows about existence of containers allocated to the Controllers
> > (because they asked), but ContainerManager does not know about the 
state
> > of these containers at every given moment (i.e., whether they are 
being
> > busy with running some action or not). I don't see Controllers 
updating
> > ContainerManager about this in your diagrams.
> >
> > Thanks.
> >
> > -- david
> >
> >
> >
> >
> > From:   "Markus Thoemmes" <ma...@de.ibm.com>
> > To:     dev@openwhisk.apache.org
> > Date:   23/07/2018 02:21 PM
> > Subject:        Re: Proposal on a future architecture of OpenWhisk
> >
> >
> >
> > Hi Dominic,
> >
> > let's see if I can clarify the specific points one by one.
> >
> >>1. Docker daemon performance issue.
> >>
> >>...
> >>
> >>That's the reason why I initially thought that a Warmed state would
> >>be kept
> >>for more than today's behavior.
> >>Today, containers would stay in the Warmed state only for 50ms, so it
> >>introduces PAUSE/RESUME in case action comes with the interval of
> >>more than
> >>50 ms such as 1 sec.
> >>This will lead to more loads on Docker daemon.
> >
> > You're right that the docker daemon's throughput is indeed an issue.
> >
> > Please note that PAUSE/RESUME are not executed via the docker daemon in a
> > performance-tuned environment but rather done via runc, which does not have such a
> > throughput
> > issue because it's not a daemon at all. PAUSE/RESUME latencies are 
~10ms
> > for each
> > operation.
> >
> > Further, the duration of the pauseGrace is not related to the overall
> > architecture at
> > all. Rather, it's so narrow to safeguard against users stealing cycles
> > from the vendor's infrastructure. It's also a configurable value, so you
> > can tweak it as you want.
> >
> > The proposed architecture itself has no impact on the pauseGrace.
> >
> >>
> >>And if the state of containers is changing like today, the state in
> >>ContainerManager would be frequently changing as well.
> >>This may induce a synchronization issue among controllers and, among
> >>ContainerManagers(in case there would be more than one
> >>ContainerManager).
> >
> > The ContainerManager will NOT be informed about pause/unpause state
> > changes and it
> > doesn't need to. I agree that such a behavior would generate serious
> load
> > on the
> > ContainerManager, but I think it's unnecessary.
> >
> >>2. Proxy case.
> >>
> >>...
> >>
> >>If it goes this way, ContainerManager should know all the status of
> >>containers in all controllers to make a right decision and it's not
> >>easy to
> >>synchronize all the status of containers in controllers.
> >>If it does not work like this, how can controller2 proxy requests to
> >>controller1 without any information about controller1's status?
> >
> >
> > The ContainerManager distributes a list of containers across all
> > controllers.
> > If it does not have enough containers at hand to give one to each
> > controller,
> > it instead tells controller2 to proxy to controller1, because the
> > ContainerManager
> > knows at distribution-time, that controller1 has such a container.
> >
> > No synchronization needed between controllers at all.
> >
> > If controller1 gets more requests than the single container can 
handle,
> it
> > will
> > request more containers, so eventually controller2 will get its own.
> >
> > Please refer to
> >
> 
https://lists.apache.org/thread.html/84a7b8171b90719c2f7aab86bea48a7e7865874c4e54f082b0861380@%3Cdev.openwhisk.apache.org%3E

>
> >
> > for more information on that protocol.
> >
> >
> >>3. Intervention among multiple actions
> >>
> >>If the concurrency limit is 1, and the container lifecycle is managed
> >>like
> >>today, intervention among multiple actions can happen again.
> >>For example, the maximum number of containers which can be created by
> >>a
> >>user is 2, and ActionA and ActionB invocation requests come
> >>alternatively,
> >>controllers will try to remove and recreate containers again and
> >>again.
> >>I used an example with a small number of max container limit for
> >>simplicity, but it can happen with a higher limit as well.
> >>
> >>And though concurrency limit is more than 1 such as 3, it also can
> >>happen
> >>if actions come more quickly than the execution time of actions.
> >
> > The controller will never try to delete a container at all, nor does
> > its pool of managed containers have a limit.
> > If it doesn't have a container for ActionA it will request one from 
the
> > ContainerManager.
> > If it doesn't have one for ActionB it will request one from the
> > ContainerManager.
> >
> > There will be 2 containers in the system and assuming that the
> > ContainerManager has enough
> > resources to keep those 2 containers alive, it will not delete them.
> >
> > The controllers by design cannot cause the behavior you're describing.
> > The architecture is actually built around fixing this exact issue
> > (eviction due to multiple heavy users in the system).
> >
> >>4. Is concurrency per container controlled by users in a per-action
> >>based
> >>way?
> >>Let me clarify my question about concurrency limit.
> >>
> >>If concurrency per container limit is more than 1, there could be
> >>multiple
> >>actions being invoked at some point.
> >>If the action requires high memory footprint such as 200MB or 150MB,
> >>it can
> >>crash if the sum of memory usage of concurrent actions exceeds the
> >>container memory.
> >>(In our case(here), some users are executing headless-chrome and
> >>puppeteer
> >>within actions, so it could happen under the similar situation.)
> >>
> >>So I initially thought concurrency per container is controlled by
> >>users in
> >>a per-action based way.
> >>If concurrency per container is only configured by OW operators
> >>statically,
> >>some users may not be able to invoke their actions correctly in the
> >>worst
> >>case though operators increased the memory of the biggest container
> >>type.
> >>
> >>And not only for this case, there could be some more reasons that
> >>some
> >>users just want to invoke their actions without per-container
> >>concurrency
> >>but the others want it for better throughput.
> >>
> >>So we may need some logic for users to take care of per-container
> >>concurrency for each actions.
> >
> > Yes, the intention is to provide exactly what you're describing, maybe 
I
> > worded it weirdly
> > in my last response.
> >
> > This is not relevant for the architecture though.
> >
> >
> >>5. Better to wait for the completion rather than creating a new
> >>container.
> >>According to the workload, it would be better to wait for the
> >>previous
> >>execution rather than creating a new container because it takes up to
> >>500ms
> >>~ 1s.
> >>Even though the concurrency limit is more than 1, it still can happen
> >>if
> >>there is no logic to cumulate invocations and decide whether to
> >>create a
> >>new container or waiting for the existing container.
> >
> > The proposed asynchronous protocol between controller and
> ContainerManager
> > accomplishes this by design:
> >
> > If a controller does not have the resources to execute the current
> > request, it requests those resources.
> > The ContainerManager updates resources asynchronously.
> > The Controller will schedule the outstanding request as soon as it 
gets
> > resources for it. It does not care
> > if those resources are  becoming free because another request finished
> or
> > because it got a fresh container
> > from the ContainerManager. Requests will always be dispatched as soon 
as
> > resources are free.
> >
> >>6. HA of ContainerManager.
> >>Since it is mandatory to deploy the system without any downtime to
> >>use it
> >>for production, we need to support HA of ContainerManager.
> >>It means the state of ContainerManager should be replicated among
> >>replicas.
> >>(No matter which method we use between master/slave or clustering.)
> >>
> >>If ContainerManager knows about the status of each container, it
> >>would not
> >>be easy to support HA with its eventual consistent nature.
> >>If it does only know which containers are assigned to which
> >>controllers, it
> >>cannot handle the edge case as I mentioned above.
> >
> > I agree, HA is mandatory. Since the ContainerManager operates only on
> the
> > container creation/deletion path,
> > we can probably afford to persist its state into something like Redis.
> If
> > it crashes, the slave instance
> > can take over immediately without any eventual-consistency concerns or
> > downtime.
> >
> > Also note that a downtime in the ContainerManager will ONLY cause an
> > impact on the ability to create containers.
> > Workloads that already have containers created will continue to work
> just
> > fine.
> >
> >
> > Does that answer/mitigate your concerns?
> >
> > Cheers,
> > Markus
> >
> >
> >>To: dev@openwhisk.apache.org
> >>From: Dominic Kim <st...@gmail.com>
> >>Date: 07/23/2018 12:48PM
> >>Subject: Re: Proposal on a future architecture of OpenWhisk
> >>
> >>Dear Markus.
> >>
> >>I may not correctly understand the direction of new architecture.
> >>So let me describe my concerns in more details.
> >>
> >>Since that is a future architecture of OpenWhisk and requires many
> >>breaking
> >>changes, I think it should at least address all known issues.
> >>So I focused on figuring out whether it handles all issues which are
> >>reported in my proposal.
> >>(
> >>https://cwiki.apache.org/confluence/display/OPENWHISK/Autonomous+Container+Scheduling
> >>)
> >>
> >>1. Docker daemon performance issue.
> >>
> >>The most critical issue is poor performance of docker daemon.
> >>Since it is not inherently designed for high throughput or concurrent
> >>processing, Docker daemon shows poor performance in comparison with
> >>OW.
> >>In OW(serverless) world, action execution can be finished within 5ms
> >>~
> >>10ms, but the Docker daemon shows 100 ~ 500ms latency.
> >>Still, we can take advantage of Prewarm and Warmed containers, but
> >>under
> >>the situation where container creation/deletion/pausing/resuming
> >>happen
> >>frequently and the situation lasted for long-term, the requests are
> >>delayed
> >>and even the Docker daemon crashed.
> >>So I think it is important to reduce the loads(requests) against the
> >>Docker
> >>daemon.
> >>
> >>That's the reason why I initially thought that a Warmed state would
> >>be kept
> >>for more than today's behavior.
> >>Today, containers would stay in the Warmed state only for 50ms, so it
> >>introduces PAUSE/RESUME in case action comes with the interval of
> >>more than
> >>50 ms such as 1 sec.
> >>This will lead to more loads on Docker daemon.
> >>
> >>And if the state of containers is changing like today, the state in
> >>ContainerManager would be frequently changing as well.
> >>This may induce a synchronization issue among controllers and, among
> >>ContainerManagers(in case there would be more than one
> >>ContainerManager).
> >>
> >>So I think containers should be running for more than today's
> >>pauseGrace
> >>time.
> >>With more than 1 concurrency limit per container, it would also be
> >>better
> >>to keep containers running(not paused) for more than 50ms.
> >>
> >>2. Proxy case.
> >>
> >>In the edge case where a container only exists in controller1, how
> >>can
> >>controller2 decide to proxy the request to controller1 rather than
> >>just
> >>creating its own container?
> >>If it asks to ContainerManager, ContainerManager should know the
> >>state of
> >>the container in controller1.
> >>If the container in controller1 is already busy, it would be better
> >>to
> >>create a new container in controller2 rather than proxying the
> >>requests to
> >>controller1.
> >>
> >>If it goes this way, ContainerManager should know all the status of
> >>containers in all controllers to make a right decision and it's not
> >>easy to
> >>synchronize all the status of containers in controllers.
> >>If it does not work like this, how can controller2 proxy requests to
> >>controller1 without any information about controller1's status?
> >>
> >>3. Intervention among multiple actions
> >>
> >>If the concurrency limit is 1, and the container lifecycle is managed
> >>like
> >>today, intervention among multiple actions can happen again.
> >>For example, the maximum number of containers which can be created by
> >>a
> >>user is 2, and ActionA and ActionB invocation requests come
> >>alternatively,
> >>controllers will try to remove and recreate containers again and
> >>again.
> >>I used an example with a small number of max container limit for
> >>simplicity, but it can happen with a higher limit as well.
> >>
> >>And though concurrency limit is more than 1 such as 3, it also can
> >>happen
> >>if actions come more quickly than the execution time of actions.
> >>
> >>4. Is concurrency per container controlled by users in a per-action
> >>based
> >>way?
> >>Let me clarify my question about concurrency limit.
> >>
> >>If concurrency per container limit is more than 1, there could be
> >>multiple
> >>actions being invoked at some point.
> >>If the action requires high memory footprint such as 200MB or 150MB,
> >>it can
> >>crash if the sum of memory usage of concurrent actions exceeds the
> >>container memory.
> >>(In our case(here), some users are executing headless-chrome and
> >>puppeteer
> >>within actions, so it could happen under the similar situation.)
> >>
> >>So I initially thought concurrency per container is controlled by
> >>users in
> >>a per-action based way.
> >>If concurrency per container is only configured by OW operators
> >>statically,
> >>some users may not be able to invoke their actions correctly in the
> >>worst
> >>case though operators increased the memory of the biggest container
> >>type.
> >>
> >>And not only for this case, there could be some more reasons that
> >>some
> >>users just want to invoke their actions without per-container
> >>concurrency
> >>but the others want it for better throughput.
> >>
> >>So we may need some logic for users to take care of per-container
> >>concurrency for each actions.
> >>
> >>5. Better to wait for the completion rather than creating a new
> >>container.
> >>According to the workload, it would be better to wait for the
> >>previous
> >>execution rather than creating a new container because it takes upto
> >>500ms
> >>~ 1s.
> >>Even though the concurrency limit is more than 1, it still can happen
> >>if
> >>there is no logic to cumulate invocations and decide whether to
> >>create a
> >>new container or waiting for the existing container.
> >>
> >>
> >>6. HA of ContainerManager.
> >>Since it is mandatory to deploy the system without any downtime to
> >>use it
> >>for production, we need to support HA of ContainerManager.
> >>It means the state of ContainerManager should be replicated among
> >>replicas.
> >>(No matter which method we use between master/slave or clustering.)
> >>
> >>If ContainerManager knows about the status of each container, it
> >>would not
> >>be easy to support HA with its eventual consistent nature.
> >>If it does only know which containers are assigned to which
> >>controllers, it
> >>cannot handle the edge case as I mentioned above.
> >>
> >>
> >>
> >>Since many parts of the architecture are not addressed yet, I think it
> >>would be better to separate the parts and discuss each of them in more depth.
> >>But in the big picture, I think we first need to figure out whether it can
> >>handle, or at least alleviate, all known issues.
> >>
> >>
> >>Best regards,
> >>Dominic
> >>
> >>
> >>2018-07-21 1:36 GMT+09:00 David P Grove <gr...@us.ibm.com>:
> >>
> >>>
> >>>
> >>> Tyson Norris <tn...@adobe.com.INVALID> wrote on 07/20/2018
> >>12:24:07 PM:
> >>> >
> >>> > On Logging, I think if you are considering enabling concurrent
> >>> > activation processing, you will encounter that the only approach
> >>to
> >>> > parsing logs to be associated with a specific activationId, is to
> >>> > force the log output to be structured, and always include the
> >>> > activationId with every log message. This requires a change at
> >>the
> >>> > action container layer, but the simpler thing to do is to
> >>encourage
> >>> > action containers to provide a structured logging context that
> >>> > action developers can (and must) use to generate logs.
> >>>
> >>> Good point.  I agree that if there is concurrent activation
> >>processing in
> >>> the container, structured logging is the only sensible thing to do.
> >>>
> >>>
> >>> --dave
> >>>
> >>
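To illustrate the structured-logging point above: if every line an action
writes is a JSON object that carries the activationId, a log collector can
demultiplex output that concurrent activations interleave in one container.
A small sketch in Scala (the field names and helper names are made up, not
an agreed-upon format):

  // Interleaved stdout from one container running two activations at once,
  // one JSON object per line, each carrying the activationId:
  //   {"activationId":"aaa111","stream":"stdout","time":"...","log":"starting"}
  //   {"activationId":"bbb222","stream":"stdout","time":"...","log":"starting"}
  //   {"activationId":"aaa111","stream":"stdout","time":"...","log":"done"}

  object ActivationLogDemux {
    // Pull the activationId out of a line via plain string search, so the
    // sketch does not depend on a JSON library.
    def activationIdOf(line: String): Option[String] = {
      val marker = "\"activationId\":\""
      val start = line.indexOf(marker)
      if (start < 0) None
      else {
        val from = start + marker.length
        val end = line.indexOf('"', from)
        if (end < 0) None else Some(line.substring(from, end))
      }
    }

    // Group the interleaved lines per activation.
    def groupByActivation(lines: Seq[String]): Map[String, Seq[String]] =
      lines.flatMap(line => activationIdOf(line).map(id => id -> line))
        .groupBy(_._1)
        .map { case (id, pairs) => id -> pairs.map(_._2) }
  }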
> >
> >
> >
> >
> >
> >
>
>
>
>
>






Re: Proposal on a future architecture of OpenWhisk

Posted by Markus Thömmes <ma...@apache.org>.
Hi David,

note that only the first few requests in a "frequent-load" pattern
would be proxied.

Say you have 2 controllers and 1 container, and controller0 owns that
container. Heavy load comes in and hits both controllers. controller1
will proxy the first few requests to controller0 because it doesn't
yet own a container. controller0 will execute as much as it can and
immediately realize it needs more containers, so it asks for them. The
requests coming from controller1 will get rejected with a 503, stating
that controller0 is overloaded itself and that controller1 should wait
because containers have already been requested.

The ContainerManager distributes containers evenly, so even though
controller1 hasn't asked for more containers just yet, it will get
some due to the redistribution by the CM. It also starts asking for
containers itself after it gets the 503 from controller0.
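To make that concrete, here is a rough sketch of the controller-side
decision in Scala. All names (ControllerPool, Decision, and so on) are made
up for illustration; none of this is from the proposal or the current code
base:

  import scala.collection.mutable

  case class Container(id: String, var busy: Boolean)

  sealed trait Decision
  case object Execute extends Decision                 // run on an idle container we own
  case object Buffer extends Decision                  // keep locally until capacity frees up
  case class Reject(status: Int, reason: String) extends Decision

  class ControllerPool(containers: mutable.Buffer[Container]) {
    private var alreadyRequestedMore = false

    /** `proxied` is true when another controller forwarded this request to us. */
    def handle(proxied: Boolean): Decision =
      containers.find(!_.busy) match {
        case Some(c) =>
          c.busy = true                                // schedule the activation on this container
          Execute
        case None =>
          if (!alreadyRequestedMore) {
            alreadyRequestedMore = true                // ask the ContainerManager for more capacity
          }
          if (proxied) Reject(503, "overloaded, containers already requested")
          else Buffer                                  // dispatched as soon as a container frees up
      }
  }

The important bit is that the 503 is only sent for proxied requests; a
controller's own overflow is buffered locally until capacity arrives.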

Does that make sense? This isn't in the proposal yet but has been
discussed on this mail-thread. I'll aim to incorporate some of the
discussions into the proposal sooner or later.

Cheers,
Markus
Am Mi., 25. Juli 2018 um 16:54 Uhr schrieb David Breitgand <DA...@il.ibm.com>:
>
> >> Hope that clarifies
>
> Yes, it does, thanks. But I still have a question :)
>
> >> if you have N controllers in the system and M
> containers but N > M and all controllers manage their containers
> exclusively, you'll end up with controllers not having a container to
> manage at all.
>
> I am not sure how you arrive at this condition. It can only happen if the
> system is very under-utilized. Suppose there is no proxying. As you
> write, as action invocations keep coming, every Controller will get at
> least one request. So, it will at least once ask the ContainerManager to
> allocate a container.
>
> I agree that when action invocation is infrequent, it's better to have 1
> container rather than N for that action.
>
> However, if an action invocation is frequent, it's the other way around:
> you'd prefer having N containers rather than queueing or first going to
> the ContainerManager, getting pointed to a Controller that happens to have
> all containers for that action busy, which will result in that Controller
> again going to the ContainerManager and asking for a new container.
>
> So, how do we differentiate between the two cases? For infrequently
> executed actions it saves containers, but for frequently executed actions,
> proxying will result in a performance hit, won't it?
>
> Thanks.
>
> -- david
>
>
>
>

Re: Proposal on a future architecture of OpenWhisk

Posted by David Breitgand <DA...@il.ibm.com>.
>> Hope that clarifies

Yes, it does, thanks. But I still have a question :)

>> if you have N controllers in the system and M
containers but N > M and all controllers manage their containers
exclusively, you'll end up with controllers not having a container to
manage at all.

I am not sure how you arrive at this condition. It can only happen if the
system is very under-utilized. Suppose there is no proxying. As you
write, as action invocations keep coming, every Controller will get at
least one request. So, it will at least once ask the ContainerManager to
allocate a container.

I agree that when action invocation is infrequent, it's better to have 1
container rather than N for that action.

However, if an action invocation is frequent, it's the other way around:
you'd prefer having N containers rather than queueing or first going to
the ContainerManager, getting pointed to a Controller that happens to have
all containers for that action busy, which will result in that Controller
again going to the ContainerManager and asking for a new container.

So, how do we differentiate between the two cases? For infrequently
executed actions it saves containers, but for frequently executed actions,
proxying will result in a performance hit, won't it?

Thanks.

-- david 





Re: Proposal on a future architecture of OpenWhisk

Posted by Markus Thömmes <me...@googlemail.com.INVALID>.
Hi David,

the problem is that if you have N controllers in the system and M
containers, with N > M, and all controllers manage their containers
exclusively, you'll end up with controllers not having a container to
manage at all.
There's valid, very slow workload that needs only 1 container, for
example a slow cron trigger. Due to the round-robin nature of our
front-door, eventually all controllers will get one of those requests
at some point. Since they are by design not aware of containers that
are managed by another controller, they'd end up asking for a newly
created container. Given N controllers, we'd always create at least N
containers for any action eventually. That is wasteful.

Instead, requests are proxied to a controller which we know manages a
container for the given action (the ContainerManager knows that) and
thereby bypass the need to create too many containers. If the load is
too much to be handled by the M containers, the controllers managing
those M will request new containers, which will get distributed to all
controllers. Eventually, given enough load, all controllers will have
containers to manage for each action.

The ContainerManager only needs to know which controller has which
container. It does not need to know in which state these containers
are. If they are busy, the controller itself will request more
resources accordingly.
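As a rough sketch of that bookkeeping in Scala (all names are invented, and
the capacity check is deliberately simplified to a boolean that a real
ContainerManager would derive from its resource view):

  object ContainerManagerSketch {
    type Action = String
    type ControllerId = String

    sealed trait Answer
    case class ContainerAssigned(containerId: String) extends Answer
    case class ProxyTo(controller: ControllerId) extends Answer

    // action -> (controller -> number of containers it owns for that action).
    // Deliberately no busy/idle flag: that state stays inside the controllers.
    private var ownership = Map.empty[Action, Map[ControllerId, Int]]

    def requestContainer(action: Action, from: ControllerId, haveCapacity: Boolean): Answer = {
      val owners = ownership.getOrElse(action, Map.empty[ControllerId, Int])
      if (haveCapacity || owners.isEmpty) {
        // create a container and remember that `from` owns it
        ownership += action -> (owners + (from -> (owners.getOrElse(from, 0) + 1)))
        ContainerAssigned(java.util.UUID.randomUUID().toString)
      } else {
        // out of capacity: point the caller at a controller that already owns containers
        ProxyTo(owners.maxBy(_._2)._1)
      }
    }
  }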

Hope that clarifies

Cheers,
Markus

2018-07-25 14:19 GMT+02:00 David Breitgand <DA...@il.ibm.com>:
> Hi Markus,
>
> I'd like to better understand the edge case.
>
> Citing from the wiki.
>
>>> Edge case: If an action only has a very small amount of containers
> (less than there are Controllers in the system), we have a problem with
> the method described above.
>
> Isn't there always at least one controller in the system? I think the
> problem is not the number of Controllers, but rather availability of
> prewarm containers that these Controllers control. If all containers of
> this Controller are busy at the moment, and concurrency level per
> container is 1 and the invocation hits this controller, it cannot execute
> the action immediately with one of its containers. Is that the problem
> that is being solved?
>
>>> Since the front-door schedules round-robin or least-connected, it's
> impossible to decide to which Controller the request needs to go to hit
> that has a container available.
> In this case, the other Controllers (who didn't get a container) act as a
> proxy and send the request to a Controller that actually has a container
> (maybe even via HTTP redirect). The ContainerManager decides which
> Controllers will act as a proxy in this case, since its the instance that
> distributes the containers.
>>>
>
> When reading your proposal, I was under the impression that the ContainerManager
> only knows about the existence of containers allocated to the Controllers
> (because they asked), but ContainerManager does not know about the state
> of these containers at every given moment (i.e., whether they are being
> busy with running some action or not). I don't see Controllers updating
> ContainerManager about this in your diagrams.
>
> Thanks.
>
> -- david
>
>
>
>
> From:   "Markus Thoemmes" <ma...@de.ibm.com>
> To:     dev@openwhisk.apache.org
> Date:   23/07/2018 02:21 PM
> Subject:        Re: Proposal on a future architecture of OpenWhisk
>
>
>
> Hi Dominic,
>
> let's see if I can clarify the specific points one by one.
>
>>1. Docker daemon performance issue.
>>
>>...
>>
>>That's the reason why I initially thought that a Warmed state would
>>be kept
>>for more than today's behavior.
>>Today, containers would stay in the Warmed state only for 50ms, so it
>>introduces PAUSE/RESUME in case action comes with the interval of
>>more than
>>50 ms such as 1 sec.
>>This will lead to more loads on Docker daemon.
>
> You're right that the docker daemon's throughput is indeed an issue.
>
> Please note that PAUSE/RESUME are not executed via the docker daemon in a
> performance-tuned environment but rather done via runc, which does not have such a
> throughput
> issue because it's not a daemon at all. PAUSE/RESUME latencies are ~10ms
> for each
> operation.
>
> Further, the duration of the pauseGrace is not related to the overall
> architecture at
> all. Rather, it's kept so narrow to safeguard against users stealing cycles
> from the
> vendor's infrastructure. It's also a configurable value so you can tweak
> it as you
> want.
>
> The proposed architecture itself has no impact on the pauseGrace.
>
>>
>>And if the state of containers is changing like today, the state in
>>ContainerManager would be frequently changing as well.
>>This may induce a synchronization issue among controllers and, among
>>ContainerManagers(in case there would be more than one
>>ContainerManager).
>
> The ContainerManager will NOT be informed about pause/unpause state
> changes and it
> doesn't need to. I agree that such a behavior would generate serious load
> on the
> ContainerManager, but I think it's unnecessary.
>
>>2. Proxy case.
>>
>>...
>>
>>If it goes this way, ContainerManager should know all the status of
>>containers in all controllers to make a right decision and it's not
>>easy to
>>synchronize all the status of containers in controllers.
>>If it does not work like this, how can controller2 proxy requests to
>>controller1 without any information about controller1's status?
>
>
> The ContainerManager distributes a list of containers across all
> controllers.
> If it does not have enough containers at hand to give one to each
> controller,
> it instead tells controller2 to proxy to controller1, because the
> ContainerManager
> knows at distribution-time, that controller1 has such a container.
>
> No synchronization needed between controllers at all.
>
> If controller1 gets more requests than the single container can handle, it
> will
> request more containers, so eventually controller2 will get its own.
>
> Please refer to
> https://lists.apache.org/thread.html/84a7b8171b90719c2f7aab86bea48a7e7865874c4e54f082b0861380@%3Cdev.openwhisk.apache.org%3E
>
> for more information on that protocol.
>
>
>>3. Intervention among multiple actions
>>
>>If the concurrency limit is 1, and the container lifecycle is managed
>>like
>>today, intervention among multiple actions can happen again.
>>For example, the maximum number of containers which can be created by
>>a
>>user is 2, and ActionA and ActionB invocation requests come
>>alternatively,
>>controllers will try to remove and recreate containers again and
>>again.
>>I used an example with a small number of max container limit for
>>simplicity, but it can happen with a higher limit as well.
>>
>>And though concurrency limit is more than 1 such as 3, it also can
>>happen
>>if actions come more quickly than the execution time of actions.
>
> The controller will never try to delete a container at all, nor does its
> pool of managed containers have a limit.
> If it doesn't have a container for ActionA it will request one from the
> ContainerManager.
> If it doesn't have one for ActionB it will request one from the
> ContainerManager.
>
> There will be 2 containers in the system and assuming that the
> ContainerManager has enough
> resources to keep those 2 containers alive, it will not delete them.
>
> The controllers by design cannot cause the behavior you're describing. The
> architecture is
> actually built around fixing this exact issue (eviction due to multiple
> heavy users in the
> system).
>
>>4. Is concurrency per container controlled by users in a per-action
>>based
>>way?
>>Let me clarify my question about concurrency limit.
>>
>>If concurrency per container limit is more than 1, there could be
>>multiple
>>actions being invoked at some point.
>>If the action requires high memory footprint such as 200MB or 150MB,
>>it can
>>crash if the sum of memory usage of concurrent actions exceeds the
>>container memory.
>>(In our case(here), some users are executing headless-chrome and
>>puppeteer
>>within actions, so it could happen under the similar situation.)
>>
>>So I initially thought concurrency per container is controlled by
>>users in
>>a per-action based way.
>>If concurrency per container is only configured by OW operators
>>statically,
>>some users may not be able to invoke their actions correctly in the
>>worst
>>case though operators increased the memory of the biggest container
>>type.
>>
>>And not only for this case, there could be some more reasons that
>>some
>>users just want to invoke their actions without per-container
>>concurrency
>>but the others want it for better throughput.
>>
>>So we may need some logic for users to take care of per-container
>>concurrency for each actions.
>
> Yes, the intention is to provide exactly what you're describing, maybe I
> worded it weirdly
> in my last response.
>
> This is not relevant for the architecture though.
>
>
>>5. Better to wait for the completion rather than creating a new
>>container.
>>According to the workload, it would be better to wait for the
>>previous
>>execution rather than creating a new container because it takes upto
>>500ms
>>~ 1s.
>>Even though the concurrency limit is more than 1, it still can happen
>>if
>>there is no logic to cumulate invocations and decide whether to
>>create a
>>new container or waiting for the existing container.
>
> The proposed asynchronous protocol between controller and ContainerManager
> accomplishes this by design:
>
> If a controller does not have the resources to execute the current
> request, it requests those resources.
> The ContainerManager updates resources asynchronously.
> The Controller will schedule the outstanding request as soon as it gets
> resources for it. It does not care
> if those resources are  becoming free because another request finished or
> because it got a fresh container
> from the ContainerManager. Requests will always be dispatched as soon as
> resources are free.
>
>>6. HA of ContainerManager.
>>Since it is mandatory to deploy the system without any downtime to
>>use it
>>for production, we need to support HA of ContainerManager.
>>It means the state of ContainerManager should be replicated among
>>replicas.
>>(No matter which method we use between master/slave or clustering.)
>>
>>If ContainerManager knows about the status of each container, it
>>would not
>>be easy to support HA with its eventual consistent nature.
>>If it does only know which containers are assigned to which
>>controllers, it
>>cannot handle the edge case as I mentioned above.
>
> I agree, HA is mandatory. Since the ContainerManager operates only on the
> container creation/deletion path,
> we can probably afford to persist its state into something like Redis. If
> it crashes, the slave instance
> can take over immediately without any eventual-consistency concerns or
> downtime.
>
> Also note that a downtime in the ContainerManager will ONLY cause an
> impact on the ability to create containers.
> Workloads that already have containers created will continue to work just
> fine.
>
>
> Does that answer/mitigate your concerns?
>
> Cheers,
> Markus
>
>
>>To: dev@openwhisk.apache.org
>>From: Dominic Kim <st...@gmail.com>
>>Date: 07/23/2018 12:48PM
>>Subject: Re: Proposal on a future architecture of OpenWhisk
>>
>>Dear Markus.
>>
>>I may not correctly understand the direction of new architecture.
>>So let me describe my concerns in more details.
>>
>>Since that is a future architecture of OpenWhisk and requires many
>>breaking
>>changes, I think it should at least address all known issues.
>>So I focused on figuring out whether it handles all issues which are
>>reported in my proposal.
>>(
>>https://cwiki.apache.org/confluence/display/OPENWHISK/Autonomous+Container+Scheduling
>>)
>>
>>1. Docker daemon performance issue.
>>
>>The most critical issue is poor performance of docker daemon.
>>Since it is not inherently designed for high throughput or concurrent
>>processing, Docker daemon shows poor performance in comparison with
>>OW.
>>In OW(serverless) world, action execution can be finished within 5ms
>>~
>>10ms, but the Docker daemon shows 100 ~ 500ms latency.
>>Still, we can take advantage of Prewarm and Warmed containers, but
>>under
>>the situation where container creation/deletion/pausing/resuming
>>happen
>>frequently and the situation lasted for long-term, the requests are
>>delayed
>>and even the Docker daemon crashed.
>>So I think it is important to reduce the loads(requests) against the
>>Docker
>>daemon.
>>
>>That's the reason why I initially thought that a Warmed state would
>>be kept
>>for more than today's behavior.
>>Today, containers would stay in the Warmed state only for 50ms, so it
>>introduces PAUSE/RESUME in case action comes with the interval of
>>more than
>>50 ms such as 1 sec.
>>This will lead to more loads on Docker daemon.
>>
>>And if the state of containers is changing like today, the state in
>>ContainerManager would be frequently changing as well.
>>This may induce a synchronization issue among controllers and, among
>>ContainerManagers(in case there would be more than one
>>ContainerManager).
>>
>>So I think containers should be running for more than today's
>>pauseGrace
>>time.
>>With more than 1 concurrency limit per container, it would also be
>>better
>>to keep containers running(not paused) for more than 50ms.
>>
>>2. Proxy case.
>>
>>In the edge case where a container only exists in controller1, how
>>can
>>controller2 decide to proxy the request to controller1 rather than
>>just
>>creating its own container?
>>If it asks to ContainerManager, ContainerManager should know the
>>state of
>>the container in controller1.
>>If the container in controller1 is already busy, it would be better
>>to
>>create a new container in controller2 rather than proxying the
>>requests to
>>controller1.
>>
>>If it goes this way, ContainerManager should know all the status of
>>containers in all controllers to make a right decision and it's not
>>easy to
>>synchronize all the status of containers in controllers.
>>If it does not work like this, how can controller2 proxy requests to
>>controller1 without any information about controller1's status?
>>
>>3. Intervention among multiple actions
>>
>>If the concurrency limit is 1, and the container lifecycle is managed
>>like
>>today, intervention among multiple actions can happen again.
>>For example, the maximum number of containers which can be created by
>>a
>>user is 2, and ActionA and ActionB invocation requests come
>>alternatively,
>>controllers will try to remove and recreate containers again and
>>again.
>>I used an example with a small number of max container limit for
>>simplicity, but it can happen with a higher limit as well.
>>
>>And though concurrency limit is more than 1 such as 3, it also can
>>happen
>>if actions come more quickly than the execution time of actions.
>>
>>4. Is concurrency per container controlled by users in a per-action
>>based
>>way?
>>Let me clarify my question about concurrency limit.
>>
>>If concurrency per container limit is more than 1, there could be
>>multiple
>>actions being invoked at some point.
>>If the action requires high memory footprint such as 200MB or 150MB,
>>it can
>>crash if the sum of memory usage of concurrent actions exceeds the
>>container memory.
>>(In our case(here), some users are executing headless-chrome and
>>puppeteer
>>within actions, so it could happen under the similar situation.)
>>
>>So I initially thought concurrency per container is controlled by
>>users in
>>a per-action based way.
>>If concurrency per container is only configured by OW operators
>>statically,
>>some users may not be able to invoke their actions correctly in the
>>worst
>>case though operators increased the memory of the biggest container
>>type.
>>
>>And not only for this case, there could be some more reasons that
>>some
>>users just want to invoke their actions without per-container
>>concurrency
>>but the others want it for better throughput.
>>
>>So we may need some logic for users to take care of per-container
>>concurrency for each actions.
>>
>>5. Better to wait for the completion rather than creating a new
>>container.
>>According to the workload, it would be better to wait for the
>>previous
>>execution rather than creating a new container because it takes upto
>>500ms
>>~ 1s.
>>Even though the concurrency limit is more than 1, it still can happen
>>if
>>there is no logic to cumulate invocations and decide whether to
>>create a
>>new container or waiting for the existing container.
>>
>>
>>6. HA of ContainerManager.
>>Since it is mandatory to deploy the system without any downtime to
>>use it
>>for production, we need to support HA of ContainerManager.
>>It means the state of ContainerManager should be replicated among
>>replicas.
>>(No matter which method we use between master/slave or clustering.)
>>
>>If ContainerManager knows about the status of each container, it
>>would not
>>be easy to support HA with its eventual consistent nature.
>>If it does only know which containers are assigned to which
>>controllers, it
>>cannot handle the edge case as I mentioned above.
>>
>>
>>
>>Since many parts of the architecture are not addressed yet, I think
>>it
>>would be better to separate each parts and discuss further deeply.
>>But in the big picture, I think we need to figure out whether it can
>>handle
>>or at least alleviate all known issues or not first.
>>
>>
>>Best regards,
>>Dominic
>>
>>
>>2018-07-21 1:36 GMT+09:00 David P Grove <gr...@us.ibm.com>:
>>
>>>
>>>
>>> Tyson Norris <tn...@adobe.com.INVALID> wrote on 07/20/2018
>>12:24:07 PM:
>>> >
>>> > On Logging, I think if you are considering enabling concurrent
>>> > activation processing, you will encounter that the only approach
>>to
>>> > parsing logs to be associated with a specific activationId, is to
>>> > force the log output to be structured, and always include the
>>> > activationId with every log message. This requires a change at
>>the
>>> > action container layer, but the simpler thing to do is to
>>encourage
>>> > action containers to provide a structured logging context that
>>> > action developers can (and must) use to generate logs.
>>>
>>> Good point.  I agree that if there is concurrent activation
>>processing in
>>> the container, structured logging is the only sensible thing to do.
>>>
>>>
>>> --dave
>>>
>>
>
>
>
>
>
>

Re: Proposal on a future architecture of OpenWhisk

Posted by David Breitgand <DA...@il.ibm.com>.
Hi Markus, 

I'd like to better understand the edge case.

Citing from the wiki.

>> Edge case: If an action only has a very small amount of containers 
(less than there are Controllers in the system), we have a problem with 
the method described above. 

Isn't there always at least one controller in the system? I think the 
problem is not the number of Controllers, but rather the availability of 
prewarm containers that these Controllers control. If all containers of 
this Controller are busy at the moment, the concurrency level per 
container is 1, and the invocation hits this controller, it cannot execute 
the action immediately with one of its containers. Is that the problem 
that is being solved? 

>> Since the front-door schedules round-robin or least-connected, it's 
impossible to decide to which Controller the request needs to go to hit 
that has a container available.
In this case, the other Controllers (who didn't get a container) act as a 
proxy and send the request to a Controller that actually has a container 
(maybe even via HTTP redirect). The ContainerManager decides which 
Controllers will act as a proxy in this case, since its the instance that 
distributes the containers. 
>>

When reading your proposal, I was under the impression that the ContainerManager 
only knows about the existence of containers allocated to the Controllers 
(because they asked), but the ContainerManager does not know about the state 
of these containers at any given moment (i.e., whether they are busy 
running some action or not). I don't see Controllers updating the 
ContainerManager about this in your diagrams. 

Thanks.

-- david 




From:   "Markus Thoemmes" <ma...@de.ibm.com>
To:     dev@openwhisk.apache.org
Date:   23/07/2018 02:21 PM
Subject:        Re: Proposal on a future architecture of OpenWhisk



Hi Dominic,

let's see if I can clarify the specific points one by one.

>1. Docker daemon performance issue.
>
>...
>
>That's the reason why I initially thought that a Warmed state would
>be kept
>for more than today's behavior.
>Today, containers would stay in the Warmed state only for 50ms, so it
>introduces PAUSE/RESUME in case action comes with the interval of
>more than
>50 ms such as 1 sec.
>This will lead to more loads on Docker daemon.

You're right that the docker daemon's throughput is indeed an issue.

Please note that PAUSE/RESUME are not executed via the docker daemon in a 
performance-tuned environment but rather via runc, which does not have such a 
throughput issue because it's not a daemon at all. PAUSE/RESUME latencies are ~10ms 
for each operation.

Further, the duration of the pauseGrace is not related to the overall 
architecture at all. Rather, it's kept so narrow to safeguard against users 
stealing cycles from the vendor's infrastructure. It's also a configurable 
value, so you can tweak it as you want.

The proposed architecture itself has no impact on the pauseGrace.

>
>And if the state of containers is changing like today, the state in
>ContainerManager would be frequently changing as well.
>This may induce a synchronization issue among controllers and, among
>ContainerManagers(in case there would be more than one
>ContainerManager).

The ContainerManager will NOT be informed about pause/unpause state 
changes and it
doesn't need to. I agree that such a behavior would generate serious load 
on the
ContainerManager, but I think it's unnecessary.

>2. Proxy case.
>
>...
>
>If it goes this way, ContainerManager should know all the status of
>containers in all controllers to make a right decision and it's not
>easy to
>synchronize all the status of containers in controllers.
>If it does not work like this, how can controller2 proxy requests to
>controller1 without any information about controller1's status?


The ContainerManager distributes a list of containers across all 
controllers.
If it does not have enough containers at hand to give one to each 
controller,
it instead tells controller2 to proxy to controller1, because the 
ContainerManager
knows at distribution time that controller1 has such a container.

No synchronization needed between controllers at all.

If controller1 gets more requests than the single container can handle, it 
will
request more containers, so eventually controller2 will get its own.

Please refer to 
https://lists.apache.org/thread.html/84a7b8171b90719c2f7aab86bea48a7e7865874c4e54f082b0861380@%3Cdev.openwhisk.apache.org%3E

for more information on that protocol.


>3. Intervention among multiple actions
>
>If the concurrency limit is 1, and the container lifecycle is managed
>like
>today, intervention among multiple actions can happen again.
>For example, the maximum number of containers which can be created by
>a
>user is 2, and ActionA and ActionB invocation requests come
>alternatively,
>controllers will try to remove and recreate containers again and
>again.
>I used an example with a small number of max container limit for
>simplicity, but it can happen with a higher limit as well.
>
>And though concurrency limit is more than 1 such as 3, it also can
>happen
>if actions come more quickly than the execution time of actions.

The controller will never try to delete a container at all, nor does its 
pool of managed containers have a limit.
If it doesn't have a container for ActionA it will request one from the 
ContainerManager.
If it doesn't have one for ActionB it will request one from the 
ContainerManager.

There will be 2 containers in the system and assuming that the 
ContainerManager has enough
resources to keep those 2 containers alive, it will not delete them.

The controllers by design cannot cause the behavior you're describing. The 
architecture is
actually built around fixing this exact issue (eviction due to multiple 
heavy users in the
system).

>4. Is concurrency per container controlled by users in a per-action
>based
>way?
>Let me clarify my question about concurrency limit.
>
>If concurrency per container limit is more than 1, there could be
>multiple
>actions being invoked at some point.
>If the action requires high memory footprint such as 200MB or 150MB,
>it can
>crash if the sum of memory usage of concurrent actions exceeds the
>container memory.
>(In our case(here), some users are executing headless-chrome and
>puppeteer
>within actions, so it could happen under the similar situation.)
>
>So I initially thought concurrency per container is controlled by
>users in
>a per-action based way.
>If concurrency per container is only configured by OW operators
>statically,
>some users may not be able to invoke their actions correctly in the
>worst
>case though operators increased the memory of the biggest container
>type.
>
>And not only for this case, there could be some more reasons that
>some
>users just want to invoke their actions without per-container
>concurrency
>but the others want it for better throughput.
>
>So we may need some logic for users to take care of per-container
>concurrency for each actions.

Yes, the intention is to provide exactly what you're describing, maybe I 
worded it weirdly
in my last response.

This is not relevant for the architecture though.


>5. Better to wait for the completion rather than creating a new
>container.
>According to the workload, it would be better to wait for the
>previous
>execution rather than creating a new container because it takes upto
>500ms
>~ 1s.
>Even though the concurrency limit is more than 1, it still can happen
>if
>there is no logic to cumulate invocations and decide whether to
>create a
>new container or waiting for the existing container.

The proposed asynchronous protocol between controller and ContainerManager 
accomplishes this by design:

If a controller does not have the resources to execute the current 
request, it requests those resources.
The ContainerManager updates resources asynchronously.
The Controller will schedule the outstanding request as soon as it gets 
resources for it. It does not care
if those resources are  becoming free because another request finished or 
because it got a fresh container
from the ContainerManager. Requests will always be dispatched as soon as 
resources are free.

>6. HA of ContainerManager.
>Since it is mandatory to deploy the system without any downtime to
>use it
>for production, we need to support HA of ContainerManager.
>It means the state of ContainerManager should be replicated among
>replicas.
>(No matter which method we use between master/slave or clustering.)
>
>If ContainerManager knows about the status of each container, it
>would not
>be easy to support HA with its eventual consistent nature.
>If it does only know which containers are assigned to which
>controllers, it
>cannot handle the edge case as I mentioned above.

I agree, HA is mandatory. Since the ContainerManager operates only on the 
container creation/deletion path,
we can probably afford to persist its state into something like Redis. If 
it crashes, the slave instance
can take over immediately without any eventual-consistency concerns or 
downtime.

Also note that a downtime in the ContainerManager will ONLY cause an 
impact on the ability to create containers.
Workloads that already have containers created will continue to work just 
fine.


Does that answer/mitigate your concerns?

Cheers,
Markus

 
>To: dev@openwhisk.apache.org
>From: Dominic Kim <st...@gmail.com>
>Date: 07/23/2018 12:48PM
>Subject: Re: Proposal on a future architecture of OpenWhisk
>
>Dear Markus.
>
>I may not correctly understand the direction of new architecture.
>So let me describe my concerns in more details.
>
>Since that is a future architecture of OpenWhisk and requires many
>breaking
>changes, I think it should at least address all known issues.
>So I focused on figuring out whether it handles all issues which are
>reported in my proposal.
>(
>https://cwiki.apache.org/confluence/display/OPENWHISK/Autonomous+Container+Scheduling
>)
>
>1. Docker daemon performance issue.
>
>The most critical issue is poor performance of docker daemon.
>Since it is not inherently designed for high throughput or concurrent
>processing, Docker daemon shows poor performance in comparison with
>OW.
>In OW(serverless) world, action execution can be finished within 5ms
>~
>10ms, but the Docker daemon shows 100 ~ 500ms latency.
>Still, we can take advantage of Prewarm and Warmed containers, but
>under
>the situation where container creation/deletion/pausing/resuming
>happen
>frequently and the situation lasted for long-term, the requests are
>delayed
>and even the Docker daemon crashed.
>So I think it is important to reduce the loads(requests) against the
>Docker
>daemon.
>
>That's the reason why I initially thought that a Warmed state would
>be kept
>for more than today's behavior.
>Today, containers would stay in the Warmed state only for 50ms, so it
>introduces PAUSE/RESUME in case action comes with the interval of
>more than
>50 ms such as 1 sec.
>This will lead to more loads on Docker daemon.
>
>And if the state of containers is changing like today, the state in
>ContainerManager would be frequently changing as well.
>This may induce a synchronization issue among controllers and, among
>ContainerManagers(in case there would be more than one
>ContainerManager).
>
>So I think containers should be running for more than today's
>pauseGrace
>time.
>With more than 1 concurrency limit per container, it would also be
>better
>to keep containers running(not paused) for more than 50ms.
>
>2. Proxy case.
>
>In the edge case where a container only exists in controller1, how
>can
>controller2 decide to proxy the request to controller1 rather than
>just
>creating its own container?
>If it asks to ContainerManager, ContainerManager should know the
>state of
>the container in controller1.
>If the container in controller1 is already busy, it would be better
>to
>create a new container in controller2 rather than proxying the
>requests to
>controller1.
>
>If it goes this way, ContainerManager should know all the status of
>containers in all controllers to make a right decision and it's not
>easy to
>synchronize all the status of containers in controllers.
>If it does not work like this, how can controller2 proxy requests to
>controller1 without any information about controller1's status?
>
>3. Intervention among multiple actions
>
>If the concurrency limit is 1, and the container lifecycle is managed
>like
>today, intervention among multiple actions can happen again.
>For example, the maximum number of containers which can be created by
>a
>user is 2, and ActionA and ActionB invocation requests come
>alternatively,
>controllers will try to remove and recreate containers again and
>again.
>I used an example with a small number of max container limit for
>simplicity, but it can happen with a higher limit as well.
>
>And though concurrency limit is more than 1 such as 3, it also can
>happen
>if actions come more quickly than the execution time of actions.
>
>4. Is concurrency per container controlled by users in a per-action
>based
>way?
>Let me clarify my question about concurrency limit.
>
>If concurrency per container limit is more than 1, there could be
>multiple
>actions being invoked at some point.
>If the action requires high memory footprint such as 200MB or 150MB,
>it can
>crash if the sum of memory usage of concurrent actions exceeds the
>container memory.
>(In our case(here), some users are executing headless-chrome and
>puppeteer
>within actions, so it could happen under the similar situation.)
>
>So I initially thought concurrency per container is controlled by
>users in
>a per-action based way.
>If concurrency per container is only configured by OW operators
>statically,
>some users may not be able to invoke their actions correctly in the
>worst
>case though operators increased the memory of the biggest container
>type.
>
>And not only for this case, there could be some more reasons that
>some
>users just want to invoke their actions without per-container
>concurrency
>but the others want it for better throughput.
>
>So we may need some logic for users to take care of per-container
>concurrency for each actions.
>
>5. Better to wait for the completion rather than creating a new
>container.
>According to the workload, it would be better to wait for the
>previous
>execution rather than creating a new container because it takes upto
>500ms
>~ 1s.
>Even though the concurrency limit is more than 1, it still can happen
>if
>there is no logic to cumulate invocations and decide whether to
>create a
>new container or waiting for the existing container.
>
>
>6. HA of ContainerManager.
>Since it is mandatory to deploy the system without any downtime to
>use it
>for production, we need to support HA of ContainerManager.
>It means the state of ContainerManager should be replicated among
>replicas.
>(No matter which method we use between master/slave or clustering.)
>
>If ContainerManager knows about the status of each container, it
>would not
>be easy to support HA with its eventual consistent nature.
>If it does only know which containers are assigned to which
>controllers, it
>cannot handle the edge case as I mentioned above.
>
>
>
>Since many parts of the architecture are not addressed yet, I think
>it
>would be better to separate each parts and discuss further deeply.
>But in the big picture, I think we need to figure out whether it can
>handle
>or at least alleviate all known issues or not first.
>
>
>Best regards,
>Dominic
>
>
>2018-07-21 1:36 GMT+09:00 David P Grove <gr...@us.ibm.com>:
>
>>
>>
>> Tyson Norris <tn...@adobe.com.INVALID> wrote on 07/20/2018
>12:24:07 PM:
>> >
>> > On Logging, I think if you are considering enabling concurrent
>> > activation processing, you will encounter that the only approach
>to
>> > parsing logs to be associated with a specific activationId, is to
>> > force the log output to be structured, and always include the
>> > activationId with every log message. This requires a change at
>the
>> > action container layer, but the simpler thing to do is to
>encourage
>> > action containers to provide a structured logging context that
>> > action developers can (and must) use to generate logs.
>>
>> Good point.  I agree that if there is concurrent activation
>processing in
>> the container, structured logging is the only sensible thing to do.
>>
>>
>> --dave
>>
>







Re: Proposal on a future architecture of OpenWhisk

Posted by Markus Thoemmes <ma...@de.ibm.com>.
Hi Dominic,

let's see if I can clarify the specific points one by one.

>1. Docker daemon performance issue.
>
>...
>
>That's the reason why I initially thought that a Warmed state would
>be kept
>for more than today's behavior.
>Today, containers would stay in the Warmed state only for 50ms, so it
>introduces PAUSE/RESUME in case action comes with the interval of
>more than
>50 ms such as 1 sec.
>This will lead to more loads on Docker daemon.

You're right that the docker daemon's throughput is indeed an issue.

Please note that PAUSE/RESUME are not executed via the docker daemon in a
performance-tuned environment but rather via runc, which does not have such a
throughput issue because it's not a daemon at all. PAUSE/RESUME latencies are ~10ms
for each operation.

Further, the duration of the pauseGrace is not related to the overall architecture at
all. Rather, it's kept so narrow to safeguard against users stealing cycles from the
vendor's infrastructure. It's also a configurable value so you can tweak it as you
want.

The proposed architecture itself has no impact on the pauseGrace.

>
>And if the state of containers is changing like today, the state in
>ContainerManager would be frequently changing as well.
>This may induce a synchronization issue among controllers and, among
>ContainerManagers(in case there would be more than one
>ContainerManager).

The ContainerManager will NOT be informed about pause/unpause state changes and it
doesn't need to. I agree that such a behavior would generate serious load on the
ContainerManager, but I think it's unnecessary.

>2. Proxy case.
>
>...
>
>If it goes this way, ContainerManager should know all the status of
>containers in all controllers to make a right decision and it's not
>easy to
>synchronize all the status of containers in controllers.
>If it does not work like this, how can controller2 proxy requests to
>controller1 without any information about controller1's status?


The ContainerManager distributes a list of containers across all controllers.
If it does not have enough containers at hand to give one to each controller,
it instead tells controller2 to proxy to controller1, because the ContainerManager
knows at distribution time that controller1 has such a container.

No synchronization needed between controllers at all.

If controller1 gets more requests than the single container can handle, it will
request more containers, so eventually controller2 will get its own.

Please refer to https://lists.apache.org/thread.html/84a7b8171b90719c2f7aab86bea48a7e7865874c4e54f082b0861380@%3Cdev.openwhisk.apache.org%3E
for more information on that protocol.
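
To make that distribution step a bit more tangible, here is a minimal Scala sketch of the idea. All names (ControllerId, Assignment, assign, the round-robin choice of proxy targets) are my own assumptions for illustration, not part of the proposal or the code base:

  object DistributionSketch {
    final case class ControllerId(name: String)
    final case class ContainerId(name: String)

    sealed trait Assignment
    // This controller got a container of its own.
    final case class UseOwn(container: ContainerId) extends Assignment
    // This controller got no container and proxies to one that has.
    final case class ProxyTo(controller: ControllerId) extends Assignment

    // Distributes the available containers of one action across controllers.
    // Controllers that receive no container are told, at distribution time,
    // which owning controller to proxy to, so no controller-to-controller
    // synchronization is needed afterwards.
    def assign(controllers: Seq[ControllerId],
               containers: Seq[ContainerId]): Map[ControllerId, Assignment] = {
      require(containers.nonEmpty, "distribute only once at least one container exists")
      val (owners, proxies) = controllers.splitAt(containers.size)
      val owned = owners.zip(containers).map { case (ctrl, c) => ctrl -> (UseOwn(c): Assignment) }
      val proxied = proxies.zipWithIndex.map { case (ctrl, i) =>
        ctrl -> (ProxyTo(owners(i % owners.size)): Assignment)
      }
      (owned ++ proxied).toMap
    }
  }

With two controllers and a single container, controller2 ends up with ProxyTo(controller1), which is exactly the edge case discussed above; as soon as controller1 requests and receives a second container, the next distribution gives controller2 a container of its own.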


>3. Intervention among multiple actions
>
>If the concurrency limit is 1, and the container lifecycle is managed
>like
>today, intervention among multiple actions can happen again.
>For example, the maximum number of containers which can be created by
>a
>user is 2, and ActionA and ActionB invocation requests come
>alternatively,
>controllers will try to remove and recreate containers again and
>again.
>I used an example with a small number of max container limit for
>simplicity, but it can happen with a higher limit as well.
>
>And though concurrency limit is more than 1 such as 3, it also can
>happen
>if actions come more quickly than the execution time of actions.

The controller will never try to delete a container at all, nor does its
pool of managed containers have a limit.
If it doesn't have a container for ActionA it will request one from the ContainerManager.
If it doesn't have one for ActionB it will request one from the ContainerManager.

There will be 2 containers in the system and assuming that the ContainerManager has enough
resources to keep those 2 containers alive, it will not delete them.

The controllers by design cannot cause the behavior you're describing. The architecture is
actually built around fixing this exact issue (eviction due to multiple heavy users in the
system).

>4. Is concurrency per container controlled by users in a per-action
>based
>way?
>Let me clarify my question about concurrency limit.
>
>If concurrency per container limit is more than 1, there could be
>multiple
>actions being invoked at some point.
>If the action requires high memory footprint such as 200MB or 150MB,
>it can
>crash if the sum of memory usage of concurrent actions exceeds the
>container memory.
>(In our case(here), some users are executing headless-chrome and
>puppeteer
>within actions, so it could happen under the similar situation.)
>
>So I initially thought concurrency per container is controlled by
>users in
>a per-action based way.
>If concurrency per container is only configured by OW operators
>statically,
>some users may not be able to invoke their actions correctly in the
>worst
>case though operators increased the memory of the biggest container
>type.
>
>And not only for this case, there could be some more reasons that
>some
>users just want to invoke their actions without per-container
>concurrency
>but the others want it for better throughput.
>
>So we may need some logic for users to take care of per-container
>concurrency for each actions.

Yes, the intention is to provide exactly what you're describing, maybe I worded it weirdly
in my last response.

This is not relevant for the architecture though.


>5. Better to wait for the completion rather than creating a new
>container.
>According to the workload, it would be better to wait for the
>previous
>execution rather than creating a new container because it takes upto
>500ms
>~ 1s.
>Even though the concurrency limit is more than 1, it still can happen
>if
>there is no logic to cumulate invocations and decide whether to
>create a
>new container or waiting for the existing container.

The proposed asynchronous protocol between controller and ContainerManager accomplishes this by design:

If a controller does not have the resources to execute the current request, it requests those resources.
The ContainerManager updates resources asynchronously.
The Controller will schedule the outstanding request as soon as it gets resources for it. It does not care
if those resources are  becoming free because another request finished or because it got a fresh container
from the ContainerManager. Requests will always be dispatched as soon as resources are free.
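
Purely as an illustration of that dispatch behavior (none of these names exist in the code base; thread-safety and error handling are left out, and in the real controller this would live inside an actor), a sketch in Scala could look like this:

  import scala.collection.mutable

  // Sketch: requests are buffered until *any* capacity is free, regardless of
  // whether that capacity comes from a finished request or a fresh container.
  class DispatchSketch[Request, Container](
      requestMoreCapacity: () => Unit,              // asks the ContainerManager, asynchronously
      execute: (Request, Container) => Unit) {      // sends the request to a container

    private val pending = mutable.Queue.empty[Request]
    private val free = mutable.Queue.empty[Container]

    // Called for every incoming activation request.
    def submit(request: Request): Unit = {
      pending.enqueue(request)
      dispatch()
      // Nothing free? Ask for more capacity; the request stays buffered and is
      // picked up as soon as resources arrive.
      if (pending.nonEmpty) requestMoreCapacity()
    }

    // Called when a container finished a request OR when a new container arrives.
    def resourceFreed(container: Container): Unit = {
      free.enqueue(container)
      dispatch()
    }

    // Dispatches as many pending requests as there are free containers.
    private def dispatch(): Unit =
      while (pending.nonEmpty && free.nonEmpty) execute(pending.dequeue(), free.dequeue())
  }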

>6. HA of ContainerManager.
>Since it is mandatory to deploy the system without any downtime to
>use it
>for production, we need to support HA of ContainerManager.
>It means the state of ContainerManager should be replicated among
>replicas.
>(No matter which method we use between master/slave or clustering.)
>
>If ContainerManager knows about the status of each container, it
>would not
>be easy to support HA with its eventual consistent nature.
>If it does only know which containers are assigned to which
>controllers, it
>cannot handle the edge case as I mentioned above.

I agree, HA is mandatory. Since the ContainerManager operates only on the container creation/deletion path,
we can probably afford to persist its state into something like Redis. If it crashes, the slave instance
can take over immediately without any eventual-consistency concerns or downtime.

Also note that a downtime in the ContainerManager will ONLY cause an impact on the ability to create containers.
Workloads that already have containers created will continue to work just fine.
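
To give an idea of how small that persisted state could be, here is a rough sketch using Jedis as an example Redis client. The key layout and all names are assumptions for illustration only:

  import redis.clients.jedis.Jedis
  import scala.collection.JavaConverters._

  // Sketch: persist which container was handed to which controller, per action,
  // so a standby ContainerManager can rebuild its view on failover.
  class ContainerManagerStateSketch(redis: Jedis) {

    private def key(action: String) = s"containerManager:assignments:$action"

    // Record that `containerId` was given to `controllerId` for `action`.
    def recordAssignment(action: String, containerId: String, controllerId: String): Unit =
      redis.hset(key(action), containerId, controllerId)

    // Forget a container once it has been deleted.
    def removeAssignment(action: String, containerId: String): Unit =
      redis.hdel(key(action), containerId)

    // Rebuild the container -> controller mapping after a failover.
    def restore(action: String): Map[String, String] =
      redis.hgetAll(key(action)).asScala.toMap
  }

Since this is only touched on the (comparatively rare) container creation/deletion path, the write rate stays far below what the data path would generate.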


Does that answer/mitigate your concerns?

Cheers,
Markus

   
>To: dev@openwhisk.apache.org
>From: Dominic Kim <st...@gmail.com>
>Date: 07/23/2018 12:48PM
>Subject: Re: Proposal on a future architecture of OpenWhisk
>
>Dear Markus.
>
>I may not correctly understand the direction of new architecture.
>So let me describe my concerns in more details.
>
>Since that is a future architecture of OpenWhisk and requires many
>breaking
>changes, I think it should at least address all known issues.
>So I focused on figuring out whether it handles all issues which are
>reported in my proposal.
>(
>https://cwiki.apache.org/confluence/display/OPENWHISK/Autonomous+Container+Scheduling
>)
>
>1. Docker daemon performance issue.
>
>The most critical issue is poor performance of docker daemon.
>Since it is not inherently designed for high throughput or concurrent
>processing, Docker daemon shows poor performance in comparison with
>OW.
>In OW(serverless) world, action execution can be finished within 5ms
>~
>10ms, but the Docker daemon shows 100 ~ 500ms latency.
>Still, we can take advantage of Prewarm and Warmed containers, but
>under
>the situation where container creation/deletion/pausing/resuming
>happen
>frequently and the situation lasted for long-term, the requests are
>delayed
>and even the Docker daemon crashed.
>So I think it is important to reduce the loads(requests) against the
>Docker
>daemon.
>
>That's the reason why I initially thought that a Warmed state would
>be kept
>for more than today's behavior.
>Today, containers would stay in the Warmed state only for 50ms, so it
>introduces PAUSE/RESUME in case action comes with the interval of
>more than
>50 ms such as 1 sec.
>This will lead to more loads on Docker daemon.
>
>And if the state of containers is changing like today, the state in
>ContainerManager would be frequently changing as well.
>This may induce a synchronization issue among controllers and, among
>ContainerManagers(in case there would be more than one
>ContainerManager).
>
>So I think containers should be running for more than today's
>pauseGrace
>time.
>With more than 1 concurrency limit per container, it would also be
>better
>to keep containers running(not paused) for more than 50ms.
>
>2. Proxy case.
>
>In the edge case where a container only exists in controller1, how
>can
>controller2 decide to proxy the request to controller1 rather than
>just
>creating its own container?
>If it asks to ContainerManager, ContainerManager should know the
>state of
>the container in controller1.
>If the container in controller1 is already busy, it would be better
>to
>create a new container in controller2 rather than proxying the
>requests to
>controller1.
>
>If it goes this way, ContainerManager should know all the status of
>containers in all controllers to make a right decision and it's not
>easy to
>synchronize all the status of containers in controllers.
>If it does not work like this, how can controller2 proxy requests to
>controller1 without any information about controller1's status?
>
>3. Intervention among multiple actions
>
>If the concurrency limit is 1, and the container lifecycle is managed
>like
>today, intervention among multiple actions can happen again.
>For example, the maximum number of containers which can be created by
>a
>user is 2, and ActionA and ActionB invocation requests come
>alternatively,
>controllers will try to remove and recreate containers again and
>again.
>I used an example with a small number of max container limit for
>simplicity, but it can happen with a higher limit as well.
>
>And though concurrency limit is more than 1 such as 3, it also can
>happen
>if actions come more quickly than the execution time of actions.
>
>4. Is concurrency per container controlled by users in a per-action
>based
>way?
>Let me clarify my question about concurrency limit.
>
>If concurrency per container limit is more than 1, there could be
>multiple
>actions being invoked at some point.
>If the action requires high memory footprint such as 200MB or 150MB,
>it can
>crash if the sum of memory usage of concurrent actions exceeds the
>container memory.
>(In our case(here), some users are executing headless-chrome and
>puppeteer
>within actions, so it could happen under the similar situation.)
>
>So I initially thought concurrency per container is controlled by
>users in
>a per-action based way.
>If concurrency per container is only configured by OW operators
>statically,
>some users may not be able to invoke their actions correctly in the
>worst
>case though operators increased the memory of the biggest container
>type.
>
>And not only for this case, there could be some more reasons that
>some
>users just want to invoke their actions without per-container
>concurrency
>but the others want it for better throughput.
>
>So we may need some logic for users to take care of per-container
>concurrency for each actions.
>
>5. Better to wait for the completion rather than creating a new
>container.
>According to the workload, it would be better to wait for the
>previous
>execution rather than creating a new container because it takes upto
>500ms
>~ 1s.
>Even though the concurrency limit is more than 1, it still can happen
>if
>there is no logic to cumulate invocations and decide whether to
>create a
>new container or waiting for the existing container.
>
>
>6. HA of ContainerManager.
>Since it is mandatory to deploy the system without any downtime to
>use it
>for production, we need to support HA of ContainerManager.
>It means the state of ContainerManager should be replicated among
>replicas.
>(No matter which method we use between master/slave or clustering.)
>
>If ContainerManager knows about the status of each container, it
>would not
>be easy to support HA with its eventual consistent nature.
>If it does only know which containers are assigned to which
>controllers, it
>cannot handle the edge case as I mentioned above.
>
>
>
>Since many parts of the architecture are not addressed yet, I think
>it
>would be better to separate each parts and discuss further deeply.
>But in the big picture, I think we need to figure out whether it can
>handle
>or at least alleviate all known issues or not first.
>
>
>Best regards,
>Dominic
>
>
>2018-07-21 1:36 GMT+09:00 David P Grove <gr...@us.ibm.com>:
>
>>
>>
>> Tyson Norris <tn...@adobe.com.INVALID> wrote on 07/20/2018
>12:24:07 PM:
>> >
>> > On Logging, I think if you are considering enabling concurrent
>> > activation processing, you will encounter that the only approach
>to
>> > parsing logs to be associated with a specific activationId, is to
>> > force the log output to be structured, and always include the
>> > activationId with every log message. This requires a change at
>the
>> > action container layer, but the simpler thing to do is to
>encourage
>> > action containers to provide a structured logging context that
>> > action developers can (and must) use to generate logs.
>>
>> Good point.  I agree that if there is concurrent activation
>processing in
>> the container, structured logging is the only sensible thing to do.
>>
>>
>> --dave
>>
>


Re: Proposal on a future architecture of OpenWhisk

Posted by Dominic Kim <st...@gmail.com>.
Dear Markus.

I may not correctly understand the direction of new architecture.
So let me describe my concerns in more detail.

Since that is a future architecture of OpenWhisk and requires many breaking
changes, I think it should at least address all known issues.
So I focused on figuring out whether it handles all issues which are
reported in my proposal.
(
https://cwiki.apache.org/confluence/display/OPENWHISK/Autonomous+Container+Scheduling
)

1. Docker daemon performance issue.

The most critical issue is poor performance of docker daemon.
Since it is not inherently designed for high throughput or concurrent
processing, Docker daemon shows poor performance in comparison with OW.
In OW(serverless) world, action execution can be finished within 5ms ~
10ms, but the Docker daemon shows 100 ~ 500ms latency.
Still, we can take advantage of Prewarm and Warmed containers, but in
situations where container creation/deletion/pausing/resuming happens
frequently and lasts for a long time, requests get delayed
and the Docker daemon can even crash.
So I think it is important to reduce the loads(requests) against the Docker
daemon.

That's the reason why I initially thought that a Warmed state would be kept
for more than today's behavior.
Today, containers would stay in the Warmed state only for 50ms, so it
introduces PAUSE/RESUME in case action comes with the interval of more than
50 ms such as 1 sec.
This will lead to more loads on Docker daemon.

And if the state of containers is changing like today, the state in
ContainerManager would be frequently changing as well.
This may induce a synchronization issue among controllers and, among
ContainerManagers(in case there would be more than one ContainerManager).

So I think containers should be running for more than today's pauseGrace
time.
With more than 1 concurrency limit per container, it would also be better
to keep containers running(not paused) for more than 50ms.

2. Proxy case.

In the edge case where a container only exists in controller1, how can
controller2 decide to proxy the request to controller1 rather than just
creating its own container?
If it asks the ContainerManager, the ContainerManager should know the state of
the container in controller1.
If the container in controller1 is already busy, it would be better to
create a new container in controller2 rather than proxying the requests to
controller1.

If it goes this way, ContainerManager should know all the status of
containers in all controllers to make a right decision and it's not easy to
synchronize all the status of containers in controllers.
If it does not work like this, how can controller2 proxy requests to
controller1 without any information about controller1's status?

3. Intervention among multiple actions

If the concurrency limit is 1, and the container lifecycle is managed like
today, intervention among multiple actions can happen again.
For example, if the maximum number of containers which can be created by a
user is 2 and ActionA and ActionB invocation requests come in alternately,
controllers will try to remove and recreate containers again and again.
I used an example with a small number of max container limit for
simplicity, but it can happen with a higher limit as well.

And even if the concurrency limit is more than 1, such as 3, it can still happen
if requests arrive faster than the actions can execute.

4. Is concurrency per container controlled by users in a per-action based
way?
Let me clarify my question about concurrency limit.

If concurrency per container limit is more than 1, there could be multiple
actions being invoked at some point.
If the action requires high memory footprint such as 200MB or 150MB, it can
crash if the sum of memory usage of concurrent actions exceeds the
container memory.
(In our case(here), some users are executing headless-chrome and puppeteer
within actions, so it could happen under the similar situation.)

So I initially thought concurrency per container is controlled by users in
a per-action based way.
If concurrency per container is only configured by OW operators statically,
some users may not be able to invoke their actions correctly in the worst
case, even though operators increased the memory of the biggest container type.

And not only for this case, there could be some more reasons that some
users just want to invoke their actions without per-container concurrency
but the others want it for better throughput.

So we may need some logic for users to take care of per-container
concurrency for each actions.

5. Better to wait for the completion rather than creating a new container.
Depending on the workload, it would be better to wait for the previous
execution rather than creating a new container, because creation takes up to 500ms
~ 1s.
Even though the concurrency limit is more than 1, it can still happen if
there is no logic to accumulate invocations and decide whether to create a
new container or wait for the existing container.


6. HA of ContainerManager.
Since it is mandatory to deploy the system without any downtime to use it
for production, we need to support HA of ContainerManager.
It means the state of ContainerManager should be replicated among replicas.
(No matter which method we use between master/slave or clustering.)

If ContainerManager knows about the status of each container, it would not
be easy to support HA given its eventually consistent nature.
If it does only know which containers are assigned to which controllers, it
cannot handle the edge case as I mentioned above.



Since many parts of the architecture are not addressed yet, I think it
would be better to separate the parts and discuss each of them more deeply.
But in the big picture, I think we first need to figure out whether it can handle,
or at least alleviate, all known issues.


Best regards,
Dominic


2018-07-21 1:36 GMT+09:00 David P Grove <gr...@us.ibm.com>:

>
>
> Tyson Norris <tn...@adobe.com.INVALID> wrote on 07/20/2018 12:24:07 PM:
> >
> > On Logging, I think if you are considering enabling concurrent
> > activation processing, you will encounter that the only approach to
> > parsing logs to be associated with a specific activationId, is to
> > force the log output to be structured, and always include the
> > activationId with every log message. This requires a change at the
> > action container layer, but the simpler thing to do is to encourage
> > action containers to provide a structured logging context that
> > action developers can (and must) use to generate logs.
>
> Good point.  I agree that if there is concurrent activation processing in
> the container, structured logging is the only sensible thing to do.
>
>
> --dave
>

Re: Proposal on a future architecture of OpenWhisk

Posted by David P Grove <gr...@us.ibm.com>.

Tyson Norris <tn...@adobe.com.INVALID> wrote on 07/20/2018 12:24:07 PM:
>
> On Logging, I think if you are considering enabling concurrent
> activation processing, you will encounter that the only approach to
> parsing logs to be associated with a specific activationId, is to
> force the log output to be structured, and always include the
> activationId with every log message. This requires a change at the
> action container layer, but the simpler thing to do is to encourage
> action containers to provide a structured logging context that
> action developers can (and must) use to generate logs.

Good point.  I agree that if there is concurrent activation processing in
the container, structured logging is the only sensible thing to do.


--dave

Re: Proposal on a future architecture of OpenWhisk

Posted by Tyson Norris <tn...@adobe.com.INVALID>.
On Logging, I think if you are considering enabling concurrent activation processing, you will find that the only approach to parsing logs so they can be associated with a specific activationId is to force the log output to be structured, and to always include the activationId with every log message. This requires a change at the action container layer, but the simpler thing to do is to encourage action containers to provide a structured logging context that action developers can (and must) use to generate logs. 

An example is nodejs container - for the time being, we are hijacking the stdout/stderr and injecting the activationId when any developer code writes to stdout/stderr (as console.log/console.error). This may not work as simply in all action containers, and isn’t great even in nodejs. 

I would rather encourage action containers to provide a logging context, where action devs use log.info, log.debug, etc., and this logging context takes care of imposing some structure on the log format. In general, many (most?) languages have conventions (slf4xyz, et al) for this already, and while you lose “random writes to stdout”, I haven’t seen this be an actual problem. 
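
To make "structured logging context" concrete, here is a minimal sketch (in Scala, purely for illustration; the names and the exact JSON shape are assumptions) of what such a context could emit, so every line already carries the activationId when it is written:

  import java.time.Instant

  // Sketch of a logging context an action runtime could hand to user code.
  class ActivationLoggerSketch(activationId: String) {
    private def emit(level: String, message: String): Unit = {
      // Escape backslashes and quotes so each line stays one valid JSON object.
      val escaped = message.replace("\\", "\\\\").replace("\"", "\\\"")
      println(
        s"""{"time":"${Instant.now()}","activationId":"$activationId","level":"$level","message":"$escaped"}"""
      )
    }

    def info(message: String): Unit  = emit("info", message)
    def debug(message: String): Unit = emit("debug", message)
    def error(message: String): Unit = emit("error", message)
  }

A log shipper (logstash, fluentd, splunk, ...) can then route such lines by activationId without any post-processing in the invoker or controller.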

If you don’t deal with interleaved logs (typically because activations don’t run concurrently), this is less of an issue, but regardless, writing log parsers is a solved problem that would still be good to offload to external (not in OW controller/invoker) systems (logstash, fluentd, splunk, etc). This obviously comes with a caveat that log parsing will be delayed, but that is OK from my point of view, partly because most logs will never be viewed, and partly because the log ingest systems are mostly fast enough already to limit this delay to seconds or milliseconds.  

Thanks
Tyson
> On Jul 20, 2018, at 8:46 AM, David P Grove <gr...@us.ibm.com> wrote:
> 
> 
> Rethinking the architecture to more fully exploit the capabilities of the
> underlying container orchestration platforms is pretty exciting.  I think
> there are lots of interesting ideas to explore about how best to schedule
> the workload.
> 
> As brought out in the architecture proposal [1], although it is logically
> an orthogonal issue, improving the log processing for user containers is a
> key piece of this roadmap.  The initial experiences with the
> KubernetesContainerFactory indicate that post-facto log enrichment to add
> the activation id to each log line is a serious bottleneck.  It adds
> complexity to the system and measurably reduces system performance by
> delaying the re-use of action containers until the logs can be extracted
> and processed.
> 
> I believe what we really want is to be using an openwhisk-aware log driver
> that will dynamically inject the current activation id into every log line
> as soon as it is written.  Then the user container logs, already properly
> enriched when they are generated, can be fed directly into the platform
> logging system with no post-processing needed.
> 
> If the low-level container runtime is docker 17.09 or better, I think we
> could probably achieve this by writing a logging driver plugin [2] that
> extends docker's json logging driver.  For non-blackbox containers, I think
> we "just" need the /run method to update a shared location that is
> accessible to the logging driver plugin with the current activation id
> before it invokes the user code.  As log lines are produced, that location
> is read and the string with the activation id gets injected into the json
> formatted log line as it is produced.   For blackbox containers, we could
> have our dockerskeleton do the same thing, but the user would have to opt
> in somehow to the protocol if they were using their own action runner.
> Warning:  I haven't looked into how flushing works with these drivers, so
> I'm not sure that this really works....we need to make sure we don't enrich
> a log line with the wrong activation id because of delayed flushing.
> 
> If we're running on Kubernetes, we might decide that instead of using a
> logging driver plugin, to use a streaming sidecar container as shown in [3]
> and have the controller interact with the sidecar to update the current
> activation id (or have the sidecar read it from a shared memory location
> that is updated by /run to minimize the differences between deployment
> platforms).  I'm not sure this really works as well, since the sidecar
> might fall behind in processing the logs, so we might still need a
> handshake somewhere.
> 
> A third option would be to extend our current sentineled log design by also
> writing a "START_WHISK_ACTIVATION_LOG <ACTIVATION_ID>" line in the /run
> method before invoking the user code.  We'd still have to post-process the
> log files, but it could be decoupled from the critical path since the
> post-processor would have the activation id available to it in the log
> files (and thus would not need to handshake with the controller at all,
> thus we could offload all logging to a node-level log processing/forwarding
> agent).
> 
> Option 3 would be really easy to implement and is independent of the
> details of the low-level log driver, but doesn't eliminate the need to
> post-process the logs. It just makes it easier to move that processing off
> any critical path.
> 
> Thoughts?
> 
> --dave
> 
> [1] https://cwiki.apache.org/confluence/display/OPENWHISK/OpenWhisk+future
> +architecture
> [2] https://docs.docker.com/v17.09/engine/admin/logging/plugins/
> [3] https://kubernetes.io/docs/concepts/cluster-administration/logging/


Re: Proposal on a future architecture of OpenWhisk

Posted by David P Grove <gr...@us.ibm.com>.
Rethinking the architecture to more fully exploit the capabilities of the
underlying container orchestration platforms is pretty exciting.  I think
there are lots of interesting ideas to explore about how best to schedule
the workload.

As brought out in the architecture proposal [1], although it is logically
an orthogonal issue, improving the log processing for user containers is a
key piece of this roadmap.  The initial experiences with the
KubernetesContainerFactory indicate that post-facto log enrichment to add
the activation id to each log line is a serious bottleneck.  It adds
complexity to the system and measurably reduces system performance by
delaying the re-use of action containers until the logs can be extracted
and processed.

I believe what we really want is to be using an openwhisk-aware log driver
that will dynamically inject the current activation id into every log line
as soon as it is written.  Then the user container logs, already properly
enriched when they are generated, can be fed directly into the platform
logging system with no post-processing needed.

If the low-level container runtime is docker 17.09 or better, I think we
could probably achieve this by writing a logging driver plugin [2] that
extends docker's json logging driver.  For non-blackbox containers, I think
we "just" need the /run method to update a shared location that is
accessible to the logging driver plugin with the current activation id
before it invokes the user code.  As log lines are produced, that location
is read and the string with the activation id gets injected into the json
formatted log line as it is produced.   For blackbox containers, we could
have our dockerskeleton do the same thing, but the user would have to opt
in somehow to the protocol if they were using their own action runner.
Warning:  I haven't looked into how flushing works with these drivers, so
I'm not sure that this really works....we need to make sure we don't enrich
a log line with the wrong activation id because of delayed flushing.

If we're running on Kubernetes, we might decide that instead of using a
logging driver plugin, to use a streaming sidecar container as shown in [3]
and have the controller interact with the sidecar to update the current
activation id (or have the sidecar read it from a shared memory location
that is updated by /run to minimize the differences between deployment
platforms).  I'm not sure this really works as well, since the sidecar
might fall behind in processing the logs, so we might still need a
handshake somewhere.

A third option would be to extend our current sentineled log design by also
writing a "START_WHISK_ACTIVATION_LOG <ACTIVATION_ID>" line in the /run
method before invoking the user code.  We'd still have to post-process the
log files, but it could be decoupled from the critical path since the
post-processor would have the activation id available to it in the log
files (and thus would not need to handshake with the controller at all,
so we could offload all logging to a node-level log processing/forwarding
agent).

Option 3 would be really easy to implement and is independent of the
details of the low-level log driver, but doesn't eliminate the need to
post-process the logs. It just makes it easier to move that processing off
any critical path.
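
For illustration, the decoupled post-processor for option 3 could be as
small as this sketch (file handling, error handling and log forwarding
omitted):

import scala.io.Source

object SentinelLogTagger {
  val Sentinel = "START_WHISK_ACTIVATION_LOG"

  // Pair every log line with the activation id announced by the most
  // recent sentinel line, so tagging can happen off the critical path.
  def tag(path: String): Iterator[(String, String)] = {
    var currentId = "unknown"
    Source.fromFile(path).getLines().flatMap { line =>
      if (line.startsWith(Sentinel)) {
        currentId = line.stripPrefix(Sentinel).trim
        None
      } else Some(currentId -> line)
    }
  }
}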

Thoughts?

--dave

[1] https://cwiki.apache.org/confluence/display/OPENWHISK/OpenWhisk+future
+architecture
[2] https://docs.docker.com/v17.09/engine/admin/logging/plugins/
[3] https://kubernetes.io/docs/concepts/cluster-administration/logging/

Re: Proposal on a future architecture of OpenWhisk

Posted by Rodric Rabbah <ro...@gmail.com>.
Hi Mark

This is precisely captured by the serverless contract article I published recently:

https://medium.com/openwhisk/the-serverless-contract-44329fab10fb

Queue, reject, or add capacity as three potential resolutions under load. 

-r

> On Jul 18, 2018, at 8:16 AM, Martin Gencur <mg...@redhat.com> wrote:
> 
> Hi Markus,
> thinking about scalability and the edge case. When there are not enough containers and new controllers are being created, and all of them redirect traffic to the controllers with containers, doesn't it mean overloading the available containers a lot? I'm curious how we throttle the traffic in this case.
> 
> I guess the other approach would be to block creating new controllers when there are no containers available as long as we don't want to overload the existing containers. And keep the overflowing workload in Kafka as well.
> 
> Thanks,
> Martin Gencur
> QE, Red Hat

Re: Proposal on a future architecture of OpenWhisk

Posted by Markus Thoemmes <ma...@de.ibm.com>.
Hi Chetan,

>As activations may be bulky (1 MB max) it may not be possible to keep
>them in memory even if there is small incoming rate depending on how
>fast they get consumed. Currently usage of Kafka takes the pressure
>of
>Controller and helps in keeping them stable. So I suspect we may need
>to make use of buffer more often to keep pressure off Controllers,
>specially for heterogeneous loads and when system is making full use
>of cluster capacity.

Note that we can also exert client-side buffering in this case. If we enable end-to-end streaming of parameters/results (which is very much facilitated by my proposal, though not impossible even in the current architecture), you can refuse to consume the HTTP request's entity until you are able to pass it downstream to a container. That way, nothing (or close to nothing) is buffered in the controller's memory. In an overload scenario, where the wait time for resources is expected to be high, this can be altered to buffer into something like an overflow queue. In steady state, the controller should not buffer more than necessary.
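
To sketch that (purely illustrative, on top of akka-http; the container address is made up and all error handling is left out), the controller hands the request entity through as a stream, so its bytes are only pulled from the client once the downstream request to the container runs:

import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model._
import akka.http.scaladsl.server.Directives._

object StreamingPassThrough {
  implicit val system: ActorSystem = ActorSystem("controller")

  // Hypothetical address of a warm container owned by this controller.
  val containerUri = Uri("http://10.0.0.42:8080/run")

  // The entity is handed over as a stream: nothing is accumulated in the
  // controller, the container pulls the bytes directly.
  val route = post {
    extractRequestEntity { entity =>
      complete(Http().singleRequest(HttpRequest(HttpMethods.POST, containerUri, entity = entity)))
    }
  }

  def main(args: Array[String]): Unit = {
    Http().newServerAt("0.0.0.0", 9000).bind(route)
  }
}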

To your proposal: If I understand correctly, what you are laying out is heavily reliant on detaching the Invoker from the hosts where containers run (since you assume a ContainerPool handles a lot more containers than today). This assumption moves the general architecture quite close to what I am proposing. Essentially, you propose a load-balancing algorithm in front of the controllers (in my picture), since these own their respective ContainerPool (so to say).

In essence, this would then be a solution for the concerns Dominic mentioned regarding load imbalance between controllers. This is exactly what we've been discussing earlier (where I proposed pub/sub vs. Kafka), and I believe both solutions go in a similar direction.

Sorry if I simplified this too much and let me know if I'm overlooking an aspect here.

Cheers,
Markus

-----Chetan Mehrotra <ch...@gmail.com> wrote: -----

>To: dev@openwhisk.apache.org
>From: Chetan Mehrotra <ch...@gmail.com>
>Date: 07/20/2018 02:13PM
>Subject: Re: Proposal on a future architecture of OpenWhisk
>
>Hi Markus,
>
>Some of the points below are more of hunch and thinking out loud and
>hence may not be fully objective or valid !.
>
>> The buffer I described above is a per action "invoke me once
>resources are available" buffer
>
>> 1. Do we need to persist an in-memory queue that waits for
>resources to be created by the ContainerManager?
>> 2. Do we need a shared queue between the Controllers to enable
>work-stealing in cases where multiple Controllers wait for resources?
>
>As activations may be bulky (1 MB max) it may not be possible to keep
>them in memory even if there is small incoming rate depending on how
>fast they get consumed. Currently usage of Kafka takes the pressure
>of
>Controller and helps in keeping them stable. So I suspect we may need
>to make use of buffer more often to keep pressure off Controllers,
>specially for heterogeneous loads and when system is making full use
>of cluster capacity.
>
>> That could potentially open up the possibility to use a technology
>more geared towards Pub/Sub, where subscribers per action are more
>cheap to implement than on Kafka?
>
>That would certainly be one option to consider. System like RabbitMQ
>can support lots of queues. They may pose a problem on how to
>efficiently consumer resources from all such queues.
>
>Another option I was thinking was to have a design similar to current
>involving controller and invoker but having a single queue shared
>between controller and invoker and using action as the partition key
>and multiple invokers forming a consumer group. This enables using
>Kafka inbuilt support for horizontal scaling by adding new invoker
>and
>having Kafka assign partitions to it.
>
>Such a design would have following key aspects
>
>A - Supporting Queue Per Action (sort of!)
>--------------------------------------------
>One way to do that is have topic per action which does not work well
>with Kafka. Instead of that we can have each Invoker maintain a
>dedicated ContainerPool per action which independently reads message
>from the partition hosting its action. While doing that it would
>filter out activations for other actions which happen to land on same
>partition. An Invoker would spin of this independent container pool
>after observing the rate of activation per action crosses certain
>threshold. We can leverage this per action pool for better
>autoscaling
>where it request more containers from the container orchestrator
>(k8s,
>mesos) based on rate of activations and lag. Here system can learn
>itself from usage pattern on how to allocate resources.
>
>This may lead to multiple streaming/replay of same message from Kafka
>but given the high rate with which messages can be fetched from
>single
>partition it may work out fine. This would also require some custom
>offset management where primary offset belongs to consumer bound to
>Invoker but also records the offset per action
>
>B - Hot Partition Handling
>----------------------------------
>
>A known drawback of using non random partitioning is hot partitions
>whereby some Invoker may not get much work but some invoker may get
>lots of work. This would managed by 2 things
>
>1. High Container Density per Invoker - If we can reduce the Invoker
>state it should be possible to have a single Invoker handle lot more
>container concurrently. So even a hot Invoker would still be able to
>make use of container distributed across multiple hosts and thus make
>optimal use of cluster resources
>
>2. Reassign Activations - If a action specific container pool sees
>that its not able to keep up with rate of incoming activations of the
>action it can then pull it out and add it back to same queue but then
>assign to a less loaded invoker's partition directly.
>
>Thoughts above digress from current proposal being discussed and may
>not be presented in a very coherent way. However want to dump them
>down to see if they make any sense or not. If there is some potential
>then I can try to draft a more detailed proposal on wiki. Key motive
>here is to get a design where we have independent container pool
>pulling activations independently and auto scaling themselves based
>on
>allowed limits and get us closer to a consumer per action topic kind
>of design in a dynamic way!
>
>Chetan Mehrotra
>
>On Thu, Jul 19, 2018 at 6:06 PM Markus Thoemmes
><ma...@de.ibm.com> wrote:
>>
>> Hi Chetan,
>>
>> >Currently one aspect which is not clear is does Controller has
>access
>> >to
>> >
>> >1. Pool of prewarm containers - Container of base image where
>/init
>> >is
>> >yet not done. So these containers can then be initialized within
>> >Controller
>> >2. OR Pool of warm container bound to specific user+action. These
>> >containers would possibly have been initialized by
>ContainerManager
>> >and then it allocates them to controller.
>>
>> The latter case is what I had in mind. The controller only knows
>containers that are already ready to call /run on.
>>
>> Pre-Warm containers are an implementation detail to the Controller.
>The ContainerManager can keep them around to be able to answer demand
>for specific resources more quickly, but the Controller doesn't care.
>It only knows warm containers.
>>
>> >Can you elaborate this bit more i.e. how scale up logic would work
>> >and
>> >is asynchronous?
>> >
>> >I think above aspect (type of pool) would have bearing on scale up
>> >logic. If an action was not in use so far then when first request
>> >comes (i.e. 0-1 scale up case) would Controller ask
>ContainerManager
>> >for specific action container and then wait for its setup and then
>> >execute it. OR if it has a generic pool then it takes one and
>> >initializes it and use it. And if its not done synchronously then
>> >would such an action be put to overflow queue.
>>
>> In this specific example, the Controller will request a container
>from the ContainerManager and buffer the request until it finally has
>capacity to execute it. All subsequent requests will be put on the
>same buffer and a Container will be requested for each of them.
>>
>> Whether we put this buffer in an overflow queue (aka persist it)
>remains to be decided. If we keep it in memory, we have roughly the
>same guarantees as today. As Rodric mentioned though, we can improve
>certain failure scenarios (like waiting for a container in this case)
>by making this buffer more persistent. I'm not mentioning Kafka here
>for a reason, because in this case any persistent buffer is just
>fine.
>>
>> Also note that this is not necessarily the case of the overflow
>queue. The overflow queue is used for arbitrary requests once the
>ContainerManager cannot create more resources and thus requests need
>to wait.
>>
>> The buffer I described above is a per action "invoke me once
>resources are available" buffer, that could potentially be designed
>to be per Controller to not have the challenge of scaling it out.
>That of course has its downsides in itself, for instance: A buffer
>that spans all controllers would enable work-stealing between
>controllers with missing capacity and could mitigate some of
>load-imbalances that Dominic mentioned. We are entering then the same
>area that his proposal enters: The need of a queue per action.
>>
>> Conclusion is, we have 2 perspectives to look at this:
>>
>> 1. Do we need to persist an in-memory queue that waits for
>resources to be created by the ContainerManager?
>> 2. Do we need a shared queue between the Controllers to enable
>work-stealing in cases where multiple Controllers wait for resources?
>>
>> An important thing to note here: Since all of this is no longer
>happening on the critical path (stuff gets put on the queue only if
>it needs to wait for resources anyway), we can afford a solution that
>isn't as perfomant as Kafka might be. That could potentially open up
>the possibility to use a technology more geared towards Pub/Sub,
>where subscribers per action are more cheap to implement than on
>Kafka?
>>
>> Does that make sense? Hope that helps :). Thanks for the questions!
>>
>> Cheers,
>> Markus
>>
>
>


Re: Proposal on a future architecture of OpenWhisk

Posted by Chetan Mehrotra <ch...@gmail.com>.
Hi Markus,

Some of the points below are more of a hunch and thinking out loud, and
hence may not be fully objective or valid!

> The buffer I described above is a per action "invoke me once resources are available" buffer

> 1. Do we need to persist an in-memory queue that waits for resources to be created by the ContainerManager?
> 2. Do we need a shared queue between the Controllers to enable work-stealing in cases where multiple Controllers wait for resources?

As activations may be bulky (1 MB max), it may not be possible to keep
them in memory even with a small incoming rate, depending on how fast
they get consumed. Currently the usage of Kafka takes the pressure off
the Controllers and helps keep them stable. So I suspect we may need to
make use of the buffer more often to keep pressure off Controllers,
especially for heterogeneous loads and when the system is making full
use of cluster capacity.

> That could potentially open up the possibility to use a technology more geared towards Pub/Sub, where subscribers per action are more cheap to implement than on Kafka?

That would certainly be one option to consider. Systems like RabbitMQ
can support lots of queues, but they may pose a problem of how to
efficiently consume from all of those queues.

Another option I was thinking of is a design similar to the current one
involving controller and invoker, but with a single queue shared between
controller and invoker, using the action as the partition key and
multiple invokers forming a consumer group. This leverages Kafka's
built-in support for horizontal scaling: adding a new invoker makes
Kafka assign partitions to it.
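
A tiny sketch of the producer side with the plain Kafka client (the
topic name "activations" and the serializers are just placeholders):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object ActivationProducer {
  private val props = new Properties()
  props.put("bootstrap.servers", "kafka:9092")
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

  private val producer = new KafkaProducer[String, String](props)

  // Keying by the fully qualified action name makes all activations of an
  // action land on the same partition, owned by one invoker of the group.
  def publish(fullyQualifiedActionName: String, activationJson: String): Unit =
    producer.send(new ProducerRecord("activations", fullyQualifiedActionName, activationJson))
}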

Such a design would have the following key aspects:

A - Supporting Queue Per Action (sort of!)
--------------------------------------------
One way to do that is to have a topic per action, which does not work
well with Kafka. Instead, we can have each Invoker maintain a dedicated
ContainerPool per action which independently reads messages from the
partition hosting its action, filtering out activations for other
actions which happen to land on the same partition. An Invoker would
spin off such an independent container pool after observing that the
rate of activations per action crosses a certain threshold. We can
leverage this per-action pool for better autoscaling, where it requests
more containers from the container orchestrator (k8s, Mesos) based on
the rate of activations and the lag. Here the system can learn from
usage patterns how to allocate resources.

This may lead to multiple streamings/replays of the same message from
Kafka, but given the high rate with which messages can be fetched from a
single partition it may work out fine. This would also require some
custom offset management, where the primary offset belongs to the
consumer bound to the Invoker, but the offset per action is recorded as
well.

B - Hot Partition Handling
----------------------------------

A known drawback of using non-random partitioning is hot partitions,
whereby some Invokers may not get much work while others get lots of it.
This would be managed by two things:

1. High Container Density per Invoker - If we can reduce the Invoker
state, it should be possible to have a single Invoker handle a lot more
containers concurrently. So even a hot Invoker would still be able to
make use of containers distributed across multiple hosts and thus make
optimal use of cluster resources.

2. Reassign Activations - If an action-specific container pool sees
that it's not able to keep up with the rate of incoming activations for
the action, it can pull the activation out and add it back to the same
queue, but assigned directly to a less loaded invoker's partition.

The thoughts above digress from the current proposal being discussed
and may not be presented in a very coherent way. However, I wanted to
dump them down to see if they make any sense or not. If there is some
potential, then I can try to draft a more detailed proposal on the wiki.
The key motive here is to get a design where we have independent
container pools pulling activations independently and autoscaling
themselves based on the allowed limits, getting us closer to a
consumer-per-action-topic kind of design in a dynamic way!

Chetan Mehrotra

On Thu, Jul 19, 2018 at 6:06 PM Markus Thoemmes
<ma...@de.ibm.com> wrote:
>
> Hi Chetan,
>
> >Currently one aspect which is not clear is does Controller has access
> >to
> >
> >1. Pool of prewarm containers - Container of base image where /init
> >is
> >yet not done. So these containers can then be initialized within
> >Controller
> >2. OR Pool of warm container bound to specific user+action. These
> >containers would possibly have been initialized by ContainerManager
> >and then it allocates them to controller.
>
> The latter case is what I had in mind. The controller only knows containers that are already ready to call /run on.
>
> Pre-Warm containers are an implementation detail to the Controller. The ContainerManager can keep them around to be able to answer demand for specific resources more quickly, but the Controller doesn't care. It only knows warm containers.
>
> >Can you elaborate this bit more i.e. how scale up logic would work
> >and
> >is asynchronous?
> >
> >I think above aspect (type of pool) would have bearing on scale up
> >logic. If an action was not in use so far then when first request
> >comes (i.e. 0-1 scale up case) would Controller ask ContainerManager
> >for specific action container and then wait for its setup and then
> >execute it. OR if it has a generic pool then it takes one and
> >initializes it and use it. And if its not done synchronously then
> >would such an action be put to overflow queue.
>
> In this specific example, the Controller will request a container from the ContainerManager and buffer the request until it finally has capacity to execute it. All subsequent requests will be put on the same buffer and a Container will be requested for each of them.
>
> Whether we put this buffer in an overflow queue (aka persist it) remains to be decided. If we keep it in memory, we have roughly the same guarantees as today. As Rodric mentioned though, we can improve certain failure scenarios (like waiting for a container in this case) by making this buffer more persistent. I'm not mentioning Kafka here for a reason, because in this case any persistent buffer is just fine.
>
> Also note that this is not necessarily the case of the overflow queue. The overflow queue is used for arbitrary requests once the ContainerManager cannot create more resources and thus requests need to wait.
>
> The buffer I described above is a per action "invoke me once resources are available" buffer, that could potentially be designed to be per Controller to not have the challenge of scaling it out. That of course has its downsides in itself, for instance: A buffer that spans all controllers would enable work-stealing between controllers with missing capacity and could mitigate some of load-imbalances that Dominic mentioned. We are entering then the same area that his proposal enters: The need of a queue per action.
>
> Conclusion is, we have 2 perspectives to look at this:
>
> 1. Do we need to persist an in-memory queue that waits for resources to be created by the ContainerManager?
> 2. Do we need a shared queue between the Controllers to enable work-stealing in cases where multiple Controllers wait for resources?
>
> An important thing to note here: Since all of this is no longer happening on the critical path (stuff gets put on the queue only if it needs to wait for resources anyway), we can afford a solution that isn't as perfomant as Kafka might be. That could potentially open up the possibility to use a technology more geared towards Pub/Sub, where subscribers per action are more cheap to implement than on Kafka?
>
> Does that make sense? Hope that helps :). Thanks for the questions!
>
> Cheers,
> Markus
>

Re: Proposal on a future architecture of OpenWhisk

Posted by Markus Thoemmes <ma...@de.ibm.com>.
Hi Chetan,

>Currently one aspect which is not clear is does Controller has access
>to
>
>1. Pool of prewarm containers - Container of base image where /init
>is
>yet not done. So these containers can then be initialized within
>Controller
>2. OR Pool of warm container bound to specific user+action. These
>containers would possibly have been initialized by ContainerManager
>and then it allocates them to controller.

The latter case is what I had in mind. The controller only knows containers that are already ready to call /run on.

Pre-Warm containers are an implementation detail to the Controller. The ContainerManager can keep them around to be able to answer demand for specific resources more quickly, but the Controller doesn't care. It only knows warm containers.

>Can you elaborate this bit more i.e. how scale up logic would work
>and
>is asynchronous?
>
>I think above aspect (type of pool) would have bearing on scale up
>logic. If an action was not in use so far then when first request
>comes (i.e. 0-1 scale up case) would Controller ask ContainerManager
>for specific action container and then wait for its setup and then
>execute it. OR if it has a generic pool then it takes one and
>initializes it and use it. And if its not done synchronously then
>would such an action be put to overflow queue.

In this specific example, the Controller will request a container from the ContainerManager and buffer the request until it finally has capacity to execute it. All subsequent requests will be put on the same buffer and a Container will be requested for each of them. 

Whether we put this buffer in an overflow queue (aka persist it) remains to be decided. If we keep it in memory, we have roughly the same guarantees as today. As Rodric mentioned though, we can improve certain failure scenarios (like waiting for a container in this case) by making this buffer more persistent. I'm not mentioning Kafka here for a reason, because in this case any persistent buffer is just fine.

Also note that this is not necessarily the case of the overflow queue. The overflow queue is used for arbitrary requests once the ContainerManager cannot create more resources and thus requests need to wait.

The buffer I described above is a per-action "invoke me once resources are available" buffer that could potentially be designed to be per Controller, so as not to have the challenge of scaling it out. That of course has its downsides, for instance: a buffer that spans all controllers would enable work-stealing between controllers with missing capacity and could mitigate some of the load imbalances that Dominic mentioned. We are then entering the same area that his proposal enters: the need for a queue per action.
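
In memory, such a buffer could be as small as the following sketch (names made up; no persistence, no work-stealing):

import scala.collection.mutable

// A per-controller, per-action buffer: requests wait here until the
// ContainerManager has provided capacity for their action.
class ActionBuffer[T] {
  private val buffers = mutable.Map.empty[String, mutable.Queue[T]]

  def enqueue(action: String, request: T): Unit = synchronized {
    buffers.getOrElseUpdate(action, mutable.Queue.empty[T]).enqueue(request)
  }

  // Called whenever a container for `action` becomes available.
  def next(action: String): Option[T] = synchronized {
    buffers.get(action).filter(_.nonEmpty).map(_.dequeue())
  }
}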

Conclusion is, we have 2 perspectives to look at this:

1. Do we need to persist an in-memory queue that waits for resources to be created by the ContainerManager?
2. Do we need a shared queue between the Controllers to enable work-stealing in cases where multiple Controllers wait for resources?
 
An important thing to note here: Since all of this is no longer happening on the critical path (stuff gets put on the queue only if it needs to wait for resources anyway), we can afford a solution that isn't as performant as Kafka might be. That could potentially open up the possibility to use a technology more geared towards Pub/Sub, where subscribers per action are cheaper to implement than on Kafka.

Does that make sense? Hope that helps :). Thanks for the questions!

Cheers,
Markus


Re: Proposal on a future architecture of OpenWhisk

Posted by Chetan Mehrotra <ch...@gmail.com>.
Hi Markus,

Currently one aspect which is not clear is whether the Controller has access to:

1. A pool of prewarm containers - containers of the base image where
/init has not yet been done, so these containers can then be initialized
within the Controller
2. OR a pool of warm containers bound to a specific user+action. These
containers would possibly have been initialized by the ContainerManager,
which then allocates them to the controller.

> The scaleup model stays exactly the same as today! If you have 200 simultaneous invocations (assuming a per-container concurrency limit of 1) we will create 200 containers to handle that load (given the requests are truly simultaneous --> arrive at the same time). Containers are NOT created in a synchronous way and there's no need to sequentialize their creation. Does something in the proposal hint to that? If so, we should fix that immediately.

Can you elaborate on this a bit more, i.e. how the scale-up logic would
work and whether it is asynchronous?

I think the above aspect (type of pool) would have a bearing on the
scale-up logic. If an action was not in use so far, then when the first
request comes (i.e. the 0-to-1 scale-up case), would the Controller ask
the ContainerManager for a container specific to that action, wait for
its setup and then execute it? OR, if it has a generic pool, does it
take one, initialize it and use it? And if this is not done
synchronously, would such an action be put into the overflow queue?

Chetan Mehrotra

On Thu, Jul 19, 2018 at 2:39 PM Markus Thoemmes
<ma...@de.ibm.com> wrote:
>
> Hi Dominic,
>
> >Ah yes. Now I remember I wondered why OS doesn't support
> >"at-least-once"
> >semantic.
> >This is the question apart from the new architecture, but is this
> >because
> >of the case that user can execute the non-idempotent action?
> >So though an invoker is failed, still action could be executed and it
> >could
> >cause some side effects such as repeating the action which requires
> >"at-most-once" semantic more than once?
>
> Exactly. Once we pass the HTTP request into the container, we cannot know whether the action has already caused a side-effect. At that point it's not safe to retry (hence /run doesn't allow for retries vs. /init does) and in doubt we need to abort.
> We could imagine the user to state idempotency of an action so it's safe for us to retry, but that's a different can of worms and imho unrelated to the architecture as you say.
>
> >BTW, how would long warmed containers be kept in the new
> >architecture? Is
> >it a 1 or 2 order of magnitude in seconds?
>
> I don't see a reason to change this behavior from what we have today. Could be configurable and potentially be hours. The only concerns are:
> - Scale-down of worker nodes is inhibited if we keep containers around a long time --> costs the vendor money
> - If the system is full with warm containers and we want to evict one to make space for a different container, removing and recreating a container is more expensive than just creating.
>
> >In the new architecture, concurrency limit is controlled by users in
> >a
> >per-action based way?
>
> That's not necessarily architecture related, but Tyson is implementing this, yes. Note that this is "concurrency per container" not "concurrency per action" (which could be a second knob to turn).
>
> In a nutshell:
> - concurrency per container: The amount of parallel HTTP requests allowed for a single container (this is what Tyson is implementing)
> - concurrency per action: You could potentially limit the maximum amount of concurrent invocations running for each action (which is distinct from the above, because this could mean to limit the amount of containers created vs. limiting the amount of parallel HTTP requests to a SINGLE container)
>
> >So in case a user wants to execute the long-running action, does he
> >configure the concurreny limit for the action?
>
> Long running isn't related to concurrency I think.
>
> >
> >And if concurrency limit is 1, in case action container is possessed,
> >wouldn't controllers request a container again and again?
> >And if it only allows container creation in a synchronous
> >way(creating one
> >by one), couldn't it be a burden in case a user wants a huge number
> >of(100~200) simultaneous invocations?
>
> The scaleup model stays exactly the same as today! If you have 200 simultaneous invocations (assuming a per-container concurrency limit of 1) we will create 200 containers to handle that load (given the requests are truly simultaneous --> arrive at the same time). Containers are NOT created in a synchronous way and there's no need to sequentialize their creation. Does something in the proposal hint to that? If so, we should fix that immediately.
>
> No need to apologize, this is great engagement, exactly what we need here. Keep it up!
>
> Cheers,
> Markus
>

Re: Proposal on a future architecture of OpenWhisk

Posted by Rodric Rabbah <ro...@gmail.com>.
Regarding at least or at most once: 

Functions should be stateless and the burden is on the action for external side effects anyway... so it's plausible with these in mind that we contemplate shifting modes (a la Lambda). There are cases, though, that are safer to retry: in-flight requests which are lost before they reach the container HTTP endpoint, and failures to assign a container.

-r

Re: Proposal on a future architecture of OpenWhisk

Posted by Markus Thoemmes <ma...@de.ibm.com>.
Hi Dominic,

>Ah yes. Now I remember I wondered why OS doesn't support
>"at-least-once"
>semantic.
>This is the question apart from the new architecture, but is this
>because
>of the case that user can execute the non-idempotent action?
>So though an invoker is failed, still action could be executed and it
>could
>cause some side effects such as repeating the action which requires
>"at-most-once" semantic more than once?

Exactly. Once we pass the HTTP request into the container, we cannot know whether the action has already caused a side-effect. At that point it's not safe to retry (hence /run doesn't allow for retries, whereas /init does) and when in doubt we need to abort.
We could imagine the user to state idempotency of an action so it's safe for us to retry, but that's a different can of worms and imho unrelated to the architecture as you say.

>BTW, how would long warmed containers be kept in the new
>architecture? Is
>it a 1 or 2 order of magnitude in seconds?

I don't see a reason to change this behavior from what we have today. Could be configurable and potentially be hours. The only concerns are: 
- Scale-down of worker nodes is inhibited if we keep containers around a long time --> costs the vendor money
- If the system is full with warm containers and we want to evict one to make space for a different container, removing and recreating a container is more expensive than just creating.

>In the new architecture, concurrency limit is controlled by users in
>a
>per-action based way?

That's not necessarily architecture related, but Tyson is implementing this, yes. Note that this is "concurrency per container" not "concurrency per action" (which could be a second knob to turn).

In a nutshell:
- concurrency per container: The amount of parallel HTTP requests allowed for a single container (this is what Tyson is implementing)
- concurrency per action: You could potentially limit the maximum amount of concurrent invocations running for each action (which is distinct from the above, because this could mean to limit the amount of containers created vs. limiting the amount of parallel HTTP requests to a SINGLE container)
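
A tiny sketch of the two knobs as data (illustrative names only, not actual OpenWhisk configuration keys):

object ConcurrencyKnobs {
  final case class ConcurrencyLimits(
    maxRequestsPerContainer: Int, // parallel HTTP requests into one container
    maxContainersPerAction: Int   // containers that may exist for one action
  )

  // A new container for an action may only be requested while the
  // per-action ceiling allows it.
  def mayCreateContainer(existingContainers: Int, limits: ConcurrencyLimits): Boolean =
    existingContainers < limits.maxContainersPerAction
}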

>So in case a user wants to execute the long-running action, does he
>configure the concurreny limit for the action?

Long running isn't related to concurrency I think.

>
>And if concurrency limit is 1, in case action container is possessed,
>wouldn't controllers request a container again and again?
>And if it only allows container creation in a synchronous
>way(creating one
>by one), couldn't it be a burden in case a user wants a huge number
>of(100~200) simultaneous invocations?

The scaleup model stays exactly the same as today! If you have 200 simultaneous invocations (assuming a per-container concurrency limit of 1) we will create 200 containers to handle that load (given the requests are truly simultaneous --> arrive at the same time). Containers are NOT created in a synchronous way and there's no need to sequentialize their creation. Does something in the proposal hint to that? If so, we should fix that immediately.

No need to apologize, this is great engagement, exactly what we need here. Keep it up!

Cheers,
Markus 


Re: Proposal on a future architecture of OpenWhisk

Posted by Dominic Kim <st...@gmail.com>.
Dear Markus.
Thank you for the quick response.

> In the proposal, the semantics of talking to the container do not change
from what we have today. If the http request fails for whatever reason
while "in-flight", processing cannot be completed, just like if an invoker
crashes. Note that we commit the message immediatly after reading it today,
which leads to "at-most-once" semantics. OpenWhisk does not support
"at-least-once" today. To do so, you'd retry from the outside.

Ah yes. Now I remember I wondered why OW doesn't support "at-least-once"
semantics.
This question is apart from the new architecture, but is this because a
user can execute a non-idempotent action?
So even though an invoker failed, the action could still have been
executed, and a retry could cause side effects such as running an action
which requires "at-most-once" semantics more than once?


> I don't believe the ContainerManager needs to do that much honestly. In
the Kubernetes case for instance it only asks the Kube API for pods and
then keeps a list of these pods per action. Further it divides this list
whenever a new container is added/removed. I think we can push this quite
far in a master/slave fashion as Brendan mentioned. This is guessing
though, it'll be crucial to measure the throughput that one instance
actually can provide and then decide on whether that's feasible or not.
>
> As its state isn't moving at a super fast pace, we can probably afford to
persist it into something like redis or etcd for the failover to take over
if one dies.
>
> Of course I'm very open for scaling it out horizontally if that's
achievable.

Yes, reducing the requests against the Docker daemon seems the right way
to go.
BTW, how long would warmed containers be kept in the new architecture?
Is it 1 or 2 orders of magnitude in seconds?


> Assuming general round-robin (or even random scheduling) in front of the
controllers would even out things to a certain extent would they?
>
> Another probably feasible solution is to implement session stickiness or
hashing as you mentioned in todays call. Another comment that was raised
during today's would come into play there as well: We could change the
container division algorithm to not divide evenly but to only give
containers to those controllers that requested them. In conjunction with
session stickiness, that could yield better load distribution results
(given the session stickiness is smart enough to divide appropriately.

Yes, that could be an option.
I'm concerned it might cause some load imbalance among controllers as
well.
And another question comes up: how can we keep session stickiness for
multiple controllers and for multiple actions respectively?



> Overload of a given container is determined by its concurrency limit.
Today that is "1". If 1 request is active in a container, and that is the
only container, we need more containers. As soon as all containers reached
their maximum concurrency, we need to scale up.

In the new architecture, is the concurrency limit controlled by users in
a per-action way?
So in case a user wants to execute a long-running action, does he
configure the concurrency limit for the action?

And if the concurrency limit is 1, in case the action container is
occupied, wouldn't controllers request a container again and again?
And if container creation is only allowed in a synchronous way (creating
one by one), couldn't it be a burden in case a user wants a huge number
of (100~200) simultaneous invocations?


Please bear with my many questions.
I am also one of the advocates who wants to improve and enhance OW in
such a way.
I hope my questions help to build a more refined architecture.

Thanks
Best regards
Dominic

2018-07-19 2:16 GMT+09:00 Markus Thoemmes <ma...@de.ibm.com>:

> Hi Dominic,
>
> thanks for your feedback, let's see...
>
> >1. Buffering of activation and failure handling.
> >
> >As of now, Kafka acts as a kind of buffer in case activation
> >processing is
> >a bit delayed due to some reasons such as invoker failure.
> >If Kafka is only used for the overflowing case, how can it guarantee
> >"at
> >least once" activation processing?
> >For example, if a controller receives the requests and it crashed
> >before
> >the activation is complete. How can other alive controllers or the
> >restored
> >controller handle it?
>
> In the proposal, the semantics of talking to the container do not change
> from what we have today. If the http request fails for whatever reason
> while "in-flight", processing cannot be completed, just like if an invoker
> crashes. Note that we commit the message immediatly after reading it today,
> which leads to "at-most-once" semantics. OpenWhisk does not support
> "at-least-once" today. To do so, you'd retry from the outside.
>
> >2. A bottleneck in ContainerManager.
> >
> >Now ContainerManage has many critical logics.
> >It takes care of the container lifecycle, logging, scheduling and so
> >on.
> >Also, it should be aware of whole container state as well as
> >controller
> >status and distribute containers among alive controllers.
> >It might need to do some health checking to/from all controllers and
> >containers(may be orchestrator such as k8s).
> >
> >I think in this case ContainerManager can be a bottleneck as the size
> >of
> >the cluster grows.
> >And if we add more nodes to scale out ContainerManager or to prepare
> >for
> >the SPOF, then all states of ContainerManager should be shared among
> >all
> >nodes.
> >If we take master/slave approach, the master would become a
> >bottleneck at
> >some point, and if we take a clustering approach, we need some
> >mechanism to
> >synchronize the cluster status among ContainerManagers.
> >And this procedure should be done in 1 or 2 order of magnitude in
> >milliseconds.
> >
> >Do you have anything in your mind to handle this?
>
> I don't believe the ContainerManager needs to do that much honestly. In
> the Kubernetes case for instance it only asks the Kube API for pods and
> then keeps a list of these pods per action. Further it divides this list
> whenever a new container is added/removed. I think we can push this quite
> far in a master/slave fashion as Brendan mentioned. This is guessing
> though, it'll be crucial to measure the throughput that one instance
> actually can provide and then decide on whether that's feasible or not.
>
> As its state isn't moving at a super fast pace, we can probably afford to
> persist it into something like redis or etcd for the failover to take over
> if one dies.
>
> Of course I'm very open for scaling it out horizontally if that's
> achievable.
>
> >3. Imbalance among controllers.
> >
> >I think there could be some imbalance among controllers.
> >For example, there are 3 controllers with 3, 1, and 1 containers
> >respectively for the given action.
> >In some case, 1 containers in the controller1 might be overloaded but
> >2
> >containers in controller2 can be available.
> >If the number of containers for the given action belongs to each
> >controller
> >varies, it could happen more easily.
> >This is because controllers are not aware of the status of other
> >controllers.
> >So in some case, some action containers are overloaded but the others
> >may
> >handle just moderate requests.
> >Then each controller may request more containers instead of utilizing
> >existing(but in other controllers) containers, and this can lead to
> >the
> >waste of resources.
>
> Assuming general round-robin (or even random scheduling) in front of the
> controllers would even out things to a certain extent would they?
>
> Another probably feasible solution is to implement session stickiness or
> hashing as you mentioned in todays call. Another comment that was raised
> during today's would come into play there as well: We could change the
> container division algorithm to not divide evenly but to only give
> containers to those controllers that requested them. In conjunction with
> session stickiness, that could yield better load distribution results
> (given the session stickiness is smart enough to divide appropriately.
>
> >4. How do controllers determine whether to create more containers?
> >
> >Let's say, a controller has only one container for the given action.
> >How controllers recognize this container is overloaded and need more
> >containers to create?
> >If the action execution time is short, it can calculate the number of
> >buffered activation for the given action.
> >But the action execution time is long, let's say 1 min or 2 mins,
> >then even
> >though there is only 1 activation request in the buffer, the
> >controller
> >needs to create more containers.
> >(Because subsequent activation request will be delayed for 1 or
> >2mins.)
> >Since we cannot know the execution time of action in advance, we may
> >need a
> >sort of timeout(of activation response) approach for all actions.
> >But still, we cannot know how much time of execution are remaining
> >for the
> >given action after the timeout occurred.
> >Further, if a user requests 100 or 200 concurrent invocations with a
> >2
> >mins-long action, all subsequent requests will suffer from the
> >latency
> >overhead of timeout.
>
> Overload of a given container is determined by its concurrency limit.
> Today that is "1". If 1 request is active in a container, and that is the
> only container, we need more containers. As soon as all containers reached
> their maximum concurrency, we need to scale up.
>
> We do the same thing today I believe.
>
> Does that answer your questions? (Sorry for the broken quote layout, my
> mail client screws these up)
>
> Cheers,
> Markus
>
>

Re: Proposal on a future architecture of OpenWhisk

Posted by Markus Thoemmes <ma...@de.ibm.com>.
Hi Dominic,

thanks for your feedback, let's see...

>1. Buffering of activation and failure handling.
>
>As of now, Kafka acts as a kind of buffer in case activation
>processing is
>a bit delayed due to some reasons such as invoker failure.
>If Kafka is only used for the overflowing case, how can it guarantee
>"at
>least once" activation processing?
>For example, if a controller receives the requests and it crashed
>before
>the activation is complete. How can other alive controllers or the
>restored
>controller handle it?

In the proposal, the semantics of talking to the container do not change from what we have today. If the HTTP request fails for whatever reason while "in-flight", processing cannot be completed, just like if an invoker crashes. Note that we commit the message immediately after reading it today, which leads to "at-most-once" semantics. OpenWhisk does not support "at-least-once" today. To do so, you'd retry from the outside.

>2. A bottleneck in ContainerManager.
>
>Now ContainerManage has many critical logics.
>It takes care of the container lifecycle, logging, scheduling and so
>on.
>Also, it should be aware of whole container state as well as
>controller
>status and distribute containers among alive controllers.
>It might need to do some health checking to/from all controllers and
>containers(may be orchestrator such as k8s).
>
>I think in this case ContainerManager can be a bottleneck as the size
>of
>the cluster grows.
>And if we add more nodes to scale out ContainerManager or to prepare
>for
>the SPOF, then all states of ContainerManager should be shared among
>all
>nodes.
>If we take master/slave approach, the master would become a
>bottleneck at
>some point, and if we take a clustering approach, we need some
>mechanism to
>synchronize the cluster status among ContainerManagers.
>And this procedure should be done in 1 or 2 order of magnitude in
>milliseconds.
>
>Do you have anything in your mind to handle this?

I don't believe the ContainerManager needs to do that much honestly. In the Kubernetes case for instance it only asks the Kube API for pods and then keeps a list of these pods per action. Further it divides this list whenever a new container is added/removed. I think we can push this quite far in a master/slave fashion as Brendan mentioned. This is guessing though; it'll be crucial to measure the throughput that one instance actually can provide and then decide on whether that's feasible or not.
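
For the division itself, a sketch could be as small as this (even split; all names are illustrative):

object ContainerDivision {
  // Divide the containers known for one action evenly among the
  // controllers, round-robin by index.
  def divide[C](containers: Seq[C], controllerCount: Int): Map[Int, Seq[C]] =
    containers.zipWithIndex
      .groupBy { case (_, index) => index % controllerCount }
      .map { case (controller, assigned) => controller -> assigned.map(_._1) }
}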

As its state isn't moving at a super fast pace, we can probably afford to persist it into something like redis or etcd for the failover to take over if one dies.

Of course I'm very open for scaling it out horizontally if that's achievable.

>3. Imbalance among controllers.
>
>I think there could be some imbalance among controllers.
>For example, there are 3 controllers with 3, 1, and 1 containers
>respectively for the given action.
>In some case, 1 containers in the controller1 might be overloaded but
>2
>containers in controller2 can be available.
>If the number of containers for the given action belongs to each
>controller
>varies, it could happen more easily.
>This is because controllers are not aware of the status of other
>controllers.
>So in some case, some action containers are overloaded but the others
>may
>handle just moderate requests.
>Then each controller may request more containers instead of utilizing
>existing(but in other controllers) containers, and this can lead to
>the
>waste of resources.

Assuming general round-robin (or even random scheduling) in front of the controllers, things would even out to a certain extent, wouldn't they?

Another probably feasible solution is to implement session stickiness or hashing as you mentioned in today's call. Another comment that was raised during today's call would come into play there as well: We could change the container division algorithm to not divide evenly but to only give containers to those controllers that requested them. In conjunction with session stickiness, that could yield better load distribution results (given the session stickiness is smart enough to divide appropriately).

>4. How do controllers determine whether to create more containers?
>
>Let's say, a controller has only one container for the given action.
>How controllers recognize this container is overloaded and need more
>containers to create?
>If the action execution time is short, it can calculate the number of
>buffered activation for the given action.
>But the action execution time is long, let's say 1 min or 2 mins,
>then even
>though there is only 1 activation request in the buffer, the
>controller
>needs to create more containers.
>(Because subsequent activation request will be delayed for 1 or
>2mins.)
>Since we cannot know the execution time of action in advance, we may
>need a
>sort of timeout(of activation response) approach for all actions.
>But still, we cannot know how much time of execution are remaining
>for the
>given action after the timeout occurred.
>Further, if a user requests 100 or 200 concurrent invocations with a
>2
>mins-long action, all subsequent requests will suffer from the
>latency
>overhead of timeout.

Overload of a given container is determined by its concurrency limit. Today that is "1". If 1 request is active in a container, and that is the only container, we need more containers. As soon as all containers reached their maximum concurrency, we need to scale up.

We do the same thing today I believe.
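
As a sketch of that rule (types invented for illustration):

object ScaleUpDecision {
  final case class ContainerState(activeRequests: Int, maxConcurrency: Int) {
    def hasCapacity: Boolean = activeRequests < maxConcurrency
  }

  // Request another container once every existing container has reached
  // its concurrency limit (this also holds when no container exists yet).
  def needsMoreContainers(containers: Seq[ContainerState]): Boolean =
    containers.forall(!_.hasCapacity)
}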

Does that answer your questions? (Sorry for the broken quote layout, my mail client screws these up)

Cheers,
Markus


Re: Proposal on a future architecture of OpenWhisk

Posted by Dominic Kim <st...@gmail.com>.
Dear Markus.
Thank you for the great work!

I think this is a good approach in the big picture.


I have a few questions.

1. Buffering of activation and failure handling.

As of now, Kafka acts as a kind of buffer in case activation processing is
a bit delayed due to reasons such as invoker failure.
If Kafka is only used for the overflow case, how can it guarantee "at
least once" activation processing?
For example, if a controller receives a request and crashes before the
activation is complete, how can other alive controllers or the restored
controller handle it?


2. A bottleneck in ContainerManager.

Now the ContainerManager holds a lot of critical logic.
It takes care of the container lifecycle, logging, scheduling and so on.
Also, it should be aware of the whole container state as well as the
controller status and distribute containers among alive controllers.
It might need to do some health checking to/from all controllers and
containers (maybe via an orchestrator such as k8s).

I think in this case the ContainerManager can become a bottleneck as the
size of the cluster grows.
And if we add more nodes to scale out the ContainerManager or to guard
against a SPOF, then all state of the ContainerManager should be shared
among all nodes.
If we take a master/slave approach, the master would become a bottleneck
at some point, and if we take a clustering approach, we need some
mechanism to synchronize the cluster status among ContainerManagers.
And this procedure should be done within 1 or 2 orders of magnitude of
milliseconds.

Do you have anything in your mind to handle this?


3. Imbalance among controllers.

I think there could be some imbalance among controllers.
For example, there are 3 controllers with 3, 1, and 1 containers
respectively for the given action.
In some cases, the 1 container in controller1 might be overloaded while
2 containers in controller2 are still available.
If the number of containers for the given action that belongs to each
controller varies, this could happen more easily.
This is because controllers are not aware of the status of other
controllers.
So in some cases, some action containers are overloaded while the others
may handle just moderate requests.
Then each controller may request more containers instead of utilizing
existing containers (in other controllers), and this can lead to a waste
of resources.


4. How do controllers determine whether to create more containers?

Let's say a controller has only one container for the given action.
How do controllers recognize that this container is overloaded and that
more containers need to be created?
If the action execution time is short, they can calculate the number of
buffered activations for the given action.
But if the action execution time is long, let's say 1 min or 2 mins,
then even though there is only 1 activation request in the buffer, the
controller needs to create more containers.
(Because subsequent activation requests will be delayed for 1 or 2 mins.)
Since we cannot know the execution time of an action in advance, we may
need a sort of timeout (of the activation response) approach for all
actions.
But still, we cannot know how much execution time remains for the given
action after the timeout occurred.
Further, if a user requests 100 or 200 concurrent invocations with a
2-min-long action, all subsequent requests will suffer from the latency
overhead of the timeout.



Thanks
Best regards
Dominic.




2018-07-18 22:45 GMT+09:00 Martin Gencur <mg...@redhat.com>:

> On 18.7.2018 14:41, Markus Thoemmes wrote:
>
>> Hi Martin,
>>
>> thanks for the great questions :)
>>
>> thinking about scalability and the edge case. When there are not
>>> enough
>>> containers and new controllers are being created, and all of them
>>> redirect traffic to the controllers with containers, doesn't it mean
>>> overloading the available containers a lot? I'm curious how we
>>> throttle the traffic in this case.
>>>
>> True, the first few requests will overload the controller that owns the
>> very first container. That one will request new containers immediately,
>> which will then be distributed to all existing Controllers by the
>> ContainerManager. An interesting wrinkle here is, that you'd want the
>> overloading requests to be completed by the Controllers that sent it to the
>> "single-owning-Controller".
>>
>
> Ah, got it. So it is a pretty common scenario. Scaling out controllers and
> containers. I thought this is a case where we reach a limit of created
> containers and no more containers can be created.
>
>
>   What we could do here is:
>>
>> Controller0 owns ContainerA1
>> Controller1 relays requests for A to Controller0
>> Controller0 has more requests than it can handle, so it requests
>> additional containers. All requests coming from Controller1 will be
>> completed with a predefined message (for example "HTTP 503 overloaded" with
>> a specific header say "X-Return-To-Sender-By: Controller0")
>> Controller1 recognizes this as "okay, I'll wait for containers to
>> appear", which will eventually happen (because Controller0 has already
>> requested them) so it can route and complete those requests on its own.
>> Controller1 will now no longer relay requests to Controller0 but will
>> request containers itself (acknowledging that Controller0 is already
>> overloaded).
>>
>
> Yeah, I think it makes sense.
>
>
>> I guess the other approach would be to block creating new controllers
>>> when there are no containers available as long as we don't want to
>>> overload the existing containers. And keep the overflowing workload
>>> in Kafka as well.
>>>
>> Right, the second possibility is to use a pub/sub (not necessarily Kafka)
>> queue between Controllers. Controller0 subscribes to a topic for action A
>> because it owns a container for it. Controller1 doesn't own a container
>> (yet) and publishes a message as overflow to topic A. The wrinkle in this
>> case is, that Controller0 can't complete the request but needs to send it
>> back to Controller1 (where the HTTP connection is opened from the client).
>>
>> Does that make sense?
>>
>
> I was rather thinking about blocking the creation of Controller1 in this
> case and responding to the client that the system is overloaded. But the
> first approach seems better because it's a pretty common use case (not
> reaching the limit of created containers).
>
> Thanks!
> Martin
>
>
>> Cheers,
>> Markus
>>
>>
>

Re: Proposal on a future architecture of OpenWhisk

Posted by Martin Gencur <mg...@redhat.com>.
On 18.7.2018 14:41, Markus Thoemmes wrote:
> Hi Martin,
>
> thanks for the great questions :)
>
>> thinking about scalability and the edge case. When there are not
>> enough
>> containers and new controllers are being created, and all of them
>> redirect traffic to the controllers with containers, doesn't it mean
>> overloading the available containers a lot? I'm curious how we
>> throttle the traffic in this case.
> True, the first few requests will overload the controller that owns the very first container. That one will request new containers immediately, which will then be distributed to all existing Controllers by the ContainerManager. An interesting wrinkle here is, that you'd want the overloading requests to be completed by the Controllers that sent it to the "single-owning-Controller".

Ah, got it. So it is a pretty common scenario: scaling out controllers
and containers. I thought this was a case where we reach the limit of
created containers and no more containers can be created.


>   What we could do here is:
>
> Controller0 owns ContainerA1
> Controller1 relays requests for A to Controller0
> Controller0 has more requests than it can handle, so it requests additional containers. All requests coming from Controller1 will be completed with a predefined message (for example "HTTP 503 overloaded" with a specific header such as "X-Return-To-Sender-By: Controller0")
> Controller1 recognizes this as "okay, I'll wait for containers to appear", which will eventually happen (because Controller0 has already requested them), so it can route and complete those requests on its own.
> Controller1 will now no longer relay requests to Controller0 but will request containers itself (acknowledging that Controller0 is already overloaded).

Yeah, I think it makes sense.

>
>> I guess the other approach would be to block creating new controllers
>> when there are no containers available as long as we don't want to
>> overload the existing containers. And keep the overflowing workload
>> in Kafka as well.
> Right, the second possibility is to use a pub/sub (not necessarily Kafka) queue between Controllers. Controller0 subscribes to a topic for action A because it owns a container for it. Controller1 doesn't own a container (yet) and publishes a message as overflow to topic A. The wrinkle in this case is that Controller0 can't complete the request but needs to send the result back to Controller1 (where the HTTP connection is opened from the client).
>
> Does that make sense?

I was rather thinking about blocking the creation of Controller1 in this 
case and responding to the client that the system is overloaded. But the 
first approach seems better because it's a pretty common use case (not 
reaching the limit of created containers).

Thanks!
Martin

>
> Cheers,
> Markus
>


Re: Proposal on a future architecture of OpenWhisk

Posted by Markus Thoemmes <ma...@de.ibm.com>.
Hi Martin,

thanks for the great questions :)

>thinking about scalability and the edge case. When there are not enough
>containers and new controllers are being created, and all of them
>redirect traffic to the controllers with containers, doesn't it mean 
>overloading the available containers a lot? I'm curious how we
>throttle the traffic in this case.

True, the first few requests will overload the controller that owns the very first container. That one will request new containers immediately, which will then be distributed to all existing Controllers by the ContainerManager. An interesting wrinkle here is that you'd want the overloading requests to be completed by the Controllers that sent them to the "single-owning-Controller". What we could do here is:

Controller0 owns ContainerA1
Controller1 relays requests for A to Controller0
Controller0 has more requests than it can handle, so it requests additional containers. All requests coming from Controller1 will be completed with a predefined message (for example "HTTP 503 overloaded" with a specific header such as "X-Return-To-Sender-By: Controller0")
Controller1 recognizes this as "okay, I'll wait for containers to appear", which will eventually happen (because Controller0 has already requested them), so it can route and complete those requests on its own.
Controller1 will now no longer relay requests to Controller0 but will request containers itself (acknowledging that Controller0 is already overloaded).
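
To make that handshake a bit more concrete, here is a rough Scala sketch of the decision logic. All names in it (Response, OverflowProtocol, ReturnToSenderHeader and so on) are made up for illustration; none of this is existing OpenWhisk code.

// Hypothetical sketch of the relay/back-pressure handshake described above.
case class Response(status: Int, headers: Map[String, String], body: String)

object OverflowProtocol {
  val ReturnToSenderHeader = "X-Return-To-Sender-By"

  sealed trait RelayDecision
  case object KeepRelaying        extends RelayDecision // owning Controller still accepts relayed requests
  case object RequestOwnContainer extends RelayDecision // owning Controller signalled overload

  // Controller0 (the owner) answers a relayed request it cannot serve.
  def overloadedResponse(ownerId: String): Response =
    Response(503, Map(ReturnToSenderHeader -> ownerId), "overloaded")

  // Controller1 inspects the relayed response: on a 503 carrying the header it stops
  // relaying, requests containers from the ContainerManager itself and completes the
  // request locally once they appear.
  def interpret(response: Response): RelayDecision =
    if (response.status == 503 && response.headers.contains(ReturnToSenderHeader)) RequestOwnContainer
    else KeepRelaying
}

The point is only that the 503 plus the header is enough information for Controller1 to switch from relaying to requesting containers on its own.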

>
>I guess the other approach would be to block creating new controllers
>when there are no containers available as long as we don't want to 
>overload the existing containers. And keep the overflowing workload
>in Kafka as well.

Right, the second possibility is to use a pub/sub (not necessarily Kafka) queue between Controllers. Controller0 subscribes to a topic for action A because it owns a container for it. Controller1 doesn't own a container (yet) and publishes a message as overflow to topic A. The wrinkle in this case is that Controller0 can't complete the request but needs to send the result back to Controller1 (where the HTTP connection is opened from the client).
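
To sketch how that could hang together in Scala: the topic naming, message shapes and the PubSub trait below are assumptions for the sake of the example, not a proposal for a concrete interface.

// Illustrative only, not existing OpenWhisk code.
case class OverflowMessage(action: String, activationId: String, replyTo: String)
case class CompletionMessage(activationId: String, result: String)

trait PubSub {
  def publish(topic: String, msg: OverflowMessage): Unit
  def subscribe(topic: String)(handler: OverflowMessage => Unit): Unit
}

// Controller0: owns a container for the action, so it subscribes to its overflow topic.
class OwningController(bus: PubSub,
                       runOnLocalContainer: OverflowMessage => String,
                       sendBack: (String, CompletionMessage) => Unit) {
  def own(action: String): Unit =
    bus.subscribe(s"overflow-$action") { msg =>
      val result = runOnLocalContainer(msg) // execute on the container Controller0 owns
      // Controller0 cannot complete the request itself, so the result goes back to the
      // Controller that holds the client's open HTTP connection.
      sendBack(msg.replyTo, CompletionMessage(msg.activationId, result))
    }
}

// Controller1: no container yet, so it overflows the request onto the action's topic.
class RelayingController(id: String, bus: PubSub) {
  def overflow(action: String, activationId: String): Unit =
    bus.publish(s"overflow-$action", OverflowMessage(action, activationId, replyTo = id))
}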

Does that make sense?

Cheers,
Markus


Re: Proposal on a future architecture of OpenWhisk

Posted by Martin Gencur <mg...@redhat.com>.
Hi Markus,
thinking about scalability and the edge case. When there are not enough 
containers and new controllers are being created, and all of them 
redirect traffic to the controllers with containers, doesn't it mean 
overloading the available containers a lot? I'm curious how we throttle 
the traffic in this case.

I guess the other approach would be to block creating new controllers 
when there are no containers available as long as we don't want to 
overload the existing containers. And keep the overflowing workload in 
Kafka as well.

Thanks,
Martin Gencur
QE, Red Hat

On 13.7.2018 19:29, Markus Thoemmes wrote:
> Hello OpenWhiskers,
>
> I just published a proposal on a potential future architecture for OpenWhisk that aligns deployments with and without an underlying container orchestrator like Mesos or Kubernetes. It also incooperates some of the proposals that are already out there and tries to give a holistic view of where we want OpenWhisk to go to in the near future. It's designed to keep the APIs stable but is very invasive in its changes under the hood.
>
> This proposal is the outcome of a lot of discussions with fellow colleagues and community members. It is based on experience with the problems the current architecture has. Moreover it aims to remove friction with the deployment topologies on top of a container orchestrator.
>
> Feedback is very very very welcome! The proposal has some gaps and generally does not go into much detail implementationwise. I'd love to see all those gaps filled by the community!
>
> Find the proposal here: https://cwiki.apache.org/confluence/display/OPENWHISK/OpenWhisk+future+architecture
>
> Cheers,
> Markus
>