Posted to dev@openwhisk.apache.org by James W Dubee <jw...@us.ibm.com> on 2018/05/16 20:46:27 UTC

Activation Store Service Provider Interface


Fellow developers,

Currently the activation store is tightly coupled with the artifact store
used by an OpenWhisk deployment. This means the controller must use the
same data store as the rest of the OpenWhisk instance, CouchDB for example,
to store and retrieve activation details. However, an invoker can use a
customized log store implementation to store user logs and activation
records in a data store that is independent of the primary database. Such
differences in functionality between the controller and invoker with
regard to activation storage and retrieval are problematic.


Activation records are special as they hold meaningful details about the
execution behavior of actions, triggers, rules, and sequences. Ideally,
users should be able to run customized queries on these activation records
in order to find out the execution time of a group of entities, whether a
group of entities executed successfully, etc. By default, CouchDB does not
allow such customized queries to be performed. To provide user-defined
queries on activations, we have already provided a log store that can be
customized per deployment via a Service Provider Interface (SPI). Currently
we have log store implementations for Elasticsearch and Splunk. Both of
these services allow user queries.
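
As a rough illustration of the kind of customized query meant here, the
sketch below aggregates execution time and failures per entity over a list
of activation records. It is written in Python purely for brevity and runs
against in-memory dictionaries, not against Elasticsearch or Splunk. The
helper name summarize_activations is hypothetical, and the records are
assumed to carry name, duration (in milliseconds), and response.success
fields.

```python
from collections import defaultdict
from typing import Dict, Iterable


def summarize_activations(activations: Iterable[dict]) -> Dict[str, dict]:
    """Group activation records by entity name and total up their durations.

    Hypothetical helper for illustration only; it assumes each record has
    'name', 'duration' (milliseconds), and 'response' -> 'success' fields.
    """
    summary = defaultdict(lambda: {"count": 0, "total_ms": 0, "failures": 0})
    for activation in activations:
        entry = summary[activation["name"]]
        entry["count"] += 1
        entry["total_ms"] += activation["duration"]
        if not activation["response"]["success"]:
            entry["failures"] += 1
    return dict(summary)


# Made-up sample records, present only to exercise the helper.
SAMPLE_ACTIVATIONS = [
    {"name": "resize", "duration": 120, "response": {"success": True}},
    {"name": "resize", "duration": 80, "response": {"success": False}},
    {"name": "notify", "duration": 15, "response": {"success": True}},
]

report = summarize_activations(SAMPLE_ACTIVATIONS)
```

A store with query support would evaluate this kind of aggregation on the
server side; the point of the sketch is only the shape of the question a
user might ask.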


Unfortunately, these log store implementations apply mostly to the invoker.
For instance, the controller can only fetch logs from the log store using
the activation logs API, while the rest of the activation APIs must use
the artifact store. Activation records must be in the artifact store in
order for most of the controller activation APIs to return activation
information, even though those same activation records may also have been
saved in a different backing store by the log store. Consequently,
activation records and logs may be duplicated in the log store and artifact
store. Another problem concerns user logs written by the controller.
User logs generated by the controller for triggers and sequences are only
written to the artifact store, even if a non-default log store is being
used. Therefore, users cannot run customized queries on user logs generated
by the controller.

To eliminate the duplication of activations, and to make controller user
logs accessible to the log store, an activation store SPI can be provided.
When configured appropriately, the activation store would be able to use
the same data store as the log store. This would eliminate duplication
of stored activation information in separate databases, and give the log
store access to user logs generated by the controller.
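
To make the idea concrete, here is a minimal sketch of what a pluggable
activation store could look like. It is written in Python for brevity and
is not the interface from the PR mentioned below: the class names, method
names, and the provider registry are all assumptions made up for this
illustration, and the in-memory backend merely stands in for a real one
such as the artifact store or Elasticsearch.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class ActivationStore(ABC):
    """Hypothetical provider interface; names are illustrative only."""

    @abstractmethod
    def store(self, activation: dict) -> None:
        """Persist one activation record, including its user logs."""

    @abstractmethod
    def get(self, activation_id: str) -> Optional[dict]:
        """Return the record with this id, or None if it is unknown."""

    @abstractmethod
    def list_for_namespace(self, namespace: str) -> List[dict]:
        """Return all records that belong to the given namespace."""


class InMemoryActivationStore(ActivationStore):
    """Stand-in backend that keeps records in a dictionary."""

    def __init__(self) -> None:
        self._records: Dict[str, dict] = {}

    def store(self, activation: dict) -> None:
        self._records[activation["activationId"]] = dict(activation)

    def get(self, activation_id: str) -> Optional[dict]:
        record = self._records.get(activation_id)
        return dict(record) if record is not None else None

    def list_for_namespace(self, namespace: str) -> List[dict]:
        return [dict(r) for r in self._records.values()
                if r["namespace"] == namespace]


# Hypothetical registry: a deployment would pick one provider by name.
PROVIDERS = {"in-memory": InMemoryActivationStore}


def load_activation_store(provider_name: str) -> ActivationStore:
    """Instantiate the provider selected in the deployment configuration."""
    return PROVIDERS[provider_name]()
```

With an arrangement like this, the controller and the invoker would both
obtain their store through the same lookup, so a record written by either
component ends up in one backend instead of two.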


Work on an activation store that is compatible with the artifact store
has already started. The related PR can be found here:
https://github.com/apache/incubator-openwhisk/pull/3619. An activation
store for Elasticsearch will follow.


Regards,
James Dubee

Re: Activation Store Service Provider Interface

Posted by Chetan Mehrotra <ch...@gmail.com>.
Hi James,

This looks quite useful and would allow the use of more suitable
storage for activation records. We could probably also leverage TTL
support to expire older activations.

A few questions and suggestions:

A - Activation Records and Immutability
---------------------------------------

Are activation records immutable once written? I assume yes, but I am
just confirming, as stores like ES are not that suitable for updates.

B - Stream activation records
-----------------------------

In some cases we are seeing large activation responses, when requested
by clients (with lots of concurrent requests), put pressure on the
Controller and Invoker heaps. As in most cases the logic in the controller
or invoker does not "interpret" the activation record, it would be
efficient to have an API which can just stream the JSON content to the
output response.

So if the `ActivationStore` provides a get-by-id method which accepts
a Sink[ByteString, Future[T]], like we have in
ArtifactStore#readAttachment, then that can possibly be plugged into the
HTTP response sink and thereby avoid buffering the whole JSON in
memory. This would reduce heap pressure for web actions and other
cases where lots of clients are concurrently requesting activation
records.
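
The difference between buffering and streaming can be sketched as follows.
This is a Python illustration and not the Akka Streams API: the class
StreamingActivationStore and its methods are hypothetical, a file-like
object plays the role of the sink, and the in-memory dictionary only stands
in for a backend that can hand out the stored JSON as a byte stream.

```python
import io
from typing import BinaryIO, Dict

CHUNK_SIZE = 8192  # bytes copied per step; an arbitrary illustrative value


class StreamingActivationStore:
    """Hypothetical store that copies raw JSON bytes to a caller's sink."""

    def __init__(self) -> None:
        # Stand-in for the real backend; maps activation id to raw JSON.
        self._blobs: Dict[str, bytes] = {}

    def put_raw(self, activation_id: str, raw_json: bytes) -> None:
        self._blobs[activation_id] = raw_json

    def open_raw(self, activation_id: str) -> BinaryIO:
        # A real backend would return a network or file stream here.
        return io.BytesIO(self._blobs[activation_id])

    def stream_to(self, activation_id: str, sink: BinaryIO,
                  chunk_size: int = CHUNK_SIZE) -> int:
        """Copy the record to the sink chunk by chunk, without parsing it.

        Returns the number of bytes written.
        """
        total = 0
        with self.open_raw(activation_id) as source:
            while True:
                chunk = source.read(chunk_size)
                if not chunk:
                    break
                sink.write(chunk)
                total += len(chunk)
        return total
```

Only one chunk is held by the copying loop at a time, and the record is
never deserialized, which is the property that would relieve the heap when
many clients fetch large activations concurrently.
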
Chetan Mehrotra

