Posted to dev@samza.apache.org by sriram <sr...@gmail.com> on 2014/10/02 00:09:28 UTC
Re: [jira] [Commented] (SAMZA-424) Add a Cache state API to the Samza container

The differences between the cache and the store are subtle. There are a few
options here:

1. Merge the configs and do not differentiate explicitly between the key
value store and the cache. The new cache configs are just extended configs
for the key value store. The advantage of this approach is that it avoids
creating a new API and lets us use the existing store types. The drawback
is that it becomes unclear which configs to use and whether the configs can
be made to work across all underlying key value stores. Also, what does
caching mean here? It looks like we just want to provide an eviction policy
for the store, and we should avoid calling it an explicit cache mode.

2. Have a cache API that only exposes get and flush. In this model, we
would define the cache store explicitly in the config and also provide the
cache factory. For example, if we just wanted a cache backed by Voldemort,
we could simply have a VoldemortCacheStore that populates the store on
demand. It could have an option to write the changes to the changelog,
which would help avoid a cold start. The advantage of this approach is that
everything happens behind the API, and the framework user would simply call
get when needed. This is not possible in option 1, since users would have
to explicitly put messages into the store after reading them from a remote
store. With option 1, that population code also cannot be shared, even if a
cache store backed by a remote store is a common use case. The drawback of
option 2 is that it introduces a new store that is strictly read-only and
limits the caching functionality to just reads.

3. We could also do both 1 and 2. The key value store could have an
eviction policy to bound memory, and the cache store would be used
explicitly for cases where we want a backing store and want the store to do
all the heavy lifting of populating the cache.
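
To make the distinction concrete, here is a rough Java sketch of how options
1 and 2 could fit together. It is not the proposed Samza API. The class
names LruStore and ReadThroughCache, the remoteLookup function (a stand-in
for something like a Voldemort client read), and the reading of flush as
"drop local entries" are all assumptions made for illustration. The thread
does not define what flush should do, and the changelog option is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Option 1 sketch (hypothetical): a key-value map bounded by LRU eviction. */
class LruStore<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruStore(int maxEntries) {
        // accessOrder=true keeps entries ordered from least to most recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put; evict the least recently used entry when over capacity.
        return size() > maxEntries;
    }
}

/** Option 2 sketch (hypothetical): a read-only cache exposing only get and flush. */
class ReadThroughCache<K, V> {
    private final Map<K, V> local;
    // Assumption: stands in for a read against the remote backing store.
    private final Function<K, V> remoteLookup;

    ReadThroughCache(int maxEntries, Function<K, V> remoteLookup) {
        this.local = new LruStore<>(maxEntries);
        this.remoteLookup = remoteLookup;
    }

    /** Returns the cached value, loading it from the backing store on a miss. */
    V get(K key) {
        V value = local.get(key);
        if (value == null) {
            value = remoteLookup.apply(key);
            if (value != null) {
                local.put(key, value);
            }
        }
        return value;
    }

    /** Assumed semantics: drop all local entries so later gets reload them. */
    void flush() {
        local.clear();
    }
}
```

With something like this, user code would only ever call cache.get(key).
The put after a remote read, which option 1 would leave to every task,
happens once inside the cache store.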


On Wed, Oct 1, 2014 at 10:50 AM, Chris Riccomini (JIRA) <ji...@apache.org>
wrote:

>
>     [
> https://issues.apache.org/jira/browse/SAMZA-424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155191#comment-14155191
> ]
>
> Chris Riccomini commented on SAMZA-424:
> ---------------------------------------
>
> I think what we have now is composable config-based assembly, which is
> what we want to avoid. Composability in code is probably OK, but a more
> drastic change. If we could find a way to get pre-defined template config
> working in a good way for the stores, I think that's ideal, since it's the
> closest to what we've already got.
>
> > Add a Cache state API to the Samza container
> > --------------------------------------------
> >
> >                 Key: SAMZA-424
> >                 URL: https://issues.apache.org/jira/browse/SAMZA-424
> >             Project: Samza
> >          Issue Type: New Feature
> >          Components: container
> >            Reporter: Chinmay Soman
> >            Assignee: Chinmay Soman
> >         Attachments: SAMZA-424-Cache-API_0.pdf
> >
> >
> > There are cases when the user code needs access to a 'cache' which can
> > be used to store custom data. This cache is different from the KeyValue
> > store in the following ways:
> > * At the very least Needs to support LRU (Least Recently Used) and TTL
> >   (Time To Live) eviction strategies
> > * May not support all() and range() operations (since this wreaks havoc
> >   with the eviction operation)
> > * Needs to exist at a per task or a per container level.