Posted to dev@airavata.apache.org by Mayank Jaglan <mj...@umail.iu.edu> on 2017/02/17 11:00:32 UTC

[#Spring17-Airavata-Courses] : Distributed Caching for Airavata Gateway

Hello Dev,

We are working on a way to provide a single logical view (and state) for session and security management across the web applications, using caching. (A minimal sketch of the idea follows the list below.)

For building a distributed cache system, I think it should be able to:
-    Scale horizontally across multiple servers.
-    Scale across multiple regions (WANs).
-    Provide high availability.
-    Provide a fault-tolerant, fail-over cluster.
-    Perform fast, concurrent reads and writes.
-    Provide data persistence in the event of a power failure.
-    Install, configure, and deploy with little complexity.
-    Work well with popular technologies, like PHP, C++, Python, and Java-based frameworks.
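
To make the "single logical view" idea concrete, here is a minimal sketch in Python using the redis-py client. The host name, key scheme, and TTL are assumptions for illustration, not actual portal code: every web server reads and writes the same session record in Redis instead of keeping it in local memory.

    import json
    import redis

    # Every portal node connects to the same Redis deployment, so all
    # nodes see one logical copy of the session state.
    r = redis.Redis(host="redis.example.org", port=6379)

    SESSION_TTL = 3600  # assumed policy: expire idle sessions after an hour

    def save_session(session_id, data):
        # SETEX stores the serialized session and (re)sets its TTL in one step.
        r.setex("session:" + session_id, SESSION_TTL, json.dumps(data))

    def load_session(session_id):
        raw = r.get("session:" + session_id)
        return json.loads(raw) if raw is not None else None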


Following is the GitHub link to the example portal under consideration -
	https://github.com/airavata-courses/spring17-laravel-portal

Following is the GitHub issue created for this experiment -
	https://github.com/airavata-courses/spring17-laravel-portal/issues/5


** Proposed Solution **
Before starting this email thread, I ran some experiments, and I find that a system like Redis Cluster is a good fit. Here are its pros and cons -

Pros:
-    Scales horizontally across multiple servers.
-    Scales across multiple regions (WANs).
-    Provides high availability.
-    Provides a fault-tolerant, fail-over cluster.
-    Can store data in a variety of data structures: strings, lists, sets, sorted sets, and hashes. This means that, using the Redis APIs, a specific attribute of an object can be modified directly in the cache rather than requesting a full refresh of the object inside the Redis cache. This improves overall efficiency for larger objects. (See the sketch after this list.)
-    Pipelines multiple commands at once.
-    Blocking reads -- a client will sit and wait until another process writes data to the cache.
-    Mass insertion of data to prime a cache.
-    Partitions data across multiple Redis master instances.
-    Can persist data to disk, providing durability in the event of a power failure.
-    Works well with popular technologies, like PHP, C++, Python, and Java-based frameworks.
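
Here is a short redis-py sketch of the hash-update, pipelining, and blocking-read features mentioned above (all key and field names are made up for illustration):

    import redis

    r = redis.Redis(host="localhost", port=6379)

    # Hash: update one attribute of a cached object in place,
    # without refreshing the whole object.
    r.hset("user:42", "last_login", "2017-02-17T11:00:32Z")
    print(r.hget("user:42", "last_login"))  # other fields stay untouched

    # Pipeline: send several commands to the server in one round trip.
    pipe = r.pipeline()
    pipe.set("counter", 0)
    pipe.incr("counter")
    pipe.expire("counter", 300)
    pipe.execute()

    # Blocking read: BLPOP waits (here up to 5 seconds) until some
    # other process pushes an item onto the list.
    item = r.blpop("job-queue", timeout=5)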

Cons:
-    Can be complex to configure -- requires consideration of data size to configure well.
-    Requires Redis server administration for monitoring, partitioning, and balancing.
-    A single Sentinel instance is itself a single point of failure: if the master dies and that Sentinel fails to trigger a failover, the system is out. Sentinel therefore has to be run as several cooperating processes with a quorum (see the config sketch below).
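
For reference, a minimal sentinel.conf for one of, say, three Sentinel processes might look like the following (the master name, address, and timeouts are placeholders, not a recommended production setup):

    # sentinel.conf -- one of three Sentinel processes, quorum of 2
    sentinel monitor mymaster 10.0.0.1 6379 2
    sentinel down-after-milliseconds mymaster 5000
    sentinel failover-timeout mymaster 60000
    sentinel parallel-syncs mymaster 1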


Your feedback on this topic will be helpful.

Best,
Mayank Jaglan

Re: [#Spring17-Airavata-Courses] : Distributed Caching for Airavata Gateway

Posted by Sneha Tilak <sn...@gmail.com>.
Hello Mayank, dev,


I am assuming that you have suggested the Redis caching mechanism for the
whole distributed system. In that case, summarizing what was discussed on
02/17/2017, we are to implement:


   - A Thrift API which “writes” to a database (we are yet to determine the
   kind of data to be stored in the DB)
   - Various services which “read” from the DB to retrieve some app-related
   data.
   - A cache for the services which read from the DB, to store a copy of the
   most commonly used data.

Though the problem above is not completely defined, there are some
performance concerns and "gotchas" we should take into consideration while
thinking of a solution and trying to maintain a "living cache" of this sort:


   - What performs “writes” to the DB?
   Is it just the Thrift API Gateway, or even the individual services? In
   either case, we must think about synchronization so that the data stays
   consistent.
   - Are we to use a centralized cache or local caches?
   If you ask me, a local cache for each of the services is a better idea
   as the services may want to request different sets of data from the DB.
   - How often is there a “write” to the DB?
   If the “writes” to the DB are rare (as in once a week), we can expire
   the data in the cache every time there is a “write”. In this case, we must
   load the fresh data present in the DB into the local caches each time
   there is a “read” requested. On the other hand, if the “writes” are
   frequent, then we must make sure that the data in the local caches is
   up to date. In this case, I would suggest maintaining short-term and
   long-term cacheable data for the two data sets (see the sketch after
   this list).
   - Is it okay for data to be unavailable for a brief amount of time?
   As we know, whenever the cache refreshes, there is a lag introduced
   while waiting for the up-to-date data to reflect in the local caches. We
   must make sure that this lag is minimal.
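
To illustrate the short-term vs. long-term idea, a cache-aside read in Python with the redis-py client might look like this (the TTL values, key scheme, and load_from_db helper are assumptions for illustration):

    import json
    import redis

    r = redis.Redis(host="localhost", port=6379)

    # Assumed policy: frequently changing data gets a short TTL,
    # rarely changing data a long one.
    TTL = {"short": 60, "long": 86400}

    def read_through(key, load_from_db, ttl_class="short"):
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)      # cache hit
        value = load_from_db(key)          # cache miss: fall back to the DB
        r.setex(key, TTL[ttl_class], json.dumps(value))
        return value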

In my opinion, we require a wrapper on the DB update: when there is a
“write” to the DB, the code should do a push/expiry on the cache (a minimal
sketch of such a wrapper follows below). Is Redis still the optimal choice
for this? If yes, how can we make sure that the above concerns are
addressed using Redis?
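
Such a wrapper, again sketched in Python with redis-py and a hypothetical write_to_db helper, could either push the fresh value or just expire the key:

    import json
    import redis

    r = redis.Redis(host="localhost", port=6379)

    def write_through(key, value, write_to_db, push_ttl=60):
        write_to_db(key, value)  # the authoritative write goes to the DB first
        if push_ttl:
            # Push: readers immediately see the fresh value.
            r.setex(key, push_ttl, json.dumps(value))
        else:
            # Expiry: drop the key so the next read repopulates it from the DB.
            r.delete(key)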


Do let me know if there is anything related to this issue that I have
missed.



Thanks,
Sneha Tilak
