Posted to dev@trafficserver.apache.org by "Alan M. Carroll" <am...@network-geographics.com> on 2013/08/11 02:15:37 UTC

Tiered Storage for cache design

All;

I have added a new section to the documentation, 'arch', for "Architecture". Currently it contains just my writings on the cache implementation but others are of course STRONGLY encouraged to contribute. This should not be API level stuff but design / architecture stuff that explores the concepts, designs, and interactions of components of ATS. I think it is also a good place for writing up design suggestions for future implementation.

In that vein, I have added a design Phil Sorber and I worked out at the ATS Summit in Denver concerning tiered storage in the cache. We think it has a reasonable level of implementation complexity while having a clean API, understandable semantics, and a simple regression to the current implementation, and that it satisfies the requirements of people who want this.

Unfortunately I will be out of town (without even guaranteed electrical power) until Thursday. I will probably only be able to check email once a day or so.


Re: Tiered Storage for cache design

Posted by James Peach <jp...@apache.org>.
On Aug 10, 2013, at 5:15 PM, Alan M. Carroll <am...@network-geographics.com> wrote:

> All;
> 
> I have added a new section to the documentation, 'arch', for "Architecture".

https://trafficserver.readthedocs.org/en/latest/arch/index.en.html

Nice!

> Currently it contains just my writings on the cache implementation but others are of course STRONGLY encouraged to contribute. This should not be API level stuff but design / architecture stuff that explores the concepts, designs, and interactions of components of ATS. I think it is also a good place for writing up design suggestions for future implementation.
> 
> In that vein, I have added a design Phil Sorber and I worked out at the ATS Summit in Denver concerning tiered storage in the cache. We think it has a reasonable level of implementation complexity while having a clean API, understandable semantics, and a simple regression to the current implementation, and that it satisfies the requirements of people who want this.
> 
> Unfortunately I will be out of town (without even guaranteed electrical power) until Thursday. I will probably only be able to check email once a day or so.
> 


Re: Tiered Storage for cache design

Posted by "Alan M. Carroll" <am...@network-geographics.com>.
Saturday, August 10, 2013, 9:27:02 PM, you wrote:

> I'm a little uncertain on this:

>         "When a request is received from a client volume assignment is done in parallel for each tier quality".


> This seems like it could be potentially expensive. For example, for an object that is in all cache tiers, each tier would have to examine that object in its cache before saying whether it's "readable" or not, right? So let's say an object is in SSD and rotational disk, and the Oracle queries this in parallel, it'll cause us to become bound by the speed of the rotational disk, no?

No. Currently volume assignment is done without reference to disk. I think in this case the oracle would be presumed to consult an in-memory directory (as is done now). This is the point of the "RW" return: "I think I have it, but maybe my directory is out of date". In that case the actual I/O is sequential, as each tier volume in turn attempts to read the actual object. The oracle determines the probe order of the tiers that returned READ or RW.
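To make that sequencing concrete, here is a minimal sketch, not the ATS implementation. The names `Tier`, `Answer`, `lookup`, and `rank` are all hypothetical, the in-memory directory is modelled as a plain set that may be stale, and the default probe order (highest quality first) is just an assumed oracle policy. Volume assignment touches memory only; device reads happen one tier at a time and stop at the first hit.

```python
from enum import Enum


class Answer(Enum):
    # Return values named in the design discussion; using WRITE for
    # "not in my directory" is an assumption of this sketch.
    READ = "read"
    RW = "rw"
    WRITE = "write"


class Tier:
    def __init__(self, name, quality, directory, stored):
        self.name = name
        self.quality = quality
        self.directory = set(directory)  # in-memory view, possibly out of date
        self.stored = set(stored)        # what is really on the device
        self.reads = 0                   # counts simulated device reads

    def assign(self, key):
        # Volume assignment: consults memory only, never the device.
        return Answer.RW if key in self.directory else Answer.WRITE

    def read(self, key):
        # Stands in for the real (slow) device read.
        self.reads += 1
        return key in self.stored


def lookup(tiers, key, rank=lambda tier: -tier.quality):
    """Return the name of the tier that served the object, or None."""
    # Step 1: every tier answers from its in-memory directory.
    candidates = [t for t in tiers if t.assign(key) in (Answer.READ, Answer.RW)]
    # Step 2: the oracle picks the probe order (assumed policy: `rank`).
    for tier in sorted(candidates, key=rank):
        # Step 3: actual I/O is sequential and stops at the first hit.
        if tier.read(key):
            return tier.name
    return None
```

With an SSD tier and a rotational tier that both list an object in their directories, only the SSD is read if it really holds the object; the rotational disk is touched only when the SSD's directory entry turns out to be stale.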

I'll look in more detail at the exact sequencing here. Something to think about on the road.


Re: Tiered Storage for cache design

Posted by Leif Hedstrom <zw...@apache.org>.
On Aug 10, 2013, at 7:49 PM, Leif Hedstrom <zw...@apache.org> wrote:

> 
> On Aug 10, 2013, at 6:15 PM, "Alan M. Carroll" <am...@network-geographics.com> wrote:
> 
>> All;
>> 
>> I have added a new section to the documentation, 'arch', for "Architecture". Currently it contains just my writings on the cache implementation but others are of course STRONGLY encouraged to contribute. This should not be API level stuff but design / architecture stuff that explores the concepts, designs, and interactions of components of ATS. I think it is also a good place for writing up design suggestions for future implementation.
> 
> Very cool. A few questions on the cache tiering:


I'm a little uncertain on this:

	"When a request is received from a client volume assignment is done in parallel for each tier quality".


This seems like it could be potentially expensive. For example, for an object that is in all cache tiers, each tier would have to examine that object in its cache before saying whether it's "readable" or not, right? So let's say an object is in SSD and rotational disk, and the Oracle queries this in parallel, it'll cause us to become bound by the speed of the rotational disk, no? The risk of course is that you kill the I/O on the slow rotational disk long before the SSD. (An SSD can do maybe 20,000 lookups/sec, a rotational disk can do 300/sec, yet you theoretically hammer both with the same IOPS).

It feels like these lookups should be done sequentially, or at least staggered, such that a fast device (e.g. SSD) has a chance to short-circuit the lookup process. Maybe the quality number can be used as a hint to how fast that tier is: the high 16 bits become a metric value indicating response time in microseconds, and the low 16 bits are the priority when response times are "similar".
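A rough sketch of that encoding, purely illustrative: the function names are made up, and "lower priority value wins" is an assumption. Note that a plain integer sort on the packed value only uses the priority when the response times are exactly equal, so treating "similar" times as equal would need some bucketing of the high bits first.

```python
def pack_quality(response_us, priority):
    """Pack a response time (microseconds) and a priority into 32 bits."""
    if not 0 <= response_us <= 0xFFFF:
        raise ValueError("response time must fit in 16 bits")
    if not 0 <= priority <= 0xFFFF:
        raise ValueError("priority must fit in 16 bits")
    # High 16 bits: response time. Low 16 bits: priority.
    return (response_us << 16) | priority


def unpack_quality(quality):
    """Return (response_us, priority) from a packed 32-bit quality."""
    return quality >> 16, quality & 0xFFFF


def probe_order(tiers):
    """Order tier names fastest first; `tiers` maps name -> packed quality."""
    # Assumption: a lower packed value means "probe earlier".
    return sorted(tiers, key=lambda name: tiers[name])
```

Taking the figures above as serial latencies (an assumption), 20,000 lookups/sec is about 50 microseconds per lookup and 300/sec is about 3,333 microseconds, so `probe_order({"disk": pack_quality(3333, 0), "ssd": pack_quality(50, 0)})` puts the SSD ahead of the rotational disk.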

Cheers,

-- leif


Re: Tiered Storage for cache design

Posted by "Alan M. Carroll" <am...@network-geographics.com>.
Saturday, August 10, 2013, 8:49:45 PM, you wrote:

> 1) Is the quality the same as a "tier level"? If so, 4,294,967,296 different tiers seems incredibly excessive.

Yes. We thought it more hassle than benefit to restrict the quality value to less than 32 bits. It does give some flexibility in the choice of values if a user wants to encode more than a simple linear sequence in them.

> 2) The Oracle logic doesn't seem to include a feature to evict from one tier when the object gets "promoted" to a higher tier. I don't know if this is required, but it seems reasonable to think that if I have RAM + SSD + Disk, once an object gets stored in the RAM cache, it's enough to keep it in rotational disk for the time being.

This was discussed and I was convinced that explicit eviction wasn't that useful. Of course, behind the scenes the oracle could evict when it returns a WRITE or RW for some other tier.

> 3) How are the different cache tiers implemented? Are they all using existing disk cache layouts (except for RAM cache)? I guess if that's the case, #2 above might not make a whole lot of sense (it'll churn out anyways in the cyclone).

The same as the current volumes. The goal was to make the current layout and behavior identical to a tiered system with only one tier value, because that is what a user gets if they do not specify any tier values.
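As a small illustration of that regression property (a sketch only; the `quality` key, the default value of 0, and `group_by_tier` are hypothetical and not part of any ATS configuration format): if every volume without an explicit tier value gets the same default quality, grouping volumes by quality yields exactly one tier, which is the current behavior.

```python
from collections import defaultdict

DEFAULT_QUALITY = 0  # hypothetical default for volumes with no tier value


def group_by_tier(volumes):
    """Group volume ids by tier quality; `volumes` is a list of dicts."""
    tiers = defaultdict(list)
    for volume in volumes:
        # A volume without an explicit quality falls into the default tier.
        tiers[volume.get("quality", DEFAULT_QUALITY)].append(volume["id"])
    return dict(tiers)
```

With no quality set on any volume the result has a single key, so the oracle has only one tier to choose from and the system degenerates to today's layout.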

> 4) It feels like we should redo the existing RAM + disk cache such that the default is two tiers and the promotion algorithms between the two tiers map to our LRU or CLFUS implementations (but using this concept of tiers). That would imply that a cache tier can choose between at least 3 different types of cache layouts.

I need to think about it in more detail, but my view is that you could do this in the proposed design by an appropriate implementation of the oracle. Presumably we would provide reference oracle implementations that did this.


Re: Tiered Storage for cache design

Posted by Leif Hedstrom <zw...@apache.org>.
On Aug 10, 2013, at 6:15 PM, "Alan M. Carroll" <am...@network-geographics.com> wrote:

> All;
> 
> I have added a new section to the documentation, 'arch', for "Architecture". Currently it contains just my writings on the cache implementation but others are of course STRONGLY encouraged to contribute. This should not be API level stuff but design / architecture stuff that explores the concepts, designs, and interactions of components of ATS. I think it is also a good place for writing up design suggestions for future implementation.

Very cool. A few questions on the cache tiering:


1) Is the quality the same as a "tier level"? If so, 4,294,967,296 different tiers seems incredibly excessive.

2) The Oracle logic doesn't seem to include a feature to evict from one tier when the object gets "promoted" to a higher tier. I don't know if this is required, but it seems reasonable to think that if I have RAM + SSD + Disk, once an object gets stored in the RAM cache, it's enough to keep it in rotational disk for the time being.

3) How are the different cache tiers implemented? Are they all using existing disk cache layouts (except for RAM cache)? I guess if that's the case, #2 above might not make a whole lot of sense (it'll churn out anyways in the cyclone).

4) It feels like we should redo the existing RAM + disk cache such that the default is two tiers and the promotion algorithms between the two tiers map to our LRU or CLFUS implementations (but using this concept of tiers). That would imply that a cache tier can choose between at least 3 different types of cache layouts.


Cheers,

-- Leif