Posted to dev@trafficserver.apache.org by Wei Bo <bi...@live.com> on 2012/12/21 20:18:58 UTC

How disk storage is implemented

Hi all,

Is there a document or notes available describing how the disk storage part of the Apache Traffic Server cache is implemented (data structures, layout)? I would like to understand how the disk cache would behave under various extreme conditions (lots of little objects, big objects), and it's been hard trying to understand the disk cache just by reading the code. Something like https://cwiki.apache.org/TS/ramcache.html ...

Thanks,
Wei Bo

RE: How disk storage is implemented

Posted by Wei Bo <bi...@live.com>.
Hi John,

Thanks a lot, this is really helpful.


> Date: Sun, 23 Dec 2012 12:10:41 -0800
> Subject: Re: How disk storage is implemented
> From: jplevyak@acm.org
> To: dev@trafficserver.apache.org
> 
> I added a bit of a design document.
> 
> https://cwiki.apache.org/confluence/display/TS/DiskStorageLayout
> 
> I didn't include anything about how the initial CacheKey is generated
> because that is a function of HTTP (i.e. what actually constitutes a
> "unique" document).
> 
> john

Re: How disk storage is implemented

Posted by James Peach <jp...@apache.org>.
On 23/12/2012, at 12:10 PM, John Plevyak <jp...@acm.org> wrote:

> I added a bit of a design document.
> 
> https://cwiki.apache.org/confluence/display/TS/DiskStorageLayout
> 
> I didn't include anything about how the initial CacheKey is generated
> because that is a function of HTTP (i.e. what actually constitutes a
> "unique" document).

Thanks John, this is great. I linked Alan's cache object diagram.

> 
> john

Re: How disk storage is implemented

Posted by John Plevyak <jp...@acm.org>.
I added a bit of a design document.

https://cwiki.apache.org/confluence/display/TS/DiskStorageLayout

I didn't include anything about how the initial CacheKey is generated
because that is a function of HTTP (i.e. what actually constitutes a
"unique" document).

john

On Fri, Dec 21, 2012 at 7:57 PM, Wei Bo <bi...@live.com> wrote:

>
> Thanks in advance. A couple of things that would be good to start with
> (please excuse my ignorance if these questions seem silly):
>
> - What does the "evacuate" process refer to in the code? Taking stuff
>   that was buffered in memory and writing it to disk?
> - How is the metadata organized for the on-disk data structures? Is it a
>   hash table? A B+ tree? Something else?
> - How much metadata is kept in memory concerning the disk structures? I
>   see a Lookaside table in the code, but I am not clear what mapping is
>   being stored there.
> - Is there any recovery process to address inconsistency following an
>   unclean shutdown of the cache (say, a power outage)?
> - How is a CacheKey generated from the HTTP object key (e.g. host + URL +
>   Vary headers + alternates)?
>
> Looking at the code, it seems that:
>
>   - Vol corresponds to a disk file or raw disk partition.
>   - Within Vol we have Dir, which seems to store metadata about the actual
>     data blocks. (There is mention of Dir nodes being allocated every 8k
>     block.) These seem to be chained together to comprise the Document
>     (for example, a Document stored over a number of Dir blocks).
>
> Is this on track, or am I totally off base?
>
> Thanks

RE: How disk storage is implemented

Posted by Wei Bo <bi...@live.com>.
Thanks in advance. A couple of things that would be good to start with (please excuse my ignorance if these questions seem silly):

- What does the "evacuate" process refer to in the code? Taking stuff that was buffered in memory and writing it to disk?
- How is the metadata organized for the on-disk data structures? Is it a hash table? A B+ tree? Something else?
- How much metadata is kept in memory concerning the disk structures? I see a Lookaside table in the code, but I am not clear what mapping is being stored there.
- Is there any recovery process to address inconsistency following an unclean shutdown of the cache (say, a power outage)?
- How is a CacheKey generated from the HTTP object key (e.g. host + URL + Vary headers + alternates)?

Looking at the code, it seems that:

  - Vol corresponds to a disk file or raw disk partition.
  - Within Vol we have Dir, which seems to store metadata about the actual data blocks. (There is mention of Dir nodes being allocated every 8k block.) These seem to be chained together to comprise the Document (for example, a Document stored over a number of Dir blocks).

Is this on track, or am I totally off base?

Thanks

> Date: Fri, 21 Dec 2012 14:45:31 -0800
> Subject: Re: How disk storage is implemented
> From: jplevyak@gmail.com
> To: dev@trafficserver.apache.org
> 
> Unfortunately most of the docs were lost when the code was in storage
> before being revived at Yahoo. However I can answer questions and try to
> do a writeup in my spare time.

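For readers following the questions above, here is a toy Python sketch of the mental model they describe: a volume whose directory maps a hashed cache key to a chain of entries, one entry per 8k block, so a large document is located by following its chain. Everything in it (`Volume`, `DirEntry`, `BLOCK_SIZE`, the MD5-of-URL keys, the append-only write position) is a hypothetical illustration chosen for the sketch, not Traffic Server's actual data structures or key derivation. It simply assumes a hash-table directory and does not answer which structure ATS really uses.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical granularity: the question above mentions Dir nodes being
# allocated every 8k block, so this toy uses one directory entry per 8 KiB.
BLOCK_SIZE = 8 * 1024


@dataclass
class DirEntry:
    """Hypothetical directory entry: where one block of a document lives."""
    key: bytes           # cache key of the document this block belongs to
    offset: int          # block index within the volume
    next: Optional[int]  # index of the next entry in this document's chain, or None


class Volume:
    """Toy volume: a flat block array plus a hashed directory.

    Illustrates the questioner's mental model only; it is not the real
    Traffic Server on-disk layout.
    """

    def __init__(self, num_blocks: int) -> None:
        self.num_blocks = num_blocks
        self.write_pos = 0                   # next free block (append-only)
        self.entries: List[DirEntry] = []    # all directory entries
        self.index: Dict[bytes, int] = {}    # key -> first entry of its chain

    def write(self, key: bytes, size: int) -> int:
        """Store a document of `size` bytes; return how many entries it used."""
        nblocks = max(1, -(-size // BLOCK_SIZE))  # ceiling division, at least 1
        if self.write_pos + nblocks > self.num_blocks:
            raise RuntimeError("volume full (a real cache would wrap and evict)")
        first = len(self.entries)
        for i in range(nblocks):
            # Link each entry to the following one; the last entry ends the chain.
            nxt = first + i + 1 if i + 1 < nblocks else None
            self.entries.append(DirEntry(key, self.write_pos + i, nxt))
        self.write_pos += nblocks
        self.index[key] = first
        return nblocks

    def blocks_for(self, key: bytes) -> List[int]:
        """Follow the chain of entries for `key` and return its block offsets."""
        i = self.index.get(key)
        offsets = []
        while i is not None:
            offsets.append(self.entries[i].offset)
            i = self.entries[i].next
        return offsets


if __name__ == "__main__":
    vol = Volume(num_blocks=1024)
    # Hypothetical keys: MD5 of a URL, purely for illustration.
    tiny = hashlib.md5(b"http://example.com/tiny.gif").digest()
    video = hashlib.md5(b"http://example.com/video.mp4").digest()
    print(vol.write(tiny, 500))          # 1 entry: a 500-byte object still uses a whole block
    print(vol.write(video, 100 * 1024))  # 13 entries: 100 KiB / 8 KiB, rounded up
```

In this toy, every object costs at least one whole block and one directory entry, so lots of little objects use up entries and space far faster than their byte count suggests, while a big object costs one entry per block. Whether the real on-disk cache behaves this way is not established by the sketch.
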
Re: How disk storage is implemented

Posted by John Plevyak <jp...@gmail.com>.
Unfortunately most of the docs were lost when the code was in storage
before being revived at Yahoo. However I can answer questions and try to
do a writeup in my spare time.
On Dec 21, 2012 11:19 AM, "Wei Bo" <bi...@live.com> wrote:

> Hi all,
>
> Is there a document or notes available describing how the disk storage
> part of the Apache Traffic Server cache is implemented (data structures,
> layout)? I would like to understand how the disk cache would behave under
> various extreme conditions (lots of little objects, big objects), and it's
> been hard trying to understand the disk cache just by reading the code.
> Something like https://cwiki.apache.org/TS/ramcache.html ...
>
> Thanks,
> Wei Bo
>