Posted to olio-dev@incubator.apache.org by "Akara Sucharitakul (JIRA)" <ji...@apache.org> on 2008/11/18 20:23:44 UTC

[jira] Created: (OLIO-12) Caching needs to be implemented in Rails application

Caching needs to be implemented in Rails application
----------------------------------------------------

                 Key: OLIO-12
                 URL: https://issues.apache.org/jira/browse/OLIO-12
             Project: Olio
          Issue Type: Improvement
          Components: rails-app
            Reporter: Akara Sucharitakul
            Assignee: Shanti Subramanyam


We need to implement caching in the rails version (as well as PHP version, see Issue#3).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: What to cache ? (was Re: Caching needs to be implemented in Rails application)

Posted by Shanti Subramanyam - PAE <Sh...@Sun.COM>.
On 01/23/09 09:15 AM, William Sobel wrote:
> 
> This writeup was taken from the write-board we started with Akara back 
> quite a few months ago. We tried to do what made sense for a Rails 
> application using page/action/fragment caching. A lot of the event 
> detail is fragment cached whereas the home page is page cached for 
> non-logged in users and fragment cached for logged-in users. Are you 
> doing a similar split?
> 

Yes.

> Running with low db load is not invalid, many application are attempting 
> to do just that. I'm surprised it made such a large difference, what 
> percentage of hits are homepage hits for non-logged in users?
> 

Not sure - but I assume it is exactly the same as for Rails, since this
is determined by the driver.

> I think Akara's idea is a good one. We should also try to stress other 
> components as well. I think a memcache heavy load would be interesting 
> to test. We can create a special branch of the rails application that 
> caches the thumbs in memcached, this is no problem. 

Glad you think it's a good idea. We'll go for it then.
Akara: Can you please revise the caching doc to add details about this?
We should also probably post the doc somewhere - perhaps add it to the
specification doc that is in docs/app_spec.html.

> Currently we're
> using the proxy server to serve the images, aren't you doing the same
> with apache?
>

Yes - but since PHP is running inside of Apache, we're not really
off-loading the PHP server. That's what I meant. We run a single-tier
server, whereas Rails apps typically run two tiers so the front end can
serve the file data.

> I was thinking of setting nginx up to use the memcached module (it is 
> reported to give a 4x improvement. This may also be relevant for your 
> tests as well. http://wiki.codemongers.com/NginxHttpMemcachedModule.
> 
> http://www.igvita.com/2008/02/11/nginx-and-memcached-a-400-boost/
> 

This does look interesting and can serve as a great test for memcached
as well.
But the questions I have are these:
. What is the typical deployment architecture used by Rails apps?
. Is it reasonable for us to promote this nginx/memcached/Thin
architecture?
. How flexible do we want Olio to be in terms of allowing different
types of deployment?


Shanti

> As Shanti said, additional input would be greatly appreciated!
> 
> - Will
> 

Re: What to cache ? (was Re: Caching needs to be implemented in Rails application)

Posted by William Sobel <ws...@eecs.berkeley.edu>.
This writeup was taken from the write-board we started with Akara back  
quite a few months ago. We tried to do what made sense for a Rails  
application using page/action/fragment caching. A lot of the event  
detail is fragment cached whereas the home page is page cached for
non-logged-in users and fragment cached for logged-in users. Are you doing
a similar split?

Running with low db load is not invalid; many applications are
attempting to do just that. I'm surprised it made such a large
difference - what percentage of hits are home page hits for
non-logged-in users?

I think Akara's idea is a good one. We should also try to stress other
components. I think a memcached-heavy load would be interesting to
test. We can create a special branch of the Rails application that
caches the thumbs in memcached; this is no problem. Currently we're
using the proxy server to serve the images, aren't you doing the same
with Apache?
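
For illustration, caching the thumbs in memcached could look roughly
like the sketch below on the Rails side (the Dalli client, key scheme,
filestore path, and TTL are assumptions, not the actual caching
branch). Keying entries by request path would also line up with serving
them straight from a front end such as nginx's memcached module:

    require 'dalli'

    CACHE = Dalli::Client.new('localhost:11211')

    # Read-through cache for thumbnails: key the entry by the request
    # path so a front end could look it up directly by URI.
    def thumbnail_bytes(path)
      CACHE.get(path) || begin
        bytes = File.binread("/filestore#{path}")  # assumed filestore layout
        CACHE.set(path, bytes, 600)                # assumed 10-minute TTL
        bytes
      end
    end

    # Example: thumbnail_bytes('/uploads/event_1_thumb.jpg')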

I was thinking of setting nginx up to use the memcached module (it is
reported to give a 4x improvement). This may be relevant for your
tests as well: http://wiki.codemongers.com/NginxHttpMemcachedModule.

http://www.igvita.com/2008/02/11/nginx-and-memcached-a-400-boost/

As Shanti said, additional input would be greatly appreciated!

- Will



Re: What to cache ? (was Re: Caching needs to be implemented in Rails application)

Posted by Shanti Subramanyam - PAE <Sh...@Sun.COM>.
On 01/25/09 03:39 PM, Amanda Waite wrote:
> Shanti Subramanyam wrote:
>> Thanks Will. At this point, the PHP app is only caching the home page. 
>> We are wondering whether to even do the Event Detail page as the load 
>> on the database has been drastically cut down just from the home page 
>> caching.
>> It's a dilemma - if we cache too much, there is no load on the db. If 
>> we cache too little, there is nothing much in memcached. Of course, if 
>> we run a much larger scale (say, 10's of systems for the web tier), 
>> then I'm sure we'll see increasing load on both tiers. But practically 
>> speaking, we need to be able to run a reasonable configuration.
> 
> IMO reducing the load on the DB should be a high order goal of any 
> testing. The DB is a bottleneck, anything that can be cached should be 
> cached and if you want to test DB performance ramp up the number of 
> users until the DB struggles and then partition it and then partition it 
> some more.
> 

The problem, of course, is that it can get quite impractical to keep
adding users. It means tons of hardware at the web tier, not to
mention the difficulties associated with scaling any workload.

>>
>> Akara has another idea to use memcached more heavily, while at the 
>> same time not reducing the db load. Namely, cache the thumbnails in 
>> it. This will also reduce the load on the filestore (which currently 
>> is quite heavily stressed for the PHP app). But this strategy won't 
>> work for the rails app will it ?  I believe you're serving all static 
>> files out of the proxy server ?
> 
> I'm looking to do as Will suggests, serve the static files from Nginx 
> and cache them with the Nginx memcached module. At the moment I'm in 
> Nginx/Thin heaven as they work so well together and I've barely 
> scratched the surface of what's possible.
> 

Heaven? You must be kidding! Come to the PHP world and see what's
possible :-)

> Amanda

Shanti


Re: What to cache ? (was Re: Caching needs to be implemented in Rails application)

Posted by Amanda Waite <Am...@Sun.COM>.
Shanti Subramanyam wrote:
> Thanks Will. At this point, the PHP app is only caching the home page. 
> We are wondering whether to even do the Event Detail page as the load 
> on the database has been drastically cut down just from the home page 
> caching.
> It's a dilemma - if we cache too much, there is no load on the db. If 
> we cache too little, there is nothing much in memcached. Of course, if 
> we run a much larger scale (say, 10's of systems for the web tier), 
> then I'm sure we'll see increasing load on both tiers. But practically 
> speaking, we need to be able to run a reasonable configuration.

IMO, reducing the load on the DB should be a high-order goal of any
testing. The DB is a bottleneck: anything that can be cached should be
cached, and if you want to test DB performance, ramp up the number of
users until the DB struggles, then partition it, and then partition it
some more.

>
> Akara has another idea to use memcached more heavily, while at the 
> same time not reducing the db load. Namely, cache the thumbnails in 
> it. This will also reduce the load on the filestore (which currently 
> is quite heavily stressed for the PHP app). But this strategy won't 
> work for the rails app will it ?  I believe you're serving all static 
> files out of the proxy server ?

I'm looking to do as Will suggests, serve the static files from Nginx 
and cache them with the Nginx memcached module. At the moment I'm in 
Nginx/Thin heaven as they work so well together and I've barely 
scratched the surface of what's possible.

Amanda


What to cache ? (was Re: Caching needs to be implemented in Rails application)

Posted by Shanti Subramanyam <Sh...@Sun.COM>.
Thanks Will. At this point, the PHP app is only caching the home page. 
We are wondering whether to even do the Event Detail page as the load on 
the database has been drastically cut down just from the home page caching.
It's a dilemma - if we cache too much, there is no load on the db. If we
cache too little, there is not much in memcached. Of course, if we run
at a much larger scale (say, tens of systems for the web tier), then
I'm sure we'll see increasing load on both tiers. But practically
speaking, we need to be able to run a reasonable configuration.

Akara has another idea for using memcached more heavily while, at the
same time, not reducing the db load: cache the thumbnails in it. This
will also reduce the load on the filestore (which is currently quite
heavily stressed for the PHP app). But this strategy won't work for the
Rails app, will it? I believe you're serving all static files out of
the proxy server?

Would love to hear what others think as well.

Shanti


Re: [jira] Resolved: (OLIO-12) Caching needs to be implemented in Rails application

Posted by William Sobel <ws...@eecs.berkeley.edu>.
On Jan 22, 2009, at 5:11 PM, Shanti Subramanyam wrote:

> Can you please elaborate on what exactly is cached? How is the
> cache managed (in terms of timeouts etc.)?

From the original writeup:

Cache Strategy for Web20Kit

Home Page

The home page will be cached in two forms:

1. Cached as a whole page, accessed by users arriving at the site and
users that are not logged on.
2. Cached as a page fragment, just for the content part. The page will
be constructed from the dynamic header, which contains the user name of
the current user, plus the cached content fragment.
3. Paginations – these will be cached for up to 5 pages. Users are
unlikely to search for events beyond the fifth page.
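
For illustration, the two forms map naturally onto two memcached
entries: a fully rendered page for users who are not logged on, and a
content-only fragment that is wrapped with a per-user header at request
time. A minimal sketch (the Dalli client, key names, and render helpers
below are assumptions, not the Web20Kit code):

    require 'dalli'

    CACHE = Dalli::Client.new('localhost:11211')

    # Hypothetical render helpers standing in for the real templates.
    def render_header(user);  "<div id='hdr'>Welcome #{user}</div>";         end
    def render_home_content;  "<div id='content'>...event listing...</div>"; end

    def home_page_for(user)
      if user.nil?
        # Anonymous users: serve the whole cached page.
        CACHE.get('home:page') || (render_header('guest') + render_home_content)
      else
        # Logged-in users: dynamic header plus the cached content fragment.
        fragment = CACHE.get('home:fragment') || render_home_content
        render_header(user) + fragment
      end
    end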

Expiration and re-generation

The home page will expire every 120 seconds. The page will then be
re-generated by one of the first requests arriving after the
expiration. To prevent all requests arriving after the expiration from
re-generating the page, causing a stampede, we will use a
lock/semaphore control mechanism as follows:

1. The home page and/or home page fragment is cached with no timeout
or a very large timeout (on the order of days) in memcached.
2. For each cached page, a small semaphore object is placed into
memcached with a timeout of 120 seconds – the regeneration cycle.
3. After accessing the page/fragment in the cache and sending the
response to the user, the cache client (web server) checks whether the
semaphore is still there or has timed out. If it is not there (timed
out), the client will attempt to re-generate the page or fragment.
4. To prevent a stampede, the client ‘adds’ a lock entry into the
cache. If the add succeeds, this thread has the lock. The lock times
out after 20 seconds using the memcached timeout mechanism, which
prevents a thread from holding the lock indefinitely.
5. After obtaining the lock, the thread generates the page or fragment
and replaces the copy in memcached.
6. The generating thread then places a new semaphore object with the
same timeout period and removes the lock object.
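
Steps 1-6 translate fairly directly into client code. A minimal sketch
of the protocol (Dalli client, hypothetical key names, and a
placeholder page builder; the real implementations live in the PHP and
Rails apps):

    require 'dalli'

    CACHE    = Dalli::Client.new('localhost:11211')
    PAGE_KEY = 'home:page'   # step 1: cached page, very large timeout
    SEM_KEY  = 'home:sem'    # step 2: semaphore, expires every 120 s
    LOCK_KEY = 'home:lock'   # step 4: regeneration lock, expires after 20 s

    def build_home_page
      "<html>...home page rendered from the database...</html>"  # placeholder
    end

    def regenerate_home_page
      # 'add' succeeds only if no other thread holds the lock (step 4).
      if CACHE.add(LOCK_KEY, 1, 20)
        page = build_home_page
        CACHE.set(PAGE_KEY, page, 7 * 24 * 3600)  # step 5: replace cached copy
        CACHE.set(SEM_KEY, 1, 120)                # step 6: fresh semaphore...
        CACHE.delete(LOCK_KEY)                    #         ...and drop the lock
        page
      else
        CACHE.get(PAGE_KEY)  # another thread is regenerating; serve what we have
      end
    end

    def serve_home_page
      page = CACHE.get(PAGE_KEY) || regenerate_home_page
      # Step 3: if the semaphore has expired, attempt to regenerate.
      regenerate_home_page if CACHE.get(SEM_KEY).nil?
      page
    end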

Event Detail Page

The event detail page is cached both as a content fragment and, for
users who are not logged on, as a whole page.

Expiration and re-generation

Event detail page cache entries have a timeout of 30 seconds, using
the cache timeout mechanism of memcached, so only frequently accessed
events will remain in the cache. The load generator will need to be
designed to access event detail pages in a non-uniform manner, too. We
will use a locking mechanism for the event detail page similar to that
of the home page. However, we will not use an expiry semaphore;
instead, we let the page expire from the cache as a whole. Accessing
the entry should, however, renew the expiry time so that frequently
accessed events stay in the cache. The mechanism will work as follows:

1. The event detail page and fragment are cached with a timeout of 30
seconds.
2. When a cache client needs to access the entry, it will try to read
the entry from the cache. If the entry is available, it will extend
the cache timeout. Otherwise, the event detail page is generated from
the database.
3. To regenerate the page and prevent a stampede, the client ‘adds’ a
lock entry into the cache. If the add succeeds, this thread has the
lock. The lock times out after 20 seconds using the memcached timeout
mechanism, which prevents a thread from holding the lock indefinitely.
4. After obtaining the lock, the thread proceeds with generating the
page. After completion, the page is placed into the cache and the
lock is removed from memcached.
5. If we do not get the lock (the add fails), we stay in a loop: sleep
for 200 ms, then re-check whether the page has appeared in the cache.
We keep checking until a timeout of 5 seconds (25 iterations).
6. The attendee list and comments/rating fragments of this page are
cached in the same manner. Those sections will be re-generated while
holding a lock object in the same manner. They will be regenerated if
the fragment is not in the cache, and on or after updates to those
fragments (i.e. somebody makes a comment or signs up to attend the
event).
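
The event detail flow differs mainly in the sliding 30-second expiry
(steps 1-2) and the wait loop when the lock is already held (step 5).
A sketch along the same lines (the key scheme and page builder are
assumptions):

    require 'dalli'

    CACHE = Dalli::Client.new('localhost:11211')

    def build_event_page(event_id)
      "<html>...event #{event_id} rendered from the database...</html>"  # placeholder
    end

    def event_detail_page(event_id)
      key = "event:#{event_id}:page"
      if (page = CACHE.get(key))
        CACHE.touch(key, 30)               # step 2: extend the 30 s expiry
        return page
      end
      if CACHE.add("#{key}:lock", 1, 20)   # step 3: take the 20 s lock
        page = build_event_page(event_id)  # step 4: regenerate from the db
        CACHE.set(key, page, 30)           # step 1: 30 s timeout
        CACHE.delete("#{key}:lock")
        page
      else
        # Step 5: lock held elsewhere; poll every 200 ms for up to 5 s.
        25.times do
          sleep 0.2
          page = CACHE.get(key)
          return page if page
        end
        # Fallback not specified in the writeup: render directly.
        build_event_page(event_id)
      end
    end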

Other Pages

At this point, none of the other pages and/or their fragments are  
cached. Most of the other pages are accessed at low frequency with the  
exception of the tag search page. The tag search page is the next  
candidate for caching and pre-generation. The caching strategy is  
still to be determined.

Page Caches with Ruby on Rails

Ruby on Rails does not natively use memcached for whole-page caches
(it can do so for page fragments). Instead, it generates static pages
as files, and requests are routed to the corresponding file, which
represents a fully rendered page.

The Ruby on Rails implementation of Web20Kit will use the native Rails
mechanism for full-page caches. Expirations result in a call to remove
the file, following the same expiry policy defined for each page above.
The file must be removed as the page cache expires, either by a request
arriving after expiry or by a background job.
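
The native mechanism referred to here is the caches_page/expire_page
pair from Rails 2.x-era Action Controller. A rough sketch of the
shape, with the controller and model names assumed rather than taken
from the Web20Kit code:

    class HomeController < ApplicationController
      # Writes public/index.html on first render; later requests are
      # served as a static file by the front-end web server without
      # touching Rails at all.
      caches_page :index

      def index
        @events = Event.find(:all, :limit => 10)  # Rails 2.x finder style
      end
    end

    # Expiry is the removal of that file, e.g. from a sweeper or a
    # background job running on the 120-second cycle described above:
    #   expire_page(:controller => 'home', :action => 'index')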


Cheers,
- Will Sobel


Re: [jira] Resolved: (OLIO-12) Caching needs to be implemented in Rails application

Posted by Shanti Subramanyam <Sh...@Sun.COM>.
Hi Will,
Can you please elaborate on what exactly is cached? How is the cache
managed (in terms of timeouts etc.)?

Shanti

William Sobel (JIRA) wrote:
>      [ https://issues.apache.org/jira/browse/OLIO-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> William Sobel resolved OLIO-12.
> -------------------------------
>
>     Resolution: Fixed
>
> Add caching branch: SVN revision #736851
>
> webapp/rails/branches/caching/...
>
>   
>> Caching needs to be implemented in Rails application
>> ----------------------------------------------------
>>
>>                 Key: OLIO-12
>>                 URL: https://issues.apache.org/jira/browse/OLIO-12
>>             Project: Olio
>>          Issue Type: Improvement
>>          Components: rails-app
>>            Reporter: Akara Sucharitakul
>>            Assignee: William Sobel
>>
>> We need to implement caching in the rails version (as well as PHP version, see Issue#3).
>>     
>
>   

[jira] Resolved: (OLIO-12) Caching needs to be implemented in Rails application

Posted by "William Sobel (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OLIO-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

William Sobel resolved OLIO-12.
-------------------------------

    Resolution: Fixed

Add caching branch: SVN revision #736851

webapp/rails/branches/caching/...

> Caching needs to be implemented in Rails application
> ----------------------------------------------------
>
>                 Key: OLIO-12
>                 URL: https://issues.apache.org/jira/browse/OLIO-12
>             Project: Olio
>          Issue Type: Improvement
>          Components: rails-app
>            Reporter: Akara Sucharitakul
>            Assignee: William Sobel
>
> We need to implement caching in the rails version (as well as PHP version, see Issue#3).



[jira] Closed: (OLIO-12) Caching needs to be implemented in Rails application

Posted by "William Sobel (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OLIO-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

William Sobel closed OLIO-12.
-----------------------------


> Caching needs to be implemented in Rails application
> ----------------------------------------------------
>
>                 Key: OLIO-12
>                 URL: https://issues.apache.org/jira/browse/OLIO-12
>             Project: Olio
>          Issue Type: Improvement
>          Components: rails-app
>            Reporter: Akara Sucharitakul
>            Assignee: William Sobel
>
> We need to implement caching in the rails version (as well as PHP version, see Issue#3).



[jira] Assigned: (OLIO-12) Caching needs to be implemented in Rails application

Posted by "Shanti Subramanyam (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OLIO-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shanti Subramanyam reassigned OLIO-12:
--------------------------------------

    Assignee: William Sobel  (was: Shanti Subramanyam)

> Caching needs to be implemented in Rails application
> ----------------------------------------------------
>
>                 Key: OLIO-12
>                 URL: https://issues.apache.org/jira/browse/OLIO-12
>             Project: Olio
>          Issue Type: Improvement
>          Components: rails-app
>            Reporter: Akara Sucharitakul
>            Assignee: William Sobel
>
> We need to implement caching in the rails version (as well as PHP version, see Issue#3).
