Posted to derby-dev@db.apache.org by "Suresh Thalamati (JIRA)" <de...@db.apache.org> on 2005/08/20 00:43:54 UTC

[jira] Created: (DERBY-524) Weight-based page cache might improve Derby throughput by keeping more heavily used pages in the page cache

Weight-based page cache might improve Derby throughput by keeping more heavily used pages in the page cache
-----------------------------------------------------------------------------------------------------------

         Key: DERBY-524
         URL: http://issues.apache.org/jira/browse/DERBY-524
     Project: Derby
        Type: New Feature
  Components: Services  
    Versions: 10.1.1.1    
    Reporter: Suresh Thalamati


This issue was discussed on the derby-dev list along with the online backup (DERBY-239) design, because online backup will read pages into the cache and potentially replace active user pages in the cache.

Comments from the list related to this:
http://mail-archives.apache.org/mod_mbox/db-derby-dev/200507.mbox/%3c42E50861.8020504@sbcglobal.net%3e
Mike wrote:
I also agree that page cache enhancement is interesting, but probably
should be tackled as a separate project.  But keeping this goal in mind
while making changes for backup is a good thing.  An interface that
allows backup to use/reuse a single buffer in the page cache seems
reasonable.  Specializing it would seem to allow some optimizations
where free page searching could be avoided for this operation, which at
a very low level is going to be pushing/pulling pages as fast as possible.

I have seen the following ideas work well in a weight-based page cache. They try to limit the overhead of weights by using multiple LRU queues, while still keeping some of the benefit of a weight-based scheme:
1) Have a much smaller range than 0-100, something like 5, where each
   value is its own LRU queue.  This reduces the overhead of searching
   and sorting based on weight.
2) As Dan suggests, something like:
   no weight: free list
   0: backup page, linear scan heap pages, read ahead
   1: probe accessed heap page
   2: leaf page
   3: non-leaf page
   4: root
3) To account for re-reference, pages move up in value when re-referenced.
   Revaluing happens only when the page is accessed, so the page is already
   latched, which limits the additional overhead needed to reweigh it.
   Various methods can be used for moving pages down in value:
    o whole queues at a time
    o individual pages in LRU order, based on some sort of clock like current clock
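
The three points above can be sketched roughly as follows. This is only an illustration of the idea, not Derby code: the class name MultiQueuePageCache, its methods, and the use of plain page ids are all assumptions made for the example. It keeps one LRU queue per weight (0-4), promotes a page one level on re-reference, evicts from the lowest non-empty queue, and ages pages by moving whole queues down one level at a time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: one LRU queue per weight level; level 0 is evicted first. */
class MultiQueuePageCache {
    static final int LEVELS = 5; // weights 0..4, as in the list above

    private final int capacity;
    // queues.get(w) holds the ids of pages with weight w, oldest first.
    private final List<LinkedHashSet<Long>> queues = new ArrayList<>();
    private final Map<Long, Integer> levelOf = new HashMap<>();

    MultiQueuePageCache(int capacity) {
        this.capacity = capacity;
        for (int i = 0; i < LEVELS; i++) {
            queues.add(new LinkedHashSet<>());
        }
    }

    /**
     * Records an access. initialWeight (0..4) is used only when the page is
     * not cached yet. Returns the id of the evicted page, or null.
     */
    Long access(long pageId, int initialWeight) {
        Long key = pageId;
        Integer level = levelOf.get(key);
        if (level != null) {
            // Re-reference: move up one level and to the young end of that queue.
            queues.get(level).remove(key);
            int promoted = Math.min(level + 1, LEVELS - 1);
            queues.get(promoted).add(key);
            levelOf.put(key, promoted);
            return null;
        }
        Long evicted = (levelOf.size() >= capacity) ? evictOne() : null;
        queues.get(initialWeight).add(key);
        levelOf.put(key, initialWeight);
        return evicted;
    }

    /** Evicts the oldest page of the lowest non-empty weight level. */
    private Long evictOne() {
        for (LinkedHashSet<Long> queue : queues) {
            Iterator<Long> it = queue.iterator();
            if (it.hasNext()) {
                Long victim = it.next();
                it.remove();
                levelOf.remove(victim);
                return victim;
            }
        }
        return null;
    }

    /** "Whole queues at a time": every page above level 0 drops one level. */
    void ageAll() {
        for (int i = 1; i < LEVELS; i++) {
            LinkedHashSet<Long> from = queues.get(i);
            LinkedHashSet<Long> to = queues.get(i - 1);
            for (Long id : from) {
                to.add(id);
                levelOf.put(id, i - 1);
            }
            from.clear();
        }
    }

    /** Returns the current weight of a cached page, or -1 if it is not cached. */
    int weightOf(long pageId) {
        Integer level = levelOf.get(pageId);
        return level == null ? -1 : level;
    }
}
```

A caller would pass 0 for a backup or scan page and 4 for a root page. Finding a victim only looks at the head of at most five queues, so no searching or sorting by weight is needed.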



Øystein Grøvlen wrote:

>>>>>> "DJD" == Daniel John Debrunner <dj...@debrunners.com> writes:
>
>
>
>     DJD> I think modifications to the cache would be useful for b), so
>     DJD> that entries in the cache (through generic apis, not specific
>     DJD> to store) could mark how "useful/valuable" they are. Just a
>     DJD> simple scheme, lower numbers less valuable, higher numbers
>     DJD> more valuable, and if it makes it easier to fix a range,
>     DJD> e.g. 0-100, then that would be ok. Then the store could add
>     DJD> pages to the cache with this weighting, e.g. (to get the
>     DJD> general idea)
>
>     DJD>      pages for backup - weight 0
>     DJD>      overflow column pages - weight 10
>     DJD>      regular pages - weight 20
>     DJD>      leaf index pages - weight 30
>     DJD>      root index pages - weight 80
>
>     DJD> This weight would then be factored into the decision to throw pages out
>     DJD> or not.
>
> I agree that we need some mechanism to prevent operations from filling
> the cache with pages that are not likely to be accessed again in the
> near future.  However, I am afraid that a very detailed "cost-based"
> scheme may create a significant overhead compared to a simple LRU
> scheme.
>
> One may operate with separate LRU queues for different weights, but I
> guess the number of possible weights will have to be restricted in
> that case.
>
> I am also not convinced that it is the type of page that is the most
> important criterion for caching.  What matters is access frequency.
> The page type may give a hint, but leaf pages of one index may be more
> frequently accessed than root pages of other indexes.
>
> The type of access is also a relevant criterion.  Pages accessed
> sequentially are often less likely to be accessed again in the near
> future than pages accessed by direct lookup.  A separate LRU queue for
> sequentially accessed pages may prevent backup and other sequential
> scans (e.g., select * from t) from throwing out directly accessed
> pages (e.g., index pages and data pages accessed through indexes.)
>
>     DJD> This project could be independent of the online backup and could have
>     DJD> benefits elsewhere.
>
> I agree.
>
>
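
To make the "separate LRU queue for sequentially accessed pages" idea from the quoted message concrete, here is a minimal sketch. It is not Derby code: the class ScanAwarePageCache, the AccessType enum, and the rule that a page once reached by direct lookup stays in the direct queue are assumptions chosen for the example. Victims are taken from the sequential queue first, so a backup or table scan mostly recycles its own pages.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;

/** Hypothetical sketch: two LRU queues, one for scans and one for direct lookups. */
class ScanAwarePageCache {
    enum AccessType { SEQUENTIAL, DIRECT }

    private final int capacity;
    private final LinkedHashSet<Long> sequential = new LinkedHashSet<>();
    private final LinkedHashSet<Long> direct = new LinkedHashSet<>();

    ScanAwarePageCache(int capacity) {
        this.capacity = capacity;
    }

    /** Records an access and returns the evicted page id, or null if none. */
    Long access(long pageId, AccessType type) {
        Long key = pageId;
        boolean wasDirect = direct.remove(key);
        boolean wasSequential = sequential.remove(key);
        Long evicted = null;
        if (!wasDirect && !wasSequential && size() >= capacity) {
            evicted = evictOne();
        }
        // Assumption: a page that was ever reached by direct lookup stays in
        // the direct queue, even if a later scan touches it again.
        if (type == AccessType.DIRECT || wasDirect) {
            direct.add(key);
        } else {
            sequential.add(key);
        }
        return evicted;
    }

    /** Prefers the oldest sequentially accessed page as the victim. */
    private Long evictOne() {
        Long victim = pollOldest(sequential);
        return victim != null ? victim : pollOldest(direct);
    }

    private static Long pollOldest(LinkedHashSet<Long> queue) {
        Iterator<Long> it = queue.iterator();
        if (!it.hasNext()) {
            return null;
        }
        Long oldest = it.next();
        it.remove();
        return oldest;
    }

    boolean contains(long pageId) {
        return direct.contains(pageId) || sequential.contains(pageId);
    }

    int size() {
        return direct.size() + sequential.size();
    }
}
```

With a capacity of 3 and two directly accessed pages already cached, a long sequential scan keeps replacing its own single slot and leaves the two direct pages in place. This sketch does not track access frequency, which the message above argues is what really matters; it only separates the two access types.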


