Posted to derby-dev@db.apache.org by Øystein Grøvlen <Oy...@Sun.COM> on 2006/01/04 13:53:06 UTC

Re: new uses for basic services cache - looking for advice

>>>>> "MM" == Mike Matrigali <mi...@sbcglobal.net> writes:

    MM> I believe the store's file container cache would benefit from the following:
    MM> o ability to insert duplicate key items into the cache

This will require a new CacheManager implementation since the current
implementation (Clock) extends Hashtable, which cannot contain
duplicate keys.  (Why is inheritance used here?  The code would be
easier to reuse if Clock contained the Hashtable instead.)

Maybe one could have unique keys for each item, but add a method to
CacheManager for finding items based on a partial key.  (There are
already clean and discard methods that use a partial key.)  However,
Hashtable does not support efficient lookup on a partial key, so I
think we would need to come up with something else here.  A separate
access method on the partial key (e.g., another hash table) is a
possibility, but that would add some overhead on cache changes.
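
To make that overhead concrete, here is a rough sketch of such a
separate access structure.  This is not existing Derby code; all class
and method names below are made up for illustration.  The cache keeps
its normal unique-key table and, in addition, a map from the partial
key (e.g. a container identity) to the keys that currently share it,
which has to be maintained on every insert and remove:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.List;

// Hypothetical sketch only: a cache with the usual unique-key lookup
// plus a secondary index on a partial key, so that all entries for the
// same container can be found without scanning the whole table.
class PartialKeyIndexedCache {

    private final Hashtable entries = new Hashtable();   // full key -> item
    private final HashMap byPartialKey = new HashMap();  // partial key -> List of full keys

    synchronized void put(Object fullKey, Object partialKey, Object item) {
        entries.put(fullKey, item);
        List keys = (List) byPartialKey.get(partialKey);
        if (keys == null) {
            keys = new ArrayList();
            byPartialKey.put(partialKey, keys);
        }
        keys.add(fullKey);                    // extra bookkeeping on insert
    }

    synchronized void remove(Object fullKey, Object partialKey) {
        entries.remove(fullKey);
        List keys = (List) byPartialKey.get(partialKey);
        if (keys != null) {
            keys.remove(fullKey);             // extra bookkeeping on remove
            if (keys.isEmpty()) {
                byPartialKey.remove(partialKey);
            }
        }
    }

    // Efficient lookup of all items that match the partial key.
    synchronized List findAll(Object partialKey) {
        List result = new ArrayList();
        List keys = (List) byPartialKey.get(partialKey);
        if (keys != null) {
            for (int i = 0; i < keys.size(); i++) {
                result.add(entries.get(keys.get(i)));
            }
        }
        return result;
    }
}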

    MM> o ability to return one of the keys that is not busy, rather than
    MM>    wait on one that is.
    MM> o ability to add an item if all the matching keys are busy

Maybe these two bullets could be achieved by a method
findUnused(partialKey) that returns an item that is not kept by
others, and adds a new item if all are kept.
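
Something like this, where both the method name and the exact contract
are just my suggestion (nothing like it exists in the CacheManager
interface today):

// Hypothetical extension of the cache manager: a sketch of the
// semantics only, not existing Derby code.
public interface DuplicateKeyCacheManager {

    /**
     * Return a cached item whose identity matches the given partial
     * key and which is not currently kept by anyone.  If every
     * matching item is kept (busy), create and add a new item for the
     * key instead of waiting, and return that new item.
     */
    Object findUnused(Object partialKey);
}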

    MM> I think this is new to the  current cache. My question is how best to
    MM> add such functionality.


    MM> Is this something that can be handled by the current cache, with only
    MM> changes to the objects being cached themselves?  I know the way the
    MM> cache and lock manager is designed you can get a lot of different
    MM> kinds of functionality by changing code in the cache interfaces that
    MM> the objects implement, rather than in the cache itself.

It seems to me that you would not need to change the objects much.  It
is the cache manager that needs to change.

    MM> Or is this a basic change (unique key vs. duplicate key) that will
    MM> require changes to the cache manager?

Another issue one needs to consider is the cost of opening and closing
file descriptors.  If that cost is significantly larger than the cost
of a disk read/write, it will be important to avoid frequently closing
and reopening file descriptors for the same file.  This will have an
impact on whether the current clock algorithm is suitable for this
type of caching.
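
(If somebody wants a rough feel for that cost on their platform, a
throwaway micro-benchmark along these lines would do.  Note that it
only measures the open/close overhead against reads that are most
likely served from the OS cache, not against real disk I/O:)

import java.io.File;
import java.io.RandomAccessFile;

// Throwaway micro-benchmark: cost of repeatedly opening and closing a
// RandomAccessFile versus small reads on a file that stays open.
public class OpenCloseCost {
    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("fdcost", ".dat");
        RandomAccessFile init = new RandomAccessFile(f, "rw");
        init.write(new byte[4096]);            // give the file some content
        init.close();

        final int iterations = 10000;
        byte[] buf = new byte[4096];

        // Cost of repeatedly opening and closing the file.
        long start = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) {
            RandomAccessFile tmp = new RandomAccessFile(f, "r");
            tmp.close();
        }
        long openClose = System.currentTimeMillis() - start;

        // Cost of repeated reads on a file that stays open.
        RandomAccessFile raf = new RandomAccessFile(f, "r");
        start = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) {
            raf.seek(0);
            raf.readFully(buf);
        }
        long reads = System.currentTimeMillis() - start;
        raf.close();

        System.out.println("open/close x " + iterations + ": " + openClose + " ms");
        System.out.println("read       x " + iterations + ": " + reads + " ms");
        f.delete();
    }
}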

-- 
Øystein


Re: new uses for basic services cache - looking for advice

Posted by Mike Matrigali <mi...@sbcglobal.net>.
In a new generalized strategy it would not be RAFContainer objects; it
would be some new abstraction for an open file.  Maybe something that
ties in more closely with the relatively new io abstraction provided by

opensource/java/engine/org/apache/derby/io

As you point out, such a project would have to consider current
assumptions on number of outstanding objects.
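
Something along these lines is what I have in mind, purely a strawman;
none of these names exist in the code today:

// Strawman: a possible abstraction for a cached open file that a
// server-wide file cache could hand out, instead of caching
// RAFContainer objects directly.  Not existing Derby code.
public interface CachedOpenFile {

    /** Identity of the underlying file (e.g. the container key). */
    Object getIdentity();

    /** Read len bytes at the given file offset into buf. */
    void read(long offset, byte[] buf, int off, int len)
        throws java.io.IOException;

    /** Write len bytes from buf at the given file offset. */
    void write(long offset, byte[] buf, int off, int len)
        throws java.io.IOException;

    /**
     * True if some thread is currently doing I/O through this handle,
     * so the cache can hand out (or open) another handle for the same
     * file instead of blocking.
     */
    boolean isBusy();

    /** Close the underlying file descriptor when evicted. */
    void close() throws java.io.IOException;
}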


Øystein Grøvlen wrote:
>>>>>>"MM" == Mike Matrigali <mi...@sbcglobal.net> writes:
> 
> 
>     MM> it is ugly, but a way to store duplicate keys in a hash table is
>     MM> to store lists as the object in the hash table rather than the
>     MM> object itself.  For instance, in the store/lang interface where
>     MM> store returns a hash table with a requested result set, if duplicates
>     MM> are allowed it stores the object itself if there is one, or a list
>     MM> if there is more than one.  The ugly part is that it requires a
>     MM> class comparison to tell what kind of thing is being returned.
> 
> Storing a list of items per hash entry will also defeat the purpose of
> using the cache to limit the number of items to the size of the
> cache's hash table.
> 
>     MM> Having said that, I agree that a new implementation may be more
>     MM> appropriate, as the desire is to get the cache manager to track the
>     MM> LRU nature of the individual items in the duplicate list.  And it
>     MM> would be nice if it walked the list and asked the "is busy" question
>     MM> to each object - this may require a new interface to be implemented
>     MM> by each object to make it more generic.
> 
> I agree.
> 
>     MM> As I have said before, this cache code is relatively small, and creating
>     MM> a new implementation for open files seems like a good approach.
>     MM> At the same time I think it might make sense to have a server-wide
>     MM> service providing this open file cache, rather than hidden in the
>     MM> raw store implementation.  I think there is currently a problem with
>     MM> sort because it does not also go through an open file cache - and there
>     MM> are some security manager issues which may also be because there is
>     MM> not one path for doing I/O to files.
> 
> I agree that some general file cache that could be used by several
> mechanisms is to be preferred.  I also think this is the best way to
> guarantee a limit on the total number of open file descriptors.
> 
> What is not clear to me is whether the objects to be cached can be
> RAFContainer objects.  It seems to me that some of the code assumes
> that there is only a single RAFContainer object per container (e.g.,
> truncation and backup).  The serialization of accesses to a file that
> we are trying to avoid for read operations may be assumed by other
> operations.
> 


Re: new uses for basic services cache - looking for advice

Posted by Øystein Grøvlen <Oy...@Sun.COM>.
>>>>> "MM" == Mike Matrigali <mi...@sbcglobal.net> writes:

    MM> it is ugly, but a way to store duplicate keys in a hash table is
    MM> to store lists as the object in the hash table rather than the
    MM> object itself.  For instance, in the store/lang interface where
    MM> store returns a hash table with a requested result set, if duplicates
    MM> are allowed it stores the object itself if there is one, or a list
    MM> if there is more than one.  The ugly part is that it requires a
    MM> class comparison to tell what kind of thing is being returned.

Storing a list of items per hash entry will also defeat the purpose of
using the cache to limit the number of items to the size of the
cache's hash table.

    MM> Having said that, I agree that a new implementation may be more
    MM> appropriate, as the desire is to get the cache manager to track the
    MM> LRU nature of the individual items in the duplicate list.  And it
    MM> would be nice if it walked the list and asked the "is busy" question
    MM> to each object - this may require a new interface to be implemented
    MM> by each object to make it more generic.

I agree.

    MM> As I have said before, this cache code is relatively small, and creating
    MM> a new implementation for open files seems like a good approach.
    MM> At the same time I think it might make sense to have a server-wide
    MM> service providing this open file cache, rather than hidden in the
    MM> raw store implementation.  I think there is currently a problem with
    MM> sort because it does not also go through an open file cache - and there
    MM> are some security manager issues which may also be because there is
    MM> not one path for doing I/O to files.

I agree that some general file cache that could be used by several
mechanisms is to be preferred.  I also think this is the best way to
guarantee a limit on the total number of open file descriptors.

What is not clear to me is whether the objects to be cached can be
RAFContainer objects.  It seems to me that some of the code assumes
that there is only a single RAFContainer object per container (e.g.,
truncation and backup).  The serialization of accesses to a file that
we are trying to avoid for read operations may be assumed by other
operations.

-- 
Øystein


Re: new uses for basic services cache - looking for advice

Posted by Mike Matrigali <mi...@sbcglobal.net>.
it is ugly, but a way to store duplicate keys in a hash table is
to store lists as the object in the hash table rather than the
object itself.  For instance, in the store/lang interface where
store returns a hash table with a requested result set, if duplicates
are allowed it stores the object itself if there is one, or a list
if there is more than one.  The ugly part is that it requires a
class comparison to tell what kind of thing is being returned.
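
Roughly this pattern (a from-memory sketch of the idea, not the actual
store/lang code):

import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;

// Sketch of the "object or list" trick for duplicate keys in a
// Hashtable.  Not the actual store/lang code, just the pattern.
class DuplicateKeyTable {

    private final Hashtable table = new Hashtable();

    void put(Object key, Object value) {
        Object existing = table.get(key);
        if (existing == null) {
            // first value for this key: store the object itself
            table.put(key, value);
        } else if (existing instanceof List) {
            // already a list of duplicates: append to it
            ((List) existing).add(value);
        } else {
            // second value for this key: replace the object with a list
            List dups = new ArrayList();
            dups.add(existing);
            dups.add(value);
            table.put(key, dups);
        }
    }

    // The ugly part: callers must check the class of what comes back.
    List getAll(Object key) {
        Object stored = table.get(key);
        List result = new ArrayList();
        if (stored instanceof List) {
            result.addAll((List) stored);
        } else if (stored != null) {
            result.add(stored);
        }
        return result;
    }
}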

Having said that, I agree that a new implementation may be more
appropriate, as the desire is to get the cache manager to track the
LRU nature of the individual items in the duplicate list.  And it
would be nice if it walked the list and asked the "is busy" question
to each object - this may require a new interface to be implemented
by each object to make it more generic.
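
For example, something like this (the interface name is made up; no
such interface exists today):

import java.util.Iterator;
import java.util.List;

// Made-up interface for the "is busy" question the cache manager
// would ask each object in a duplicate list.
interface BusyCheckable {
    boolean isBusy();
}

class DuplicateListScan {

    // Walk the duplicates for one key and return the first object that
    // is not busy, or null if all of them are (in which case the
    // caller would add a new item rather than wait).
    static BusyCheckable findNotBusy(List duplicates) {
        for (Iterator it = duplicates.iterator(); it.hasNext(); ) {
            BusyCheckable item = (BusyCheckable) it.next();
            if (!item.isBusy()) {
                return item;
            }
        }
        return null;
    }
}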

As I have said before, this cache code is relatively small, and creating
a new implementation for open files seems like a good approach.
At the same time I think it might make sense to have a server-wide
service providing this open file cache, rather than hidden in the
raw store implementation.  I think there is currently a problem with
sort because it does not also go through an open file cache - and there
are some security manager issues which may also be because there is
not one path for doing I/O to files.

Øystein Grøvlen wrote:
>>>>>>"MM" == Mike Matrigali <mi...@sbcglobal.net> writes:
> 
> 
>     MM> I believe the store's file container cache would benefit from the following:
>     MM> o ability to insert duplicate key items into the cache
> 
> This will require a new CacheManager implementation since the current
> implementation (Clock) extends Hashtable, which cannot contain
> duplicate keys.  (Why is inheritance used here?  The code would be
> easier to reuse if Clock contained the Hashtable instead.)
> 
> Maybe one could have unique keys for each item, but add a method to
> CacheManager for finding items based on a partial key.  (There are
> already clean and discard methods that use a partial key.)  However,
> Hashtable does not support efficient lookup on a partial key, so I
> think we would need to come up with something else here.  A separate
> access method on the partial key (e.g., another hash table) is a
> possibility, but that would add some overhead on cache changes.
> 
>     MM> o ability to return one of the keys that is not busy, rather than
>     MM>    wait on one that is.
>     MM> o ability to add an item if all the matching keys are busy
> 
> Maybe these two bullets could be achieved by a method
> findUnused(partialKey) that returns an item that is not kept by
> others, and adds a new item if all are kept.
> 
>     MM> I think this is new to the  current cache. My question is how best to
>     MM> add such functionality.
> 
> 
>     MM> Is this something that can be handled by the current cache, with only
>     MM> changes to the objects being cached themselves?  I know the way the
>     MM> cache and lock manager is designed you can get a lot of different
>     MM> kinds of functionality by changing code in the cache interfaces that
>     MM> the objects implement, rather than in the cache itself.
> 
> It seems to me that you would not need to change the objects much.  It
> is the cache manager that needs to change.
> 
>     MM> Or is this a basic change (unique key vs. duplicate key) that will
>     MM> require changes to the cache manager?
> 
> Another issue one needs to consider is the cost of opening and closing
> file descriptors.  If that cost is significantly larger than the cost
> of a disk read/write, it will be important to avoid frequently closing
> and reopening file descriptors for the same file.  This will have an
> impact on whether the current clock algorithm is suitable for this
> type of caching.
>