You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by Øystein Grøvlen <Oy...@Sun.COM> on 2006/01/04 13:53:06 UTC
Re: new uses for basic services cache - looking for advice
>>>>> "MM" == Mike Matrigali <mi...@sbcglobal.net> writes:
MM> I believe the store's file container cache would benefit from the following:
MM> o ability to insert duplicate key items into the cache
This will require a new CacheManager implementation since the current
implementation (Clock) extends Hashtable which can not contain
duplicate keys. (Why is inheritance used here? The code would be
easier to reuse if Clock contained the Hashtable.)
Maybe one could have unique keys for each item, but add a method to
CacheManager for finding items based on a partial key. (There are
already clean and discard methods that uses partial key.) However,
Hashtable does not support efficient lookup on a partial key. So I
think we would need to come up with something else here. A separate
access method on the partial key (e.g., another hash table) is a
possibility, but that would give some overhead on cache changes.
MM> o ability to return one of the keys that is not busy, rather than
MM> wait on one that is.
MM> o ability to add an item if all the matching keys are busy
Maybe these to bullets could be achieved by a method
findUnused(partialKey) that returns an item that is not kept by
others, and adds a new item if all are kept.
MM> I think this is new to the current cache. My question is how best to
MM> add such functionality.
MM> Is this something that can be handled by the current cache, with only
MM> changes to the object's being cached themselves? I know the way the
MM> cache and lock manager is designed you can get a lot of different
MM> kinds of functionality by changing code in the cache interfaces that
MM> the objects interface rather than the cache.
It seems to me that you do not change the objects much. It is the
cache manager that needs to change.
MM> Or is this a basic change (unique key vs. duplicate key), that will
MM> require changes to cache manager?
Another issue one needs to consider is the cost of opening and closing
file descriptors. If the cost is significantly larger than disk
read/write, it will be important to avoid frequent closing and opening
of file descriptors for the same file. This will have impact on
whether the current clock algorithm is suitable for this type of
caching.
--
Øystein
Re: new uses for basic services cache - looking for advice
Posted by Mike Matrigali <mi...@sbcglobal.net>.
In a new generalized strategy it would not be RAFContainer objects, it
would be some new abstraction for an open file. Maybe something that
ties in closer with the relatively newer io abstraction provided by
opensource/java/engine/org/apache/derby/io
As you point out, such a project would have to consider current
assumptions on number of outstanding objects.
Øystein Grøvlen wrote:
>>>>>>"MM" == Mike Matrigali <mi...@sbcglobal.net> writes:
>
>
> MM> it is ugly, but a way to store duplicate keys in a hash table is
> MM> to store lists as the object in the hash table rather than the
> MM> object itself. For instance in the store/lang interface where
> MM> store returns a hash table with a requested result set, if duplicates
> MM> are allowed it stores the object if there is 1 or a list if there
> MM> is more than one. The ugly part is that it requires a class comparison
> MM> to tell what kind of thing is being returned.
>
> Storing a list of items per hash entry will also beat the purpose of
> using the cache to limit the number of items to the size of the
> cache's hash table.
>
> MM> Having said that I agree that a new implementation may be more
> MM> appropriate, as the desire is to get the cache manager to track the
> MM> LRU nature of the individual items in the duplicate list. And it
> MM> would be nice if it walked the list and asked the "is busy" question
> MM> to each object - this may require a new interface to be implemented
> MM> by each object to make it more generic.
>
> I agree.
>
> MM> As I have said before this cache code is relatively small, and creating
> MM> a new implementation for open files seems like a good approach.
> MM> At the same time I think it might make sense to have a server wide
> MM> service providing this open file cache, rather than hidden in the
> MM> raw store implementation. I think there is currently a problem with
> MM> sort because it does not also go through a open file cache - and there
> MM> are some security manager issues which also may be because there is
> MM> not one path to do I/O to files.
>
> I agree that some general file cache that could be used by several
> mechanism it to be preferred. I also think this is the best way to
> guarantee a limit on the total number of open file descriptors.
>
> What is not clear to me, is whether the objects to be cached can be
> RAFContainer objects. It seems to be that some of the code assumes
> that there is only a single RAFContainer object per container (e.g,
> truncation and backup). The serialization of accesses to a file that
> we are trying to avoid for read operations, may be assumed by other
> operations.
>
Re: new uses for basic services cache - looking for advice
Posted by Øystein Grøvlen <Oy...@Sun.COM>.
>>>>> "MM" == Mike Matrigali <mi...@sbcglobal.net> writes:
MM> it is ugly, but a way to store duplicate keys in a hash table is
MM> to store lists as the object in the hash table rather than the
MM> object itself. For instance in the store/lang interface where
MM> store returns a hash table with a requested result set, if duplicates
MM> are allowed it stores the object if there is 1 or a list if there
MM> is more than one. The ugly part is that it requires a class comparison
MM> to tell what kind of thing is being returned.
Storing a list of items per hash entry will also beat the purpose of
using the cache to limit the number of items to the size of the
cache's hash table.
MM> Having said that I agree that a new implementation may be more
MM> appropriate, as the desire is to get the cache manager to track the
MM> LRU nature of the individual items in the duplicate list. And it
MM> would be nice if it walked the list and asked the "is busy" question
MM> to each object - this may require a new interface to be implemented
MM> by each object to make it more generic.
I agree.
MM> As I have said before this cache code is relatively small, and creating
MM> a new implementation for open files seems like a good approach.
MM> At the same time I think it might make sense to have a server wide
MM> service providing this open file cache, rather than hidden in the
MM> raw store implementation. I think there is currently a problem with
MM> sort because it does not also go through a open file cache - and there
MM> are some security manager issues which also may be because there is
MM> not one path to do I/O to files.
I agree that some general file cache that could be used by several
mechanism it to be preferred. I also think this is the best way to
guarantee a limit on the total number of open file descriptors.
What is not clear to me, is whether the objects to be cached can be
RAFContainer objects. It seems to be that some of the code assumes
that there is only a single RAFContainer object per container (e.g,
truncation and backup). The serialization of accesses to a file that
we are trying to avoid for read operations, may be assumed by other
operations.
--
Øystein
Re: new uses for basic services cache - looking for advice
Posted by Mike Matrigali <mi...@sbcglobal.net>.
it is ugly, but a way to store duplicate keys in a hash table is
to store lists as the object in the hash table rather than the
object itself. For instance in the store/lang interface where
store returns a hash table with a requested result set, if duplicates
are allowed it stores the object if there is 1 or a list if there
is more than one. The ugly part is that it requires a class comparison
to tell what kind of thing is being returned.
Having said that I agree that a new implementation may be more
appropriate, as the desire is to get the cache manager to track the
LRU nature of the individual items in the duplicate list. And it
would be nice if it walked the list and asked the "is busy" question
to each object - this may require a new interface to be implemented
by each object to make it more generic.
As I have said before this cache code is relatively small, and creating
a new implementation for open files seems like a good approach.
At the same time I think it might make sense to have a server wide
service providing this open file cache, rather than hidden in the
raw store implementation. I think there is currently a problem with
sort because it does not also go through a open file cache - and there
are some security manager issues which also may be because there is
not one path to do I/O to files.
Øystein Grøvlen wrote:
>>>>>>"MM" == Mike Matrigali <mi...@sbcglobal.net> writes:
>
>
> MM> I believe the store's file container cache would benefit from the following:
> MM> o ability to insert duplicate key items into the cache
>
> This will require a new CacheManager implementation since the current
> implementation (Clock) extends Hashtable which can not contain
> duplicate keys. (Why is inheritance used here? The code would be
> easier to reuse if Clock contained the Hashtable.)
>
> Maybe one could have unique keys for each item, but add a method to
> CacheManager for finding items based on a partial key. (There are
> already clean and discard methods that uses partial key.) However,
> Hashtable does not support efficient lookup on a partial key. So I
> think we would need to come up with something else here. A separate
> access method on the partial key (e.g., another hash table) is a
> possibility, but that would give some overhead on cache changes.
>
> MM> o ability to return one of the keys that is not busy, rather than
> MM> wait on one that is.
> MM> o ability to add an item if all the matching keys are busy
>
> Maybe these to bullets could be achieved by a method
> findUnused(partialKey) that returns an item that is not kept by
> others, and adds a new item if all are kept.
>
> MM> I think this is new to the current cache. My question is how best to
> MM> add such functionality.
>
>
> MM> Is this something that can be handled by the current cache, with only
> MM> changes to the object's being cached themselves? I know the way the
> MM> cache and lock manager is designed you can get a lot of different
> MM> kinds of functionality by changing code in the cache interfaces that
> MM> the objects interface rather than the cache.
>
> It seems to me that you do not change the objects much. It is the
> cache manager that needs to change.
>
> MM> Or is this a basic change (unique key vs. duplicate key), that will
> MM> require changes to cache manager?
>
> Another issue one needs to consider is the cost of opening and closing
> file descriptors. If the cost is significantly larger than disk
> read/write, it will be important to avoid frequent closing and opening
> of file descriptors for the same file. This will have impact on
> whether the current clock algorithm is suitable for this type of
> caching.
>