Posted to dev@chemistry.apache.org by Florian Müller <fl...@alfresco.com> on 2010/11/15 21:17:11 UTC

getObjectByPath cache problem

Hi all,

I had another look at [1]. Unfortunately, it's an unsolvable problem.
I can think of three ways to cope with it:

1. We leave it like it is, although it is very, very confusing when you run into this situation. 

2. We don't cache by path. How would that affect applications?

3. When getObjectByPath is called we fetch the object id from the repository and then get the object from the cache.
   In the worst case, we would have to hit the repository twice.
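
For illustration, option 3 might look roughly like the sketch below. It is a self-contained toy model, not OpenCMIS code: Repository, resolveId(), load() and RevalidatingSession are made-up names, and the round-trip counter exists only to show the cost.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for a CMIS repository; all names here are hypothetical. */
final class Repository {
    final Map<String, String> idByPath = new HashMap<>();
    final Map<String, String> contentById = new HashMap<>();
    int roundTrips = 0;

    /** Cheap call: resolve a path to the id of the object currently filed there. */
    String resolveId(String path) {
        roundTrips++;
        String id = idByPath.get(path);
        if (id == null) {
            throw new IllegalStateException("object not found: " + path);
        }
        return id;
    }

    /** Expensive call: load the full object by id. */
    String load(String id) {
        roundTrips++;
        return contentById.get(id);
    }
}

/** Option 3: always ask the repository for the id, then serve the object from the id cache. */
final class RevalidatingSession {
    private final Repository repo;
    private final Map<String, String> objectsById = new HashMap<>();

    RevalidatingSession(Repository repo) {
        this.repo = repo;
    }

    String getObjectByPath(String path) {
        // First round trip, on every call: which id is at this path right now?
        String id = repo.resolveId(path);
        // Second round trip only if the object is not in the cache yet.
        return objectsById.computeIfAbsent(id, repo::load);
    }
}
```

With a warm cache each call costs one round trip for the id; a cache miss costs two, which is the worst case mentioned above.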


Any opinions?

- Florian


[1] https://issues.apache.org/jira/browse/CMIS-260

Re: getObjectByPath cache problem

Posted by Florian Müller <fl...@alfresco.com>.
OK. That probably makes more sense...

Florian

On 16/11/2010 14:52, Florent Guillaume wrote:
> I was thinking of a session-level flag to deactivate the path cache.
>
> Florent


Re: getObjectByPath cache problem

Posted by Florent Guillaume <fg...@nuxeo.com>.
I was thinking of a session-level flag to deactivate the path cache.

Florent




-- 
Florent Guillaume, Director of R&D, Nuxeo
Open Source, Java EE based, Enterprise Content Management (ECM)
http://www.nuxeo.com   http://www.nuxeo.org   +33 1 40 33 79 87

RE: getObjectByPath cache problem

Posted by Jens Hübel <jh...@opentext.com>.
Sounds good to me...

Jens



Re: getObjectByPath cache problem

Posted by Florian Müller <fl...@alfresco.com>.
Hi Florent, hi Jens,

Would that be a fair summary of your posts:

- We keep the path cache.

- We add a revalidation flag
  (to the OperationContext?).

- We add an expiration time to the path-to-id mapping
   (controlled by a session parameter?).

- I would like to add:
   We add a general expiration time to objects in the cache
   (controlled by a session parameter?).

If we agree on that I would rework the cache implementation.


@Florent: You can already deactivate caching for a getObjectByPath() 
call by providing an OperationContext that has the "cache enabled" flag 
set to false. That is not obvious, though.

@Jens: Whenever you call refresh() and the object is gone or you lost 
the permission to see it, it will throw an exception. We can't change 
the fact that the cache might return stale objects. Even if you load a 
fresh object from the repository, it might have been changed on the 
server a second later ... and we don't know that until we load it 
again or try to change it.


Cheers,

Florian





RE: getObjectByPath cache problem

Posted by Jens Hübel <jh...@opentext.com>.
Hi Florian,

Yes, I agree with all your thoughts, but my point was that the case where the object changes but the path stays stable may be weirder than some of the others you mention. As those are not OpenCMIS-specific, I am not sure whether we should raise some of your thoughts with the OASIS TC...? In addition, the change token is opaque to the client and maintained by the server. This means that the client cannot make any assumptions about its meaning. My thought was that reducing this to a check for equality might help in some cases.

But some of your thoughts go beyond the "path is no stable id" problem and are inherent in caching in general. What if I get an object by id, cache it, and the ACL changes so that I no longer have access? There is some risk that a property has changed. What happens if the object gets deleted on the server?

At the very least, we need to carefully document the behavior of our client lib here.

One pragmatic solution would be to cache every object with a timestamp recording when it got cached. When the object is accessed from the cache, we might apply a (configurable) timeout. If the timeout is exceeded, we always refresh the object from the server. This would at least guarantee that a stale object only lives for a certain period of time (let's say 30 minutes or so). For highly sensitive scenarios this timeout might be reduced or set to zero. The timestamp should be associated with the key (e.g. path) and not the value of the cache (object).
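
A minimal sketch of the timestamp idea, assuming the timestamp lives on the path-to-id entry as suggested. ExpiringPathCache and its methods are hypothetical names, not part of OpenCMIS, and the clock is passed in only to keep the example testable.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Path-to-id cache whose entries expire; class and method names are hypothetical. */
final class ExpiringPathCache {
    private static final class Entry {
        final String objectId;
        final long cachedAtMillis;

        Entry(String objectId, long cachedAtMillis) {
            this.objectId = objectId;
            this.cachedAtMillis = cachedAtMillis;
        }
    }

    private final Map<String, Entry> entriesByPath = new HashMap<>();
    private final long timeoutMillis;
    private final LongSupplier clock;

    ExpiringPathCache(long timeoutMillis, LongSupplier clock) {
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
    }

    /** Records the mapping and stamps the key with the current time. */
    void put(String path, String objectId) {
        entriesByPath.put(path, new Entry(objectId, clock.getAsLong()));
    }

    /** Returns the cached id, or null if the path is unknown or its entry has expired. */
    String getId(String path) {
        Entry entry = entriesByPath.get(path);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() - entry.cachedAtMillis >= timeoutMillis) {
            entriesByPath.remove(path); // stale: the caller has to go back to the server
            return null;
        }
        return entry.objectId;
    }
}
```

A 30-minute cache would be created with new ExpiringPathCache(30 * 60 * 1000L, System::currentTimeMillis). A timeout of zero makes every entry expire immediately, which covers the highly sensitive case.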

I fear there won't be a perfect solution, as you already said...

Jens



-----Original Message-----
From: Florian Müller [mailto:florian.mueller@alfresco.com] 
Sent: Dienstag, 16. November 2010 11:40
To: chemistry-dev@incubator.apache.org
Subject: Re: getObjectByPath cache problem

The change token can only be used to detect changes within an object. 
The problem here is that we are potentially dealing with two objects. 
The root of the problem is that a path is not a stable key for an object.

An object can be updated in the repository without our knowledge. The 
cache would then return an outdated object and everybody should be 
prepared for that. refresh() reloads the current state from the 
repository. That works fine if the object is retrieved through 
getObject(). You can move the object around, unfile it, put it in 
multiple folders and it still works. The object id is unambiguous.

The cache currently maps object paths to object ids. When you call 
getObjectByPath() it will look up the id for this path and get the 
object from the cache. If you move the object to a different folder, 
getObjectByPath() shouldn't find it anymore. The path of the object has 
changed and the old path is now invalid. Note that the object hasn't 
changed and therefore the change token hasn't either.
Since there is no notification from the repository, the path-to-id 
mapping can't be corrected. The cache still thinks the object is 
accessible through this path. So getObjectByPath() returns the object 
although it should throw a CmisObjectNotFoundException.

Let's assume we create a new object in the place where the old object 
was. The new object can now be accessed with the old path. Since the 
outdated path-to-id mapping is still in place, getObjectByPath() returns 
the old object and not the new one -- which is clearly wrong.
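
The scenario above can be reproduced with a toy model. The class below is not the OpenCMIS cache; StalePathDemo and its two maps are invented for illustration, with one map standing in for the repository and the other for the client-side path-to-id mapping.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the stale path-to-id mapping; this is not the OpenCMIS cache. */
final class StalePathDemo {
    // "Repository" state: which object id is currently filed at which path.
    final Map<String, String> repositoryIdByPath = new HashMap<>();
    // Client-side mapping: never invalidated, because the repository sends no notifications.
    final Map<String, String> cachedIdByPath = new HashMap<>();

    /** Mimics a cached getObjectByPath(): trusts the cached mapping if there is one. */
    String getObjectIdByPath(String path) {
        String cached = cachedIdByPath.get(path);
        if (cached != null) {
            return cached; // may be stale
        }
        String id = repositoryIdByPath.get(path);
        if (id == null) {
            throw new IllegalStateException("object not found: " + path);
        }
        cachedIdByPath.put(path, id);
        return id;
    }

    public static void main(String[] args) {
        StalePathDemo demo = new StalePathDemo();
        demo.repositoryIdByPath.put("/docs/report", "id-old");
        System.out.println(demo.getObjectIdByPath("/docs/report")); // id-old, now cached

        // Someone moves the old object away and creates a new one at the same path.
        demo.repositoryIdByPath.put("/docs/old/report", "id-old");
        demo.repositoryIdByPath.put("/docs/report", "id-new");

        System.out.println(demo.getObjectIdByPath("/docs/report")); // still id-old
    }
}
```

The second lookup still answers with the old id because nothing ever tells the client that the mapping changed.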

The problem that we are facing here is that there is no reliable way to 
keep the path-to-id mapping up-to-date. If we want to be correct, we 
would have to ask the repository for the current id for the given path 
every time getObjectByPath() is called (3) -- or not use the cache at 
all (2).


- Florian




On 16/11/2010 08:23, Jens Hübel wrote:
> Shouldn't the change token solve that? How do we deal with the change token for other objects that are in the local cache? Ignore? Check on each access? Configurable?
>
> Jens


Re: getObjectByPath cache problem

Posted by Florent Guillaume <fg...@nuxeo.com>.
My suggestion would be to keep caching by path, but have a session
flag to deactivate it if the client wants to. And maybe another flag
to "always revalidate" and make the second round trip to the server to
be sure the object ID hasn't changed.
I know it's a non-solution but at least the choices are offered to the
developer and we don't set things in stone :)
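
To make the proposal concrete, here is a rough sketch of how such a session-level setting could behave. Everything in it is an assumption for illustration: PathCacheMode, ModalSession and the two lookup functions are invented and do not exist in OpenCMIS.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Session-wide choices for the path cache; the enum and its constants are hypothetical. */
enum PathCacheMode { CACHE, ALWAYS_REVALIDATE, DISABLED }

/** Toy session that resolves a path to object content according to the chosen mode. */
final class ModalSession {
    private final PathCacheMode mode;
    private final UnaryOperator<String> lookupId;   // path -> current id (one round trip)
    private final UnaryOperator<String> loadObject; // id -> object content (one round trip)
    private final Map<String, String> idByPath = new HashMap<>();
    private final Map<String, String> objectById = new HashMap<>();

    ModalSession(PathCacheMode mode, UnaryOperator<String> lookupId, UnaryOperator<String> loadObject) {
        this.mode = mode;
        this.lookupId = lookupId;
        this.loadObject = loadObject;
    }

    String getObjectByPath(String path) {
        switch (mode) {
            case CACHE: {
                // Fast, but the path-to-id mapping can be stale.
                String id = idByPath.computeIfAbsent(path, lookupId);
                return objectById.computeIfAbsent(id, loadObject);
            }
            case ALWAYS_REVALIDATE: {
                // One extra round trip per call to confirm the id, then use the object cache.
                String id = lookupId.apply(path);
                return objectById.computeIfAbsent(id, loadObject);
            }
            default:
                // DISABLED: bypass both caches for path lookups.
                return loadObject.apply(lookupId.apply(path));
        }
    }
}
```

The default could stay CACHE so existing applications see no change, while clients that care about correctness more than round trips pick one of the other two modes.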

Florent

On Tue, Nov 16, 2010 at 11:40 AM, Florian Müller
<fl...@alfresco.com> wrote:
> The change token can only be used to detect changes within an object. The
> problem here is that we are potentially dealing with two objects. The root
> of the problem is that a path is not a stable key for an object.
>
> An object can be updated in the repository without our knowledge. The cache
> would then return an outdated object and everybody should be prepared for
> that. refresh() reloads the current state from the repository. That works
> fine if the object is retrieved through getObject(). You can move the object
> around, unfile it, put it in multiple folders and it still works. The object
> id is unambiguous.
>
> The cache currently maps object paths to object ids. When you call
> getObjectByPath() it will look up the id for this path and get the object
> from the cache. If you move the object to a different folder,
> getObjectByPath() shouldn't find it anymore. The path of the object has
> changed and the old path is now invalid. Note that the object hasn't changed
> and therefore the change token hasn't either.
> Since there is no notification from the repository, the path-to-id mapping
> can't be corrected. The cache still thinks the object is accessible through
> this path. So getObjectByPath() returns the object although it should throw
> a CmisObjectNotFound exception.
>
> Let's assume we create a new object in the place where the old object was.
> The new object can now be accessed with the old path. Since the outdated
> path-to-id mapping is still in place, getObjectByPath() returns the old
> object and not the new one -- which is clearly wrong.
>
> The problem that we are facing here is that there is no reliable way to keep
> the path-to-id mapping up-to-date. If we want to be correct, we would have
> to ask the repository for the current id for the given path every time
> getObjectByPath() is called (3) -- or not use the cache at all (2).
>
>
> - Florian
>
>
>
>
> On 16/11/2010 08:23, Jens Hübel wrote:
>>
>> Shouldn't the change token solve that? How do we deal with the change
>> token for other objects that are in the local cache? Ignore? Check on each
>> access? Configurable?
>>
>> Jens
>>
>>
>> -----Original Message-----
>> From: Florian Müller [mailto:florian.mueller@alfresco.com]
>> Sent: Monday, 15 November 2010 21:17
>> To: chemistry-dev@incubator.apache.org
>> Subject: getObjectByPath cache problem
>>
>> Hi all,
>>
>> I had another look at [1]. Unfortunately, it's an unsolvable problem.
>> I can think of three ways to cope with it:
>>
>> 1. We leave it like it is, although it is very, very confusing when you
>> run into this situation.
>>
>> 2. We don't cache by path. How would that affect applications?
>>
>> 3. When getObjectByPath is called we fetch the object id from the
>> repository and then get the object from the cache.
>>    In the worst case, we would have to hit the repository twice.
>>
>>
>> Any opinions?
>>
>> - Florian
>>
>>
>> [1] https://issues.apache.org/jira/browse/CMIS-260
>
>



-- 
Florent Guillaume, Director of R&D, Nuxeo
Open Source, Java EE based, Enterprise Content Management (ECM)
http://www.nuxeo.com   http://www.nuxeo.org   +33 1 40 33 79 87

Re: getObjectByPath cache problem

Posted by Florian Müller <fl...@alfresco.com>.
The change token can only be used to detect changes within an object. 
The problem here is that we are potentially dealing with two objects. 
The root of the problem is that a path is not a stable key for an object.

An object can be updated in the repository without our knowledge. The 
cache would then return an outdated object and everybody should be 
prepared for that. refresh() reloads the current state from the 
repository. That works fine if the object is retrieved through 
getObject(). You can move the object around, unfile it, put it in 
multiple folders and it still works. The object id is unambiguous.

The cache currently maps object paths to object ids. When you call 
getObjectByPath() it will look up the id for this path and get the 
object from the cache. If you move the object to a different folder, 
getObjectByPath() shouldn't find it anymore. The path of the object has 
changed and the old path is now invalid. Note that the object hasn't 
changed and therefore the change token hasn't either.
Since there is no notification from the repository, the path-to-id 
mapping can't be corrected. The cache still thinks the object is 
accessible through this path. So getObjectByPath() returns the object 
although it should throw a CmisObjectNotFound exception.

Let's assume we create a new object in the place where the old object 
was. The new object can now be accessed with the old path. Since the 
outdated path-to-id mapping is still in place, getObjectByPath() returns 
the old object and not the new one -- which is clearly wrong.
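
The scenario can be reproduced with a small toy model. This is only a
sketch and does not use the OpenCMIS classes: ToyRepository,
ToyPathCache, StalePathDemo and the paths and ids are all made up for
illustration. The cache is never told about moves, just as there is no
notification from a real repository.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for a repository: the authoritative path-to-id mapping. */
class ToyRepository {
    private final Map<String, String> idByPath = new HashMap<>();

    void create(String path, String id) {
        idByPath.put(path, id);
    }

    void move(String fromPath, String toPath) {
        String id = idByPath.remove(fromPath);
        if (id == null) {
            throw new IllegalStateException("nothing filed at " + fromPath);
        }
        idByPath.put(toPath, id);
    }

    /** Returns the id currently filed at the path, or null if there is none. */
    String resolve(String path) {
        return idByPath.get(path);
    }
}

/** Toy client-side cache that remembers path-to-id mappings indefinitely. */
class ToyPathCache {
    private final ToyRepository repo;
    private final Map<String, String> cachedIdByPath = new HashMap<>();

    ToyPathCache(ToyRepository repo) {
        this.repo = repo;
    }

    String getIdByPath(String path) {
        // A cached mapping is served without asking the repository again.
        return cachedIdByPath.computeIfAbsent(path, repo::resolve);
    }
}

class StalePathDemo {
    /** Returns { first lookup, lookup after move, lookup after create, repository truth }. */
    static String[] run() {
        ToyRepository repo = new ToyRepository();
        ToyPathCache cache = new ToyPathCache(repo);

        repo.create("/docs/report", "id-old");
        String first = cache.getIdByPath("/docs/report");       // fills the cache

        repo.move("/docs/report", "/archive/report");
        String afterMove = cache.getIdByPath("/docs/report");   // should be "not found"

        repo.create("/docs/report", "id-new");
        String afterCreate = cache.getIdByPath("/docs/report"); // should be the new object

        return new String[] { first, afterMove, afterCreate, repo.resolve("/docs/report") };
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", run()));
    }
}
```

In this model all three cache lookups return the old id, while the
repository itself reports the new id for the same path.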

The problem that we are facing here is that there is no reliable way to 
keep the path-to-id mapping up-to-date. If we want to be correct, we 
would have to ask the repository for the current id for the given path 
every time getObjectByPath() is called (3) -- or not use the cache at 
all (2).


- Florian




On 16/11/2010 08:23, Jens Hübel wrote:
> Shouldn't the change token solve that? How do we deal with the change token for other objects that are in the local cache? Ignore? Check on each access? Configurable?
>
> Jens
>
>
> -----Original Message-----
> From: Florian Müller [mailto:florian.mueller@alfresco.com]
> Sent: Monday, 15 November 2010 21:17
> To: chemistry-dev@incubator.apache.org
> Subject: getObjectByPath cache problem
>
> Hi all,
>
> I had another look at [1]. Unfortunately, it's an unsolvable problem.
> I can think of three ways to cope with it:
>
> 1. We leave it like it is, although it is very, very confusing when you run into this situation.
>
> 2. We don't cache by path. How would that affect applications?
>
> 3. When getObjectByPath is called we fetch the object id from the repository and then get the object from the cache.
>     In the worst case, we would have to hit the repository twice.
>
>
> Any opinions?
>
> - Florian
>
>
> [1] https://issues.apache.org/jira/browse/CMIS-260


RE: getObjectByPath cache problem

Posted by Jens Hübel <jh...@opentext.com>.
Shouldn't the change token solve that? How do we deal with the change token for other objects that are in the local cache? Ignore? Check on each access? Configurable?

Jens


-----Original Message-----
From: Florian Müller [mailto:florian.mueller@alfresco.com] 
Sent: Monday, 15 November 2010 21:17
To: chemistry-dev@incubator.apache.org
Subject: getObjectByPath cache problem

Hi all,

I had another look at [1]. Unfortunately, it's an unsolvable problem.
I can think of three ways to cope with it:

1. We leave it like it is, although it is very, very confusing when you run into this situation. 

2. We don't cache by path. How would that affect applications?

3. When getObjectByPath is called we fetch the object id from the repository and then get the object from the cache.
   In the worst case, we would have to hit the repository twice.


Any opinions?

- Florian


[1] https://issues.apache.org/jira/browse/CMIS-260