You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sling.apache.org by Jörg Hoh <jh...@googlemail.com.INVALID> on 2021/10/11 14:18:33 UTC

Cache objects retrieved/derived from a ResourceResolver

Hi,

I am investigating a reasonable way to cache objects which have the same
livecycle as a resource resolver, because they are retrieved or derived
from it.
For example this could be the result of query, which is used at multiple
locations within my application, all in the context of the same request /
resource resolver. Right now my only way is to store them in request
attributes, but that does not feel right, because it rather relates to the
resource resolver (and not the request itself), and it requires me to pass
always the request along although I just need the resource resolver. Also
in some situations there is alrady existing code, which just passes along
the resource resolver and no request (maybe the operation does not even
work in the context of a request).

I checked the API of the ResourceResolver, and it comes with
getAttributeNames() and getAttribute(), but these properties are populated
during creation and cannot be modified afterwards. So not helpful here.

I envision to have a WeakHashMap<String,Object>, which can be reached via a
new API call "getTemporaryStorage()" (I am very open for a better name),
which we could implement as default method right in the interface.

WDYT?



-- 
Cheers,
Jörg Hoh,

https://cqdump.joerghoh.de
Twitter: @joerghoh

Re: Cache objects retrieved/derived from a ResourceResolver

Posted by Paul Bjorkstrand <pa...@gmail.com>.
On Sat, Oct 16, 2021 at 9:42 AM Jörg Hoh <jh...@googlemail.com.invalid>
wrote:

> Hi Carsten,
>
> Am Sa., 16. Okt. 2021 um 09:41 Uhr schrieb Carsten Ziegeler <
> cziegeler@apache.org>:
>
> > I don't think that the RR is the right place for this.
> >
> > If the use case is that during a request the exact same query is
> > executed more than once, then caching the result of the query in Sling
> > is probably not resulting in the wanted gain. I think it is much better
> > to avoid executing the query multiple times in the first place -
> >
>
> That's what I want to achieve. But where should I store the result of the
> query so I don't need to execute the query multiple times? As said, I don't
> have access to a store, which allows me to store this result within a
> well-defined scope (which is identical to the lifetime of a resource
> resolver). And I don't want to roll such a logic in my own, because it's
> definitely not easy to get that right.
>
>
Is it safe to assume that there are no concurrency concerns? If there are
no concerns, a field in your adapter object (that you get from
rr.adaptTo()) can store the result of the query. You can do a null check,
and only perform the query if it is null. That is pretty standard logic in
a lot of non-concurrent applications:

<QueryResult> queryResult;
...

<QueryResult> getResult()() {
  if (queryResult == null) {
    queryResult = // some expensive query
  }

  return queryResult;
}

<SomeOtherType> processForResource(Resource r) {
  QueryResult result= getResult();
  // do your logic with the query result & resource
  // possibly storing intermediate results in other fields (maybe protected
with similar null-checks)

  return // ... the result of your computation
}

For the lifetime & scope problem, you get the benefits of the adapter cache
as part of the adaptTo logic.

If concurrency is a concern, then you would need to do some sort of
synchronization to ensure only one thread is performing the query. Given
that this is per-request, it sounds like you will not need synchronization.


> > executing the query is one part but then processing the result is
> > another. And you want to avoid this as well.
> >
>
> In every invocation of my method a different resource is passed, and I need
> to evaluate the result of that query for this resource. To be exact, I
> transform the result of the query into other domain objects, and to
> optimize the performance of this evaluation, I might even store these
> domain objects in a different structure than just a list.
>
>
If concurrency is not a problem, then you can do the same type of "caching"
in the adapter object.


> >
> > In the past 12 years or so we tried various POCs with caching in the RR.
> > Always under the (wrong) assumption that Oak/Jackrabbit is the
> > bottleneck. So we added a RR caching all get resources, all list
> > resources, all queries etc. And it turned out that the gain was close to
> > zero. Each operation in Oak was pretty fast, but if you execute it
> > hundreds of times during a request, even a fast call becomes the
> > bottleneck. So again, usually the problem is in the upper layers, above
> > Sling's resource api.
> >
>
> I don't want to create a transparent caching layer on a RR level to store
> objects which have been requested from the repository. I know that this was
> tried in the past, and did not provide the benefits we intended it to have.
> I just want to have a simple store (map) on the RR, where I can store all
> types of objects, which share the same lifecycle as the resource resolver.
>
> This is handled explicitly, and the developer is responsible to make sure,
> that indeed the lifecycles of the resource resolver and the object which
> are stored in this temporary storage are compatible. If the developer
> decides to store resources in there, and these resources might change over
> the lifetime of this (potentially long-living) resource resolver, so be it.
> There should not be any magic, it's just a simple map.
>
> Jörg
>
> --
> Cheers,
> Jörg Hoh,
>
> https://cqdump.joerghoh.de
> Twitter: @joerghoh
>

-Paul

Re: Cache objects retrieved/derived from a ResourceResolver

Posted by Carsten Ziegeler <cz...@apache.org>.
Agreed, such an approach is definitely easier to use compared to all the 
other options.

I think supporting Closeable is a nice feature which adds some 
additional value.

If we just go with that single method addition, I'm ok with adding this 
to the api. Returning a mutable map is way better than having the 
set/get/remove methods as this also allows for computeIf etc

Regards
Carsten

Am 25.10.2021 um 11:38 schrieb Jörg Hoh:
> I thought about using adapTo() and the cache integrated in there.
> Technically it could work, but it would require me to write a
> AdapterFactory plus a matching "target" class, to which I can adapt to.
> That's a lot of development overhead compared to a simple
> 
> resourceResolver.getPropertyMap().put("key", theObjectIWantToStore);
> 
> especially if this this class and the AdapterFactory are only used at this
> place. If this would be a recurring pattern, I would make this much more
> explicit. But I worked around this missing already a few times, and I never
> found a pattern (besides that having such a map on the resource resolver
> would have made it much easier.
> 
> Also when we think about resource handling, we could enhance the
> implementation, that at the logout() of a ResourceResolver all closeable
> objects stored in the map are closed, and that the map is cleared. So the
> objects in this map are closely bound to the lifetime of the resource
> resolver, making both accidental and intentional misuse harder.
> 
> WDYT?
> Jörg
> 
> 
> 
> Am Do., 21. Okt. 2021 um 20:56 Uhr schrieb Julian Sedding <
> jsedding@gmail.com>:
> 
>> Hi Jörg
>>
>> You could wrap your ResourceResolver using a custom
>> ResourceResolverWrapper that could hold your state. IIRC Sling's
>> ResourceResolverWrapper does a "deep wrap", i.e. resources obtained
>> from it are also wrapped and return the wrapped RR via their
>> Resource#getResourceResolver() method.
>>
>> Inside your custom wrapper you can store whatever state you want, and
>> you may clean it up once ResourceResolver#close() is called.
>>
>> Of course this requires you to be able to wrap the RR before calling
>> the API in question.
>>
>> Regards
>> Julian
>>
>> On Mon, Oct 18, 2021 at 2:09 PM Paul Bjorkstrand
>> <pa...@gmail.com> wrote:
>>>
>>> My point stands, there is no API provided. The thread local cleaner that
>> is
>>> used, while used commonly in many applications (I am 99% sure I saw that
>>> exact code or very similar in a SO while looking for such an API), is
>> still
>>> dependent on internal implementation details of thread locals
>>> (non-public/inaccessible fields). This is no different than depending on
>>> Unsafe. As you mentioned, it is much harder in later versions of Java
>>> because implementation details are much more encapsulated.
>>>
>>> -Paul
>>>
>>>
>>> On Mon, Oct 18, 2021 at 4:24 AM Robert Munteanu <ro...@apache.org>
>> wrote:
>>>
>>>> Hi,
>>>>
>>>> On Sun, 2021-10-17 at 14:26 -0500, Paul Bjorkstrand wrote:
>>>>> As far as I understand, thread locals' storage is an implementation
>>>>> detail
>>>>> in the JVM. There isn't an API to clear all/arbitrary thread locals.
>>>>> Thread
>>>>> pools, or applications that use thread pools, need to provide hooks
>> to
>>>>> do
>>>>> that, so that code can do its own cleanup. This is exactly what the
>>>>> request listener that Carsten mentioned does.
>>>>
>>>> Sling's thread pool actually cleans up thread locals after execution
>>>> [1]. It's not yet fixed for Java 17 though [2].
>>>>
>>>> [1]:
>>>>
>>>>
>> https://github.com/apache/sling-org-apache-sling-commons-threads/blob/org.apache.sling.commons.threads-3.2.22/src/main/java/org/apache/sling/commons/threads/impl/ThreadPoolExecutorCleaningThreadLocals.java
>>>> [2]: https://issues.apache.org/jira/browse/SLING-10831
>>>>
>>>> Thanks,
>>>> Robert
>>>>
>>
> 
> 

-- 
Carsten Ziegeler
Adobe
cziegeler@apache.org

Re: Cache objects retrieved/derived from a ResourceResolver

Posted by Jörg Hoh <jh...@googlemail.com.INVALID>.
I thought about using adapTo() and the cache integrated in there.
Technically it could work, but it would require me to write a
AdapterFactory plus a matching "target" class, to which I can adapt to.
That's a lot of development overhead compared to a simple

resourceResolver.getPropertyMap().put("key", theObjectIWantToStore);

especially if this this class and the AdapterFactory are only used at this
place. If this would be a recurring pattern, I would make this much more
explicit. But I worked around this missing already a few times, and I never
found a pattern (besides that having such a map on the resource resolver
would have made it much easier.

Also when we think about resource handling, we could enhance the
implementation, that at the logout() of a ResourceResolver all closeable
objects stored in the map are closed, and that the map is cleared. So the
objects in this map are closely bound to the lifetime of the resource
resolver, making both accidental and intentional misuse harder.

WDYT?
Jörg



Am Do., 21. Okt. 2021 um 20:56 Uhr schrieb Julian Sedding <
jsedding@gmail.com>:

> Hi Jörg
>
> You could wrap your ResourceResolver using a custom
> ResourceResolverWrapper that could hold your state. IIRC Sling's
> ResourceResolverWrapper does a "deep wrap", i.e. resources obtained
> from it are also wrapped and return the wrapped RR via their
> Resource#getResourceResolver() method.
>
> Inside your custom wrapper you can store whatever state you want, and
> you may clean it up once ResourceResolver#close() is called.
>
> Of course this requires you to be able to wrap the RR before calling
> the API in question.
>
> Regards
> Julian
>
> On Mon, Oct 18, 2021 at 2:09 PM Paul Bjorkstrand
> <pa...@gmail.com> wrote:
> >
> > My point stands, there is no API provided. The thread local cleaner that
> is
> > used, while used commonly in many applications (I am 99% sure I saw that
> > exact code or very similar in a SO while looking for such an API), is
> still
> > dependent on internal implementation details of thread locals
> > (non-public/inaccessible fields). This is no different than depending on
> > Unsafe. As you mentioned, it is much harder in later versions of Java
> > because implementation details are much more encapsulated.
> >
> > -Paul
> >
> >
> > On Mon, Oct 18, 2021 at 4:24 AM Robert Munteanu <ro...@apache.org>
> wrote:
> >
> > > Hi,
> > >
> > > On Sun, 2021-10-17 at 14:26 -0500, Paul Bjorkstrand wrote:
> > > > As far as I understand, thread locals' storage is an implementation
> > > > detail
> > > > in the JVM. There isn't an API to clear all/arbitrary thread locals.
> > > > Thread
> > > > pools, or applications that use thread pools, need to provide hooks
> to
> > > > do
> > > > that, so that code can do its own cleanup. This is exactly what the
> > > > request listener that Carsten mentioned does.
> > >
> > > Sling's thread pool actually cleans up thread locals after execution
> > > [1]. It's not yet fixed for Java 17 though [2].
> > >
> > > [1]:
> > >
> > >
> https://github.com/apache/sling-org-apache-sling-commons-threads/blob/org.apache.sling.commons.threads-3.2.22/src/main/java/org/apache/sling/commons/threads/impl/ThreadPoolExecutorCleaningThreadLocals.java
> > > [2]: https://issues.apache.org/jira/browse/SLING-10831
> > >
> > > Thanks,
> > > Robert
> > >
>


-- 
Cheers,
Jörg Hoh,

https://cqdump.joerghoh.de
Twitter: @joerghoh

Re: Cache objects retrieved/derived from a ResourceResolver

Posted by Julian Sedding <js...@gmail.com>.
Hi Jörg

You could wrap your ResourceResolver using a custom
ResourceResolverWrapper that could hold your state. IIRC Sling's
ResourceResolverWrapper does a "deep wrap", i.e. resources obtained
from it are also wrapped and return the wrapped RR via their
Resource#getResourceResolver() method.

Inside your custom wrapper you can store whatever state you want, and
you may clean it up once ResourceResolver#close() is called.

Of course this requires you to be able to wrap the RR before calling
the API in question.

Regards
Julian

On Mon, Oct 18, 2021 at 2:09 PM Paul Bjorkstrand
<pa...@gmail.com> wrote:
>
> My point stands, there is no API provided. The thread local cleaner that is
> used, while used commonly in many applications (I am 99% sure I saw that
> exact code or very similar in a SO while looking for such an API), is still
> dependent on internal implementation details of thread locals
> (non-public/inaccessible fields). This is no different than depending on
> Unsafe. As you mentioned, it is much harder in later versions of Java
> because implementation details are much more encapsulated.
>
> -Paul
>
>
> On Mon, Oct 18, 2021 at 4:24 AM Robert Munteanu <ro...@apache.org> wrote:
>
> > Hi,
> >
> > On Sun, 2021-10-17 at 14:26 -0500, Paul Bjorkstrand wrote:
> > > As far as I understand, thread locals' storage is an implementation
> > > detail
> > > in the JVM. There isn't an API to clear all/arbitrary thread locals.
> > > Thread
> > > pools, or applications that use thread pools, need to provide hooks to
> > > do
> > > that, so that code can do its own cleanup. This is exactly what the
> > > request listener that Carsten mentioned does.
> >
> > Sling's thread pool actually cleans up thread locals after execution
> > [1]. It's not yet fixed for Java 17 though [2].
> >
> > [1]:
> >
> > https://github.com/apache/sling-org-apache-sling-commons-threads/blob/org.apache.sling.commons.threads-3.2.22/src/main/java/org/apache/sling/commons/threads/impl/ThreadPoolExecutorCleaningThreadLocals.java
> > [2]: https://issues.apache.org/jira/browse/SLING-10831
> >
> > Thanks,
> > Robert
> >

Re: Cache objects retrieved/derived from a ResourceResolver

Posted by Paul Bjorkstrand <pa...@gmail.com>.
My point stands, there is no API provided. The thread local cleaner that is
used, while used commonly in many applications (I am 99% sure I saw that
exact code or very similar in a SO while looking for such an API), is still
dependent on internal implementation details of thread locals
(non-public/inaccessible fields). This is no different than depending on
Unsafe. As you mentioned, it is much harder in later versions of Java
because implementation details are much more encapsulated.

-Paul


On Mon, Oct 18, 2021 at 4:24 AM Robert Munteanu <ro...@apache.org> wrote:

> Hi,
>
> On Sun, 2021-10-17 at 14:26 -0500, Paul Bjorkstrand wrote:
> > As far as I understand, thread locals' storage is an implementation
> > detail
> > in the JVM. There isn't an API to clear all/arbitrary thread locals.
> > Thread
> > pools, or applications that use thread pools, need to provide hooks to
> > do
> > that, so that code can do its own cleanup. This is exactly what the
> > request listener that Carsten mentioned does.
>
> Sling's thread pool actually cleans up thread locals after execution
> [1]. It's not yet fixed for Java 17 though [2].
>
> [1]:
>
> https://github.com/apache/sling-org-apache-sling-commons-threads/blob/org.apache.sling.commons.threads-3.2.22/src/main/java/org/apache/sling/commons/threads/impl/ThreadPoolExecutorCleaningThreadLocals.java
> [2]: https://issues.apache.org/jira/browse/SLING-10831
>
> Thanks,
> Robert
>

Re: Cache objects retrieved/derived from a ResourceResolver

Posted by Robert Munteanu <ro...@apache.org>.
Hi,

On Sun, 2021-10-17 at 14:26 -0500, Paul Bjorkstrand wrote:
> As far as I understand, thread locals' storage is an implementation
> detail
> in the JVM. There isn't an API to clear all/arbitrary thread locals.
> Thread
> pools, or applications that use thread pools, need to provide hooks to
> do
> that, so that code can do its own cleanup. This is exactly what the
> request listener that Carsten mentioned does.

Sling's thread pool actually cleans up thread locals after execution
[1]. It's not yet fixed for Java 17 though [2].

[1]:
https://github.com/apache/sling-org-apache-sling-commons-threads/blob/org.apache.sling.commons.threads-3.2.22/src/main/java/org/apache/sling/commons/threads/impl/ThreadPoolExecutorCleaningThreadLocals.java
[2]: https://issues.apache.org/jira/browse/SLING-10831

Thanks,
Robert

Re: Cache objects retrieved/derived from a ResourceResolver

Posted by Paul Bjorkstrand <pa...@gmail.com>.
As far as I understand, thread locals' storage is an implementation detail
in the JVM. There isn't an API to clear all/arbitrary thread locals. Thread
pools, or applications that use thread pools, need to provide hooks to do
that, so that code can do its own cleanup. This is exactly what the
request listener that Carsten mentioned does.

Also, thread pools keep threads for long periods of time (up to the entire
run of the application). There is no guarantee that a thread will be GC'd
ever in the lifetime of an application. That is the point of thread pools
in the first place as threads are relatively expensive to create [1]...at
least expensive until Project Loom [2] is finished.

[1]: https://stackoverflow.com/a/5483105
[2]: https://openjdk.java.net/projects/loom/

-Paul


On Sun, Oct 17, 2021 at 10:53 AM Carsten Ziegeler <cz...@apache.org>
wrote:

> Hi,
>
> not sure - this heavily depends on how the servlet engine manages the
> threads. I guess usually a thread pool is used, so threads are reused.
> I'm not sure whether thread locals are cleaned up when a thread is
> returned into the pool by every engine and if you can rely on it.
> But using a request listener and cleaning up the thread local works in
> all cases :)
>
> Carsten
>
> Am 17.10.2021 um 13:56 schrieb Roy Teeuwen:
> > Hey Carsten,
> >
> > Could you clarify why you need to clean up the threadlocal when it is
> bound to the thread of a request? Don't the threads of a request get
> garbage collected, and by that way also the ThreadLocal's bound to it
> >
> > Greets,
> > Roy
> >
> >> On 17 Oct 2021, at 09:57, Carsten Ziegeler <cz...@apache.org>
> wrote:
> >>
> >> Thanks Jörg, initially I got your use case wrong.
> >>
> >> As several people have adviced here now, using an adapter factory
> should give you what you need.
> >>
> >> A more ugly version would be to use a thread local and clean it up when
> the request finishes (via a request listener)
> >>
> >> Another option would be to write a custom resource provider, but that
> might be even more a hack than anything else.
> >>
> >> Regards
> >> Carsten
> >>
> >> Am 16.10.2021 um 16:41 schrieb Jörg Hoh:
> >>> Hi Carsten,
> >>> Am Sa., 16. Okt. 2021 um 09:41 Uhr schrieb Carsten Ziegeler <
> >>> cziegeler@apache.org>:
> >>>> I don't think that the RR is the right place for this.
> >>>>
> >>>> If the use case is that during a request the exact same query is
> >>>> executed more than once, then caching the result of the query in Sling
> >>>> is probably not resulting in the wanted gain. I think it is much
> better
> >>>> to avoid executing the query multiple times in the first place -
> >>>>
> >>> That's what I want to achieve. But where should I store the result of
> the
> >>> query so I don't need to execute the query multiple times? As said, I
> don't
> >>> have access to a store, which allows me to store this result within a
> >>> well-defined scope (which is identical to the lifetime of a resource
> >>> resolver). And I don't want to roll such a logic in my own, because
> it's
> >>> definitely not easy to get that right.
> >>>> executing the query is one part but then processing the result is
> >>>> another. And you want to avoid this as well.
> >>>>
> >>> In every invocation of my method a different resource is passed, and I
> need
> >>> to evaluate the result of that query for this resource. To be exact, I
> >>> transform the result of the query into other domain objects, and to
> >>> optimize the performance of this evaluation, I might even store these
> >>> domain objects in a different structure than just a list.
> >>>>
> >>>> In the past 12 years or so we tried various POCs with caching in the
> RR.
> >>>> Always under the (wrong) assumption that Oak/Jackrabbit is the
> >>>> bottleneck. So we added a RR caching all get resources, all list
> >>>> resources, all queries etc. And it turned out that the gain was close
> to
> >>>> zero. Each operation in Oak was pretty fast, but if you execute it
> >>>> hundreds of times during a request, even a fast call becomes the
> >>>> bottleneck. So again, usually the problem is in the upper layers,
> above
> >>>> Sling's resource api.
> >>>>
> >>> I don't want to create a transparent caching layer on a RR level to
> store
> >>> objects which have been requested from the repository. I know that
> this was
> >>> tried in the past, and did not provide the benefits we intended it to
> have.
> >>> I just want to have a simple store (map) on the RR, where I can store
> all
> >>> types of objects, which share the same lifecycle as the resource
> resolver.
> >>> This is handled explicitly, and the developer is responsible to make
> sure,
> >>> that indeed the lifecycles of the resource resolver and the object
> which
> >>> are stored in this temporary storage are compatible. If the developer
> >>> decides to store resources in there, and these resources might change
> over
> >>> the lifetime of this (potentially long-living) resource resolver, so
> be it.
> >>> There should not be any magic, it's just a simple map.
> >>> Jörg
> >>
> >> --
> >> Carsten Ziegeler
> >> Adobe
> >> cziegeler@apache.org <ma...@apache.org>
> >
>
> --
> Carsten Ziegeler
> Adobe
> cziegeler@apache.org
>

Re: Cache objects retrieved/derived from a ResourceResolver

Posted by Carsten Ziegeler <cz...@apache.org>.
Hi,

not sure - this heavily depends on how the servlet engine manages the 
threads. I guess usually a thread pool is used, so threads are reused. 
I'm not sure whether thread locals are cleaned up when a thread is 
returned into the pool by every engine and if you can rely on it.
But using a request listener and cleaning up the thread local works in 
all cases :)

Carsten

Am 17.10.2021 um 13:56 schrieb Roy Teeuwen:
> Hey Carsten,
> 
> Could you clarify why you need to clean up the threadlocal when it is bound to the thread of a request? Don't the threads of a request get garbage collected, and by that way also the ThreadLocal's bound to it
> 
> Greets,
> Roy
> 
>> On 17 Oct 2021, at 09:57, Carsten Ziegeler <cz...@apache.org> wrote:
>>
>> Thanks Jörg, initially I got your use case wrong.
>>
>> As several people have adviced here now, using an adapter factory should give you what you need.
>>
>> A more ugly version would be to use a thread local and clean it up when the request finishes (via a request listener)
>>
>> Another option would be to write a custom resource provider, but that might be even more a hack than anything else.
>>
>> Regards
>> Carsten
>>
>> Am 16.10.2021 um 16:41 schrieb Jörg Hoh:
>>> Hi Carsten,
>>> Am Sa., 16. Okt. 2021 um 09:41 Uhr schrieb Carsten Ziegeler <
>>> cziegeler@apache.org>:
>>>> I don't think that the RR is the right place for this.
>>>>
>>>> If the use case is that during a request the exact same query is
>>>> executed more than once, then caching the result of the query in Sling
>>>> is probably not resulting in the wanted gain. I think it is much better
>>>> to avoid executing the query multiple times in the first place -
>>>>
>>> That's what I want to achieve. But where should I store the result of the
>>> query so I don't need to execute the query multiple times? As said, I don't
>>> have access to a store, which allows me to store this result within a
>>> well-defined scope (which is identical to the lifetime of a resource
>>> resolver). And I don't want to roll such a logic in my own, because it's
>>> definitely not easy to get that right.
>>>> executing the query is one part but then processing the result is
>>>> another. And you want to avoid this as well.
>>>>
>>> In every invocation of my method a different resource is passed, and I need
>>> to evaluate the result of that query for this resource. To be exact, I
>>> transform the result of the query into other domain objects, and to
>>> optimize the performance of this evaluation, I might even store these
>>> domain objects in a different structure than just a list.
>>>>
>>>> In the past 12 years or so we tried various POCs with caching in the RR.
>>>> Always under the (wrong) assumption that Oak/Jackrabbit is the
>>>> bottleneck. So we added a RR caching all get resources, all list
>>>> resources, all queries etc. And it turned out that the gain was close to
>>>> zero. Each operation in Oak was pretty fast, but if you execute it
>>>> hundreds of times during a request, even a fast call becomes the
>>>> bottleneck. So again, usually the problem is in the upper layers, above
>>>> Sling's resource api.
>>>>
>>> I don't want to create a transparent caching layer on a RR level to store
>>> objects which have been requested from the repository. I know that this was
>>> tried in the past, and did not provide the benefits we intended it to have.
>>> I just want to have a simple store (map) on the RR, where I can store all
>>> types of objects, which share the same lifecycle as the resource resolver.
>>> This is handled explicitly, and the developer is responsible to make sure,
>>> that indeed the lifecycles of the resource resolver and the object which
>>> are stored in this temporary storage are compatible. If the developer
>>> decides to store resources in there, and these resources might change over
>>> the lifetime of this (potentially long-living) resource resolver, so be it.
>>> There should not be any magic, it's just a simple map.
>>> Jörg
>>
>> -- 
>> Carsten Ziegeler
>> Adobe
>> cziegeler@apache.org <ma...@apache.org>
> 

-- 
Carsten Ziegeler
Adobe
cziegeler@apache.org

Re: Cache objects retrieved/derived from a ResourceResolver

Posted by Roy Teeuwen <ro...@teeuwen.be>.
Hey Carsten,

Could you clarify why you need to clean up the threadlocal when it is bound to the thread of a request? Don't the threads of a request get garbage collected, and by that way also the ThreadLocal's bound to it

Greets,
Roy

> On 17 Oct 2021, at 09:57, Carsten Ziegeler <cz...@apache.org> wrote:
> 
> Thanks Jörg, initially I got your use case wrong.
> 
> As several people have adviced here now, using an adapter factory should give you what you need.
> 
> A more ugly version would be to use a thread local and clean it up when the request finishes (via a request listener)
> 
> Another option would be to write a custom resource provider, but that might be even more a hack than anything else.
> 
> Regards
> Carsten
> 
> Am 16.10.2021 um 16:41 schrieb Jörg Hoh:
>> Hi Carsten,
>> Am Sa., 16. Okt. 2021 um 09:41 Uhr schrieb Carsten Ziegeler <
>> cziegeler@apache.org>:
>>> I don't think that the RR is the right place for this.
>>> 
>>> If the use case is that during a request the exact same query is
>>> executed more than once, then caching the result of the query in Sling
>>> is probably not resulting in the wanted gain. I think it is much better
>>> to avoid executing the query multiple times in the first place -
>>> 
>> That's what I want to achieve. But where should I store the result of the
>> query so I don't need to execute the query multiple times? As said, I don't
>> have access to a store, which allows me to store this result within a
>> well-defined scope (which is identical to the lifetime of a resource
>> resolver). And I don't want to roll such a logic in my own, because it's
>> definitely not easy to get that right.
>>> executing the query is one part but then processing the result is
>>> another. And you want to avoid this as well.
>>> 
>> In every invocation of my method a different resource is passed, and I need
>> to evaluate the result of that query for this resource. To be exact, I
>> transform the result of the query into other domain objects, and to
>> optimize the performance of this evaluation, I might even store these
>> domain objects in a different structure than just a list.
>>> 
>>> In the past 12 years or so we tried various POCs with caching in the RR.
>>> Always under the (wrong) assumption that Oak/Jackrabbit is the
>>> bottleneck. So we added a RR caching all get resources, all list
>>> resources, all queries etc. And it turned out that the gain was close to
>>> zero. Each operation in Oak was pretty fast, but if you execute it
>>> hundreds of times during a request, even a fast call becomes the
>>> bottleneck. So again, usually the problem is in the upper layers, above
>>> Sling's resource api.
>>> 
>> I don't want to create a transparent caching layer on a RR level to store
>> objects which have been requested from the repository. I know that this was
>> tried in the past, and did not provide the benefits we intended it to have.
>> I just want to have a simple store (map) on the RR, where I can store all
>> types of objects, which share the same lifecycle as the resource resolver.
>> This is handled explicitly, and the developer is responsible to make sure,
>> that indeed the lifecycles of the resource resolver and the object which
>> are stored in this temporary storage are compatible. If the developer
>> decides to store resources in there, and these resources might change over
>> the lifetime of this (potentially long-living) resource resolver, so be it.
>> There should not be any magic, it's just a simple map.
>> Jörg
> 
> -- 
> Carsten Ziegeler
> Adobe
> cziegeler@apache.org <ma...@apache.org>

Re: Cache objects retrieved/derived from a ResourceResolver

Posted by Carsten Ziegeler <cz...@apache.org>.
Thanks Jörg, initially I got your use case wrong.

As several people have adviced here now, using an adapter factory should 
give you what you need.

A more ugly version would be to use a thread local and clean it up when 
the request finishes (via a request listener)

Another option would be to write a custom resource provider, but that 
might be even more a hack than anything else.

Regards
Carsten

Am 16.10.2021 um 16:41 schrieb Jörg Hoh:
> Hi Carsten,
> 
> Am Sa., 16. Okt. 2021 um 09:41 Uhr schrieb Carsten Ziegeler <
> cziegeler@apache.org>:
> 
>> I don't think that the RR is the right place for this.
>>
>> If the use case is that during a request the exact same query is
>> executed more than once, then caching the result of the query in Sling
>> is probably not resulting in the wanted gain. I think it is much better
>> to avoid executing the query multiple times in the first place -
>>
> 
> That's what I want to achieve. But where should I store the result of the
> query so I don't need to execute the query multiple times? As said, I don't
> have access to a store, which allows me to store this result within a
> well-defined scope (which is identical to the lifetime of a resource
> resolver). And I don't want to roll such a logic in my own, because it's
> definitely not easy to get that right.
> 
> 
>> executing the query is one part but then processing the result is
>> another. And you want to avoid this as well.
>>
> 
> In every invocation of my method a different resource is passed, and I need
> to evaluate the result of that query for this resource. To be exact, I
> transform the result of the query into other domain objects, and to
> optimize the performance of this evaluation, I might even store these
> domain objects in a different structure than just a list.
> 
> 
> 
>>
>> In the past 12 years or so we tried various POCs with caching in the RR.
>> Always under the (wrong) assumption that Oak/Jackrabbit is the
>> bottleneck. So we added a RR caching all get resources, all list
>> resources, all queries etc. And it turned out that the gain was close to
>> zero. Each operation in Oak was pretty fast, but if you execute it
>> hundreds of times during a request, even a fast call becomes the
>> bottleneck. So again, usually the problem is in the upper layers, above
>> Sling's resource api.
>>
> 
> I don't want to create a transparent caching layer on a RR level to store
> objects which have been requested from the repository. I know that this was
> tried in the past, and did not provide the benefits we intended it to have.
> I just want to have a simple store (map) on the RR, where I can store all
> types of objects, which share the same lifecycle as the resource resolver.
> 
> This is handled explicitly, and the developer is responsible to make sure,
> that indeed the lifecycles of the resource resolver and the object which
> are stored in this temporary storage are compatible. If the developer
> decides to store resources in there, and these resources might change over
> the lifetime of this (potentially long-living) resource resolver, so be it.
> There should not be any magic, it's just a simple map.
> 
> Jörg
> 

-- 
Carsten Ziegeler
Adobe
cziegeler@apache.org

Re: Cache objects retrieved/derived from a ResourceResolver

Posted by Jörg Hoh <jh...@googlemail.com.INVALID>.
Hi Carsten,

Am Sa., 16. Okt. 2021 um 09:41 Uhr schrieb Carsten Ziegeler <
cziegeler@apache.org>:

> I don't think that the RR is the right place for this.
>
> If the use case is that during a request the exact same query is
> executed more than once, then caching the result of the query in Sling
> is probably not resulting in the wanted gain. I think it is much better
> to avoid executing the query multiple times in the first place -
>

That's what I want to achieve. But where should I store the result of the
query so I don't need to execute the query multiple times? As said, I don't
have access to a store, which allows me to store this result within a
well-defined scope (which is identical to the lifetime of a resource
resolver). And I don't want to roll such a logic in my own, because it's
definitely not easy to get that right.


> executing the query is one part but then processing the result is
> another. And you want to avoid this as well.
>

In every invocation of my method a different resource is passed, and I need
to evaluate the result of that query for this resource. To be exact, I
transform the result of the query into other domain objects, and to
optimize the performance of this evaluation, I might even store these
domain objects in a different structure than just a list.



>
> In the past 12 years or so we tried various POCs with caching in the RR.
> Always under the (wrong) assumption that Oak/Jackrabbit is the
> bottleneck. So we added a RR caching all get resources, all list
> resources, all queries etc. And it turned out that the gain was close to
> zero. Each operation in Oak was pretty fast, but if you execute it
> hundreds of times during a request, even a fast call becomes the
> bottleneck. So again, usually the problem is in the upper layers, above
> Sling's resource api.
>

I don't want to create a transparent caching layer on a RR level to store
objects which have been requested from the repository. I know that this was
tried in the past, and did not provide the benefits we intended it to have.
I just want to have a simple store (map) on the RR, where I can store all
types of objects, which share the same lifecycle as the resource resolver.

This is handled explicitly, and the developer is responsible to make sure,
that indeed the lifecycles of the resource resolver and the object which
are stored in this temporary storage are compatible. If the developer
decides to store resources in there, and these resources might change over
the lifetime of this (potentially long-living) resource resolver, so be it.
There should not be any magic, it's just a simple map.

Jörg

-- 
Cheers,
Jörg Hoh,

https://cqdump.joerghoh.de
Twitter: @joerghoh

Re: Cache objects retrieved/derived from a ResourceResolver

Posted by Carsten Ziegeler <cz...@apache.org>.
I don't think that the RR is the right place for this.

If the use case is that during a request the exact same query is 
executed more than once, then caching the result of the query in Sling 
is probably not resulting in the wanted gain. I think it is much better 
to avoid executing the query multiple times in the first place - 
executing the query is one part but then processing the result is 
another. And you want to avoid this as well.

In the past 12 years or so we tried various POCs with caching in the RR. 
Always under the (wrong) assumption that Oak/Jackrabbit is the 
bottleneck. So we added a RR caching all get resources, all list 
resources, all queries etc. And it turned out that the gain was close to 
zero. Each operation in Oak was pretty fast, but if you execute it 
hundreds of times during a request, even a fast call becomes the 
bottleneck. So again, usually the problem is in the upper layers, above 
Sling's resource api.

Now, lets assume, you want to cache in the RR - this becomes quickly 
very tricky. When do you clear/update the cache? The quick answer is 
usually "we cache per request" - which already comes with the first 
problem, how do you know that a RR is bound to a request? But even if 
you manage to find this out, there are methods which perform 
modifications, so you need to update the cache based on them. Similar 
there is RR refresh(), especially used by long running RRs where you 
want to clear the cache. But implementing this in the RR is not 
sufficient as the resource api might be bypassed and JCR API might be 
directly used. Or in other words, maintaining the cache correctly is 
impossible in the RR layer.

So, if things need to be cached, this should be done in Oak/Jackrabbit. 
This is the only place where you know about all operations and can deal 
with the cache appropriately.

If caching something in the RR really provides benefits and does not 
create headaches, then the resource provider API already provides all 
you need. Stateful resource providers like the JCR one maintain a state 
object per resource resolver; for JCR this basically holds the JCR 
session but it can be extended to hold a cache or similar, too. The 
state object is discarded along with the RR.

In summary:
- avoid redundant calls in the first place
- if needed, cache at the right place (Oak)
- state can be maintained per RR without any api addition

Regards
Carsten
Am 11.10.2021 um 16:18 schrieb Jörg Hoh:
> Hi,
> 
> I am investigating a reasonable way to cache objects which have the same
> livecycle as a resource resolver, because they are retrieved or derived
> from it.
> For example this could be the result of query, which is used at multiple
> locations within my application, all in the context of the same request /
> resource resolver. Right now my only way is to store them in request
> attributes, but that does not feel right, because it rather relates to the
> resource resolver (and not the request itself), and it requires me to pass
> always the request along although I just need the resource resolver. Also
> in some situations there is alrady existing code, which just passes along
> the resource resolver and no request (maybe the operation does not even
> work in the context of a request).
> 
> I checked the API of the ResourceResolver, and it comes with
> getAttributeNames() and getAttribute(), but these properties are populated
> during creation and cannot be modified afterwards. So not helpful here.
> 
> I envision to have a WeakHashMap<String,Object>, which can be reached via a
> new API call "getTemporaryStorage()" (I am very open for a better name),
> which we could implement as default method right in the interface.
> 
> WDYT?
> 
> 
> 

-- 
Carsten Ziegeler
Adobe
cziegeler@apache.org

Re: Cache objects retrieved/derived from a ResourceResolver

Posted by Carsten Ziegeler <cz...@apache.org>.
But you can use adaptTo to retrieve your cache via an adapter factory. 
As Radu points out, those objects are cached/created once per Adaptable. 
This binds the lifetime of your cache to the lifetime of the Adaptable 
and you get what you need for free without any api additions.

Carsten

Am 15.10.2021 um 18:37 schrieb Jörg Hoh:
> Hi Radu,
> 
> not necessarily. My case at hand is something like this:
> 
> I have a API method, which takes a Resource to perform some checks on it.
> As part of these checks it adapts the Resource to the ResourceResolver,
> performs a search using this resource resolver and uses the results to do
> the checks on the passed resource. This method is called for a number of
> resources, which belong to the same resource resolver, and also the search
> is always the same and has the same search results.
> The idea is to run the query only once and and store the search result
> along with the resource resolver, so I can reuse it on subseqent
> invocations. I don't want to manage an external / dedicated global cache
> (security!) and I also cannot realistically change the method signature to
> pass an additional parameter (the cache).
> 
> So in this case I don't want to cache the result of an adaptTo() call.
> 
> Jörg
> 
> 
> Am Fr., 15. Okt. 2021 um 15:23 Uhr schrieb Radu Cotescu <ra...@apache.org>:
> 
>> Hi Jörg,
>>
>> Are those objects generated based on a ResourceResolver#adaptTo call? If
>> so, we should already have a built-in cache via SlingAdaptable [0].
>>
>> Thanks,
>> Radu
>>
>>> On 13 Oct 2021, at 09:28, Jörg Hoh <jh...@googlemail.com.INVALID>
>> wrote:
>>>
>>> Hi Robert,
>>>
>>> Am Di., 12. Okt. 2021 um 09:35 Uhr schrieb Robert Munteanu <
>>> rombert@apache.org>:
>>>
>>>>
>>>>
>>>> Is the basic need to only know when the resource resolver is going to
>>>> be closed in order to flush the cache?
>>>>
>>>
>>> That is not my concern at all. In that case I would require the value to
>> be
>>> a Closeable. But that would make it more complex to store simple objects,
>>> and it would also require to change the "close" methods all
>> implementations
>>> of the ResourceResolver interface. By just adding a default method
>>> "getTemporaryStorage()" we can avoid that. Of course this would mandate
>>> that these cached objects do not require an explicit close(). But the
>> only
>>> relevant cases I saw in the last years in the Sling world are
>>> ResourceResolvers and JCR Sessions. And I hope that you don't store other
>>> resourceResolvers in such a cache :-)
>>>
>>> The problem I want to solve is
>>> * store objects derived from a resource resolver along the resolver,
>> because
>>> * existing API and code, which is just passing in a resource resolver,
>> but
>>> not a "cache object".
>>>
>>>
>>>
>>>>
>>>> If that is the case we could expose a hook that notifies interested
>>>> parties about a ResourceResolver closing so they can flush the caches.
>>>>
>>>> The benefit would IMO be that we keep the API lean and allow creating
>>>> new bundles that implement the caching logic.
>>>>
>>>
>>> I just call it "cache" because I want to emphasize that I cannot rely on
>>> the presence of a key-value pair in that map. I don't think that we need
>> a
>>> very complex logic there by default. But if an implementation could
>> provide
>>> their own implementation of the default method returning a more adjusted.
>>>
>>> Jörg
>>>
>>> --
>>> Cheers,
>>> Jörg Hoh,
>>>
>>> https://cqdump.joerghoh.de
>>> Twitter: @joerghoh
>>
>>
> 

-- 
Carsten Ziegeler
Adobe
cziegeler@apache.org

Re: Cache objects retrieved/derived from a ResourceResolver

Posted by Jörg Hoh <jh...@googlemail.com.INVALID>.
Hi Radu,

not necessarily. My case at hand is something like this:

I have a API method, which takes a Resource to perform some checks on it.
As part of these checks it adapts the Resource to the ResourceResolver,
performs a search using this resource resolver and uses the results to do
the checks on the passed resource. This method is called for a number of
resources, which belong to the same resource resolver, and also the search
is always the same and has the same search results.
The idea is to run the query only once and and store the search result
along with the resource resolver, so I can reuse it on subseqent
invocations. I don't want to manage an external / dedicated global cache
(security!) and I also cannot realistically change the method signature to
pass an additional parameter (the cache).

So in this case I don't want to cache the result of an adaptTo() call.

Jörg


Am Fr., 15. Okt. 2021 um 15:23 Uhr schrieb Radu Cotescu <ra...@apache.org>:

> Hi Jörg,
>
> Are those objects generated based on a ResourceResolver#adaptTo call? If
> so, we should already have a built-in cache via SlingAdaptable [0].
>
> Thanks,
> Radu
>
> > On 13 Oct 2021, at 09:28, Jörg Hoh <jh...@googlemail.com.INVALID>
> wrote:
> >
> > Hi Robert,
> >
> > Am Di., 12. Okt. 2021 um 09:35 Uhr schrieb Robert Munteanu <
> > rombert@apache.org>:
> >
> >>
> >>
> >> Is the basic need to only know when the resource resolver is going to
> >> be closed in order to flush the cache?
> >>
> >
> > That is not my concern at all. In that case I would require the value to
> be
> > a Closeable. But that would make it more complex to store simple objects,
> > and it would also require to change the "close" methods all
> implementations
> > of the ResourceResolver interface. By just adding a default method
> > "getTemporaryStorage()" we can avoid that. Of course this would mandate
> > that these cached objects do not require an explicit close(). But the
> only
> > relevant cases I saw in the last years in the Sling world are
> > ResourceResolvers and JCR Sessions. And I hope that you don't store other
> > resourceResolvers in such a cache :-)
> >
> > The problem I want to solve is
> > * store objects derived from a resource resolver along the resolver,
> because
> > * existing API and code, which is just passing in a resource resolver,
> but
> > not a "cache object".
> >
> >
> >
> >>
> >> If that is the case we could expose a hook that notifies interested
> >> parties about a ResourceResolver closing so they can flush the caches.
> >>
> >> The benefit would IMO be that we keep the API lean and allow creating
> >> new bundles that implement the caching logic.
> >>
> >
> > I just call it "cache" because I want to emphasize that I cannot rely on
> > the presence of a key-value pair in that map. I don't think that we need
> a
> > very complex logic there by default. But if an implementation could
> provide
> > their own implementation of the default method returning a more adjusted.
> >
> > Jörg
> >
> > --
> > Cheers,
> > Jörg Hoh,
> >
> > https://cqdump.joerghoh.de
> > Twitter: @joerghoh
>
>

-- 
Cheers,
Jörg Hoh,

https://cqdump.joerghoh.de
Twitter: @joerghoh

Re: Cache objects retrieved/derived from a ResourceResolver

Posted by Radu Cotescu <ra...@apache.org>.
Hi Jörg,

Are those objects generated based on a ResourceResolver#adaptTo call? If so, we should already have a built-in cache via SlingAdaptable [0].

Thanks,
Radu

> On 13 Oct 2021, at 09:28, Jörg Hoh <jh...@googlemail.com.INVALID> wrote:
> 
> Hi Robert,
> 
> Am Di., 12. Okt. 2021 um 09:35 Uhr schrieb Robert Munteanu <
> rombert@apache.org>:
> 
>> 
>> 
>> Is the basic need to only know when the resource resolver is going to
>> be closed in order to flush the cache?
>> 
> 
> That is not my concern at all. In that case I would require the value to be
> a Closeable. But that would make it more complex to store simple objects,
> and it would also require to change the "close" methods all implementations
> of the ResourceResolver interface. By just adding a default method
> "getTemporaryStorage()" we can avoid that. Of course this would mandate
> that these cached objects do not require an explicit close(). But the only
> relevant cases I saw in the last years in the Sling world are
> ResourceResolvers and JCR Sessions. And I hope that you don't store other
> resourceResolvers in such a cache :-)
> 
> The problem I want to solve is
> * store objects derived from a resource resolver along the resolver, because
> * existing API and code, which is just passing in a resource resolver, but
> not a "cache object".
> 
> 
> 
>> 
>> If that is the case we could expose a hook that notifies interested
>> parties about a ResourceResolver closing so they can flush the caches.
>> 
>> The benefit would IMO be that we keep the API lean and allow creating
>> new bundles that implement the caching logic.
>> 
> 
> I just call it "cache" because I want to emphasize that I cannot rely on
> the presence of a key-value pair in that map. I don't think that we need a
> very complex logic there by default. But if an implementation could provide
> their own implementation of the default method returning a more adjusted.
> 
> Jörg
> 
> -- 
> Cheers,
> Jörg Hoh,
> 
> https://cqdump.joerghoh.de
> Twitter: @joerghoh


Re: Cache objects retrieved/derived from a ResourceResolver

Posted by Jörg Hoh <jh...@googlemail.com.INVALID>.
Hi Robert,

Am Di., 12. Okt. 2021 um 09:35 Uhr schrieb Robert Munteanu <
rombert@apache.org>:

>
>
> Is the basic need to only know when the resource resolver is going to
> be closed in order to flush the cache?
>

That is not my concern at all. In that case I would require the value to be
a Closeable. But that would make it more complex to store simple objects,
and it would also require to change the "close" methods all implementations
of the ResourceResolver interface. By just adding a default method
"getTemporaryStorage()" we can avoid that. Of course this would mandate
that these cached objects do not require an explicit close(). But the only
relevant cases I saw in the last years in the Sling world are
ResourceResolvers and JCR Sessions. And I hope that you don't store other
resourceResolvers in such a cache :-)

The problem I want to solve is
* store objects derived from a resource resolver along the resolver, because
* existing API and code, which is just passing in a resource resolver, but
not a "cache object".



>
> If that is the case we could expose a hook that notifies interested
> parties about a ResourceResolver closing so they can flush the caches.
>
> The benefit would IMO be that we keep the API lean and allow creating
> new bundles that implement the caching logic.
>

I just call it "cache" because I want to emphasize that I cannot rely on
the presence of a key-value pair in that map. I don't think that we need a
very complex logic there by default. But if an implementation could provide
their own implementation of the default method returning a more adjusted.

Jörg

-- 
Cheers,
Jörg Hoh,

https://cqdump.joerghoh.de
Twitter: @joerghoh

Re: Cache objects retrieved/derived from a ResourceResolver

Posted by Robert Munteanu <ro...@apache.org>.
Hi,

On Mon, 2021-10-11 at 16:18 +0200, Jörg Hoh wrote:
> Hi,
> 
> I envision to have a WeakHashMap<String,Object>, which can be reached
> via a
> new API call "getTemporaryStorage()" (I am very open for a better
> name),
> which we could implement as default method right in the interface.

Is the basic need to only know when the resource resolver is going to
be closed in order to flush the cache?

If that is the case we could expose a hook that notifies interested
parties about a ResourceResolver closing so they can flush the caches. 

The benefit would IMO be that we keep the API lean and allow creating
new bundles that implement the caching logic.

Thanks,
Robert