You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@sling.apache.org by "Roll, Kevin" <Ke...@idexx.com> on 2015/11/19 17:22:16 UTC

ResourceResolver and thread safety

We are experiencing unexplained corruption in our Sling repository. This takes the form of

11:18:00.203 ERROR [Apache Sling JCR Resource Event Queue Processor] <o.a.s.j.r.i.h.jcr.JcrNodeResource> Unable to get resource type for node node 8dfdf81a-8db6-4c6e-be54-43f5ee2220b2
javax.jcr.InvalidItemStateException: Item does not exist anymore: 8dfdf81a-8db6-4c6e-be54-43f5ee2220b2
                at org.apache.jackrabbit.core.ItemImpl.itemSanityCheck(ItemImpl.java:116)
I'm attempting to track down what I might be doing wrong that could cause this to happen. I was a little scared when I read the docs for ResourceResolver and I found that it is not thread safe. However, after logging out the instances I get from ResourceResolverFactory.getAdministrativeResourceResolver() it seems that I am getting unique instances each time. As all of these tasks and jobs run on separate threads it would be a big problem if I inadvertently reused a ResourceResolver. Is this something I need to worry about?

Also, is it safe/proper to call ResourceResolver.commit() multiple times, or should I do this once?

Thanks!


Re: ResourceResolver and thread safety

Posted by Carsten Ziegeler <cz...@apache.org>.
Roll, Kevin wrote
> Thanks for the confirmation, Carsten... we will ignore these warnings. It seems a bit odd to me that the event handler triggers when a job is modified; I understand the "everything is a Resource" philosophy, but in my mind the job is not part of my data, it's a Sling internal. Somebody probably has a use case where they want to see this but for our app it's just noise. It would be nice to be able to annotate the EventHandler and filter down the things it will actually respond to.
> 
Yepp, that's in the works for the next update of Sling :)

Carsten

> -----Original Message-----
> From: Carsten Ziegeler [mailto:cziegeler@apache.org] 
> Sent: Wednesday, November 25, 2015 8:13 AM
> To: users@sling.apache.org
> Subject: Re: ResourceResolver and thread safety
> 
> Roll, Kevin wrote
>> Even more interesting:
>>
>> 12:28:00.502 INFO  [Thread-33] <c.i.l.r.e.ResourceAddedHandler> OSGi resource added event received: /var/eventing/jobs/assigned/db2af778-5713-4723-ad27-12cd89d2ddb0/com.idexx.labstation.repository.fileuploaded.job/2015/11/24/12/28/com.idexx.labstation.repository.fileuploaded.job_db2af778-5713-4723-ad27-12cd89d2ddb0_2345
>> 12:28:00.518 ERROR [Apache Sling JCR Resource Event Queue Processor] <o.apache.jackrabbit.core.ItemManager> f3092eb9-f31e-4299-a976-01caf508018d: failed to determine path to
>> 12:28:00.520 ERROR [Apache Sling JCR Resource Event Queue Processor] <o.a.s.j.r.i.h.jcr.JcrNodeResource> Unable to get resource type for node node f3092eb9-f31e-4299-a976-01caf508018d
>> javax.jcr.InvalidItemStateException: Item does not exist anymore: f3092eb9-f31e-4299-a976-01caf508018d
>> 	at org.apache.jackrabbit.core.ItemImpl.itemSanityCheck(ItemImpl.java:116)
>> 	at org.apache.jackrabbit.core.ItemImpl.sanityCheck(ItemImpl.java:104)
>> 	at org.apache.jackrabbit.core.NodeImpl.getPrimaryNodeType(NodeImpl.java:2337)
>> 	at org.apache.sling.jcr.resource.internal.helper.jcr.JcrItemResource.getResourceTypeForNode(JcrItemResource.java:121)
>> 	at org.apache.sling.jcr.resource.internal.helper.jcr.JcrNodeResource.getResourceType(JcrNodeResource.java:95)
>> 	at org.apache.sling.jcr.resource.internal.JcrResourceListener.processOsgiEventQueue(JcrResourceListener.java:351)
>>
>> I looked at the code in JcrResourceListener and it is simply trying to get the resource type for the node that triggered the TOPIC_RESOURCE_ADDED handler. But, the thing that triggered the event handler is a *job*, and presumably it may have been processed and removed in the short time before the event handler runs. Is this possibly a bug in Sling?
>>
> The above error can be ignored, unfortunately afaik we can't surpress
> logging it.
> What happens is exactly as you assume: the resource listener is trying
> to process an event for a resource and before it gets to the point of
> processing it (getting the resource type), someone else already removed
> that node. This can happen, is nothing dramatic, but just looks wired in
> the logs.
> If you run the integration tests for the job module in Sling, you'll
> notice a lot of those messages. But the tests proof that the
> functionality is not affected by it
> 
> Regards
> Carsten
> 


 
-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

RE: ResourceResolver and thread safety

Posted by "Roll, Kevin" <Ke...@idexx.com>.
Thanks for the confirmation, Carsten... we will ignore these warnings. It seems a bit odd to me that the event handler triggers when a job is modified; I understand the "everything is a Resource" philosophy, but in my mind the job is not part of my data, it's a Sling internal. Somebody probably has a use case where they want to see this but for our app it's just noise. It would be nice to be able to annotate the EventHandler and filter down the things it will actually respond to.

-----Original Message-----
From: Carsten Ziegeler [mailto:cziegeler@apache.org] 
Sent: Wednesday, November 25, 2015 8:13 AM
To: users@sling.apache.org
Subject: Re: ResourceResolver and thread safety

Roll, Kevin wrote
> Even more interesting:
> 
> 12:28:00.502 INFO  [Thread-33] <c.i.l.r.e.ResourceAddedHandler> OSGi resource added event received: /var/eventing/jobs/assigned/db2af778-5713-4723-ad27-12cd89d2ddb0/com.idexx.labstation.repository.fileuploaded.job/2015/11/24/12/28/com.idexx.labstation.repository.fileuploaded.job_db2af778-5713-4723-ad27-12cd89d2ddb0_2345
> 12:28:00.518 ERROR [Apache Sling JCR Resource Event Queue Processor] <o.apache.jackrabbit.core.ItemManager> f3092eb9-f31e-4299-a976-01caf508018d: failed to determine path to
> 12:28:00.520 ERROR [Apache Sling JCR Resource Event Queue Processor] <o.a.s.j.r.i.h.jcr.JcrNodeResource> Unable to get resource type for node node f3092eb9-f31e-4299-a976-01caf508018d
> javax.jcr.InvalidItemStateException: Item does not exist anymore: f3092eb9-f31e-4299-a976-01caf508018d
> 	at org.apache.jackrabbit.core.ItemImpl.itemSanityCheck(ItemImpl.java:116)
> 	at org.apache.jackrabbit.core.ItemImpl.sanityCheck(ItemImpl.java:104)
> 	at org.apache.jackrabbit.core.NodeImpl.getPrimaryNodeType(NodeImpl.java:2337)
> 	at org.apache.sling.jcr.resource.internal.helper.jcr.JcrItemResource.getResourceTypeForNode(JcrItemResource.java:121)
> 	at org.apache.sling.jcr.resource.internal.helper.jcr.JcrNodeResource.getResourceType(JcrNodeResource.java:95)
> 	at org.apache.sling.jcr.resource.internal.JcrResourceListener.processOsgiEventQueue(JcrResourceListener.java:351)
> 
> I looked at the code in JcrResourceListener and it is simply trying to get the resource type for the node that triggered the TOPIC_RESOURCE_ADDED handler. But, the thing that triggered the event handler is a *job*, and presumably it may have been processed and removed in the short time before the event handler runs. Is this possibly a bug in Sling?
> 
The above error can be ignored, unfortunately afaik we can't surpress
logging it.
What happens is exactly as you assume: the resource listener is trying
to process an event for a resource and before it gets to the point of
processing it (getting the resource type), someone else already removed
that node. This can happen, is nothing dramatic, but just looks wired in
the logs.
If you run the integration tests for the job module in Sling, you'll
notice a lot of those messages. But the tests proof that the
functionality is not affected by it

Regards
Carsten
-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

Re: ResourceResolver and thread safety

Posted by Carsten Ziegeler <cz...@apache.org>.
Roll, Kevin wrote
> Even more interesting:
> 
> 12:28:00.502 INFO  [Thread-33] <c.i.l.r.e.ResourceAddedHandler> OSGi resource added event received: /var/eventing/jobs/assigned/db2af778-5713-4723-ad27-12cd89d2ddb0/com.idexx.labstation.repository.fileuploaded.job/2015/11/24/12/28/com.idexx.labstation.repository.fileuploaded.job_db2af778-5713-4723-ad27-12cd89d2ddb0_2345
> 12:28:00.518 ERROR [Apache Sling JCR Resource Event Queue Processor] <o.apache.jackrabbit.core.ItemManager> f3092eb9-f31e-4299-a976-01caf508018d: failed to determine path to
> 12:28:00.520 ERROR [Apache Sling JCR Resource Event Queue Processor] <o.a.s.j.r.i.h.jcr.JcrNodeResource> Unable to get resource type for node node f3092eb9-f31e-4299-a976-01caf508018d
> javax.jcr.InvalidItemStateException: Item does not exist anymore: f3092eb9-f31e-4299-a976-01caf508018d
> 	at org.apache.jackrabbit.core.ItemImpl.itemSanityCheck(ItemImpl.java:116)
> 	at org.apache.jackrabbit.core.ItemImpl.sanityCheck(ItemImpl.java:104)
> 	at org.apache.jackrabbit.core.NodeImpl.getPrimaryNodeType(NodeImpl.java:2337)
> 	at org.apache.sling.jcr.resource.internal.helper.jcr.JcrItemResource.getResourceTypeForNode(JcrItemResource.java:121)
> 	at org.apache.sling.jcr.resource.internal.helper.jcr.JcrNodeResource.getResourceType(JcrNodeResource.java:95)
> 	at org.apache.sling.jcr.resource.internal.JcrResourceListener.processOsgiEventQueue(JcrResourceListener.java:351)
> 
> I looked at the code in JcrResourceListener and it is simply trying to get the resource type for the node that triggered the TOPIC_RESOURCE_ADDED handler. But, the thing that triggered the event handler is a *job*, and presumably it may have been processed and removed in the short time before the event handler runs. Is this possibly a bug in Sling?
> 
The above error can be ignored, unfortunately afaik we can't surpress
logging it.
What happens is exactly as you assume: the resource listener is trying
to process an event for a resource and before it gets to the point of
processing it (getting the resource type), someone else already removed
that node. This can happen, is nothing dramatic, but just looks wired in
the logs.
If you run the integration tests for the job module in Sling, you'll
notice a lot of those messages. But the tests proof that the
functionality is not affected by it

Regards
Carsten
-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

RE: ResourceResolver and thread safety

Posted by "Roll, Kevin" <Ke...@idexx.com>.
Even more interesting:

12:28:00.502 INFO  [Thread-33] <c.i.l.r.e.ResourceAddedHandler> OSGi resource added event received: /var/eventing/jobs/assigned/db2af778-5713-4723-ad27-12cd89d2ddb0/com.idexx.labstation.repository.fileuploaded.job/2015/11/24/12/28/com.idexx.labstation.repository.fileuploaded.job_db2af778-5713-4723-ad27-12cd89d2ddb0_2345
12:28:00.518 ERROR [Apache Sling JCR Resource Event Queue Processor] <o.apache.jackrabbit.core.ItemManager> f3092eb9-f31e-4299-a976-01caf508018d: failed to determine path to
12:28:00.520 ERROR [Apache Sling JCR Resource Event Queue Processor] <o.a.s.j.r.i.h.jcr.JcrNodeResource> Unable to get resource type for node node f3092eb9-f31e-4299-a976-01caf508018d
javax.jcr.InvalidItemStateException: Item does not exist anymore: f3092eb9-f31e-4299-a976-01caf508018d
	at org.apache.jackrabbit.core.ItemImpl.itemSanityCheck(ItemImpl.java:116)
	at org.apache.jackrabbit.core.ItemImpl.sanityCheck(ItemImpl.java:104)
	at org.apache.jackrabbit.core.NodeImpl.getPrimaryNodeType(NodeImpl.java:2337)
	at org.apache.sling.jcr.resource.internal.helper.jcr.JcrItemResource.getResourceTypeForNode(JcrItemResource.java:121)
	at org.apache.sling.jcr.resource.internal.helper.jcr.JcrNodeResource.getResourceType(JcrNodeResource.java:95)
	at org.apache.sling.jcr.resource.internal.JcrResourceListener.processOsgiEventQueue(JcrResourceListener.java:351)

I looked at the code in JcrResourceListener and it is simply trying to get the resource type for the node that triggered the TOPIC_RESOURCE_ADDED handler. But, the thing that triggered the event handler is a *job*, and presumably it may have been processed and removed in the short time before the event handler runs. Is this possibly a bug in Sling?


-----Original Message-----
From: Carsten Ziegeler [mailto:cziegeler@apache.org] 
Sent: Friday, November 20, 2015 11:39 AM
To: users@sling.apache.org
Subject: Re: ResourceResolver and thread safety

Am 20.11.15 um 09:10 schrieb Roll, Kevin:
> OK, allow me to broaden the question. We are getting mysterious repository corruption as previously detailed. I'm fairly certain that this is related to a task I have that scans Resources and sets some node properties but I can't pin down what I might be doing wrong. Are there any common mistakes that might cause the "Item does not exist anymore" exception? I can describe my code in more depth if that would help, but it is rather involved.

This can have several reasons. Usually when I'm seeing this it happens
that concurrent resource resolvers are involved. While for example one
resolver traverses a tree, another one is modifying it.

Carsten


RE: ResourceResolver and thread safety

Posted by "Roll, Kevin" <Ke...@idexx.com>.
I've realized this is coming from a slightly different direction than I thought. The trouble began when I added a new TOPIC_RESOURCE_CHANGED handler that works alongside an existing TOPIC_RESOURCE_ADDED handler. The exception trace looks like this:

javax.jcr.InvalidItemStateException: Item does not exist anymore: 32b1cc4d-eb62-4a51-8a06-857369fb6b35
	at org.apache.jackrabbit.core.ItemImpl.itemSanityCheck(ItemImpl.java:116)
	at org.apache.jackrabbit.core.ItemImpl.sanityCheck(ItemImpl.java:104)
	at org.apache.jackrabbit.core.NodeImpl.getPrimaryNodeType(NodeImpl.java:2337)
	at org.apache.sling.jcr.resource.internal.helper.jcr.JcrItemResource.getResourceTypeForNode(JcrItemResource.java:121)
	at org.apache.sling.jcr.resource.internal.helper.jcr.JcrNodeResource.getResourceType(JcrNodeResource.java:95)
	at org.apache.sling.jcr.resource.internal.JcrResourceListener.processOsgiEventQueue(JcrResourceListener.java:351)

I had not realized the significance of that last line - this is not my code but the event processing thread. After some investigation I realized that BOTH of these event handlers are triggered simultaneously under certain conditions, and they run on separate threads. They do not do any Resource modification, but they do both look at the same Resource and create instances of the same Job class. Synchronizing that Job creation did not help, but I think there is some sort of concurrency problem related to these event handlers. Does that sound plausible?


-----Original Message-----
From: Carsten Ziegeler [mailto:cziegeler@apache.org] 
Sent: Friday, November 20, 2015 11:39 AM
To: users@sling.apache.org
Subject: Re: ResourceResolver and thread safety

Am 20.11.15 um 09:10 schrieb Roll, Kevin:
> OK, allow me to broaden the question. We are getting mysterious repository corruption as previously detailed. I'm fairly certain that this is related to a task I have that scans Resources and sets some node properties but I can't pin down what I might be doing wrong. Are there any common mistakes that might cause the "Item does not exist anymore" exception? I can describe my code in more depth if that would help, but it is rather involved.

This can have several reasons. Usually when I'm seeing this it happens
that concurrent resource resolvers are involved. While for example one
resolver traverses a tree, another one is modifying it.

Carsten


Re: ResourceResolver and thread safety

Posted by Carsten Ziegeler <cz...@apache.org>.
Am 20.11.15 um 09:10 schrieb Roll, Kevin:
> OK, allow me to broaden the question. We are getting mysterious repository corruption as previously detailed. I'm fairly certain that this is related to a task I have that scans Resources and sets some node properties but I can't pin down what I might be doing wrong. Are there any common mistakes that might cause the "Item does not exist anymore" exception? I can describe my code in more depth if that would help, but it is rather involved.

This can have several reasons. Usually when I'm seeing this it happens
that concurrent resource resolvers are involved. While for example one
resolver traverses a tree, another one is modifying it.

Carsten


RE: ResourceResolver and thread safety

Posted by "Roll, Kevin" <Ke...@idexx.com>.
OK, allow me to broaden the question. We are getting mysterious repository corruption as previously detailed. I'm fairly certain that this is related to a task I have that scans Resources and sets some node properties but I can't pin down what I might be doing wrong. Are there any common mistakes that might cause the "Item does not exist anymore" exception? I can describe my code in more depth if that would help, but it is rather involved.


-----Original Message-----
From: Carsten Ziegeler [mailto:cziegeler@apache.org] 
Sent: Thursday, November 19, 2015 4:44 PM
To: users@sling.apache.org
Subject: Re: ResourceResolver and thread safety

ResourceResolver is as stated not thread safe, therefore it should not
be used concurrently by different threads. However, if you create the
resource resolver through the factory yourself you can pass it on to
another thread.

ResourceResolver.commit() can be called multiple times.

HTH
Regards
Carsten

Am 19.11.15 um 10:22 schrieb Roll, Kevin:
> We are experiencing unexplained corruption in our Sling repository. This takes the form of
> 
> 11:18:00.203 ERROR [Apache Sling JCR Resource Event Queue Processor] <o.a.s.j.r.i.h.jcr.JcrNodeResource> Unable to get resource type for node node 8dfdf81a-8db6-4c6e-be54-43f5ee2220b2
> javax.jcr.InvalidItemStateException: Item does not exist anymore: 8dfdf81a-8db6-4c6e-be54-43f5ee2220b2
>                 at org.apache.jackrabbit.core.ItemImpl.itemSanityCheck(ItemImpl.java:116)
> I'm attempting to track down what I might be doing wrong that could cause this to happen. I was a little scared when I read the docs for ResourceResolver and I found that it is not thread safe. However, after logging out the instances I get from ResourceResolverFactory.getAdministrativeResourceResolver() it seems that I am getting unique instances each time. As all of these tasks and jobs run on separate threads it would be a big problem if I inadvertently reused a ResourceResolver. Is this something I need to worry about?
> 
> Also, is it safe/proper to call ResourceResolver.commit() multiple times, or should I do this once?
> 
> Thanks!
> 
> 


-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

Re: ResourceResolver and thread safety

Posted by Carsten Ziegeler <cz...@apache.org>.
ResourceResolver is as stated not thread safe, therefore it should not
be used concurrently by different threads. However, if you create the
resource resolver through the factory yourself you can pass it on to
another thread.

ResourceResolver.commit() can be called multiple times.

HTH
Regards
Carsten

Am 19.11.15 um 10:22 schrieb Roll, Kevin:
> We are experiencing unexplained corruption in our Sling repository. This takes the form of
> 
> 11:18:00.203 ERROR [Apache Sling JCR Resource Event Queue Processor] <o.a.s.j.r.i.h.jcr.JcrNodeResource> Unable to get resource type for node node 8dfdf81a-8db6-4c6e-be54-43f5ee2220b2
> javax.jcr.InvalidItemStateException: Item does not exist anymore: 8dfdf81a-8db6-4c6e-be54-43f5ee2220b2
>                 at org.apache.jackrabbit.core.ItemImpl.itemSanityCheck(ItemImpl.java:116)
> I'm attempting to track down what I might be doing wrong that could cause this to happen. I was a little scared when I read the docs for ResourceResolver and I found that it is not thread safe. However, after logging out the instances I get from ResourceResolverFactory.getAdministrativeResourceResolver() it seems that I am getting unique instances each time. As all of these tasks and jobs run on separate threads it would be a big problem if I inadvertently reused a ResourceResolver. Is this something I need to worry about?
> 
> Also, is it safe/proper to call ResourceResolver.commit() multiple times, or should I do this once?
> 
> Thanks!
> 
> 


-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org