You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sling.apache.org by "Amit.. Gupta." <am...@adobe.com> on 2013/05/17 13:44:15 UTC

JcrResourceProvider.create/JobMangerImpl.writeJob can cause inconsistent behavior

Hi Devs,

I have been debugging an issue with sling jobs and finally found that current implementation JobMangerImpl.writeJob can cause inconsistent behaviour (root cause is JcrResourceProvider.create)

Issue that led to this:
I observed that sometime all of my event properties were not being written to the job node. Though the job node was being created. But ultimately JobManager would error out giving following messages:
17.05.2013 16:41:57.578 *WARN* [Apache Sling Job Background Loader] org.apache.sling.event.impl.jobs.JobManagerImpl Discarding job - job topic is missing : /var/eventing/jobs/assigned/826cd21a-6a8f-48cb-b112-768b421af572/slingevent:eventadmin/2013/5/17/16/39/com.adobe.cq.collection.update.job_826cd21a-6a8f-48cb-b112-768b421af572_2
Sometime, my job handler would be called, but event won't have enough properties that I sent to jobManager.
Problem:
There was an issue in my code that was adding a property to the event, which had invalid key i.e. /a/b/c/a.txt and JcrResourceProvider can not persist it. Hence the issue. This is fine, I can correct it.

But the main problem is that this persistence error was never reported in error logs, and job got persisted event though JcrResourceProvider.create threw a PersistenceException. But the job was created with fewer properties with what I intended. This resulted in sometime, my JobHandler being called, but not getting enough properties.

With the debugging, I found that JobManagerImpl.writeJob can cause some inconsistent behaviour due to the way, ResourceUtil.getOrCreateResource and JcrResourceProvider.create.

In this case following happened:
JcrResourceProvider.create threw PE while persisting the property, but the node was already by this time.
ResourceUtil.getOrCreateResource caught the PE, but checked for the existence of resource and hence ignored it.

Now, above implementation is wrong, either JcrResourceProvider should ensure that operation is atomic. Or ResourceUtil.getOrCreateResource should be changed revert changes in case of exception.

WDYT?

Thanks,
Amit


RE: JcrResourceProvider.create/JobMangerImpl.writeJob can cause inconsistent behavior

Posted by "Amit.. Gupta." <am...@adobe.com>.
Hi Carsten,

I think that change has surfaced one more bug that was perhaps working as a side effect, after I used the latest changed JcrResourceResolver, I get following error in logs
19.05.2013 02:40:37.421 *ERROR* [Thread-33] org.apache.sling.event.impl.dea.DistributedEventReceiver Exception during writing the event to the resource tree. org.apache.sling.api.resource.PersistenceException: Unable to create resource with path /var/eventing/distribution/826cd21a-6a8f-48cb-b112-768b421af572/2013/5/19/2/40/event-372
	at org.apache.sling.event.impl.support.ResourceHelper.getOrCreateResource(ResourceHelper.java:230)
	at org.apache.sling.event.impl.dea.DistributedEventReceiver.writeEvent(DistributedEventReceiver.java:208)
	at org.apache.sling.event.impl.dea.DistributedEventReceiver.processWriteQueue(DistributedEventReceiver.java:158)
	at org.apache.sling.event.impl.dea.DistributedEventReceiver.access$500(DistributedEventReceiver.java:75)
	at org.apache.sling.event.impl.dea.DistributedEventReceiver$1.run(DistributedEventReceiver.java:115)
	at java.lang.Thread.run(Thread.java:662) 

on debugging I found that this event has following property
":sling:jobs:asynchandler" which jcr wont allow

This property is coming from JobConsumer.

Thanks,
-Amit

Re: JcrResourceProvider.create/JobMangerImpl.writeJob can cause inconsistent behavior

Posted by Carsten Ziegeler <cz...@apache.org>.
Should be fixed in the sling eventing with revision 1484242

Carsten


2013/5/19 Carsten Ziegeler <cz...@apache.org>

> Thanks for reporting, yes this is in fact a problem of the notification
> which is sent by the job handling - it shouldn't contain the async handler.
>
> Carsten
>
>
> 2013/5/18 Amit.. Gupta. <am...@adobe.com>
>
> Minor correction in the mail
>>
>> -----Original Message-----
>> From: Amit.. Gupta.
>> Sent: 19 May 2013 02:53
>> To: dev@sling.apache.org
>> Subject: RE: JcrResourceProvider.create/JobMangerImpl.writeJob can cause
>> inconsistent behavior
>>
>> Hi Carsten,
>>
>> I think that change has surfaced one more bug that was perhaps working as
>> a side effect, after I used the latest changed JcrResourceProvider, I get
>> following error in logs
>> 19.05.2013 02:40:37.421 *ERROR* [Thread-33]
>> org.apache.sling.event.impl.dea.DistributedEventReceiver Exception during
>> writing the event to the resource tree.
>> org.apache.sling.api.resource.PersistenceException: Unable to create
>> resource with path
>> /var/eventing/distribution/826cd21a-6a8f-48cb-b112-768b421af572/2013/5/19/2/40/event-372
>>         at
>> org.apache.sling.event.impl.support.ResourceHelper.getOrCreateResource(ResourceHelper.java:230)
>>         at
>> org.apache.sling.event.impl.dea.DistributedEventReceiver.writeEvent(DistributedEventReceiver.java:208)
>>         at
>> org.apache.sling.event.impl.dea.DistributedEventReceiver.processWriteQueue(DistributedEventReceiver.java:158)
>>         at
>> org.apache.sling.event.impl.dea.DistributedEventReceiver.access$500(DistributedEventReceiver.java:75)
>>         at
>> org.apache.sling.event.impl.dea.DistributedEventReceiver$1.run(DistributedEventReceiver.java:115)
>>         at java.lang.Thread.run(Thread.java:662)
>>
>> on debugging I found that this event has following property
>> ":sling:jobs:asynchandler" which jcr wont allow
>>
>> This property is coming from JobConsumer.
>>
>> Thanks,
>> -Amit
>>
>
>
>
> --
> Carsten Ziegeler
> cziegeler@apache.org
>



-- 
Carsten Ziegeler
cziegeler@apache.org

Re: JcrResourceProvider.create/JobMangerImpl.writeJob can cause inconsistent behavior

Posted by Carsten Ziegeler <cz...@apache.org>.
Thanks for reporting, yes this is in fact a problem of the notification
which is sent by the job handling - it shouldn't contain the async handler.

Carsten


2013/5/18 Amit.. Gupta. <am...@adobe.com>

> Minor correction in the mail
>
> -----Original Message-----
> From: Amit.. Gupta.
> Sent: 19 May 2013 02:53
> To: dev@sling.apache.org
> Subject: RE: JcrResourceProvider.create/JobMangerImpl.writeJob can cause
> inconsistent behavior
>
> Hi Carsten,
>
> I think that change has surfaced one more bug that was perhaps working as
> a side effect, after I used the latest changed JcrResourceProvider, I get
> following error in logs
> 19.05.2013 02:40:37.421 *ERROR* [Thread-33]
> org.apache.sling.event.impl.dea.DistributedEventReceiver Exception during
> writing the event to the resource tree.
> org.apache.sling.api.resource.PersistenceException: Unable to create
> resource with path
> /var/eventing/distribution/826cd21a-6a8f-48cb-b112-768b421af572/2013/5/19/2/40/event-372
>         at
> org.apache.sling.event.impl.support.ResourceHelper.getOrCreateResource(ResourceHelper.java:230)
>         at
> org.apache.sling.event.impl.dea.DistributedEventReceiver.writeEvent(DistributedEventReceiver.java:208)
>         at
> org.apache.sling.event.impl.dea.DistributedEventReceiver.processWriteQueue(DistributedEventReceiver.java:158)
>         at
> org.apache.sling.event.impl.dea.DistributedEventReceiver.access$500(DistributedEventReceiver.java:75)
>         at
> org.apache.sling.event.impl.dea.DistributedEventReceiver$1.run(DistributedEventReceiver.java:115)
>         at java.lang.Thread.run(Thread.java:662)
>
> on debugging I found that this event has following property
> ":sling:jobs:asynchandler" which jcr wont allow
>
> This property is coming from JobConsumer.
>
> Thanks,
> -Amit
>



-- 
Carsten Ziegeler
cziegeler@apache.org

RE: JcrResourceProvider.create/JobMangerImpl.writeJob can cause inconsistent behavior

Posted by "Amit.. Gupta." <am...@adobe.com>.
Minor correction in the mail 

-----Original Message-----
From: Amit.. Gupta. 
Sent: 19 May 2013 02:53
To: dev@sling.apache.org
Subject: RE: JcrResourceProvider.create/JobMangerImpl.writeJob can cause inconsistent behavior

Hi Carsten,

I think that change has surfaced one more bug that was perhaps working as a side effect, after I used the latest changed JcrResourceProvider, I get following error in logs
19.05.2013 02:40:37.421 *ERROR* [Thread-33] org.apache.sling.event.impl.dea.DistributedEventReceiver Exception during writing the event to the resource tree. org.apache.sling.api.resource.PersistenceException: Unable to create resource with path /var/eventing/distribution/826cd21a-6a8f-48cb-b112-768b421af572/2013/5/19/2/40/event-372
	at org.apache.sling.event.impl.support.ResourceHelper.getOrCreateResource(ResourceHelper.java:230)
	at org.apache.sling.event.impl.dea.DistributedEventReceiver.writeEvent(DistributedEventReceiver.java:208)
	at org.apache.sling.event.impl.dea.DistributedEventReceiver.processWriteQueue(DistributedEventReceiver.java:158)
	at org.apache.sling.event.impl.dea.DistributedEventReceiver.access$500(DistributedEventReceiver.java:75)
	at org.apache.sling.event.impl.dea.DistributedEventReceiver$1.run(DistributedEventReceiver.java:115)
	at java.lang.Thread.run(Thread.java:662) 

on debugging I found that this event has following property ":sling:jobs:asynchandler" which jcr wont allow

This property is coming from JobConsumer.

Thanks,
-Amit

Re: JcrResourceProvider.create/JobMangerImpl.writeJob can cause inconsistent behavior

Posted by Carsten Ziegeler <cz...@apache.org>.
Great, just checked the code - and we explicitly forbid keys with a slash -
which makes sense as this would address a child resource, or any other
resource in the tree.

Carsten


2013/5/17 Amit.. Gupta. <am...@adobe.com>

> Yeah, the key is okie, it was just a side effect of some logic, I didn't
> intend to put it there. I can fix that
>
> -Amit
>
> -----Original Message-----
> From: Carsten Ziegeler [mailto:cziegeler@apache.org]
> Sent: 17 May 2013 17:27
> To: dev@sling.apache.org
> Subject: Re: JcrResourceProvider.create/JobMangerImpl.writeJob can cause
> inconsistent behavior
>
> Hi,
>
> good catch, I think JcrResourceProvider should remove the node if setting
> properties fails.
>
> But /a/b/c/a.txt shouldn't be an invalid key - this should be encoded.
>
> Regards
> Carsten
>
>
> 2013/5/17 Amit.. Gupta. <am...@adobe.com>
>
> > Hi Devs,
> >
> > I have been debugging an issue with sling jobs and finally found that
> > current implementation JobMangerImpl.writeJob can cause inconsistent
> > behaviour (root cause is JcrResourceProvider.create)
> >
> > Issue that led to this:
> > I observed that sometime all of my event properties were not being
> > written to the job node. Though the job node was being created. But
> > ultimately JobManager would error out giving following messages:
> > 17.05.2013 16:41:57.578 *WARN* [Apache Sling Job Background Loader]
> > org.apache.sling.event.impl.jobs.JobManagerImpl Discarding job - job
> > topic is missing :
> > /var/eventing/jobs/assigned/826cd21a-6a8f-48cb-b112-768b421af572/sling
> > event:eventadmin/2013/5/17/16/39/com.adobe.cq.collection.update.job_82
> > 6cd21a-6a8f-48cb-b112-768b421af572_2
> > Sometime, my job handler would be called, but event won't have enough
> > properties that I sent to jobManager.
> > Problem:
> > There was an issue in my code that was adding a property to the event,
> > which had invalid key i.e. /a/b/c/a.txt and JcrResourceProvider can
> > not persist it. Hence the issue. This is fine, I can correct it.
> >
> > But the main problem is that this persistence error was never reported
> > in error logs, and job got persisted event though
> > JcrResourceProvider.create threw a PersistenceException. But the job
> > was created with fewer properties with what I intended. This resulted
> > in sometime, my JobHandler being called, but not getting enough
> properties.
> >
> > With the debugging, I found that JobManagerImpl.writeJob can cause
> > some inconsistent behaviour due to the way,
> > ResourceUtil.getOrCreateResource and JcrResourceProvider.create.
> >
> > In this case following happened:
> > JcrResourceProvider.create threw PE while persisting the property, but
> > the node was already by this time.
> > ResourceUtil.getOrCreateResource caught the PE, but checked for the
> > existence of resource and hence ignored it.
> >
> > Now, above implementation is wrong, either JcrResourceProvider should
> > ensure that operation is atomic. Or ResourceUtil.getOrCreateResource
> > should be changed revert changes in case of exception.
> >
> > WDYT?
> >
> > Thanks,
> > Amit
> >
> >
>
>
> --
> Carsten Ziegeler
> cziegeler@apache.org
>



-- 
Carsten Ziegeler
cziegeler@apache.org

RE: JcrResourceProvider.create/JobMangerImpl.writeJob can cause inconsistent behavior

Posted by "Amit.. Gupta." <am...@adobe.com>.
Yeah, the key is okie, it was just a side effect of some logic, I didn't intend to put it there. I can fix that

-Amit

-----Original Message-----
From: Carsten Ziegeler [mailto:cziegeler@apache.org] 
Sent: 17 May 2013 17:27
To: dev@sling.apache.org
Subject: Re: JcrResourceProvider.create/JobMangerImpl.writeJob can cause inconsistent behavior

Hi,

good catch, I think JcrResourceProvider should remove the node if setting properties fails.

But /a/b/c/a.txt shouldn't be an invalid key - this should be encoded.

Regards
Carsten


2013/5/17 Amit.. Gupta. <am...@adobe.com>

> Hi Devs,
>
> I have been debugging an issue with sling jobs and finally found that 
> current implementation JobMangerImpl.writeJob can cause inconsistent 
> behaviour (root cause is JcrResourceProvider.create)
>
> Issue that led to this:
> I observed that sometime all of my event properties were not being 
> written to the job node. Though the job node was being created. But 
> ultimately JobManager would error out giving following messages:
> 17.05.2013 16:41:57.578 *WARN* [Apache Sling Job Background Loader] 
> org.apache.sling.event.impl.jobs.JobManagerImpl Discarding job - job 
> topic is missing :
> /var/eventing/jobs/assigned/826cd21a-6a8f-48cb-b112-768b421af572/sling
> event:eventadmin/2013/5/17/16/39/com.adobe.cq.collection.update.job_82
> 6cd21a-6a8f-48cb-b112-768b421af572_2
> Sometime, my job handler would be called, but event won't have enough 
> properties that I sent to jobManager.
> Problem:
> There was an issue in my code that was adding a property to the event, 
> which had invalid key i.e. /a/b/c/a.txt and JcrResourceProvider can 
> not persist it. Hence the issue. This is fine, I can correct it.
>
> But the main problem is that this persistence error was never reported 
> in error logs, and job got persisted event though 
> JcrResourceProvider.create threw a PersistenceException. But the job 
> was created with fewer properties with what I intended. This resulted 
> in sometime, my JobHandler being called, but not getting enough properties.
>
> With the debugging, I found that JobManagerImpl.writeJob can cause 
> some inconsistent behaviour due to the way, 
> ResourceUtil.getOrCreateResource and JcrResourceProvider.create.
>
> In this case following happened:
> JcrResourceProvider.create threw PE while persisting the property, but 
> the node was already by this time.
> ResourceUtil.getOrCreateResource caught the PE, but checked for the 
> existence of resource and hence ignored it.
>
> Now, above implementation is wrong, either JcrResourceProvider should 
> ensure that operation is atomic. Or ResourceUtil.getOrCreateResource 
> should be changed revert changes in case of exception.
>
> WDYT?
>
> Thanks,
> Amit
>
>


--
Carsten Ziegeler
cziegeler@apache.org

Re: JcrResourceProvider.create/JobMangerImpl.writeJob can cause inconsistent behavior

Posted by Carsten Ziegeler <cz...@apache.org>.
Hi,

good catch, I think JcrResourceProvider should remove the node if setting
properties fails.

But /a/b/c/a.txt shouldn't be an invalid key - this should be encoded.

Regards
Carsten


2013/5/17 Amit.. Gupta. <am...@adobe.com>

> Hi Devs,
>
> I have been debugging an issue with sling jobs and finally found that
> current implementation JobMangerImpl.writeJob can cause inconsistent
> behaviour (root cause is JcrResourceProvider.create)
>
> Issue that led to this:
> I observed that sometime all of my event properties were not being written
> to the job node. Though the job node was being created. But ultimately
> JobManager would error out giving following messages:
> 17.05.2013 16:41:57.578 *WARN* [Apache Sling Job Background Loader]
> org.apache.sling.event.impl.jobs.JobManagerImpl Discarding job - job topic
> is missing :
> /var/eventing/jobs/assigned/826cd21a-6a8f-48cb-b112-768b421af572/slingevent:eventadmin/2013/5/17/16/39/com.adobe.cq.collection.update.job_826cd21a-6a8f-48cb-b112-768b421af572_2
> Sometime, my job handler would be called, but event won't have enough
> properties that I sent to jobManager.
> Problem:
> There was an issue in my code that was adding a property to the event,
> which had invalid key i.e. /a/b/c/a.txt and JcrResourceProvider can not
> persist it. Hence the issue. This is fine, I can correct it.
>
> But the main problem is that this persistence error was never reported in
> error logs, and job got persisted event though JcrResourceProvider.create
> threw a PersistenceException. But the job was created with fewer properties
> with what I intended. This resulted in sometime, my JobHandler being
> called, but not getting enough properties.
>
> With the debugging, I found that JobManagerImpl.writeJob can cause some
> inconsistent behaviour due to the way, ResourceUtil.getOrCreateResource and
> JcrResourceProvider.create.
>
> In this case following happened:
> JcrResourceProvider.create threw PE while persisting the property, but the
> node was already by this time.
> ResourceUtil.getOrCreateResource caught the PE, but checked for the
> existence of resource and hence ignored it.
>
> Now, above implementation is wrong, either JcrResourceProvider should
> ensure that operation is atomic. Or ResourceUtil.getOrCreateResource should
> be changed revert changes in case of exception.
>
> WDYT?
>
> Thanks,
> Amit
>
>


-- 
Carsten Ziegeler
cziegeler@apache.org