You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sling.apache.org by "Joerg Hoh (Jira)" <ji...@apache.org> on 2022/01/27 09:13:00 UTC

[jira] [Updated] (SLING-11095) Reduce repo traversals when scheduling next job

     [ https://issues.apache.org/jira/browse/SLING-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joerg Hoh updated SLING-11095:
------------------------------
    Description: 
In an AEM instance where a lot of Sling Job processing was ongoing I traced all cases where a ResourceResolver was opened. I did this for a period of around 30 seconds. 


In 154 times the stacktrace looked like this:
{noformat:title=loadJobs}
[...]
	at org.apache.sling.resourceresolver.impl.ResourceResolverFactoryImpl.getServiceResourceResolver(ResourceResolverFactoryImpl.java:89) [org.apache.sling.resourceresolver:1.8.0]
	at org.apache.sling.event.impl.jobs.config.JobManagerConfiguration.createResourceResolver(JobManagerConfiguration.java:298) [org.apache.sling.event:4.2.24]
	at org.apache.sling.event.impl.jobs.queues.QueueJobCache.loadJobs(QueueJobCache.java:224) [org.apache.sling.event:4.2.24]
	at org.apache.sling.event.impl.jobs.queues.QueueJobCache.getNextJob(QueueJobCache.java:180) [org.apache.sling.event:4.2.24]
	at org.apache.sling.event.impl.jobs.queues.JobQueueImpl.startJobs(JobQueueImpl.java:261) [org.apache.sling.event:4.2.24]
	at org.apache.sling.event.impl.jobs.queues.QueueManager.start(QueueManager.java:275) [org.apache.sling.event:4.2.24]
	at org.apache.sling.event.impl.jobs.queues.QueueManager.handleEvent(QueueManager.java:460) [org.apache.sling.event:4.2.24]
	at org.apache.felix.eventadmin.impl.handler.EventHandlerProxy.sendEvent(EventHandlerProxy.java:431) [org.apache.felix.eventadmin:1.6.2]
	at org.apache.felix.eventadmin.impl.tasks.HandlerTask.runWithoutDenylistTiming(HandlerTask.java:82) [org.apache.felix.eventadmin:1.6.2]
	at org.apache.felix.eventadmin.impl.tasks.SyncDeliverTasks.execute(SyncDeliverTasks.java:107) [org.apache.felix.eventadmin:1.6.2]
	at org.apache.felix.eventadmin.impl.handler.EventAdminImpl.sendEvent(EventAdminImpl.java:155) [org.apache.felix.eventadmin:1.6.2]
	at org.apache.felix.eventadmin.impl.security.EventAdminSecurityDecorator.sendEvent(EventAdminSecurityDecorator.java:96) [org.apache.felix.eventadmin:1.6.2]
	at org.apache.sling.event.impl.jobs.notifications.NewJobSender.onChange(NewJobSender.java:121) [org.apache.sling.event:4.2.24]
	at org.apache.sling.resourceresolver.impl.observation.BasicObservationReporter.reportChanges(BasicObservationReporter.java:211) [org.apache.sling.resourceresolver:1.8.0]
	at org.apache.sling.jcr.resource.internal.JcrResourceListener.onEvent(JcrResourceListener.java:155) [org.apache.sling.jcr.resource:3.1.0]
	at org.apache.jackrabbit.commons.observation.ListenerTracker$1.onEvent(ListenerTracker.java:190) [org.apache.jackrabbit.jackrabbit-jcr-commons:2.20.0]
 {noformat}

And in another 254 cases the stacktrace looked like this:
{noformat:title=startProcessing}
	at org.apache.sling.resourceresolver.impl.ResourceResolverFactoryImpl.getServiceResourceResolver(ResourceResolverFactoryImpl.java:89) [org.apache.sling.resourceresolver:1.8.0]
	at org.apache.sling.event.impl.jobs.config.JobManagerConfiguration.createResourceResolver(JobManagerConfiguration.java:298) [org.apache.sling.event:4.2.24]
	at org.apache.sling.event.impl.jobs.JobHandler.persistJobProperties(JobHandler.java:217) [org.apache.sling.event:4.2.24]
	at org.apache.sling.event.impl.jobs.JobHandler.startProcessing(JobHandler.java:74) [org.apache.sling.event:4.2.24]
	at org.apache.sling.event.impl.jobs.queues.QueueJobCache.getNextJob(QueueJobCache.java:190) [org.apache.sling.event:4.2.24]
	at org.apache.sling.event.impl.jobs.queues.JobQueueImpl.startJobs(JobQueueImpl.java:261) [org.apache.sling.event:4.2.24]
	at org.apache.sling.event.impl.jobs.queues.QueueManager.start(QueueManager.java:275) [org.apache.sling.event:4.2.24]
	at org.apache.sling.event.impl.jobs.queues.QueueManager.handleEvent(QueueManager.java:460) [org.apache.sling.event:4.2.24]
	at org.apache.felix.eventadmin.impl.handler.EventHandlerProxy.sendEvent(EventHandlerProxy.java:431) [org.apache.felix.eventadmin:1.6.2]
	at org.apache.felix.eventadmin.impl.tasks.HandlerTask.runWithoutDenylistTiming(HandlerTask.java:82) [org.apache.felix.eventadmin:1.6.2]
	at org.apache.felix.eventadmin.impl.tasks.SyncDeliverTasks.execute(SyncDeliverTasks.java:107) [org.apache.felix.eventadmin:1.6.2]
	at org.apache.felix.eventadmin.impl.handler.EventAdminImpl.sendEvent(EventAdminImpl.java:155) [org.apache.felix.eventadmin:1.6.2]
	at org.apache.felix.eventadmin.impl.security.EventAdminSecurityDecorator.sendEvent(EventAdminSecurityDecorator.java:96) [org.apache.felix.eventadmin:1.6.2]
	at org.apache.sling.event.impl.jobs.notifications.NewJobSender.onChange(NewJobSender.java:121) [org.apache.sling.event:4.2.24]
	at org.apache.sling.resourceresolver.impl.observation.BasicObservationReporter.reportChanges(BasicObservationReporter.java:211) [org.apache.sling.resourceresolver:1.8.0]
	at org.apache.sling.jcr.resource.internal.JcrResourceListener.onEvent(JcrResourceListener.java:155) [org.apache.sling.jcr.resource:3.1.0]
{noformat}

In the case of {{loadJobs}} the ResourceResolver is opened to load all Jobs, although I am not sure if really all jobs are required at that point. To achieve this it traverses the repository to load all jobs matching a topic [1]. I wonder if this is traversal is necessary, because during that time this is done around 5 times per second. Is it possible to optimize that, so we don't have that much repository access.

Not sure if we can achieve something similar in the case of {{startJobs}}.



[1] https://github.com/apache/sling-org-apache-sling-event/blob/dbffb437564d553dcb65d60742cbfbeccf9c587f/src/main/java/org/apache/sling/event/impl/jobs/queues/QueueJobCache.java#L288


 

  was:
In an AEM instance where a lot of Sling Job processing was ongoing I traced all cases where a ResourceResolver was opened. I did this for a period of around 30 seconds. 


In 154 times the stacktrace looked like this:
{noformat:title=loadJobs}
[...]
	at org.apache.sling.resourceresolver.impl.ResourceResolverFactoryImpl.getServiceResourceResolver(ResourceResolverFactoryImpl.java:89) [org.apache.sling.resourceresolver:1.8.0]
	at org.apache.sling.event.impl.jobs.config.JobManagerConfiguration.createResourceResolver(JobManagerConfiguration.java:298) [org.apache.sling.event:4.2.24]
	at org.apache.sling.event.impl.jobs.queues.QueueJobCache.loadJobs(QueueJobCache.java:224) [org.apache.sling.event:4.2.24]
	at org.apache.sling.event.impl.jobs.queues.QueueJobCache.getNextJob(QueueJobCache.java:180) [org.apache.sling.event:4.2.24]
	at org.apache.sling.event.impl.jobs.queues.JobQueueImpl.startJobs(JobQueueImpl.java:261) [org.apache.sling.event:4.2.24]
	at org.apache.sling.event.impl.jobs.queues.QueueManager.start(QueueManager.java:275) [org.apache.sling.event:4.2.24]
	at org.apache.sling.event.impl.jobs.queues.QueueManager.handleEvent(QueueManager.java:460) [org.apache.sling.event:4.2.24]
	at org.apache.felix.eventadmin.impl.handler.EventHandlerProxy.sendEvent(EventHandlerProxy.java:431) [org.apache.felix.eventadmin:1.6.2]
	at org.apache.felix.eventadmin.impl.tasks.HandlerTask.runWithoutDenylistTiming(HandlerTask.java:82) [org.apache.felix.eventadmin:1.6.2]
	at org.apache.felix.eventadmin.impl.tasks.SyncDeliverTasks.execute(SyncDeliverTasks.java:107) [org.apache.felix.eventadmin:1.6.2]
	at org.apache.felix.eventadmin.impl.handler.EventAdminImpl.sendEvent(EventAdminImpl.java:155) [org.apache.felix.eventadmin:1.6.2]
	at org.apache.felix.eventadmin.impl.security.EventAdminSecurityDecorator.sendEvent(EventAdminSecurityDecorator.java:96) [org.apache.felix.eventadmin:1.6.2]
	at org.apache.sling.event.impl.jobs.notifications.NewJobSender.onChange(NewJobSender.java:121) [org.apache.sling.event:4.2.24]
	at org.apache.sling.resourceresolver.impl.observation.BasicObservationReporter.reportChanges(BasicObservationReporter.java:211) [org.apache.sling.resourceresolver:1.8.0]
	at org.apache.sling.jcr.resource.internal.JcrResourceListener.onEvent(JcrResourceListener.java:155) [org.apache.sling.jcr.resource:3.1.0]
	at org.apache.jackrabbit.commons.observation.ListenerTracker$1.onEvent(ListenerTracker.java:190) [org.apache.jackrabbit.jackrabbit-jcr-commons:2.20.0]
 {noformat}

And in another 254 the stacktrace looked like this:
{noformat:title=startProcessing}
	at org.apache.sling.resourceresolver.impl.ResourceResolverFactoryImpl.getServiceResourceResolver(ResourceResolverFactoryImpl.java:89) [org.apache.sling.resourceresolver:1.8.0]
	at org.apache.sling.event.impl.jobs.config.JobManagerConfiguration.createResourceResolver(JobManagerConfiguration.java:298) [org.apache.sling.event:4.2.24]
	at org.apache.sling.event.impl.jobs.JobHandler.persistJobProperties(JobHandler.java:217) [org.apache.sling.event:4.2.24]
	at org.apache.sling.event.impl.jobs.JobHandler.startProcessing(JobHandler.java:74) [org.apache.sling.event:4.2.24]
	at org.apache.sling.event.impl.jobs.queues.QueueJobCache.getNextJob(QueueJobCache.java:190) [org.apache.sling.event:4.2.24]
	at org.apache.sling.event.impl.jobs.queues.JobQueueImpl.startJobs(JobQueueImpl.java:261) [org.apache.sling.event:4.2.24]
	at org.apache.sling.event.impl.jobs.queues.QueueManager.start(QueueManager.java:275) [org.apache.sling.event:4.2.24]
	at org.apache.sling.event.impl.jobs.queues.QueueManager.handleEvent(QueueManager.java:460) [org.apache.sling.event:4.2.24]
	at org.apache.felix.eventadmin.impl.handler.EventHandlerProxy.sendEvent(EventHandlerProxy.java:431) [org.apache.felix.eventadmin:1.6.2]
	at org.apache.felix.eventadmin.impl.tasks.HandlerTask.runWithoutDenylistTiming(HandlerTask.java:82) [org.apache.felix.eventadmin:1.6.2]
	at org.apache.felix.eventadmin.impl.tasks.SyncDeliverTasks.execute(SyncDeliverTasks.java:107) [org.apache.felix.eventadmin:1.6.2]
	at org.apache.felix.eventadmin.impl.handler.EventAdminImpl.sendEvent(EventAdminImpl.java:155) [org.apache.felix.eventadmin:1.6.2]
	at org.apache.felix.eventadmin.impl.security.EventAdminSecurityDecorator.sendEvent(EventAdminSecurityDecorator.java:96) [org.apache.felix.eventadmin:1.6.2]
	at org.apache.sling.event.impl.jobs.notifications.NewJobSender.onChange(NewJobSender.java:121) [org.apache.sling.event:4.2.24]
	at org.apache.sling.resourceresolver.impl.observation.BasicObservationReporter.reportChanges(BasicObservationReporter.java:211) [org.apache.sling.resourceresolver:1.8.0]
	at org.apache.sling.jcr.resource.internal.JcrResourceListener.onEvent(JcrResourceListener.java:155) [org.apache.sling.jcr.resource:3.1.0]
{noformat}

In the case of {{loadJobs}} the ResourceResolver is opened to load all Jobs, although I am not sure if really all jobs are required at that point. To achieve this it traverses the repository to load all jobs matching a topic [1]. I wonder if this is traversal is necessary, because during that time this is done around 5 times per second. Is it possible to optimize that, so we don't have that much repository access.



[1] https://github.com/apache/sling-org-apache-sling-event/blob/dbffb437564d553dcb65d60742cbfbeccf9c587f/src/main/java/org/apache/sling/event/impl/jobs/queues/QueueJobCache.java#L288


 


> Reduce repo traversals when scheduling next job
> -----------------------------------------------
>
>                 Key: SLING-11095
>                 URL: https://issues.apache.org/jira/browse/SLING-11095
>             Project: Sling
>          Issue Type: Improvement
>          Components: Event
>    Affects Versions: Event 4.2.24
>            Reporter: Joerg Hoh
>            Priority: Major
>
> In an AEM instance where a lot of Sling Job processing was ongoing I traced all cases where a ResourceResolver was opened. I did this for a period of around 30 seconds. 
> In 154 times the stacktrace looked like this:
> {noformat:title=loadJobs}
> [...]
> 	at org.apache.sling.resourceresolver.impl.ResourceResolverFactoryImpl.getServiceResourceResolver(ResourceResolverFactoryImpl.java:89) [org.apache.sling.resourceresolver:1.8.0]
> 	at org.apache.sling.event.impl.jobs.config.JobManagerConfiguration.createResourceResolver(JobManagerConfiguration.java:298) [org.apache.sling.event:4.2.24]
> 	at org.apache.sling.event.impl.jobs.queues.QueueJobCache.loadJobs(QueueJobCache.java:224) [org.apache.sling.event:4.2.24]
> 	at org.apache.sling.event.impl.jobs.queues.QueueJobCache.getNextJob(QueueJobCache.java:180) [org.apache.sling.event:4.2.24]
> 	at org.apache.sling.event.impl.jobs.queues.JobQueueImpl.startJobs(JobQueueImpl.java:261) [org.apache.sling.event:4.2.24]
> 	at org.apache.sling.event.impl.jobs.queues.QueueManager.start(QueueManager.java:275) [org.apache.sling.event:4.2.24]
> 	at org.apache.sling.event.impl.jobs.queues.QueueManager.handleEvent(QueueManager.java:460) [org.apache.sling.event:4.2.24]
> 	at org.apache.felix.eventadmin.impl.handler.EventHandlerProxy.sendEvent(EventHandlerProxy.java:431) [org.apache.felix.eventadmin:1.6.2]
> 	at org.apache.felix.eventadmin.impl.tasks.HandlerTask.runWithoutDenylistTiming(HandlerTask.java:82) [org.apache.felix.eventadmin:1.6.2]
> 	at org.apache.felix.eventadmin.impl.tasks.SyncDeliverTasks.execute(SyncDeliverTasks.java:107) [org.apache.felix.eventadmin:1.6.2]
> 	at org.apache.felix.eventadmin.impl.handler.EventAdminImpl.sendEvent(EventAdminImpl.java:155) [org.apache.felix.eventadmin:1.6.2]
> 	at org.apache.felix.eventadmin.impl.security.EventAdminSecurityDecorator.sendEvent(EventAdminSecurityDecorator.java:96) [org.apache.felix.eventadmin:1.6.2]
> 	at org.apache.sling.event.impl.jobs.notifications.NewJobSender.onChange(NewJobSender.java:121) [org.apache.sling.event:4.2.24]
> 	at org.apache.sling.resourceresolver.impl.observation.BasicObservationReporter.reportChanges(BasicObservationReporter.java:211) [org.apache.sling.resourceresolver:1.8.0]
> 	at org.apache.sling.jcr.resource.internal.JcrResourceListener.onEvent(JcrResourceListener.java:155) [org.apache.sling.jcr.resource:3.1.0]
> 	at org.apache.jackrabbit.commons.observation.ListenerTracker$1.onEvent(ListenerTracker.java:190) [org.apache.jackrabbit.jackrabbit-jcr-commons:2.20.0]
>  {noformat}
> And in another 254 cases the stacktrace looked like this:
> {noformat:title=startProcessing}
> 	at org.apache.sling.resourceresolver.impl.ResourceResolverFactoryImpl.getServiceResourceResolver(ResourceResolverFactoryImpl.java:89) [org.apache.sling.resourceresolver:1.8.0]
> 	at org.apache.sling.event.impl.jobs.config.JobManagerConfiguration.createResourceResolver(JobManagerConfiguration.java:298) [org.apache.sling.event:4.2.24]
> 	at org.apache.sling.event.impl.jobs.JobHandler.persistJobProperties(JobHandler.java:217) [org.apache.sling.event:4.2.24]
> 	at org.apache.sling.event.impl.jobs.JobHandler.startProcessing(JobHandler.java:74) [org.apache.sling.event:4.2.24]
> 	at org.apache.sling.event.impl.jobs.queues.QueueJobCache.getNextJob(QueueJobCache.java:190) [org.apache.sling.event:4.2.24]
> 	at org.apache.sling.event.impl.jobs.queues.JobQueueImpl.startJobs(JobQueueImpl.java:261) [org.apache.sling.event:4.2.24]
> 	at org.apache.sling.event.impl.jobs.queues.QueueManager.start(QueueManager.java:275) [org.apache.sling.event:4.2.24]
> 	at org.apache.sling.event.impl.jobs.queues.QueueManager.handleEvent(QueueManager.java:460) [org.apache.sling.event:4.2.24]
> 	at org.apache.felix.eventadmin.impl.handler.EventHandlerProxy.sendEvent(EventHandlerProxy.java:431) [org.apache.felix.eventadmin:1.6.2]
> 	at org.apache.felix.eventadmin.impl.tasks.HandlerTask.runWithoutDenylistTiming(HandlerTask.java:82) [org.apache.felix.eventadmin:1.6.2]
> 	at org.apache.felix.eventadmin.impl.tasks.SyncDeliverTasks.execute(SyncDeliverTasks.java:107) [org.apache.felix.eventadmin:1.6.2]
> 	at org.apache.felix.eventadmin.impl.handler.EventAdminImpl.sendEvent(EventAdminImpl.java:155) [org.apache.felix.eventadmin:1.6.2]
> 	at org.apache.felix.eventadmin.impl.security.EventAdminSecurityDecorator.sendEvent(EventAdminSecurityDecorator.java:96) [org.apache.felix.eventadmin:1.6.2]
> 	at org.apache.sling.event.impl.jobs.notifications.NewJobSender.onChange(NewJobSender.java:121) [org.apache.sling.event:4.2.24]
> 	at org.apache.sling.resourceresolver.impl.observation.BasicObservationReporter.reportChanges(BasicObservationReporter.java:211) [org.apache.sling.resourceresolver:1.8.0]
> 	at org.apache.sling.jcr.resource.internal.JcrResourceListener.onEvent(JcrResourceListener.java:155) [org.apache.sling.jcr.resource:3.1.0]
> {noformat}
> In the case of {{loadJobs}} the ResourceResolver is opened to load all Jobs, although I am not sure if really all jobs are required at that point. To achieve this it traverses the repository to load all jobs matching a topic [1]. I wonder if this is traversal is necessary, because during that time this is done around 5 times per second. Is it possible to optimize that, so we don't have that much repository access.
> Not sure if we can achieve something similar in the case of {{startJobs}}.
> [1] https://github.com/apache/sling-org-apache-sling-event/blob/dbffb437564d553dcb65d60742cbfbeccf9c587f/src/main/java/org/apache/sling/event/impl/jobs/queues/QueueJobCache.java#L288
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)