You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sling.apache.org by "Stefan Egli (JIRA)" <ji...@apache.org> on 2018/07/12 15:32:00 UTC

[jira] [Created] (SLING-7778) race-condition in scheduled job creation might keep scheduled job in-memory

Stefan Egli created SLING-7778:
----------------------------------

             Summary: race-condition in scheduled job creation might keep scheduled job in-memory
                 Key: SLING-7778
                 URL: https://issues.apache.org/jira/browse/SLING-7778
             Project: Sling
          Issue Type: Bug
          Components: Event
    Affects Versions: Event 4.2.0
            Reporter: Stefan Egli
            Assignee: Stefan Egli


The creation of a scheduled job is handled in [{{JobSchedulerImpl.addScheduledJob}}|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/JobSchedulerImpl.java#L501] which in turn [calls {{scheduledJobHandler.addOrUpdateJob}}|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/JobSchedulerImpl.java#L523]. There, the job is first persisted to the repository via [calling {{writeScheduledJob}}|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/ScheduledJobHandler.java#L213].

Normally the sequence of events is then as follows:
* [A] the [synchronized block in {{addOrUpdateJob}}|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/ScheduledJobHandler.java#L216] is entered and the job is scheduled with quartz.
* [B] then shortly afterwards, observation is triggered and {{handleAddOrUpdate}} is called. This will, as a *side-effect* [adds the scheduled job to the {{scheduledJobs}} map|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/ScheduledJobHandler.java#L446]
* the result of this being that the scheduled job is now in {{ScheduledJobHandler.scheduledJobs}}
* later on, when the scheduled job is finished, {{handleRemove()}} (or similar) is called, and it checks if it finds the scheduled job in {{ScheduledJobHandler.scheduledJobs}}. if it does, all good. And here we're in the all good case, so all is fine

However, there's the following race-condition:
* first [B] happens, ie first the observation event is triggered, so {{handleAddOrUpdate}} is called, and again the side-effect hits: the scheduled job is added to the {{ScheduledJobHandler.scheduledJobs}}. all good so far
* then [A] continues into the synchronized block. *But* now, the [remove of the scheduled job|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/ScheduledJobHandler.java#L217] actually does damage: the scheduled job ends up *not* being in the {{ScheduledJobHandler.scheduledJobs}} map.
* so in this scenario, when a later {{handleRemove()}} (or similar) happens, it can't find the scheduled job (see [here|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/ScheduledJobHandler.java#L370] h being null)
* the result being that [{{JobScheduler.unscheduleJob}}|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/ScheduledJobHandler.java#L371] is *not invoked*, *leaving the scheduled job with the JobSchedulerImpl*.

And the result of this race-condition is that {{JobSchedulerImpl.scheduledJobs}} map has a zombie scheduled job, which it shouldn't actually have.

So later on for example, a topology event happens, such as a PROPERTIES_EVENT. That in turn will trigger [{{JobSchedulerImpl.configurationChanged()}}|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/JobSchedulerImpl.java#L152] which will now *re-schedule the previously scheduled job*. And we have a duplicate scheduled job execution.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)