
Posted to commits@roller.apache.org by "Allen Gilliland (JIRA)" <no...@atlassian.com> on 2007/06/12 20:57:55 UTC

[Roller-JIRA] Created: (ROL-1446) Task leasing causes scheduling inconsistencies

Task leasing causes scheduling inconsistencies
----------------------------------------------

                 Key: ROL-1446
                 URL: http://opensource.atlassian.com/projects/roller/browse/ROL-1446
             Project: Roller
          Issue Type: Bug
    Affects Versions: 3.1
            Reporter: Allen Gilliland
            Assignee: Roller Unassigned


After a bit more poking around I have realized that some of the problems I've seen with task scheduling are actually caused by the leasing process we are using.  The root of the problem is that task scheduling is not properly synchronized with the leasing process, and therefore scheduling drift happens.

An example.  Assume that a task is scheduled to run once per minute starting at 00:00:00.50.  This means that the subsequent run times for the task will be 00:01:00.50, 00:02:00.50, etc.  Now take into account the fact that in the database the lease time of a task is defined by the time the task obtained its lease according to db time, and that time is some amount of time after the task was actually started.  So let's assume for a moment that it takes 700ms to obtain a lease via the db.  This means that the time the db thinks a task was run is different from the time the app thinks the task was run, and in our particular example the recorded time lands in the next whole second (00:00:00.50 + 700ms = 00:00:01.20).  What this means is that when the application runs the task the next time at 00:01:00.50 and tries to obtain a new lease it will be refused, because the db thinks the last run time for the task was 00:00:01.20, which is only 59.3 seconds before 00:01:00.50.  So the additional time required to obtain a lease in the db can cause the lease time to be off by 1 or more seconds and therefore cause a subsequent run of the task to fail.
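
To make the arithmetic in this example concrete, here is a minimal sketch in Java.  It is not Roller's actual leasing code: the class name, the leaseGranted method, and the rule it applies (grant a lease only once a full interval has elapsed since the recorded lease time) are assumptions made for illustration.

```java
import java.time.Duration;
import java.time.LocalTime;

class LeaseDriftDemo {
    // Assumed lease rule (hypothetical): a new lease is granted only if a full
    // interval has elapsed since the lease time recorded in the db.
    static boolean leaseGranted(LocalTime lastLeaseTime, LocalTime now, Duration interval) {
        return Duration.between(lastLeaseTime, now).compareTo(interval) >= 0;
    }

    public static void main(String[] args) {
        Duration interval = Duration.ofSeconds(60);
        Duration leaseLatency = Duration.ofMillis(700);

        LocalTime firstRun = LocalTime.of(0, 0, 0, 500_000_000); // 00:00:00.50
        LocalTime recordedLease = firstRun.plus(leaseLatency);    // 00:00:01.20
        LocalTime secondRun = firstRun.plus(interval);            // 00:01:00.50

        // Milliseconds between the recorded lease time and the second run.
        System.out.println(Duration.between(recordedLease, secondRun).toMillis());
        // Checked against the time the db recorded: the lease is refused.
        System.out.println(leaseGranted(recordedLease, secondRun, interval));
        // Checked against the time the task actually started: the lease is granted.
        System.out.println(leaseGranted(firstRun, secondRun, interval));
    }
}
```

The gap seen by the db is 59300 ms, which is short of the 60 second interval, so the second run is refused even though the app scheduled it exactly one interval after the first.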

I have seen this exact problem occur with jobs meant to run once daily, where the job runs just after midnight, obtains a lease at 00:00:01.xxx and runs, and then the following day the task fails to run because the app thinks that the interval time for the task has not yet elapsed.

Sorting this out will require better alignment of the clocks and timestamps stored in this process and this is the best option I can come up with right now ...

When a task successfully obtains a lease and runs, it must keep track of the exact time the task was first initiated.  Then, when the task completes and releases its lease, it stores that time in the db as the last time the lease was acquired.  This would basically be a fairly simple attempt at adjusting the lease time stored in the db so that it does not include the additional time required to obtain the lease.  So an example would be that if a task is set to run hourly starting at 05:00:00 and it obtains its lease at 05:00:01.20, then when the task completes we would subtract the 1.20 seconds from the time stored in the db so that the db properly reflects the time the task was run, not the time the lease was obtained.
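
A rough sketch of this idea in Java follows.  Everything in it is hypothetical: the class, the in-memory map standing in for the db lease table, and the acquire/release/runOnce methods are illustrations of the proposal, not the fix that was actually committed.  It also assumes the app and db clocks agree, so that lease latency is the only source of offset.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

class StartTimeLeaseSketch {
    // Stand-in for the db table holding the last recorded lease time per task.
    static final Map<String, Instant> lastLease = new HashMap<>();

    // Grant a lease only if a full interval has passed since the recorded time.
    static boolean acquire(String task, Instant dbNow, Duration interval) {
        Instant last = lastLease.get(task);
        if (last != null && Duration.between(last, dbNow).compareTo(interval) < 0) {
            return false;
        }
        lastLease.put(task, dbNow); // db records its own time, latency included
        return true;
    }

    // On release, overwrite the recorded time with the time the task was initiated.
    static void release(String task, Instant startedAt) {
        lastLease.put(task, startedAt);
    }

    // One scheduled run: the lease request reaches the db leaseLatency after start.
    static boolean runOnce(String task, Instant startedAt, Duration leaseLatency, Duration interval) {
        if (!acquire(task, startedAt.plus(leaseLatency), interval)) {
            return false;
        }
        // ... the task body would run here ...
        release(task, startedAt);
        return true;
    }

    public static void main(String[] args) {
        Duration hour = Duration.ofHours(1);
        Duration latency = Duration.ofMillis(1200);
        Instant five = Instant.parse("2007-06-12T05:00:00Z");

        System.out.println(runOnce("hourly", five, latency, hour));            // first run
        System.out.println(lastLease.get("hourly").equals(five));              // db holds the start time
        System.out.println(runOnce("hourly", five.plus(hour), latency, hour)); // next run
    }
}
```

Because the stored time is the start time rather than the time the lease was obtained, the next scheduled run always sees at least one full interval elapsed, regardless of how long the lease request took.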

I am sure there are other ways to better synchronize the multiple clocks involved when doing clustered task scheduling, but at the end of the day it's apparent that part of the solution is going to have to involve properly accounting for the extra time that gets used up to obtain a lease so that scheduling doesn't drift.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://opensource.atlassian.com/projects/roller/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[Roller-JIRA] Closed: (ROL-1446) Task leasing causes scheduling inconsistencies

Posted by "linda skrocki (JIRA)" <no...@atlassian.com>.

     [ http://opensource.atlassian.com/projects/roller/browse/ROL-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

linda skrocki closed ROL-1446.
------------------------------

       Resolution: Fixed
    Fix Version/s: 4.0

Allen fixed in 4.0.

