Posted to dev@sling.apache.org by "Stefan Egli (JIRA)" <ji...@apache.org> on 2016/02/29 10:32:18 UTC
[jira] [Comment Edited] (SLING-5560) Delay job processing at startup to avoid unnecessary stale job handling

    [ https://issues.apache.org/jira/browse/SLING-5560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15171586#comment-15171586 ] 

Stefan Egli edited comment on SLING-5560 at 2/29/16 9:31 AM:
-------------------------------------------------------------

[~chetanm], [~cziegeler], the job manager could adapt the delay by comparing the cluster state before shutdown with the state at restart:
* assume we split the job manager's life cycle into two phases:
** an unstable phase, which covers topology_changing and the reassignment of jobs after a topology_changed
** a stable phase, which starts once reassignment is done and covers normal operation
* upon entering a stable phase (or as the last step of reassignment), the job manager could persist the local cluster view (i.e. all slingIds of the local cluster)
* upon restart (i.e. topology_init), the job manager could compare the new view with the persisted 'last stable' one
** if they match (the normal case, e.g. for TarMK), the default delay could be very low, possibly even 0 sec
** if they don't match, the default delay could be 1, 2 or perhaps 5 min
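
To make the comparison step concrete, here is a rough sketch of how the delay could be derived from the two views. It is only an illustration: the class StartupDelayPolicy, the method startupDelay and the mismatchDelay parameter are made-up names and not part of the existing Sling API, and the sketch assumes that a missing persisted view (e.g. on first startup) should be treated like a mismatch.

```java
import java.time.Duration;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of the Sling API. */
final class StartupDelayPolicy {

    private StartupDelayPolicy() {
    }

    /**
     * Picks the startup delay for job processing.
     *
     * @param lastStableView slingIds persisted when the last stable phase was
     *                       entered, or null if nothing was persisted (assumption)
     * @param currentView    slingIds of the local cluster seen at topology_init
     * @param mismatchDelay  delay to use when the views differ (assumed configurable)
     */
    static Duration startupDelay(Set<String> lastStableView,
                                 Set<String> currentView,
                                 Duration mismatchDelay) {
        // Same set of slingIds as before shutdown: nothing to wait for.
        if (lastStableView != null && lastStableView.equals(currentView)) {
            return Duration.ZERO;
        }
        // Views differ or no view was persisted: give the topology time to settle.
        return mismatchDelay;
    }
}
```

With a persisted view of two slingIds and a restart that sees the same two, this returns a zero delay; if only one of them is visible at topology_init, it returns whatever mismatchDelay was passed in (e.g. Duration.ofMinutes(2)).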



> Delay job processing at startup to avoid unnecessary stale job handling
> -----------------------------------------------------------------------
>
>                 Key: SLING-5560
>                 URL: https://issues.apache.org/jira/browse/SLING-5560
>             Project: Sling
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Chetan Mehrotra
>             Fix For: Event 4.1.0
>
>
> When running in a cluster (or in some cases also in a non-cluster setup), the topology only becomes stable after "some" time. For example, in a 2-node setup the second node might not have started by the time the first node comes up, so the topology would not detect it. The first node might then conclude that the second node is gone and start reassigning that node's jobs to itself as part of stale job handling.
> Instead of starting right at startup, job processing should begin after "some" delay so that the topology can become stable first. This would avoid the unnecessary work and probably even reduce load on the master.


