Posted to dev@sling.apache.org by "Stefan Egli (JIRA)" <ji...@apache.org> on 2016/03/16 16:47:33 UTC
[jira] [Commented] (SLING-5560) Delay job processing at startup to avoid unnecessary stale job handling

    [ https://issues.apache.org/jira/browse/SLING-5560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15197535#comment-15197535 ] 

Stefan Egli commented on SLING-5560:
------------------------------------

[~chetanm] re
bq. 30 sec default (as was the case earlier)
I can't find anywhere in the code that we 'earlier' had a default delay (be that 30sec, 10sec or otherwise). I've checked the {{JobManagerConfiguration}} back to sling.event 3.4.0 and it always does a {{CheckTopologyTask.fullRun}} immediately on receiving the {{TOPOLOGY_INIT}}.

Suggestion on how to implement this:
* simply, by a single config:
** have a new parameter 'startupDelay' of say 30 or 60sec by default (I think we shouldn't re/ab-use the existing 'backgroundLoadDelay' as a) that is orthogonal and b) it additionally applies and c) it also applies for {{TOPOLOGY_CHANGED}}, not only for startup)
** any topology event that is received *before* the startupDelay has passed is 'queued'
** once the startupDelay has passed, those queued topology events are processed. This might not be as simple as just calling the existing {{handleTopologyEvent}} method in a loop, as the queue likely contains events with {{!current}} views. So perhaps the logic must be slightly modified there, not sure.
* by automatism and a config (as already suggested above, here with more details):
** upon an actual topology change, store a 'copy' of the clusterView (ie all slingIds) under {{/var/eventing/clusterInstances/<mySlingId>}}
** upon {{TOPOLOGY_INIT}} compare the then-current clusterView with what's stored under {{/var/eventing/clusterInstances/<mySlingId>}}
*** if it matches go ahead
*** if it doesn't match, wait at most 'maxStartupDelay' and then go ahead anyway, even though that then means doing a reassignment (bite the bullet)
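
To make the two options easier to compare, here is a minimal sketch of both in plain Java. Everything in it is hypothetical: {{StartupDelaySketch}}, {{onEvent}}, {{onDelayElapsed}} and {{effectiveDelayMillis}} are illustrative names, not existing sling.event or discovery API, and time is passed in explicitly instead of being read from a clock. The simple approach queues events until the delay has passed; the automatism approach only decides how long to wait. The replay here just forwards the queued events in order, so it does not address the {{!current}} views concern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical sketch only; none of these names exist in sling.event. */
final class StartupDelaySketch<E> {
    private final long startupDelayMillis;
    private final long startedAtMillis;
    private final Consumer<E> handler;          // stands in for handleTopologyEvent
    private final List<E> queued = new ArrayList<>();
    private boolean released;

    StartupDelaySketch(long startupDelayMillis, long startedAtMillis, Consumer<E> handler) {
        this.startupDelayMillis = startupDelayMillis;
        this.startedAtMillis = startedAtMillis;
        this.handler = handler;
    }

    /** Simple approach: queue events received before the delay has passed. */
    synchronized void onEvent(E event, long nowMillis) {
        if (!released && nowMillis - startedAtMillis < startupDelayMillis) {
            queued.add(event);
            return;
        }
        release();                 // flush anything still queued first, keeping order
        handler.accept(event);
    }

    /** To be called (e.g. by a scheduled task) once the startup delay has elapsed. */
    synchronized void onDelayElapsed() {
        release();
    }

    private void release() {
        if (released) {
            return;
        }
        released = true;
        for (E event : queued) {
            handler.accept(event); // naive replay; real logic may need to skip stale views
        }
        queued.clear();
    }

    /**
     * Automatism approach: no wait if the cluster view stored at the last
     * topology change matches the current one, otherwise wait up to the maximum.
     */
    static long effectiveDelayMillis(Set<String> storedSlingIds,
                                     Set<String> currentSlingIds,
                                     long maxStartupDelayMillis) {
        return currentSlingIds.equals(storedSlingIds) ? 0L : maxStartupDelayMillis;
    }
}
```

In the automatism variant the wait would presumably also end early as soon as a later topology event yields a view that matches the stored slingIds; the sketch leaves that part out.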

[~cziegeler], wdyt? should we go for the simple or the automatism approach?

> Delay job processing at startup to avoid unnecessary stale job handling
> -----------------------------------------------------------------------
>
>                 Key: SLING-5560
>                 URL: https://issues.apache.org/jira/browse/SLING-5560
>             Project: Sling
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Chetan Mehrotra
>            Assignee: Stefan Egli
>             Fix For: Event 4.1.0
>
>
> While running in a cluster (or in some cases in a non-cluster setup too), the topology only becomes stable after "some" time. E.g. in a 2-node setup, by the time the first node comes up the second node might not have started, so the topology would not detect it; the first node might then think that the second node is not there and start assigning that node's jobs to itself under stale job processing.
> Instead of doing this right at startup, job processing should start after "some" delay so that the topology has become stable. This would avoid this unnecessary work and probably even reduce load on the master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)