You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@cloudstack.apache.org by "Koushik Das (JIRA)" <ji...@apache.org> on 2015/04/29 06:38:06 UTC

[jira] [Commented] (CLOUDSTACK-8425) Job framework: Same internal job can execute simultaneously

    [ https://issues.apache.org/jira/browse/CLOUDSTACK-8425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518717#comment-14518717 ] 

Koushik Das commented on CLOUDSTACK-8425:
-----------------------------------------

Internal jobs get scheduled from 2 different places:
1. In AsyncJobManagerImpl.java there is a generic scheduler which schedules jobs in batches

            protected void reallyRun() {
                try {
                    List<SyncQueueItemVO> l = _queueMgr.dequeueFromAny(getMsid(), MAX_ONETIME_SCHEDULE_SIZE);
                    if (l != null && l.size() > 0) {
                        for (SyncQueueItemVO item : l) {
                            if (s_logger.isDebugEnabled()) {
                                s_logger.debug("Execute sync-queue item: " + item.toString());
                            }
                            executeQueueItem(item, false);
                        }
                    }

2. There is another place in AsyncJobManagerImpl.java, which schedules the next job in a specific VM queue. Look at the checkQueue() in the snippet below.

                } finally {
                    // guard final clause as well
                    try {
                        if (job.getSyncSource() != null) {
                            // here check queue item one more time to double make sure that queue item is removed in case of any uncaught exception
                            _queueMgr.purgeItem(job.getSyncSource().getId());
                            checkQueue(job.getSyncSource().getQueueId());
                        }

                        try {
                            JmxUtil.unregisterMBean("AsyncJobManager", "Active Job " + job.getId());
                        } catch (Exception e) {
                            // Due to co-existence of normal-dispatched-job/wakeup-dispatched-job, MBean unregister() call
                            // is expected to fail under situations
                            if (s_logger.isTraceEnabled())
                                s_logger.trace("Unable to unregister job " + job.getId() + " to JMX monitoring due to exception " + ExceptionUtil.toString(e));
                        }


Since there is no synchronisation between these 2 places, in certain scenarios both may end up in scheduling the same job.

> Job framework: Same internal job can execute simultaneously
> -----------------------------------------------------------
>
>                 Key: CLOUDSTACK-8425
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-8425
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: Management Server
>    Affects Versions: 4.5.0
>            Reporter: Koushik Das
>            Assignee: Koushik Das
>            Priority: Critical
>             Fix For: 4.6.0
>
>
> Same internal work job can be executed simultaneously by 2 worker threads. The below logs show that the same work job-24152 is executed by threads Work-Job-Executor-84 and Work-Job-Executor-85. This can lead to undesired behaviour subsequently.
> 2015-03-17 14:43:34,289 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-12:ctx-4fa08ed6 job-24151 ctx-f7582544) (logid:26c78006) Sync job-24152 execution on object VmWorkJobQueue.1396
> 2015-03-17 14:43:35,420 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (AsyncJobMgr-Heartbeat-1:ctx-4dc28af0) (logid:b0db94c2) Schedule queued job-24152
> 2015-03-17 14:43:35,422 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-83:ctx-62660b07 job-24149/job-24150) (logid:97d686aa) Schedule queued job-24152
> 2015-03-17 14:43:35,426 INFO [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-84:ctx-9568eb6e job-24151/job-24152) (logid:d28187a4) Add job-24152 into job monitoring
> 2015-03-17 14:43:35,428 INFO [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-85:ctx-5a06c91e job-24151/job-24152) (logid:e55e3f25) Add job-24152 into job monitoring
> 2015-03-17 14:43:35,432 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-84:ctx-9568eb6e job-24151/job-24152) (logid:26c78006) Executing AsyncJobVO
> {id:24152, userId: 4, accountId: 4, instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkDetachVolume, cmdInfo: rO0ABXNyAB9jb20uY2xvdWQudm0uVm1Xb3JrRGV0YWNoVm9sdW1lhvReA-s81AMCAAFMAAh2b2x1bWVJZHQAEExqYXZhL2xhbmcvTG9uZzt4cgATY29tLmNsb3VkLnZtLlZtV29ya5-ZtlbwJWdrAgAESgAJYWNjb3VudElkSgAGdXNlcklkSgAEdm1JZEwAC2hhbmRsZXJOYW1ldAASTGphdmEvbGFuZy9TdHJpbmc7eHAAAAAAAAAABAAAAAAAAAAEAAAAAAAABXR0ABRWb2x1bWVBcGlTZXJ2aWNlSW1wbHNyAA5qYXZhLmxhbmcuTG9uZzuL5JDMjyPfAgABSgAFdmFsdWV4cgAQamF2YS5sYW5nLk51bWJlcoaslR0LlOCLAgAAeHAAAAAAAAALWw, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 104230524961057, completeMsid: null, lastUpdated: null, lastPolled: null, created: Tue Mar 17 14:43:34 EDT 2015}
> 2015-03-17 14:43:35,432 DEBUG [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-84:ctx-9568eb6e job-24151/job-24152) (logid:26c78006) Run VM work job: com.cloud.vm.VmWorkDetachVolume for VM 1396, job origin: 24151
> 2015-03-17 14:43:35,433 DEBUG [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-84:ctx-9568eb6e job-24151/job-24152 ctx-0f9cd10a) (logid:26c78006) Execute VM work job: com.cloud.vm.VmWorkDetachVolume
> {"volumeId":2907,"userId":4,"accountId":4,"vmId":1396,"handlerName":"VolumeApiServiceImpl"}
> 2015-03-17 14:43:35,435 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-85:ctx-5a06c91e job-24151/job-24152) (logid:26c78006) Executing AsyncJobVO
> {id:24152, userId: 4, accountId: 4, instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkDetachVolume, cmdInfo: rO0ABXNyAB9jb20uY2xvdWQudm0uVm1Xb3JrRGV0YWNoVm9sdW1lhvReA-s81AMCAAFMAAh2b2x1bWVJZHQAEExqYXZhL2xhbmcvTG9uZzt4cgATY29tLmNsb3VkLnZtLlZtV29ya5-ZtlbwJWdrAgAESgAJYWNjb3VudElkSgAGdXNlcklkSgAEdm1JZEwAC2hhbmRsZXJOYW1ldAASTGphdmEvbGFuZy9TdHJpbmc7eHAAAAAAAAAABAAAAAAAAAAEAAAAAAAABXR0ABRWb2x1bWVBcGlTZXJ2aWNlSW1wbHNyAA5qYXZhLmxhbmcuTG9uZzuL5JDMjyPfAgABSgAFdmFsdWV4cgAQamF2YS5sYW5nLk51bWJlcoaslR0LlOCLAgAAeHAAAAAAAAALWw, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 104230524961057, completeMsid: null, lastUpdated: null, lastPolled: null, created: Tue Mar 17 14:43:34 EDT 2015}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)