Posted to dev@uima.apache.org by "Lou DeGenaro (JIRA)" <de...@uima.apache.org> on 2018/10/05 11:38:00 UTC

[jira] [Resolved] (UIMA-5883) DUCC JobDriver (JD) may cause job to never process all work items if JobProcess (JP) is preempted

     [ https://issues.apache.org/jira/browse/UIMA-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lou DeGenaro resolved UIMA-5883.
--------------------------------
    Resolution: Fixed

Change set delivered.

> DUCC JobDriver (JD) may cause job to never process all work items if JobProcess (JP) is preempted
> -------------------------------------------------------------------------------------------------
>
>                 Key: UIMA-5883
>                 URL: https://issues.apache.org/jira/browse/UIMA-5883
>             Project: UIMA
>          Issue Type: Bug
>          Components: DUCC
>            Reporter: Lou DeGenaro
>            Assignee: Lou DeGenaro
>            Priority: Major
>             Fix For: 2.2.3-Ducc
>
>
> On the Apache DUCC demo, Job 14493 reported work items total=10001, completed=9986, dispatch=15, and made no further progress.  work-item-state.json lists the 9986 work items that completed, so the ones that did not can be inferred precisely.  Searching the JD log for those incomplete work items shows entries similar to:
> 25 Sep 2018 23:28:28,042 WARN ActionGet - T[14] engage seqNo=? remote=uima-ducc-demo-6.8448.25 node=uima-ducc-demo-6 pid=8448 text=process discontinued
> Looking at the code, we see that in this condition the JD has obtained a CAS from the CR but chooses not to give it to the requesting JP, because the JD knows that the requester has been targeted for termination (e.g. preempted).  However, the JD never puts the CAS back into the queue, so those CASes are never processed and the Job hangs indefinitely.
>  
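
The failure mode described above can be illustrated with a minimal sketch. This is not the DUCC source and not the delivered change set: the class WorkItemDispatcher, its methods, and the string identifiers are all hypothetical stand-ins, assuming only what the description states (an item is taken from the queue, the requester turns out to be discontinued, and the item must go back into the queue rather than be dropped).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for the JD dispatch step; not the actual DUCC code. */
class WorkItemDispatcher {
    // Work items obtained from the reader and awaiting dispatch.
    private final Deque<String> pending = new ArrayDeque<>();
    // Processes targeted for termination (e.g. preempted).
    private final Set<String> discontinued = new HashSet<>();

    void addWorkItem(String workItem) { pending.addLast(workItem); }

    void markDiscontinued(String processId) { discontinued.add(processId); }

    int pendingCount() { return pending.size(); }

    /** Returns a work item for the requester, or null if none is handed out. */
    String get(String processId) {
        String workItem = pending.pollFirst();
        if (workItem == null) {
            return null; // nothing left to dispatch
        }
        if (discontinued.contains(processId)) {
            // The step the report describes as missing: put the item back
            // so that another process can pick it up later.
            pending.addFirst(workItem);
            return null;
        }
        return workItem;
    }
}
```

Without the addFirst call, every request from a discontinued process would silently consume one work item, which matches the symptom of a job stuck with a fixed number of items that never complete.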



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)