Posted to dev@oozie.apache.org by "Mona Chitnis (JIRA)" <ji...@apache.org> on 2014/08/28 23:55:09 UTC

[jira] [Updated] (OOZIE-1984) SLACalculator in HA mode performs duplicate operations on records with completed jobs

     [ https://issues.apache.org/jira/browse/OOZIE-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mona Chitnis updated OOZIE-1984:
--------------------------------

    Attachment: OOZIE-1984.patch

> SLACalculator in HA mode performs duplicate operations on records with completed jobs
> -------------------------------------------------------------------------------------
>
>                 Key: OOZIE-1984
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1984
>             Project: Oozie
>          Issue Type: Bug
>    Affects Versions: trunk
>            Reporter: Mona Chitnis
>             Fix For: trunk, 4.1.0
>
>         Attachments: OOZIE-1984.patch
>
>
> Scenario:
> The periodic SLA run has already processed the start, duration, and end events for a job's SLA entry. The job notification for that job arrives only after this and triggers the SLA listener.
> Buggy part:
> {code}
> // SLACalculatorMemory.java
> else if (Services.get().get(JobsConcurrencyService.class).isHighlyAvailableMode()) {
>                 // jobid might not exist in slaMap in HA Setting
>                 SLARegistrationBean slaRegBean = SLARegistrationQueryExecutor.getInstance().get(
>                         SLARegQuery.GET_SLA_REG_ALL, jobId);
>                 if (slaRegBean != null) { // filter out jobs picked by SLA job event listener
>                                           // but not actually configured for SLA
>                     SLASummaryBean slaSummaryBean = SLASummaryQueryExecutor.getInstance().get(
>                             SLASummaryQuery.GET_SLA_SUMMARY, jobId);
>                     slaCalc = new SLACalcStatus(slaSummaryBean, slaRegBean);
>                     if (slaCalc.getEventProcessed() < 7) {
>                         slaMap.put(jobId, slaCalc);
>                     }
>                 }
>             }
>         }
>         if (slaCalc != null) {
> ..
> Object eventProcObj = ((SLASummaryQueryExecutor) SLASummaryQueryExecutor.getInstance())
>                                 .getSingleValue(SLASummaryQuery.GET_SLA_SUMMARY_EVENTPROCESSED, jobId);
>                         byte eventProc = ((Byte) eventProcObj).byteValue();
> ..
> processJobEndSuccessSLA(slaCalc, startTime, endTime);
> {code}
> The method processJobEndSuccessSLA goes ahead, checks the second LSB of eventProc, and sends the duration event _again_. So the bug here is two-fold:
>  * the function is still invoked even when all events have already been processed
>  * eventProc is 8 (binary 1000), so the second LSB is unset and the duration event is processed again.
> Fix: do not invoke the function when eventProc = 8 (binary 1000).
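
The bit arithmetic behind the two points above can be sketched in isolation. This is only an illustration, not the code from the attached patch: the class, constant, and method names are hypothetical, and the encoding (second LSB marks the duration event as processed, 8 marks a record whose events are all processed) is assumed from the description above.

```java
public class EventProcSketch {
    // Assumed encoding, inferred from the issue description (not taken from Oozie source):
    // second LSB set -> duration event already processed
    static final int DURATION_BIT = 0b0010;
    // 8 (binary 1000) -> all events for this record already processed
    static final int ALL_PROCESSED = 0b1000;

    // The check described as buggy: looks only at the second LSB,
    // so a fully processed record (1000) appears to still need its duration event.
    static boolean durationLooksPending(byte eventProc) {
        return (eventProc & DURATION_BIT) == 0;
    }

    // Hypothetical guard matching the stated fix: skip the job-end
    // processing entirely when eventProc is 8.
    static boolean shouldProcessJobEnd(byte eventProc) {
        return eventProc != ALL_PROCESSED;
    }

    public static void main(String[] args) {
        byte eventProc = 8; // record already fully processed by the periodic run
        System.out.println("duration looks pending: " + durationLooksPending(eventProc));
        System.out.println("should process job end: " + shouldProcessJobEnd(eventProc));
    }
}
```

With eventProc = 8 the bit test alone reports the duration event as pending, which is how the duplicate is produced; the guard is what prevents the call in the first place.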


