Posted to dev@oozie.apache.org by "Mona Chitnis (JIRA)" <ji...@apache.org> on 2014/08/28 23:43:08 UTC

[jira] [Created] (OOZIE-1984) SLACalculator in HA mode performs duplicate operations on records with completed jobs

Mona Chitnis created OOZIE-1984:
-----------------------------------

             Summary: SLACalculator in HA mode performs duplicate operations on records with completed jobs
                 Key: OOZIE-1984
                 URL: https://issues.apache.org/jira/browse/OOZIE-1984
             Project: Oozie
          Issue Type: Bug
    Affects Versions: trunk
            Reporter: Mona Chitnis
             Fix For: trunk, 4.1.0


Scenario:

The periodic SLA run has already processed the start, duration and end events for a job's SLA entry. The job notification for that job then arrives afterwards and triggers the SLA listener.

Buggy part:
{code}
SLACalculatorMemory.java

else if (Services.get().get(JobsConcurrencyService.class).isHighlyAvailableMode()) {
                // jobid might not exist in slaMap in HA Setting
                SLARegistrationBean slaRegBean = SLARegistrationQueryExecutor.getInstance().get(
                        SLARegQuery.GET_SLA_REG_ALL, jobId);
                if (slaRegBean != null) { // filter out jobs picked by SLA job event listener
                                          // but not actually configured for SLA
                    SLASummaryBean slaSummaryBean = SLASummaryQueryExecutor.getInstance().get(
                            SLASummaryQuery.GET_SLA_SUMMARY, jobId);
                    slaCalc = new SLACalcStatus(slaSummaryBean, slaRegBean);
                    if (slaCalc.getEventProcessed() < 7) {
                        slaMap.put(jobId, slaCalc);
                    }
                }
            }
        }
        if (slaCalc != null) {
..
Object eventProcObj = ((SLASummaryQueryExecutor) SLASummaryQueryExecutor.getInstance())
                                .getSingleValue(SLASummaryQuery.GET_SLA_SUMMARY_EVENTPROCESSED, jobId);
                        byte eventProc = ((Byte) eventProcObj).byteValue();
..
processJobEndSuccessSLA(slaCalc, startTime, endTime);
{code}

The method processJobEndSuccessSLA then checks the second least-significant bit of eventProc and sends the duration event _again_. So the bug here is two-fold:
 * the function is still invoked even when all events have already been processed
 * the event-processed value is 8 (binary 1000), so the second LSB is unset and the duration event is processed again.

Fix: do not invoke the function when eventProc = 8 (binary 1000).
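
To illustrate the bit arithmetic behind the report, here is a minimal standalone sketch. It is not Oozie code: the class name, constants and method names are hypothetical, and it only assumes what is stated above, namely that the second LSB marks the duration event and that a fully processed record carries the value 8.

```java
// Hypothetical sketch; names are illustrative and are not Oozie's actual API.
final class EventProcSketch {
    // Assumption from the report: the second least-significant bit marks "duration processed".
    static final int DURATION_BIT = 0b0010;
    // Value the report gives for a record whose events are all processed: 8 (binary 1000).
    static final int ALL_PROCESSED = 0b1000;

    // Mirrors the check described in the report: duration is sent when the second LSB is unset.
    static boolean durationBitUnset(byte eventProc) {
        return (eventProc & DURATION_BIT) == 0;
    }

    // Proposed guard: skip job-end SLA processing once eventProc equals 8.
    static boolean shouldProcessJobEnd(byte eventProc) {
        return eventProc != ALL_PROCESSED;
    }

    public static void main(String[] args) {
        byte eventProc = 8;
        System.out.println("binary: " + Integer.toBinaryString(eventProc));
        System.out.println("duration bit unset: " + durationBitUnset(eventProc));
        System.out.println("should process job end: " + shouldProcessJobEnd(eventProc));
    }
}
```

With eventProc = 8 the duration bit reads as unset, which is why the duration event would be sent a second time unless the call is guarded as in shouldProcessJobEnd.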



--
This message was sent by Atlassian JIRA
(v6.2#6252)