Posted to dev@oozie.apache.org by "Mona Chitnis (JIRA)" <ji...@apache.org> on 2014/08/28 23:55:09 UTC
[jira] [Updated] (OOZIE-1984) SLACalculator in HA mode performs
duplicate operations on records with completed jobs
[ https://issues.apache.org/jira/browse/OOZIE-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mona Chitnis updated OOZIE-1984:
--------------------------------
Attachment: OOZIE-1984.patch
> SLACalculator in HA mode performs duplicate operations on records with completed jobs
> -------------------------------------------------------------------------------------
>
> Key: OOZIE-1984
> URL: https://issues.apache.org/jira/browse/OOZIE-1984
> Project: Oozie
> Issue Type: Bug
> Affects Versions: trunk
> Reporter: Mona Chitnis
> Fix For: trunk, 4.1.0
>
> Attachments: OOZIE-1984.patch
>
>
> Scenario:
> The SLA periodic run has already processed the start, duration, and end events for a job's SLA entry. The job notification for that job then arrives afterwards and triggers the SLA listener.
> Buggy part:
> {code}
> SLACalculatorMemory.java
>     else if (Services.get().get(JobsConcurrencyService.class).isHighlyAvailableMode()) {
>         // jobid might not exist in slaMap in HA Setting
>         SLARegistrationBean slaRegBean = SLARegistrationQueryExecutor.getInstance().get(
>                 SLARegQuery.GET_SLA_REG_ALL, jobId);
>         if (slaRegBean != null) { // filter out jobs picked by SLA job event listener
>                                   // but not actually configured for SLA
>             SLASummaryBean slaSummaryBean = SLASummaryQueryExecutor.getInstance().get(
>                     SLASummaryQuery.GET_SLA_SUMMARY, jobId);
>             slaCalc = new SLACalcStatus(slaSummaryBean, slaRegBean);
>             if (slaCalc.getEventProcessed() < 7) {
>                 slaMap.put(jobId, slaCalc);
>             }
>         }
>     }
> }
> if (slaCalc != null) {
>     ..
>     Object eventProcObj = ((SLASummaryQueryExecutor) SLASummaryQueryExecutor.getInstance())
>             .getSingleValue(SLASummaryQuery.GET_SLA_SUMMARY_EVENTPROCESSED, jobId);
>     byte eventProc = ((Byte) eventProcObj).byteValue();
>     ..
>     processJobEndSuccessSLA(slaCalc, startTime, endTime);
> {code}
> The method processJobEndSuccessSLA goes ahead, checks the second-least-significant bit of eventProc, and sends the duration event _again_. So the bug here is two-fold:
> * the function is invoked even if all events have already been processed
> * eventProc is 8 (binary 1000), so the second-least-significant bit is unset and the duration event is processed again.
> Fix: do not invoke the function when eventProc = 8 (binary 1000).
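
The bit arithmetic behind the description can be illustrated with a small standalone sketch. This is not the attached patch and not Oozie code: the class name SlaEventFlags, the constants, and both helper methods are hypothetical, and the only facts taken from the description are that the duration flag is the second-least-significant bit and that 8 (binary 1000) means all events have been processed.

```java
// Hypothetical illustration only; none of these names exist in Oozie.
final class SlaEventFlags {
    // Assumption from the description: the duration flag is the second-least-significant bit.
    static final int DURATION_BIT = 0b0010;
    // Assumption from the description: 8 (binary 1000) marks a record whose events are all processed.
    static final int ALL_PROCESSED = 0b1000;

    // Mirrors the behaviour described as buggy: only the duration bit is inspected.
    static boolean durationPendingUnguarded(byte eventProc) {
        return (eventProc & DURATION_BIT) == 0;
    }

    // Sketch of the proposed guard: a fully processed record is never treated as pending.
    static boolean durationPendingGuarded(byte eventProc) {
        return eventProc != ALL_PROCESSED && (eventProc & DURATION_BIT) == 0;
    }

    public static void main(String[] args) {
        byte eventProc = 8; // all events already processed
        System.out.println("unguarded: " + durationPendingUnguarded(eventProc));
        System.out.println("guarded:   " + durationPendingGuarded(eventProc));
    }
}
```

With eventProc = 8 the unguarded check reports the duration event as still pending, because the second-least-significant bit of binary 1000 is 0, while the guarded check does not.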
--
This message was sent by Atlassian JIRA
(v6.2#6252)