Posted to issues@tez.apache.org by "Bikas Saha (JIRA)" <ji...@apache.org> on 2015/04/14 00:21:12 UTC

[jira] [Created] (TEZ-2313) Regression in handling obsolete events in ShuffleScheduler

Bikas Saha created TEZ-2313:
-------------------------------

             Summary: Regression in handling obsolete events in ShuffleScheduler
                 Key: TEZ-2313
                 URL: https://issues.apache.org/jira/browse/TEZ-2313
             Project: Apache Tez
          Issue Type: Bug
            Reporter: Bikas Saha
            Priority: Critical


/cc [~rohini]

When an obsolete-input event is received, the shuffle scheduler fails fast even when pipelining is disabled. IIRC, obsolete inputs were supposed to fail the shuffled inputs if we were reading and merging partially spilled outputs. In this case pipelining is not on, so I am not sure why we are failing fast.

{noformat}
Caused by: java.io.IOException: InputAttemptIdentifier [inputIdentifier=InputIdentifier [inputIndex=4485], attemptNumber=1, pathComponent=null, fetchTypeInfo=FINAL_MERGE_ENABLED, spillEventId=-1] is marked as obsoleteInput, but it exists in shuffleInfoEventMap. Some data could have been already merged to memory/disk outputs. Failing the fetch early.
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleScheduler.obsoleteInput(ShuffleScheduler.java:546)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleInputEventHandlerOrderedGrouped.processTaskFailedEvent(ShuffleInputEventHandlerOrderedGrouped.java:122)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleInputEventHandlerOrderedGrouped.handleEvent(ShuffleInputEventHandlerOrderedGrouped.java:73)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleInputEventHandlerOrderedGrouped.handleEvents(ShuffleInputEventHandlerOrderedGrouped.java:63)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle.handleEvents(Shuffle.java:246)
at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.handleEvents(OrderedGroupedKVInput.java:265)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.handleEvent(LogicalIOProcessorRuntimeTask.java:620)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.access$1100(LogicalIOProcessorRuntimeTask.java:93)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$1.runInternal(LogicalIOProcessorRuntimeTask.java:683)
at org.apache.tez.common.RunnableWithNdc.run(RunnableWithNdc.java:35)
{noformat}
/cc [~rajesh.balamohan]
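
To make the expected behavior concrete, here is a minimal sketch of the check described above. It is not the actual ShuffleScheduler code: the class name, the Action enum, the pipelinedShuffleEnabled flag, and both methods are hypothetical names invented for illustration. The assumption is that failing fast is only warranted when pipelining is on and part of the obsolete input may already have been merged.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the expected policy; not the real ShuffleScheduler. */
final class ObsoleteInputSketch {
    /** Illustrative outcomes for an obsolete-input event. */
    enum Action { MARK_OBSOLETE, FAIL_FAST }

    // Assumed flag standing in for "pipelining is enabled".
    private final boolean pipelinedShuffleEnabled;
    // Inputs for which at least one event/spill has already been seen.
    private final Set<Integer> seenInputs = new HashSet<>();
    // Inputs marked obsolete, whose fetched data should be ignored.
    private final Set<Integer> obsoleteInputs = new HashSet<>();

    ObsoleteInputSketch(boolean pipelinedShuffleEnabled) {
        this.pipelinedShuffleEnabled = pipelinedShuffleEnabled;
    }

    /** Records that some data for this input index has been seen. */
    void recordInputSeen(int inputIndex) {
        seenInputs.add(inputIndex);
    }

    /** Decides what to do when an input is reported obsolete. */
    Action onObsoleteInput(int inputIndex) {
        // Only with pipelining could partial spills already be merged,
        // so only then is failing early justified (assumption).
        if (pipelinedShuffleEnabled && seenInputs.contains(inputIndex)) {
            return Action.FAIL_FAST;
        }
        // Otherwise just remember the input as obsolete.
        obsoleteInputs.add(inputIndex);
        return Action.MARK_OBSOLETE;
    }

    boolean isObsolete(int inputIndex) {
        return obsoleteInputs.contains(inputIndex);
    }
}
```

With the flag off, a call such as onObsoleteInput(4485) would return MARK_OBSOLETE even if that input had been seen before, which is the behavior this issue expects instead of the IOException above.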


