You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tez.apache.org by "Rajesh Balamohan (JIRA)" <ji...@apache.org> on 2015/05/05 15:09:00 UTC

[jira] [Comment Edited] (TEZ-2414) LogicalIOProcessorRuntimeTask, RuntimeTask, TezTaskRunner should handle interrupts & carry out necessary cleanups

    [ https://issues.apache.org/jira/browse/TEZ-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528394#comment-14528394 ] 

Rajesh Balamohan edited comment on TEZ-2414 at 5/5/15 1:08 PM:
---------------------------------------------------------------

Attaching patch for TEZ-2003 branch. Related patch for master is TEZ-1752. 

Attaching the review comments.

>>>>
Spoke to Rajesh offline about committing the patch into TEZ-2003 instead of master for now, and hardening it there.
Looking at the patch, especially TaskRunner and LogicalIORuntimeTask changes, I think it can be split into two - changes to LogicalIOProcessorRuntimeTask, RuntimeTask, TaskReporter, TezTaskRunner go into the branch, and the rest into the master. Rajesh, thoughts on splitting it ?
Comments (TaskReporter, LogicalIORuntimeTask etc changes) - mostly minor
processorClosed=true, initializedInputs.remove, initializedOutputs.remove() - should be fore the actual invocation of close. Otherwise if there's an error we may try to invoke close twice.
 LOG.info("Cleanup is complete. EventRouterThread is yet to be interrupted"); 
Nit: This may be misleading since the eventRouter may have been interrupted during the close() finally block.
+    } catch (Throwable t) {
+      LOG.warn("Error in final cleanup of task. ", t);
This isn't necessary. Exceptions are already being handled for individual close calls.
pubilc enum State - Can be an isRunning method like the existing hasInitialized method, instead of making the enum public
In TaskRunnerCallable, is there a need to check for InterruptedExceptions in the catch Throwable clause ?
Nit: task will never be null in TezTaskRunner
An exception from abortTask should probably be reported as a Failure. Can be fixed later in the branch.
Before taskFuture = executor.submit(callable); - checking the interrupt status may be useful. Otherwise the task would start, and the get is immediately interrupted.
maybeInterruptWaitingThread - Don't think we should be interrupting the main thread at this point. That can have several consequences. If task.run() returned without an Exception, but due to an abort/interrupt invocation, and the close succeeds after this - we'll try reporting the task as successful. If there's an error from close, there's a possibility that the main thread would have deregistered the task from the Reporter, and the TaskRunnerCallable thread would fail while indicating an error. Throwing an InterruptedException here and gracefully falling off the TaskRunnerCallable thread should take care of this. After this, the main run() thread regains control and can shutdown.
For a future jira. Inputs / Outputs could check for interrupts during shutdown to prevent costly spill attempts. The rest of the patch probably addresses this though.
>>>>


>>> processorClosed=true, initializedInputs.remove, initializedOutputs.remove() - should be fore the actual invocation of close
- Fixed. Also removed log statements & unwanted try block.

>> pubilc enum State - Can be an isRunning
- Fixed. Added RuntimeTask.isRunning()

>> In TaskRunnerCallable, is there a need to check for InterruptedExceptions
- Yes, needed. If processor run is not consuming the interrupt, it might be throwing interruptedException & in such cases, we need to reset the interrupt status.

>> An exception from abortTask should probably be reported as a Failure
- Fixed.

>> Before taskFuture = executor.submit(callable); - checking the interrupt status may be useful
- Fixed.



was (Author: rajesh.balamohan):
Attaching patch for TEZ-2003 branch.

> LogicalIOProcessorRuntimeTask, RuntimeTask, TezTaskRunner should handle interrupts & carry out necessary cleanups
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: TEZ-2414
>                 URL: https://issues.apache.org/jira/browse/TEZ-2414
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-2414.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)