Posted to issues@tez.apache.org by "Jonathan Eagles (JIRA)" <ji...@apache.org> on 2018/03/02 17:40:00 UTC

[jira] [Commented] (TEZ-3897) Tez Local Mode hang for vertices with broadcast input

    [ https://issues.apache.org/jira/browse/TEZ-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383862#comment-16383862 ] 

Jonathan Eagles commented on TEZ-3897:
--------------------------------------

[~jlowe]
{quote}
Do we need to worry about task deallocations implying the task needs to be interrupted/killed? The other schedulers will automatically deallocate a container if a task deallocate maps to a currently allocated task.
{quote}
When the deallocate task request is processed, it issues a container-being-released message to the context, which starts the container release process.

{quote}
Seems like there shouldn't be a PreemptTaskRequest so much as a DeallocateContainerRequest. Both of those kinds of requests don't need a priority, so whichever one is kept arguably shouldn't derive from TaskRequest but something like a SchedulerRequest that TaskRequest derives from as well. Or just have the queue hold Object rather than TaskRequest and do RTTI on everything in the queue as it already does.
{quote}

Changed the class inheritance to reflect this.
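
For anyone following along, here is a rough sketch of the shape suggested in the quote above: a common base type for everything on the queue, with only task requests carrying a priority. All names here (SchedulerRequestSketch, the fields, drain) are made up for illustration and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative only: hypothetical names, not the classes in the patch.
class SchedulerRequestSketch {

  /** Base type for anything placed on the scheduler's request queue. */
  static abstract class SchedulerRequest {}

  /** A task-level request; only these carry a priority. */
  static class TaskRequest extends SchedulerRequest {
    final Object task;
    final int priority;

    TaskRequest(Object task, int priority) {
      this.task = task;
      this.priority = priority;
    }
  }

  /** A container-level request; no priority is needed. */
  static class DeallocateContainerRequest extends SchedulerRequest {
    final String containerId;

    DeallocateContainerRequest(String containerId) {
      this.containerId = containerId;
    }
  }

  /** Drains the queue in FIFO order, dispatching on each request's runtime type. */
  static List<String> drain(BlockingQueue<SchedulerRequest> queue) {
    List<String> handled = new ArrayList<>();
    SchedulerRequest request;
    while ((request = queue.poll()) != null) {
      if (request instanceof TaskRequest) {
        TaskRequest t = (TaskRequest) request;
        handled.add("task " + t.task + " at priority " + t.priority);
      } else if (request instanceof DeallocateContainerRequest) {
        DeallocateContainerRequest d = (DeallocateContainerRequest) request;
        handled.add("release container " + d.containerId);
      }
    }
    return handled;
  }

  public static void main(String[] args) {
    BlockingQueue<SchedulerRequest> queue = new LinkedBlockingQueue<>();
    queue.add(new TaskRequest("t1", 5));
    queue.add(new DeallocateContainerRequest("c1"));
    drain(queue).forEach(System.out::println);
  }
}
```

The queue is typed on the base class, so the runtime type check still happens in one place, but a container request no longer has to pretend to have a priority.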

{quote}
Nit: It's a bit odd for addPreemptTaskRequest's signature to return an Object yet it always returns null. Better as a void method?
{quote}

The API expects an object or null to be returned from the deallocate container message. However, that's not possible, since the message is actually processed in the dispatch thread, and the caller ignores the return value anyway. Changed the async handler so that it doesn't return null.

{quote}
I didn't see in the patch where the actual preemption of the running task occurs. I would expect there to be a corresponding change in LocalContainerLauncher to implement the preempt of the running task, but it still explicitly ignores any requests to stop a container.
{quote}
Added the container-being-released message for the deallocate container request (the preemption case) and added logic in LocalContainerLauncher to cancel the running task's future. Verified that futures are cancelled and that preemption works correctly.
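
To illustrate the general idea of cancelling the future, here is a minimal standalone sketch. It assumes a launcher that tracks one Future per container id; the class LocalLauncherSketch and its methods are hypothetical and are not the LocalContainerLauncher code. Only the java.util.concurrent calls are real.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Illustrative only: a hypothetical launcher, not Tez's LocalContainerLauncher.
class LocalLauncherSketch {
  private final ExecutorService executor = Executors.newCachedThreadPool();
  private final Map<String, Future<?>> running = new ConcurrentHashMap<>();

  /** Submits the task and remembers its future under the container id. */
  void launch(String containerId, Runnable task) {
    running.put(containerId, executor.submit(task));
  }

  /** Returns true if a future was tracked for the id and cancel(true) succeeded. */
  boolean stop(String containerId) {
    Future<?> future = running.remove(containerId);
    return future != null && future.cancel(true);
  }

  void shutdown() {
    executor.shutdownNow();
  }

  /** Launches a sleeping task, stops it, and reports {cancelled, taskSawInterrupt}. */
  static boolean[] demo() {
    LocalLauncherSketch launcher = new LocalLauncherSketch();
    CountDownLatch started = new CountDownLatch(1);
    CountDownLatch interrupted = new CountDownLatch(1);
    try {
      launcher.launch("c1", () -> {
        started.countDown();
        try {
          Thread.sleep(60_000); // stands in for a task blocked on its input
        } catch (InterruptedException e) {
          interrupted.countDown(); // cancel(true) interrupts the worker thread
        }
      });
      started.await(); // make sure the task is running before stopping it
      boolean cancelled = launcher.stop("c1");
      boolean sawInterrupt = interrupted.await(5, TimeUnit.SECONDS);
      return new boolean[] {cancelled, sawInterrupt};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return new boolean[] {false, false};
    } finally {
      launcher.shutdown();
    }
  }

  public static void main(String[] args) {
    boolean[] result = demo();
    System.out.println("cancelled: " + result[0]);
    System.out.println("task saw interrupt: " + result[1]);
  }
}
```

Note that cancel(true) only delivers an interrupt; a task that never blocks and never checks its interrupt status would keep running, so this approach relies on the task being interruptible.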

> Tez Local Mode hang for vertices with broadcast input
> -----------------------------------------------------
>
>                 Key: TEZ-3897
>                 URL: https://issues.apache.org/jira/browse/TEZ-3897
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>            Priority: Major
>         Attachments: TEZ-3897.001.patch, TEZ-3897.002.patch
>
>
> Broadcast edges are not taken into consideration for slow-start edges so downstream tasks in local mode can start before upstream tasks. Without preemption in the scheduler, there will be a hang.


