You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tez.apache.org by "Eric Badger (JIRA)" <ji...@apache.org> on 2017/02/08 21:02:41 UTC

[jira] [Comment Edited] (TEZ-3274) Vertex with MRInput and shuffle input does not respect slow start

    [ https://issues.apache.org/jira/browse/TEZ-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858256#comment-15858256 ] 

Eric Badger edited comment on TEZ-3274 at 2/8/17 9:01 PM:
----------------------------------------------------------

Uploading a first-effort patch to get some feedback. In this patch I have refactored the slow start code in the {{ShuffleVertexManager}} and {{FairShuffleVertexManager}} to make it available to all vertex managers via a config value ({{TEZ_SHUFFLE_VERTEX_MANAGER_ENABLE_SLOW_START}}) and a wrapper around the original {{scheduleTasks()}} function, which I named {{scheduleTasksFilter()}}. This preserves the functionality of all the current vertex managers, while allowing for them or any new vertex managers to adopt the slow start functionality easily in the future. The reason that the current vertex managers are not currently compatible out of the box with the refactored slow start code is because {{scheduleTasks()}} always scheduled all tasks and so the vertex managers could assume that a call to {{scheduleTasks()}} would effectively drain the list of pending tasks. However, {{scheduleTasksFilter()}} is not guaranteed to schedule all of the tasks or even any of the tasks at all. It returns a list of tasks that were actually scheduled so that the vertex manager can take the appropriate action (remove them from the pending tasks list, drop them, etc.). 

In this patch I have combined the configs for {{ShuffleVertexManager}} and {{FairShuffleVertexManager}} into a single set of configs. Currently, there are separate configs for each and they seem to be redundant. However, combining them may lead to backwards compatibility problems. Thoughts?

Questions about the next step:
  - Is it ok to move the configs to YarnConfiguration.java and change their names to something more generic. Until now they were specific to {{ShuffleVertexManager}} and {{FairShuffleVertexManager}}. Even then I'm not sure why they weren't put in YarnConfiguration.java initially, but [~jeagles] noted offline that there may have been a reason for this. 
  - I think that this same sort of refactoring effort can be applied to auto-parallelism. Are there any potential pitfalls that I should be weary of and/or anything special that I need to take into consideration while refactoring?
  - Does it make sense to add in the slow start capability to vertex managers such as {{ImmediateStartVertexManager}}. Theoretically slow start could be applied, but doesn't really make sense in this context. If slow start were enabled and the min was set then some tasks would simply never be scheduled since {{scheduleTasks()}} is only called by that vertex manager on start and on a vertex state update.
  - Should the {{VertexManagerPlugin}}'s be able to override user-specified configs coming in through the payload. The use-case for this would be in changing all of the plugins to use {{scheduleTasksFilter()}}, but then allow them to explicitly set the slow start/auto parallelism enable config to false. The con of this is that now the {{VertexManager}} can get one config from the payload (even though it doesn't currently), then the {{VertexManagerPlugin}} can change it to be something different and those 2 can be out of sync. So really the {{VertexManager}} couldn't trust the original payload value until coming back from the plugin, because it could've changed in the plugin. This is how it is setup right now with the {{ShuffleVertexManagerBaseConfig}} being created in the plugin from the payload. But I'm not completely sold that this is best for the future development of the code. 

[~bikassaha], [~sseth], [~jeagles], [~rohini], could you please review the initial patch and questions? Thanks!


was (Author: ebadger):
Uploading a first-effort patch to get some feedback. In this patch I have refactored the slow start code in the {{ShuffleVertexManager}} and {{FairShuffleVertexManager}} to make it available to all vertex managers via a config value ({{TEZ_SHUFFLE_VERTEX_MANAGER_ENABLE_SLOW_START}}) and a wrapper around the original {{scheduleTasks()}} function, which I named {{scheduleTasksFilter()}}. This preserves the functionality of all the current vertex managers, while allowing for them or any new vertex managers to adopt the slow start functionality easily in the future. The reason that the current vertex managers are not currently compatible out of the box with the refactored slow start code is because {{scheduleTasks()}} always scheduled all tasks and so the vertex managers could assume that a call to {{scheduleTasks()}} would effectively drain the list of pending tasks. However, {{scheduleTasksFilter()}} is not guaranteed to schedule all of the tasks or even any of the tasks at all. It returns a list of tasks that were actually scheduled so that the vertex manager can take the appropriate action (remove them from the pending tasks list, drop them, etc.). 

In this patch I have combined the configs for {{ShuffleVertexManager}} and {{FairShuffleVertexManager}} into a single set of configs. Currently, there are separate configs for each and they seem to be redundant. However, combining them may lead to backwards compatibility problems. Thoughts?

Questions about the next step:
  - Is it ok to move the configs to YarnConfiguration.java and change their names to something more generic. Until now they were specific to {{ShuffleVertexManager}} and {{FairShuffleVertexManager}}. Even then I'm not sure why they weren't put in YarnConfiguration.java initially, but [~jeagles] noted offline that there may have been a reason for this. 
  - I think that this same sort of refactoring effort can be applied to auto-parallelism. Are there any potential pitfalls that I should be weary of and/or anything special that I need to take into consideration while refactoring?
  - Does it make sense to add in the slow start capability to vertex managers such as {{ImmediateStartVertexManager}}. Theoretically slow start could be applied, but doesn't really make sense in this context. If slow start were enabled and the min was set then some tasks would simply never be scheduled since {{scheduleTasks()}} is only called by that vertex manager on start and on a vertex state update.

[~bikassaha], [~sseth], [~jeagles], [~rohini], could you please review the initial patch and questions? Thanks!

> Vertex with MRInput and shuffle input does not respect slow start
> -----------------------------------------------------------------
>
>                 Key: TEZ-3274
>                 URL: https://issues.apache.org/jira/browse/TEZ-3274
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Assignee: Eric Badger
>         Attachments: TEZ-3274.001.patch
>
>
> Vertices with shuffle input and MRInput choose RootInputVertexManager (and not ShuffleVertexManager) and start containers and tasks immediately. In this scenario, resources can be wasted since they do not respect tez.shuffle-vertex-manager.min-src-fraction tez.shuffle-vertex-manager.max-src-fraction. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)