You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@tez.apache.org by Grandl Robert <rg...@yahoo.com> on 2014/06/22 04:23:22 UTC

Re: question regarding the DAG scheduler

Hitesh,

I am coming back to you with some more questions about TEZ Dag Scheduler as far as I reached the code and I want to understand it better. 

Can you point me to the main classes where the scheduling logic is involved ? A.k.a where is the Tez DAG scheduler, or where other scheduling decisions are taken ? 

Each vertex manager decides the scheduling of the vertex tasks based on some logic, and then a DAGSchedulerNaturalOrder or DAGSchedulerMRR will do the ordering among them ? What is the main difference between these two ?

Also, I was trying to see where this BFS traversal is doing but still did not get it yet. 

Thanks in advance for your help,
Robert

On Tuesday, April 15, 2014 4:58 PM, Hitesh Shah <hi...@hortonworks.com> wrote:

Hi Robert, 

At the specification level, each vertex in a DAG defines the no. of tasks it will run ( no. of tasks can be decided when the plan is created or at run-time ). Furthermore, each task can also be tagged with location hints as to where to run the task - such as which host/rack or potentially if it is tied to a task in an upstream vertex, the container on which the previous task ran. 

The basic scheduling is done on the basis of vertex dependencies using a form of BFS traversal and the requirements of availability of upstream data. The root vertices are considered highest priority and the priority of vertices decreases based on the the distance from the root. Downstream vertices depending on the type of edge connecting them to their upstream vertices may decide to delay the start ( of their tasks ) until some or all tasks of upstream vertices are completed. An example is shuffle slow-start if you are familiar with the shuffle implementation in MapReduce. 

For example, all root vertices ( vertices with no inbound edges ) are “started” immediately when the DAG starts. Starting a vertex’s tasks effectively implies asking YARN for containers to run those tasks. Once containers become available ( i.e. provided by YARN or a container becomes free after a previously assigned task to it completes), it is assigned to the next pending task. The next pending task currently is the highest priority task i.e. a pending task for a vertex closest to the roots. There is also some additional logic with respect to affinity for dependent tasks and also backoff logic present to ensure that we give some weightage to ensure task locality is not affected in cases of re-use of containers.

— Hitesh

On Apr 14, 2014, at 7:13 PM, Grandl Robert <rg...@yahoo.com> wrote:

> Hi,
> 
> Can someone give me some details(or some pointers) on how the DAG scheduling in Tez happens ? 
> 
> Like: in what order are the tasks(w/o dependencies) chosen for scheduling; how the assignment between tasks and nodes happens, etc ...
> 
> Thanks,
> robert
> 

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

RE: question regarding the DAG scheduler

Posted by Bikas Saha <bi...@hortonworks.com>.

Can you please outline the new ideas you have in mind. You can discuss them
in the mailing lists or open jiras and post comments there. This will help
get early feedback in case those are significant changes. This may help
prevent investing work in a potentially wrong direction. Alternatively,
this may help aligning/cooperating with others who may be pursuing similar
ideas.

Thanks

Bikas

*From:* Grandl Robert [mailto:rgrandl@yahoo.com]
*Sent:* Tuesday, June 24, 2014 11:49 AM
*To:* user@tez.incubator.apache.org; Hitesh Shah; Bikas Saha
*Subject:* Re: question regarding the DAG scheduler

Thanks a lot for your detailed comments.

Yeah, I would like to contribute some new ideas in this area, and most
likely the changes will be also in Tez framework. That's why I am pinging
you with so many questions(and probably will do further).

Pointing me to some jiras will be great.

Robert

On Tuesday, June 24, 2014 10:10 AM, Bikas Saha <bi...@hortonworks.com>
wrote:

So after the DAG is created, every vertex will be initialized,  as
following: root vertices first, and after a root vertex is initialized it
notifies downstream vertices. After all root vertices have initialized,
each next downstream vertex is initialized and so on. Then, each root
vertex is started

Bikas - Correct

immediately ALL the tasks corresponding to that vertex are sent to
DAGNaturalScheduler who will assign priorities and send them to
TaskScheduler

Bikas – Not quite. The vertex manager for the vertices will decide which
tasks to start. The default case is ImmediateStartVertexManager which
starts all tasks. The vertexmanager may be different for different edge
types or when explicitly set by the user. The TaskAttempt state machine
takes care of sending the task to the DAGScheduler (whose default is
DAGSchedulerNaturalOrder) that assigns priorities. Once that is done the
TaskAttempt state machine sends the task to the TaskScheduler. There may be
some extra overhead in doing this via events that we may be able to remove.

Task Scheduler is then negotiating Yarn containers and assigning next task
based on the priorities.

Bikas – correct.

Also, as soon as some tasks have started to finish in an upstream vertex(or
all the tasks should be finish in the upstream vertex ?),  a downstream
vertex(not a root one) is started and again all the tasks goes to
DAGNaturalScheduler,assign priorities and so on.

The downstream vertex is ready to start as soon as all its upstream
vertices have started and sent it a signal to start (just like init). At
this point, the vertex manager of the downstream vertex can start tasks.
The ImmediateStartVertexManager starts all tasks immediately, the
InputReadyVertexManager starts a task when its inputs are ready. The
ShuffleVertexManager starts tasks when partial inputs are ready. Then for
those tasks, its rinse and repeat like you note.

Also, by BFS order you mean, that initialize vertices initially by: root
vertices followed by downstream vertices. There is no other DAG traversal
otherwise right ? Just in the initialization phase.

Bikas – This traversal happens during initialization and start. Init
triggers the vertex initialization code to run – eg. Generate input data,
determine edge types for custom edges etc. Once a vertex has inited, it can
move into started once all its upstream vertices have started. That is when
its tasks can be scheduled to run. Thus vertex start also follows
topological order like vertex init.

Curious to the source of your questions. Are you looking at contributing
some new ideas in this area. I could point you to some jiras to help you
get started. Nothing better than patching some jiras to get familiar with
the code J

Bikas

*From:* Grandl Robert [mailto:rgrandl@yahoo.com]
*Sent:* Monday, June 23, 2014 10:21 PM
*To:* user@tez.incubator.apache.org; Hitesh Shah; Bikas Saha
*Subject:* Re: question regarding the DAG scheduler

Bikas, Hitesh,

Please correct me if I am wrong. So after the DAG is created, every vertex
will be initialized,  as following: root vertices first, and after a root
vertex is initialized it notifies downstream vertices. After all root
vertices have initialized, each next downstream vertex is initialized and
so on. Then, each root vertex is started and immediately ALL the tasks
corresponding to that vertex are sent to DAGNaturalScheduler who will
assign priorities and send them to TaskScheduler. Task Scheduler is then
negotiating Yarn containers and assigning next task based on the
priorities. Also, as soon as some tasks have started to finish in an
upstream vertex(or all the tasks should be finish in the upstream vertex
?),  a downstream vertex(not a root one) is started and again all the tasks
goes to DAGNaturalScheduler,assign priorities and so on.

Is this right ? So it seems the whole logic for scheduling happens in:
VertexManager, DAGNaturalScheduler, TAskScheduler. Am I missing something
important here ?

Also, by BFS order you mean, that initialize vertices initially by: root
vertices followed by downstream vertices. There is no other DAG traversal
otherwise right ? Just in the initialization phase.

Thanks,
robert

On Monday, June 23, 2014 11:06 AM, Bikas Saha <bi...@hortonworks.com> wrote:

I am not sure what you mean by map and reduce here. Do you mean that the
vertex names have map and reduce in them?

Then that is something in Hive. Hive chooses to name the vertices Map or
Reduce based on whether they are root vertices or not. Even if it’s a
complex DAG. The hive mailing list is a better forum to ask about the
motivation of those names.

The DAG app master code was seeded from the code of the MRAppMaster but is
substantially different from it.

Bikas

*From:* Grandl Robert [mailto:rgrandl@yahoo.com]
*Sent:* Monday, June 23, 2014 11:01 AM
*To:* Bikas Saha; user@tez.incubator.apache.org; Hitesh Shah
*Subject:* Re: question regarding the DAG scheduler

Thanks Bikas.

I have another confusion. Looking at the dot file for a generated DAG, all
the nodes are either Map or Reduce. Also, the DAGAppMaster seems to be an
Map-Reduce application master.

Can you explain me a bit, why these notations ? So, all the nodes in the
DAG are either Map or Reduce ? For example, when I am running a HIVE query
atop tez, still a sequence of MapReduce jobs will be created, but they will
be linked in a DAG though ?

robert

On Sunday, June 22, 2014 10:28 AM, Bikas Saha <bi...@hortonworks.com> wrote:

The vertex manager decides when a task is ready to run. After that the task
is sent off for scheduling to the framework. Then the DAGScheduler assigns
it priority of execution depending its depth in the graph and other things
like whether it’s a retry or not. Then the task goes to the TaskScheduler
that actually assigns it to a YARN container. The classes in the code have
the same prefix as in this email. DAGSchedulerMRR was written to simulate
MapReduce scheduling and is not used as it does not handle DAG’s. The graph
traversal happens in the framework. The vertices are initialized in their
topological order. Then the root vertices are started. Once a vertex starts
it sends a start signal to its downstream vertices which again happens in
topological order.

*From:* Grandl Robert [mailto:rgrandl@yahoo.com]
*Sent:* Saturday, June 21, 2014 7:23 PM
*To:* Hitesh Shah; user@tez.incubator.apache.org
*Subject:* Re: question regarding the DAG scheduler

Hitesh,

I am coming back to you with some more questions about TEZ Dag Scheduler as
far as I reached the code and I want to understand it better.

Can you point me to the main classes where the scheduling logic is involved
? A.k.a where is the Tez DAG scheduler, or where other scheduling decisions
are taken ?

Each vertex manager decides the scheduling of the vertex tasks based on
some logic, and then a DAGSchedulerNaturalOrder or DAGSchedulerMRR will do
the ordering among them ? What is the main difference between these two ?

Also, I was trying to see where this BFS traversal is doing but still did
not get it yet.

Thanks in advance for your help,
Robert

On Tuesday, April 15, 2014 4:58 PM, Hitesh Shah <hi...@hortonworks.com>
wrote:

Hi Robert,

At the specification level, each vertex in a DAG defines the no. of tasks
it will run ( no. of tasks can be decided when the plan is created or at
run-time ). Furthermore, each task can also be tagged with location hints
as to where to run the task - such as which host/rack or potentially if it
is tied to a task in an upstream vertex, the container on which the
previous task ran.

The basic scheduling is done on the basis of vertex dependencies using a
form of BFS traversal and the requirements of availability of upstream
data. The root vertices are considered highest priority and the priority of
vertices decreases based on the the distance from the root. Downstream
vertices depending on the type of edge connecting them to their upstream
vertices may decide to delay the start ( of their tasks ) until some or all
tasks of upstream vertices are completed. An example is shuffle slow-start
if you are familiar with the shuffle implementation in MapReduce.

For example, all root vertices ( vertices with no inbound edges ) are
“started” immediately when the DAG starts. Starting a vertex’s tasks
effectively implies asking YARN for containers to run those tasks. Once
containers become available ( i.e. provided by YARN or a container becomes
free after a previously assigned task to it completes), it is assigned to
the next pending task. The next pending task currently is the highest
priority task i.e. a pending task for a vertex closest to the roots. There
is also some additional logic with respect to affinity for dependent tasks
and also backoff logic present to ensure that we give some weightage to
ensure task locality is not affected in cases of re-use of containers.

— Hitesh

On Apr 14, 2014, at 7:13 PM, Grandl Robert <rg...@yahoo.com> wrote:

> Hi,
>
> Can someone give me some details(or some pointers) on how the DAG
scheduling in Tez happens ?
>
> Like: in what order are the tasks(w/o dependencies) chosen for
scheduling; how the assignment between tasks and nodes happens, etc ...
>
> Thanks,
> robert

>

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to
which it is addressed and may contain information that is confidential,
privileged and exempt from disclosure under applicable law. If the reader
of this message is not the intended recipient, you are hereby notified that
any printing, copying, dissemination, distribution, disclosure or
forwarding of this communication is strictly prohibited. If you have
received this communication in error, please contact the sender immediately
and delete it from your system. Thank You.

CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to
which it is addressed and may contain information that is confidential,
privileged and exempt from disclosure under applicable law. If the reader
of this message is not the intended recipient, you are hereby notified that
any printing, copying, dissemination, distribution, disclosure or
forwarding of this communication is strictly prohibited. If you have
received this communication in error, please contact the sender immediately
and delete it from your system. Thank You.

CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to
which it is addressed and may contain information that is confidential,
privileged and exempt from disclosure under applicable law. If the reader
of this message is not the intended recipient, you are hereby notified that
any printing, copying, dissemination, distribution, disclosure or
forwarding of this communication is strictly prohibited. If you have
received this communication in error, please contact the sender immediately
and delete it from your system. Thank You.

CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to
which it is addressed and may contain information that is confidential,
privileged and exempt from disclosure under applicable law. If the reader
of this message is not the intended recipient, you are hereby notified that
any printing, copying, dissemination, distribution, disclosure or
forwarding of this communication is strictly prohibited. If you have
received this communication in error, please contact the sender immediately
and delete it from your system. Thank You.

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: question regarding the DAG scheduler

Posted by Grandl Robert <rg...@yahoo.com>.

Thanks a lot for your detailed comments.

Yeah, I would like to contribute some new ideas in this area, and most likely the changes will be also in Tez framework. That's why I am pinging you with so many questions(and probably will do further).

Pointing me to some jiras will be great. 

Robert

On Tuesday, June 24, 2014 10:10 AM, Bikas Saha <bi...@hortonworks.com> wrote:

So after the DAG is created, every vertex will be initialized,  as following: root vertices first, and after a root vertex is initialized it notifies downstream vertices. After all root vertices have initialized, each next downstream vertex is initialized and so on. Then, each root vertex is started
Bikas - Correct

immediately ALL the tasks corresponding to that vertex are sent to DAGNaturalScheduler who will assign priorities and send them to TaskScheduler
Bikas – Not quite. The vertex manager for the vertices will decide which tasks to start. The default case is ImmediateStartVertexManager which starts all tasks. The vertexmanager may be different for different edge types or when explicitly set by the user. The TaskAttempt state machine takes care of sending the task to the DAGScheduler (whose default is DAGSchedulerNaturalOrder) that assigns priorities. Once that is done the TaskAttempt state machine sends the task to the TaskScheduler. There may be some extra overhead in doing this via events that we may be able to remove.

Task Scheduler is then negotiating Yarn containers and assigning next task based on the priorities. 
Bikas – correct.

Also, as soon as some tasks have started to finish in an upstream vertex(or all the tasks should be finish in the upstream vertex ?),  a downstream vertex(not a root one) is started and again all the tasks goes to DAGNaturalScheduler,assign priorities and so on. 
The downstream vertex is ready to start as soon as all its upstream vertices have started and sent it a signal to start (just like init). At this point, the vertex manager of the downstream vertex can start tasks. The ImmediateStartVertexManager starts all tasks immediately, the InputReadyVertexManager starts a task when its inputs are ready. The ShuffleVertexManager starts tasks when partial inputs are ready. Then for those tasks, its rinse and repeat like you note.

Also, by BFS order you mean, that initialize vertices initially by: root vertices followed by downstream vertices. There is no other DAG traversal otherwise right ? Just in the initialization phase.
Bikas – This traversal happens during initialization and start. Init triggers the vertex initialization code to run – eg. Generate input data, determine edge types for custom edges etc. Once a vertex has inited, it can move into started once all its upstream vertices have started. That is when its tasks can be scheduled to run. Thus vertex start also follows topological order like vertex init.

Curious to the source of your questions. Are you looking at contributing some new ideas in this area. I could point you to some jiras to help you get started. Nothing better than patching some jiras to get familiar with the code J

Bikas

From:Grandl Robert [mailto:rgrandl@yahoo.com] 
Sent: Monday, June 23, 2014 10:21 PM
To: user@tez.incubator.apache.org; Hitesh Shah; Bikas Saha
Subject: Re: question regarding the DAG scheduler

Bikas, Hitesh,

Please correct me if I am wrong. So after the DAG is created, every vertex will be initialized,  as following: root vertices first, and after a root vertex is initialized it notifies downstream vertices. After all root vertices have initialized, each next downstream vertex is initialized and so on. Then, each root vertex is started and immediately ALL the tasks corresponding to that vertex are sent to DAGNaturalScheduler who will assign priorities and send them to TaskScheduler. Task Scheduler is then negotiating Yarn containers and assigning next task based on the priorities. Also, as soon as some tasks have started to finish in an upstream vertex(or all the tasks should be finish in the upstream vertex ?),  a downstream vertex(not a root one) is started and again all the tasks goes to DAGNaturalScheduler,assign priorities and so on. 

Is this right ? So it seems the whole logic for scheduling happens in: VertexManager, DAGNaturalScheduler, TAskScheduler. Am I missing something important here ?

Also, by BFS order you mean, that initialize vertices initially by: root vertices followed by downstream vertices. There is no other DAG traversal otherwise right ? Just in the initialization phase.

Thanks,
robert

On Monday, June 23, 2014 11:06 AM, Bikas Saha <bi...@hortonworks.com> wrote:

I am not sure what you mean by map and reduce here. Do you mean that the vertex names have map and reduce in them?

Then that is something in Hive. Hive chooses to name the vertices Map or Reduce based on whether they are root vertices or not. Even if it’s a complex DAG. The hive mailing list is a better forum to ask about the motivation of those names.

The DAG app master code was seeded from the code of the MRAppMaster but is substantially different from it.

Bikas

From:Grandl Robert [mailto:rgrandl@yahoo.com] 
Sent: Monday, June 23, 2014 11:01 AM
To: Bikas Saha; user@tez.incubator.apache.org; Hitesh Shah
Subject: Re: question regarding the DAG scheduler

Thanks Bikas.

I have another confusion. Looking at the dot file for a generated DAG, all the nodes are either Map or Reduce. Also, the DAGAppMaster seems to be an Map-Reduce application master.

Can you explain me a bit, why these notations ? So, all the nodes in the DAG are either Map or Reduce ? For example, when I am running a HIVE query atop tez, still a sequence of MapReduce jobs will be created, but they will be linked in a DAG though ?

robert

On Sunday, June 22, 2014 10:28 AM, Bikas Saha <bi...@hortonworks.com> wrote:

The vertex manager decides when a task is ready to run. After that the task is sent off for scheduling to the framework. Then the DAGScheduler assigns it priority of execution depending its depth in the graph and other things like whether it’s a retry or not. Then the task goes to the TaskScheduler that actually assigns it to a YARN container. The classes in the code have the same prefix as in this email. DAGSchedulerMRR was written to simulate MapReduce scheduling and is not used as it does not handle DAG’s. The graph traversal happens in the framework. The vertices are initialized in their topological order. Then the root vertices are started. Once a vertex starts it sends a start signal to its downstream vertices which again happens in topological order.

From:Grandl Robert [mailto:rgrandl@yahoo.com] 
Sent: Saturday, June 21, 2014 7:23 PM
To: Hitesh Shah; user@tez.incubator.apache.org
Subject: Re: question regarding the DAG scheduler

Hitesh,

I am coming back to you with some more questions about TEZ Dag Scheduler as far as I reached the code and I want to understand it better. 

Can you point me to the main classes where the scheduling logic is involved ? A.k.a where is the Tez DAG scheduler, or where other scheduling decisions are taken ? 

Each vertex manager decides the scheduling of the vertex tasks based on some logic, and then a DAGSchedulerNaturalOrder or DAGSchedulerMRR will do the ordering among them ? What is the main difference between these two ?

Also, I was trying to see where this BFS traversal is doing but still did not get it yet. 

Thanks in advance for your help,
Robert
On Tuesday, April 15, 2014 4:58 PM, Hitesh Shah <hi...@hortonworks.com> wrote:

Hi Robert, 

At the specification level, each vertex in a DAG defines the no. of tasks it will run ( no. of tasks can be decided when the plan is created or at run-time ). Furthermore, each task can also be tagged with location hints as to where to run the task - such as which host/rack or potentially if it is tied to a task in an upstream vertex, the container on which the previous task ran. 

The basic scheduling is done on the basis of vertex dependencies using a form of BFS traversal and the requirements of availability of upstream data. The root vertices are considered highest priority and the priority of vertices decreases based on the the distance from the root. Downstream vertices depending on the type of edge connecting them to their upstream vertices may decide to delay the start ( of their tasks ) until some or all tasks of upstream vertices are completed. An example is shuffle slow-start if you are familiar with the shuffle implementation in MapReduce. 

For example, all root vertices ( vertices with no inbound edges ) are “started” immediately when the DAG starts. Starting a vertex’s tasks effectively implies asking YARN for containers to run those tasks. Once containers become available ( i.e. provided by YARN or a container becomes free after a previously assigned task to it completes), it is assigned to the next pending task. The next pending task currently is the highest priority task i.e. a pending task for a vertex closest to the roots. There is also some additional logic with respect to affinity for dependent tasks and also backoff logic present to ensure that we give some weightage to ensure task locality is not affected in cases of re-use of containers.

— Hitesh

On Apr 14, 2014, at 7:13 PM, Grandl Robert <rg...@yahoo.com> wrote:

> Hi,
> 
> Can someone give me some details(or some pointers) on how the DAG scheduling in Tez happens ? 
> 
> Like: in what order are the tasks(w/o dependencies) chosen for scheduling; how the assignment between tasks and nodes happens, etc ...
> 
> Thanks,
> robert

> 

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.

CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.

CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.

RE: question regarding the DAG scheduler

Posted by Bikas Saha <bi...@hortonworks.com>.

So after the DAG is created, every vertex will be initialized,  as
following: root vertices first, and after a root vertex is initialized it
notifies downstream vertices. After all root vertices have initialized,
each next downstream vertex is initialized and so on. Then, each root
vertex is started

Bikas - Correct

immediately ALL the tasks corresponding to that vertex are sent to
DAGNaturalScheduler who will assign priorities and send them to
TaskScheduler

Bikas – Not quite. The vertex manager for the vertices will decide which
tasks to start. The default case is ImmediateStartVertexManager which
starts all tasks. The vertexmanager may be different for different edge
types or when explicitly set by the user. The TaskAttempt state machine
takes care of sending the task to the DAGScheduler (whose default is
DAGSchedulerNaturalOrder) that assigns priorities. Once that is done the
TaskAttempt state machine sends the task to the TaskScheduler. There may be
some extra overhead in doing this via events that we may be able to remove.

Task Scheduler is then negotiating Yarn containers and assigning next task
based on the priorities.

Bikas – correct.

Also, as soon as some tasks have started to finish in an upstream vertex(or
all the tasks should be finish in the upstream vertex ?),  a downstream
vertex(not a root one) is started and again all the tasks goes to
DAGNaturalScheduler,assign priorities and so on.

The downstream vertex is ready to start as soon as all its upstream
vertices have started and sent it a signal to start (just like init). At
this point, the vertex manager of the downstream vertex can start tasks.
The ImmediateStartVertexManager starts all tasks immediately, the
InputReadyVertexManager starts a task when its inputs are ready. The
ShuffleVertexManager starts tasks when partial inputs are ready. Then for
those tasks, its rinse and repeat like you note.

Also, by BFS order you mean, that initialize vertices initially by: root
vertices followed by downstream vertices. There is no other DAG traversal
otherwise right ? Just in the initialization phase.

Bikas – This traversal happens during initialization and start. Init
triggers the vertex initialization code to run – eg. Generate input data,
determine edge types for custom edges etc. Once a vertex has inited, it can
move into started once all its upstream vertices have started. That is when
its tasks can be scheduled to run. Thus vertex start also follows
topological order like vertex init.

Curious to the source of your questions. Are you looking at contributing
some new ideas in this area. I could point you to some jiras to help you
get started. Nothing better than patching some jiras to get familiar with
the code J

Bikas

*From:* Grandl Robert [mailto:rgrandl@yahoo.com]
*Sent:* Monday, June 23, 2014 10:21 PM
*To:* user@tez.incubator.apache.org; Hitesh Shah; Bikas Saha
*Subject:* Re: question regarding the DAG scheduler

Bikas, Hitesh,

Please correct me if I am wrong. So after the DAG is created, every vertex
will be initialized,  as following: root vertices first, and after a root
vertex is initialized it notifies downstream vertices. After all root
vertices have initialized, each next downstream vertex is initialized and
so on. Then, each root vertex is started and immediately ALL the tasks
corresponding to that vertex are sent to DAGNaturalScheduler who will
assign priorities and send them to TaskScheduler. Task Scheduler is then
negotiating Yarn containers and assigning next task based on the
priorities. Also, as soon as some tasks have started to finish in an
upstream vertex(or all the tasks should be finish in the upstream vertex
?),  a downstream vertex(not a root one) is started and again all the tasks
goes to DAGNaturalScheduler,assign priorities and so on.

Is this right ? So it seems the whole logic for scheduling happens in:
VertexManager, DAGNaturalScheduler, TAskScheduler. Am I missing something
important here ?

Also, by BFS order you mean, that initialize vertices initially by: root
vertices followed by downstream vertices. There is no other DAG traversal
otherwise right ? Just in the initialization phase.

Thanks,
robert

On Monday, June 23, 2014 11:06 AM, Bikas Saha <bi...@hortonworks.com> wrote:

I am not sure what you mean by map and reduce here. Do you mean that the
vertex names have map and reduce in them?

Then that is something in Hive. Hive chooses to name the vertices Map or
Reduce based on whether they are root vertices or not. Even if it’s a
complex DAG. The hive mailing list is a better forum to ask about the
motivation of those names.

The DAG app master code was seeded from the code of the MRAppMaster but is
substantially different from it.

Bikas

*From:* Grandl Robert [mailto:rgrandl@yahoo.com]
*Sent:* Monday, June 23, 2014 11:01 AM
*To:* Bikas Saha; user@tez.incubator.apache.org; Hitesh Shah
*Subject:* Re: question regarding the DAG scheduler

Thanks Bikas.

I have another confusion. Looking at the dot file for a generated DAG, all
the nodes are either Map or Reduce. Also, the DAGAppMaster seems to be an
Map-Reduce application master.

Can you explain me a bit, why these notations ? So, all the nodes in the
DAG are either Map or Reduce ? For example, when I am running a HIVE query
atop tez, still a sequence of MapReduce jobs will be created, but they will
be linked in a DAG though ?

robert

On Sunday, June 22, 2014 10:28 AM, Bikas Saha <bi...@hortonworks.com> wrote:

The vertex manager decides when a task is ready to run. After that the task
is sent off for scheduling to the framework. Then the DAGScheduler assigns
it priority of execution depending its depth in the graph and other things
like whether it’s a retry or not. Then the task goes to the TaskScheduler
that actually assigns it to a YARN container. The classes in the code have
the same prefix as in this email. DAGSchedulerMRR was written to simulate
MapReduce scheduling and is not used as it does not handle DAG’s. The graph
traversal happens in the framework. The vertices are initialized in their
topological order. Then the root vertices are started. Once a vertex starts
it sends a start signal to its downstream vertices which again happens in
topological order.

*From:* Grandl Robert [mailto:rgrandl@yahoo.com]
*Sent:* Saturday, June 21, 2014 7:23 PM
*To:* Hitesh Shah; user@tez.incubator.apache.org
*Subject:* Re: question regarding the DAG scheduler

Hitesh,

I am coming back to you with some more questions about TEZ Dag Scheduler as
far as I reached the code and I want to understand it better.

Can you point me to the main classes where the scheduling logic is involved
? A.k.a where is the Tez DAG scheduler, or where other scheduling decisions
are taken ?

Each vertex manager decides the scheduling of the vertex tasks based on
some logic, and then a DAGSchedulerNaturalOrder or DAGSchedulerMRR will do
the ordering among them ? What is the main difference between these two ?

Also, I was trying to see where this BFS traversal is doing but still did
not get it yet.

Thanks in advance for your help,
Robert

On Tuesday, April 15, 2014 4:58 PM, Hitesh Shah <hi...@hortonworks.com>
wrote:

Hi Robert,

At the specification level, each vertex in a DAG defines the no. of tasks
it will run ( no. of tasks can be decided when the plan is created or at
run-time ). Furthermore, each task can also be tagged with location hints
as to where to run the task - such as which host/rack or potentially if it
is tied to a task in an upstream vertex, the container on which the
previous task ran.

The basic scheduling is done on the basis of vertex dependencies using a
form of BFS traversal and the requirements of availability of upstream
data. The root vertices are considered highest priority and the priority of
vertices decreases based on the the distance from the root. Downstream
vertices depending on the type of edge connecting them to their upstream
vertices may decide to delay the start ( of their tasks ) until some or all
tasks of upstream vertices are completed. An example is shuffle slow-start
if you are familiar with the shuffle implementation in MapReduce.

For example, all root vertices ( vertices with no inbound edges ) are
“started” immediately when the DAG starts. Starting a vertex’s tasks
effectively implies asking YARN for containers to run those tasks. Once
containers become available ( i.e. provided by YARN or a container becomes
free after a previously assigned task to it completes), it is assigned to
the next pending task. The next pending task currently is the highest
priority task i.e. a pending task for a vertex closest to the roots. There
is also some additional logic with respect to affinity for dependent tasks
and also backoff logic present to ensure that we give some weightage to
ensure task locality is not affected in cases of re-use of containers.

— Hitesh

On Apr 14, 2014, at 7:13 PM, Grandl Robert <rg...@yahoo.com> wrote:

> Hi,
>
> Can someone give me some details(or some pointers) on how the DAG
scheduling in Tez happens ?
>
> Like: in what order are the tasks(w/o dependencies) chosen for
scheduling; how the assignment between tasks and nodes happens, etc ...
>
> Thanks,
> robert

>

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to
which it is addressed and may contain information that is confidential,
privileged and exempt from disclosure under applicable law. If the reader
of this message is not the intended recipient, you are hereby notified that
any printing, copying, dissemination, distribution, disclosure or
forwarding of this communication is strictly prohibited. If you have
received this communication in error, please contact the sender immediately
and delete it from your system. Thank You.

CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to
which it is addressed and may contain information that is confidential,
privileged and exempt from disclosure under applicable law. If the reader
of this message is not the intended recipient, you are hereby notified that
any printing, copying, dissemination, distribution, disclosure or
forwarding of this communication is strictly prohibited. If you have
received this communication in error, please contact the sender immediately
and delete it from your system. Thank You.

CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to
which it is addressed and may contain information that is confidential,
privileged and exempt from disclosure under applicable law. If the reader
of this message is not the intended recipient, you are hereby notified that
any printing, copying, dissemination, distribution, disclosure or
forwarding of this communication is strictly prohibited. If you have
received this communication in error, please contact the sender immediately
and delete it from your system. Thank You.

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: question regarding the DAG scheduler

Posted by Grandl Robert <rg...@yahoo.com>.

Bikas, Hitesh,

Please correct me if I am wrong. So after the DAG is created, every vertex will be initialized,  as following: root vertices first, and after a root vertex is initialized it notifies downstream vertices. After all root vertices have initialized, each next downstream vertex is initialized and so on. Then, each root vertex is started and immediately ALL the tasks corresponding to that vertex are sent to DAGNaturalScheduler who will assign priorities and send them to TaskScheduler. Task Scheduler is then negotiating Yarn containers and assigning next task based on the priorities. Also, as soon as some tasks have started to finish in an upstream vertex(or all the tasks should be finish in the upstream vertex ?),  a downstream vertex(not a root one) is started and again all the tasks goes to DAGNaturalScheduler,assign priorities and so on. 

Is this right ? So it seems the whole logic for scheduling happens in: VertexManager, DAGNaturalScheduler, TAskScheduler. Am I missing something important here ?

Also, by BFS order you mean, that initialize vertices initially by: root vertices followed by downstream vertices. There is no other DAG traversal otherwise right ? Just in the initialization phase.

Thanks,
robert

On Monday, June 23, 2014 11:06 AM, Bikas Saha <bi...@hortonworks.com> wrote:

I am not sure what you mean by map and reduce here. Do you mean that the vertex names have map and reduce in them?

Then that is something in Hive. Hive chooses to name the vertices Map or Reduce based on whether they are root vertices or not. Even if it’s a complex DAG. The hive mailing list is a better forum to ask about the motivation of those names.

The DAG app master code was seeded from the code of the MRAppMaster but is substantially different from it.

Bikas

From:Grandl Robert [mailto:rgrandl@yahoo.com] 
Sent: Monday, June 23, 2014 11:01 AM
To: Bikas Saha; user@tez.incubator.apache.org; Hitesh Shah
Subject: Re: question regarding the DAG scheduler

Thanks Bikas.

I have another confusion. Looking at the dot file for a generated DAG, all the nodes are either Map or Reduce. Also, the DAGAppMaster seems to be an Map-Reduce application master.

Can you explain me a bit, why these notations ? So, all the nodes in the DAG are either Map or Reduce ? For example, when I am running a HIVE query atop tez, still a sequence of MapReduce jobs will be created, but they will be linked in a DAG though ?

robert

On Sunday, June 22, 2014 10:28 AM, Bikas Saha <bi...@hortonworks.com> wrote:

The vertex manager decides when a task is ready to run. After that the task is sent off for scheduling to the framework. Then the DAGScheduler assigns it priority of execution depending its depth in the graph and other things like whether it’s a retry or not. Then the task goes to the TaskScheduler that actually assigns it to a YARN container. The classes in the code have the same prefix as in this email. DAGSchedulerMRR was written to simulate MapReduce scheduling and is not used as it does not handle DAG’s. The graph traversal happens in the framework. The vertices are initialized in their topological order. Then the root vertices are started. Once a vertex starts it sends a start signal to its downstream vertices which again happens in topological order.

From:Grandl Robert [mailto:rgrandl@yahoo.com] 
Sent: Saturday, June 21, 2014 7:23 PM
To: Hitesh Shah; user@tez.incubator.apache.org
Subject: Re: question regarding the DAG scheduler

Hitesh,

I am coming back to you with some more questions about TEZ Dag Scheduler as far as I reached the code and I want to understand it better. 

Can you point me to the main classes where the scheduling logic is involved ? A.k.a where is the Tez DAG scheduler, or where other scheduling decisions are taken ? 

Each vertex manager decides the scheduling of the vertex tasks based on some logic, and then a DAGSchedulerNaturalOrder or DAGSchedulerMRR will do the ordering among them ? What is the main difference between these two ?

Also, I was trying to see where this BFS traversal is doing but still did not get it yet. 

Thanks in advance for your help,
Robert

On Tuesday, April 15, 2014 4:58 PM, Hitesh Shah <hi...@hortonworks.com> wrote:

Hi Robert, 

At the specification level, each vertex in a DAG defines the no. of tasks it will run ( no. of tasks can be decided when the plan is created or at run-time ). Furthermore, each task can also be tagged with location hints as to where to run the task - such as which host/rack or potentially if it is tied to a task in an upstream vertex, the container on which the previous task ran. 

The basic scheduling is done on the basis of vertex dependencies using a form of BFS traversal and the requirements of availability of upstream data. The root vertices are considered highest priority and the priority of vertices decreases based on the the distance from the root. Downstream vertices depending on the type of edge connecting them to their upstream vertices may decide to delay the start ( of their tasks ) until some or all tasks of upstream vertices are completed. An example is shuffle slow-start if you are familiar with the shuffle implementation in MapReduce. 

For example, all root vertices ( vertices with no inbound edges ) are “started” immediately when the DAG starts. Starting a vertex’s tasks effectively implies asking YARN for containers to run those tasks. Once containers become available ( i.e. provided by YARN or a container becomes free after a previously assigned task to it completes), it is assigned to the next pending task. The next pending task currently is the highest priority task i.e. a pending task for a vertex closest to the roots. There is also some additional logic with respect to affinity for dependent tasks and also backoff logic present to ensure that we give some weightage to ensure task locality is not affected in cases of re-use of containers.

— Hitesh

On Apr 14, 2014, at 7:13 PM, Grandl Robert <rg...@yahoo.com> wrote:

> Hi,
> 
> Can someone give me some details(or some pointers) on how the DAG scheduling in Tez happens ? 
> 
> Like: in what order are the tasks(w/o dependencies) chosen for scheduling; how the assignment between tasks and nodes happens, etc ...
> 
> Thanks,
> robert

> 

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.

CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.

RE: question regarding the DAG scheduler

Posted by Bikas Saha <bi...@hortonworks.com>.

I am not sure what you mean by map and reduce here. Do you mean that the
vertex names have map and reduce in them?

Then that is something in Hive. Hive chooses to name the vertices Map or
Reduce based on whether they are root vertices or not. Even if it’s a
complex DAG. The hive mailing list is a better forum to ask about the
motivation of those names.

The DAG app master code was seeded from the code of the MRAppMaster but is
substantially different from it.

Bikas

*From:* Grandl Robert [mailto:rgrandl@yahoo.com]
*Sent:* Monday, June 23, 2014 11:01 AM
*To:* Bikas Saha; user@tez.incubator.apache.org; Hitesh Shah
*Subject:* Re: question regarding the DAG scheduler

Thanks Bikas.

I have another confusion. Looking at the dot file for a generated DAG, all
the nodes are either Map or Reduce. Also, the DAGAppMaster seems to be an
Map-Reduce application master.

Can you explain me a bit, why these notations ? So, all the nodes in the
DAG are either Map or Reduce ? For example, when I am running a HIVE query
atop tez, still a sequence of MapReduce jobs will be created, but they will
be linked in a DAG though ?

robert

On Sunday, June 22, 2014 10:28 AM, Bikas Saha <bi...@hortonworks.com> wrote:

The vertex manager decides when a task is ready to run. After that the task
is sent off for scheduling to the framework. Then the DAGScheduler assigns
it priority of execution depending its depth in the graph and other things
like whether it’s a retry or not. Then the task goes to the TaskScheduler
that actually assigns it to a YARN container. The classes in the code have
the same prefix as in this email. DAGSchedulerMRR was written to simulate
MapReduce scheduling and is not used as it does not handle DAG’s. The graph
traversal happens in the framework. The vertices are initialized in their
topological order. Then the root vertices are started. Once a vertex starts
it sends a start signal to its downstream vertices which again happens in
topological order.

*From:* Grandl Robert [mailto:rgrandl@yahoo.com]
*Sent:* Saturday, June 21, 2014 7:23 PM
*To:* Hitesh Shah; user@tez.incubator.apache.org
*Subject:* Re: question regarding the DAG scheduler

Hitesh,

I am coming back to you with some more questions about TEZ Dag Scheduler as
far as I reached the code and I want to understand it better.

Can you point me to the main classes where the scheduling logic is involved
? A.k.a where is the Tez DAG scheduler, or where other scheduling decisions
are taken ?

Each vertex manager decides the scheduling of the vertex tasks based on
some logic, and then a DAGSchedulerNaturalOrder or DAGSchedulerMRR will do
the ordering among them ? What is the main difference between these two ?

Also, I was trying to see where this BFS traversal is doing but still did
not get it yet.

Thanks in advance for your help,
Robert

On Tuesday, April 15, 2014 4:58 PM, Hitesh Shah <hi...@hortonworks.com>
wrote:

Hi Robert,

At the specification level, each vertex in a DAG defines the no. of tasks
it will run ( no. of tasks can be decided when the plan is created or at
run-time ). Furthermore, each task can also be tagged with location hints
as to where to run the task - such as which host/rack or potentially if it
is tied to a task in an upstream vertex, the container on which the
previous task ran.

The basic scheduling is done on the basis of vertex dependencies using a
form of BFS traversal and the requirements of availability of upstream
data. The root vertices are considered highest priority and the priority of
vertices decreases based on the the distance from the root. Downstream
vertices depending on the type of edge connecting them to their upstream
vertices may decide to delay the start ( of their tasks ) until some or all
tasks of upstream vertices are completed. An example is shuffle slow-start
if you are familiar with the shuffle implementation in MapReduce.

For example, all root vertices ( vertices with no inbound edges ) are
“started” immediately when the DAG starts. Starting a vertex’s tasks
effectively implies asking YARN for containers to run those tasks. Once
containers become available ( i.e. provided by YARN or a container becomes
free after a previously assigned task to it completes), it is assigned to
the next pending task. The next pending task currently is the highest
priority task i.e. a pending task for a vertex closest to the roots. There
is also some additional logic with respect to affinity for dependent tasks
and also backoff logic present to ensure that we give some weightage to
ensure task locality is not affected in cases of re-use of containers.

— Hitesh

On Apr 14, 2014, at 7:13 PM, Grandl Robert <rg...@yahoo.com> wrote:

> Hi,
>
> Can someone give me some details(or some pointers) on how the DAG
scheduling in Tez happens ?
>
> Like: in what order are the tasks(w/o dependencies) chosen for
scheduling; how the assignment between tasks and nodes happens, etc ...
>
> Thanks,
> robert

>

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to
which it is addressed and may contain information that is confidential,
privileged and exempt from disclosure under applicable law. If the reader
of this message is not the intended recipient, you are hereby notified that
any printing, copying, dissemination, distribution, disclosure or
forwarding of this communication is strictly prohibited. If you have
received this communication in error, please contact the sender immediately
and delete it from your system. Thank You.

CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to
which it is addressed and may contain information that is confidential,
privileged and exempt from disclosure under applicable law. If the reader
of this message is not the intended recipient, you are hereby notified that
any printing, copying, dissemination, distribution, disclosure or
forwarding of this communication is strictly prohibited. If you have
received this communication in error, please contact the sender immediately
and delete it from your system. Thank You.

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: question regarding the DAG scheduler

Posted by Grandl Robert <rg...@yahoo.com>.

Thanks Bikas.

I have another confusion. Looking at the dot file for a generated DAG, all the nodes are either Map or Reduce. Also, the DAGAppMaster seems to be an Map-Reduce application master.

Can you explain me a bit, why these notations ? So, all the nodes in the DAG are either Map or Reduce ? For example, when I am running a HIVE query atop tez, still a sequence of MapReduce jobs will be created, but they will be linked in a DAG though ?

robert

On Sunday, June 22, 2014 10:28 AM, Bikas Saha <bi...@hortonworks.com> wrote:

The vertex manager decides when a task is ready to run. After that the task is sent off for scheduling to the framework. Then the DAGScheduler assigns it priority of execution depending its depth in the graph and other things like whether it’s a retry or not. Then the task goes to the TaskScheduler that actually assigns it to a YARN container. The classes in the code have the same prefix as in this email. DAGSchedulerMRR was written to simulate MapReduce scheduling and is not used as it does not handle DAG’s. The graph traversal happens in the framework. The vertices are initialized in their topological order. Then the root vertices are started. Once a vertex starts it sends a start signal to its downstream vertices which again happens in topological order.

From:Grandl Robert [mailto:rgrandl@yahoo.com] 
Sent: Saturday, June 21, 2014 7:23 PM
To: Hitesh Shah; user@tez.incubator.apache.org
Subject: Re: question regarding the DAG scheduler

Hitesh,

I am coming back to you with some more questions about TEZ Dag Scheduler as far as I reached the code and I want to understand it better. 

Can you point me to the main classes where the scheduling logic is involved ? A.k.a where is the Tez DAG scheduler, or where other scheduling decisions are taken ? 

Each vertex manager decides the scheduling of the vertex tasks based on some logic, and then a DAGSchedulerNaturalOrder or DAGSchedulerMRR will do the ordering among them ? What is the main difference between these two ?

Also, I was trying to see where this BFS traversal is doing but still did not get it yet. 

Thanks in advance for your help,
Robert

On Tuesday, April 15, 2014 4:58 PM, Hitesh Shah <hi...@hortonworks.com> wrote:

Hi Robert, 

At the specification level, each vertex in a DAG defines the no. of tasks it will run ( no. of tasks can be decided when the plan is created or at run-time ). Furthermore, each task can also be tagged with location hints as to where to run the task - such as which host/rack or potentially if it is tied to a task in an upstream vertex, the container on which the previous task ran. 

The basic scheduling is done on the basis of vertex dependencies using a form of BFS traversal and the requirements of availability of upstream data. The root vertices are considered highest priority and the priority of vertices decreases based on the the distance from the root. Downstream vertices depending on the type of edge connecting them to their upstream vertices may decide to delay the start ( of their tasks ) until some or all tasks of upstream vertices are completed. An example is shuffle slow-start if you are familiar with the shuffle implementation in MapReduce. 

For example, all root vertices ( vertices with no inbound edges ) are “started” immediately when the DAG starts. Starting a vertex’s tasks effectively implies asking YARN for containers to run those tasks. Once containers become available ( i.e. provided by YARN or a container becomes free after a previously assigned task to it completes), it is assigned to the next pending task. The next pending task currently is the highest priority task i.e. a pending task for a vertex closest to the roots. There is also some additional logic with respect to affinity for dependent tasks and also backoff logic present to ensure that we give some weightage to ensure task locality is not affected in cases of re-use of containers.

— Hitesh

On Apr 14, 2014, at 7:13 PM, Grandl Robert <rg...@yahoo.com> wrote:

> Hi,
> 
> Can someone give me some details(or some pointers) on how the DAG scheduling in Tez happens ? 
> 
> Like: in what order are the tasks(w/o dependencies) chosen for scheduling; how the assignment between tasks and nodes happens, etc ...
> 
> Thanks,
> robert

> 

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.

RE: question regarding the DAG scheduler

Posted by Bikas Saha <bi...@hortonworks.com>.

The vertex manager decides when a task is ready to run. After that the task
is sent off for scheduling to the framework. Then the DAGScheduler assigns
it priority of execution depending its depth in the graph and other things
like whether it’s a retry or not. Then the task goes to the TaskScheduler
that actually assigns it to a YARN container. The classes in the code have
the same prefix as in this email. DAGSchedulerMRR was written to simulate
MapReduce scheduling and is not used as it does not handle DAG’s. The graph
traversal happens in the framework. The vertices are initialized in their
topological order. Then the root vertices are started. Once a vertex starts
it sends a start signal to its downstream vertices which again happens in
topological order.

*From:* Grandl Robert [mailto:rgrandl@yahoo.com]
*Sent:* Saturday, June 21, 2014 7:23 PM
*To:* Hitesh Shah; user@tez.incubator.apache.org
*Subject:* Re: question regarding the DAG scheduler

Hitesh,

I am coming back to you with some more questions about TEZ Dag Scheduler as
far as I reached the code and I want to understand it better.

Can you point me to the main classes where the scheduling logic is involved
? A.k.a where is the Tez DAG scheduler, or where other scheduling decisions
are taken ?

Each vertex manager decides the scheduling of the vertex tasks based on
some logic, and then a DAGSchedulerNaturalOrder or DAGSchedulerMRR will do
the ordering among them ? What is the main difference between these two ?

Also, I was trying to see where this BFS traversal is doing but still did
not get it yet.

Thanks in advance for your help,
Robert

On Tuesday, April 15, 2014 4:58 PM, Hitesh Shah <hi...@hortonworks.com>
wrote:

Hi Robert,

At the specification level, each vertex in a DAG defines the no. of tasks
it will run ( no. of tasks can be decided when the plan is created or at
run-time ). Furthermore, each task can also be tagged with location hints
as to where to run the task - such as which host/rack or potentially if it
is tied to a task in an upstream vertex, the container on which the
previous task ran.

The basic scheduling is done on the basis of vertex dependencies using a
form of BFS traversal and the requirements of availability of upstream
data. The root vertices are considered highest priority and the priority of
vertices decreases based on the the distance from the root. Downstream
vertices depending on the type of edge connecting them to their upstream
vertices may decide to delay the start ( of their tasks ) until some or all
tasks of upstream vertices are completed. An example is shuffle slow-start
if you are familiar with the shuffle implementation in MapReduce.

For example, all root vertices ( vertices with no inbound edges ) are
“started” immediately when the DAG starts. Starting a vertex’s tasks
effectively implies asking YARN for containers to run those tasks. Once
containers become available ( i.e. provided by YARN or a container becomes
free after a previously assigned task to it completes), it is assigned to
the next pending task. The next pending task currently is the highest
priority task i.e. a pending task for a vertex closest to the roots. There
is also some additional logic with respect to affinity for dependent tasks
and also backoff logic present to ensure that we give some weightage to
ensure task locality is not affected in cases of re-use of containers.

— Hitesh

On Apr 14, 2014, at 7:13 PM, Grandl Robert <rg...@yahoo.com> wrote:

> Hi,
>
> Can someone give me some details(or some pointers) on how the DAG
scheduling in Tez happens ?
>
> Like: in what order are the tasks(w/o dependencies) chosen for
scheduling; how the assignment between tasks and nodes happens, etc ...
>
> Thanks,
> robert

>

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to
which it is addressed and may contain information that is confidential,
privileged and exempt from disclosure under applicable law. If the reader
of this message is not the intended recipient, you are hereby notified that
any printing, copying, dissemination, distribution, disclosure or
forwarding of this communication is strictly prohibited. If you have
received this communication in error, please contact the sender immediately
and delete it from your system. Thank You.

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.