Posted to user@tez.apache.org by Robert Grandl <rg...@yahoo.com> on 2016/11/28 23:44:19 UTC

clarification regarding Tez DAGs

Hi all,
I am trying to get a better understanding of the DAGs generated by Hive atop Tez. However, I have some more fundamental questions about the types of tasks/edges in a Tez DAG. 

1) In case of MapReduce:
Map - takes records and generates <Key, Value> pairs.
Reduce - takes <Key, Value> pairs and reduces the list of values for the same Key.
Question: that means the reducer does not change the Keys, right?
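That contract can be sketched in a few lines of plain Python (a toy word count, not Hadoop code; map_fn, reduce_fn and run are invented names): map emits <Key, Value> pairs, and reduce folds the value list for each key while leaving the key unchanged.

```python
from collections import defaultdict

def map_fn(record):
    # one record in, many <word, 1> pairs out
    return [(word, 1) for word in record.split()]

def reduce_fn(key, values):
    # the key is preserved; only the list of values is reduced
    return (key, sum(values))

def run(records):
    groups = defaultdict(list)
    for record in records:            # map phase
        for k, v in map_fn(record):
            groups[k].append(v)       # shuffle: group values by key
    return dict(reduce_fn(k, vs) for k, vs in groups.items())  # reduce phase

print(run(["a b a", "b c"]))  # {'a': 2, 'b': 2, 'c': 1}
```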
In case of Tez, things can be more complex:
2) For example, Map tasks can be in the middle of the DAG too. My understanding is that in this case the input is a set of <Key, Value> pairs and the output can be a set of different <KeyX, ValueX> pairs.
Is this true for any type of input edge (scatter gather, broadcast, one to one)?

3) Reduce tasks can be in the middle as well. Can I assume that the reducer can also change the keys? For example, in the case of Map -> Reduce_1 -> Reduce_2 patterns, what is the main reason for having Reduce_2? Is it because the keys are changed by Reduce_2 while Reduce_1 preserves the ones from the Map?
4) On a related note: in the case of Map_1 -> Map_2 patterns, is it possible for Map_2 to preserve the Keys generated by Map_1, or will there be new keys?

5) If my guess is right that both Map and Reduce stages can eventually change the keys, what is the main difference between having Map and Reduce stages in the middle of the DAG (i.e. not input stages or leaf stages)?
Thanks,
- Robert


Re: clarification regarding Tez DAGs

Posted by Robert Grandl <rg...@yahoo.com>.
Hitesh,
Thank you so much for your detailed answers. I would like to ask a few more questions, although maybe people on the Hive list know more.

1) Do you have any idea what the typical operations executed in a vertex are (the ones where input and output are in the form of <K, V> pairs)? I guess I am thinking more in the context of SQL-like applications such as Hive/Pig.

2) I know that in MapReduce, typically all data has to be retrieved by the reducers before the reduce function is applied. Is it the same for applications built on Tez (esp. Hive, Pig)?
3) Is it possible to apply the processing logic (the reduce() function in the case of MapReduce) multiple times while retrieving Key, Value pairs from upstream vertices, instead of waiting for the complete set of data to be shuffled?
If yes, is it in general an expensive operation? If not, is there any obvious logic which impedes that?
Thanks again for your help with this,
Robert

    On Monday, November 28, 2016 6:26 PM, Hitesh Shah <hi...@apache.org> wrote:
 

 Hello Robert, 

Some of the questions may be better answered on the Hive list, but I will take a first crack at some of them.

From a Tez perspective, let's use vertices and ignore Maps and Reducers for now. Hive uses this as a convention to indicate that a vertex is either reading data from HDFS (map) or has an inbound shuffle edge (reduce).

For a given vertex, each task in the vertex is composed of a set of inputs, a processor and a set of outputs. 

The key-value constructs are defined by the kind of Input and Output being used. Today, pretty much all I/Os are key-value based.

The edge types define how data is transferred, but they do not completely control how data is manipulated before being sent across the edge. A lot of that is defined within the Inputs and Outputs. To clarify, a broadcast edge implies that a task from an upstream vertex will send all of its output to all tasks in the downstream vertex. However, a broadcast edge does not imply whether the data is sorted or unsorted. Likewise for the scatter-gather edge. This edge allows each task in an upstream vertex to generate partitioned data that can be distributed to all downstream tasks. This can be used to mimic the MR shuffle by having the Output in the upstream vertex generate partitioned and sorted data, which is sent to a downstream Input that does a merge+sort over all relevant partitions that it needs from all upstream tasks. This also allows for plugging in a shuffle-like edge implementation that does not sort data but only partitions it (or groups it).

To answer your questions: 

>>> for (2) and (3)

Yes. The processor can generate a different key, value pair if it wants to. A simple usage of an MRR chain would be a case where you want to do a group by X followed by an order by Y. It can be done in some form via a 2-stage DAG, but a simplistic model would be a 3-stage DAG where stage 2 does the grouping and stage 3 the order by.

>>> for (4) and (5)

I am not sure I understand the question. Could you clarify what M2 expects in terms of its input? If you combined the logic of M1 and M2 into a single task, would that retain the behavior that you want? If a reduce stage and a map stage in the middle of a DAG are both expecting an inbound shuffled input, then there is no difference except for their logical names.

Feel free to send more questions to the list to get more clarifications.

thanks
— Hitesh
  

> On Nov 28, 2016, at 3:44 PM, Robert Grandl <rg...@yahoo.com> wrote:
> 
> Hi all,
> 
> I am trying to get a better understanding of the DAGs generated by Hive atop Tez. However, I have some more fundamental questions about the types of tasks/edges in a Tez DAG. 
> 
> 1) In case of MapReduce:
> Map - takes records and generates <Key, Value> pairs.
> Reduce - takes <Key, Value> pairs and reduce the list of the values for the same Key. 
> Question:That means the reducer  does not change the Keys right?
> 
> In case of Tez, things can be more complex:
> 2) For example, Map tasks can be in the middle of the DAG too. My understanding is that in this case the input is a set of <Key, Value> pairs and the output can be a set of different <KeyX, ValueX> value pairs. 
> Is this true for any type of input edge (scatter gather, broadcast, one to one)?
> 
> 3) Reduce tasks can be in the middle as well. Can I assume that the reducer also can change the keys? For example, in case of Map -> Reduce_1 -> Reduce_2 patterns, what is the main reason of having Reduce_2? It is because the keys are changed by Reduce_2 while Reduce_1 preserve the ones from the Map?
> 
> 4) On a related note. In case of Map_1 -> Map_2 patterns, it is possible Map_2 to preserve the Keys generated by Map_1 or will be new keys?
> 
> 5) If my guess that both Map and Reduce stages can eventually change the keys, what is the main difference of having both Map and Reduce stages in the middle of the DAG (i.e. not input stages or leaf stages).
> 
> Thanks,
> - Robert
> 


   


Re: clarification regarding Tez DAGs

Posted by Hitesh Shah <hi...@apache.org>.
Hello Robert, 

Some of the questions may be better answered on the Hive list, but I will take a first crack at some of them.

From a Tez perspective, let's use vertices and ignore Maps and Reducers for now. Hive uses this as a convention to indicate that a vertex is either reading data from HDFS (map) or has an inbound shuffle edge (reduce).

For a given vertex, each task in the vertex is composed of a set of inputs, a processor and a set of outputs. 
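As a rough sketch of that composition (invented class and variable names in plain Python, not the actual Tez API): each task reads records from every Input, runs them through one processor, and writes the results to every Output.

```python
class Task:
    """Toy model of a task: a set of inputs, one processor, a set of outputs."""

    def __init__(self, inputs, processor, outputs):
        self.inputs = inputs        # e.g. shuffled or HDFS-backed inputs
        self.processor = processor  # user logic: record in, record out
        self.outputs = outputs      # e.g. partitioned/sorted outputs

    def run(self):
        for inp in self.inputs:              # read from every Input
            for record in inp:
                result = self.processor(record)
                for out in self.outputs:     # write to every Output
                    out.append(result)

sink = []
Task([[1, 2, 3]], lambda x: x * 10, [sink]).run()
print(sink)  # [10, 20, 30]
```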

The key-value constructs are defined by the kind of Input and Output being used. Today, pretty much all I/Os are key-value based.

The edge types define how data is transferred, but they do not completely control how data is manipulated before being sent across the edge. A lot of that is defined within the Inputs and Outputs. To clarify, a broadcast edge implies that a task from an upstream vertex will send all of its output to all tasks in the downstream vertex. However, a broadcast edge does not imply whether the data is sorted or unsorted. Likewise for the scatter-gather edge. This edge allows each task in an upstream vertex to generate partitioned data that can be distributed to all downstream tasks. This can be used to mimic the MR shuffle by having the Output in the upstream vertex generate partitioned and sorted data, which is sent to a downstream Input that does a merge+sort over all relevant partitions that it needs from all upstream tasks. This also allows for plugging in a shuffle-like edge implementation that does not sort data but only partitions it (or groups it).
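The partition-then-gather idea can be sketched as follows (plain Python with a hypothetical first-byte-of-key partitioner, not Tez code). Note that whether the merge sorts is a property of the Output/Input pair, not of the edge itself:

```python
def partition(records, num_downstream):
    # each upstream task splits its <key, value> output into one
    # partition per downstream task (toy deterministic partitioner)
    parts = [[] for _ in range(num_downstream)]
    for key, value in records:
        parts[ord(key[0]) % num_downstream].append((key, value))
    return parts

def gather(upstream_partitions, idx, sort=True):
    # downstream task `idx` pulls its partition from every upstream task;
    # sort=True mimics the MR-style sorted shuffle, sort=False a
    # partition-only (grouping) edge implementation
    merged = [kv for parts in upstream_partitions for kv in parts[idx]]
    return sorted(merged) if sort else merged

up1 = partition([("a", 1), ("b", 2)], 2)   # output of upstream task 1
up2 = partition([("a", 3), ("c", 4)], 2)   # output of upstream task 2
print(gather([up1, up2], 1))  # [('a', 1), ('a', 3), ('c', 4)]
```

Every ("a", ...) pair lands in the same downstream task regardless of which upstream task produced it, which is what lets the downstream processor see all values for a key together.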

To answer your questions: 

>>> for (2) and (3)

Yes. The processor can generate a different key, value pair if it wants to. A simple usage of an MRR chain would be a case where you want to do a group by X followed by an order by Y. It can be done in some form via a 2-stage DAG, but a simplistic model would be a 3-stage DAG where stage 2 does the grouping and stage 3 the order by.
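A toy version of that 3-stage shape (pure Python with hypothetical data, just to show the re-keying between stages): stage 1 emits <X, value> pairs, stage 2 groups and aggregates by X, and stage 3 orders by the aggregate Y.

```python
from collections import defaultdict

rows = [("x1", 5), ("x2", 1), ("x1", 2)]

# stage 1 (map): emit <X, value> pairs
pairs = [(x, v) for x, v in rows]

# stage 2 (first reduce): group by X, here summing the values
groups = defaultdict(int)
for x, v in pairs:
    groups[x] += v

# stage 3 (second reduce): re-key by the aggregate to order by Y
ordered = sorted(groups.items(), key=lambda kv: kv[1])
print(ordered)  # [('x2', 1), ('x1', 7)]
```

The point is that stage 3 consumes keys produced by stage 2, not by stage 1, which is why a reduce stage in the middle of a DAG is free to change the keys.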

>>> for (4) and (5)

I am not sure I understand the question. Could you clarify what M2 expects in terms of its input? If you combined the logic of M1 and M2 into a single task, would that retain the behavior that you want? If a reduce stage and a map stage in the middle of a DAG are both expecting an inbound shuffled input, then there is no difference except for their logical names.

Feel free to send more questions to the list to get more clarifications.

thanks
— Hitesh
  

> On Nov 28, 2016, at 3:44 PM, Robert Grandl <rg...@yahoo.com> wrote:
> 
> Hi all,
> 
> I am trying to get a better understanding of the DAGs generated by Hive atop Tez. However, I have some more fundamental questions about the types of tasks/edges in a Tez DAG. 
> 
> 1) In case of MapReduce:
> Map - takes records and generates <Key, Value> pairs.
> Reduce - takes <Key, Value> pairs and reduce the list of the values for the same Key. 
> Question:That means the reducer  does not change the Keys right?
> 
> In case of Tez, things can be more complex:
> 2) For example, Map tasks can be in the middle of the DAG too. My understanding is that in this case the input is a set of <Key, Value> pairs and the output can be a set of different <KeyX, ValueX> value pairs. 
> Is this true for any type of input edge (scatter gather, broadcast, one to one)?
> 
> 3) Reduce tasks can be in the middle as well. Can I assume that the reducer also can change the keys? For example, in case of Map -> Reduce_1 -> Reduce_2 patterns, what is the main reason of having Reduce_2? It is because the keys are changed by Reduce_2 while Reduce_1 preserve the ones from the Map?
> 
> 4) On a related note. In case of Map_1 -> Map_2 patterns, it is possible Map_2 to preserve the Keys generated by Map_1 or will be new keys?
> 
> 5) If my guess that both Map and Reduce stages can eventually change the keys, what is the main difference of having both Map and Reduce stages in the middle of the DAG (i.e. not input stages or leaf stages).
> 
> Thanks,
> - Robert
>